Fast division by 15 (or 240?)?

Fast division by 15 (or 240?)?
by tokumaru on 2007-04-16 (#23246)

Does anyone know of a way to quickly divide by 15?

I imagine there are ways to optimize division by constants, but I can't seem to think of anything that works. Ah, I need the remainder too.

Thanks for the help!

by tepples on 2007-04-16 (#23247)

Does this have anything to do with the Atari 2600? And what is the domain of the numerator? 0 to 255? More? Negative numbers?

What do you get when you multiply by 17?

by tokumaru on 2007-04-16 (#23251)

tepples wrote:
Does this have anything to do with the Atari 2600?

No, no, no, no... Are you asking because of the horizontal positioning of sprites? No, I have already found a good way to do that, but I won't code anything for the 2600 until I have finished something for the NES anyway.

It has to do with metatiles (2x2 tiles) being drawn to the name tables, which can hold only 15 rows of those, so this division is for the correlation of level map metatiles with name table metatiles. I may need to divide camera coordinates by 240 too (to get the value of the Y scroll register)... I'm still doing my math, but I'll need at least one of those. Unless I decide to keep track of 2 different Y coordinates (which was my first option), but I don't want that.

Quote:
And what is the domain of the numerator? 0 to 255? More? Negative numbers?

Up to 4095, always positive.

Quote:
What do you get when you multiply by 17?

I don't see where you're going... I guess I'd get an approximation of what I want, with the remainder in the lower byte and the result in the following one(s)? Not sure. I don't think approximations are good enough for this, because placing metatiles in the wrong row is a VERY noticeable glitch! =)

by Disch on 2007-04-16 (#23252)

If you have 512 free bytes -- you can't get much faster than a LUT

Code:
divide_by_15:
LDX divisor
LDA division15_table,X
STA quotient
LDA remainder15_table,X
STA remainder
RTS

EDIT

Even if you need to divide a 16-bit number, you can run the high and low bytes through the LUT, then combine the results.... then run the remainder through the LUT and combine it.

by dvdmth on 2007-04-16 (#23253)

I wouldn't use divide-by-15 for this purpose. Instead, have two different Y position variables, one that keeps track of where you are in the level map (absolute position) and one that keeps track of where you are in the nametables (relative position). When you scroll vertically, adjust both the absolute and relative Y-positions at the same time. If the relative position drops below zero or exceeds 14, then switch nametables (or wrap around in the same nametable, if you're using vertical mirroring).

by tokumaru on 2007-04-16 (#23254)

512 bytes is a lot of space to waste with a calculation that has to be performed once per frame... I guess I'll either have a regular division algorithm (shift and subtract), or have the second Y coordinate, maintained to point to valid name table areas.

A regular division algorithm should be fast enough, since 15 is just a 4-bit number. I may have to divide by 240 though, but a 16-bit by 8-bit division should still be fast... as I said, it's no big deal, only once per frame.

Thanks for the help, sorry to bother ya! =)

EDIT: dvdmth: My original design was exactly like that. I may just stick with it. It just feels more "professional" to have everything calculated from one variable.

by tepples on 2007-04-16 (#23259)

A lot of games divide the world into 16x15-cell "acres" and then store coordinates as (acreX, acreY, pixelX, pixelY).

by tokumaru on 2007-04-16 (#23261)

Thing is I have my level stored in the form of "screens", which must be 256x256 pixels, because of the way they are made up. My camera's coordinates are coordinates within this map, and the problem is in converting these coordinates into coordinates compatible with the name tables, that are 240 pixels high.

I got it figured out, though. A divide by 240 with the shift-and-subtract method will be very fast (knowing that the divisor is 240 makes it possible to take many shortcuts), and I need to do this only when a new row or column of metatiles has to be drawn, which is not even every frame. So I think I'm fine. I got it all figured out now, really.

by tokumaru on 2007-04-16 (#23266)

Just to let you guys know, I think I'll settle for this:
Code:
ldx #$08
DivideBy240:
cmp #$78
bcc .Negative
.Positive:
sbc #$78
.Negative:
rol Result
rol
dex
bne DivideBy240

This will divide a 16-bit number whose lower byte is in a zero page location "Result" and high byte is in A. In the end, "Result" holds (duh) the result and the remainder is in A (in my case, ready to be used as Y scroll). The 16-bit value can have a maximum value of 61439 though, more than that does not work, that's is intentional, as supporting larger values would make everything slower. That means I can have 240 256-pixels high screens vertically in my level map, and that's a lot, so this is not a problem at all.

The worst case takes 145 cycles, not bad. When the code is unrolled this count drops down to 104, but I'm not sure if I should do that. Problem is that when I need to do this division, I also have to do some heavy map decoding, which means I should save all the time I possibly can.

Anyway, from this result I'll know the exact spot in the name tables that correspond to a spot in my level map, which is what I wanted.

EDIT: I don't know if I could get any extra time with this, but I just realized I will only need the lower bit of the result (to know if I sould display the top name tables or to the bottom ones, and to know where to draw metatiles), and the remainder (to use as Y scroll and to calculate the address for metatiles). I don't really need bits 1-7 of the result, but I don't think there is a way to get some speed, as the input has to be shifted anyway, and the output is shifted into it.

by Bregalad on 2007-04-17 (#23277)

Something that immediately comes in my mind is to divide by 16 by shifting it right 4 times, then correct the thing to be divided by 15 instead of 16 by doing some trick to multiply the result by the fraction 16/15 in some tricky way that could be done in a table. If the result is an integer, only a certain range of values would need any correction at all. I guess this is fine for small and limited range of values, but could be weird for very large numbers.

Something else is repeated substraction by 15, but I don't think it's fast. It works fine and is easy to code and fully understand, tough.

by dvdmth on 2007-04-17 (#23281)

tokumaru wrote:
My original design was exactly like that. I may just stick with it. It just feels more "professional" to have everything calculated from one variable.

The technique I described is quite common - I've seen it used in numerous games, including Blaster Master (which uses 256x256 screens BTW) and Final Fantasy. I don't think there's anything "unprofessional" about this method, if commercial game developers made use of it.

By abstracting the PPU screen coordinates from the level map coordinates, you remove all dependencies on the PPU screen size limitations. For example, if you want your level map to wrap vertically (instead of having hard boundaries at the top and bottom), you would normally have to restrict the vertical map size to a multiple of 15 blocks. By keeping the PPU scroll values separate from the map position, your level map can be ANY size and still wrap vertically. Both games I mentioned have maps that wrap in both directions, and neither uses a multiple of 15 for the vertical map size (they use a power of 2 instead, which is more convenient).

by tokumaru on 2007-04-17 (#23294)

Bregalad wrote:
Something that immediately comes in my mind is to divide by 16 by shifting it right 4 times, then correct the thing to be divided by 15 instead of 16 by doing some trick to multiply the result by the fraction 16/15 in some tricky way that could be done in a table.

I thought of that too, and tried to find a way to do it for hours... without success, I posted the question here! =)

Quote:
Something else is repeated substraction by 15, but I don't think it's fast.

Yeah, that'd be too slow...

dvdmth wrote:
For example, if you want your level map to wrap vertically (instead of having hard boundaries at the top and bottom), you would normally have to restrict the vertical map size to a multiple of 15 blocks.

You have a good point, but that's not necessarily true... The level wrapping should not be related to the destination address of metatiles, but to the source address. In my design at least, it'd be no problem to, say, after vertical screen number 11 is processed, read screen 0 again as if it were screen 12. It would keep rendering just fine. I think this is just a matter of how you read/decode the level map.

Keeping a separate Y coordinate for the camera means using annoying compares to see if it falls into the "forbidden zone" that are the attribute tables, and fixing the value if it does. And as I see it, having two different variables with information that is extremely correlated could result in serious bugs if their updating isn't perfectly synchronized to each other.

I think I'd rather maintain only one variable, and have whatever is related to it calculated on the fly from it, as long as this does not impose a big performance penalty, which I don't think is the case.

I'm sure that having 2 variables is a valid way to go about this (as you said, it was used in commercial games with success before), but I'm feeling more comfortable with the other method.

by dvdmth on 2007-04-17 (#23303)

Would you like an algorithm that performs a mod-30 operation (i.e. it takes a 16-bit input and calculates the remainder when divided by 30)? I think this would serve your purposes, as you can use values 0-14 for one nametable and 15-29 for the other. I have a draft of the algorithm (takes around 70 cycles to execute), but I need to test it before I can post it here.

by tepples on 2007-04-17 (#23307)

Tip: The numbers 256 and 4096 are congruent to 16 (mod 30). A number 0xDCBA, or D*4096 + C*256 + B*16 + A, is congruent to (D + C + B)*16 + A (mod 30).

This one's not tested full-coverage (0 to 65535), but I've tested it to an extent using the same test harness I used for my BCD algorithm:
Code:
.export mod30

mod30In = $0000

;;
; Calculates the remainder of a number / 30
; in roughly 70 cycles.
; For a four digit hex number $DCBA,
; D * 4096 + C * 256 + B * 16 + A
; is congruent to (D + C + B) * 16 + A (mod 30).
; @param mod30In a 16-bit number
; @return the number mod 30, in register A
mod30:

; Calculate C * 16
lda mod30In+1
asl a
asl a
asl a
asl a

; Add D * 16
clc
adc mod30In+1
and #$F0

; At each addition, make 256 wrap around to 16
; because 256 is congruent to 16 (mod 30).
bcc :+
sbc #240
:

; Add B * 16 + A
adc mod30In
bcc :+
sbc #240
:

; Subtract off portions greater than 30
cmp #240
bcc :+
sbc #240
:
cmp #120
bcc :+
sbc #120
:
cmp #60
bcc :+
sbc #60
:
cmp #30
bcc :+
sbc #30
:
rts