Related thread with a much faster 12-bit divide:
viewtopic.php?f=2&t=16911
So it was nagging at me that there must be a way to get a full 16-bit divide** by 240 using the trick I used in the last thread. (The idea is basically the same as this: http://blog.jgc.org/2012/03/how-to-divide-by-9-really-really-fast.html) I finally came up with something, and it seems to work correctly (though it has only been lightly tested). It's not nearly as fast as the 12-bit version, and may not even be fast enough to be practical when you can just keep separate world-space and tile-wrapped scroll counters. Either way, it's still many times faster than a generic 16/8-bit division routine.
** Since it's intended to be used for setting scroll registers, I don't bother calculating the upper bits of the quotient; only the lower 4 bits are returned. The assumption is that you'll be discarding everything but the LSB anyway.
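The divide-by-9 trick in the linked post works because 10 = 9 + 1, so every decimal digit "leaks" into the remainder one-for-one. In base 16 the same thing happens with 15, since 16 = 15 + 1. The Python sketch below is only an illustration of that identity, not part of the original routine (`div15_by_digit_sum` and `div240_via_div15` are hypothetical names). It checks the digit-sum division by 15 against ordinary division for every 12-bit value, and then shows that dividing a 16-bit value by 240 = 16 * 15 only needs the top three nibbles.

```python
# Sketch (not part of the original routine): the base-16 analogue of
# "casting out nines". For a 12-bit n with hex digits a, b, c:
#   n = 256*a + 16*b + c, with 256 = 15*17 + 1 and 16 = 15 + 1,
# so n = 15*(17*a + b) + (a + b + c). Only the digit sum a + b + c,
# which is at most 45, still has to be divided by 15.

def div15_by_digit_sum(n):
    """Divide a 12-bit n by 15 via its hex digit sum; returns (quotient, remainder)."""
    a, b, c = (n >> 8) & 0xF, (n >> 4) & 0xF, n & 0xF
    q = 17 * a + b      # partial quotient from the identity above
    s = a + b + c       # hex digit sum, in [0, 45]
    while s >= 15:      # at most three subtractions are ever needed
        s -= 15
        q += 1
    return q, s


def div240_via_div15(n):
    """16-bit n // 240 and n % 240, using 240 = 16 * 15."""
    q, r = div15_by_digit_sum(n >> 4)   # divide the top three nibbles (ABC) by 15
    return q, r * 16 + (n & 0xF)        # the low nibble D rejoins the remainder


# Exhaustive checks against ordinary integer division.
assert all(div15_by_digit_sum(n) == divmod(n, 15) for n in range(0x1000))
assert all(div240_via_div15(n) == divmod(n, 240) for n in range(0x10000))
```

Unlike the assembly, this sketch keeps the full quotient and keeps subtracting until the digit sum is below 15, so it is exact for every input.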
edit: Some cleanup and comments. Hopefully easier to understand now.
Code:
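; 16-bit unsigned divide by 240.
; Input: dividend in sreg (low byte) and sreg+1 (high byte); sreg+1 is clobbered.
; Returns the lower nibble of the quotient in x, and the remainder in a.
; Conceptually similar to http://blog.jgc.org/2012/03/how-to-divide-by-9-really-really-fast.html, but in base 16.
.importzp sreg ; cc65 zero-page register; drop this if sreg is already declared
.proc div240_quick16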
; The dividend is split into nibbles (A, B, C, D), most significant first.
; Because 240 = 16 * 15, the quotient of ABCD / 240 equals that of ABC / 15,
; so we only need to divide the 12-bit value ABC by 15.
; At the end, D is pushed back in as the low nibble of the remainder.
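; Worked example:
;   $1234: A+B+C = 1+2+3 = 6, already below 15.
;          x = A+B = 3, remainder = ($6 << 4) | $4 = $64.
;          Check: $1234 = 4660 = 19 * 240 + 100, and 19 = $13.
;   $F0E7: A+B+C = $F+$0+$E = 29, reduced once to 14.
;          x = $F+$0+1 = $10 (low nibble 0), remainder = ($E << 4) | $7 = $E7.
;          Check: $F0E7 = 61671 = 256 * 240 + 231, and 256 = $100, 231 = $E7.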
A_B = sreg + 1
C_D = sreg + 0
tmp = A_B ; A_B is only read once, so reuse it as scratch (this clobbers sreg+1).
; x stores the quotient
.code
ldx A_B
txa
lsr
lsr
lsr
lsr ; acc = A
sta tmp
txa
and #$F ; acc = B
clc
adc tmp
sta tmp ; A + B
; Since ABC = 15 * (17A + B) + (A + B + C) and 17A = A (mod 16),
; A + B is also the low nibble of the quotient, before adding the
; carries from reducing A + B + C below.
tax
lda C_D
lsr
lsr
lsr
lsr ; acc = C
clc
adc tmp ; acc = A + B + C
; Divide acc by 15, carrying the quotient into x.
; acc is in the range [0, 45], so divide using an unrolled loop.
; Two iterations are exact for any acc up to 44, i.e. every dividend
; except $FFF0-$FFFF (where A = B = C = $F); use .repeat 3 to cover those too.
; Could save a few cycles here by rearranging comparisons I guess.
.repeat 2
cmp #15
bcc :+
sbc #15
inx
:
.endrepeat
; acc now holds the upper nibble of the remainder and needs to be combined with D.
; Could save 2 cycles with a 16-entry lookup table I guess.
asl
asl
asl
asl
sta tmp
lda C_D
and #$F
ora tmp
rts
.endproc
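To sanity-check the routine off-target, here is a Python model that mirrors the assembly step by step. It is a sketch: `div240_quick16_model` and `reference` are hypothetical names, and registers are modeled as Python ints masked where the 6502 would drop bits. Comparing it against plain `divmod` for every 16-bit dividend also pins down exactly where the two-iteration loop stops being exact.

```python
# Sketch: a host-side model of div240_quick16, mirroring the 6502 code step by step.
# Function names are hypothetical; they are not part of the original post.

def div240_quick16_model(dividend, repeats=2):
    """Return (x, a): low nibble of dividend // 240 and dividend % 240.

    `repeats` corresponds to the `.repeat 2` count in the assembly.
    """
    a_b = (dividend >> 8) & 0xFF           # sreg + 1: nibbles A, B
    c_d = dividend & 0xFF                  # sreg + 0: nibbles C, D
    x = (a_b >> 4) + (a_b & 0x0F)          # x = A + B
    acc = x + (c_d >> 4)                   # acc = A + B + C, in [0, 45]
    for _ in range(repeats):               # cmp #15 / bcc / sbc #15 / inx
        if acc >= 15:
            acc -= 15
            x += 1
    remainder = ((acc << 4) & 0xFF) | (c_d & 0x0F)   # four asl, then ora D
    return x & 0x0F, remainder             # only x's low nibble is meaningful


def reference(dividend):
    """Plain integer division, reduced to what the routine returns."""
    q, r = divmod(dividend, 240)
    return q & 0x0F, r


# Two iterations are exact for every dividend below $FFF0 ...
assert all(div240_quick16_model(n) == reference(n) for n in range(0xFFF0))
# ... and wrong for $FFF0-$FFFF, where A + B + C = 45.
assert all(div240_quick16_model(n) != reference(n) for n in range(0xFFF0, 0x10000))
# A third iteration covers the full 16-bit range.
assert all(div240_quick16_model(n, 3) == reference(n) for n in range(0x10000))
```

For $FFF0-$FFFF the two-iteration version leaves 15 in acc, so it reports a remainder of 240 or more and a low quotient nibble that is off by one; the third iteration fixes this at the cost of one more compare-and-branch.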