Compact /10 and %10 operations?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
View original topic
Compact /10 and %10 operations?
by on (#225273)
I use a couple of /10 and %10 in my code (to display numbers), which import fairly big portions of runtime.lib I'd like to leave out. Anybody knows of any compat ubyte/10 and ubyte%10 functions in assembly?

I found this ubyte/10 by Omegamatrix https://forums.nesdev.com/viewtopic.php?f=2&t=11336
Code:
;Divide by 10
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr


Anybody knows something for ubyte%10? I could multiply by 10 (adding N<<3 and N<<1) the above results and substract, but is there a better method?

Thanks in advance.
Re: Compact /10 and %10 operations?
by on (#225274)
Tepples has an unrolled fast 8-bit-to-BCD converter here.

I bet there's other fast ones.
Re: Compact /10 and %10 operations?
by on (#225275)
It's not fast however it's the shortest:
Code:
; A = number (0-99)

    ldx #ff
    sec
-   inx
    sbc #10
    bcs -

; A = lower digit (0-9), X=upper digit(0-9)


Even in the worst case (number is in the 90s) it's still reasonably fast. If you want to go up to 255, it's simple to add a quick check for a number greater than 200, then 100 before this code.

Code:
; A = number (0-255)

    ldy #0
    cmp #200
    bcc +
    sbc #200
    ldy #2        ; Range 200-255
    bne ++

+   cmp #100
    bcc ++
    sbc #100      ; Range 100-199
    iny

++
    ldx #ff       ; Conversion for lower 2 digits like before
    sec
-   inx
    sbc #10
    bcc -

; A = lower digit (0-9), X=middle digit(0-9), Y=upper digit (0-2)

If you have a 16-bit number then it becomes worth doing an actual division/modulo operation.

(Note this code is untested so don't copy/paste without understanding and blame me if you get bugs)
Re: Compact /10 and %10 operations?
by on (#225278)
Thank you, that's exactly what I needed :) It doesn't to be very fast, as it is only called once in a while. Up to 99 is enough.
Re: Compact /10 and %10 operations?
by on (#225284)
Gee not having BCD is a bit of a pain here isn't it... Just typed out a nice response, and then remembered NES doesn't have BCD... ouch
Re: Compact /10 and %10 operations?
by on (#225285)
Quote:
NES doesn't have BCD


There's nothing stopping you from storing values this way, and modifying your code to make it work.

I like to keep values 0-9. If it goes higher, add another byte, but keep them both 0-9.

Then %10 is as simple as reading the lowest byte.
Re: Compact /10 and %10 operations?
by on (#225286)
Just for the annedote, Castlevania games 1 and 3 (but not 2 which uses a very different engine) stores the number of hearts in BCD, so if you have, say, 99 hearts it will be stored as $99. Personally I find this inneficient, as many shifts and bitmask operations are needed to retrive the digits, and this only saves a single byte of RAM (negligible). Storing data in BCD only makes sense for large arrays of numbers, where the RAM savings would be significant.
Re: Compact /10 and %10 operations?
by on (#225287)
Thwaite stores the score in what amounts to base 100. In its representation, $1163 has high byte 17, low byte 99, interpreted as 1799. Add 1, and you get 0x1200 (interpreted as 1800). This lets it correct for carry only between bytes (not between nibbles of one byte) while using the unrolled converter that lidnariq linked for display.
Re: Compact /10 and %10 operations?
by on (#225301)
For decimal numbers that are involved only in additions or subtractions, I store each digit (0-9) in a byte and do all the math in that format, manually generating a carry when the result for each digit is larger than 9.

Modifying the numbers requires more code, but no conventions are necessary for displaying them.
Re: Compact /10 and %10 operations?
by on (#225504)
I use unpacked bcd (byte per digit) when I have to display large scores, i.e. 5, or 6 digits. I just needed something simple to display numbers 0-99.
Re: Compact /10 and %10 operations?
by on (#225507)
That was my point exactly, the usage of "unpacked BCD" (each digit being stored separatedly) makes most sense, and the usage of "packed by 100" such as what tepples describes is great too, considering how compact it is to decode a number 0-99 into its two separate digits (see my code above). However, using packed BCD makes little sense; retriving digits is barely easier than doing so from "packed by 100" BCD, but also it requires much more complex code to do calculations, and that only for saving a single RAM byte per digit. That's why it's notable Castlevania does this, despite this making few sense technically.
Re: Compact /10 and %10 operations?
by on (#225509)
if you do it digit per byte, and handle the roll over case yourself ( as once must without BCD) it makes sense to store your numbers like scores etc in "tileset" numbers. Rarealy do you put 0-9 at tiles 0-9 you put them somewhere else.
Say its at 40 - 49. You can then Start at 40, if you hit 50, sub 10 and then add 1 to the next etc.
On the C64 quite a few games actually store the score, lives etc variables in the screen RAM( as in what you see on the screen is the variable ), not really that practical to do on a NES I guess.

If you want a fixed time, or a tight loop version you can shift bits and add the digits for it
say 0-99
you test bit 6 if set tens = 6 digits = 4
bit 5 if set add 3 2
bit 4 if set add 1 6
bit 3 if set add 0 8
bit 2 if set add 0 4
bit 1 if set add 0 2
bit 0 if set add 0 1
etc
here having BCD makes the code more compact and faster, as you need 1 table and 1 add. For larger bits 16/24/32 you can just repeat but change the byte you modify. (after also adding in the 1 28 case. )
Re: Compact /10 and %10 operations?
by on (#225511)
Oziphantom wrote:
if you do it digit per byte, and handle the roll over case yourself ( as once must without BCD) it makes sense to store your numbers like scores etc in "tileset" numbers. Rarealy do you put 0-9 at tiles 0-9 you put them somewhere else.

Actually, avoiding an useless ADD or OR operation is in my opinion a good reason to put them at tiles 0-9 precisely. And if you have to put them somewhere else, at least put the "0" in the first column (i.e. tile $x0) so that the conversion can be done with an "OR" operation, and a "CLC" instruction can be saved :)
Re: Compact /10 and %10 operations?
by on (#225515)
or you can put 0 at 246 and thus gain the natural overflow detection, so you code looks like
Code:
adc Digit
sta Digit
bcc _noOver
inc DigitTens
adc #245 ; carry is set
_noOver

And you avoid having to do any other adc/ora to convert them for display. If you need to worry about a tens overflow you can php/plp and then check to see if it was bpl _tensOverflow (or beq )
Re: Compact /10 and %10 operations?
by on (#225518)
Offsetting the values to line them up with the pattern indices for 0-9 is a good idea, as long as only the final number is offset and any values you add/subtract to/from it are still 0-based. You'll only get automatic wrap around detection in one direction anyway (either when adding or subtracting).
Re: Compact /10 and %10 operations?
by on (#225520)
I just use 100 byte lookup tables. :beer:
Re: Compact /10 and %10 operations?
by on (#225521)
tokumaru wrote:
You'll only get automatic wrap around detection in one direction anyway (either when adding or subtracting).

You won't - automatic overflow detection would work only when using 0-9 or when using 246-255; the second case being less intuitive but not more optimal than the first.
Re: Compact /10 and %10 operations?
by on (#225560)
I see optimality in having numbers in tiles 246-255, at least for things like score points: you rarely substract points, if ever, in a game. From the look of it, you genuinely optimized your tile for scorekeeping. I find it actually clever.
Re: Compact /10 and %10 operations?
by on (#225977)
pubby wrote:
I just use 100 byte lookup tables. :beer:

This division routine with remainder is quite fast and needs 100 bytes' worth of tables. (I didn't test it extensively.)

Code:
; Start with 0-99 in A.
; Get the quotient and the remainder of A divided by 10.
; Needs 9 bytes and 14 cycles per operation and 100 bytes for the tables.

lsr  ; keep LSB in carry
tax
lda unsigned_divide_by_5,x
; now A = the quotient
lda unsigned_modulo_5,x
rol  ; get LSB from carry
; now A = the remainder

unsigned_divide_by_5:
    .byte 0, 0, 0, 0, 0, 1, 1, 1, 1, 1
    .byte 2, 2, 2, 2, 2, 3, 3, 3, 3, 3
    .byte 4, 4, 4, 4, 4, 5, 5, 5, 5, 5
    .byte 6, 6, 6, 6, 6, 7, 7, 7, 7, 7
    .byte 8, 8, 8, 8, 8, 9, 9, 9, 9, 9

unsigned_modulo_5:
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4

Edit: optimized the modulo routine.
Edit: combined the routines.
Re: Compact /10 and %10 operations?
by on (#225997)
Your division table is 50 bytes long. What happens when x >= 50?
Re: Compact /10 and %10 operations?
by on (#226001)
calima wrote:
Your division table is 50 bytes long. What happens when x >= 50?

It won't ever be >= 50, because of the LSR. If the largest number is 99, X will be at most 49. By getting rid of the least significant bit (and putting it back later) the tables can be made half the size.
Re: Compact /10 and %10 operations?
by on (#226025)
qalle wrote:
pubby wrote:
I just use 100 byte lookup tables. :beer:

This division routine with remainder is quite fast and needs 100 bytes' worth of tables. (I didn't test it extensively.)

Code:
; Start with 0-99 in A.
; Get the quotient and the remainder of A divided by 10.
; Needs 9 bytes and 14 cycles per operation and 100 bytes for the tables.

lsr  ; keep LSB in carry
tax
lda unsigned_divide_by_5,x
; now A = the quotient
lda unsigned_modulo_5,x
rol  ; get LSB from carry
; now A = the remainder

unsigned_divide_by_5:
    .byte 0, 0, 0, 0, 0, 1, 1, 1, 1, 1
    .byte 2, 2, 2, 2, 2, 3, 3, 3, 3, 3
    .byte 4, 4, 4, 4, 4, 5, 5, 5, 5, 5
    .byte 6, 6, 6, 6, 6, 7, 7, 7, 7, 7
    .byte 8, 8, 8, 8, 8, 9, 9, 9, 9, 9

unsigned_modulo_5:
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
    .byte 0, 1, 2, 3, 4, 0, 1, 2, 3, 4

Edit: optimized the modulo routine.
Edit: combined the routines.


Wow, that's a great solution. The same concept should apply when dividing by any number for which a power-of-two is a factor.
Like, dividing by 12 for example:

Code:
; Value from 0-143 in A.

lsr
rol temp
lsr
tax
lda unsigned_divide_by_3, x
; now A = the quotient
lda unsigned_modulo_3, x
rol
lsr temp
rol
; now A = the remainder

unsigned_divide_by_3:
    .byte 0, 0, 0, 1, 1, 1, 2, 2, 2
    .byte 3, 3, 3, 4, 4, 4, 5, 5, 5
    .byte 6, 6, 6, 7, 7, 7, 8, 8, 8
    .byte 9, 9, 9, 10, 10, 10, 11, 11, 11

unsigned_modulo_3:
    .byte 0, 1, 2, 0, 1, 2, 0, 1, 2
    .byte 0, 1, 2, 0, 1, 2, 0, 1, 2
    .byte 0, 1, 2, 0, 1, 2, 0, 1, 2
    .byte 0, 1, 2, 0, 1, 2, 0, 1, 2


It requires an extra temporary variable (since we initially divide by 4), but it's still a neat concept.

EDIT: Also this website has a ton of math implementations for 6502.
Re: Compact /10 and %10 operations?
by on (#226028)
Ah indeed, thanks for pointing that out.