Calculate nametable address

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#97652)
This is mostly for fun/sharing. I have come up with this method of calculating the nametable address and am curious about others' methods or any criticism:

nametableaddress = y * 32 + x + $2000

Observation:
Shift the y value left one bit (x2). The high nibble of the result is what gets added to the high byte of the PPU address.
The low nibble is added to the low byte of the PPU address, but into its high nibble (x16).

Code:

; start calculate nametable address
   tya             ; reg x has nametable x coord, reg y has nametable y coord
   asl             ; mult by 2....now:
                   ; high nibble holds amount to add to (high byte, low nibble) ppu address,
                   ; low nibble to (low byte, high nibble) ppu address
   asl            
   asl
   asl
   asl             ; shift low nibble to high (x16), in effect x32 with first shift
   
   stx nametableaddress   ; don't have to do anything special with x coord
   clc
   adc nametableaddress   
   sta nametableaddress   ; ppu address low byte is done
   tya             ; restore a with y coord
   lsr             ; only three shifts since we restored a from before the first asl command
   lsr
   lsr
   ora #$20       ; make it the first nametable - (code is hardcoded for 1st nametable)
   sta nametableaddress+1
   ; done nametable address
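To check the nibble trick, here is a small Python reference model (not 6502 code) that mirrors the routine above byte by byte for the first nametable and compares it with y * 32 + x + $2000 for every tile. The function names are made up for this sketch.

```python
def nametable_addr_split(x: int, y: int) -> tuple[int, int]:
    """Byte-level model of the routine above (first nametable only).

    x: tile column 0..31, y: tile row 0..29.
    Returns (low byte, high byte) of the PPU address.
    """
    a = (y << 5) & 0xFF   # tya + five asl: the low 3 bits of y end up in bits 5..7
    lo = (a + x) & 0xFF   # stx / clc / adc; cannot carry because x < 32
    hi = (y >> 3) | 0x20  # tya + three lsr, then ora #$20
    return lo, hi


def nametable_addr_formula(x: int, y: int) -> int:
    return y * 32 + x + 0x2000


# Exhaustive check over every tile of the first nametable.
for y in range(30):
    for x in range(32):
        lo, hi = nametable_addr_split(x, y)
        assert (hi << 8) | lo == nametable_addr_formula(x, y)
```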
Re: Calculate nametable address
by on (#97662)
Calculating addresses can often prove to be a nightmare in 6502 code, especially when you have to calculate both the nametable address and the attribute address, and you forgot in your formula that there are two nametables.

Therefore I like to use lookup tables for this kind of stuff. Even if you did it with pure logic, chances are that your code will be almost as big as the lookup table.
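A minimal sketch of the lookup-table approach, modelled in Python so it can be checked. It assumes one table per address byte holding the start address of each of the 30 tile rows of the first nametable (30 rows x 2 bytes = 60 bytes); ROW_LO, ROW_HI and nametable_addr_lut are hypothetical names. On the 6502 the lookup would roughly be `txa` / `ora ROW_LO,y` / `sta lo` followed by `lda ROW_HI,y` / `sta hi`.

```python
# Hypothetical row tables: start address of each of the 30 tile rows
# on the first nametable, split into low and high bytes (60 bytes total).
ROW_LO = bytes((0x2000 + row * 32) & 0xFF for row in range(30))
ROW_HI = bytes((0x2000 + row * 32) >> 8 for row in range(30))


def nametable_addr_lut(x: int, y: int) -> tuple[int, int]:
    """Look up the row start, then merge in the column.

    OR is enough for the low byte because every row start has its
    low 5 bits clear and x (0..31) fits in exactly those bits.
    """
    return ROW_LO[y] | x, ROW_HI[y]


for y in range(30):
    for x in range(32):
        lo, hi = nametable_addr_lut(x, y)
        assert (hi << 8) | lo == 0x2000 + y * 32 + x
```

The trade-off is the usual one: two indexed loads replace the shift sequence, at the cost of 60 bytes of table data that are only valid for one nametable unless the high byte is adjusted afterwards.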
Re: Calculate nametable address
by on (#97665)
I believe this is nearly optimal in terms of speed, with the exception of a 60-byte lookup table, of course.
Knowing that X is in 0..31 and Y is in 0..29, you can slightly optimize the equation to
Code:
nametableAddr = $2000 | y*32 | x,

as X only occupies the lowest 5 bits of nametableAddrLo.
Therefore, you can save 2 cycles by replacing
Code:
   stx nametableaddress
   clc
   adc nametableaddress   
   sta nametableaddress

by
Code:
   stx nametableaddress
   ora nametableaddress   
   sta nametableaddress
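Why ORA can stand in for CLC/ADC here: after the five shifts the low byte always has bits 0..4 clear, and X fits in exactly those bits, so the addition never carries and gives the same result as OR. A quick exhaustive check in Python (the helper names are illustrative only):

```python
def low_byte_adc(y: int, x: int) -> int:
    # tya, five asl, then clc / adc with x (kept to 8 bits like the 6502)
    return (((y << 5) & 0xFF) + x) & 0xFF


def low_byte_ora(y: int, x: int) -> int:
    # tya, five asl, then ora with x: no clc needed
    return ((y << 5) & 0xFF) | x


for y in range(30):
    for x in range(32):
        shifted = (y << 5) & 0xFF
        assert shifted & 0x1F == 0        # bits 0..4 are always free
        assert shifted + x <= 0xFF        # ADC would never carry out
        assert low_byte_adc(y, x) == low_byte_ora(y, x)
```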


By analyzing both attribute and nametable addresses for all 4 nametables, we find:
nametableAddr = $2000 | nt | 32*y | x, where nt = $0000, $0400, $0800, $0C00.
In binary:
Code:
nametableAddr = 0010nnyy yyyxxxxx.

Furthermore, attrAddr = $23C0 | nt | (y/4)*8 | (x/4), in binary
Code:
attrAddr = 0010nn11 11yyyxxx.
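The two bit layouts can be written directly as a Python model, taking nt as the nametable number 0..3 (so nt << 10 gives $0000, $0400, $0800, $0C00). The loop confirms that OR-ing the fields gives the same result as plain addition, because no field overlaps another. The function names are illustrative only.

```python
def nametable_addr(nt: int, x: int, y: int) -> int:
    # 0010nnyy yyyxxxxx
    return 0x2000 | (nt << 10) | (y << 5) | x


def attr_addr(nt: int, x: int, y: int) -> int:
    # 0010nn11 11yyyxxx, where yyy = y/4 and xxx = x/4
    return 0x23C0 | (nt << 10) | ((y >> 2) << 3) | (x >> 2)


for nt in range(4):
    for y in range(30):
        for x in range(32):
            assert nametable_addr(nt, x, y) == 0x2000 + nt * 0x400 + y * 32 + x
            assert attr_addr(nt, x, y) == 0x23C0 + nt * 0x400 + (y // 4) * 8 + x // 4
```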

So, to compute everything for all 4 nametables in one go, one could use something like this:

Code:

; IN:    X in 0..63, Y in 0..59
; OUT: nametableLo,nametableHi = nametable address of tile (X,Y).
; OUT: attrLo, attrHi = attribute table address of tile (X,Y).
; TEMP: nametableNumber = number of nametable tile (X,Y) is on.
.proc CalculateNameAndAttributeAddr
   ; ------------------------------------------------------------------------
   ; Step 1: Calculate high byte of both addresses as a function of the
   ; nametable we're on.
   ; Max 22 cycles. Occupies 20 bytes.
   ; ------------------------------------------------------------------------- 
   lda #$20             ;
   cpy #30              ;
   bcc :+               ;
   ora #$08             ;
:  cpx #32              ;
   bcc :+               ;
   ora #$04             ;
:  sta nametableHi      ;
   ora #$03             ;
   sta attrHi           ;
Code:
   ; ------------------------------------------------------------------------
   ; Step 2: Force X and Y to ranges 0..31, 0..29.
   ; Max 16 cycles. Occupies 12 bytes.
   ; -------------------------------------------------------------------------     
   txa                  ; Fix X.
   and #31              ;
   tax                  ;
   
   cpy #30              ; Fix Y.
   bcc :+               ;
   tya                  ;
   sbc #30              ; Carry is still set from CPY (Y >= 30), so no SEC is needed.
   tay                  ;
:
Code:
   ; ------------------------------------------------------------------------
   ; Step 3: Calculate nametable address.
   ; ------------------------------------------------------------------------
   ; Low byte: X | Y<<5.
   tya
.repeat 5
   asl A
.endrepeat
   stx nametableLo
   ora nametableLo
   sta nametableLo ; Low byte done.

   ; High byte: nametableHi | Y>>3.
   tya
   lsr A
   lsr A
   lsr A
   ora nametableHi
   sta nametableHi ; High byte done.
Code:
   ; ------------------------------------------------------------------------
   ; Step 4: Calculate attribute table address.
   ; ------------------------------------------------------------------------
   ; attrLo = X/4
   txa
   lsr A
   lsr A
   sta attrLo

   ; attrLo |= $C0 | (Y/4)*8
   tya
   asl A
   and #$38
   ora #$C0
   ora attrLo
   sta attrLo

   ; attrHi has been done in Step 1 already.

   rts
.endproc
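For testing, the four steps can be transliterated into Python (a model, not the 6502 routine itself) and compared against the plain arithmetic formulas for every tile of the 64 x 60 area. The quadrant-to-nametable mapping assumes the usual layout: $2000 top-left, $2400 top-right, $2800 bottom-left, $2C00 bottom-right. The function name is hypothetical.

```python
def calc_name_and_attr_addr(x: int, y: int) -> tuple[int, int, int, int]:
    """Step-by-step model of CalculateNameAndAttributeAddr.

    x: 0..63 (two nametables wide), y: 0..59 (two nametables tall).
    Returns (nametableLo, nametableHi, attrLo, attrHi).
    """
    # Step 1: high bytes depend only on which nametable (X, Y) is on.
    hi = 0x20
    if y >= 30:
        hi |= 0x08          # bottom pair of nametables
    if x >= 32:
        hi |= 0x04          # right pair of nametables
    nt_hi = hi
    attr_hi = hi | 0x03

    # Step 2: fold X and Y into 0..31 and 0..29.
    x &= 31
    if y >= 30:
        y -= 30

    # Step 3: nametable address.
    nt_lo = ((y << 5) & 0xFF) | x
    nt_hi |= y >> 3

    # Step 4: attribute address; (Y << 1) & $38 equals (Y / 4) * 8 for Y < 32.
    attr_lo = (x >> 2) | 0xC0 | ((y << 1) & 0x38)
    return nt_lo, nt_hi, attr_lo, attr_hi


for gy in range(60):
    for gx in range(64):
        nt = (2 if gy >= 30 else 0) + (1 if gx >= 32 else 0)
        tx, ty = gx % 32, gy % 30
        nt_lo, nt_hi, attr_lo, attr_hi = calc_name_and_attr_addr(gx, gy)
        assert (nt_hi << 8) | nt_lo == 0x2000 + nt * 0x400 + ty * 32 + tx
        assert (attr_hi << 8) | attr_lo == 0x23C0 + nt * 0x400 + (ty // 4) * 8 + tx // 4
```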

This code takes 97 cycles at max (+12 jsr/rts) and uses 68 bytes (if I didn't miscount).
Formerly: 110 cycles, 71 bytes.

Edit: Fixed small typo (tax instead of txa).
Edit2: Optimized worst case a bit.
Re: Calculate nametable address
by on (#97667)
Thank you for your interesting posts.

Jsolo wrote:
...as X only occupies the lowest 5 bits of nametableAddrLo.
Therefore, you can save 2 cycles by replacing....


Good point. I assumed I still had overlapping bits, but yes, the five left shifts leave the low 5 bits free, and 0..31 fits in 5 bits.

Maybe someone can post a lookup table solution? I know that lookup tables can be faster and sometimes even take up less space, but they often end up containing a lot of repeated data, which makes them hard for me to use when I can see a logical pattern.

EDIT:
I think there is a small mistake:

Code:
   ; ------------------------------------------------------------------------
   ; Step 4: Calculate attribute table address.
   ; ------------------------------------------------------------------------
   ; attrLo = X/4
   tax
   lsr A
   lsr A
   sta attrLo


tax should be txa
Re: Calculate nametable address
by on (#97675)
Movax12 wrote:
EDIT:
I think there is a small mistake:

Code:
   ; ------------------------------------------------------------------------
   ; Step 4: Calculate attribute table address.
   ; ------------------------------------------------------------------------
   ; attrLo = X/4
   tax
   lsr A
   lsr A
   sta attrLo


tax should be txa


Absolutely. Fixed that one :)
Re: Calculate nametable address
by on (#97676)
Cool topic. My "get address for current attribute" routine takes a constant 67 cycles (accurate, not including jsr/rts) and 43 bytes (if I didn't miscount). But it also does some things specific to my data that aren't related to just getting the address. What's posted below should be 45 cycles and 29 bytes (since I removed that other stuff), but I didn't test it. (It may need some of the "unrelated" stuff I removed.)

Edit: These routines may assume horizontal mirroring is used.
Another Edit: The X and Y values in these refer to the x/y PIXEL position (possible values 0-255, 0-239), not the tile (0-31, 0-29). I just realized that's not how some of the others do it.

Code:
;Expects a screen number (0-3) in ReservedC
;Expects a scrollyscreenlow in ReservedB
;Expects the low byte of scrollx in Reserved3
;Reserved3 Contains the low scrollxvalue to keep track

;Returns the name table address high byte to Reserved8
;Returns the name table address low byte to Reserved9
;Tile1 is temp RAM
   lda <reservedC
   asl a
   asl a
   ;clc;The asl above should clear the carry
   adc #$23
   sta <reserved8

   lda <reservedB
   lsr a;Shifted right because the
   lsr a;high bits are used only to add $C0
   
   and #%11111000;Anding to make room for the three bits
   sta <tile01;X will use

   lda <reserved3
   lsr a;Shifting since only the highest three bits
   lsr a;matter
   lsr a
   lsr a
   lsr a

   ora <tile01
   
   ora #%11000000;Effectively adds $C0
   sta <reserved9


Here's mine for nametables: 57 cycles and 35 bytes (if I counted right).
Edit 2: It's a constant 57. I forgot to remove the cycle counter from my previous benchmark. I removed the part of the post that said it had variable execution time.
Code:
scrollPPUaddrupdate:;{
;Expects a screen number (0-3) in ReservedC
;Expects the low byte of yscreenscroll in ReservedB
;Expects the low byte of scrollx in Reserved3

;Returns the name table address high byte to Reserved8
;Returns the name table address low byte to Reserved9
   lda #$00
   sta <tile01

   lda <reservedB
   and #%11111000;We AND because the low three bits don't matter. (0-7 don't affect which tile we're on.)
   asl a;We shift left once because there needs to be 5 bits free
      ; in the bottom byte for X's value
   ;Room has been made for 4 bits with this shift, so another shift is needed.
   rol <tile01;The higher bits of y still matter,
   
   asl a;Now we shift again to make room for the fifth bit.
   sta <reserved9
   
   rol <tile01

   lda <reservedC
   asl a
   asl a
   ;clc;The asl above should clear the carry
   adc #$20

   
   ;clc;Carry should still be clear
   adc <tile01
   sta <reserved8
   
   

   lda <reserved3;It's shifted right
   lsr a;because the bottom three bits
   lsr a; don't matter.
   lsr a

   ora <reserved9;Now we add x to the low address byte.
   sta <reserved9
   
   rts;}
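Likewise, a Python model of scrollPPUaddrupdate, with the two ASL/ROL pairs written out explicitly so the bit that moves from A into tile01 is visible. It is checked against $2000 + screen*$400 + (y/8)*32 + x/8 for every pixel position; the function name is hypothetical.

```python
def nametable_addr_from_pixels(screen: int, y_pix: int, x_pix: int) -> tuple[int, int]:
    """Model of scrollPPUaddrupdate.

    screen: 0..3 (reservedC), y_pix: 0..239 (reservedB), x_pix: 0..255 (reserved3).
    Returns (high byte -> reserved8, low byte -> reserved9).
    """
    tile01 = 0                       # lda #$00 / sta tile01
    a = y_pix & 0b11111000           # and: drop the pixel-within-tile bits
    for _ in range(2):               # asl a / rol tile01, done twice
        carry = a >> 7
        a = (a << 1) & 0xFF
        tile01 = (tile01 << 1) | carry
    lo = a                           # sta reserved9
    hi = (screen << 2) + 0x20 + tile01  # asl, asl, adc #$20, adc tile01
    lo |= x_pix >> 3                 # three lsr, ora reserved9
    return hi, lo


for screen in range(4):
    for y_pix in range(240):
        for x_pix in range(256):
            hi, lo = nametable_addr_from_pixels(screen, y_pix, x_pix)
            expected = 0x2000 + screen * 0x400 + (y_pix // 8) * 32 + x_pix // 8
            assert (hi << 8) | lo == expected
```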


I'll have to read all of your routines to see if I'll end up replacing mine, but here they are regardless.

Edit 3: I too would like to see a lookup table solution. I NEVER think to use them, and always hit my head against a wall when some faster, smaller solution exists with them.