400h bytes transfer loop?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
400h bytes transfer loop?
by on (#179288)
How do I copy 400h bytes of name table and attribute data?

I found this one in a tutorial:

Code:
LoadBackground:
  LDA $2002             ; read PPU status to reset the high/low latch
  LDA #$20
  STA $2006             ; write the high byte of $2000 address
  LDA #$00
  STA $2006             ; write the low byte of $2000 address

  LDA #$00
  STA <$00
  LDA #$E0
  STA <$01

  LDY #$00              ; start out at 0
  LDX #$04
LoadBackgroundLoop:
  LDA [$00], y          ; load data from address (pointer in $00/$01 + the value in Y)
  STA $2007             ; write to PPU
  INY                   ; Y = Y + 1
  BNE LoadBackgroundLoop
  INC <$01
  DEX
  BNE LoadBackgroundLoop


But that way the code depends on a fixed address for the name table data; in the above example it is $E000.

So here is my solution:

Code:
   LDA $2002                        ; Read PPU status to reset the high/low latch
   LDA #$20
   STA $2006                        ; Write the high byte of $2000 address
   LDA #$00
   STA $2006                        ; Write the low byte of $2000 address
   LDX #$00
nam_loop1:
   LDA nam_att, X                   ; Load first 100h (256) bytes of name table and attribute data
   STA $2007
   INX
   CPX #$00                         ; Loop runs 100h (256) times, until X wraps back to 0
   BNE nam_loop1
nam_loop2:
   LDA nam_att+256, X               ; Load second 100h bytes of name table and attribute data
   STA $2007
   INX
   CPX #$00
   BNE nam_loop2
nam_loop3:
   LDA nam_att+512, X               ; Load third 100h bytes of name table and attribute data
   STA $2007
   INX
   CPX #$00
   BNE nam_loop3
nam_loop4:
   LDA nam_att+768, X               ; Load fourth 100h bytes of name table and attribute data
   STA $2007
   INX
   CPX #$00
   BNE nam_loop4


But I want to know: is there a better way?
Thanks in advance.
Re: 400h bytes transfer loop?
by on (#179289)
I'm not sure what you mean by "fixed address". In your second example, the address is fixed. In the first example, you can put whatever address you want into $00/$01 and use the same code at many addresses. In the second example, it's fixed to the address of "nam_att".

The other disadvantage of the second approach is that the code is longer and takes up more space. It's slightly faster because LDA abs, X is one cycle faster than LDA (indirect), Y, so over the full 1024 bytes it's roughly 1000 cycles faster, but I don't see why that minor speed difference would matter.

IMO the "better way" is the first example. Just move the code that sets up the $00/$01 pointer above the code that sets up $2006 and it becomes versatile and reusable.


Two other notes about the second example.

1. CPX #$00 is not necessary: the Z flag is already set by INX, so you can just delete that instruction (saves 2 bytes per loop and 2 cycles per iteration).

2. If your assembler supports repeats, you can use them to write shorter code in cases like this (though the generated machine code will be the same length):

Code:
   LDA $2002                        ; Read PPU status to reset the high/low latch
   LDA #$20
   STA $2006                        ; Write the high byte of $2000 address
   LDA #$00
   STA $2006                        ; Write the low byte of $2000 address
   LDX #$00
.repeat 4, I
:
   LDA nam_att + (256 * I), X
   STA $2007
   INX
   BNE :-
.endrepeat
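
For an assembler without a repeat directive, the same thing can be written out by hand. This is only a sketch of the second example with the redundant CPX removed, reusing the nam_att label and loop names from the original post:

```asm
   LDA $2002              ; Read PPU status to reset the high/low latch
   LDA #$20
   STA $2006              ; Write the high byte of $2000 address
   LDA #$00
   STA $2006              ; Write the low byte of $2000 address
   LDX #$00
nam_loop1:
   LDA nam_att, X         ; bytes $000-$0FF
   STA $2007
   INX                    ; sets Z when X wraps from $FF back to $00
   BNE nam_loop1
nam_loop2:
   LDA nam_att+256, X     ; bytes $100-$1FF
   STA $2007
   INX
   BNE nam_loop2
nam_loop3:
   LDA nam_att+512, X     ; bytes $200-$2FF
   STA $2007
   INX
   BNE nam_loop3
nam_loop4:
   LDA nam_att+768, X     ; bytes $300-$3FF
   STA $2007
   INX
   BNE nam_loop4
```

X is back at 0 at the end of each loop, so it does not need to be reloaded between them.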
Re: 400h bytes transfer loop?
by on (#179291)
In the first code I need to know the address of the name table data:

Code:
.
.
LDA #$E0
.
.
;PRG Bank3 E000 ~ FFFF
  .bank 3
  .org $E000
nam_att:
  .incbin "nam.nam"
.
.


But for the second code I can use the label "nam_att" and put it anywhere in the code without having to know its address.

I am using NESASM3, and it doesn't seem to support a repeat command.
Re: 400h bytes transfer loop?
by on (#179293)
You can just replace $E000 with nam_att.
Code:
lda #<nam_att
sta $00
lda #>nam_att
sta $01
Re: 400h bytes transfer loop?
by on (#179295)
As I said before, if you move the $00/$01 setup out of the way, you can make this into a function that you can reuse to load many different screens. Something like this:
Code:
LoadBackground:
  LDA $2002
  LDA #$20
  STA $2006
  LDA #$00
  STA $2006
  ; (setup of $00/$01 pointer removed, do this before calling the function)
  LDY #$00
  LDX #$04
LoadBackgroundLoop:
  LDA [$00], Y
  STA $2007
  INY
  BNE LoadBackgroundLoop
  INC <$01
  DEX
  BNE LoadBackgroundLoop
  RTS ; (added to turn LoadBackground into a function)

;
; example use
;

  ; loading the title screen
  LDA #<title_screen
  STA <$00
  LDA #>title_screen
  STA <$01
  JSR LoadBackground
  ; ...

  ; later we can load a different screen but just call the same code
  LDA #<game_screen
  STA <$00
  LDA #>game_screen
  STA <$01
  JSR LoadBackground
  ; ...
Re: 400h bytes transfer loop?
by on (#179298)
rainwarrior wrote:
You can just replace $E000 with nam_att.
Code:
lda #<nam_att
sta $00
lda #>nam_att
sta $01


Awesome!
Thanks a lot!
Re: 400h bytes transfer loop?
by on (#179299)
Similarly, you can also use a named label instead of the direct numbers $00/$01, e.g.:
Code:
temp_pointer = $00

lda #<nam_att
sta temp_pointer+0
lda #>nam_att
sta temp_pointer+1

lda (temp_pointer), y
sta $2007

This makes it easier if you need to move your RAM values around later: you can just change the number in one place rather than everywhere it is used. The only requirement here is that pointers used for indirect addressing need to go on the zero page (so $00-$FE is a valid range for temp_pointer in this example).
Re: 400h bytes transfer loop?
by on (#179300)
It didn't work:

Code:
   86  01:A067                    lda #<nam_att
       Syntax error in expression!
   88  01:A06A                    lda #>nam_att
       Syntax error in expression!
# 2 error(s)
Re: 400h bytes transfer loop?
by on (#179301)
FARID wrote:
It didn't work:

Code:
   86  01:A067                    lda #<nam_att
       Syntax error in expression!
   88  01:A06A                    lda #>nam_att
       Syntax error in expression!
# 2 error(s)

You're using NESASM3. You need to read the documentation that comes with your assembler: http://www.nespowerpak.com/nesasm/usage.txt

<nam_att means "use the low byte of the address of nam_att".
>nam_att means "use the high byte of the address of nam_att".

The NESASM3 equivalents for these are the LOW() and HIGH() functions, e.g. lda #LOW(nam_att).
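
Applied to the pointer setup from earlier in the thread, that would look roughly like this (a sketch; nam_att and the $00/$01 pointer location are taken from the posts above):

```asm
  LDA #LOW(nam_att)     ; low byte of the address of nam_att
  STA <$00              ; pointer low byte, as in the tutorial code
  LDA #HIGH(nam_att)    ; high byte of the address of nam_att
  STA <$01              ; pointer high byte
```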

Get familiar with your assembler. :-)
Re: 400h bytes transfer loop?
by on (#179303)
It works now.
Thanks a lot.
Re: 400h bytes transfer loop?
by on (#179312)
koitsu wrote:
You're using NESASM3. You need to read the documentation that comes with your assembler: http://www.nespowerpak.com/nesasm/usage.txt

<nam_att means "use the low byte of the address of nam_att".
>nam_att means "use the high byte of the address of nam_att".

The NESASM3 equivalents for these are the LOW() and HIGH() functions, e.g. lda #LOW(nam_att).

Oh, sorry. That's kinda my fault. I was trying to follow the style of the examples (I don't normally use NESASM3). Because the first example used < but also square brackets I presumed that it was valid NESASM3 code? Or does < work but only on literals?
Re: 400h bytes transfer loop?
by on (#179316)
IIRC, < is for ZP addressing in NESASM.
Re: 400h bytes transfer loop?
by on (#179319)
I just realised I linked to NESASM 2.x documentation, not 3.x. I can't even find public links to the 3.x documentation, which is amusing to me for some reason. usage.txt that comes with NESASM3.ZIP is for NESASM 3.1 and attached here; line endings appear to be LF (i.e. UNIX).

I don't see anything in the documentation (for 2.x or 3.x) that indicates < can be used to force zero-page usage, only that it can be used as the usual less-than comparison operator. But regardless, LOW() and HIGH() are definitely what was needed here (no problem rainwarrior, we can't be bothered to remember the syntax of every community member's assembler of choice :-) ).
Re: 400h bytes transfer loop?
by on (#179339)
Using NESASM myself, I know that < is used to select zero-page addressing (if not specified, it emits a two byte address). (It can also be used to mean less than in compile-time comparisons.) You use LOW and HIGH to retrieve the low or high octet of a 16-bit number.

To copy $0400 bytes from ROM to the PPU address space, one thing I have thought of is to store the data in a different order in ROM (for example, at $8000 $8100 $8200 $8300 $8001 $8101 $8201 $8301 etc.) and then loop the X or Y register from 0 to 255, reading all four bytes at each step and writing to the PPU four times in that loop. (With my "Unofficial-MagicKit", it is possible to make the assembler automatically rearrange the data at compile time, by defining a custom output routine.)
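
A rough sketch of what that loop might look like. The label striped_nam is hypothetical and stands for data that has already been rearranged, so that the byte for PPU offset 4*k+j is stored at striped_nam + 256*j + k:

```asm
  LDA $2002                 ; reset the PPU address latch
  LDA #$20
  STA $2006                 ; high byte of $2000
  LDA #$00
  STA $2006                 ; low byte of $2000
  LDX #$00
StripedCopyLoop:
  LDA striped_nam, X        ; byte for PPU offset 4*X + 0
  STA $2007
  LDA striped_nam+256, X    ; byte for PPU offset 4*X + 1
  STA $2007
  LDA striped_nam+512, X    ; byte for PPU offset 4*X + 2
  STA $2007
  LDA striped_nam+768, X    ; byte for PPU offset 4*X + 3
  STA $2007
  INX
  BNE StripedCopyLoop       ; 256 passes * 4 bytes = $0400 bytes
```

Like the four-loop version, this is tied to one fixed data address, and the data only works with this particular routine.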
Re: 400h bytes transfer loop?
by on (#179341)
zzo38 wrote:
To copy $0400 bytes from ROM to the PPU address space, one thing I have thought of is to store the data in a different order in ROM (for example, at $8000 $8100 $8200 $8300 $8001 $8101 $8201 $8301 etc.) and then loop the X or Y register from 0 to 255, reading all four bytes at each step and writing to the PPU four times in that loop. (With my "Unofficial-MagicKit", it is possible to make the assembler automatically rearrange the data at compile time, by defining a custom output routine.)

This is a transformation that optimizes speed at the expense of versatility and code size in this case. I don't think it's really appropriate to do this for loading whole nametables. (The minor speed improvement is not likely to be significant.)

Striped tables are really good for things like looking up a 16 bit value (e.g. pointer tables, jump tables), where the table is in a fixed location. There's little drawback in cases like that, especially if you have a tool that makes them convenient. I wish ca65 had such a feature, and I sometimes use macros to work around this. I think it's cool that you added it to your own assembler.
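
As an illustration of the pointer table case, here is a minimal sketch in NESASM syntax. The names screen_lo, screen_hi and LoadScreen are made up for this example; title_screen, game_screen and LoadBackground are the ones from the reusable function posted earlier in the thread:

```asm
; Striped pointer table: all low bytes first, then all high bytes.
screen_lo:
  .db LOW(title_screen), LOW(game_screen)
screen_hi:
  .db HIGH(title_screen), HIGH(game_screen)

; Call with X = screen index (0 = title_screen, 1 = game_screen).
LoadScreen:
  LDA screen_lo, X      ; low byte of the selected screen's address
  STA <$00
  LDA screen_hi, X      ; high byte of the selected screen's address
  STA <$01
  JSR LoadBackground    ; copies $0400 bytes from the pointer at $00/$01
  RTS
```

Because the low and high bytes live in separate tables, the index goes straight into X without having to be doubled first.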
Re: 400h bytes transfer loop?
by on (#179357)
In this post, rainwarrior wrote:
Striped tables are really good for things like looking up a 16 bit value (e.g. pointer tables, jump tables), where the table is in a fixed location. There's little drawback in cases like that, especially if you have a tool that makes them convenient. I wish ca65 had such a feature, and I sometimes use macros to work around this. I think it's cool that you added it to your own assembler.

I can think of how I might make this sort of thing in a preprocessor written in Python. Would that be a good idea to add to my list of things to make for the community once The Curse of Possum Hollow wraps?
Re: 400h bytes transfer loop?
by on (#179374)
rainwarrior wrote:
Striped tables are really good for things like looking up a 16 bit value (e.g. pointer tables, jump tables), where the table is in a fixed location. There's little drawback in cases like that, especially if you have a tool that makes them convenient. I wish ca65 had such a feature, and I sometimes use macros to work around this. I think it's cool that you added it to your own assembler.
That is true, and it is what I mainly used striped tables for: 16-bit values, not nametables. You are correct that it is much better suited to things such as pointer tables and jump tables. (Examples of Unofficial-MagicKit macros which do this are available on the wiki.)

tepples wrote:
I can think of how I might make this sort of thing in a preprocessor written in Python. Would that be a good idea to add to my list of things to make for the community once The Curse of Possum Hollow wraps?
I suppose it might help those who use ca65. I think this is a good idea, but since I do not use ca65 perhaps see what others say.

(I myself prefer JavaScript as my scripting language of choice, but that is just my opinion and you can use Python if you prefer.)
Re: 400h bytes transfer loop?
by on (#179591)
tepples wrote:
In this post, rainwarrior wrote:
I wish ca65 had such a feature, and I sometimes use macros to work around this. I think it's cool that you added it to your own assembler.

I can think of how I might make this sort of thing in a preprocessor written in Python. Would that be a good idea to add to my list of things to make for the community once The Curse of Possum Hollow wraps?


Yes please. Better tooling helps everyone, and this would be a welcome improvement.
Re: 400h bytes transfer loop?
by on (#179593)
zzo38 wrote:
tepples wrote:
I can think of how I might make [a striped table generator] in a preprocessor written in Python. Would that be a good idea to add to my list of things to make for the community once The Curse of Possum Hollow wraps?

I suppose it might help those who use ca65. I think this is a good idea, but since I do not use ca65 perhaps see what others say.

I can make it general enough to cover NESASM, ASM6, and ca65. It needs only whatever operators go in the definition of each table entry.

Our Python vs. JavaScript debate is in another castle.
Re: 400h bytes transfer loop?
by on (#179654)
I wrote a striped table generator in C for my game but it's not very user friendly and instead of being a preprocessor it just generates a file to .include (and automatically creates an enum if you ask it to, because it's meant for metatile information and similar tables rather than small 16-bit value tables).

Would it be useful to make a more general purpose version of this tool? With what syntax and features?
Re: 400h bytes transfer loop?
by on (#179667)
Myself, I'm not particularly interested in any external tools to do it (they are trivial to make); I was just musing that it would be nice if it were a language feature rather than something you had to set up externally.

After poking around in the ca65 source and documentation a bit, I noticed there does exist a semi-reasonable method for generating striped tables:
Code:
; allow line continuation feature
.linecont +

; create the table as a multi-line define
.define MyTable \
   $1234, \
   $5678, \
   $9ABC

; emit the striped tables
mytable_lo: .lobytes MyTable
mytable_hi: .hibytes MyTable

Maybe the line continuations with a define aren't the prettiest look, but it does seem to do the job. Labels or other expressions seem just as good as literals, too, and there doesn't seem to be any inherent limit on the number of entries (the .define is stored as a linked list of tokens).

I find this useful, so I'm going to start a new thread about it instead of trying to discuss it further here:
https://forums.nesdev.com/viewtopic.php?f=2&t=14838