Direct page issues...

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Direct page issues...
by on (#151469)
Hello. I'm trying to update my HDMA table in WRAM ($7FA100), however.....
Code:
modGradient:
   .EQU COUNT       1
   php
   phb
   phd   ;push flags, bank and dp register, since we're tinkering with them
   lda #$7F
   pha
   plb   ;got the new WRAM data bank (where the HDMA table is)
   rep #$20
   lda #$A100 ;direct page of the WRAM location
   tcd
   sep #$20
   .REPT 6
      ldx COUNT
      dea
      stx COUNT
      .REDEF COUNT COUNT+2
   .ENDR
   pld
   plb
   plp
   rts


I've got genuinely no idea why it isn't updating the table. Here's my initialization code (though I don't think it's relevant)...Also, $A100+COUNT does exactly what I want it to do. So....that's why I suspect something's wrong with my direct page address...
Code:
InitHDMA:
   .EQU FREERAM    $7FA100 ;initialize HDMA table in WRAM
   ldx #$0F
LVL1_HDMAINIT:
   lda.w LVL1BLUE, X
   sta FREERAM, X
   dex
   bpl LVL1_HDMAINIT
   rts
LVL1BLUE:
   .db $4F,$29   ;the table will be modified continuously via modGradient, needa put it in WRAM
   .db $04,$2A
   .db $04,$2B
   .db $04,$2D
   .db $04,$2E
   .db $04,$2F
   .db $04,$30
   .db $00   


Please help, thanks
Re: Direct page issues...
by on (#151474)
Are you using ca65? ca65 has some kind of messed-up direct page behaviour that I can never remember exactly what it is, since I use WLA. As far as I can tell your code should be working.
Re: Direct page issues...
by on (#151477)
Khaz wrote:
Are you using ca65? ca65 has some kind of messed-up direct page behaviour that I can never remember exactly what it is, since I use WLA. As far as I can tell your code should be working.


Yeah, I am using WLA; I'm aware of ca65's issues from browsing the forum's earlier posts. Really sucks when I don't get what's happening lol
Re: Direct page issues...
by on (#151478)
Oh, okay. Well, uh, I've never used REPT or REDEF myself... I pretty much just write all my code out literally.

OH WAIT. I think this is your problem:

ldX
deA
stX
Re: Direct page issues...
by on (#151479)
You can't access bank $7F with direct page. It's always in bank $00.
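For reference, here is a minimal sketch of one way around this: leave the direct page register alone, point the data bank register at bank $7F, and use absolute (16-bit) addressing, which does honor the data bank. It assumes the table layout from the first post (the bytes to change sit at odd offsets from $7FA100) and that the intent was to decrement each of them. The six decrements are written out literally rather than with .REPT.

```asm
modGradient:
   php             ; save processor flags
   phb             ; save the caller's data bank
   sep #$20        ; 8-bit accumulator/memory, so each dec touches one byte
   lda #$7F
   pha
   plb             ; data bank = $7F, where the HDMA table lives
   dec.w $A101     ; absolute addressing: effective address is $7F:A101
   dec.w $A103
   dec.w $A105
   dec.w $A107
   dec.w $A109
   dec.w $A10B
   plb             ; restore data bank
   plp             ; restore flags
   rts
```

Since the direct page register is never changed here, there is no need to push and pull it.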
Re: Direct page issues...
by on (#151516)
Khaz wrote:
Oh, okay. Well, uh, I've never used REPT or REDEF myself... I pretty much just write all my code out literally.

OH WAIT. I think this is your problem:

ldX
deA
stX


Lol yep, I noticed that right after I submitted my original code and fixed it - I thought it had something to do with differences between the accumulator and X register addressing, but that made no sense so I changed it back.

As for the above post, If that's the actual issue, then what is the purpose of direct paging, assuming it can only address the 0 bank? In other words (I'm pretty new so please pardon the amateur questions) what's in the 0 bank? VRAM, dedicated hardware registers or what? What can I possibly use direct paging with?
Re: Direct page issues...
by on (#151519)
gnarlyWarlock wrote:
As for the above post, If that's the actual issue, then what is the purpose of direct paging, assuming it can only address the 0 bank? In other words (I'm pretty new so please pardon the amateur questions) what's in the 0 bank? VRAM, dedicated hardware registers or what? What can I possibly use direct paging with?

Yeah I shoulda remembered that. The zero bank has a lot of useful stuff actually, namely the lowest $2000 of WRAM (which is mirrored in every bank from $00 to $3F (in LOROM)) and access to all your hardware registers, both of which can benefit greatly from some extra speed. Personally I've crammed almost all my game variables and the main "object list" inside that lowest $2000, the rest of WRAM I use for huge tables and other things that are not really time-sensitive. I mostly use direct page to iterate through my list of objects in game.

Others around here have more experience than me and I'm sure could list all kinds of other clever uses for it.
Re: Direct page issues...
by on (#151521)
Khaz wrote:
gnarlyWarlock wrote:
As for the above post, If that's the actual issue, then what is the purpose of direct paging, assuming it can only address the 0 bank? In other words (I'm pretty new so please pardon the amateur questions) what's in the 0 bank? VRAM, dedicated hardware registers or what? What can I possibly use direct paging with?

Yeah I shoulda remembered that. The zero bank has a lot of useful stuff actually, namely the lowest $2000 of WRAM (which is mirrored in every bank from $00 to $3F (in LOROM)) and access to all your hardware registers, both of which can benefit greatly from some extra speed. Personally I've crammed almost all my game variables and the main "object list" inside that lowest $2000, the rest of WRAM I use for huge tables and other things that are not really time-sensitive. I mostly use direct page to iterate through my list of objects in game.

Others around here have more experience than me and I'm sure could list all kinds of other clever uses for it.


http://wiki.superfamicom.org/snes/show/ ... M+Tutorial in this tutorial it says it can address the first 64K of memory...I should've noticed that. Makes much more sense now. So, I suppose when you say it's mirrored, I assume WRAM is identical in banks 00-3F... cool, fair enough.

Thank you very much for clarifying that for me!
Re: Direct page issues...
by on (#151753)
I hope someone sees this, because yet again, the official documentation isn't clear

I'm quoting the SNES Dev Manual, page 2-21-1
Quote:
The WRAM (8K-Byte) is mapped to address (0000~1FFF) of banks (00~3F), (80~BF) and 7E. This is the WRAM used as the common bank. This 8K-bytes can be accessed from any bank described above...


Now:
Code:
   .EQU FREERAM    $7EA100 ;initialize HDMA table in WRAM   
   ldx #$10
LVL1_HDMAINIT:
   lda.w LVL1BLUE, X
   sta FREERAM, X
   dex
   bpl LVL1_HDMAINIT
   rts

works fine; I put #$A100 into $4302 and #$7E into $4304, start HDMA and everything's groovy. HOWEVER, that same code with FREERAM = $A100, and stz $4304 : it doesn't work. -_-".

Also, in my modGradient subroutine, if I do indeed load the table to #$7FA100, looking at the documentation, I'd assume I can direct page to it (lda #$A100, tcd, and rock and roll). That doesn't work. wtf??

ALSO, IF I put LVL1BLUE into another bank (lets say, bank 1, and try to lda.w $8000+LVL1BLUE, X), it can't find the table; the table has to be in the same bank as this subroutine.

Please help me out, the header's attached, on which I have yet ANOTHER question: I define having 8 ROM banks, and each ROM bank is $8000; does that mean my ROM is necessarily in banks $00-$08, from CPU's reference point?
Re: Direct page issues...
by on (#151760)
gnarlyWarlock wrote:
works fine; I put #$A100 into $4302 and #$7E into $4304, start HDMA and everything's groovy. HOWEVER, that same code with FREERAM = $A100, and stz $4304 : it doesn't work. -_-".

Of course not. $A100 isn't in the bottom $2000 bytes (8 kilobytes) of the bank, meaning it's not in the WRAM mirror. In bank $00, $A100 is a ROM location.

Quote:
Also, in my modGradient subroutine, if I do indeed load the table to #$7FA100, looking at the documentation, I'd assume I can direct page to it (lda #$A100, tcd, and rock and roll). That doesn't work. wtf??

Why would you assume that? Like I said, you can't use direct page to access anything outside bank $00, and it's only $7E0000-$7E1FFF that's mirrored to $0000-$1FFF in banks $00-$3F and $80-$BF. You can't access any of bank $7F via direct page, ever, and that also goes for the upper $E000 bytes (ie: most) of bank $7E.

Quote:
ALSO, IF I put LVL1BLUE into another bank (lets say, bank 1, and try to lda.w $8000+LVL1BLUE, X), it can't find the table; the table has to be in the same bank as this subroutine.

Two things: First, the increment is not $8000; it's $10000. Banks are always $10000 bytes; it's just that they aren't always 100% ROM (the banks you're using are 50% ROM and 50% WRAM/MMIO/reserved areas). Second, by using lda.w, you're probably forcing 16-bit addressing, which causes the assembler to ignore the top byte (ie: the bank byte) of the operand. Try lda.l.

Or just try to arrange your code to minimize long accesses, because they're slow. Keep in mind that the data bank (the bank used for 16-bit addressing of data) does not have to be the same as the program bank (the bank the program counter is currently in), so you can access data that's in a different bank from the code doing the accessing without having to use long addresses. Of course, if you only have a handful of accesses to do in a particular bank, it's probably faster to use 24-bit addressing than to change the data bank just for that...
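A rough sketch of the two approaches, in WLA syntax. It assumes LVL1BLUE has been moved to another ROM bank, that X holds a valid index, and that the accumulator is 8-bit. The `:` prefix is WLA's operator for a label's bank number.

```asm
   ; Option 1: long (24-bit) addressing. The data bank register is not used.
   lda.l LVL1BLUE, X

   ; Option 2: point the data bank at the table, then use 16-bit addressing.
   phb               ; save the current data bank
   lda #:LVL1BLUE    ; bank number of the label
   pha
   plb               ; data bank = the table's bank
   lda.w LVL1BLUE, X ; now resolves inside that bank
   plb               ; restore the data bank (plb does not change A)
```

Option 2 only pays off when several accesses in a row hit the same bank. For a one-off read, option 1 is simpler.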

It might be a good idea to study up on the SNES memory map; once you get it, everything makes way more sense.

Quote:
I define having 8 ROM banks, and each ROM bank is $8000; does that mean my ROM is necessarily in banks $00-$08, from CPU's reference point?

You mean $00-$07, right? It certainly should be (unless there's an assembler option I'm not aware of), but for a ROM that small you should also be able to find the data in $40-$47, $80-$87, and $C0-$C7. Since you're using LoROM, I wouldn't bother with the $40+ and $C0+ ranges - if you're using some variant of HiROM they allow access to all $10000 bytes of a ROM bank, but if you aren't, the only practical difference is that you can't access MMIO or the RAM mirror with 16-bit addressing in those banks. $80+ is useful for FastROM purposes, but that's out of scope so I'll leave off here.
Re: Direct page issues...
by on (#151811)
gnarlyWarlock wrote:
I hope someone sees this, because yet again, the official documentation isn't clear


You're suffering from a fundamental misunderstanding. Direct page is limited to accessing the first 64K of the CPU address space. Not the first 64K of RAM, not the first 64K of ROM, but the first 64K of the address space. The 65816 doesn't "know" anything about RAM or ROM, it only knows addresses.

That means that direct page can access the following areas:

Code:
000000-001FFF: WRAM (first 8k bytes only)
002100-0021FF: B bus (S-PPU registers, etc.)
004000-0043FF: S-CPU registers (HDMA, etc.)
008000-00FFFF: ROM (first 32k bytes if Mode 20; second 32k bytes if Mode 21)


Plus any cartridge-specific special registers (DSP, SuperFX, SA-1, etc.)

Direct page addressing adds the 8-bit operand, the 16-bit direct page register, and possibly an index register, then ANDs the sum with $FFFF to get the effective address. The data bank register (the one you access with PHB/PLB) is completely ignored; the bank is always treated as zero. Only the addressing modes that take a 16-bit address use the data bank register.
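As an illustration of that rule, here is a small fragment (WLA syntax). The direct page base of $1000 and the offset $05 are arbitrary values picked for the example.

```asm
   rep #$20
   lda #$1000
   tcd             ; D = $1000: direct page now covers $00:1000-$00:10FF
   sep #$20
   lda #$7F
   pha
   plb             ; data bank = $7F (no effect on direct page accesses)
   lda.b $05       ; direct page: reads $00:1005, i.e. WRAM at $7E:1005
   lda.w $1005     ; absolute: reads $7F:1005, a different byte
```

A real routine would save D and the data bank beforehand (phd / phb) and restore them afterwards.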

Quote:
HOWEVER, that same code with FREERAM = $A100, and stz $4304 : it doesn't work. -_-".


It doesn't work because $00A100 is ROM. You can't write anything there.

Quote:
ALSO, IF I put LVL1BLUE into another bank (lets say, bank 1, and try to lda.w $8000+LVL1BLUE, X), it can't find the table; the table has to be in the same bank as this subroutine.


It can't find it because the data bank is pointing to the bank your subroutine is in, not the one your table is in. Absolute indexed addressing does use the data bank register.

Quote:
I define having 8 ROM banks, and each ROM bank is $8000; does that mean my ROM is necessarily in banks $00-$08, from CPU's reference point?


Address space banks and ROM "banks" are distinct but related concepts. I assume your ROM is Mode 20 because you're talking about $8000 banks. In that case, the first ROM bank (the first $8000 bytes) is visible in address space banks 00 and 80. The second $8000 bytes is visible in banks 01 and 81. The third $8000 bytes is visible in banks 02 and 82, et cetera. If you have 256 KB of ROM then the last $8000 bytes will be visible in banks 07 and 87.
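The attached header isn't reproduced here, but a typical WLA memory map for this kind of Mode 20 project looks roughly like the following. Treat it as an assumed example rather than the actual file; the comments at the end restate the mapping described above.

```asm
.MEMORYMAP
  SLOTSIZE $8000      ; each ROM bank is $8000 bytes
  DEFAULTSLOT 0
  SLOT 0 $8000        ; ROM banks appear at $8000-$FFFF of an address-space bank
.ENDME

.ROMBANKSIZE $8000
.ROMBANKS 8           ; 8 x 32 KiB = 256 KiB of ROM

; ROM bank 0 -> $00:8000-$00:FFFF and $80:8000-$80:FFFF
; ROM bank 1 -> $01:8000-$01:FFFF and $81:8000-$81:FFFF
; ...
; ROM bank 7 -> $07:8000-$07:FFFF and $87:8000-$87:FFFF
```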
Re: Direct page issues...
by on (#151836)
Thank you guys so much. It's always the simple stupid mistakes that get me, and I seriously JUST realized $A100 isn't in $0000-$1FFF LOL. The info on the memory map is incredibly useful though, thanks. I couldn't understand the nomenclature they use for the different ROM types (There's Modes 20, 21, 25; however, I still have a fairly weak grasp of the concepts of LoROM and HiROM, though I presume the prefixes indicate size (???)). I have been wondering how to load in the full 24 bit address and haven't realized that I can designate it by lda.l. But yeah the bank information is incredibly useful.

However, on that last part AWJ mentioned dp addresses all of bank 0...well then, if I have my data stored at offset $1000 (still in WRAM) and I transfer that to the page register, and alter the data @ that offset, in bank 0, will the WRAM in banks $00-$3F be altered immediately, as well as $7F, or is it going to take a few clock cycles to synchronize them together? In other words, is the WRAM in all those banks the same exact space of physical memory that we're altering, or are there multiple copies of the same exact pool of memory? In the documentation it does say the SNES has 8k fast WRAM, and then an extra 120k, so a total of 128k. Well, If WRAM is just one pool, lets say banks $7E and $7F are fully dedicated to WRAM, then 16^4*2 = 131k. So what happens with that extra 3k of memory?
Re: Direct page issues...
by on (#151837)
The first nibble of the mode denotes the ROM speed.
  • Mode 2x is slow ROM, where ROM is accessed at 2.7 MHz. Slow ROM must be 200 ns or faster.
  • Mode 3x is fast ROM, which allows 3.6 MHz ROM reading in $80-$FF but needs 120 ns or faster ROMs. It provides a speed advantage for code or data tables in ROM but not for DMA copies from ROM, which always run at 2.7 MHz.

The second nibble of the mode denotes the way ROM banks are arranged in S-CPU address space.
  • Mode x0 is LoROM, where 32K of ROM occupies the second half of each bank from $80-$FF (mirrored to $00-$7D). LoROM can be up to 32 Mbit (4 MiB), with 128 banks of 32 KiB each.
  • Mode x1 is HiROM, where ROM occupies the whole bank $C0-$FF (mirrored to $40-$7D, and with the second half of each 64K bank mirrored to $00-$3F and $80-$BF). HiROM can be up to 32 Mbit (4 MiB), with 64 banks of 64 KiB each.
  • Mode x5 is ExHiROM, which consists of one 32 Mbit (4 MiB) HiROM mapped to $C0-$FF and a second 8, 16, or 32 Mbit (1, 2, or 4 MiB) HiROM mapped to $40-$7D. It's mirrored as in HiROM, meaning the vectors come from the second ROM. ExHiROM can be up to 64 Mbit (8 MiB), with 128 banks of 64 KiB each, though the first half of the last two banks is inaccessible and the second half is accessible only through the mirrors at $3E and $3F. The second ROM is also always slow, making it useful for less frequently used code or data or as a DMA source.

The WRAM at the start of $00-$3F and $80-$BF is exactly the same physical memory as $7E0000-$7E1FFF. RAM access is always slow (2.7 MHz).

Quote:
lets say banks $7E and $7F are fully dedicated to WRAM, then 16^4*2 = 131k. So what happens with that extra 3k of memory?

That's the 2.4% difference between 1000 bytes in a (metric) kilobyte and 1024 bytes in what is officially called a kibibyte (abbreviated KiB).
Re: Direct page issues...
by on (#151844)
gnarlyWarlock wrote:
However, on that last part AWJ mentioned dp addresses all of bank 0...well then, if I have my data stored at offset $1000 (still in WRAM) and I transfer that to the page register, and alter the data @ that offset, in bank 0, will the WRAM in banks $00-$3F be altered immediately, as well as $7F, or is it going to take a few clock cycles to synchronize them together? In other words, is the WRAM in all those banks the same exact space of physical memory that we're altering, or are there multiple copies of the same exact pool of memory? In the documentation it does say the SNES has 8k fast WRAM, and then an extra 120k, so a total of 128k. Well, If WRAM is just one pool, lets say banks $7E and $7F are fully dedicated to WRAM, then 16^4*2 = 131k. So what happens with that extra 3k of memory?


The SNES WRAM size is exactly $20000 bytes. The whole thing is mapped at $7E0000-$7FFFFF, and the first $2000 bytes are mirrored in banks 00-3F and 80-BF. The mirrors are all physically the same memory as each other and as $7E0000-7E1FFF. However, $7F0000-$7F1FFF is not a mirror, it's a separate chunk of RAM.
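A quick way to see this for yourself, as a sketch: $1000 and $42 are arbitrary test values, and long addressing is used throughout so the data bank doesn't matter.

```asm
   sep #$20
   lda #$42
   sta.l $001000   ; write through the bank $00 mirror
   lda.l $7E1000   ; A = $42: same physical byte, no delay
   lda.l $801000   ; A = $42 again: another mirror of that byte
   lda.l $7F1000   ; separate RAM: holds whatever was there before
```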
Re: Direct page issues...
by on (#151845)
gnarlyWarlock wrote:
It's always the simple stupid mistakes that get me

It happens. Just last week I fixed a bug in a display engine test ROM I'd been working on, wherein I had tried to get an IRQ to fire on the 14th scanline by setting VTIME to $0014. Before that I had trouble with my IRQs sending the CPU all over everywhere, because I had defined my IRQ trampoline location as $1000 (so far so good) and then used $IRQ0 instead of IRQ0 to actually insert the value into the ROM...

Oh, and watch out for immediate value loads that don't actually have # in front. It's a perfectly valid expression, and depending on what's in the memory location you accidentally loaded from, it can be very difficult to diagnose the problem based solely on output. Your code might even work, for a while...

"BRK" redirects here. For other uses, see BRK (disambiguation). - Wikipedia
Re: Direct page issues...
by on (#151910)
tepples wrote:
The first nibble of the mode denotes the ROM speed.
  • Mode 2x is slow ROM, where ROM is accessed at 2.7 MHz. Slow ROM must be 200 ns or faster.
  • Mode 3x is fast ROM, which allows 3.6 MHz ROM reading in $80-$FF but needs 120 ns or faster ROMs. It provides a speed advantage for code or data tables in ROM but not for DMA copies from ROM, which always run at 2.7 MHz.

The second nibble of the mode denotes the way ROM banks are arranged in S-CPU address space.
  • Mode x0 is LoROM, where 32K of ROM occupies the second half of each bank from $80-$FF (mirrored to $00-$7D). LoROM can be up to 32 Mbit (4 MiB), with 128 banks of 32 KiB each.
  • Mode x1 is HiROM, where ROM occupies the whole bank $C0-$FF (mirrored to $40-$7D, and with the second half of each 64K bank mirrored to $00-$3F and $80-$BF). HiROM can be up to 32 Mbit (4 MiB), with 64 banks of 64 KiB each.
  • Mode x5 is ExHiROM, which consists of one 32 Mbit (4 MiB) HiROM mapped to $C0-$FF and a second 8, 16, or 32 Mbit (1, 2, or 4 MiB) HiROM mapped to $40-$7D. It's mirrored as in HiROM, meaning the vectors come from the second ROM. ExHiROM can be up to 64 Mbit (8 MiB), with 128 banks of 64 KiB each, though the first half of the last two banks is inaccessible and the second half is accessible only through the mirrors at $3E and $3F. The second ROM is also always slow, making it useful for less frequently used code or data or as a DMA source.

The WRAM at the start of $00-$3F and $80-$BF is exactly the same physical memory as $7E0000-$7E1FFF. RAM access is always slow (2.7 MHz).

Quote:
lets say banks $7E and $7F are fully dedicated to WRAM, then 16^4*2 = 131k. So what happens with that extra 3k of memory?

That's the 2.4% difference between 1000 bytes in a (metric) kilobyte and 1024 bytes in what is officially called a kibibyte (abbreviated KiB).

This clarifies so much. Thank you very much

AWJ wrote:
The SNES WRAM size is exactly $20000 bytes. The whole thing is mapped at $7E0000-$7FFFFF, and the first $2000 bytes are mirrored in banks 00-3F and 80-BF. The mirrors are all physically the same memory as each other and as $7E0000-7E1FFF. However, $7F0000-$7F1FFF is not a mirror, it's a separate chunk of RAM.

Awesome, that's what I thought

93143 wrote:
gnarlyWarlock wrote:
It's always the simple stupid mistakes that get me

It happens. Just last week I fixed a bug in a display engine test ROM I'd been working on, wherein I had tried to get an IRQ to fire on the 14th scanline by setting VTIME to $0014. Before that I had trouble with my IRQs sending the CPU all over everywhere, because I had defined my IRQ trampoline location as $1000 (so far so good) and then used $IRQ0 instead of IRQ0 to actually insert the value into the ROM...

Oh, and watch out for immediate value loads that don't actually have # in front. It's a perfectly valid expression, and depending on what's in the memory location you accidentally loaded from, it can be very difficult to diagnose the problem based solely on output. Your code might even work, for a while...

"BRK" redirects here. For other uses, see BRK (disambiguation). - Wikipedia

Ah, well, what immediates are there which I can specify without a pound sign prefixed? Cause, even with assembler defined variables (.EQU WHATEVER $1) I'd still need to add the pound in front so it doesn't assume I wanna load address $1.

Also, since you brought up the topic (and it's waaaaaaaaay ahead of where I am, but still for future reference) when doing that conditional jump based on the tracer location, would I just write scanline value to register $4209/A, and have the address of where I want it to jump to written in the header (under .SNESNATIVEVECTOR>IRQ), or is there some continuous comparison between the tracer vertical location register ($213C) and $4209/A?
Re: Direct page issues...
by on (#151923)
gnarlyWarlock wrote:
Ah, well, what immediates are there which I can specify without a pound sign prefixed? Cause, even with assembler defined variables (.EQU WHATEVER $1) I'd still need to add the pound in front so it doesn't assume I wanna load address $1.

You still write ".EQU WHATEVER $01" regardless of how you plan to use WHATEVER. When you refer to WHATEVER as a value and not an address, you put a # in front of it. .EQU just tells the assembler "Wherever you see WHATEVER, replace it with $01". So if you do lda.w #WHATEVER, you get lda.w #$01.
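To make the difference concrete, here is a tiny fragment using the same define, assuming an 8-bit accumulator:

```asm
.EQU WHATEVER $01

   sep #$20
   lda #WHATEVER   ; immediate: A = $01
   lda WHATEVER    ; no '#': reads the byte stored at address $01
```

Both lines assemble without complaint, which is why a missing # can be so hard to spot.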
gnarlyWarlock wrote:
Also, since you brought up the topic (and it's waaaaaaaaay ahead of where I am, but still for future reference) when doing that conditional jump based on the tracer location, would I just write scanline value to register $4209/A, and have the address of where I want it to jump to written in the header (under .SNESNATIVEVECTOR>IRQ), or is there some continuous comparison between the tracer vertical location register ($213C) and $4209/A?

I think you've got the right idea. The scanline-interrupt uses the IRQ vector. Just make sure you enable the IRQ in $4200, or else it won't do anything! Also remember to read from $4211 at the end of your IRQ routine, to clear the interrupt.
Re: Direct page issues...
by on (#151925)
gnarlyWarlock wrote:
93143 wrote:
Oh, and watch out for immediate value loads that don't actually have # in front.

Ah, well, what immediates are there which I can specify without a pound sign prefixed?

There aren't any. It's just that unless your brain is very different from mine (which is possible), accidentally leaving it off is easier than you might think.

Quote:
Also, since you brought up the topic (and it's waaaaaaaaay ahead of where I am, but still for future reference) when doing that conditional jump based on the tracer location, would I just write scanline value to register $4209/A, and have the address of where I want it to jump to written in the header (under .SNESNATIVEVECTOR>IRQ), or is there some continuous comparison between the tracer vertical location register ($213C) and $4209/A?

The former. Basically you can just set up the interrupt to fire when a certain H-, V-, or H/V-position is reached, turn it on, and go off to do whatever. When the interrupt fires, the CPU will finish the instruction it's busy with, save its P register and program counter to the stack and jump to the location in bank $00 indicated in the 16-bit IRQ vector. At the end of the IRQ routine, you have to use rti, which automatically restores P from the stack and returns the program counter to the saved location so as to resume processing. (Note that only the P and PC registers are pushed onto the stack; if you're going to change any other registers you need to save them too. More details at the link.) On SNES, you also have to read $4211 some time during the IRQ handler, otherwise the IRQ will fire again right away as soon as you return.
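A bare-bones sketch of that flow, in WLA syntax. The scanline number and the IRQHandler label are made up for the example, and the setup part assumes the data bank is one where $4200-$420A are reachable with 16-bit addressing ($00-$3F or $80-$BF).

```asm
   ; --- setup, somewhere in init code ---
   sep #$20
   lda #14           ; decimal 14 = $0E (writing $14 would mean scanline 20)
   sta.w $4209       ; V-timer, low byte
   stz.w $420A       ; V-timer, high bit
   lda #$20          ; bit 5 = V-count IRQ on (this value leaves NMI and auto-joypad off)
   sta.w $4200
   cli               ; allow IRQs on the CPU side

   ; --- handler, in bank $00, pointed to by the native-mode IRQ vector ---
IRQHandler:
   rep #$20
   pha               ; save 16-bit A; P and the return address are already on the stack
   sep #$20
   lda.l $004211     ; read to acknowledge, otherwise the IRQ fires again after rti
   ; ... scanline work goes here ...
   rep #$20
   pla               ; restore A
   rti               ; restores P and returns to the interrupted code
```

If the handler also uses X, Y, the data bank or the direct page register, those need saving and restoring the same way.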

https://en.wikipedia.org/wiki/Interrupt ... processors

You can also do a polling loop, wherein you continually latch the counters and read $213D ($213C is the horizontal position and should normally only be polled once $213D is the right value, if at all), and do something once it reaches a certain value. This is very time-consuming and prevents you from doing anything else efficiently; it's also much less precise, since it takes a while to check the counter even if you aren't doing anything else.

In case you were wondering, a "trampoline" in this context is just a quick snippet of code at the location pointed to by the IRQ vector, intended to redirect the CPU to a more substantial IRQ routine somewhere else - in my case, any of a number of routines in the bank $80+ region, so I can use FastROM. The beauty of putting the trampoline in RAM is that it can be rewritten to point to whatever IRQ routine you want to run next, so you don't need a single massive handler with a ton of conditional logic up front...
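The actual trampoline code isn't shown in this thread, so the following is only a guess at the general shape: it writes a 4-byte jml instruction into RAM at $00:1000. IRQ_Routine is a hypothetical label, and `<`, `>` and `:` are WLA's low-byte, high-byte and bank operators.

```asm
   ; Build "jml IRQ_Routine" at $00:1000 (the address stored in the IRQ vector)
   php
   sei               ; keep IRQs out while the trampoline is half-written
   sep #$20
   lda #$5C          ; opcode for JML (jump absolute long)
   sta.l $001000
   lda #<IRQ_Routine ; low byte of the target address
   sta.l $001001
   lda #>IRQ_Routine ; high byte
   sta.l $001002
   lda #:IRQ_Routine ; bank byte
   sta.l $001003
   plp               ; restore flags, including the previous I state
```

Pointing it at a different handler later only requires rewriting the three address bytes. Whether the bank byte lands in the $80+ range depends on how the project's banks are set up in the assembler.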

It should be noted (though I imagine you may be aware of this) that most SNES games use a Non-Maskable Interrupt (NMI) to trigger their VBlank routines (typically used for bulk DMA transfer of graphics data and syncing up of the game loop). NMI on the SNES is hardwired to fire at the start of VBlank, and can thus be moved only by changing the overscan bit, but it can be turned off if you don't want it.
Re: Direct page issues...
by on (#151933)
tepples wrote:
That's the 2.4% difference between 1000 bytes in a (metric) kilobyte and 1024 bytes in what is officially called a kibibyte (abbreviated KiB).

Read: people like to make our lives harder for no apparent reason. Doesn't help that what has happened is not that people switched to the new prefixes (which at worst would have been annoying), but instead changed to count in 1000s instead of 1024s, so now you have absolutely no way to tell for sure what a size really means unless you somehow knew beforehand.

For the record, I imagine this was to appease disk manufacturers which use something in-between (1024 for kilo, then 1000 for mega, giga, etc.). Solid state drives always use the 1024 factors (since their magnitude is always based on powers of two).

EDIT: from the talk page of that wikipedia article:
Quote:
What were the IEC thinking making another term to mean 1024 bytes, they should have made a term to mean 1000 bytes and shamed drive manufacturers into using that.
This. (this probably belongs to another thread if we decide to argue about this...)
Re: Direct page issues...
by on (#151935)
Sik wrote:
For the record, I imagine this was to appease disk manufacturers which use something in-between (1024 for kilo, then 1000 for mega, giga, etc.). Solid state drives always use the 1024 factors (since their magnitude is always based on powers of two).
Absolutely not. Hard disk drives, SSDs, and even consumer flash (e.g. SD, CF) have always measured bytes using "proper" SI units, not this goofy powers of 2 crap.

Even the really ancient hard drives ca. 1960 measured capacity as powers of 10, e.g. 5000000 six-bit bytes.
Re: Direct page issues...
by on (#151941)
Since when do SD cards not get measured in powers of two?

lidnariq wrote:
Even the really ancient hard drives ca. 1960 measured capacity as powers of 10, e.g. 5000000 six-bit bytes.

They also had non-power of two words (and bytes didn't exist, for that matter).
Re: Direct page issues...
by on (#151942)
Sik wrote:
Since when do SD cards not get measured in powers of two?
Check for yourself. My 4GB SD card contains 3872256 1 KiB sectors, for 3965190144 bytes. My 32GB SD card contains 31166976 1 KiB sectors, for 31914983424 bytes. Definitely NOT capacities of 2^32 or 2^35.

Lest you think this is a recent thing, this random 16MB CF card contains 15904 1 KiB sectors, for 16285696 bytes.
Re: Direct page issues...
by on (#151975)
Sik wrote:
Read: people like to make our lives harder for no apparent reason. Doesn't help that what has happened is not that people switched to the new prefixes (which at worst would have been annoying), but instead changed to count in 1000s instead of 1024s, so now you have absolutely no way to tell for sure what a size really means unless you somehow knew beforehand.

It could always be worse
Re: Direct page issues...
by on (#151983)
Sik wrote:
Doesn't help that what has happened is not that people switched to the new prefixes (which at worst would have been annoying)

File managers included with free desktop environments are more likely to have switched to the new prefixes.

Quote:
For the record, I imagine this was to appease disk manufacturers which use something in-between (1024 for kilo, then 1000 for mega, giga, etc.). Solid state drives always use the 1024 factors (since their magnitude is always based on powers of two).

The underlying size of a solid state drive is based on powers of two. But the drive's controller uses the 7.4% difference between a GB and a GiB for the wear leveling algorithm.
Re: Direct page issues...
by on (#151985)
lidnariq wrote:
Sik wrote:
Since when do SD cards not get measured in powers of two?
Check for yourself. My 4GB SD card contains 3872256 1 KiB sectors, for 3965190144 bytes. My 32GB SD card contains 31166976 1 KiB sectors, for 31914983424 bytes. Definitely NOT capacities of 2^32 or 2^35.

They're definitely not 4×10⁹ nor 32×10⁹ either, for that matter (those numbers aren't round). And both of those numbers are multiples of 1024, in fact. It looks more like it's "missing" the spare sectors used as a back-up once wear out kicks in.
Re: Direct page issues...
by on (#151994)
93143 wrote:
gnarlyWarlock wrote:
93143 wrote:
Oh, and watch out for immediate value loads that don't actually have # in front.

Ah, well, what immediates are there which I can specify without a pound sign prefixed?

There aren't any. It's just that unless your brain is very different from mine (which is possible), accidentally leaving it off is easier than you might think.

Quote:
Also, since you brought up the topic (and it's waaaaaaaaay ahead of where I am, but still for future reference) when doing that conditional jump based on the tracer location, would I just write scanline value to register $4209/A, and have the address of where I want it to jump to written in the header (under .SNESNATIVEVECTOR>IRQ), or is there some continuous comparison between the tracer vertical location register ($213C) and $4209/A?

The former. Basically you can just set up the interrupt to fire when a certain H-, V-, or H/V-position is reached, turn it on, and go off to do whatever. When the interrupt fires, the CPU will finish the instruction it's busy with, save its P register and program counter to the stack and jump to the location in bank $00 indicated in the 16-bit IRQ vector. At the end of the IRQ routine, you have to use rti, which automatically restores P from the stack and returns the program counter to the saved location so as to resume processing. (Note that only the P and PC registers are pushed onto the stack; if you're going to change any other registers you need to save them too. More details at the link.) On SNES, you also have to read $4211 some time during the IRQ handler, otherwise the IRQ will fire again right away as soon as you return.

https://en.wikipedia.org/wiki/Interrupt ... processors

You can also do a polling loop, wherein you continually latch the counters and read $213D ($213C is the horizontal position and should normally only be polled once $213D is the right value, if at all), and do something once it reaches a certain value. This is very time-consuming and prevents you from doing anything else efficiently; it's also much less precise, since it takes a while to check the counter even if you aren't doing anything else.

In case you were wondering, a "trampoline" in this context is just a quick snippet of code at the location pointed to by the IRQ vector, intended to redirect the CPU to a more substantial IRQ routine somewhere else - in my case, any of a number of routines in the bank $80+ region, so I can use FastROM. The beauty of putting the trampoline in RAM is that it can be rewritten to point to whatever IRQ routine you want to run next, so you don't need a single massive handler with a ton of conditional logic up front...

It should be noted (though I imagine you may be aware of this) that most SNES games use a Non-Maskable Interrupt (NMI) to trigger their VBlank routines (typically used for bulk DMA transfer of graphics data and syncing up of the game loop). NMI on the SNES is hardwired to fire at the start of VBlank, and can thus be moved only by changing the overscan bit, but it can be turned off if you don't want it.

Thank you very much for the info! I did know about VBlank and NMI's, but the concept of trampolines is new. I've actually fixed all my direct page problems so now I'll go tinker with IRQ's and see how they work. Don't wanna dive into making a game till I sufficiently know the hardware.

tepples wrote:
The underlying size of a solid state drive is based on powers of two. But the drive's controller uses the 7.4% difference between a GB and a GiB for the wear leveling algorithm.

Even for SSDs? Thought that that was the benefit of not having any mechanical components and surface etching (besides the quicker access speeds). Then, what does wear down in an SSD?
Re: Direct page issues...
by on (#152001)
Sik wrote:
Since when do SD cards not get measured in powers of two?
[...]
They're definitely not 4×10⁹ nor 32×10⁹ either, for that matter (those numbers aren't round). And both of those numbers are multiples of 1024, in fact. It looks more like it's "missing" the spare sectors used as a back-up once wear out kicks in.

That's entirely irrelevant. The entire point is that they never insinuated you could have 2³² or 2³⁵ bytes of storage.

They always used proper SI prefixes, and the only reason people feel entitled to binary prefixes is because some programmers cared that bitshifts are faster than division.
Re: Direct page issues...
by on (#152002)
gnarlyWarlock wrote:
tepples wrote:
The underlying size of a solid state drive is based on powers of two. But the drive's controller uses the 7.4% difference between a GB and a GiB for the wear leveling algorithm.

Even for SSDs? Thought that that was the benefit of not having any mechanical components and surface etching (besides the quicker access speeds). Then, what does wear down in an SSD?

In any EEPROM, the floating gates in a page eventually wear out after the page has been erased so many times. So an SSD has a few extra pages to let its microcontroller spread repeated writes to a given logical block address across multiple physical pages. See Wear leveling and Log-structured file system on Wikipedia.
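
The 7.4% figure quoted above is just the ratio between a binary and a decimal gigabyte. A quick Python check of that arithmetic (a sketch only; nothing here is specific to any real drive or controller):

```python
GIB = 2**30   # bytes in a gibibyte
GB = 10**9    # bytes in a gigabyte

# Fraction of a power-of-two-sized flash array left over once
# 10**9 bytes per advertised "GB" are exposed to the host.
spare = (GIB - GB) / GB
print(f"{spare:.1%}")
```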
Re: Direct page issues...
by on (#152018)
lidnariq wrote:
That's entirely irrelevant. The entire point is that they never insinuated you could have 2³² or 2³⁵ bytes of storage.

They always used proper SI prefixes, and the only reason people feel entitled to binary prefixes is because some programmers cared that bitshifts are faster than division.

My point was that the end value is misleading regardless of which of the two conventions you use.
Re: Direct page issues...
by on (#152038)
And my point is that the only things that have ever been unilaterally measured using capacities of the form 2ⁿ are RAM and 'PROMs.

Not magnetic storage media capacity, not consumer flash capacity, not clock speeds, not bandwidth, not silicon feature size, not distances, not time, not velocity.

The only reason that it's at all misleading is because someone at Microsoft STILL thinks they should be reporting {GiB that are mislabeled as GB}. That's the entire reason for the confusion.
Re: Direct page issues...
by on (#152045)
lidnariq wrote:
And my point is that the only things that have ever been unilaterally measured using capacities of the form 2ⁿ are RAM and 'PROMs.

Not magnetic storage media capacity

Floppies are measured in KiB, such as the 140 KiB Disk II 5.25" format, the 360 KiB and 1.2 MfB IBM 5.25" formats, the 400 KiB and 800 KiB Apple 3.5" formats, and the 720 KiB and 1.44 MfB 3.5" IBM formats. Here, MfB (mega-floppy-bytes) is a hybrid unit equal to 1,024,000 bytes or 1024 kB or 1000 KiB. I just made up the name and symbol, but that's what MB referred to during the high-density floppy era.

Quote:
not consumer flash capacity

Depends on whether you consider GB and GBA flash cards to be "consumer flash". They're closer to 'PROMs in use, and marked with a measure in "megabits" (128 KiB).

Quote:
not clock speeds

The Game Boy line runs at 4.19 MHz (GB), 8.39 MHz (GBC), or 16.78 MHz (GBA). As far as the homebrew community can tell, these were intended as 4, 8, and 16 MiHz.
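
The unit conversions in this post are easy to verify. A short Python check, using "MiHz" and "MfB" exactly as they are informally defined above (neither is a standard unit, and the variable names are made up for this sketch):

```python
MIHZ = 2**20       # "mebihertz": 1,048,576 Hz
MFB = 1_024_000    # "mega-floppy-byte": 1000 KiB

for multiple in (4, 8, 16):
    hz = multiple * MIHZ
    print(f"{multiple} MiHz = {hz} Hz = {hz / 1e6:.2f} MHz")

# A high-density 3.5" floppy holds 1440 KiB, which is 1.44 MfB.
floppy_bytes = 1440 * 1024
print(floppy_bytes, floppy_bytes / MFB)
```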
Re: Direct page issues...
by on (#152108)
Well, all is good and well, but I've come across yet another brick wall. I've been sitting on it for a few days and tried a few things, but I still can't piece together why exactly it doesn't work the way I want it to. So, I'm trying to do a wavy background. I'm writing to $210D, and all is well and good; when I initialize it, everything's chill, and when I walk my sprite down the screen, the background waves as expected (since all the scanlines are offset).
Code:
wave_table:
   .REPT 5
      .db 3, 16, 0     
      .db 4, 20, 0
      .db 6, 24, 0
      .db 7, 28, 0
      .db 6, 24, 0
      .db 4, 20, 0
      .db 3, 16, 0
      .db 2, 12, 0
      .db 3, 8, 0 
      .db 4, 4, 0
      .db 6, 0, 0   
      .db 7, 4, 0
      .db 6, 8, 0 
      .db 4, 12, 0   
   .ENDR
   .db 0   


Now, at initialization, I've uploaded the table into RAM, and I'm reading it from there, so I know for a fact the initial upload was successful. Now, I'm trying to modify the wave at every frame (basically, if the value of the scanline in front is larger than the one behind, add 4, otherwise subtract 4. The very last scanline is a special case, so I handle that separately).
Code:
modWave:
   php
   phd   ;preserve the direct page
   rep #$20
   lda #FREERAM2   ;direct page mirrored in bank 7E to bank 0
   tcd
   sep #$30
   ldx #1   ;use this as the load offset
   ldy #69   ;42 bytes, so 39 bytes, or 13 lines, get to compare. the last line will have to be modified outside the loop
      ;it will observe the value behind it. notice the table itself is x5, so we repeat 13*5 (+4 for all "last lines" that aren't the actual last line)
   wave_begin:
      clc
      lda 1, X   ;indexed dp addressing
      cmp 4, X
      bcs decr_wave   ;if its larger than the scanline after, decrease it
   incr_wave:
      adc #4
      bra done_wave
   decr_wave:   
      sbc #4
   done_wave:
      sta 1, X
      ;increment offset
      clc
      txa
      adc #3
      tax
      ;decrement counter
      dey
      bne wave_begin
      
   ;///////////modifying last line, the leader
   lda 1, X
   cmp 1   ;first value, already modified, however, the gap doesn't matter
   bcc   decr_wave_done   ;if its larger than the scanline after, decrease it
   beq resolve_conflict ;for 4 and 24 (two cases where the value at scanline 1 is two updates ahead, so it basically "passes the summit" and is on the same "altitude" as the leading scanline)
   incr_wave_done:
      adc #4
      bra done_wave_done
   decr_wave_done:   
      sbc #4
   done_wave_done:
   sta 1, X
   ;...k done
   pld
   plp
   rts
   ;;;;;;
   resolve_conflict:
      cmp #4
      beq decr_wave_done
      bra incr_wave_done

I've gone over it a few times, and everything checks out for me logically. I'm using proper direct paging, indexed; I'm not omitting pounds anywhere (which 93143 warned about); the code seems correct. So, since I call both this subroutine as well as my HDMA sub every frame, why isn't the background updating? I'd assume the offset for every scanline would change, so....so would the background.
Re: Direct page issues...
by on (#152116)
Hmm...

- The clc at the start of the loop is wasted. cmp determines the carry flag, so its state beforehand is irrelevant.
- Looks like you're using bcs when you should be using bpl, and that's probably your problem. Branch if PLus (bpl) and Branch if MInus (bmi) are for straight-up greater than/less than comparisons. bcs and bcc are used much less often for telling if you've over- or under-flowed a value.
- You do this:
Code:
      ;increment offset
      clc
      txa
      adc #3
      tax

...when you could do this:
Code:
      ;increment offset
      inx
      inx
      inx

And I'd suggest in general, as a good practise: Put clc immediately before adc (and sec immediately before sbc) every time. In my programming so far, I've never used a clc or sec for anything other than preparing for adc/sbc; they're basically one combo-instruction to me. In this case txa doesn't affect the carry, but that way you can never mess it up if you someday slip a cmp in between or something (it happened to me once).
Re: Direct page issues...
by on (#152118)
Khaz wrote:
In my programming so far, I've never used a clc or sec for anything other than preparing for adc/sbc

In decompression code, I often find myself doing sec before rol a, rol ciBits, ror a, or ror ciBits. This is related to use of the remaining bits in a byte as a ring counter. There's an example of this in the PB53 decoder in the Action 53 menu and RHDE.

I also use carry as an exception indicator to signal to the caller whether the arguments are out of bounds, such as trying to read off the sides of the currently loaded portion of the collision map or trying to add an entry to an array that's already full. You see a few examples of this in Thwaite. And sometimes I use C to signal more directly whether a collision occurred.

Quote:
In this case txa doesn't affect the carry, but that way you can never mess it up if you someday slip a cmp in between or something (it happened to me once).

Often if I rely on a previous instruction's side effect on carry, I'll put in a commented-out clc or sec and say which instruction above is expected to have clear or set the carry.
Re: Direct page issues...
by on (#152164)
Khaz wrote:
Hmm...

- The clc at the start of the loop is wasted. cmp determines the carry flag, so its state beforehand is irrelevant.
- Looks like you're using bcs when you should be using bpl, and that's probably your problem. Branch if PLus (bpl) and Branch if MInus (bmi) are for straight-up greater than/less than comparisons. bcs and bcc are used much less often for telling if you've over- or under-flowed a value.
- You do this:
Code:
      ;increment offset
      clc
      txa
      adc #3
      tax

...when you could do this:
Code:
      ;increment offset
      inx
      inx
      inx

And I'd suggest in general, as a good practise: Put clc immediately before adc (and sec immediately before sbc) every time. In my programming so far, I've never used a clc or sec for anything other than preparing for adc/sbc; they're basically one combo-instruction to me. In this case txa doesn't affect the carry, but that way you can never mess it up if you someday slip a cmp in between or something (it happened to me once).


Changing bcc/bcs to bpl/bmi, now my instructions actually do SOMETHING, lol, thanks! Man it's confusing, cause I know at the end of the day all these branches just look at the flags. If you look at page 131 of the official 65816 manual, Table 9.2 lists the respective pairing of the branch instruction equivalent, just using standard math notation (<, >=...), and there it directs to use bcs, as a synonym of bge (greater or equal). Hence...my confusion
But yeah you're right on the wasted clc, idk why the hell I put it there lol
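
For anyone comparing Table 9.2 with the bpl/bmi suggestion, here is a small Python model of the flags an 8-bit cmp leaves behind. It is a sketch based on the documented flag behaviour (carry set when A >= operand as unsigned values, N taken from bit 7 of the difference), not a trace from hardware, and the function names are made up for illustration.

```python
def cmp_flags(a, m):
    """Model the C, Z and N flags after an 8-bit `cmp` of A against M (0..255)."""
    diff = (a - m) & 0xFF
    return {
        "C": a >= m,              # carry: no borrow, i.e. A >= M unsigned
        "Z": diff == 0,           # zero: A == M
        "N": bool(diff & 0x80),   # negative: bit 7 of the 8-bit difference
    }

def bcs_taken(a, m):
    return cmp_flags(a, m)["C"]

def bpl_taken(a, m):
    return not cmp_flags(a, m)["N"]

# Values in the range used by the wave table.
print(bcs_taken(28, 24), bpl_taken(28, 24))
print(bcs_taken(4, 8), bpl_taken(4, 8))
# Operands far apart.
print(bcs_taken(200, 10), bpl_taken(200, 10))
```

For values in the 0 to 28 range used by the wave table, the two branches make the same decision in this model, so swapping one for the other should not change the result. They can only differ when the operands are 128 or more apart.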

UPDATE: nvm it doesn't work still -_- it just flat out doesn't do anything...at first it was doing something glitchy because yesterday, I made a newer version that uses indirect HDMA. I've made slight modifications like you've suggested
Code:
   ldy #69   ;42 bytes, so 39 bytes, or 13 lines, get to compare. the last line will have to be modified outside the loop
      ;it will observe the value behind it. notice the table itself is x5, so we repeat 13*5 (+4 for all "last lines" that aren't the actual last line)
   wave_begin:
      lda 1, X   ;indexed dp addressing
      cmp 4, X
      bpl decr_wave   ;if its larger than the scanline after, decrease it
   incr_wave:
      .REPT 4
         ina   ;to alleviate confusion with the flags, just increment the variables
      .ENDR
      bra done_wave
   decr_wave:
      ;sec
      .REPT 4
         dea
      .ENDR
   done_wave:
      sta 1, X
      ;increment offset
      inx
      inx
      inx
      ;decrement counter
      dey
      bne wave_begin
      
   ;///////////modifying last line, the leader
   lda 1, X
   cmp 1   ;first value, already modified, however, the gap doesn't matter
   beq resolve_conflict ;for 4 and 24
   bpl   decr_wave_done   ;if its larger than the scanline 1, decrease it
   incr_wave_done:
      .REPT 4
         ina
      .ENDR
      bra done_wave_done
   decr_wave_done:   
      .REPT 4
         dea
      .ENDR
   done_wave_done:
   sta 1, X
   ;...k done
   pld
   plp
   rts
   ;;;;;;
   resolve_conflict:
      cmp #4
      beq decr_wave_done
      bra incr_wave_done

but alas, it still does the same as the previous version - nothing
Re: Direct page issues...
by on (#152201)
Forgive me if this is a dumb question, but can someone explain to me exactly how "REPT" works? I've never used it. Does it just copy the instruction 4 times, or does it set up an actual loop? Which, if it uses X or Y as an index for that loop, that'll break your code, yeah.

I've never used anything like that, I stick to absolutely nothing but literal instructions and macros/subroutines I myself wrote. >.<
Re: Direct page issues...
by on (#152203)
In ca65, a .repeat/.endrepeat block actually copies its body. It's often used to unroll loops or to generate mathematical lookup tables. Some examples follow:
Code:
; this code
  lda #$00
  .repeat 4
    sta $2007
  .endrepeat

; is expanded into this code
  lda #$00
  sta $2007
  sta $2007
  sta $2007
  sta $2007

; and this code
  .repeat 4, I
    lda $0100+I
    sta $2007
  .endrepeat

; is expanded into this code
  lda $0100+0
  sta $2007
  lda $0100+1
  sta $2007
  lda $0100+2
  sta $2007
  lda $0100+3
  sta $2007

; this code
xsquared:
  .repeat 16, I
    .byte I * I
  .endrepeat

; is expanded into code equivalent to this
xsquared:
  .byte 0
  .byte 1
  .byte 4
  .byte 9
  .byte 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225


According to README, WLA-DX operates similarly to ca65, with three differences in syntax: .repeat can also be written .rept, .endr must be used instead of .endrepeat, and .repeat 6, I becomes .repeat 6 index I.
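
The copy-and-substitute behaviour described above can be mimicked outside the assembler. A rough Python sketch follows; expand_repeat is a hypothetical helper that does naive text substitution and is not how either assembler is actually implemented.

```python
def expand_repeat(count, body, index="I"):
    """Copy `body` `count` times, replacing the index symbol with 0, 1, 2, ...

    Naive textual substitution, for illustration only.
    """
    lines = []
    for i in range(count):
        for line in body:
            lines.append(line.replace(index, str(i)))
    return lines

print("\n".join(expand_repeat(4, ["lda $0100+I", "sta $2007"])))

# The xsquared table: the assembler evaluates I * I once per copy.
xsquared = [i * i for i in range(16)]
print(xsquared)
```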
Re: Direct page issues...
by on (#152205)
Hahaha. Oh wow, that last example could have saved me a LOT of time in places. I have literally turned to writing Excel VBA Macros before that produce huge repetitive chunks of code that follow a predictable pattern.

Only other thing I can think of now is that the problem might not be in that chunk of code to begin with. Is your HDMA working? Or are the table values in RAM not being written correctly? If the table values are updating (watch RAM in a hex editor while it's running?), might be helpful to see your HDMA code.

Though, if you're using WLA like me.... As much as it's painful to get used to at first, you might do well to start explicitly defining the size of your operand every chance you get. You say you're using direct page, so if lda 1, X for example is being interpreted as lda $0001, X, then it won't work right. I don't know for sure if WLA will ever do that, but what WLA WILL do for sure is if WHATEVER is defined as "$0001", it will still interpret lda WHATEVER as lda $01, which is wrong.

So what I do is add a .b or .w (byte or word operand, respectively) after every single lda, sta, and, ora, cmp, etc... Any instruction that can have different sizes of operands following it. (like, lda.b 1, x will ensure it is read as lda $01, x) Yes, it's a real pain in the ass and makes your code look ugly... But once you get used to it it comes instinctively, and it forbids WLA from screwing you like that.

I'd suggest generating a "listing file" so you can see how the code assembled and make sure all the instructions came out correctly, but I hear WLA's listing files are broken. So uh... Yeah. That's all I got at the moment.
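
To show why the operand size matters, here is a small Python model of where the two encodings of lda 1, X would read from in native mode: direct page indexed (opcode $B5) versus absolute indexed (opcode $BD). The register values are hypothetical and chosen only for illustration, and the function names are made up.

```python
def dp_indexed(d, operand, x):
    """Effective address of `lda dp,X`: always bank $00, wraps within 64 KiB."""
    return (d + operand + x) & 0xFFFF

def abs_indexed(dbr, operand, x):
    """Effective address of `lda abs,X`: the data bank register supplies the bank."""
    return ((dbr << 16) + operand + x) & 0xFFFFFF

# Hypothetical register values, not taken from the code in this thread.
D, DBR, X = 0x0300, 0x7F, 0x01

print(f"lda.b $01,X   -> ${dp_indexed(D, 0x01, X):06X}")
print(f"lda.w $0001,X -> ${abs_indexed(DBR, 0x0001, X):06X}")
```

The same source line lands in two unrelated places depending on which size the assembler picks, which is the reason for spelling out .b or .w.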
Re: Direct page issues...
by on (#152206)
Khaz wrote:
I have literally turned to writing Excel VBA Macros before that produce huge repetitive chunks of code that follow a predictable pattern.

And I resorted to Python scripts when creating a table of the powers of the twelfth root of two for my NES music engine because I couldn't trust ca65 to handle exponents.

Khaz wrote:
So what I do is add a .b or .w (byte or word operand, respectively) after every single lda, sta, and, ora, cmp, etc... Any instruction that can have different sizes of operands following it.

Likewise, in ca65, you can use z:someaddr for direct page, a:someaddr for absolute, or f:someaddr for absolute long. Unlike WLA-DX's address size syntax, ca65's address size syntax can't be confused with the 68000's use of .b, .w, and .l for data sizes. But with 65816's movable D, lda z:<someaddr will cause recent ca65 to raise a "Suspicious address expression" warning, which I've asked about elsewhere.
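
For illustration, the kind of table-generating script mentioned earlier in this post might look roughly like the following. This is not the actual script; the 8.8 fixed-point scale, the 13-entry length and the .word output format are all assumptions made for the sketch.

```python
SEMITONE = 2 ** (1 / 12)   # frequency ratio between adjacent semitones

# 8.8 fixed-point ratios for one octave plus the octave itself (13 entries).
# The scale factor and entry count are assumptions for this sketch.
ratios = [round(256 * SEMITONE ** n) for n in range(13)]

for value in ratios:
    print(f"  .word {value}")
```

The output can be pasted into a source file or written to an include file, so the assembler never has to evaluate an exponent itself.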
Re: Direct page issues...
by on (#152269)
Khaz wrote:
Hahaha. Oh wow, that last example could have saved me a LOT of time in places. I have literally turned to writing Excel VBA Macros before that produce huge repetitive chunks of code that follow a predictable pattern.

Only other thing I can think of now is that the problem might not be in that chunk of code to begin with. Is your HDMA working? Or are the table values in RAM not being written correctly? If the table values are updating (watch RAM in a hex editor while it's running?), might be helpful to see your HDMA code.

Though, if you're using WLA like me.... As much as it's painful to get used to at first, you might do well to start explicitly defining the size of your operand every chance you get. You say you're using direct page, so if lda 1, X for example is being interpreted as lda $0001, X, then it won't work right. I don't know for sure if WLA will ever do that, but what WLA WILL do for sure is if WHATEVER is defined as "$0001", it will still interpret lda WHATEVER as lda $01, which is wrong.

So what I do is add a .b or .w (byte or word operand, respectively) after every single lda, sta, and, ora, cmp, etc... Any instruction that can have different sizes of operands following it. (like, lda.b 1, x will ensure it is read as lda $01, x) Yes, it's a real pain in the ass and makes your code look ugly... But once you get used to it it comes instinctively, and it forbids WLA from screwing you like that.

I'd suggest generating a "listing file" so you can see how the code assembled and make sure all the instructions came out correctly, but I hear WLA's listing files are broken. So uh... Yeah. That's all I got at the moment.


Yeah, I actually did look into forcing sizes (because I did at one point consider that a possible issue), and a guy actually made a sample program and monitored the values WLA loaded/stored with/without the size specifiers. Usually it seems to work, outside of l/r shifting (sometimes it doesn't read the value properly, so that's that), as well as a few niche scenarios (I think I've read your post from a while ago where you asked about switching from WLA to ca65, and you mentioned absolute indexed addressing being read as dp). But yeah, idk why my garbage isn't running, I'll probably give this code a rest for a few days and work on windowing. Usually it's crystal clear with refreshed eyes.

and as tepples explained above, REPT just designates code for the assembler to copypasta