darryl.revok wrote:
So is it typical to disable rendering during NMI?
I personally don't do it, but I was curious, so I just debugged a handful of games I had lying around, and there doesn't seem to be a consensus. These are the games that did turn rendering off:
Code:
Super Mario Bros.
DuckTales
Street Fighter 2010
Bucky O'Hare
And these are the ones that didn't:
Code:
Alfred Chicken
Balloon Fight
Felix the Cat
Galaxian
The 3-D Battles of WorldRunner
Gimmick!
I intentionally didn't test Battletoads or anything known to use forced blanking, because those obviously have to disable rendering.
Anyway, I guess that disabling rendering can also be seen as a safety measure... If your NMI handler blows the VRAM access budget by accident, there will be no persistent VRAM corruption; the screen will just jump for a frame. I guess that's why most of the games that do disable rendering do it: it minimizes the damage in case something goes wrong.
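Here's a rough sketch of that pattern (the $2001 values are just examples; use whatever settings your game normally renders with):
Code:
nmi:
   pha                ; preserve A (save X/Y too if the handler uses them)
   lda #%00000000
   sta $2001          ; rendering off: a late $2007 write can't corrupt the picture
   ;; ...all VRAM updates go here...
   ;; ...then set the scroll with $2005/$2006...
   lda #%00011110     ; example value: background and sprites enabled
   sta $2001          ; rendering back on for the next frame
   pla
   rti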
Quote:
I was looking this up a little bit, and it seems like not disabling rendering is wasting some cycles.
Yeah, you can buy yourself a few more cycles of VRAM access during the pre-render scanline that way. I think I can get by without using those cycles, because there's some cleaning up I have to do in my NMI handler before setting the scroll anyway, and the pre-render scanline is a good time to do it.
Quote:
In my quest for more cycles, I removed all of the reads from $2002. This made my IRQs a little jittery, so I added in one single read from $2002 at the beginning of the NMI, instead of before each nametable write, and that fixed the issue.
Most of the time I don't read $2002 at all, and I've never had any problems with that. I'm extra careful to always perform $2005/$2006 writes in pairs, though.
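If you do keep one read, the top of the NMI is the right place for it. A minimal sketch (the address is just an example):
Code:
nmi:
   bit $2002          ; one read resets the first/second write toggle
   lda #$23
   sta $2006          ; high byte first...
   lda #$C0
   sta $2006          ; ...then low byte, always as a pair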
Quote:
1. Okay, I understand that $2002 is read so that you can say for sure that $2006 is expecting a high byte, but I'm wondering why it ever wouldn't be. Every time that I've had to write to $2006 thus far, I've used two bytes. Is there ever a time that something will be done intentionally that causes a mismatch in address order, or is this just an error-proofing method, in case NMI hits at an inopportune time, or something?
If $2006 ever isn't expecting the high byte, it must be a bug in your code. This flag doesn't change unless you read $2002 or write to $2005/$2006. I just noticed you didn't mention $2005... was that just an omission, or did you not know that these registers share the even/odd write flag?
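Here's a contrived example of how the shared flag can bite (not something you'd write on purpose):
Code:
   lda #$00
   sta $2005          ; 1st write: the toggle now expects a 2nd byte
   lda #$20
   sta $2006          ; taken as the 2ND $2006 byte (the low one!)
   ;; the PPU address ends up wrong; a $2002 read beforehand would
   ;; have reset the toggle so $2006 expected the high byte again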
Quote:
2. Is there any particular reason why $2006 seems to be the only part of the system that's expecting the high byte first?
Maybe the engineers who designed the PPU thought programmers would like to write addresses in the order humans read them. I don't think there's any technical reason for this; the designers simply had to pick a byte to go first, and they decided on the high byte.
Quote:
Okay, this one isn't related to the high/low latch, but something else I've been wondering. How long does setting the PPU to the +32 increment stay in effect?
AFAIK, until you change that. I don't think there's anything automatic touching that setting.
Quote:
Let's say I'm writing a column of tiles. I write #%00000100 to $2000 for +32 mode. Now if I want to write attributes next, do I have to write #$00 to $2000, or will it default to +1 on the next write?
It will definitely not default back; you have to change it yourself. If you're writing attributes for columns, though, increments of 32 bytes can still be useful: since each row of attributes is 8 bytes long, 32 bytes is equivalent to 4 rows, so you can update a full column by setting the address of the 1st byte and writing the 1st and 5th bytes, then setting the address of the 2nd byte and writing the 2nd and 6th bytes, and so on. This way you can write the 8 bytes of a column while setting the address only 4 times, instead of 8.
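Something like this sketch, where attr_buf (the column's 8 attribute bytes, top to bottom) and column (0-7) are made-up names:
Code:
   bit $2002          ; put the write toggle in a known state
   lda #%10000100     ; example $2000 value: NMI on, +32 increment
   sta $2000
   ldy #0
attr_loop:
   lda #$23
   sta $2006          ; the attribute table starts at $23C0
   tya
   asl
   asl
   asl                ; Y * 8 = offset of attribute row Y
   ora #$C0
   ora column
   sta $2006          ; $C0 + row * 8 + column
   lda attr_buf,y
   sta $2007          ; row Y
   lda attr_buf+4,y
   sta $2007          ; row Y+4 (the address just advanced by 32)
   iny
   cpy #4
   bne attr_loop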
Quote:
I'm looking for any options available to save cycles.
If you have any loops at all, you should really look into unrolling them. Even partially unrolling can have incredible results. For example, if you have a loop that counts each byte that's being copied, that's one decrement instruction + a branch for each byte (a total of 5 cycles), which is a lot of overhead for a single byte. If you're copying 20 bytes, that's 100 cycles you're losing, while only 160 cycles are actually being spent copying bytes (assuming 8 cycles per byte). If you partially unroll that loop and count pairs of bytes instead, copying 2 bytes per iteration, you'll be cutting back that overhead by half! The more you unroll, the less overhead you'll have.
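As a sketch, here's a 20-byte copy to $2007 before and after a 2x unroll ("buffer" is a made-up label; this version walks the buffer backwards, so the data has to be stored last byte first, or count up instead if the order matters). Stepping the counter by 2 costs one extra DEX, so this exact version saves a bit less than half, and the overhead keeps shrinking the further you unroll:
Code:
   ;; rolled: DEX + BNE = 5 cycles of overhead per byte
   ldx #20
copy_byte:
   lda buffer-1,x     ; 4 cycles
   sta $2007          ; 4 cycles
   dex                ; 2 cycles
   bne copy_byte      ; 3 cycles

   ;; unrolled x2: 7 cycles of overhead per 2 bytes (3.5 per byte)
   ldx #20
copy_pair:
   lda buffer-1,x
   sta $2007
   lda buffer-2,x
   sta $2007
   dex
   dex
   bne copy_pair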