ppu loading rom or using ram

by mattmatteh on 2005-07-15 (#2825)

i have been reading the tech docs and they say that the nes has 16 kB of internal ram. they also say that the cart might have vrom and that it is loaded into the ppu memory at reset.

if the nes has 16 kB ppu ram, does that include the mirroring ?

if a cart has chr-rom, how is that loaded into the ppu memory ? is it copied, mapped to the cart and the internal ram is not used ?

i am working on an nes emulator and stumped on how to code ppu memory and load the game to it.

matt

by Disch on 2005-07-15 (#2827)

The PPU has 16k of addressing space. This does NOT translate to 16k of RAM. AFAIK the only memory that actually exists on the PPU are the palettes, sprite RAM, and the two nametables... which totals under 3k (nowhere near 16k).

CHR-ROM and CHR-RAM both exist on the cartridge, so when the PPU needs tiles it takes them from the cart -- regardless of whether the game uses CHR-ROM or RAM.

As for implementing PPU space in an emulator, you'll probably end up checking the PPU address on reads/writes to $2007 and accessing different memory depending on the address. Here's some pseudo-code which might help clarify:

Code:
case 0x2007:
{
    Uint16 adr = ppu_addr & 0x3FFF; /* PPU space is mirrored every $4000 bytes */

    if (adr < 0x2000)
    {
        /* game is trying to write to CHR -- write to the cart's CHR space
           (only if it is CHR-RAM -- can't write to ROM obviously) */
    }
    else if (adr < 0x3F00)
    {
        /* Nametable space */
        adr &= 0x0FFF; /* $3xxx page mirrors the $2xxx page. Only low 12 bits important */

        /* write the value through the nametable pointers **see below!!!** */
        pNameTables[adr >> 10][adr & 0x03FF] = value_to_write;
    }
    else
    {
        /* The rest is palette space */
        adr &= 0x1F; /* only low 5 bits important -- it's mirrored after that */
        if (!(adr & 0x03)) adr &= 0x0F; /* 3F10 -> 3F00, 3F14 -> 3F04, etc */

        Palette[adr] = value_to_write & 0x3F;
    }
    break;
}

my 'pNameTables' var above would be declared like this:

Uint8* pNameTables[4];

This is not the actual Nametable space, but pointers which point to nametables. This way when the game switches mirroring modes, all you have to do to emulate that is change which nametables pNameTables points to. In your emulator, you should make another array which actually contains the nametable data. It should be at least 2k in size (although you might want to just go with 4k so that 4-screen mirroring can be emulated easily). However -- your emulator shouldn't access that array directly on $2007 reads/writes or when rendering -- instead it should always go through the pNameTables pointers so that the current mirroring mode is applied.
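A minimal sketch of that pointer-table idea, assuming a 4k backing array. Only `pNameTables` comes from the post above; `nt_ram`, `set_mirroring`, `nt_write`, `nt_read` and the `MIRROR_*` names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical names throughout; only pNameTables comes from the post above. */
enum mirroring { MIRROR_HORIZONTAL, MIRROR_VERTICAL, MIRROR_FOUR_SCREEN };

static uint8_t  nt_ram[4][0x400];  /* backing store: 4k so four-screen fits */
static uint8_t *pNameTables[4];    /* what $2007 and the renderer go through */

/* Repoint the four logical nametables ($2000, $2400, $2800, $2C00). */
static void set_mirroring(enum mirroring mode)
{
    /* physical 1k page used by each logical nametable, per mode */
    static const int map[3][4] = {
        { 0, 0, 1, 1 },  /* horizontal:  $2000=$2400, $2800=$2C00 */
        { 0, 1, 0, 1 },  /* vertical:    $2000=$2800, $2400=$2C00 */
        { 0, 1, 2, 3 },  /* four-screen: no mirroring */
    };
    for (int i = 0; i < 4; i++)
        pNameTables[i] = nt_ram[map[mode][i]];
}

/* Nametable access as in the $2007 handler; adr is in $2000-$3EFF. */
static void nt_write(uint16_t adr, uint8_t value)
{
    adr &= 0x0FFF;
    pNameTables[adr >> 10][adr & 0x03FF] = value;
}

static uint8_t nt_read(uint16_t adr)
{
    adr &= 0x0FFF;
    return pNameTables[adr >> 10][adr & 0x03FF];
}
```

A mapper write that changes mirroring then only needs to call `set_mirroring`; the nametable data itself never moves.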

by mattmatteh on 2005-07-18 (#2865)

ok thanks, i got that.

i looked at the sta save format and it has in it 16K or 4000h of the ppu ram. i assume it's the current page of rom that is mapped?

how does the sta format save the page mapping from the mappers ?

thanks
matt

by Zepper on 2005-07-18 (#2869)

Well, the "STA" format reminds me of the NESticle save state. Yes, it's a 4000h bank as described below:

0000-1FFF Patterns (CHR-RAM / graphics)
2000-3EFF Nametables (screen tilegrids)
3F00-3F1F Palette RAM
3F20-3FFF Palette RAM mirroring (always & 1Fh)

That's it, buddy. Now, don't ask me about the STA format. ^_^;;

by mattmatteh on 2005-07-18 (#2870)

are there any other save formats that are common ?

by Disch on 2005-07-18 (#2871)

NESticle is hardly common anymore. Or at least it shouldn't be (it's very poor and outdated).

FCEUltra would probably be a good emu to consider adding savestate support for.

by Quietust on 2005-07-18 (#2872)

Once upon a time, there was a savestate format named "SNSS" designed to be compatible with multiple emulators. Unfortunately, it contained some truly braindead design flaws - using BIG ENDIAN to store multibyte values (considering that most systems, including the NES itself, use LITTLE ENDIAN), storing nametable mirroring as part of the PPU state (which is technically part of the MAPPER state), storing PRG/CHR bank numbers (rather than allowing the mapper state to encode them), and then exactly 128 bytes of mapper-specific data (for some mappers, this is far more than needed, while for others it is insufficient).
It was only used in a few emulators (the only ones I can name are NESten 0.6x and Nintendulator 0.900) before it was abandoned in favor of custom savestate formats (NESten never saw a new release, but Nintendulator eventually evolved to use its own format).
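To make the endianness point concrete, here is a small sketch of byte-order-independent little-endian helpers a savestate writer could use. The function names are hypothetical; nothing here is taken from SNSS or any emulator's source.

```c
#include <stdint.h>

/* Store values least-significant byte first, whatever the host byte order. */
static void put_le16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v & 0xFF);
    dst[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xFF);
    dst[1] = (uint8_t)((v >> 8) & 0xFF);
    dst[2] = (uint8_t)((v >> 16) & 0xFF);
    dst[3] = (uint8_t)(v >> 24);
}

static uint16_t get_le16(const uint8_t *src)
{
    return (uint16_t)(src[0] | (src[1] << 8));
}

static uint32_t get_le32(const uint8_t *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

Writing fields one at a time like this also avoids depending on struct padding, which differs between compilers.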

by Disch on 2005-07-18 (#2873)

The nester family of emus also use SNSS, iirc (Nester, NesterJ, NNNesterJ, possibly NesterDC and other nester ports). I considered using it myself, but I was rather dissatisfied with its lack of attention to some areas (mainly timing and APU stuff)... along with the reasons you listed.

Actually, perhaps we should collectively come up with a more modern version of SNSS that isn't weird. It's very likely we'll be able to get Xodnizel on board in supporting it in Nintencer, perhaps someone could even add it to FCEU, and maybe even Q could have Nintendulator come on board.

It would really have to cover every detail, though. Right down to CPU/PPU timestamps, detailed DMC info (so stolen cycles will be accounted for properly), NTSC/PAL mode toggle, APU frame sequencer stuff.... all of it. Savestates should be a genuine snapshot of the current state of emulation -- not the bare minimum necessary to pick up where you left off. I believe that was the biggest flaw with SNSS, and why most people opted for their own, more exact savestate formats.

SNSS.
by baisoku on 2005-07-18 (#2875)

I'm certainly open to revise and/or scrap SNSS. I attempted to drum up interest at several points in time, but never had any luck. It never was fully complete, and i can't say that i ever liked some of the design decisions in the actual code.

I can offer to put together a more 'sane' proposal, if someone who has familiarity with some of the more exotic/exquisite boards would offer to spend some time working on it, or at least give it a good thorough design and code review.

Of historical value: http://snss.baisoku.org

Oh yeah, it would also be essential to take blargg's "sands of time" stuff into consideration when working on this.

by Disch on 2005-07-18 (#2876)

Some of the things I wanted/needed for my emu that SNSS didn't cover:

- bit to signal NTSC/PAL emulation. If the state was saved when emulating PAL, it doesn't make much sense to load it into NTSC. And vice versa

- bit to signal the odd frame when emulating NTSC. Every odd frame on NTSC is one PPU cycle shorter.

- Joypad strobe state. Was the last write to $4016 0? or 1?

- $2005/2006 write toggle state

- PPU temporary address (Loopy_T) (how was this left out of SNSS?)

- CPU/APU/PPU timestamps. Rounding cycles off to the last frame is dropping data, which could potentially (although granted, unlikely) desync when doing movies or something. No data should need to be lost when saving a state -- all timestamps should be saved.

- APU Frame sequencer stuff. SNSS didn't cover $4017 writes at all. Things like:

-- Frame IRQs enabled?
-- Frame IRQ currently pending?
-- 5 or 4 step sequence?
-- which step are we on?
-- how many cycles until the next step?

- More detailed DMC operations:

-- How many cycles until the next bit in the DMC output unit is shifted out? (this affects stolen cycles and IRQ timings)
-- How many bits are left to play in the DMC output unit? (for same reasons as above)
-- is the DMC sample buffer empty or full?
-- If full, what is it filled with?
-- how many bytes are left in the currently playing DMC sample?

- $2002 status (although i suppose this will always be 0 if you do the savestate right before rendering -- however see my notes at the bottom)

- Fine horizontal scroll value

- Background color (color to render when PPU is switched off)

- contents of the $2007 read buffer

- As has already been mentioned, 128 bytes for mapper info is no good. Why not make it variable?

- Perhaps some more detailed APU stuff. recording the last value written to $4015 doesn't matter much at all. What should be done instead of that is tracking each channel's length counter and the tri's linear counter. Decay and sweep stuff might also be recorded, however the worst that can happen if they're not is some minor audio distortion under very specific circumstances -- nothing that could affect the flow of emulation at all.

As I briefly touched on with my $2002 note, it needs to be clearer when in the frame the state was saved. Personally, I've adopted BT's scanline counting method where you have VBlank at the start of the frame rather than at the end. If that method is used, recording $2002 status is necessary since it's not reset until the end of vblank. However either place would work.

As for timestamps, they should be relative to the number of cycles that have spilled into this frame. It's rather unlikely that the last executed instruction ended exactly at the start of the scanline. So the CPU/PPU/APU timestamps would be relative to the number of cycles each area has run past the designated start of the frame (whether it be at the start of VBlank or at the start of rendering).
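As a rough illustration of the frame sequencer and DMC lists above, the fields could be collected like this. The struct and field names are hypothetical and the field widths are guesses. The `$4017` decoder is deliberately simplified: it only handles the mode and IRQ-inhibit bits and ignores any timing side effects of the write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one field per question in the frame sequencer list. */
struct frame_seq_state {
    bool     irq_enabled;     /* frame IRQs enabled? */
    bool     irq_pending;     /* frame IRQ currently pending? */
    bool     five_step;       /* 5-step (true) or 4-step (false) sequence */
    uint8_t  step;            /* which step are we on */
    uint16_t cycles_to_next;  /* CPU cycles until the next step */
};

/* Likewise, one field per question in the DMC list. */
struct dmc_state {
    uint16_t cycles_to_next_bit;  /* until the output unit shifts again */
    uint8_t  bits_remaining;      /* bits left in the output unit */
    bool     buffer_full;         /* is the sample buffer full? */
    uint8_t  buffer;              /* its contents, if full */
    uint16_t bytes_remaining;     /* bytes left in the playing sample */
};

/* Decode a $4017 write: bit 7 selects the 5-step sequence, bit 6 inhibits
   the frame IRQ and clears a pending one. Simplified sketch only. */
static void frame_seq_write_4017(struct frame_seq_state *s, uint8_t value)
{
    s->five_step   = (value & 0x80) != 0;
    s->irq_enabled = (value & 0x40) == 0;
    if (!s->irq_enabled)
        s->irq_pending = false;
}
```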

by Quietust on 2005-07-18 (#2880)

No "timestamps" are necessary for the CPU/PPU/APU state. The APU just needs the various counters on each sound channel (and the frame timer) and the PPU needs the scanline and cycle numbers. The only sort of timestamp that would be useful in a savestate would be the number of frames elapsed since reset (for movie recording purposes); if you're emulating the system correctly, then your CPU, PPU, and APU should be completely synchronized with each other before you start writing the savestate data.

However, to keep things relatively simple, I would recommend only saving the state during the 'dead' scanline between the end of the frame and NMI:

-1 - 'Garbage' scanline, required to prefill the background render pipeline
0-239 - Visible screen, 240 scanlines
240 - PPU inactive, perfect place to save state
241-[260/310] - VBLANK (NMI generated at the very beginning of scanline 241)

There are several reasons to do this:
1. Most games will read the controllers during VBLANK (and it's best to save state BEFORE they do that, rather than after)
2. If you always save during Scanline 240 (preferably near the beginning), you'll never have to worry about pending NMIs (and IRQs won't be as big of a problem)
3. If you save during rendering, you'll have to save the [partial] frame buffer as well as all of the PPU's internal rendering-related buffers - if you always save during VBLANK (or during the dead scanline before VBLANK), the states of these are totally irrelevant (except maybe for the image on the screen, whether you restore the full image or simply store a thumbnail for preview purposes when selecting states)
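The scanline numbering above can be turned into a small check for when a state may be saved. This is only a sketch; the enum and function names are hypothetical, and the last VBLANK line (260 or 310) is taken straight from the table.

```c
#include <stdbool.h>

enum region { REGION_NTSC, REGION_PAL };

enum phase {
    PHASE_PRERENDER,  /* scanline -1: 'garbage' line, prefills the pipeline */
    PHASE_VISIBLE,    /* scanlines 0-239 */
    PHASE_IDLE,       /* scanline 240: PPU inactive */
    PHASE_VBLANK,     /* scanlines 241 to 260 (NTSC) or 310 (PAL) */
    PHASE_INVALID
};

/* Last VBLANK scanline in the numbering above. */
static int last_scanline(enum region r)
{
    return r == REGION_PAL ? 310 : 260;
}

static enum phase scanline_phase(int scanline, enum region r)
{
    if (scanline == -1)                      return PHASE_PRERENDER;
    if (scanline >= 0 && scanline <= 239)    return PHASE_VISIBLE;
    if (scanline == 240)                     return PHASE_IDLE;
    if (scanline >= 241 && scanline <= last_scanline(r))
                                             return PHASE_VBLANK;
    return PHASE_INVALID;
}

/* Only allow saving on the dead scanline between rendering and NMI. */
static bool can_save_state(int scanline, enum region r)
{
    return scanline_phase(scanline, r) == PHASE_IDLE;
}
```

An emulator that gets a "save state" request at any other time could simply set a flag and run until `can_save_state` becomes true.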

by Disch on 2005-07-18 (#2881)

It seems that you'd need at least a one byte value for the CPU timestamp, since most of the time the scanline starts when the CPU is halfway through an instruction (and most emus run an instruction at a time, rather than a CPU cycle at a time). Although... I'm not sure of a 'friendly' way to handle it. You couldn't really go by CPU cycles, since the start of the scanline might even be mid-cycle (since there are 3 [ntsc] or 3.2 [pal] ppu cycles per CPU cycle). The way I'm doing it currently is multiplying NTSC CPU cycles by 15 and PAL CPU cycles by 16 (that timestamp / 5 would be the PPU cycle) -- but that probably isn't very compatible with other emus.
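The arithmetic behind that scheme, as a sketch: one timestamp tick is 1/5 of a PPU cycle, so an NTSC CPU cycle is 15 ticks (3 PPU cycles) and a PAL CPU cycle is 16 ticks (3.2 PPU cycles). The function names are hypothetical.

```c
#include <stdint.h>

enum region { REGION_NTSC, REGION_PAL };

/* One tick is 1/5 of a PPU cycle. */
#define TICKS_PER_PPU_CYCLE 5u

static uint32_t ticks_per_cpu_cycle(enum region r)
{
    return r == REGION_PAL ? 16u : 15u;  /* 3.2 or 3 PPU cycles */
}

static uint32_t cpu_cycles_to_ticks(uint32_t cpu_cycles, enum region r)
{
    return cpu_cycles * ticks_per_cpu_cycle(r);
}

/* Whole PPU cycles covered by a timestamp. */
static uint32_t ticks_to_ppu_cycles(uint32_t ticks)
{
    return ticks / TICKS_PER_PPU_CYCLE;
}

/* Leftover fifths of a PPU cycle (always 0 on NTSC). */
static uint32_t ticks_ppu_remainder(uint32_t ticks)
{
    return ticks % TICKS_PER_PPU_CYCLE;
}
```

For example, 7 CPU cycles are 105 ticks (21 PPU cycles exactly) on NTSC, but 112 ticks (22 PPU cycles plus 2/5) on PAL.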

an APU timestamp wouldn't be necessary, you're right. Or at least, it could have the same timestamp as the CPU (since there's no reason why you wouldn't be able to sync up the APU to the exact CPU cycle).

A PPU timestamp of sorts wouldn't be needed at all if you save on an inactive scanline, since as you pointed out it doesn't matter exactly where the PPU is on that line, because it's inactive until way far into scanline -1 anyway.

So yeah, I agree totally. The savestate shouldn't be at the start of scanline 0, or anywhere during. Start of VBlank is also not ideal because of the issue of pending NMIs. The dead scanline after rendering does seem like a good place for it.

by Quietust on 2005-07-18 (#2882)

I don't save states at the exact beginning of scanline 240 - I wait until the current CPU instruction is finished, then I save, storing both the scanline number (always 240, though it may change later) AND the PPU cycle number within that scanline (0-340, usually below 20 or so).

by Disch on 2005-07-18 (#2883)

That's the kind of CPU timestamp I'm talking about -- how far into the scanline the state is saved. Doing it in PPU cycles would work for NTSC, but for PAL it seems sort of problematic, since the end of a CPU cycle might not land exactly on a PPU cycle (3.2 ppu cycles to 1 CPU cycle, unless i'm mistaken). I know realistically, being off by a fraction of a PPU cycle won't cause any emulation problems, but I have this nagging desire of not wanting to lose anything when save/loading a state.

by Quietust on 2005-07-18 (#2884)

Disch wrote:
That's the kind of CPU timestamp I'm talking about -- how far into the scanline the state is saved. Doing it in PPU cycles would work for NTSC, but for PAL it seems sort of problematic, since the end of a CPU cycle might not land exactly on a PPU cycle (3.2 ppu cycles to 1 CPU cycle, unless i'm mistaken). I know realistically, being off by a fraction of a PPU cycle won't cause any emulation problems, but I have this nagging desire of not wanting to lose anything when save/loading a state.

I just realized that case as well - when that's the case, then just store it in the PPU state as sub-cycles.

I also realized that my emu doesn't properly cover that case, so I decided to store it in the upper 4 bits of the 16-bit 'PPU cycles' (0-340) value (with the added bonus that it retains 100% backwards compatibility).
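A sketch of how a sub-cycle count could share a 16-bit field with the PPU cycle number. The cycle (0-340) only needs 9 bits, so the upper 4 bits are free, and an old state with those bits clear reads back as sub-cycle 0. The bit layout, the function names, and the unit of the sub-cycle value are assumptions here, not Nintendulator's actual format.

```c
#include <stdint.h>

/* Assumed layout: bits 12-15 = sub-cycle, bits 0-11 = PPU cycle (0-340). */
static uint16_t pack_ppu_cycle(uint16_t cycle, uint8_t sub)
{
    return (uint16_t)(((sub & 0x0F) << 12) | (cycle & 0x0FFF));
}

static uint16_t unpack_cycle(uint16_t packed)
{
    return packed & 0x0FFF;
}

static uint8_t unpack_sub(uint16_t packed)
{
    return (uint8_t)(packed >> 12);
}
```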

by Disch on 2005-07-18 (#2885)

PPU cycles * 5 is what my emu does, that way both NTSC and PAL CPU cycles will always land on an even cycle. But like I said I don't know if that's "friendly" enough.

Other things worth noting:

- little endian. The human readability factor is kind of a lame reason to go with big endian.. especially considering that there's no padding so the file is already going to be kind of a mess to view in a hex editor. Plus binary files don't need to be human readable. Little endian is the logical way to go for NES savestates.

- Saving 4 nametables seems like kind of a waste since 99.99% of the games out there only have 2. Games that have more can have the extra nametables in the mapper section -- or should at least make the number of nametables in the state variable.

- As Q initially mentioned, specifying the swapped in PRG/CHR banks might not be the way to go. It makes more sense and would be safer to have swap stuff in the mapper section (the contents of mapper registers should suffice -- and they'll probably have to be saved anyway for other reasons like wram mapping, mirroring, and other stuff).

- I don't see the point for a byte specifying whether or not SRAM is writable -- that can be determined from the mapper registers. Plus it doesn't help in determining if SRAM is readable -- or even if it's swapped in. And it creates conflicts with mappers like FME-07 which put PRG at $6000-7FFF

- Mapper section needs a serious overhaul. Every mapper (save NROM/mapper 0) should save something. The contents of all mapper registers should be saved, along with other things needed (IRQ counters, reload values, the MMC1 shift register and relating info, etc)

- Nothing that can affect emulation should have to be assumed when you load a savestate (other than when in the frame the state has occurred). The savestate should dictate everything needed to play back exactly as it would have had emulation resumed from the time the state was saved.

I don't know what precautions could be made to work with the Sands-of-time effect. I don't see what kind of special info you'd need in a savestate for that.

I'm really digging this idea. I'd like to start up preliminary format specs for this tomorrow (unless of course Baisoku will do it) -- although I'm skeptical as to how many emus will actually support it. Not very many of the biggies seem to be in active development -- our best bets (or our only bets) seem to be Nintendulator and Nintencer (I think Xod would be willing to support it, especially considering Nintencer is still pretty young). Maybe FCEU as well since all these new FCEUXD builds are coming out (and I just chatted with bbitmaster, he says he might be willing to come on board with the next FCEUXD release). Though, still... it may not be worth the work to develop this format if only three major emus support it. What do you guys think?

by tepples on 2005-07-18 (#2887)

Disch wrote:
- Saving 4 nametables seems like kind of a waste since 99.99% of the games out there only have 2. Games that have more can have the extra nametables in the mapper section -- or should at least make the number of nametables in the state variable.

If nametable VRAM contents go in the mapper section, then so do pattern table VRAM contents and CPU$6000 SRAM contents.

Quote:
- Nothing that can affect emulation should have to be assumed when you load a savestate (other than when in the frame the state has occurred).

And even that can be simplified by specifying that all save states take effect at the moment of vblank.

Quote:
I don't know what precautions could be made to work with the Sands-of-time effect. I don't see what kind of special info you'd need in a savestate for that.

You'd need a format similar to that of a movie, storing the last few seconds of gameplay. The emulator would expand it to whatever kind of cached state it needs on a Load State command. But it would still be nice to have keyframes, especially if someone wants to make an emulator that acts as a VFW or DirectShow decoder.

Quote:
Maybe FCEU as well since all these new FCEUXD builds are coming out (and I just chatted with bbitmaster, he says he might be willing to come on board with the next FCEUXD release). Though, still... it may not be worth the work to develop this format if only three major emus support it. What do you guys think?

We'd need to standardize the movie format as well, right? If save states and movies are portable, then it would become possible to compare the pixel-for-pixel and sample-for-sample output of two emulators (modulo phase differences in the audio, which we can chalk up to different TVs).

Eventually, one goal I can see is to have the emulator itself as a module (e.g. a .dll or .so file) that takes a UNIF file, an initial state, and streams from the input devices, and produces video and audio streams as output. This refactoring, analogous to what has happened in PS1 emulators, would let one use, say, the FCEUXD frontend with the Nintendulator backend.

by Disch on 2005-07-18 (#2889)

tepples wrote:
If nametable VRAM contents go in the mapper section, then so do pattern table VRAM contents and CPU$6000 SRAM contents.

Well perhaps they should all be in their own respective blocks.

PPU block (has the 2 native nametables)
Cartridge RAM block ($6000 area)
Cartridge Nametable block (for games with 4-screen mirroring -- and possibly for MMC5's ExRAM)
CHR RAM block

This way, the rare occurrence of 3+ nametables is accounted for, but doesn't unnecessarily bloat states which don't need it. Plus the far more common cartridge RAM can still be a variable size without having to keep extra nametables restricted to a fixed size. Plus CHR RAM can also have a variable size if needed (i'm sure there's some game out there somewhere that has more than 8k CHR-RAM)
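One way to picture the block split: the writer decides per cartridge which blocks to emit, so a plain CHR-ROM game with no extra RAM only gets the PPU block. This is a sketch with hypothetical names (`cart_info`, `BLOCK_*`, `blocks_needed`), not a proposed format.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of what the cartridge carries. */
struct cart_info {
    size_t prg_ram_size;  /* RAM at $6000, 0 if none */
    size_t chr_ram_size;  /* CHR-RAM, 0 if the cart uses CHR-ROM */
    bool   four_screen;   /* cart supplies extra nametable RAM */
};

enum {
    BLOCK_PPU      = 1 << 0,  /* always: includes the 2 native nametables */
    BLOCK_CART_RAM = 1 << 1,  /* $6000 area, variable size */
    BLOCK_CART_NT  = 1 << 2,  /* extra nametables on the cart */
    BLOCK_CHR_RAM  = 1 << 3   /* variable size */
};

/* Which blocks a savestate for this cart should contain. */
static unsigned blocks_needed(const struct cart_info *c)
{
    unsigned mask = BLOCK_PPU;
    if (c->prg_ram_size) mask |= BLOCK_CART_RAM;
    if (c->four_screen)  mask |= BLOCK_CART_NT;
    if (c->chr_ram_size) mask |= BLOCK_CHR_RAM;
    return mask;
}
```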

Quote:
And even that can be simplified by specifying that all save states take effect at the moment of vblank.

I think Quietust's idea is best -- the start of the dead scanline right after rendering, but before VBlank. That way you don't have to worry about pending NMIs, and you also avoid the issues of being so close to rendering time. But yeah, that's one thing that I think should be assumed on state load (since it's just too much work for no gain to have the savestate allowed to be saved anywhere in the frame)

EDIT -- although I just thought of a problem. If the game writes to $4014 just before the start of that dead scanline, that would be a major problem since the CPU would be stalled through the whole scanline and into VBlank. Granted this is a rare occurrence, but it might happen in games like Castelian which shut the screen off early to squeeze in more drawing time.

How about putting it a bit into VBlank? Far enough in to stay away from the triggered NMI, but near enough to the start so that there's no problem on $4014 stalls. Maybe something like 1 scanline into VBlank. Although I don't much fancy the idea of splitting VBlank... it might be the best option. Or maybe it would be simpler to just do it at the start of VBlank and just deal with NMIs. What do you guys think?

Quote:
You'd need a format similar to that of a movie, storing the last few seconds of gameplay.

So the state is actually somewhat of a short movie file? I'm not entirely sure I'd agree with this idea, since it'd considerably slow down state loading -- not to mention it would force all emus to have a movie player... making it much harder to support the format.

Besides... would this even be necessary for the rewind feature? I mean its absence would only mean you can't rewind immediately after loading a state.

Perhaps we could document it as an optional block for emus which support the feature. It by no means should be a required block, imo. But then of course that leads to the side effect of savestates restoring to different times in different emus.

Quote:
We'd need to standardize the movie format as well, right?

That's a very good idea, which I actually hadn't thought of. I'm not really as interested in the movie format -- at least not yet. Though it is definitely something we should keep in mind when making the savestate format, since they're undoubtedly connected.

One thing about a movie format I will say -- I think it's better to drive movies by joypad strobes rather than by frames (since it would be theoretically possible to strobe a joypad twice in one frame). The games have to strobe the joypad to fetch keypresses anyway -- so that seems like the logical time to fetch them from the movie file as well (as well as record them)
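A sketch of strobe-driven playback for a standard joypad: the next movie byte is fetched on the 1 -> 0 transition of `$4016` bit 0, which is when the pad latches its buttons. The struct and function names are hypothetical, the movie is assumed to hold one byte per latch with bit 0 = A, and reads while the strobe is held high are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical movie: one button byte per joypad latch. */
struct movie {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

struct joypad {
    uint8_t strobe;    /* last value written to $4016 bit 0 */
    uint8_t shift;     /* buttons still to be read out, A first */
    struct movie *mv;
};

static void joypad_write_4016(struct joypad *jp, uint8_t value)
{
    uint8_t new_strobe = value & 1;
    if (jp->strobe && !new_strobe) {
        /* falling edge: latch the next movie byte, or nothing pressed
           once the movie has run out */
        struct movie *m = jp->mv;
        jp->shift = (m->pos < m->len) ? m->data[m->pos++] : 0;
    }
    jp->strobe = new_strobe;
}

static uint8_t joypad_read_4016(struct joypad *jp)
{
    uint8_t bit = jp->shift & 1;
    /* shift in 1s, as a standard pad returns 1 after the 8 buttons */
    jp->shift = (uint8_t)((jp->shift >> 1) | 0x80);
    return bit;
}
```

Recording would hook the same falling edge and append the live button byte instead of fetching one.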

by Marty on 2005-07-19 (#2891)

I rewrote my own movie code recently and took a different approach. I decided to scrap the controllers and instead just feed on the $4016 and $4017 ports only. The gain is precise accuracy and that any device can be hooked up and switched in real-time without affecting and/or complicating the movie at all. DIP switch configuration for VS. System games gets preserved as well.

With this scheme the only free bit available is $4016.7, which I let act as a CTRL bit, with the next byte(s) telling how many frames ahead the current frame input block data should repeat itself - similar to RLE. $4016 and $4017 are maintained separately, and since $4017 can't afford any bits it gets its own CTRL byte.

The downside to all this is bigger files, since often unused data will be passed in, but it can be kept to a minimum by compressing the different streams, which I let Zlib do for me. The compression ratio is very good since the input data is well suited for the dictionary-based LZx algorithm Zlib uses.
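A rough sketch of the run-length idea for one port stream. The exact byte layout is not given above, so this assumes a simple one: a byte with bit 7 set is followed by a single count byte saying how many further frames repeat the same 7-bit value. The function names are hypothetical, and the Zlib pass is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode per-frame port values (bit 7 unused). Returns bytes written,
   or 0 if out is too small (or n is 0). */
static size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t v = in[i] & 0x7F;
        size_t run = 1;
        while (i + run < n && (in[i + run] & 0x7F) == v && run < 256)
            run++;
        if (cap - o < (run > 1 ? 2u : 1u))
            return 0;
        if (run > 1) {
            out[o++] = v | 0x80;            /* CTRL bit set */
            out[o++] = (uint8_t)(run - 1);  /* further repeats, 1-255 */
        } else {
            out[o++] = v;
        }
        i += run;
    }
    return o;
}

/* Expand back to one value per frame. Returns frames written, or 0 on
   truncated input or if out is too small. */
static size_t rle_decode(const uint8_t *in, size_t n, uint8_t *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t v = in[i] & 0x7F;
        size_t run = 1;
        if (in[i] & 0x80) {
            if (++i >= n)
                return 0;
            run += in[i];
        }
        if (cap - o < run)
            return 0;
        while (run--)
            out[o++] = v;
    }
    return o;
}
```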

For reference, here are the file sizes for five minutes recording with:

SMB1 - 1725 bytes, Oeka Kids (drawing) - 26329 bytes

by tepples on 2005-07-19 (#2894)

Disch wrote:
Plus CHR RAM can also have a variable size if needed (i'm sure there's some game out there somewhere that has more than 8k CHR-RAM)

Videomation and any homebrew games that use the same board (CPROM).

Quote:
If the game writes to $4014 just before the start of that dead scanline, that would be a major problem since the CPU would be stalled through the whole scanline and into VBlank. Granted this is a rare occurrence, but it might happen in games like Castelian which shut the screen off early to squeeze in more drawing time.

And my tetramino game, which shuts off the screen early for the same reason.

Quote:
How about putting it a bit into VBlank? Far enough in to stay away from the triggered NMI, but near enough to the start so that there's no problem on $4014 stalls.

A $4014 copy can happen at any time during vblank or at any other time when the screen is turned off. In fact, DMC DMA can happen at any time, and whatever time you choose might happen to be a cycle in the middle of a DMA.

Quote:
Maybe something like 1 scanline into VBlank. Although I don't much fancy the idea of splitting VBlank... it might be the best option. Or maybe it would be simpler to just do it at the start of VBlank and just deal with NMIs. What do you guys think?

That might be the best option.

Quote:
So the state is actually somewhat of a short movie file? I'm not entirely sure I'd agree with this idea, since it'd considerably slow down state loading -- not to mention it would force all emus to have a movie player... making it much harder to support the format.

It'd be optional. Emulators that don't support the input log block would just always start the emulation at the beginning.

Quote:
Besides... would this even be necessary for the rewind feature? I mean its absence would only mean you can't rewind immediately after loading a state.

Unless you provide a "keyframe" method to store states that happen along the way, an emulator would start from the beginning. The "keyframe" method would also be useful in an emulator that acts as an AVI filter.

Quote:
But then of course that leads to the side effect of savestates restoring to different times in different emus.

Unless it is RECOMMENDED that the emu have options for starting at the beginning of any keyframe or at the end of movie.

Quote:
The games have to strobe the joypad to fetch keypresses anyway -- so that seems like the logical time to fetch them from the movie file as well (as well as record them)

How would the Zapper fit in to this system? It doesn't use strobing because it has one button and one photosensor, on different data bits.

by blargg on 2005-07-19 (#2901)

Just some things that come to mind.

- Use a tagged, chunked format (like IFF) to allow extension without requiring all emulators to be updated. Having a chunk for every single value might be an overkill, so a standard chunk could be a variable-length set of 4-character tags and 32-bit integer pairs, which would suffice for most registers.

- If possible, use a completely tagged format so that fields can be accessed in code via a 4-character tag rather than using structures. I could help design and implement this.

- Store the last value written to registers. In most cases this is sufficient to restore hardware state, and easiest to implement in any emulator. If the last written value isn't sufficient, also store any extra internal hardware state.

For example, applied to the APU frame counter ($4017) the state file needs to store the last value written, internal frame interrupt flag, how long until the next frame, and the frame number.

- When storing hardware state, store only what is necessary in a format closest to how it is in hardware. This won't favor any particular emulator and will be the most stable format.

- Avoid storing redundant data, unless really inconvenient. Redundancy allows inconsistency.

- The sands-of-time feature only needs a save state in the past along with external input (joypad) from then to the present (the same requirement as a normal movie). Periodic save states along the way will reduce initial setup delay, but aren't absolutely necessary.

Basic support (being able to rewind n seconds and no further) can be handled by a mini-movie from n seconds ago to present. Full support (being able to rewind back to the beginning) requires a full movie from power-up, and would benefit greatly from periodic save states along the way. These periodic "key frames" would speed up normal movie seeking too.

- To test an implementation, modify the emulator to continuously save, reset emulator, then restore the state every frame.

- Having a common emulator format would help validate emulators. Each could be run on the same code for the same amount of time and the save states could be compared.
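A minimal sketch of the tagged, chunked idea from the first point in the list above: each chunk is a 4-character tag, a 32-bit little-endian payload length, then the payload, and a reader skips any tag it does not know. The function names and the example tags are hypothetical; this is not an agreed format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one chunk at offset off. payload must be non-NULL.
   Returns the new write offset, or 0 if the chunk does not fit. */
static size_t chunk_write(uint8_t *buf, size_t cap, size_t off,
                          const char tag[4], const void *payload, uint32_t len)
{
    if (off > cap || cap - off < 8 || cap - off - 8 < len)
        return 0;
    memcpy(buf + off, tag, 4);
    buf[off + 4] = (uint8_t)(len & 0xFF);
    buf[off + 5] = (uint8_t)((len >> 8) & 0xFF);
    buf[off + 6] = (uint8_t)((len >> 16) & 0xFF);
    buf[off + 7] = (uint8_t)(len >> 24);
    memcpy(buf + off + 8, payload, len);
    return off + 8 + len;
}

/* Find the first chunk with the given tag, skipping all others.
   Returns a pointer to its payload and sets *len_out, or NULL. */
static const uint8_t *chunk_find(const uint8_t *buf, size_t size,
                                 const char tag[4], uint32_t *len_out)
{
    size_t off = 0;
    while (size - off >= 8) {
        uint32_t len = (uint32_t)buf[off + 4]
                     | ((uint32_t)buf[off + 5] << 8)
                     | ((uint32_t)buf[off + 6] << 16)
                     | ((uint32_t)buf[off + 7] << 24);
        if (size - off - 8 < len)
            return NULL;  /* truncated file */
        if (memcmp(buf + off, tag, 4) == 0) {
            *len_out = len;
            return buf + off + 8;
        }
        off += 8 + (size_t)len;
    }
    return NULL;
}
```

Because unknown chunks are skipped by length, an emulator can add its own chunk without breaking loaders that have never heard of it.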

by Disch on 2005-07-19 (#2902)

Well given my lack of experience with non-standard NES pads (like the zapper) and just movies in general, I'll keep my nose out of the movie area for the most part. Just remember that the goals we're going for are not only to be efficient and to cover as many possibilities as possible, but also to be easy to implement in an emulator. After all if it's a royal pain in the ass to use the proposed standard format, people are just going to make their own.

As for the designated state time -- I'm not especially thrilled about having the emu split their VBlank for state loading/saving, since that's likely to be run in a chunk in pretty much every emu. Although scanline 240 might be considered part of that chunk, so putting the state at the start of VBlank might end up splitting emulation at an awkward time anyway.

Either way it looks like there's 4 options:

- Start of scanline 240 (have to account for pending NMIs)
- Start of VBlank (have to account for pending NMIs)
- 1 Scanline into VBlank (don't have to worry about NMIs, but a little awkward)
- Start of VBlank (if NMIs disabled) or immediately after NMI (if NMIs enabled)

During or just before rendering is out of the question -- since this raises a whole slew of other problems.

The 3rd and 4th options will avoid the complications of state loading right before NMI --- however they may end up being more complicated than just working around pending NMIs (4th especially).

Maybe we should just stick with scanline 240... and if it spills into VBlank so be it. We'll just have to account for that. That will be a rare occurrence anyway, and shouldn't be too difficult to work around.

For reference/clarification -- when VBlank is started there's latency on when an NMI actually occurs, right? Is it a certain number of cycles? Or do you just run 1 CPU instruction before tripping an NMI? My emu just ran a single instruction and that seemed to work.

@ Blargg:

Yes, I agree about the tagged chunk/block format. SNSS was set up that way and I just assumed we were going to do something similar (at least that's what I had in mind).