NMI handling and is there a delay or not?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#16474)
I hope that this isn't already answered somewhere else, but the search function doesn't seem to be working.

I'm making an NES emulator just for fun. It uses a catch-up design with timestamps and can catch up other components in the middle of a CPU instruction. It can run many simple demos and games and passes many test ROMs. No sound yet. :( So only the CPU, PPU and simple mappers are emulated.
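A minimal sketch of the catch-up idea described here, assuming NTSC timing (exactly 3 PPU clocks per CPU clock) and using hypothetical names (`cpu_time`, `ppu_time`, `ppu_catch_up`, `cpu_read_ppu`) rather than this emulator's actual code. The PPU is only run forward when the CPU is about to observe it, so a $2002 read in the middle of an instruction sees the PPU state at exactly that clock.

```c
#include <assert.h>

/* Hypothetical catch-up sketch; names are illustrative only.
   NTSC PPU runs exactly 3 clocks per CPU clock (PAL would be 3.2). */
enum { PPU_CLOCKS_PER_CPU_CLOCK = 3 };

static long cpu_time; /* CPU clocks elapsed so far */
static long ppu_time; /* PPU clocks emulated so far */

/* Run the PPU for a single clock (VBL, rendering, etc. would go here). */
static void ppu_step(void)
{
    ppu_time++;
}

/* Lazily bring the PPU up to the CPU's current time. */
static void ppu_catch_up(void)
{
    long target = cpu_time * PPU_CLOCKS_PER_CPU_CLOCK;
    while (ppu_time < target)
        ppu_step();
}

/* Called from inside an instruction when the CPU accesses a PPU register.
   The PPU is synchronized first, so the access sees the PPU state at this
   exact CPU clock, even in the middle of an instruction. */
static int cpu_read_ppu(unsigned addr)
{
    ppu_catch_up();
    assert(ppu_time == cpu_time * PPU_CLOCKS_PER_CPU_CLOCK);
    (void)addr;
    return 0; /* placeholder: a real PPU would return register contents */
}
```

With `cpu_time` at 10, a read brings `ppu_time` to 30; advancing the CPU 4 more clocks and reading again brings it to 42. PAL would need a 16:5 ratio rather than an integer multiplier.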

Before adding more things, I wanted to design a system to handle everything that can interrupt the CPU's normal execution, like NMI, IRQ, DMA, DMC, etc. But I ran into trouble almost immediately, just trying to get NMI working correctly. I use Blargg's vbl_nmi_timing test ROMs for testing, and I pass tests #1-#4 but not the last three. In test #5, nmi_supression, I get either error #3 or #4.

I made a diagram for these tests. I hope it makes sense and that it's correct:

Code:
          |      read 0x2002     | write 0x2000
          | returns | sets |     | disable NMI
ppu clock | vbl as  | vbl  | NMI | NMI

-4          0         1      yes   no
-3          0         1      yes   no
-2          0         1      yes   no
-1          0         0      no    no           VBlank suppressed
0 (VBlank)  1         0      no    no           <-Why?
1           1         0      no    no           <-Why?
2           1         0      yes   yes
3           1         0      yes   yes
4           1         0      yes   yes
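The diagram can be written down as a small lookup, which is one way an emulator could check its own behavior against these results. This is only a sketch that reproduces the observed outcomes for offsets -4 to 4 (reading the "sets vbl" column as whether the flag ends up set after the read); the struct and function names are hypothetical, and it says nothing about why the hardware behaves this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Encodes the measured results from the diagram as data. 'offset' is the
   PPU clock of the access relative to clock 0 (when VBL is set); only
   -4..4 is covered. Reproduces observed behavior, not the mechanism. */
typedef struct {
    int  read_returns_vbl;  /* bit 7 value returned by the $2002 read       */
    bool vbl_ends_up_set;   /* VBL flag still set after the read            */
    bool nmi_after_read;    /* NMI occurs despite the $2002 read            */
    bool nmi_after_disable; /* NMI occurs despite writing $2000 to disable  */
} VblRace;

static VblRace vbl_race(int offset)
{
    assert(offset >= -4 && offset <= 4);
    VblRace r;
    if (offset <= -2) {        /* read well before VBL: no interference */
        r = (VblRace){0, true, true, false};
    } else if (offset == -1) { /* read one clock early: VBL suppressed */
        r = (VblRace){0, false, false, false};
    } else if (offset <= 1) {  /* read at clock 0 or 1: NMI suppressed */
        r = (VblRace){1, false, false, false};
    } else {                   /* clock 2 onward: neither access stops NMI */
        r = (VblRace){1, false, true, true};
    }
    return r;
}
```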

Now some questions:

1. Is the VBlank flag set before or after the first cycle of VBlank? In other words, where exactly is cycle 0 in the diagram?

2. NMI is edge-triggered, right? Then why is NMI suppressed on cycles 0 and 1? The condition for NMI (VBlank flag set and NMI enabled) is already true before both.
If there is no delay between the condition and the NMI, shouldn't the NMI be generated on these cycles? It seems to me that this edge must last for 3 PPU clocks to be detected. Or is it one CPU clock, to accommodate PAL, where one CPU clock is 3.2 PPU clocks rather than 3? Maybe it needs to last this long to filter out electrical interference.

3. Am I right in saying that the PPU generates NMI with no delay, but the CPU needs at least one extra cycle to catch it? :oops:

4. Could someone please describe how NMIs are generated and how the CPU handles them. Very detailed and low-level would be nice. :)


Maybe I'm just stupid, or NMI suppression on cycles 0 and 1 is just a special case.

Finally, I'm wondering whether anyone has come up with a nice, simple and fast way of dealing with things that can interrupt the CPU in a catch-up emulator.

Any help is much appreciated.

by on (#16486)
Nice diagram! I compared it to the readme with the VBL/NMI tests and it looks to be correct. All my tests validate only CPU behavior, and I don't have the equipment to easily test what's actually going on at the hardware level. It'd be really nice to know when the /NMI line to the CPU is actually asserted, since the tests leave several possibilities open. At the very least, it'd give a concrete explanation for the behavior and probably help predict other behavior that hasn't been tested yet.

1. Since reading the VBL flag near the time it's set affects its behavior, the idea of it being set at a definite time seems to be an imaginary concept, sort of a NES quantum mechanics! Since the only way to find the value of the VBL flag is to read from $2002 (even if we hook external hardware to the NES internals), I don't know any way to get a better answer.

2. The NMI input on the 6502 is edge-triggered, so once it's been asserted for some minimum amount of time, the interrupt will occur within an instruction or two. For the suppression cases, a hardware test would help show whether the PPU never asserts /NMI at all, or asserts it so briefly that the 6502 doesn't latch it. I'm not sure whether /NMI is sampled every CPU clock and latched if asserted, or connected directly to a flip-flop that trips even on an assertion just a few tens of nanoseconds wide.

3. The 6502 uses a pipeline and the last clock of each instruction is always overlapped with the first clock of the next instruction (fetching its opcode). This means that for an interrupt to occur immediately after the current instruction, IRQ or NMI must be asserted sometime before the last clock of the current instruction, otherwise the overlapped first clock of the next instruction will have already begun. This introduces apparent one-instruction delays in interrupt handling (even when changing the flags with instructions like CLI, which I might have a test ROM for in a few days).
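Points 2 and 3 can be combined into a toy model. It assumes the first of the two possibilities above (the NMI line is sampled once per CPU clock and an inactive-to-active transition is latched) and that the CPU makes its interrupt decision during the second-to-last clock of each instruction, after that clock's sample. Both are assumptions, and all names are hypothetical. Under them, an NMI asserted only during an instruction's final clock is taken one instruction late, and a level held asserted does not retrigger.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: NMI sampled once per CPU clock (assumption), interrupt
   decision made before the last clock of each instruction (assumption). */
typedef struct {
    bool nmi_line_prev; /* previous sample (true = NMI asserted) */
    bool nmi_pending;   /* edge detector output: NMI latched */
    bool take_nmi_next; /* decision made before the last clock */
} Cpu;

/* Sample the NMI input for one CPU clock. Only a new assertion latches an
   NMI; a level that stays asserted does not trigger again. */
static void sample_nmi(Cpu *cpu, bool nmi_asserted)
{
    if (nmi_asserted && !cpu->nmi_line_prev)
        cpu->nmi_pending = true;
    cpu->nmi_line_prev = nmi_asserted;
}

/* Run one instruction of 'cycles' clocks; nmi_at[i] is the NMI level during
   clock i. Returns true if the NMI sequence runs after this instruction
   instead of fetching the next opcode. */
static bool run_instruction(Cpu *cpu, const bool *nmi_at, int cycles)
{
    assert(cycles >= 2);
    for (int i = 0; i < cycles; i++) {
        sample_nmi(cpu, nmi_at[i]);
        if (i == cycles - 2)          /* poll before the final clock */
            cpu->take_nmi_next = cpu->nmi_pending;
    }
    if (cpu->take_nmi_next) {
        cpu->nmi_pending = false;     /* NMI sequence consumes the latch */
        cpu->take_nmi_next = false;
        return true;
    }
    return false;
}
```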

All this interrupt handling, delays, suppression, etc. is still unclear to me and I hope we some day come up with an absolutely clear description of and explanation for the behavior.

As for implementing this efficiently in a catch-up emulator, you basically need a CPU emulator that can efficiently stop at a given time, and can have this stop time changed during an emulation run. A simple way to handle this:

Code:
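int end_time; // time to stop at; set to 0 to stop CPU immediately
int cpu_time; // current time, in CPU clocks
int pc;       // program counter

int read_mem( int addr ); // memory read, provided elsewhere

void emulate_cpu()
{
    // Run whole instructions until the next CPU-altering event; anything
    // that moves that event (e.g. enabling NMI) updates end_time mid-run.
    while ( cpu_time < end_time )
    {
        int opcode = read_mem( pc );
        ... // decode and execute opcode, adding its clocks to cpu_time
    }
}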


You'd set end_time to the earliest CPU-altering event that can occur (i.e. the next NMI, or the next IRQ if it's sooner) and then emulate the CPU. During this, whenever the CPU does something that could change the earliest CPU-altering event (i.e. enables NMI or adjusts when IRQ will occur), adjust end_time accordingly. There are some fine points I haven't worked out yet. For example, an interrupt may be due just after the current instruction, but the instruction itself disables the interrupt source and causes end_time to be changed; the interrupt should still occur, since it was latched at the last clock of the instruction.
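A minimal sketch of the end_time approach, including one possible way to handle the fine point above: if the NMI's scheduled time has already passed when the CPU disables it, treat the NMI as latched instead of dropping it. It uses hypothetical names throughout, advances time in whole CPU clocks, compares against the NMI time with a simple `>=` (a simplification of the clock-exact race in the diagram), and uses two stand-in instructions instead of a real 6502 core.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical names throughout; times are in CPU clocks. */
static long cpu_time;            /* current CPU time */
static long end_time;            /* stop time; 0 means stop immediately */
static long run_end;             /* end of this emulation run */
static long next_nmi = LONG_MAX; /* when the PPU raises NMI; LONG_MAX = never */
static bool nmi_enabled;         /* NMI enable bit, as last written */
static bool nmi_latched;         /* NMI committed to; survives a later disable */
static int  nmis_taken;          /* number of NMI sequences run */

/* Stop at the earliest event that could alter the CPU. */
static void update_end_time(void)
{
    if (nmi_latched) {
        end_time = 0;            /* stop after the current instruction */
        return;
    }
    end_time = run_end;
    if (nmi_enabled && next_nmi < end_time)
        end_time = next_nmi;
}

/* Called when the CPU writes the NMI enable bit. */
static void set_nmi_enabled(bool on)
{
    /* Simplification: if the NMI time has already passed, it counts as
       latched, so disabling NMI now must not cancel it. */
    if (!on && nmi_enabled && cpu_time >= next_nmi) {
        nmi_latched = true;
        next_nmi = LONG_MAX;
    }
    nmi_enabled = on;
    update_end_time();           /* the earliest event may have moved */
}

/* Stand-in instructions; each advances cpu_time itself. */
static void nop4(void) { cpu_time += 4; }

static void sta_2000_disable(void) /* 4 clocks, write on the last one */
{
    cpu_time += 3;
    set_nmi_enabled(false);
    cpu_time += 1;
}

static void run_cpu(long until, void (*execute_instruction)(void))
{
    assert(execute_instruction);
    run_end = until;
    update_end_time();
    while (cpu_time < run_end) {
        while (cpu_time < end_time)
            execute_instruction();  /* may change end_time mid-run */
        if (nmi_enabled && cpu_time >= next_nmi) {
            nmi_latched = true;     /* NMI time reached while enabled */
            next_nmi = LONG_MAX;
        }
        if (nmi_latched) {
            nmi_latched = false;
            nmis_taken++;
            cpu_time += 7;          /* 6502 interrupt sequence: 7 clocks */
        }
        update_end_time();
    }
}
```

For example, with NMI scheduled at clock 2 and a 4-clock STA $2000 that disables NMI with its write on clock 3, `nmis_taken` still ends up 1; if the NMI is scheduled at clock 10 instead, the same write cancels it.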

by on (#16528)
blargg wrote:
All this interrupt handling, delays, suppression, etc. is still unclear to me and I hope we some day come up with an absolutely clear description of and explanation for the behavior.


I've studied this very extensively on the SNES hardware, and all descriptions I've seen indicate NMI and IRQ work the same way as on the NES.
The advantage I have with the SNES is being able to step by the smallest slice of time and test repeatedly. Along with my emulator, I can tell exactly which clock position resulted in which behavior. I've tested all interactions between enabling and disabling interrupts before and after they occur, retriggering interrupts, reading the interrupt status bits, etc. quite exhaustively.

I believe I have very clear explanations of how this works. If you're willing to write the documentation, then catch me on AIM sometime and I will explain it to you in as much detail as you like. I think I can even explain the reasons behind nearly all of the behavior. In short, I believe it is due to bus hold delays, but we can go over that in more detail later.

by on (#16553)
Thanks for your reply Blargg.

blargg wrote:
All my tests validate only CPU behavior, and I don't have the equipment to easily test what's actually going on at the hardware level. It'd be really nice to know when the /NMI line to the CPU is actually asserted, since the tests leave several possibilities open. At the very least, it'd give a concrete explanation for the behavior and probably help predict other behavior that hasn't been tested yet.


Does this mean that even if an emulator passes your tests, it can still have wrong timing (very small)? Correct CPU behavior and timing should be enough for even the most timing-sensitive games, shouldn't it?

blargg wrote:
This introduces apparent one-instruction delays in interrupt handling (even when changing the flags with instructions like CLI, which I might have a test ROM for in a few days).


Yeah! :D
More tests for my emulator to fail on. ;)
I love your tests, Blargg; they are invaluable to emulator makers.
Keep 'em coming!

Your example for handling interrupts is very similar to my current design.
But as you point out it is a bit tricky to determine when and if the interrupt should be executed.

by on (#16554)
That sounds very interesting byuu.

Do you think you could explain why NMI is suppressed on PPU cycles 0 and 1 in my diagram? Is it because of the CPU's pipeline and bus hold delays?

What exactly do you mean by bus hold delays, by the way? Is it the two-phase thingy the CPU uses to do reads and writes, where writes are made at the end of a cycle and reads in the middle? :oops:

How do you emulate this, and is it even necessary in an emulator where a more high-level approach is often used?

by on (#16600)
Quote:
Does this mean that even if an emulator passes your tests, it can still have wrong timing (very small)? Correct CPU behavior and timing should be enough for even the most timing-sensitive games, shouldn't it?


I mean that a test ROM can only test those things that actually matter (affect the result). The benefit to knowing what exactly is happening in hardware is that it offers a simple explanation of why things work the way they do, and can make further testing easier. Knowing exactly what's happening isn't necessary for perfect emulation, though, since the behavior is all that matters, and that can be determined by running code on the 6502 and noting the results.