IRQ and NMI consume 7 CPU cycles

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
IRQ and NMI consume 7 CPU cycles
by on (#142586)
My emulator contains a table of instruction cycles, but it failed to account for the 7-cycle overhead needed to handle IRQ/NMI requests. We might want to add a note to the wiki to alert future emudevs.

http://www.6502.org/tutorials/interrupts.html#1.3

Sadly, accounting for these extra cycles did not fix my Micro Machines timing issue.

This may also be relevant.
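
For readers following along, here is a minimal sketch of the interrupt entry sequence that the 7-cycle overhead refers to. The `Cpu` class, its field names, and `service_interrupt` are hypothetical and are not taken from any emulator in this thread; only the vectors, the pushed bytes, and the 7-cycle cost reflect documented 6502 behavior.

```python
# Illustrative sketch only; class and method names are assumptions.
NMI_VECTOR = 0xFFFA          # NMI vector lives at $FFFA/$FFFB
IRQ_VECTOR = 0xFFFE          # IRQ/BRK vector lives at $FFFE/$FFFF
INTERRUPT_CYCLES = 7         # cost of the interrupt sequence itself


class Cpu:
    def __init__(self, memory):
        self.mem = memory    # 64 KiB bytearray
        self.pc = 0
        self.sp = 0xFD
        self.p = 0x24        # status register
        self.cycles = 0

    def push(self, value):
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def service_interrupt(self, vector):
        # Push the return address (high byte first), then the status
        # register with the B flag clear and bit 5 set.
        self.push(self.pc >> 8)
        self.push(self.pc)
        self.push((self.p & ~0x10) | 0x20)
        self.p |= 0x04       # set the interrupt-disable flag
        self.pc = self.mem[vector] | (self.mem[vector + 1] << 8)
        # These cycles are not part of any instruction, so a per-opcode
        # cycle table will not account for them.
        self.cycles += INTERRUPT_CYCLES


mem = bytearray(0x10000)
mem[0xFFFA], mem[0xFFFB] = 0x00, 0x80   # hypothetical NMI handler at $8000
cpu = Cpu(mem)
cpu.pc = 0xC123
cpu.service_interrupt(NMI_VECTOR)
```

After the call, `cpu.pc` points at the handler and `cpu.cycles` has advanced by 7 even though no instruction was executed.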
Re: IRQ and NMI consume 7 CPU cycles
by on (#142590)
Of course. ^_^;; My emu is also cycle-precision accurate. Nintendulator too... and both with the same problem. 8-) :roll:
Re: IRQ and NMI consume 7 CPU cycles
by on (#142598)
Zepper wrote:
Of course. ^_^;; My emu is also cycle-precision accurate. Nintendulator too... and both with the same problem. 8-) :roll:


There is still something wrong with the timing of my emulator. It sets NMI_occurred at dot 1 of scanline 241, but it intentionally delays the NMI until dot 150 of that scanline (~50 CPU cycles after it probably should be executed). Without that delay, the text boxes in Marble Madness do not get rendered correctly. I also found other minor glitches appear in games without that exact delay. I assume I'm not accounting for 50 CPU cycles somewhere and a few games start counting cycles from the start of NMI.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142663)
I compared my CPU emulation against the nestest log. All the values match perfectly. I was really hoping the timing would be off. Now, I'm a bit stuck.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142682)
How was the nestest log generated? FCEUX's trace log does not appear to contain cycle timings. If I had a way to make a log like that from an emulator that properly renders the text boxes in Marble Madness, I could compare it against a log from my emulator.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142686)
I can provide a log (using my emulator) if you want. ^_^;;
Re: IRQ and NMI consume 7 CPU cycles
by on (#142687)
Zepper wrote:
I can provide a log (using my emulator) if you want. ^_^;;


That's greatly appreciated.

I'm trying to think of the best way to capture this. I am interested in the rendering of the text boxes in Marble Madness that appear at the beginning of the stages. I'm not exactly sure how we can sync this up properly. Maybe start recording as soon as the text box appears and stop shortly afterwards. I might be able to sort it out from there.

Further analysis shows that my emulator is failing to account for about 40 CPU cycles. Specifically, it should take 40 CPU cycles longer from the time that NMI occurs to the time that it renders those text boxes on the next frame. I don't know if it means that the CPU is suspended for 40 additional cycles or if the timing of some of my instructions is off. But, as mentioned, my timing matches the nestest log file. So, it might be some overhead outside of instruction timing such as the 7 CPU cycles that I discovered were necessary for NMI/IRQ handling.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142704)
The widely circulated nestest log is from Nintendulator.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142732)
zeroone wrote:
Zepper wrote:
I can provide a log (using my emulator) if you want. ^_^;;


That's greatly appreciated.

I'm trying to think of the best way to capture this. I am interested in the rendering of the text boxes in Marble Madness that appear at the beginning of the stages. I'm not exactly sure how we can sync this up properly. Maybe start recording as soon as the text box appears and stop shortly afterwards. I might be able to sort it out from there.

Further analysis shows that my emulator is failing to account for about 40 CPU cycles. Specifically, it should take 40 CPU cycles longer from the time that NMI occurs to the time that it renders those text boxes on the next frame. I don't know if it means that the CPU is suspended for 40 additional cycles or if the timing of some of my instructions is off. But, as mentioned, my timing matches the nestest log file. So, it might be some overhead outside of instruction timing such as the 7 CPU cycles that I discovered were necessary for NMI/IRQ handling.


Hmm... In my emulator, the text box is glitched on the left side.
Would you mind trying Rad Racer 2 and taking a screenshot of the road?
Re: IRQ and NMI consume 7 CPU cycles
by on (#142734)
Zepper wrote:
Hmm... In my emulator, the text box is glitched on the left side.
Would you mind trying Rad Racer 2 and taking a screenshot of the road?


Image

Yikes!
Re: IRQ and NMI consume 7 CPU cycles
by on (#142742)
That screwed up image above turned out to be a MMC3 mirroring issue.

Image

According to threads on this forum, Rad Racer II should have 4-screen mirroring. I hardcoded that as a test and it produces a clean road with and without my 40 CPU cycle NMI delay hack.

The MMC3 docs do not mention how to configure 4-screen mirroring. I'll have to research that further.

Edit: Flags 6 in the iNES file header specifies that it is a four-screen VRAM game. I modified my MMC3 mapper to not allow the game to control the nametable mirroring when the file specifies four-screen.
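
The four-screen check described in the edit above could be sketched roughly as follows. The function and class names are hypothetical; the bit positions (flags 6 bit 3 for four-screen, bit 0 for vertical mirroring, and bit 0 of the MMC3 mirroring register at $A000) are taken from the commonly documented iNES and MMC3 layouts and should be checked against the wiki.

```python
def parse_ines_mirroring(header: bytes) -> str:
    """Return the mirroring mode requested by a 16-byte iNES header."""
    if len(header) < 16 or header[0:4] != b"NES\x1a":
        raise ValueError("not an iNES header")
    flags6 = header[6]
    if flags6 & 0x08:                 # bit 3: four-screen VRAM
        return "four-screen"
    return "vertical" if flags6 & 0x01 else "horizontal"


class Mmc3Mirroring:
    """Hypothetical helper that tracks nametable mirroring for an MMC3 board."""

    def __init__(self, header: bytes):
        self.mode = parse_ines_mirroring(header)
        self.four_screen = self.mode == "four-screen"

    def write_a000(self, value: int) -> None:
        # Mirroring register: bit 0 = 0 selects vertical, 1 selects horizontal.
        # When the header says four-screen, the game's writes are ignored.
        if self.four_screen:
            return
        self.mode = "horizontal" if value & 1 else "vertical"


# Made-up header: mapper 4 (MMC3) with the four-screen bit set in flags 6.
four_screen_header = b"NES\x1a" + bytes([8, 16, 0x48, 0x00]) + bytes(8)
board = Mmc3Mirroring(four_screen_header)
board.write_a000(1)                   # the game tries to select horizontal

# Same header without the four-screen bit, for comparison.
plain_header = b"NES\x1a" + bytes([8, 16, 0x40, 0x00]) + bytes(8)
plain_board = Mmc3Mirroring(plain_header)
plain_board.write_a000(1)
```

With the four-screen header, `board.mode` stays `"four-screen"` no matter what the game writes; with the plain header the register write takes effect.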
Re: IRQ and NMI consume 7 CPU cycles
by on (#142744)
One more thing. Here's an image with my 40 CPU cycle NMI delay hack:

Image

And, what happens without it:

Image
Re: IRQ and NMI consume 7 CPU cycles
by on (#142745)
zeroone wrote:
The MMC3 docs do not mention how to configure 4-screen mirroring.

Thank you for reporting this omission. It has been corrected.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142746)
tepples wrote:
Thank you for reporting this omission. It has been corrected.


Wow! That was fast. I don't know if there is a better way to do this. But, if the ROM is marked as MMC3, that flag is probably the only information available to make the determination.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142820)
FCEUX 2.2.2 provides better trace logging that includes instruction CPU cycles. I started recording when the text boxes appear at the start of the first stage of Marble Madness and I compared that log against a log from my emulator.

When I delay the NMI handler by 40 CPU cycles, not only does it fix the rendering, but every instruction matches up between the logs cycle-by-cycle. Each frame ends with a spin lock waiting for the next NMI:

Code:
$FC5A:4C 5A FC  JMP $FC5A


Consequently, delaying the NMI handler by 40 CPU cycles ultimately results in that spin lock spinning 13 fewer times.
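
As a quick check of that figure: `JMP` absolute takes 3 CPU cycles, so a 40-cycle shift removes 13 whole iterations of the loop with 1 cycle left over. The variable names below are only illustrative.

```python
JMP_ABSOLUTE_CYCLES = 3      # JMP $FC5A (opcode $4C) takes 3 CPU cycles
delay_cycles = 40            # size of the NMI delay hack

fewer_iterations = delay_cycles // JMP_ABSOLUTE_CYCLES
leftover_cycles = delay_cycles % JMP_ABSOLUTE_CYCLES

print(fewer_iterations, leftover_cycles)
```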

If I remove the NMI handler hack and let NMI take place on dot 1 of scanline 241, the text box rendering gets screwed up, but still every instruction matches up between the logs cycle-by-cycle until it reaches the sprite 0 hit test at the bottom of the frame. Marble Madness appears to use a sprite 0 hit test to hide the last few scanlines, presumably to conceal graphical artifacts that would result from vertical scrolling.

In this case, the loop that is waiting for the sprite 0 hit test absorbs the difference. In fact, if I add a hack to delay setting of the sprite 0 hit flag by 40 CPU cycles, then the logs once again fully match up. This suggests that the CPU instruction timings are correct, including things like OAM DMA stalls.

How could the rendering be out of sync with the processor by 40 CPU cycles if the timings are valid?
Re: IRQ and NMI consume 7 CPU cycles
by on (#142968)
I modified the way that my emulator detects NMI. If (NMI_occurred AND NMI_output) changes from false to true, I set an NMI request flag in the CPU that is checked between instructions. The NMI handler sets the flag to false.

With that in place, there should not be any lost NMI events resulting from a race condition between reading the PPU Status Register ($2002), which has the side effect of setting NMI_occurred to false, and detecting the start of VBlank.
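
A rough sketch of this kind of edge detection is shown below. The class and method names are made up for illustration and are not the poster's code; the idea is only that a false-to-true transition of (NMI_occurred AND NMI_output) latches a request that a later $2002 read cannot clear.

```python
class NmiLine:
    """Hypothetical NMI edge detector shared by the PPU and CPU."""

    def __init__(self):
        self.nmi_occurred = False   # set at start of VBlank, cleared by $2002 reads
        self.nmi_output = False     # bit 7 of $2000
        self.request = False        # latched; the CPU checks it between instructions
        self._previous = False

    def _update(self):
        level = self.nmi_occurred and self.nmi_output
        if level and not self._previous:
            self.request = True     # rising edge: latch the request
        self._previous = level

    def start_vblank(self):
        self.nmi_occurred = True
        self._update()

    def end_vblank(self):
        self.nmi_occurred = False
        self._update()

    def write_ctrl(self, value):
        self.nmi_output = bool(value & 0x80)
        self._update()

    def read_status(self):
        value = 0x80 if self.nmi_occurred else 0x00
        self.nmi_occurred = False   # side effect of reading $2002
        self._update()
        return value

    def acknowledge(self):
        self.request = False        # called by the NMI handler entry


line = NmiLine()
line.write_ctrl(0x80)               # game enables NMI
line.start_vblank()                 # rising edge latches the request
status = line.read_status()         # clears NMI_occurred, not the request
still_pending = line.request
line.acknowledge()
```

In this sketch the request survives the `$2002` read, so the CPU still services the NMI at the next instruction boundary.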

With that, I still had to introduce 2 minor hacks. First, instead of setting NMI_occurred to true on dot 1 of scanline 241, the emulator sets it to true on dot 17 of that scanline. With the hack in place, the NTSC and PAL versions of Marble Madness and Battletoads appear to function correctly.

The second level of Battletoads appears to be sensitive to NMI timing, sprite 0 hit detection and sprite overflow detection. If NMI timing or sprite 0 hit detection is slightly off, the stage can freeze. If the overflow detection is slightly off, then enemy hit detection can fail completely, making it impossible to advance (neither the player nor the enemies can get injured).

The second hack was to set the sprite overflow flag at dot 256 (at the beginning of HBlank). But, it gets computed within the first 64 dots of the scanline.

I would obviously like to remove these hacks at some point. My emulator only passes a subset of Blargg's timing tests. But, FCEUX 2.2.2 seems to fail the same tests that my emulator does. So, I do not know which tests the emulator really needs to pass to improve things. Suggestions are welcome.

Also, level 3 of Battletoads contains a bug. When multiple rats are on the screen at the same time and the player punches one of them, the center (far) brain-like background glitches. You can see the effect in this video of the game running on an actual NES. We should probably mention that on the Game Bugs wiki page, especially since I spent a while trying to get rid of the effect until I reproduced it on FCEUX and Nestopia.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142975)
You should measure the NMI time from another emulator... and compare it with yours. Pushing the NMI time forward is not the way to fix it. The NMI is nothing more than a PRG subroutine, probably for PPU in this game. Check what the NMI does - writes to $2006 are probably involved there.
Re: IRQ and NMI consume 7 CPU cycles
by on (#142976)
Zepper wrote:
You should measure the NMI time from another emulator... and compare it with yours. Pushing the NMI time forward is not the way to fix it. The NMI is nothing more than a PRG subroutine, probably for PPU in this game. Check what the NMI does - writes to $2006 are probably involved there.


I tried exactly that, but I could not work out the differences.

Were you able to solve the Marble Madness text box issues in your emulator? If not, could you post a screen shot of the glitch?
Re: IRQ and NMI consume 7 CPU cycles
by on (#143023)
The first image is a demo that uses midscanline writes(?). The other is Marble Madness.
Both have the same problem - glitched text boxes. The glitch isn't static, but "moving" left->right.
Re: IRQ and NMI consume 7 CPU cycles
by on (#143024)
Zepper wrote:
The first image is a demo that uses midscanline writes(?). The other is Marble Madness.
Both have the same problem - glitched text boxes. The glitch isn't static, but "moving" left->right.


Here's mine without the hack:

Image

And, just as you describe, those glitched tiles flicker. But, as mentioned, if I trigger NMI at dot 17 instead of dot 1, they vanish. You might want to try the same experiment.

I don't know if it is a coincidence, but dot 17 and dot 1 are exactly 2 tiles apart. And, this hack prevents Battletoads level 2 from freezing. It makes me wonder if the sprite 0 hit flag is actually set shortly after the 2 tiles are retrieved from memory, rather than when the pixels are rendered. Meaning, a sprite 0 hit could potentially be detected up to 16 pixels ahead of time.
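
To put numbers on those dot offsets, here is a small conversion sketch. It assumes NTSC timing (3 PPU dots per CPU cycle) and 8 dots per background tile; the helper name is made up.

```python
from fractions import Fraction

NTSC_DOTS_PER_CPU_CYCLE = 3   # assumption: NTSC; PAL uses a different ratio
DOTS_PER_TILE = 8


def dots_to_cpu_cycles(dots: int) -> Fraction:
    """Convert a PPU dot count to (possibly fractional) CPU cycles."""
    return Fraction(dots, NTSC_DOTS_PER_CPU_CYCLE)


delay_dots = 17 - 1                              # NMI at dot 17 instead of dot 1
delay_tiles = delay_dots // DOTS_PER_TILE        # whole tiles covered by the delay
delay_cycles = dots_to_cpu_cycles(delay_dots)    # 16/3 CPU cycles

earlier_hack = dots_to_cpu_cycles(150 - 1)       # the earlier dot-150 hack

print(delay_tiles, float(delay_cycles), float(earlier_hack))
```

So the dot-17 hack is a delay of two tiles, or a little over 5 CPU cycles, while the earlier dot-150 hack was close to 50 CPU cycles.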

Does Battletoads level 2 freeze in your emulator? I've been using the Game Genie code to skip to level 2 for testing purposes. Without the hack, it can still take a while to freeze. See if you can pass the level a few times without freezing.
Re: IRQ and NMI consume 7 CPU cycles
by on (#143028)
Try the midscanline demo. I don't know exactly how you're "delaying" the NMI, but my emulator freezes if I try to change the NMI time. About Battletoads, well... the short answer is yes, it hangs during level 2 due to a missing sprite zero hit.
Re: IRQ and NMI consume 7 CPU cycles
by on (#143029)
Zepper wrote:
Try the midscanline demo. I don't know exactly how you're "delaying" the NMI, but my emulator freezes if I try to change the NMI time. About Battletoads, well... the short answer is yes, it hangs during level 2 due to a missing sprite zero hit.


Image

Occasionally, I see the digit 1 glitch (it looks like a different tile appears instead of 1). In Nestopia, the digit 1 is perfectly stable. In FCEUX, the digit 1 and several other tiles are glitchy.

I am able to turn on and off the hack, and the hack does not appear to affect the glitched digit 1.

In my emulator, the PPU drives the CPU. For each scanline, the PPU executes the 341 dots. Without the hack, it executes NMI at dot 1 of scanline 241. With the hack turned on, it happens at dot 17 instead.

That's not something you can add?

Edit: Below are the results from midscanline_b.nes (I didn't realize you posted this one.)

Image

The image is completely stable. No glitched tiles with the hack in place. But, if I turn the hack off:

Image

Up to 2 columns of flickering glitched tiles appear to the left of the text boxes, just like in your pic. It seems likely you can fix your emulator by applying the same hack.
Re: IRQ and NMI consume 7 CPU cycles
by on (#143752)
Hi zeroone,

I wanted to weigh in on this conversation because in my own emulator, I'm experiencing very similar results to yours. I've got the same one/two columns sprite flickering to the left of the Marble Madness text box; levels 1 and 2 not loading in Battletoads, and midscanline_b showing the same type of problem as you described and displayed in your screenshot.

Interestingly, similar to you, by modifying my PPU code to trigger the NMI at line 241, dot ~25, many of the issues mostly disappear. I say mostly because in Battletoads, I seem to lock up randomly sometimes, and there's some pixel jitter near the top of the screen.

I tried a different fix, which in my emulator seems to be much more stable and not glitchy. I continue to signal the NMI at 241.1, but in my CPU, when I handle the NMI, instead of counting it as 7 cycles, I now count it as 14. This hack seems to do the trick for me. One difference, though, between my emulator and yours, based on what you wrote, is that mine is driven by the CPU, not the PPU. So in my case, I attempt to execute a single CPU instruction, then advance the PPU by the number of CPU cycles elapsed. I'm not sure if this has any bearing on my fix, and whether it would work for you.

As for why the fix works, I'm not exactly sure. I believe my emulator is as accurate as it can be, given that I'm using a CPU instruction as my "atom". The one shortcut I've taken in terms of timing in my PPU is that for tile data, I fetch all 4 bytes (NT, AT, LowBG, HighBG) on every 8th cycle (the last cycle of the HighBG fetch). That's pretty much it. I don't believe this should have any effect on NMI timing, though.

If you're so inclined, you can take a look at my code here: https://github.com/amaiorano/nes-emu

I'd be curious to know if increasing your NMI cycle counting from 7 to 14 (or more?) also fixes the problem for you.
Re: IRQ and NMI consume 7 CPU cycles
by on (#143757)
I would like to request another NMI timing test ROM (or ROMs). Something graphical, with numbers, to show how much the timing is off.
Re: IRQ and NMI consume 7 CPU cycles
by on (#143773)
daroou wrote:
Hi zeroone,

I wanted to weigh in on this conversation because in my own emulator, I'm experiencing very similar results to yours. I've got the same one/two columns sprite flickering to the left of the Marble Madness text box; levels 1 and 2 not loading in Battletoads, and midscanline_b showing the same type of problem as you described and displayed in your screenshot.

Interestingly, similar to you, by modifying my PPU code to trigger the NMI at line 241, dot ~25, many of the issues mostly disappear. I say mostly because in Battletoads, I seem to lock up randomly sometimes, and there's some pixel jitter near the top of the screen.

I tried a different fix, which in my emulator seems to be much more stable and not glitchy. I continue to signal the NMI at 241.1, but in my CPU, when I handle the NMI, instead of counting it as 7 cycles, I now count it as 14. This hack seems to do the trick for me. One difference, though, between my emulator and yours, based on what you wrote, is that mine is driven by the CPU, not the PPU. So in my case, I attempt to execute a single CPU instruction, then advance the PPU by the number of CPU cycles elapsed. I'm not sure if this has any bearing on my fix, and whether it would work for you.

As for why the fix works, I'm not exactly sure. I believe my emulator is as accurate as it can be, given that I'm using a CPU instruction as my "atom". The one shortcut I've taken in terms of timing in my PPU is that for tile data, I fetch all 4 bytes (NT, AT, LowBG, HighBG) on every 8th cycle (the last cycle of the HighBG fetch). That's pretty much it. I don't believe this should have any effect on NMI timing, though.

If you're so inclined, you can take a look at my code here: https://github.com/amaiorano/nes-emu

I'd be curious to know if increasing your NMI cycle counting from 7 to 14 (or more?) also fixes the problem for you.


Counting the NMI as 14 CPU cycles instead of 7 is equivalent to my hack. I am triggering the NMI at dot 21 instead of dot 1, a delay of 7 CPU cycles. I considered modifying my PPU to fetch all 4 bytes on the 8th cycle as you described, making all the tile data reads atomic. But, currently, the tile data reads are spread out as described in the wiki (a read every 2 PPU cycles).

If instructions are executed atomically, the CPU and PPU will be out of sync by up to +/- 8 CPU cycles. But, I'm not convinced that a cycle-by-cycle CPU implementation would make much of a difference because memory writes occur on the final instruction cycle. Meaning, they are virtually atomic already.

For Battletoads level 2, my hack seems to prevent it from freezing. There is a Game Genie code that gives you infinite lives and another to skip to level 2. I've used them to play through the level over and over again without issues.

Level 2 freezes due to a missed sprite 0 hit. The check happens at the lower-right corner of the status bar. It is used to split the screen between the status bar and the vertical scrolling beneath it. I think the miss is a consequence of the PPU background rendering not being enabled in time.
Re: IRQ and NMI consume 7 CPU cycles
by on (#143788)
Quote:
Counting the NMI as 14 CPU cycles instead of 7 is equivalent to my hack. I am triggering the NMI at dot 21 instead of dot 1, a delay of 7 CPU cycles.

Yeah, I'm not sure if they are exactly the same. At 241.1, my PPU signals an NMI, then when my CPU is next updated, it handles it, and reports 14 cycles; this allows the PPU to execute 14*3 = 42 PPU cycles, whereas it would only have executed 7*3 = 21 PPU cycles previously. Maybe it's effectively the same seeing as in your emulator, the PPU drives everything.
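
The arithmetic above can be illustrated with a minimal CPU-driven stepping sketch. `CountingPpu` and `advance` are hypothetical stand-ins, not code from either emulator, and the 3:1 dot ratio assumes NTSC.

```python
PPU_DOTS_PER_CPU_CYCLE = 3   # assumption: NTSC


class CountingPpu:
    """Hypothetical stand-in PPU that only counts the dots it is asked to run."""

    def __init__(self):
        self.dots = 0

    def run(self, dots: int) -> None:
        self.dots += dots


def advance(ppu: CountingPpu, cpu_cycles_elapsed: int) -> None:
    # CPU-driven design: after the CPU does some work, the PPU catches up.
    ppu.run(cpu_cycles_elapsed * PPU_DOTS_PER_CPU_CYCLE)


normal = CountingPpu()
advance(normal, 7)           # NMI entry counted as 7 cycles

hacked = CountingPpu()
advance(hacked, 14)          # NMI entry counted as 14 cycles

extra_dots = hacked.dots - normal.dots
print(normal.dots, hacked.dots, extra_dots)
```

In this model the 14-cycle variant lets the PPU run 21 dots further before the first instruction of the NMI handler executes.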
Quote:
If instructions are executed atomically, the CPU and PPU will be out of sync by up to +/- 8 CPU cycles. But, I'm not convinced that a cycle-by-cycle CPU implementation would make much of a difference because memory writes occur on the final instruction cycle. Meaning, they are virtually atomic already.

At some point, I'll take a look at Nintendulator, which is a cycle-by-cycle CPU implementation, and see if there's anything interesting in there.
Quote:
For Battletoads level 2, my hack seems to prevent it from freezing. There is a Game Genie code that gives you infinite lives and another to skip to level 2. I've used them to play through the level over and over again without issues.

Perhaps it's not valid, but I tested my fixes by saving state just before level 2 loads. I was able to easily repro the problem, tweak values progressively, and eventually see it disappear, always by loading my save state file.

Thanks for responding!
Re: IRQ and NMI consume 7 CPU cycles
by on (#215674)
necrobump :shock:

Anything new? My emu STILL has the flickering problem. Any "final word" about this issue?
Re: IRQ and NMI consume 7 CPU cycles
by on (#215675)
Wow, how strange that you should bump this now as I had literally just chatted about this issue two days ago on TMR's Twitch stream while he was speed-running Battletoads!

I haven't learned or seen anything more on this topic. I stopped working on my NES emulator a while ago, and had to re-read this thread a few times just to refresh my memory :) Would love to know if there's been any progress on why this seems to happen to so many of us who write emulators following the docs on the wiki.

(P.S. Shameless plug: I've since been working on a Vectrex emulator, and have been streaming it on Twitch here in case that's interesting to anyone :))
Re: IRQ and NMI consume 7 CPU cycles
by on (#216973)
Speaking of Marble Madness, the glitched-text-box issue doesn't appear in my emulator, but there is another problem: a flickering on the top of the screen when the game starts.

I don't know the cause of the problem, but if I remember correctly, FCEUX doesn't produce any of the mentioned glitches.
Re: IRQ and NMI consume 7 CPU cycles
by on (#217077)
Silly question. But once nestest runs, wouldn't it want to disable NMI in order to ensure the instructions are handled properly in order? I would assume the only time NMI is enabled is when it's actually at the start screen waiting for test selection. In fact, starting the emulator directly from the test address for all official opcode tests didn't show anything being written to the PPU to enable NMI.
Re: IRQ and NMI consume 7 CPU cycles
by on (#217080)
Two possibilities:

A. It disables NMI before calling the test routine and reenables NMI afterward.
B. The NMI handler doesn't write to any RAM locations that the test routine uses. In a lot of my programs, for example, NMI just increments one location in zero page.