Stumped on PPU VBlank Flag Behavior

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#241256)
Hi, I have been trying to track down an issue in my emulator where my VBlank flag appears to be incorrect. The easiest way to explain it is with the two traces below: the first is from my emulator and the second is from Mesen.

From my emulator

Code:
0  E487 $10 $FB     BPL $E484 = $2C                       A:00 X:00 Y:00 P:06 SP:F9 CYC:333 SL:240 FC:2 CPU Cycle:27381
1  E484 $2C $02 $20 BIT $2002 = $00      >>>>>            A:00 X:00 Y:00 P:06 SP:F9 CYC:1   SL:241 FC:2 CPU Cycle:27384
2  E487 $10 $FB     BPL $E484 = $2C                       A:00 X:00 Y:00 P:06 SP:F9 CYC:13  SL:241 FC:2 CPU Cycle:27388
3  E484 $2C $02 $20 BIT $2002 = $80                       A:00 X:00 Y:00 P:06 SP:F9 CYC:22  SL:241 FC:2 CPU Cycle:27391
4  E487 $10 $FB     BPL $E484 = $2C                       A:00 X:00 Y:00 P:86 SP:F9 CYC:34  SL:241 FC:2 CPU Cycle:27395
5  E489 $60         RTS                                   A:00 X:00 Y:00 P:86 SP:F9 CYC:40  SL:241 FC:2 CPU Cycle:27397



From Mesen

Code:
0  E487 $10 $FB      BPL $E484 = $2C                    A:00 X:00 Y:00 P:06 SP:F9 CYC:333 SL:240 FC:2 CPU Cycle:27381
1  E484 $2C $02 $20  BIT $2002 = $00         >>>>       A:00 X:00 Y:00 P:06 SP:F9 CYC:1   SL:241 FC:2 CPU Cycle:27384
2  E487 $10 $FB      BPL $E484 = $2C                    A:00 X:00 Y:00 P:86 SP:F9 CYC:13  SL:241 FC:2 CPU Cycle:27388
3  E489 $60          RTS                                A:00 X:00 Y:00 P:86 SP:F9 CYC:19  SL:241 FC:2 CPU Cycle:27390
4  E0C7 $20 $81 $E4  JSR $E481 = $2C                    A:00 X:00 Y:00 P:86 SP:FB CYC:37  SL:241 FC:2 CPU Cycle:27396
5  E481 $2C $02 $20  BIT $2002 = $00                    A:00 X:00 Y:00 P:86 SP:F9 CYC:55  SL:241 FC:2 CPU Cycle:27402


From my understanding, the PPU is supposed to set the VBlank flag at scanline 241, cycle 1. The CPU then executes BIT $2002 to read the state of the VBlank flag: BIT reads the PPU status register ($2002) and sets the negative (N) flag in the CPU status register (P) to the value of the VBlank flag (bit 7). By that understanding, my emulator appears to be behaving correctly. However, compared with Mesen, my emulator seems to see the VBlank flag too late: in the traces above, my P register becomes $86 on line #4, whereas in the Mesen trace it does so on line #2.
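For reference, here is a minimal Python sketch of BIT's flag logic (standard 6502 behaviour; the helper name `bit` is illustrative): N and V are copied from bits 7 and 6 of the byte read, and Z is set when A AND that byte is zero. With A = $00 and P = $06 as in both traces, P can only become $86 if the byte read from $2002 has bit 7 (VBlank) set and bit 6 (sprite 0 hit) clear.

```python
FLAG_Z = 0x02  # zero
FLAG_V = 0x40  # overflow
FLAG_N = 0x80  # negative


def bit(a: int, p: int, m: int) -> int:
    """Return the new P after BIT, given accumulator a, old P, and the byte m that was read."""
    p &= ~(FLAG_N | FLAG_V | FLAG_Z) & 0xFF  # clear the three flags BIT affects
    p |= m & (FLAG_N | FLAG_V)               # N and V come from bits 7 and 6 of the operand
    if (a & m) == 0:
        p |= FLAG_Z                          # Z reflects A AND operand
    return p


# A = $00 and P = $06 (Z and I set), as in both traces.
print(hex(bit(0x00, 0x06, 0x00)))  # 0x6  : VBlank clear, P unchanged
print(hex(bit(0x00, 0x06, 0x80)))  # 0x86 : VBlank set, N becomes 1
```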

What also puzzles me is that in Mesen's trace, even though the BIT $2002 on line #1 shows a read of $00, the CPU status register magically gets updated to $86. Is there some documentation that describes this behavior?

One idea I had for matching Mesen's behavior was to have my CPU logic always set the N flag in P whenever the PPU is at scanline 241, cycle 1. That seems like a hack, though, and I'm not sure whether it would cause issues later.
Re: Stumped on PPU VBlank Flag Behavior
by on (#241257)
On what pixel does your emulator actually read the value from the register?
Re: Stumped on PPU VBlank Flag Behavior
by on (#241258)
You didn't mention what ROM you're testing (filename + MD5 checksum needed); I can't go look at what's happening timing-wise without that.

Otherwise, all I can ask is: are you sure you have BIT implemented correctly (it's a pretty easy opcode)? Are you sure you implemented reading from $2002 correctly? This might be what you're missing:

Quote:
Also, what's puzzling me, is that in the Mesen's trace, even though the BIT $2002 gets executed (line #1 in Mesen trace) which reads $00, the CPU status register gets updated magically to $86. Is there some documentation that describes this behavior?

It does not "magically" get updated. The value shown in the effective-address part of BIT $2002 = $00 (i.e. the = $00 part) is not reliable/accurate for MMIO registers. In other words, the trace logger/debugger makes you think the read returns $00 because of what's shown, but that is not the value the PPU actually returns: the real read has bit 7 set, which is why P ends up as $86. A lot of emulators are this way -- I think you're just learning it now. A good debugger will still show the PPU's internal status accurately somewhere; in Mesen, it's in the PPU Status section of the debugger.

Don't believe me? Try this: load Super Mario Bros. (W) [!].nes / 811B027EAF99C2DEF7B933C5208636DE in Mesen, open the Debugger, then add a breakpoint with Address: Any, Condition: VerticalBlank (type the word in, no spaces). Now reload/reset the ROM with Ctrl-T and hit F5 to run it. It will break on lda $2002 at $800A, very close to the RESET vector. Stop and look closely at the PPU Status section of the debugger: Vertical Blank is checked, which means bit 7 will be set in the value read from $2002 (but remember that the read clears the bit inside the PPU). Now hit F11 ONCE to step through the lda to the next instruction (bpl $800A/$AD). Note that the instruction clearly shows lda $2002 = $10, when in fact it returned $90 (see the accumulator). The reason it returns $90 instead of $80 is explained on the wiki: the low five bits of $2002 are PPU open bus, so they still hold the $10 that was previously written to $2000.
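A minimal Python sketch of the $2002 read behaviour described above, assuming a simplified PPU (the class and method names are hypothetical, open-bus decay is not modelled, and the side effects of the $2000 write itself are left out). It shows both why the Super Mario Bros. example returns $90 rather than $80 and why a second read no longer sees VBlank.

```python
class PpuStatusSketch:
    """Hypothetical minimal model of reading PPUSTATUS ($2002).

    Bit 7 = VBlank, bit 6 = sprite 0 hit, bit 5 = sprite overflow; bits 4-0 are
    not driven by $2002 and return stale PPU open-bus contents.
    """

    def __init__(self) -> None:
        self.vblank = False
        self.sprite0_hit = False
        self.sprite_overflow = False
        self.open_bus = 0x00  # last value seen on the PPU register data bus
        self.w = False        # shared $2005/$2006 write toggle

    def write_register(self, value: int) -> None:
        """Any CPU write to a PPU register refreshes the open-bus latch.

        The written register's own side effects are omitted in this sketch.
        """
        self.open_bus = value & 0xFF

    def read_2002(self) -> int:
        value = (int(self.vblank) << 7) | (int(self.sprite0_hit) << 6)
        value |= int(self.sprite_overflow) << 5
        value |= self.open_bus & 0x1F  # low five bits: whatever was last on the bus
        self.vblank = False            # the read clears VBlank inside the PPU...
        self.w = False                 # ...and resets the $2005/$2006 write toggle
        return value


ppu = PpuStatusSketch()
ppu.write_register(0x10)     # e.g. the $10 written to $2000 earlier
ppu.vblank = True            # the PPU has reached scanline 241, dot 1
print(hex(ppu.read_2002()))  # 0x90: bit 7 set plus the stale $10
print(hex(ppu.read_2002()))  # 0x10: VBlank was cleared by the previous read
```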

In short: don't trust tools so blindly, but you're on the right path!
Re: Stumped on PPU VBlank Flag Behavior
by on (#241259)
Thanks @lidnariq and @koitsu for the replies.

lidnariq wrote:
On what pixel does your emulator actually read the value from the register?


I'm sorry, but I'm a bit confused by this question. The pixel? The ROM I'm testing with is the "branch timing tests" (http://www.slack.net/~ant/nes-tests/bra ... _tests.zip), though this issue happens with other test ROMs too. The ROM constantly polls the VBlank flag (via BIT and BPL instructions). When I run it on my emulator, the loop exits (BPL stops branching) two instructions later than it does on Mesen.
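The loop in both traces is a BIT $2002 / BPL $E484 pair, which is exactly the kind of timing the branch timing tests exercise. As a cross-check, a minimal Python sketch of the standard 6502 branch rule (2 cycles when not taken, 3 when taken, 4 when taken across a page boundary) plus the 4 cycles of BIT abs reproduces the spacing in the traces: 7 CPU cycles per pass, i.e. 21 PPU dots on NTSC (CYC:1 to CYC:22 in the first trace), and 2 cycles for the final, untaken BPL (CYC:34 to CYC:40). The names here are illustrative.

```python
BIT_ABS_CYCLES = 4  # opcode, address low, address high, operand read


def branch_cycles(branch_addr: int, offset: int, taken: bool) -> int:
    """Cycles for a 6502 relative branch (BPL, BNE, ...) located at branch_addr."""
    if not taken:
        return 2
    next_pc = (branch_addr + 2) & 0xFFFF               # opcode + offset byte
    rel = offset - 0x100 if offset & 0x80 else offset  # signed 8-bit displacement
    target = (next_pc + rel) & 0xFFFF
    # +1 when taken, +1 more if the target is on a different page than next_pc.
    return 3 if (next_pc & 0xFF00) == (target & 0xFF00) else 4


# The polling loop: E484 BIT $2002 / E487 BPL $E484 (offset $FB = -5).
loop_cycles = BIT_ABS_CYCLES + branch_cycles(0xE487, 0xFB, taken=True)
print(loop_cycles, loop_cycles * 3)              # 7 21
print(branch_cycles(0xE487, 0xFB, taken=False))  # 2
```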

koitsu wrote:
No mention of what ROM you're testing (filename + MD5 checksum needed), can't go look at what's happening timing-wise without that.


I am testing with the "branch timing tests" (http://www.slack.net/~ant/nes-tests/bra ... _tests.zip), but this also seems to happen with other ROMs :(

Quote:
Otherwise all I can ask is are you sure you have BIT implemented correctly (pretty easy opcode)? Are you sure you implemented reading from $2002 correctly? This might be what you're missing:


I'm pretty sure my BIT is implemented correctly. I verified it with the nestest ROM, and my emulator passes all of nestest (http://nickmass.com/images/nestest.nes).

Quote:
It does not "magically" get updated. The value shown at the effective address part of BIT $2002 = $00 (i.e. the = $00 part) is not reliable/accurate for MMIO registers. In other words, the trace logger/debugger makes you think the read returns $00 because of what's shown, but internally to the PPU that is not the value that's actively returned -- what gets returned is $86. A lot of emulators are this way -- I think you're just learning it now. For PPU internal status, the emulator must accurately depict it somewhere. In Mesen, it's in the PPU Status section of the debugger.


Very interesting. Hmmm. So the read really does return a value with bit 7 set (hence P = $86), but I just can't see it in the trace :) If that's the case, then my issue could simply be the synchronization between my PPU and CPU. I wonder if my emulator is doing this:

1) CPU executes BIT $2002, which reads $00 since VBlank is not set yet.
2) PPU hits scanline 241, cycle 1 and sets the VBlank flag.

whereas it should go the other way around:

1) PPU hits scanline 241, cycle 1 and sets the VBlank flag.
2) CPU executes BIT $2002, which the trace shows as reading $00 but which actually reads a value with bit 7 set.

Since the Mesen trace showed $00, this was throwing me off.

Quote:
Don't believe me? Try this: load up Super Mario Bros. (W) [!].nes / 811B027EAF99C2DEF7B933C5208636DE in Mesen, open Debugger, then add this as a breakpoint: Address: Any, Condition: VerticalBlank (type the word in, no spaces). Now reload/reset the ROM with Ctrl-T and hit F5 to run it. It will break on lda $2002 at $800A, very close to the RESET vector. Now stop and look closely at the PPU Status section of the debugger: Vertical Blank is checked, which means bit 7 will technically be set in what's read from $2002 (but remember that upon read, the bit gets reset internal to the PPU). Now hit F11 ONCE to execute/step through the lda to the next instruction (bpl $800A/$AD). Note that the instruction clearly reads lda $2002 = $10, when in fact it returned $90 (see accumulator). The reason it reads $90 instead of $80 is explained on the wiki -- it has to do with the $10 that was written to $2000 prior.


Thanks for the explanation. I'll re-read the wiki and see if I can understand this behavior better.
Re: Stumped on PPU VBlank Flag Behavior
by on (#241260)
I was looking at Mesen's source code, and I can now see where it makes a read from $2002 return $00:

https://github.com/SourMesen/Mesen/blob ... U.cpp#L263

It appears to use "3" as a magic number of cycles for clearing the vertical blank flag. Is this number documented somewhere?
Re: Stumped on PPU VBlank Flag Behavior
by on (#241261)
thejunkjon wrote:
I am sorry but I am a bit confused on this question. The pixel?
The 6502 doesn't load the entire instruction simultaneously. Every byte takes time to load: each bus access takes one CPU cycle, and on the US NES one CPU cycle lasts three pixels.

With the BIT abs instruction, the four cycles are:
* load the opcode
* load the low byte of the address
* load the high byte of the address
* load the byte from the location specified by the address

So if you're naively executing all four bus actions at once, the entire instruction will finish before your implementation of the PPU can set the vblank flag - after all, your implementation of BIT worked correctly on the subsequent pass through the polling loop.
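To make this concrete, here is a minimal Python sketch comparing the naive approach with one bus access per CPU cycle. All names are hypothetical, it assumes NTSC timing (3 dots per CPU cycle), models only the VBlank flag and bit 7 of $2002, and lets the PPU catch up before each CPU read, which glosses over the exact sub-cycle ordering of the VBlank race. Both versions start with the PPU one dot before it sets VBlank.

```python
class Bus:
    """Hypothetical NTSC bus: PPU position, the VBlank flag, and a tiny memory map."""

    def __init__(self, scanline: int, dot: int) -> None:
        # (scanline, dot) is the last PPU dot already processed.
        self.scanline, self.dot = scanline, dot
        self.vblank = False
        self.mem = {0xE484: 0x2C, 0xE485: 0x02, 0xE486: 0x20}  # BIT $2002

    def tick(self, cpu_cycles: int) -> None:
        """Advance the PPU three dots per CPU cycle (odd-frame dot skip ignored)."""
        for _ in range(3 * cpu_cycles):
            self.dot += 1
            if self.dot == 341:
                self.dot, self.scanline = 0, (self.scanline + 1) % 262
            if self.scanline == 241 and self.dot == 1:
                self.vblank = True  # VBlank flag is set at scanline 241, dot 1

    def access(self, addr: int) -> int:
        """Read with side effects, without advancing time."""
        if addr == 0x2002:
            value = 0x80 if self.vblank else 0x00  # only bit 7 modelled here
            self.vblank = False                    # reading $2002 clears the flag
            return value
        return self.mem.get(addr, 0x00)

    def read(self, addr: int) -> int:
        """One CPU bus cycle: let the PPU catch up first, then perform the read."""
        self.tick(1)
        return self.access(addr)


def bit_operand_per_cycle(bus: Bus, pc: int) -> int:
    """BIT abs as four separate bus cycles; returns the byte BIT tests."""
    bus.read(pc)                     # cycle 1: fetch opcode
    lo = bus.read(pc + 1)            # cycle 2: fetch address low byte
    hi = bus.read(pc + 2)            # cycle 3: fetch address high byte
    return bus.read((hi << 8) | lo)  # cycle 4: read the operand


def bit_operand_naive(bus: Bus, pc: int) -> int:
    """Naive approach: every access happens at once, then the PPU catches up."""
    lo, hi = bus.access(pc + 1), bus.access(pc + 2)
    value = bus.access((hi << 8) | lo)
    bus.tick(4)
    return value


print(hex(bit_operand_naive(Bus(241, 0), 0xE484)))      # 0x0  (VBlank missed this pass)
print(hex(bit_operand_per_cycle(Bus(241, 0), 0xE484)))  # 0x80 (operand read lands at dot 12)
```

Only the per-cycle version sees bit 7 on this pass; the naive version leaves the flag set for the next pass through the loop, which is consistent with the two-instruction delay in the first trace.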
Re: Stumped on PPU VBlank Flag Behavior
by on (#241262)
lidnariq wrote:
thejunkjon wrote:
I am sorry but I am a bit confused on this question. The pixel?
The 6502 doesn't load the entire instruction simultaneously. Every byte takes time to load, and on the US NES, every byte takes three pixels.

With the BIT abs instruction, the four cycles are:
* load the opcode
* load the low byte of the address
* load the high byte of the address
* load the byte from the location specified by the address

So if you're naively executing all four bus actions at once, the entire instruction will finish before your implementation of the PPU can set the vblank flag - after all, your implementation of BIT worked correctly on the subsequent pass through the polling loop.


Ah, this makes sense. Yeah, I bet this is the problem :( I just execute all of them at once. Thanks again for the help with this issue.
Re: Stumped on PPU VBlank Flag Behavior
by on (#241268)
Just when you thought your 6502 emulation core was done, this comes along... ;-)

What lidnariq is describing is also sometimes referred to as "T-states" -- each individual state/phase of the instruction on a per-cycle basis. The phases are documented; see lines 891 onward for a general understanding. There are other references online that document these as well, but this should cover most cases.

You may find Mesen's feature in Debugger -> Options -> Break Options -> Enable sub-instruction breakpoints helpful (docs). This only applies to breakpoints, but it allows you to use Step/F11 to step through each T-state. If you are concerned about timing/etc. then this is an excellent thing to enable once you've found the instruction(s) you wish to analyse. I also suggest Debugger -> Options -> Show instruction progression (docs) which tells you what is actively happening (R/W/X etc. -- see docs please) on a per-instruction or per-T-state basis.

You may also find this post of mine helpful with regards to tests. It also references better/newer nestest output that relates closely to what lidnariq asked/mentioned. That said: please do not become heavily reliant on test ROMs, particularly for the PPU. This comes up time and time again on our Discord, where someone develops an emulator that doesn't pass certain tests (commonly PPU tests). Some of these tests are not quite right. No, there is no list of which ones those are, but plenty of emulators fail some of them yet play games accurately just fine. This includes, IIRC, some of blargg's PPU tests. If you run through several and can't get some extremely obscure thing working, but find that other emulators like Mesen or (sometimes) Nestopia don't pass them either, either ask or don't worry too much about it. Kevin Horton's nestest for CPU behaviour is a must, however.

Finally: this Wiki page will probably save you a lot of pain. On the NTSC NES there is a 3-to-1 ratio of PPU "dots" to CPU cycles (i.e. 3 PPU dots pass for every CPU cycle), while on PAL the ratio is different (3.2 to 1). The PPU rendering page will come in handy for understanding exactly what happens "when" on a per-PPU-dot basis (the page is dense with information, so it can't be easily skimmed), as will PPU frame timing.
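As a cross-check of the 3-to-1 ratio, a minimal Python sketch (NTSC only; the helper name `advance` is hypothetical, and the dot skipped on odd frames with rendering enabled is ignored) converts CPU cycle deltas from the traces into scanline/dot positions, assuming 341 dots per scanline and 262 scanlines per frame. It confirms that CPU cycle 27384 in the traces lands exactly on scanline 241, dot 1.

```python
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262  # NTSC: 0-239 visible, 240 post-render, 241-260 vblank, 261 pre-render
DOTS_PER_CPU_CYCLE = 3     # NTSC only; PAL is 3.2 and would need fractional bookkeeping


def advance(scanline: int, dot: int, cpu_cycles: int) -> tuple[int, int]:
    """Return the (scanline, dot) reached after cpu_cycles CPU cycles.

    Simplification: ignores the dot skipped on odd frames when rendering is enabled.
    """
    total = scanline * DOTS_PER_SCANLINE + dot + cpu_cycles * DOTS_PER_CPU_CYCLE
    total %= DOTS_PER_SCANLINE * SCANLINES_PER_FRAME
    return divmod(total, DOTS_PER_SCANLINE)


# From the traces: CPU cycle 27381 is at SL 240, CYC 333.
print(advance(240, 333, 27384 - 27381))  # (241, 1)
print(advance(241, 1, 27388 - 27384))    # (241, 13)
```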