I can find no documentation on this anywhere, so I'll ask here.
So, the CPU and PPU run in parallel. Of course that is impossible to do with our emulation, which runs each processor a little at a time.
So say that the CPU and PPU are at exactly the same position in time ... how do they interleave?
A CPU cycle will consume 12 clock cycles, whether it is a read or a write. So let's say the CPU reads from $2002. Does it read the PPU state, and then have the PPU tick three times? What about a write to $2000? Does it write the PPU state, and then tick the PPU three times? (Note: ticking the PPU would be when NMIs were tested.)
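For concreteness, a minimal sketch of that first model, where the bus access lands immediately and the PPU then catches up by three ticks. All names here (and the usual uint8/uint16 typedefs) are hypothetical, not taken from any particular emulator:
Code:
//one CPU cycle = 12 clocks = 3 PPU ticks; access first, catch up after
uint8 cpu_read(uint16 addr) {
  uint8 data = bus_read(addr);  //eg a $2002 read sees the PPU state here
  for(unsigned n = 0; n < 3; n++) ppu_tick();  //NMI line would be tested in these ticks
  return data;
}

void cpu_write(uint16 addr, uint8 data) {
  bus_write(addr, data);  //eg a $2000 write updates nmi_enable here
  for(unsigned n = 0; n < 3; n++) ppu_tick();
}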
Under this model, I can see no possible way to pass ppu_vbl_nmi tests 03 and 07 at the same time. I'm not saying there is not a way! I just can't find it.
For 07, I get the same 00-05=N, 06-08=- that others here have gotten. Unfortunately, neither of the two people who solved it ever explained how they did it, for the benefit of others.
Also under this model, the check point for the F=1,Y=261 BG&&SPR-disabled PPU cycle skip is X=337, which is completely nonsensical, as the rest of the PPU works in two-cycle pairs, meaning it should be X=338.
03 reads from $2002, and 07 writes to $2000. 03 can only pass if we clear the NMI line at 260,340. 07 can only pass if we clear it at 261,0.
The only way I can see to stagger this is if CPU writes affect the PPU four clocks later than CPU reads. This would match SNES bus hold behavior, which is documented in the W65C816S reference manual, yet I can find no documentation on this anywhere for the NES.
The idea is that reads are requests from other chips, so they see them and start acting on them sooner. Writes, on the other hand, need to stay on the bus for a certain amount of time after the other chip sees them, so the data can be acknowledged/copied.
So in this instance: CPU reads would run the PPU for two ticks (8 clocks), then perform the read, then run the PPU for one more tick. CPU writes would run the PPU for three ticks (12 clocks), then perform the write.
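As a sketch (hypothetical names again), the staggering would amount to reads landing after 8 of the 12 clocks, and writes landing only after all 12:
Code:
uint8 cpu_read(uint16 addr) {
  ppu_tick(); ppu_tick();  //8 clocks elapse first
  uint8 data = bus_read(addr);  //read lands here
  ppu_tick();  //remaining 4 clocks
  return data;
}

void cpu_write(uint16 addr, uint8 data) {
  ppu_tick(); ppu_tick(); ppu_tick();  //all 12 clocks elapse first
  bus_write(addr, data);  //write lands at the very end
}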
With this, it is now possible to pass 03 and 07 at the same time. It also allows us to place the extra cycle skip test at X=338. And yet, now 05 has "4444433333" instead of "4443333322", so it fails. And absolutely nothing I try can change that pattern.
I can try to debug problems, and I can special-case behavior for when the PPU and CPU are at the exact same time (e.g. a 'conflict'), but I need to know the proper order of operations first.
-----
If it helps any, this is my current setup, which uses the former interleave pattern and fails only 07 (and requires the test-10 cycle-skip check to be at X=337):
Timing: the PPU executes one cycle (4 clock cycles) and then performs the following:
Code:
if(status.ly == 240 && status.lx == 340) status.nmi_hold = 1;  //latch vblank status one dot before 241,0
if(status.ly == 241 && status.lx == 0) status.nmi_flag = status.nmi_hold;  //raise the vblank flag at 241,0
if(status.ly == 241 && status.lx == 2) cpu.set_nmi_line(status.nmi_enable && status.nmi_flag);  //assert /NMI two dots later, if enabled
if(status.ly == 261 && status.lx == 0) cpu.set_nmi_line(status.nmi_flag = 0);  //260,340 will pass 03, but fail 07
status.lx++;
$2002 read:
Code:
result |= status.nmi_flag << 7;  //bit 7: vblank flag
result |= status.sprite_zero_hit << 6;  //bit 6: sprite 0 hit
result |= status.sprite_overflow << 5;  //bit 5: sprite overflow
result |= status.mdr & 0x1f;  //bits 0-4: open bus
status.address_latch = 0;  //reset the $2005/$2006 write latch
status.nmi_hold = 0;  //suppress a vblank flag that is about to be raised
cpu.set_nmi_line(status.nmi_flag = 0);  //reading clears the flag and the NMI line
$2000 write:
Code:
status.nmi_enable = data & 0x80;  //bit 7: NMI enable
status.master_select = data & 0x40;  //bit 6: PPU master/slave select
status.sprite_size = data & 0x20;  //bit 5: 8x8 / 8x16 sprite size
status.bg_addr = (data & 0x10) ? 0x1000 : 0x0000;  //bit 4: BG pattern table address
status.sprite_addr = (data & 0x08) ? 0x1000 : 0x0000;  //bit 3: sprite pattern table address
status.vram_increment = (data & 0x04) ? 32 : 1;  //bit 2: VRAM address increment
status.taddr = (status.taddr & 0x73ff) | ((data & 0x03) << 10);  //bits 0-1: nametable select
cpu.set_nmi_line(status.nmi_enable && status.nmi_flag);  //enabling NMI mid-vblank can assert the line immediately
CPU triggers an actual NMI whenever the line transitions from 0->1.
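For completeness, a sketch of that edge detection (hypothetical member names):
Code:
void CPU::set_nmi_line(bool line) {
  if(line && !nmi_line) nmi_pending = true;  //latch an NMI only on a 0->1 transition
  nmi_line = line;  //holding the line high does not retrigger
}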