Some help with VBL..

by Muchaserres on 2006-08-13 (#16216)

Hi all,

I'm completely stuck on test #8 of Blargg's vbl_timing test ROM. I get past all the tests in frame_basics, so my VBL handling must be somewhat accurate.. I can get past it if I specifically check on 0x2002 reads that the PPU is on cycle 11 of the first VBL scanline, but that's like cheating.. it's a hack. Has anyone had a similar experience, or any idea what's happening?

By the way, does anyone know if this board's search function is working properly? I get no results, no matter what I search.

Thx!

by Disch on 2006-08-13 (#16218)

er... there's a test 8? I only have up to 7

Were these updated?

*goes to check*

EDIT

OOoohhh... I read that wrong... you get error #8 on test ROM #2.

EDIT again

You shouldn't be checking for 11 cycles INTO VBlank... you should be checking for one cycle BEFORE VBlank.

Re: Some help with VBL..
by Memblers on 2006-08-13 (#16220)

Muchaserres wrote:
By the way, does anyone know if this board's search function is working properly? I get no results, no matter what I search.

Yeah, it broke somehow. I'll try to fix it once I get some other things that need to be done out of the way.

by Muchaserres on 2006-08-13 (#16222)

Well, this is what I'm trying to do.. In my PPU core, VBL's 20 scanlines come first, so I do something like this,

if( CPUCycles == ( FRAMECycles - 5 ) ) EnableSetting = false;

where FRAMECycles is 262*341*5 (5 less every other frame, you know). PPU cycles are multiplied by 5 and CPU cycles by 15 to get everything synchronized. This works fine if FRAMECycles equals 446770 (12 cycles after VBL is set), but not if it equals what it should, 262*341*5.
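To make the numbers above easier to follow, here is a rough sketch of the arithmetic. Only the 5/15 scaling and the 262*341 frame size come from the post; the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Scaling described in the post: 1 PPU cycle = 5 master cycles,
// 1 CPU cycle = 15 master cycles. The names are illustrative only.
constexpr std::int64_t kMasterPerPpu = 5;
constexpr std::int64_t kMasterPerCpu = 15;
constexpr std::int64_t kPpuPerFrame = 262 * 341;  // 89342 PPU cycles

// Frame length in master cycles; a "short" frame is one PPU cycle shorter.
constexpr std::int64_t frame_master_cycles(bool short_frame) {
    return (kPpuPerFrame - (short_frame ? 1 : 0)) * kMasterPerPpu;
}

// Whole PPU cycles contained in a master-cycle span.
constexpr std::int64_t master_to_ppu(std::int64_t master) {
    return master / kMasterPerPpu;
}

// Whole CPU cycles contained in a master-cycle span.
constexpr std::int64_t master_to_cpu(std::int64_t master) {
    return master / kMasterPerCpu;
}
```

With these numbers the nominal frame is 446710 master cycles, so 446770 is 60 master cycles past it, which is 12 PPU cycles or 4 CPU cycles.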

by Disch on 2006-08-13 (#16224)

hrrrrm...

12 PPU cycles is 4 CPU cycles -- the time of one "LDA $2002" instruction. Could you be adding 4 cycles to your CPU timestamp before executing the instruction?

</random guess>

by Muchaserres on 2006-08-13 (#16225)

I'll check that. In my CPU core, I do the cycle addition before executing the instructions themselves. I've moved that to be done after the instructions, and I get past all errors on both tests #1 and #2 (yeah!). Should this really make any difference? Anyway, I'll recheck my sources..

</BIG thanks>

by tepples on 2006-08-13 (#16227)

But in fact, the actual memory read occurs on the fourth CPU cycle of the instruction. There should be three instruction fetches, at one cycle each, and then one data fetch from PPU register.

by Disch on 2006-08-13 (#16228)

Right

blargg's tests probably aren't able to pick up on that, since the effect of the read will be delayed. Which might be why he's passing regardless.

Muchaserres ... consider the following, and think about how your emu will handle it:

An instruction 'LDA $2002' starts executing 2 CPU cycles before VBlank. Since the actual $2002 read occurs on the 4th cycle of that instruction, it's actually happening on the second cycle AFTER VBlank -- so the read should return the VBlank flag set.

If you have your CPU at instruction granularity, you'll get a "round to the nearest instruction" effect... which might become very troublesome in timing-sensitive tests such as these.
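The rounding effect can be shown with a deliberately simplified sketch. It ignores the suppression special cases discussed later in the thread and just asks whether a read at a given CPU cycle sees the flag; all function names are hypothetical.

```cpp
#include <cstdint>

// All times are in CPU cycles; vblank_time is when the VBlank flag is raised.
// Simplification: a read sees the flag if it happens at or after that time.
inline bool vbl_flag_seen(std::int64_t read_time, std::int64_t vblank_time) {
    return read_time >= vblank_time;
}

// Instruction-granular core: the read is treated as if it happened
// when the instruction started.
inline bool lda_2002_coarse(std::int64_t instr_start, std::int64_t vblank_time) {
    return vbl_flag_seen(instr_start, vblank_time);
}

// Cycle-granular core: LDA absolute does its data read on its 4th cycle,
// i.e. 3 cycles after the opcode fetch.
inline bool lda_2002_fine(std::int64_t instr_start, std::int64_t vblank_time) {
    return vbl_flag_seen(instr_start + 3, vblank_time);
}
```

For an instruction starting 2 cycles before VBlank, the coarse version reports the flag as clear while the fine version reports it as set.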

One common solution to this is to pass a mini-timestamp along to your read and write routines. For example, LDA absolute code might go something like:

Code:
op = Read( PC, 0 );             // 0 because it's the first cycle in the instruction
PC++;
switch(op)
{
case 0xAD:                      // LDA $xxxx
    adr.lo = Read( PC, 1 );     // 1 because it's 2nd cycle
    PC++;
    adr.hi = Read( PC, 2 );     // 2, etc
    PC++;
    A = Read( adr, 3 );
    SET_NZ();
    break;
}

CPUtimestamp += cycles[ op ];

Then in your Read/Write procedures, you can add the mini-timestamp to the actual timestamp whenever you 'catch up' your other systems or do some other sync operation. This allows for cycle-level detail rather than instruction-level detail.
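A minimal sketch of what such a Read routine might look like, assuming the 15-master-cycles-per-CPU-cycle scaling used earlier in the thread. The Ppu and Cpu types, their members and run_until() are placeholders invented for this example, not anyone's actual emulator code.

```cpp
#include <cstdint>

// Stand-in PPU: only remembers how far it has been run (in master cycles).
struct Ppu {
    std::int64_t now = 0;
    void run_until(std::int64_t t) { if (t > now) now = t; }
};

struct Cpu {
    static constexpr std::int64_t kMasterPerCpu = 15;  // scaling from the thread
    std::int64_t timestamp = 0;  // master cycles at the start of the instruction
    Ppu ppu;
    std::uint8_t ram[0x800] = {};

    // 'mini' is the zero-based cycle within the current instruction.
    std::uint8_t read(std::uint16_t addr, int mini) {
        if (addr >= 0x2000 && addr < 0x4000) {
            // PPU register: catch the PPU up to the exact cycle of the access.
            ppu.run_until(timestamp + mini * kMasterPerCpu);
            return 0;  // the register value would come from the PPU here
        }
        if (addr < 0x2000) return ram[addr & 0x7FF];  // RAM: fine time ignored
        return 0;  // cartridge space omitted in this sketch
    }
};
```

Only the register access pays for the sync; RAM reads ignore the extra parameter, which matches the point that the fine time can usually be ignored.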

by Muchaserres on 2006-08-13 (#16229)

Uhm.. I've always had this in mind, but the impact it may have in terms of execution speed is what has stopped me from doing things this way. Have you done any tests on this? I'll try and see what happens.

Know of any document with cycle decomposition info for every opcode?

Anyway, what tepples says is quite right. The opcode used here, if I'm right, is absolute LDA (0xAD), which takes 4 cycles. When the read to 0x2002 actually happens only 3 cycles have been used, so I should only see a 1 cycle delay in my emu, instead of the 4 I see. Could this be caused by an accumulation of events similar to this one?

Thank ya!

by Disch on 2006-08-13 (#16230)

Muchaserres wrote:
Uhm.. I've always had this in mind, but the impact it may have in terms of execution speed is what has stopped me from doing things this way.

What kind of impact could there be? You're just passing a single additional function parameter -- that's practically nothing. 999 times out of 1000 the fine-time can even be ignored... like if the read is unimportant (like a read from RAM, ROM or something).

I haven't done any side-by-side tests... but I currently do this in my emu and it's no worse off than any of my previous versions.

Quote:
Know of any document with cycle decomposition info for every opcode?

http://nesdev.com/6502_cpu.txt

Quote:
Anyway, what tepples says is quite right. The opcode used here, if I'm right, is absolute LDA (0xAD), which takes 4 cycles. When the read to 0x2002 actually happens only 3 cycles have been used, so I should only see a 1 cycle delay in my emu, instead of the 4 I see.

But if blargg's ROM is timing itself by doing a certain read at a certain timestamp.. then his ROM is going to be 4 cycles off what it expected to be because his ROM is syncing up incorrectly.

Like if he syncs up in order to find the exact cycle the VBlank flag raises or lowers, then times his code from there (I'm unsure how he does it exactly -- but I think it's something like that) -- whenever his program syncs up it's going to be 4 or so cycles off because your emu is rounding his timing reads to the next instruction.

by Muchaserres on 2006-08-18 (#16346)

Disch wrote:
Quote:
Know of any document with cycle decomposition info for every opcode?

http://nesdev.com/6502_cpu.txt

Nice doc. I've got some questions:

1.- Look at cycle #2 of BRK. It says "read next instruction byte (and throw it away)". Another example would be cycle #5 of absolute addressed read-mod-write instructions, "write the value back to effective address". Should those dummy read/write operations be emulated?

2.- Should I add mini-timestamps on ZP addressed opcodes where only RAM accesses take place?

3.- Look at cycles #4 and #5 of absolute indexed read instructions. Should I emulate both reads? Actually, I read from the effective address and add an additional cycle if there's any page boundary crossing.

Disch wrote:
But if blargg's ROM is timing itself by doing a certain read at a certain timestamp.. then his ROM is going to be 4 cycles off what it expected to be because his ROM is syncing up incorrectly.

Like if he syncs up in order to find the exact cycle the VBlank flag raises or lowers, then times his code from there (I'm unsure how he does it exactly -- but I think it's something like that) -- whenever his program syncs up it's going to be 4 or so cycles off because your emu is rounding his timing reads to the next instruction.

That makes it clear. Thanks!

by Disch on 2006-08-18 (#16347)

1)

Depends how far you want to go with your emu. I don't think any games will rely on the dummy reads -- however I think there are one or two which rely on the dummy write (in INC absolute I think? not quite sure, not even sure if I'm remembering correctly).

If you don't do the dummy reads/writes, you'll probably be fine. At least you will with 99.999% of the games around.

2)

Again it depends. I don't -- I don't even call a function for ZP or stack read/writes.

The only time I can see this making any difference is with mapper 90's IRQ counter -- which has the option of counting on every CPU write... however a grand total of zero games use that option.

3) Same answer as #1
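For anyone who does want the dummy write, here is a rough sketch of INC absolute following the cycle list in 6502_cpu.txt (read, write the old value back, write the new value). The Bus type and its access log are invented for illustration, and flag updates are left out.

```cpp
#include <cstdint>
#include <vector>

struct BusAccess { std::uint16_t addr; std::uint8_t value; bool write; int cycle; };

// Toy bus that records every access so the dummy write is visible.
struct Bus {
    std::uint8_t mem[0x10000] = {};
    std::vector<BusAccess> log;
    std::uint8_t read(std::uint16_t a, int cycle) {
        log.push_back({a, mem[a], false, cycle});
        return mem[a];
    }
    void write(std::uint16_t a, std::uint8_t v, int cycle) {
        mem[a] = v;
        log.push_back({a, v, true, cycle});
    }
};

// INC $xxxx; cycle numbers are zero-based, like the mini-timestamps above.
inline void inc_absolute(Bus& bus, std::uint16_t& pc) {
    bus.read(pc++, 0);                          // opcode fetch (assumed to be $EE)
    std::uint16_t lo = bus.read(pc++, 1);       // address low
    std::uint16_t hi = bus.read(pc++, 2);       // address high
    std::uint16_t addr = static_cast<std::uint16_t>(lo | (hi << 8));
    std::uint8_t v = bus.read(addr, 3);         // read old value
    bus.write(addr, v, 4);                      // dummy write of the old value
    bus.write(addr, static_cast<std::uint8_t>(v + 1), 5);  // write new value
}
```

If the target is a plain RAM location the extra write is invisible, which is why skipping it rarely matters; it only shows up when the address is a register that reacts to writes.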

by Muchaserres on 2006-08-19 (#16357)

OK, so.. after a careful source modification, I still get the same mess.. It only passes the error if I do

Code:
if( ( CPUCycles + 0 ) == ( FRMCycles - 5 ) ) EnableNMI = false;

instead of

Code:
if( ( CPUCycles + cycles ) == ( FRMCycles - 5 ) ) EnableNMI = false;

In other words, if I add the timestamp the whole thing breaks. Of course, I call PPU->Run( CPUCycles + cycles ); before doing that check..

by Muchaserres on 2006-08-20 (#16376)

I've been logging 0x2002 reads while running '2.vbl_timing.nes'. Look at the results. First of all, I do,

Code:
Run( CPUCycles + miniCycles );

where 'miniCycles' is the cycle count inside the read opcode at the moment of performing the read, as Disch suggested. Then, there are two possibilities. The wrong one,

Code:
if( ( CPUCycles + 0 ) == ( FRMCycles - 5 ) ) EnableNMI = false;

for which I get the following timing (all cycles divided by 5 for a PPU perspective),

Code:
CPUCycles = 89340 - miniCycles = 9 - PPUCycles = 89349/ 89342
CPUCycles = 89341 - miniCycles = 9 - PPUCycles = 89350/ 89342 <-- DISABLED HERE!
CPUCycles = 0 - miniCycles = 9 - PPUCycles = 6820/ 89342

The 9 'miniCycles' correspond to the 3 cycles taken by an absolute LDA till it actually reads from effective address, in its 4th cycle. The problem is that 'CPUCycles' already equals frame cycles (the rightmost number in PPUCycles) minus one, so adding the mini timestamp breaks the flow.

On the other hand, the right option,

Code:
if( ( CPUCycles + cycles ) == ( FRMCycles - 5 ) ) EnableNMI = false;

outputs this timing,

Code:
CPUCycles = 89331 - miniCycles = 9 - PPUCycles = 89340/ 89342
CPUCycles = 89332 - miniCycles = 9 - PPUCycles = 89341/ 89342 <-- DISABLED HERE!
CPUCycles = 89333 - miniCycles = 9 - PPUCycles = 89342/ 89342

Here, if we add 'miniCycles' to 'CPUCycles' we get a correct result, 89332+9=89341/89342.

The problem here is that I don't pass error #8 in the test with the right option, but with the wrong one. Any ideas? By the way, the logs I've pasted here correspond to the first time a read happens one PPU cycle before VBL. Would any of you mind checking against your emus which of the logs I show is more likely to be correct?

Thank you all.

by blargg on 2006-08-20 (#16377)

In my emulator, after $00F8 is set to $08, I get a ton of $2002 reads, then one per frame as it synchronizes exactly to the PPU, then the read at 89341 PPU clocks after VBL that suppresses VBL before $01 is written to $00F8.

Code:
lda #8;) Reading 1 PPU clock before VBL should suppress setting
sta <$F8
jsr sync_ppu_align1_30
jsr delay_26
lda $2002
and #$80
jsr error_if_ne
lda $2002
and #$80
jsr error_if_ne

---- = VBL occurred
...
Read $2002 at 89289
Read $2002 at 89310
Read $2002 at 89331
----
Read $2002 at 10
----
Read $2002 at 89330
----
Read $2002 at 89331
----
Read $2002 at 89332
----
Read $2002 at 89333
----
Read $2002 at 89334
----
Read $2002 at 89335
----
Read $2002 at 89336
----
Read $2002 at 89337
----
Read $2002 at 89338
----
Read $2002 at 89339
----
Read $2002 at 89340
----
Read $2002 at 89341
----
Read $2002 at 89342
----
----
Read $2002 at 89341
----
Read $2002 at 59
Write $01 to $00F8

by Muchaserres on 2006-08-21 (#16399)

I get exactly the same. Using the 'wrong' scheme, I get,

Code:
CPUCycles = 89289 - miniCycles= 9 - PPUCycles = 89298/ 89342
CPUCycles = 89310 - miniCycles= 9 - PPUCycles = 89319/ 89342
CPUCycles = 89331 - miniCycles= 9 - PPUCycles = 89340/ 89342
CPUCycles = 10 - miniCycles= 9 - PPUCycles = 6820/ 89342
CPUCycles = 89330 - miniCycles= 9 - PPUCycles = 89339/ 89342
CPUCycles = 89331 - miniCycles= 9 - PPUCycles = 89340/ 89342
CPUCycles = 89332 - miniCycles= 9 - PPUCycles = 89341/ 89342
CPUCycles = 89333 - miniCycles= 9 - PPUCycles = 89342/ 89342
CPUCycles = 89334 - miniCycles= 9 - PPUCycles = 89343/ 89342
CPUCycles = 89335 - miniCycles= 9 - PPUCycles = 89344/ 89342
CPUCycles = 89336 - miniCycles= 9 - PPUCycles = 89345/ 89342
CPUCycles = 89337 - miniCycles= 9 - PPUCycles = 89346/ 89342
CPUCycles = 89338 - miniCycles= 9 - PPUCycles = 89347/ 89342
CPUCycles = 89339 - miniCycles= 9 - PPUCycles = 89348/ 89342
CPUCycles = 89340 - miniCycles= 9 - PPUCycles = 89349/ 89342
CPUCycles = 89341 - miniCycles= 9 - PPUCycles = 89350/ 89342 <-- DISABLED HERE!
CPUCycles = 0 - miniCycles= 9 - PPUCycles = 6820/ 89342
CPUCycles = 89341 - miniCycles= 9 - PPUCycles = 89350/ 89342 <-- DISABLED HERE!
CPUCycles = 59 - miniCycles= 9 - PPUCycles = 6820/ 89342

while using the 'right' one I get,

Code:
CPUCycles = 89289 - miniCycles= 9 - PPUCycles = 89298/ 89342
CPUCycles = 89310 - miniCycles= 9 - PPUCycles = 89319/ 89342
CPUCycles = 89331 - miniCycles= 9 - PPUCycles = 89340/ 89342
CPUCycles = 10 - miniCycles= 9 - PPUCycles = 6820/ 89342
CPUCycles = 89330 - miniCycles= 9 - PPUCycles = 89339/ 89342
CPUCycles = 89331 - miniCycles= 9 - PPUCycles = 89340/ 89342
CPUCycles = 89332 - miniCycles= 9 - PPUCycles = 89341/ 89342 <-- DISABLED HERE!
CPUCycles = 89333 - miniCycles= 9 - PPUCycles = 89342/ 89342
CPUCycles = 89334 - miniCycles= 9 - PPUCycles = 89343/ 89342
CPUCycles = 89335 - miniCycles= 9 - PPUCycles = 89344/ 89342
CPUCycles = 89336 - miniCycles= 9 - PPUCycles = 89345/ 89342
CPUCycles = 89337 - miniCycles= 9 - PPUCycles = 89346/ 89342
CPUCycles = 89338 - miniCycles= 9 - PPUCycles = 89347/ 89342
CPUCycles = 89339 - miniCycles= 9 - PPUCycles = 89348/ 89342
CPUCycles = 89340 - miniCycles= 9 - PPUCycles = 89349/ 89342
CPUCycles = 89341 - miniCycles= 9 - PPUCycles = 89350/ 89342
CPUCycles = 0 - miniCycles= 9 - PPUCycles = 6820/ 89342
CPUCycles = 89341 - miniCycles= 9 - PPUCycles = 89350/ 89342
CPUCycles = 59 - miniCycles= 9 - PPUCycles = 6820/ 89342

Note that I'm running the PPU for CPUCycles + miniCycles. Could you tell me if you get that double suppression?

by blargg on 2006-08-21 (#16406)

Sorry, I'm not making much sense of your data or explanation.

by Disch on 2006-08-21 (#16410)

To me, it looks like the timestamps are way off... like the test ROM is just syncing improperly. I'd guess your "wrong" method is working because it's offsetting the timestamp of the read by the same number of cycles the ROM is off in its sync (coincidence? or maybe you made a similar mistake twice and only corrected one of them?)

Here's a quick log I made... maybe it'll help. My timestamps are a little different -- I start my frame with the idle scanline (scanline after rendering, before VBL), so VBL starts at PPU cycle 341. Additionally -- the timestamp in this log is 1 more than you'd expect (suppresses on 341 not on 340)... as for why, don't ask ;P

Code:
-- frame $7E and earlier read $2002 every 21
ppu cycles -- presumably to sync --
frame: 0000007E -- cyc: 226 --- VBL supress: No
frame: 0000007E -- cyc: 247 --- VBL supress: No
frame: 0000007E -- cyc: 268 --- VBL supress: No
frame: 0000007E -- cyc: 289 --- VBL supress: No
frame: 0000007E -- cyc: 310 --- VBL supress: No
frame: 0000007E -- cyc: 331 --- VBL supress: No
frame: 0000007E -- cyc: 352 --- VBL supress: No

-- by this frame, it's synced and reads once per frame
for so many frames until it hits the time right before VBl raises --

frame: 00000080 -- cyc: 330 --- VBL supress: No
frame: 00000081 -- cyc: 331 --- VBL supress: No
frame: 00000082 -- cyc: 332 --- VBL supress: No
frame: 00000083 -- cyc: 333 --- VBL supress: No
frame: 00000084 -- cyc: 334 --- VBL supress: No
frame: 00000085 -- cyc: 335 --- VBL supress: No
frame: 00000086 -- cyc: 336 --- VBL supress: No
frame: 00000087 -- cyc: 337 --- VBL supress: No
frame: 00000088 -- cyc: 338 --- VBL supress: No
frame: 00000089 -- cyc: 339 --- VBL supress: No
frame: 0000008A -- cyc: 340 --- VBL supress: No
frame: 0000008B -- cyc: 341 --- VBL supress: Yes
frame: 0000008C -- cyc: 342 --- VBL supress: No
frame: 0000008E -- cyc: 341 --- VBL supress: Yes
frame: 0000008E -- cyc: 401 --- VBL supress: No

these times are the time of the $2002 read (last cycle in the LDA $2002 instruction). I didn't see any point in logging the PPU timestamp since my emu syncs up on $2002 reads anyway, so it'd be redundant.

EDIT

yeah, see? after comparing our logs, it looks to me like your timestamp is always off by 3 CPU cycles.

Code:
CPUCycles = 89330 - miniCycles= 9 - PPUCycles = 89339/ 89342

This is the first 'once-per-frame' read performed. If you notice, the read SHOULD be happening on cycle 89330, but it's happening 9 cycles later.

So the ROM seems to be synced up improperly and is off by 3 cpu cycles somehow.

EDIT AGAIN

To be honest... I'm baffled as to how it's making it to test 8... since if it's off by 3 cycles it should be failing some of the earlier tests.

Only thing I can think of is you're not syncing up your PPU to the right time on $2002 reads. Like if you were syncing up to 'CPUCycles' and not 'CPUCycles + (minicycles * 15)' -- that would probably explain all the behavior you're getting.

Last edit, I swear:

Now that I think of it... that's a very distinct possibility. Considering what you said before:

Quote:
Of course, I call PPU->Run( CPUCycles + cycles ); before doing that check..

if 'cycles' is 3 or some other low value there (like if you didn't multiply it by 15 or 16 depending on NTSC/PAL mode)... you're effectively adding 0, since 3 isn't even enough to push it to the next PPU cycle (which would need at least 5)

so if you ARE doing PPU->Run( CPUCycles + cycles );... you want to be doing PPU->Run( CPUCycles + (cycles * CPUCycleBase) );

Hopefully that's the problem
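The suspected unit mix-up can be illustrated with a small sketch. It assumes the 5/15 master-cycle scaling from earlier in the thread; the function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::int64_t kMasterPerPpu = 5;
constexpr std::int64_t kCpuCycleBase = 15;  // NTSC value used in this thread

// PPU cycle reached when the PPU is run up to a master-cycle timestamp.
constexpr std::int64_t ppu_cycle_at(std::int64_t master) {
    return master / kMasterPerPpu;
}

// Suspected bug: raw CPU cycles added to a master-cycle timestamp.
constexpr std::int64_t sync_target_wrong(std::int64_t cpu_master, std::int64_t cycles) {
    return cpu_master + cycles;
}

// Intended behaviour: scale the CPU cycles to master cycles first.
constexpr std::int64_t sync_target_right(std::int64_t cpu_master, std::int64_t cycles) {
    return cpu_master + cycles * kCpuCycleBase;
}
```

Starting from PPU cycle 89330 with a mini-timestamp of 3 CPU cycles, the unscaled version leaves the PPU at 89330, while the scaled version advances it to 89339.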

by Muchaserres on 2006-08-21 (#16413)

Aggh.. I'm getting mad. Let's see.. just to clear this up. On 0x2002 reads, I should update the PPU to CPUCycles+miniCycles and then check if CPUCycles+miniCycles equals the length of a frame minus one cycle.. is that right?

The 3 cycles difference between CPU and PPU you see in my log is my fault. You should add miniCycles to CPUCycles to get the real number, you know..

by Disch on 2006-08-21 (#16415)

Muchaserres wrote:
On 0x2002 reads, I should update the PPU to CPUCycles+miniCycles

yes.

I'm just wondering if you have your bases right.

You said before you have 1 PPU cycle equal 5 'master' cycles, and 1 CPU cycle equal 15 master cycles. Since the minicycle reflects CPU cycles, it should have the same 15 master cycle base.

If you're just adding 3 when you mean to be adding 45, you'll have problems like the stuff you're experiencing now. Could this be what is happening?

Quote:
The 3 cycles difference between CPU and PPU you see in my log is my fault. You should add miniCycles to CPUCycles to get the real number, you know..

I figured as much... but that's not the 3 cycle difference I was talking about.

The read from your log I pasted in my previous post happens on cycle 89339. It should be happening on 89330 (that's the time it's happening in my log).

Whether or not your timestamp being off by 3 cycles and minicycles always being 3 CPU cycles are at all related or whether they're just a coincidence I couldn't tell you. All I can say is somewhere, somehow, the ROM is syncing up improperly, causing it to be off by 3 CPU cycles by the time it checks for the VBL suppression.

by blargg on 2006-08-21 (#16419)

The NESdevWiki page covers the behavior in detail: PPU Frame Timing

Quote:
Reading $2002 within a few PPU clocks of when VBL is set results in special-case behavior. Reading one PPU clock before reads it as clear and never sets the flag or generates NMI for that frame. Reading on the same PPU clock or one later reads it as set, clears it, and suppresses the NMI for that frame. Reading two or more PPU clocks before/after it's set behaves normally (reads flag's value, clears it, and doesn't affect NMI operation).
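The quoted rule translates fairly directly into code. The sketch below is one possible reading of it, with an invented VblState struct and a 'delta' parameter defined as the PPU clock of the read minus the PPU clock at which VBL is set in that frame.

```cpp
#include <cstdint>

struct VblState {
    bool flag = false;          // $2002 bit 7
    bool suppress_set = false;  // the flag will not be set this frame
    bool suppress_nmi = false;  // no NMI will be generated this frame
};

// delta = (PPU clock of the read) - (PPU clock at which VBL is set).
// Returns the value of bit 7 as seen by the read.
inline bool read_2002(VblState& s, std::int64_t delta) {
    if (delta == -1) {
        // One clock before: reads clear, flag and NMI never happen this frame.
        s.suppress_set = true;
        s.suppress_nmi = true;
        return false;
    }
    if (delta == 0 || delta == 1) {
        // Same clock or one later: reads set, clears it, suppresses the NMI.
        s.flag = false;
        s.suppress_nmi = true;
        return true;
    }
    // Two or more clocks away: normal read-and-clear, NMI unaffected.
    bool seen = s.flag;
    s.flag = false;
    return seen;
}
```

How the caller acts on suppress_set and suppress_nmi at the start of VBL is left open here, since that depends on how the rest of the PPU is structured.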

by Muchaserres on 2006-08-25 (#16562)

Disch wrote:
If you're just adding 3 when you mean to be adding 45, you'll have problems like the stuff you're experiencing now. Could this be what is happening?

I've checked this several times, I'm adding 45 cycles.

Just an idea. For every frame, I do,

Code:
FRMCycles = ...; // Frame master cycles calculation.
CPU->Run( FRMCycles ); // Runs for FRMCycles, calling PPU->Run() whenever something interesting happens.
PPU->Run( CPUCycles ); // Runs for the remaining cycles.
NMIStuff(); // You know..

But as CPUCycles may sometimes be a little greater than FRMCycles, NMIStuff() accuracy can be compromised. Could this be my problem? How do you handle this?

by Disch on 2006-08-25 (#16566)

Muchaserres wrote:
Code:
FRMCycles = ...; // Frame master cycles calculation.
CPU->Run( FRMCycles ); // Runs for FRMCycles, calling PPU->Run() whenever something interesting happens.
PPU->Run( CPUCycles ); // Runs for the remaining cycles.
NMIStuff(); // You know..

If 'NMIStuff();' includes setting the VBlank flag... then yes, this is a problem, and quite possibly the problem. If the VBlank flag is being raised by PPU->Run then you should be fine.

As for why that would be a problem... consider the following:

Code:

; Imagine LDA $2002 is executed 1 CPU cycle before the VBl flag raises:

cycle 0 -- fetch opcode (vbl next cycle)
cycle 1 -- fetch low adr (vbl set!)
cycle 2 -- fetch high adr (vbl still set)
cycle 3 -- read $2002 (bit 7 should be set)

If the vblank flag is being set only BETWEEN cpu instructions, the timing of it gets rounded off. This very well could be why your emu is desyncing (assuming this is the problem).

My Frame function started out on a similar process but has since become a bit more complex (had to as I added other stuff, like sprite overflow prediction, mapper notification on end of vblank, and other crap). Pretty much, the premise is exactly the same, but I treat NMIs more or less the same as IRQs... in that I just keep a timestamp of the 'nextNMI', and when the CPU timestamp reaches that time, I trigger an NMI.
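A bare-bones sketch of that 'nextNMI' idea, with times in CPU cycles. The Cpu struct and its fields are invented for the example, and it leaves out the NMI latency that comes up later in the thread; the 7-cycle cost is the length of the 6502 interrupt sequence.

```cpp
#include <cstdint>
#include <limits>

struct Cpu {
    static constexpr std::int64_t kNever = std::numeric_limits<std::int64_t>::max();
    std::int64_t timestamp = 0;      // CPU cycles executed so far
    std::int64_t next_nmi = kNever;  // time at which an NMI becomes due
    int nmi_count = 0;

    // Run one step: either service a due NMI or execute one instruction
    // of the given length.
    void step(int instr_cycles) {
        if (timestamp >= next_nmi) {
            next_nmi = kNever;  // one NMI per scheduled time
            ++nmi_count;
            timestamp += 7;     // interrupt sequence takes 7 CPU cycles
            return;
        }
        timestamp += instr_cycles;
    }
};
```

The PPU side would set next_nmi when it knows VBL is coming, and reset it to kNever if a $2002 read or a $2000 write cancels the NMI.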

by Muchaserres on 2006-09-12 (#17274)

OK! So, I finally got it working properly. Now my emu passes all errors from test roms #1 to #5, with no hacks, nor any other strange mechanisms. At last, I get precise timing stamps on my logs. First of all, I'd like to thank you all for being so patient with my posts. Thanks dudes!

Now, for the remaining tests. In "6.nmi_disable" I get error #3. The problem here is that I really dunno what this rom is testing at all. I mean, I'm doing exactly what the wiki says about "VBL Flag Timing". What does "NMI should occur when disabled 3 PPU clocks after VBL" exactly mean? Is the wiki right on this?

In "7.nmi_timing" I get error #7 ("NMI occurred 1 PPU clock too late"), but well.. I suppose I'd better work on test rom #6 before dealing with this one. Anyway, does anybody know the exact time delay (in cycles) between NMI triggering and NMI execution? I mean, how do you determine the exact time for "nextNMI"?

And last, does anybody know the exact behaviour of "s0.nes" on the real thing? Maybe a photo or something.. I'm just curious.

Thanks!

by blargg on 2006-09-12 (#17299)

Quote:
In "6.nmi_disable" I get error #3. The problem here is that I really dunno what this rom is testing at all.

My fault. It's testing what happens when NMI is disabled (via $2000) within a few PPU clocks of when it would normally occur. In light of recent testing, the error descriptions are off by one PPU clock, but the tests are still valid (since they pass on a NES). At some point I'll be improving the documentation of this kind of behavior.

Quote:
In "7.nmi_timing" I get error #7 ("NMI occurred 1 PPU clock too late"), but well.. I suppose I'd better work on test rom #6 before dealing with this one.

In this case, failing 6.nmi_disable isn't a problem.

Quote:
Anyway, does anybody know the exact time delay (in cycles) between NMI triggering and NMI execution? I mean, how do you determine the exact time for "nextNMI"?

Based on recent testing, the earliest the NMI can occur is two CPU clocks after the VBL flag is set. So if VBL is set on the next to last clock of an instruction, the NMI will occur after the instruction finishes. If it's set earlier than that clock, the NMI will still occur after that instruction finishes. If it's set later than that clock, it will occur after the next instruction finishes.
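One way to read that rule, with times in CPU clocks: an instruction starting at 'start' and taking 'length' clocks occupies clocks start .. start+length-1 and ends at start+length. The helper names below are hypothetical.

```cpp
#include <cstdint>

// An NMI can be taken at an instruction boundary only if VBL was set
// at least two CPU clocks before that boundary.
inline bool nmi_taken_at(std::int64_t boundary, std::int64_t vbl_set) {
    return boundary >= vbl_set + 2;
}

// True if the NMI is serviced right after this instruction, false if it
// has to wait until the next instruction has finished.
inline bool nmi_after_this_instruction(std::int64_t start, int length,
                                       std::int64_t vbl_set) {
    return nmi_taken_at(start + length, vbl_set);
}
```

For a 4-clock LDA absolute starting at clock 100, VBL set on clock 102 (next to last) or earlier gives an NMI right after the instruction, while VBL set on clock 103 (the last clock) pushes it past the following instruction.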

by Muchaserres on 2006-09-13 (#17370)

blargg wrote:
So if VBL is set on the next to last clock of an instruction

I don't get this at all.. I imagine you mean VBL being set on the last CPU clock of an instruction or earlier makes the NMI occur at the end of that instruction, while VBL being set later makes it occur at the end of the next instruction.

Just in case, could you describe "7.nmi_timing"'s operation, please? Thx.

<EDIT>

I'm confused. So..

1.- If VBL occurs before the last two CPU cycles of an instruction NMI occurs immediately after that instruction. If VBL occurs later, there's a one cycle delay. This option gives me "NMI occurred 1 PPU clock too late" (error #7). Here's how I implement that..

Code:
if( Nes->CPUCycles >= Nes->NMITime )
{
if( Nes->Ppu->Regs[ 0 ] & 0x80 )
{
   NMI( 2 ); // NMI now!
   if( ( Nes->CPUCycles - Nes->NMITime ) <= 30 ) NMI( 1 ); // Wait for 1 instruction (30 comes from 2 cycles * 15 base).
}
Nes->NMITime = 0x7FFFFFFF;
}

2.- If we reduce the margin to one CPU cycle, I get "NMI occurred 1 PPU clock too early" (error #4). We would only change the 30 to a 15 (1 CPU cycle).

3.- But, according to what dvdmth said in another post, "fast NMI" occurs always except when VBL occurs during the last cycle of an instruction. So the condition would look like this, giving me "NMI occurred 3 or more PPU clocks too early" (error #2),

Code:
if( ( Nes->CPUCycles - Nes->NMITime ) == 15 ) NMI( 1 );

Sorry if I'm being a pain, but I'd like to get this clear.

</EDIT>

by Muchaserres on 2006-09-16 (#17567)

Uhm.. it works fine if I set the margin to 20 (so, 4 PPU cycles). It would somehow make sense if the latency were 1 CPU cycle (setting the margin to 15). The remaining PPU cycle could be some bug in my even/odd frame scheme. Anyone familiar with this issue?

by Muchaserres on 2006-09-17 (#17631)

After reading "Which clock is IRQ/NMI checked on?" at "NES Hardware / CopyNES", I think I finally understand how this stuff works. Look..

blargg wrote:
Apparently /NMI and /IRQ are checked approximately two cycles before the next opcode fetch, and if they are set, that next opcode fetch will be the first cycle of an interrupt vectoring sequence.

OK, so if NMITime holds the time at which /NMI is set and CPUCycles the number of cycles the CPU has executed so far, then we do something like this,

Code:
if( CPUCycles >= NMITime )
    if( ( CPUCycles - NMITime ) < TwoCPUCycles ) SlowNMI(); else FastNMI();

But then this comes,

blargg wrote:
Since the memory read latches data near the end of a cycle, my conclusion is that /NMI and /IRQ are checked near the middle of second from the last cycle of an instruction. For LDA absolute, this means the third cycle (1: opcode fetch, 2: low byte of address, 3: high byte of address, 4: read, 5: set flags). For NOP, this means the first cycle (1: opcode fetch, 2: dummy fetch, 3: nothing).

So LDA absolute takes 4 "visible" cycles, while NOP takes 2 of those. This time we do something like this,

Code:
if( CPUCycles >= NMITime )
    if( ( CPUCycles - NMITime ) < OneCPUCycles ) SlowNMI(); else FastNMI();

Now, here comes the reason why I think my code passes the test. Taking NOP as an example, the check happens in the middle of the first cycle (opcode fetch). In my emu, 1 CPU cycle equals 15 master cycles, while 1 PPU cycle equals 5 of them. In the code above, I use 20 master cycles as the limit, so 20/15=1.33 ~ 1.17 CPU cycles. As it is "impossible" to get the exact time I must use "<=" instead of "<". And that's it!

Is this reasoning correct?

Thx.

by blargg on 2006-09-17 (#17653)

At beginning of execution of an instruction in emulator:
Code:
if ( cpu_time >= nmi_asserted_time + 2 )
    clear nmi
    begin interrupt vectoring
else
    execute current instruction

Or, in the catch up model,
Code:
earliest_interrupt_time = nmi_asserted_time + 2

I'm not following your descriptions in your posts because I don't know what you're talking about. I don't know how your emulator works or where exactly the code examples you're posting are being executed.

by Muchaserres on 2006-09-20 (#17821)

OK, let's see. In terms of PPU cycles, the earliest a NMI can occur is the time a NMI is asserted plus 6 PPU cycles (2 CPU cycles). What I'm saying is that the only way my emu passes the test is by adding 5 PPU cycles (1.67 CPU cycles).

I'm doing what you say, using a catch up model, on my CPU core run() function, just before the execution of every opcode.

By comparing both NMI logs I can see that the first difference occurs on the second frame, and then every 11 frames or so. What bugs me the most is that I pass every other PPU test, and all versions of your CPU timing test, so I don't really know how to tackle the problem.

by blargg on 2006-09-21 (#17834)

Sounds like it may be an off-by-one error, perhaps due to a </> needing to be a <=/>=, or vice-versa. If everything passes when you add 5 PPU cycles, then your emulator may be working accurately and your other code simply giving this delay a different meaning (i.e. the number 5 has the meaning of "6 PPU clocks between NMI assertion and NMI occurring"). I kind of like tracking down things like this, so you could send me the source and I could take a look.

by Muchaserres on 2006-09-22 (#17892)

Uhm.. I've been thinking about that, but everything seems to be OK. Here's the exact code I use in my CPU core,

Code:
while( Nes->CPUCycles < Nes->FRMCycles )
{
if( Nes->CPUCycles >= ( Nes->NextNMI + 30 ) ) // 2 * 15 CPU master cycles, 6 * 5 PPU master cycles
{
   Nes->NextNMI = 0x7FFFFFFF;
   if( Nes->Ppu->Regs[ 0 ] & 0x80 ) { _NMI(); }
   if( Nes->CPUCycles >= Nes->FRMCycles ) break;
}
opcode = READMEM( PC++, 0 );
switch( opcode )
{
   case 0x00: { BRK(); break; }
   // (...)
}
ADD_CYCLES( cycleTable[ opcode ] );
}

while the time for the NMI is set like this,

Code:
NextNMI = ( 1 * 341 * 5 ); // Rest scanline first, then 20 VBL scanlines

As I said in my previous post, using a 25 instead of the actual 30 makes everything work fine. The same happens setting "NextNMI" to 340 PPU cycles instead of the actual 341. When is the exact time at which /NMI is triggered? I mean, I actually do that at the end of PPU cycle 341 in the rest scanline. I've debugged the process step by step and everything looks right to me. If there's no obvious error in the code above, my next try will be an even deeper CPU cycle-per-cycle revision, but I doubt I'll find anything as my emu actually passes your last CPU timing test.

On your debug offer, thank you very much! Just let me try a few more things first.

by Muchaserres on 2006-09-23 (#17936)

Meh.. I cannot come up with a clean implementation for tests #6 and #7. I'm setting my emu aside. I'm completely tired, as it's been about two weeks of absolutely 0 progress.

by Muchaserres on 2006-09-24 (#17964)

Today I felt inspired, so I finally came up with a clean implementation for those tests. Now my emu's passing all PPU/CPU tests! There's only one issue I still need to investigate: the reason why I need a 25 instead of 30.

Just one question, to close the topic: when enabling NMIs (through a write to 0x2000) inside VBL, is it necessary for the NMI flag to be disabled? I mean, what would happen if NMIs were enabled during VBL with the flag already set?

by blargg on 2006-09-25 (#17972)

I don't follow your question. What do you mean by disabling the NMI? $2000 bit 7 is the only control you have. The CPU registers an NMI request when the /NMI line transitions from high to low (edge sensitive). Once this occurs, the NMI will occur. If you write $00 then $80 to $2000 during blanking, you'll get another NMI since this will cause the /NMI line to go high then low again.
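The edge-sensitive behaviour can be sketched like this, assuming the /NMI line is asserted whenever both the VBL flag and the $2000 bit 7 enable are set. The NmiLine struct and its members are made up for the example.

```cpp
#include <cstdint>

struct NmiLine {
    bool vbl_flag = false;    // $2002 bit 7
    bool nmi_enable = false;  // $2000 bit 7
    bool line_low = false;    // true while /NMI is asserted (low)
    int pending = 0;          // high-to-low edges seen by the CPU

    // Recompute the line level and count a request on each falling edge.
    void update() {
        bool low = vbl_flag && nmi_enable;
        if (low && !line_low) ++pending;
        line_low = low;
    }
    void write_2000(std::uint8_t v) { nmi_enable = (v & 0x80) != 0; update(); }
    void set_vbl(bool on) { vbl_flag = on; update(); }
};
```

Under this model, rewriting $80 while the line is already low does nothing, but writing $00 and then $80 during blanking produces a second edge and therefore a second NMI.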

by Muchaserres on 2006-09-25 (#17976)

That's what I was looking for.

by raidtab on 2006-10-26 (#18849)

I've got the same problem as the OP, and I've been trying to implement the "mini-cycles" idea into my processor code. I've had a look at the 6502_cpu.txt document but I'm a little confused with parts of it.

Quote:
Zero page indexed addressing

Read instructions (LDA, LDX, LDY, EOR, AND, ORA, ADC, SBC, CMP, BIT,
LAX, NOP)

 #   address   R/W description
--- ---------- --- ------------------------------------------
 1      PC      R  fetch opcode, increment PC
 2      PC      R  fetch address, increment PC
 3   address    R  read from address, add index register to it
 4  address+I*  R  read from effective address

Notes: I denotes either index register (X or Y).

* The high byte of the effective address is always zero,
i.e. page boundary crossings are not handled.

Am I right in understanding that the CPU makes 2 reads in the one instruction, one to the address argument and another to the address argument + register? e.g.

LDA $01, X (where X = 10)

will make a read to 01 (and presumably throw away the result), and then on the next cycle will read from 11?

Also:

Quote:
Write instructions (STA, SHA)

 #    address   R/W description
--- ----------- --- ------------------------------------------
 1      PC       R  fetch opcode, increment PC
 2      PC       R  fetch pointer address, increment PC
 3    pointer    R  fetch effective address low
 4   pointer+1   R  fetch effective address high,
                    add Y to low byte of effective address
 5   address+Y*  R  read from effective address,
                    fix high byte of effective address
 6   address+Y   W  write to effective address

Notes: The effective address is always fetched from zero page,
i.e. the zero page boundary crossing is not handled.

* The high byte of the effective address may be invalid
at this time, i.e. it may be smaller by $100.

So if the high byte of the effective address is invalid on cycle 5, should the read still happen regardless?

Thanks

by tepples on 2006-10-26 (#18852)

Yes, reads from half-computed addresses still happen.
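As a sketch of what those two tables imply for the addresses that appear on the bus (helper names are invented, and the numbers in the LDA $01,X example are taken as hexadecimal):

```cpp
#include <cstdint>
#include <vector>

// Addresses read by a zero page indexed load, in cycle order.
inline std::vector<std::uint16_t> zp_indexed_read_addrs(std::uint16_t pc,
                                                        std::uint8_t operand,
                                                        std::uint8_t index) {
    return {
        pc,                                          // cycle 1: opcode
        static_cast<std::uint16_t>(pc + 1),          // cycle 2: operand
        operand,                                     // cycle 3: dummy read, unindexed
        static_cast<std::uint8_t>(operand + index)   // cycle 4: wraps within zero page
    };
}

// Address read on cycle 5 of STA (zp),Y: Y is added to the low byte only,
// so the high byte may still be short by $100.
inline std::uint16_t half_computed_addr(std::uint16_t base, std::uint8_t y) {
    return static_cast<std::uint16_t>((base & 0xFF00) | ((base + y) & 0x00FF));
}

// Address written on cycle 6, with the high byte fixed up.
inline std::uint16_t effective_addr(std::uint16_t base, std::uint8_t y) {
    return static_cast<std::uint16_t>(base + y);
}
```

With a base of $20F0 and Y = $20, the cycle 5 read goes to $2010 and the cycle 6 write goes to $2110; without a page crossing both addresses are the same.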