INC question - possible Nintendulator bug?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
INC question - possible Nintendulator bug?
by on (#109364)
I have been running Blargg's test ROMs through Nintendulator (many thanks to Blargg and Q for these!).
Specifically, I am using this test from the instr_test suite: 01-implied.nes
http://blargg.8bitalley.com/parodius/ne ... est-v3.zip

Fairly early on, this sequence occurs:
Code:
E881  C8        INY                             A:00 X:FF Y:FF P:E5 SP:FB CYC:198 SL:244
E882  D0 FB     BNE $E87F                       A:00 X:FF Y:00 P:67 SP:FB CYC:204 SL:244
E884  E6 0F     INC $0F = 01                    A:00 X:FF Y:00 P:67 SP:FB CYC:210 SL:244


Remember that the cycles here are PPU cycles. The trace shows opcode D0 (BNE) taking 6 of them, which is equivalent to 2 CPU clock cycles (the NTSC PPU runs 3 cycles per CPU cycle). The best information I can find says this instruction takes between 3 and 5 CPU cycles to complete:
3 if branch is not taken
4 if the branch is taken
5 if the branch is taken, and a page boundary crossed

Is Nintendulator's timing off here, or am I misunderstanding this situation? Thanks for any advice.
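For reference, turning the trace's CYC/SL columns into CPU cycles can be sketched as below. This is a hypothetical helper, not Nintendulator code, and it assumes NTSC timing (341 PPU dots per scanline, 3 dots per CPU cycle) with both trace lines in the same frame, so the skipped dot on odd frames is ignored.

```c
#include <assert.h>

/* Assumptions: NTSC PPU, 341 dots per scanline, 3 dots per CPU cycle,
 * both trace lines in the same frame (odd-frame dot skip ignored). */
#define DOTS_PER_SCANLINE  341
#define DOTS_PER_CPU_CYCLE 3

/* Linear dot position within the frame, from a trace line's SL and CYC. */
long ppu_dot(int scanline, int cyc)
{
    return (long)scanline * DOTS_PER_SCANLINE + cyc;
}

/* CPU cycles elapsed between two trace lines (SL0/CYC0 -> SL1/CYC1). */
long cpu_cycles_between(int sl0, int cyc0, int sl1, int cyc1)
{
    long dots = ppu_dot(sl1, cyc1) - ppu_dot(sl0, cyc0);
    return dots / DOTS_PER_CPU_CYCLE;
}
```

With the trace above, `cpu_cycles_between(244, 198, 244, 204)` gives 2 for the INY line and `cpu_cycles_between(244, 204, 244, 210)` gives 2 for the BNE line.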
Re: INC question - possible Nintendulator bug?
by on (#109366)
2 CPU cycles if the branch isn't taken, 3 if it is taken, and 4 if it is taken and crosses a page.
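That rule can be written as a small function. This is a sketch with a hypothetical name; the important assumption is that the page check compares the address just after the two-byte branch instruction with the target, not the address of the branch opcode itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: CPU cycles for a 6502 relative branch (BNE, BEQ, ...)
 * whose opcode is at opcode_addr. offset is the signed 8-bit operand. */
int branch_cycles(uint16_t opcode_addr, int8_t offset, int taken)
{
    /* PC has already moved past the opcode and the offset byte
     * before the offset is added. */
    uint16_t next   = (uint16_t)(opcode_addr + 2);
    uint16_t target = (uint16_t)(next + offset);

    if (!taken)
        return 2;                                /* not taken */
    if ((next & 0xFF00) != (target & 0xFF00))
        return 4;                                /* taken, different page */
    return 3;                                    /* taken, same page */
}
```

For the BEQ at $CFFE with offset $05 this gives 3, since $D000 and $D005 share a page. A branch whose opcode sits at $80FF or $80FE also gets no penalty just because the PC moved into $81xx while fetching the instruction.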
Re: INC question - possible Nintendulator bug?
by on (#109367)
http://www.obelisk.demon.co.uk/6502/reference.html is a good basic reference for instruction timing. You might be misled if you read something like http://nesdev.com/6502_cpu.txt (note its "Notes:" section, though), because there the fetch of the next opcode is listed among the steps, and the next instruction won't need to do that fetch again.

For accurate emulation, you could just follow the steps in the latter document of course, but you will still need to keep in mind that trickiness for the branch instructions.
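Following the step list in 6502_cpu.txt literally, a branch can be modeled cycle by cycle as below, counting every listed step except the final opcode fetch, which belongs to the next instruction. This is only a sketch: `branch_bus_trace` and the flat 64 KiB `mem` image are hypothetical, and the caller decides whether the branch is taken.

```c
#include <assert.h>
#include <stdint.h>

/* Records the addresses a branch reads, one per cycle, in reads[0..4].
 * Returns the cycles the branch itself costs: all listed steps minus the
 * last one, which is the next instruction's opcode fetch.
 * mem: hypothetical 64 KiB memory image. taken: branch condition result. */
int branch_bus_trace(const uint8_t *mem, uint16_t pc, int taken,
                     uint16_t reads[5])
{
    int n = 0;

    reads[n++] = pc;                  /* step 1: fetch opcode, PC++ */
    pc++;
    int8_t offset = (int8_t)mem[pc];
    reads[n++] = pc;                  /* step 2: fetch offset, PC++ */
    pc++;

    reads[n++] = pc;                  /* step 3: read at PC */
    if (!taken)
        return n - 1;                 /* step 3 was the next opcode fetch */

    /* Taken: the offset is added to PCL only; PCH is fixed a cycle later. */
    uint16_t target  = (uint16_t)(pc + offset);
    uint16_t partial = (uint16_t)((pc & 0xFF00) | (target & 0x00FF));

    reads[n++] = partial;             /* step 4: read, PCH possibly wrong */
    if (partial == target)
        return n - 1;                 /* step 4 was the real opcode fetch */

    reads[n++] = target;              /* step 5: PCH fixed, real fetch */
    return n - 1;
}
```

For the BEQ at $CFFE the reads are $CFFE, $CFFF, $D000, $D005, so it costs 3 cycles. For a taken branch at $80F0 with offset $20, step 4 reads $8012 because PCH hasn't been fixed yet, and the real opcode fetch at $8112 comes one cycle later, for 4 cycles.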
Re: INC question - possible Nintendulator bug?
by on (#109368)
Thanks for the help everyone, I was just using inaccurate info.
Re: INC question - possible Nintendulator bug?
by on (#109376)
It isn't really inaccurate; it's more that the last cycle of an opcode overlaps the first cycle of the next opcode, since the CPU can start decoding ahead of time. For example, EOR imm takes 3 cycles in theory (one to fetch the opcode, one to fetch the data, one to execute the instruction), but since the last cycle doesn't need to touch memory, the CPU fetches the next instruction's opcode during it, effectively making EOR imm last 2 cycles.

EDIT: I'm not sure that's the exact opcode; I'm trying to remember something I once saw about C64 timings (the C64 also uses a 6502-family CPU).
Re: INC question - possible Nintendulator bug?
by on (#109411)
Thanks again for the help everyone, could someone explain this one to me as well?
This trace from Nintendulator shows a BEQ instruction taking 9 PPU clocks (3 CPU clocks). I don't see how this is the case.
To me it seems like this should take 4 CPU clocks: the two standard cycles, plus the extra cycle because the branch is taken, plus another cycle because it crosses a page boundary.

Code:
CFFC  C9 5A     CMP #$5A                        A:5A X:81 Y:69 P:25 SP:FB CYC:286 SL:1
CFFE  F0 05     BEQ $D005                       A:5A X:81 Y:69 P:27 SP:FB CYC:292 SL:1
D005  A9 AA     LDA #$AA                        A:5A X:81 Y:69 P:27 SP:FB CYC:301 SL:1


My best guess is that I am not understanding the page boundary properly - to me this looks like a crossing.
Re: INC question - possible Nintendulator bug?
by on (#109412)
PC=CFFE
fetch opcode then increment PC
PC=CFFF
fetch offset then increment PC
PC=D000
add offset $05 to PC: $D000 + $05 = $D005, no carry from the low byte, so no extra cycle
Re: INC question - possible Nintendulator bug?
by on (#109427)
Thank you! I couldn't sleep last night thinking about this issue - embarrassed that the answer is so obvious. Really appreciate the explanation of that one.
Re: INC question - possible Nintendulator bug?
by on (#109471)
The answer was never obvious to me, at least, and it wasn't helped by the ambiguous "when a branch crosses a page, an extra cycle is taken" that most documents give. It should be something like: "if the branch goes to an instruction that begins on a different page than the instruction just after the branch, an extra cycle is taken".
Re: INC question - possible Nintendulator bug?
by Sik on (#109474)
I think the rule of thumb is that if the high byte of the address bus changes you have to add an extra cycle (low and high bytes are updated on separate cycles, which can lead to some interesting situations with hardware if you aren't careful...).
Re: INC question - possible Nintendulator bug?
by blargg on (#109475)
"If the high byte changes" is ambiguous. A taken branch whose opcode is at $80FF involves a change in the high byte of the PC. So does one whose opcode is at $80FE and branch offset at $80FF. For example, it might not be clear to everyone that the branch offset is fetched and the PC incremented as normal before the addition takes place. It might seem as if the addition happens while the PC still points at the branch offset.
Re: INC question - possible Nintendulator bug?
by on (#109482)
Sik wrote:
I think the rule of thumb is that if the high byte of the address bus changes you have to add an extra cycle (low and high bytes are updated on separate cycles, which can lead to some interesting situations with hardware if you aren't careful...).

Better rule of thumb: if adding an 8-bit value to a 16-bit value produces a carry out of bit 7 (or, for a backward branch's negative offset, a borrow), you add an extra cycle to fix the high 8 bits.*

Example:

Code:
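/* word: 16-bit address split into bytes; byte: 8-bit value being added;
   cycle(): the emulator's per-cycle hook */
word.lo_byte += byte;            /* 8-bit add, wraps on overflow */

if ( word.lo_byte < byte )       /* wrapped, so bit 7 carried out */
{
    cycle();                     /* extra cycle to fix the high byte */
    word.hi_byte += 1;
}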


*Except for write-only instructions using Absolute,X, Absolute,Y, and (Indirect),Y addressing: there the extra cycle is fixed and always taken. :) Also, PC increments while fetching the opcode and operands don't incur this penalty.
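The same carry check decides the penalty for indexed addressing. Below is a minimal sketch for Absolute,X (Absolute,Y behaves the same), with hypothetical names. Reads pay the extra cycle only on a carry and writes always pay it, as the footnote says. Read-modify-write instructions such as INC abs,X also always take their fixed 7 cycles.

```c
#include <assert.h>
#include <stdint.h>

enum access { ACCESS_READ, ACCESS_WRITE, ACCESS_RMW };

/* Hypothetical helper: NMOS 6502 cycle count for an Absolute,X instruction
 * with the given base address and index register value. */
int abs_indexed_cycles(uint16_t base, uint8_t index, enum access kind)
{
    uint8_t lo    = (uint8_t)(base & 0xFF);
    uint8_t sum   = (uint8_t)(lo + index);  /* 8-bit add, wraps */
    int     carry = sum < index;            /* wrapped: carry out of bit 7 */

    switch (kind) {
    case ACCESS_READ:  return carry ? 5 : 4; /* e.g. LDA abs,X */
    case ACCESS_WRITE: return 5;             /* e.g. STA abs,X: always fixed */
    case ACCESS_RMW:   return 7;             /* e.g. INC abs,X: always fixed */
    }
    return -1;                               /* unknown access kind */
}
```

For example, LDA $12F0,X with X=$0F takes 4 cycles, while X=$10 carries into the high byte and takes 5; STA $12F0,X takes 5 either way.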
Re: INC question - possible Nintendulator bug?
by on (#109498)
blargg wrote:
"If the high byte changes" is ambiguous. A taken branch whose opcode is at $80FF involves a change in the high byte of the PC. So does one whose opcode is at $80FE and branch offset at $80FF. For example, it might not be clear to everyone that the branch offset is fetched and the PC incremented as normal before the addition takes place. It might seem as if the addition happens while the PC still points at the branch offset.

Actually, this reminds me: there's a massive issue with absolute branches (i.e. a full address instead of an offset). If the address operand happens to straddle a page boundary, it will not be read properly: the CPU reads the low byte from the end of the page and the high byte from the beginning of the same page, instead of from the beginning of the next page. This is because the CPU doesn't increment the address properly when reading.
Re: INC question - possible Nintendulator bug?
by tepples on (#109501)
I thought the JMP bug was just for indirect jumps like JMP ($xxFF), not for absolute jumps like $xxFE: JMP $9000.
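The NMOS indirect-JMP quirk can be sketched as follows, using a hypothetical helper and memory image. Only the pointer's low byte is incremented when fetching the second byte, so JMP ($02FF) takes its high byte from $0200. An absolute JMP is unaffected, because its operand bytes are fetched through normal 16-bit PC increments.

```c
#include <assert.h>
#include <stdint.h>

/* Resolves the target of JMP (ptr) the way an NMOS 6502 does.
 * mem: hypothetical 64 KiB memory image. */
uint16_t jmp_indirect_target(const uint8_t *mem, uint16_t ptr)
{
    /* The +1 is applied to the low byte only; no carry into the high byte. */
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));
    return (uint16_t)(mem[ptr] | (mem[hi_addr] << 8));
}
```

With $34 at $02FF, $12 at $0300 and $56 at $0200, JMP ($02FF) goes to $5634 rather than $1234; a pointer that doesn't end in $FF resolves normally.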
Re: INC question - possible Nintendulator bug?
by on (#109522)
tepples wrote:
I thought the JMP bug was just for indirect jumps like JMP ($xxFF), not for absolute jumps like $xxFE: JMP $9000.


Correct.