This thread is about when the APU's frame IRQ flag is set and when the IRQ occurs. I started a new thread because I expect some long posts and didn't want them bogged down in an already-long thread.
The best I can do is show several code sequences and the result they produce, and leave it to you to come up with a simple model to explain it. This is the same raw data I have to work with.
The comments in the following code show what is printed when it is run on my NES. The calls to sync_apu synchronize with the APU even/odd "jitter" so that the write to $4017 will be on an even internal cycle, such that there is no extra clock delay before the new mode begins. The times of some events are commented, relative to the write to $4017.
Delays are made with delay_yaN, with a comment indicating the total clocks taken by the ldy, lda, jsr, delay routine, and rts. In other words, deleting the three instructions that invoke the delay would cause the following instructions to execute that many clocks earlier.
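As a worked example of how the "read at" comments follow from the delay comments, here is a small Python sketch. It assumes the convention used in the listings (the $4017 write is clock 0 and the delay fills the clocks right after it) and that an absolute-addressed load takes 4 clocks with the actual read on the last one. The name read_clocks is made up for illustration.

```python
# Clock bookkeeping used in the code comments: the $4017 write is clock 0,
# the delay occupies the next `delay` clocks, and LDA/LDX/LDY absolute
# takes 4 clocks with the memory read on its final clock.
ABS_READ_CLOCKS = 4

def read_clocks(delay, reads=2):
    """Clocks (relative to the $4017 write) of back-to-back $4015 reads."""
    return [delay + ABS_READ_CLOCKS * (i + 1) for i in range(reads)]

print(read_clocks(29826))  # [29830, 29834]
```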
These demonstrate when the set IRQ flag becomes visible via a $4015 read, and the way the IRQ flag is apparently set multiple times (or not cleared when reads are at particular times):
Code:
print_a_x:
jsr debug_byte
txa
jsr debug_byte
rts
reset:
jsr setup_apu
jsr sync_apu
lda #$00 ; start mode
sta $4017 ; write at 0
ldy #48 ; 29826 delay
lda #123
jsr delay_ya1
lda $4015 ; read at 29830
ldx $4015 ; read at 29834
jsr print_a_x ; prints $00 $40
jsr sync_apu
lda #$00 ; start mode
sta $4017
ldy #48 ; 29827 delay
lda #123
jsr delay_ya2
lda $4015 ; read at 29831
ldx $4015
jsr print_a_x ; prints $40 $40
jsr sync_apu
lda #$00 ; start mode
sta $4017
ldy #48 ; 29828 delay
lda #123
jsr delay_ya3
lda $4015 ; read at 29832
ldx $4015
jsr print_a_x ; prints $40 $40
jsr sync_apu
lda #$00 ; start mode
sta $4017
ldy #48 ; 29829 delay
lda #123
jsr delay_ya4
lda $4015 ; read at 29833
ldx $4015
jsr print_a_x ; prints $40 $00
These demonstrate the time when the IRQ occurs:
Code:
irq:
bit $4015 ; clear IRQ flag
txa ; save current value of X
rti
reset:
jsr setup_apu
jsr sync_apu
lda #$40 ; clear frame IRQ flag first
sta $4017
cli
lda #$00 ; start mode
sta $4017 ; write at 0
ldy #48 ; 29828 delay
lda #123
jsr delay_ya3
ldx #0
ldx #1 ; first clock at 29831
; IRQ occurs at 29833
ldx #2
ldx #3
sei
jsr debug_byte ; prints $01
jsr sync_apu
lda #$40 ; clear frame IRQ flag first
sta $4017
cli
lda #$00 ; start mode
sta $4017 ; write at 0
ldy #48 ; 29827 delay
lda #123
jsr delay_ya2
ldx #0
ldx #1
ldx #2 ; first clock at 29832
; IRQ occurs at 29834
ldx #3
sei
jsr debug_byte ; prints $02
EDIT: Clarified delay_yaN and added the time when the IRQ occurs.
I'm a little unclear on the delay_ya2 bit in the second code snippet. The comment says it will delay 29827... but from when to when? Does the delay include the LDY/LDA/JSR lines? What about the RTS?
Or basically... when is that LDX #0 happening in relation to the $4017 write?
The original post has been edited to clarify the delays and time of IRQ. The comment in the delay is the most useful value, i.e. it includes everything: the setup code, delay routine, and RTS. I wrote a little command-line tool that takes an arbitrary delay and generates the code to call the delay routine. It's very useful for writing test code.
I could use this format for posting future reverse-engineering results. It's good in that it's unambiguous and not subject to any interpretation error on my part, and can be directly tested for any errors. It would make it easier for me to post all the (possibly) new things I've found about the APU and PPU. Ideas for improving readability would be appreciated.
Code:
lda $4015 ; read at 29831
ldx $4015
jsr print_a_x ; prints $40 $40
Could that read be affected by the last read from $4015? I mean... could the last read not have cleared the IRQ flag and it's just giving you a dud value for that one?
If that's the case, then it looks like 29832 might be the first cycle where the IRQ flag is raised. It's far enough back to match your IRQ example, but enough ahead to match most of your $4015 reads (except for the above mentioned)
Quote:
Could that read be affected by the last read from $4015? I mean... could the last read not have cleared the IRQ flag and it's just giving you a dud value for that one?
I don't follow. I modified the test to read three times in a row (and starting four clocks earlier than before) and print those values; maybe that will answer this question.
Quote:
If that's the case, then it looks like 29832 might be the first cycle where the IRQ flag is raised. It's far enough back to match your IRQ example, but enough ahead to match most of your $4015 reads (except for the above mentioned)
I think the IRQ line needs to be asserted before the beginning of the last cycle of an instruction for an IRQ to occur (in place of the next instruction). When I get around to posting the exact PPU's VBL flag timing and NMI timing, you can compare behaviors.
Code:
jsr sync_apu
lda #$00 ; start mode
sta $4017 ; write at 0
ldy #50 ; 29822 delay
lda #118
jsr delay_ya5
lda $4015 ; read at 29826
ldx $4015 ; read at 29830
ldy $4015 ; read at 29834
jsr print_a ; prints $00
jsr print_x ; prints $00
jsr print_y ; prints $40
jsr sync_apu
lda #$00 ; start mode
sta $4017
ldy #50 ; 29823 delay
lda #118
jsr delay_ya6
lda $4015 ; read at 29827
ldx $4015 ; read at 29831
ldy $4015 ; read at 29835
jsr print_a ; prints $00
jsr print_x ; prints $40
jsr print_y ; prints $40
jsr sync_apu
lda #$00 ; start mode
sta $4017
ldy #50 ; 29824 delay
lda #118
jsr delay_ya7
lda $4015 ; read at 29828
ldx $4015 ; read at 29832
ldy $4015 ; read at 29836
jsr print_a ; prints $00
jsr print_x ; prints $40
jsr print_y ; prints $40
jsr sync_apu
lda #$00 ; start mode
sta $4017
ldy #50 ; 29825 delay
lda #118
jsr delay_ya8
lda $4015 ; read at 29829
ldx $4015 ; read at 29833
ldy $4015 ; read at 29837
jsr print_a ; prints $00
jsr print_x ; prints $40
jsr print_y ; prints $00
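One simple model that happens to reproduce every result posted so far: the flag is set on each of clocks 29831, 29832 and 29833, a read on the same clock sees the freshly set flag, and every read clears it afterwards. This is only a guess fitted to the printed values, not a statement about what the hardware does internally. SET_CLOCKS and read_4015 are names invented for this sketch.

```python
# Assumption fitted to the data above: the frame IRQ flag is set on three
# consecutive clocks, and a $4015 read returns the flag and then clears it.
SET_CLOCKS = (29831, 29832, 29833)

def read_4015(read_times):
    """Return the bit-6 value seen by each $4015 read at the given clocks."""
    flag = 0
    seen = []
    first = min(SET_CLOCKS[0], min(read_times))
    for clock in range(first, max(read_times) + 1):
        if clock in SET_CLOCKS:
            flag = 0x40        # the set lands before a read on the same clock
        if clock in read_times:
            seen.append(flag)  # the read returns the flag...
            flag = 0           # ...and then clears it
    return seen

for times in [(29830, 29834), (29831, 29835), (29832, 29836), (29833, 29837)]:
    print(times, [f"${v:02X}" for v in read_4015(times)])
```

A model that sets the flag only once at 29831 would predict $40 $00 for the reads at 29831/29835, which is not what the NES printed.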
So the flag is set on 29831... but the IRQ doesn't happen until 29833?
What happens if IRQs are disabled on 29832... does an IRQ still fire on 29833? Why does this behavior have to be so goofy! I'm astounded that so many emu authors said they got their emu to pass all the tests when I'm having the hardest time with it. ;_;
Guess I'll go back to tinkering
Your questions are providing good ideas of things to test and document here, and who says other people have implemented this behavior properly? This discussion leads to better understanding and further tests; it's quite subtle (and this is only the beginning...). This is finally prompting me to speculate on what's actually going on in the hardware.
The 6502 uses a two-phase clock, so some events on clock X actually occur one half clock later. My theory is that memory reads occur a half clock later, so the IRQ flag is set at the beginning of clock 29831.
My theory is also that the IRQ flag is sampled just before the beginning of the final clock of an instruction, so the earliest it can be seen is 29832, and if that is the last clock of an instruction, the earliest it can occur is 29833.
Further, my theory is that SEI doesn't set the IRQ inhibit flag until its second clock (or maybe after). Since the IRQ flag is sampled just before that, SEI won't take effect until the beginning of the next instruction. If an IRQ was already scheduled on the last clock of SEI, it will still occur. Thus you can have the situation of setting the I flag, having the IRQ taken, and having the saved status on the stack with the I flag set!
Currently in my emulator I'm handling this "delayed" interrupt behavior by keeping track of the *time* the interrupt should occur, rather than just a flag saying "it will occur". The time the interrupt should occur is basically at soonest 2 clocks after when the flag is set. I haven't done much work on this so it would be productive to start a thread to discuss effective strategies for handling interrupts (since I imagine this will lead to insight into NMI handling too).
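A rough Python sketch of the time-stamp approach, to make the idea concrete. It assumes the flag is set at clock 29831, uses the 2-clock latency mentioned above, and ignores the I flag entirely. IRQ_LATENCY and run are hypothetical names, not anything from a real emulator.

```python
# Sketch: remember *when* the IRQ may first be taken instead of a pending bit.
IRQ_LATENCY = 2  # "at soonest 2 clocks after the flag is set"

def run(first_clock, instr_lengths, flag_set_clock):
    """Return (clock the IRQ sequence starts, instructions completed before it)."""
    irq_time = flag_set_clock + IRQ_LATENCY
    clock = first_clock
    done = 0
    for length in instr_lengths:
        if clock >= irq_time:   # checked only at instruction boundaries
            break
        clock += length
        done += 1
    return clock, done

# Four 2-clock "ldx #n" instructions, as in the IRQ timing test earlier.
print(run(29829, [2, 2, 2, 2], 29831))  # (29833, 2): ldx #0, ldx #1 ran, X = 1
print(run(29828, [2, 2, 2, 2], 29831))  # (29834, 3): X = 2
```

Both results line up with the $01 and $02 printed by the hardware test.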
I'll definitely write some test ROMs for all the cases covered here.
Here is the test you suggested:
Code:
reset:
jsr setup_apu
jsr begin_test
lda #$00 ; start mode
sta $4017 ; write at 0
ldy #48 ; 29825 delay
lda #123
jsr delay_ya0
lda #$ff
ldx #0 ; 29828
sei ; 29830
ldx #1
ldx #2
ldx #3
jsr print_a ; prints $ff (IRQ not taken)
jsr begin_test
lda #$00 ; start mode
sta $4017 ; write at 0
ldy #48 ; 29826 delay
lda #123
jsr delay_ya1
lda #$ff
ldx #0 ; 29829
sei ; 29831
; IRQ occurs at 29833
ldx #1
jsr print_a ; prints $00
jsr print_y ; prints $04 (I flag set before IRQ)
jsr begin_test
lda #$00 ; start mode
sta $4017 ; write at 0
ldy #48 ; 29827 delay
lda #123
jsr delay_ya2
lda #$ff
ldx #0 ; 29830
sei ; 29832
; IRQ occurs at 29834
ldx #1
jsr print_a ; prints $00
jsr print_y ; prints $04 (I flag set before IRQ)
jsr begin_test
lda #$00 ; start mode
sta $4017 ; write at 0
ldy #48 ; 29828 delay
lda #123
jsr delay_ya3
lda #$ff
ldx #0 ; 29831
; IRQ occurs at 29833
sei
ldx #1
jsr print_a ; prints $00
jsr print_y ; prints $00
jmp forever
begin_test:
jsr sync_apu
lda #$40
sta $4017
cli
rts
irq:
bit $4015 ; clear IRQ flag
pla ; y = I flag *before* IRQ occurred
pha
and #$04
tay
txa ; save current value of X
rti
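The four results above are consistent with the theory that the IRQ line is sampled just before the final clock of each instruction, using the I flag as it was before that instruction changed it. Here is a small Python sketch of that model. It assumes the IRQ line is asserted from clock 29831 onward and only handles the instructions used in the test. FLAG_SET_CLOCK and run are illustrative names.

```python
# Assumption: the frame IRQ line is asserted from this clock on.
FLAG_SET_CLOCK = 29831

def run(first_clock, program):
    """program: list of (mnemonic, operand, cycles).
    Returns (clock the IRQ starts, X at that point, I bit in pushed status),
    or None if no IRQ is taken."""
    clock, x, i_flag = first_clock, None, 0
    for mnemonic, operand, cycles in program:
        final_clock = clock + cycles - 1
        # Sample just before the final clock, with I as it was *before*
        # this instruction takes effect.
        pending = (final_clock - 1 >= FLAG_SET_CLOCK) and not i_flag
        if mnemonic == "ldx":
            x = operand
        elif mnemonic == "sei":
            i_flag = 1
        clock = final_clock + 1
        if pending:
            return clock, x, 0x04 if i_flag else 0x00
    return None

PROGRAM = [("lda", 0xFF, 2), ("ldx", 0, 2), ("sei", None, 2), ("ldx", 1, 2)]
for delay in (29825, 29826, 29827, 29828):
    print(delay, run(delay + 1, PROGRAM))
```

With the 29825 delay no IRQ is taken. With 29826 and 29827 the IRQ follows SEI and the pushed status already has I set. With 29828 the IRQ comes before SEI runs.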
Thanks. I'll have to review that later when I have a clearer head (suffering from a cold today ;_;).
Anyway, I changed my emu's behavior so that the bit returned from $4015.6 and the actual IRQ pending flag are two separate things and I was able to iron out my problems (even though that seemed rather hackish... sad to say I think I'm tiring and getting to the point of just saying "screw it, that's close enough").
Anyway all tests give me a pass now! ^^
(edit3 - posted a Q here before but found my problem so I took it out)
EDIT -- another thing I noticed that's sort of on the same subject as frame IRQs. It seems Cobra Triangle does some very sloppy coding where it writes $80 to $4017 after it waits the 2 frames on startup. I was having problems with this since a frame IRQ was happening before the mode change, and Cobra Triangle would deadlock when starting the game (on its CLI), since the game NEVER acknowledges or disables frame interrupts.
I was able to correct the problem by starting system emulation at the very start of VBlank (before, my HardReset would start emulation at the start of the dummy scanline before VBlank). This makes the first LDA $2002 BPL check instantaneous and allows the second one to finish, letting the game switch frame modes just barely before the frame IRQ is tripped.
Is there any information on when in the frame the system starts on powerup/reset? Is $2002.7 high immediately after flipping the power on?
Just some things I was thinking about. I thought it would be useful to bring up for other emu authors... in case any of you guys are having the same problem with Cobra Triangle.
edit4 - grah... now that all sound tests are passing, Time Lord is broken. Must be the frame IRQ hackish thing I did *sigh*
hrm.... Apparently Time Lord has the exact opposite problem as Cobra Triangle. It acknowledges the Frame IRQ several cycles before it's tripped. Must've worked before when I started emulation on the dummy scanline rather than at the start of VBlank.
last edit I swear -- Was able to get both Cobra Triangle and Time Lord working (had nothing to do with the APU tests). What I do now is I start emulation 10 scanlines into VBlank on reset/powerup (with $2002.7 high immediately). This gives both games lots of buffer time to switch frame modes... so that their Frame IRQs never trip. If anyone has had similar problems with these games and has come up with a different solution I'd be very interested in hearing it.
So yay... now Cobra Triangle, Time Lord, AND the APU tests are all working. It's Miller time.
Disch wrote:
last edit I swear -- Was able to get both Cobra Triangle and Time Lord working (had nothing to do with the APU tests). What I do now is I start emulation 10 scanlines into VBlank on reset/powerup (with $2002.7 high immediately). This gives both games lots of buffer time to switch frame modes... so that their Frame IRQs never trip. If anyone has had similar problems with these games and has come up with a different solution I'd be very interested in hearing it.
So yay... now Cobra Triangle, Time Lord, AND the APU tests are all working. It's Miller time.
Good find, Disch. I had a similar problem getting Time Lord to run properly without changing the reset value of the frame counter, which would then break test #9.
Disch, what do you mean "10 scanlines into VBlank"??? Do you mean at line 10~19, then 20, then 21-260...?
My problem is with Wizards & Warriors 2 - unknown instruction at the title screen. I know my VBlank flag fails on blargg's sprite timing tests, but there's no visible solution right now.
I just tried W&W2 in my emu and the NTSC (U) version played fine, but the game hung on the PAL (E) [!] version (the sword on the title screen wasn't being drawn, and neither was the 'D' in "IRONSWORD")... however I wasn't getting any bad opcodes.. so I don't know what could be your problem there.
I fixed Ironsword by pushing the reset start time even further forward for PAL mode. Now, rather than having it start 10 scanlines into VBlank, it starts 10 scanlines before the end of VBlank.
To clarify this:
NTSC:
** Frame Start **
10 scanlines of VBlank
** -- start emulation here on Powerup/reset -- **
10 more scanlines of VBlank
1 pre-render scanline
240 rendered scanlines
1 dummy scanline
** Frame End **
PAL:
** Frame Start **
60 scanlines of VBlank
** -- start emulation here on Powerup/reset -- **
10 more scanlines of VBlank
1 pre-render scanline
240 rendered scanlines
1 dummy scanline
** Frame End **
$2002.7 should be SET IMMEDIATELY on powerup/reset.. or these games in question (Cobra Triangle, Time Lord, and probably Ironsword but I haven't checked) will hang.
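For anyone wanting to try the same start position, the two layouts above can be written down as data. This is only a sketch of the scheme described in this post, not a claim about where real hardware starts. LAYOUTS and the helper names are made up. The totals come out to the usual 262 (NTSC) and 312 (PAL) scanlines per frame.

```python
# Scanline counts per section, in frame order, as listed above.
LAYOUTS = {
    "NTSC": {"vblank_before_start": 10, "vblank_after_start": 10,
             "pre_render": 1, "rendered": 240, "dummy": 1},
    "PAL":  {"vblank_before_start": 60, "vblank_after_start": 10,
             "pre_render": 1, "rendered": 240, "dummy": 1},
}

def scanlines_per_frame(region):
    """Total scanlines in one frame for the given region."""
    return sum(LAYOUTS[region].values())

def start_scanline(region):
    """Scanline (counted from the start of VBlank) where emulation begins."""
    return LAYOUTS[region]["vblank_before_start"]

for region in LAYOUTS:
    print(region, scanlines_per_frame(region), start_scanline(region))
```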
Anyway that's what I'm currently doing and all these games are working for me. I don't know about that bad opcode you're getting, Fx3... maybe your problem lies elsewhere?
Argh. This thread is about the APU frame counter. New threads don't cost anything. :)
blargg wrote:
Argh. This thread is about the APU frame counter. New threads don't cost anything.
Yes. I'm not offtopic as you might think. ^_^;;
There are 10 cycles on reset timing (APU frame counter), but it seems the latency could be a bit higher? -_-;; Plus, the frame IRQ seems to be the problem... even if ALL eleven tests show OK (1).
The APU frame counter is basically reset when the NES is, but the PPU acts as if it were reset hundreds of clocks earlier. As for passing the APU test ROMs, that is no guarantee that everything is correct, as you point out. The RE test code in this thread is about as solid as you can get, so any problems must be due to something else. I'm going to be working more on power-up state soon.
I believe I read somewhere (might have been BT's APU doc) that the APU does nothing during the first 2048 cycles after startup. Perhaps that's related?
It looks like 1/4th of PPU frame time. I could guess the APU would start at mode #1 (5 steps) on step 5 (last one). This way, nothing is clocked during this period.
Bumping this thread because I'm having problems getting Ironsword working.
It triggers a Frame IRQ early, which never gets cleared. Later, it disables frame IRQs by setting a different frame counter mode, but that doesn't clear the IRQ. When the title screen comes around, it ends up in an infinite RTI loop where it's not clearing the Frame IRQ interrupt.
Start your PPU frame a bit earlier.
I think this time the problem may be more complicated... Failure to emulate the dummy read on "STA $4000,X" made it fail to clear frame IRQ by reading 4015.
Edit: Yep, the missing dummy read did it. Since I'm not emulating dummy reads, I've hacked the 4015 write code to explicitly check for STA nnnn,X or STA nnnn,Y instructions and clear frame IRQ if PC-3 is one of those two instructions.
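For comparison, here is roughly what emulating the dummy read itself looks like, as a Python sketch rather than anyone's actual emulator code. On the 6502 an indexed store always reads from the not-yet-carried address before writing, so STA $4000,X with X=$15 reads $4015 and clears the frame IRQ flag. Bus and sta_abs_x are hypothetical names, and the Bus only models the $4015 side effect.

```python
class Bus:
    """Toy bus: only the frame IRQ side effect of reading $4015 is modelled."""
    def __init__(self):
        self.frame_irq = True   # pretend the frame IRQ flag is already set
        self.log = []

    def read(self, addr):
        self.log.append(("R", addr))
        if addr == 0x4015:
            value = 0x40 if self.frame_irq else 0x00
            self.frame_irq = False  # reading $4015 clears the frame IRQ flag
            return value
        return 0

    def write(self, addr, value):
        self.log.append(("W", addr, value))

def sta_abs_x(bus, base, x, a):
    target = (base + x) & 0xFFFF
    # The read uses the un-carried address: high byte of base, low byte of sum.
    dummy = (base & 0xFF00) | (target & 0x00FF)
    bus.read(dummy)        # always performed for indexed stores
    bus.write(target, a)

bus = Bus()
sta_abs_x(bus, 0x4000, 0x15, 0x00)
print(bus.frame_irq, bus.log)
```

With the read done on the bus like this, no special case is needed in the $4015 write handler.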
Lack of dummy reads affects any register with read side effects. Watch someone use the lack of $2007 dummy reads to detect you and then either A. rearrange certain pattern table tiles for optimum scaled mode sharpness if they're generous, B. freeze and display the "18 USC 2319" error message, or C. lock the game into the hardest difficulty like Bucky O'Hare and Earthbound.
Code:
; In pretty much all my games, my pattern tables are arranged
; with a blank tile 0 and solid tile 1. This means a zero
; at PPU $000F and nonzero at PPU $0010.
lda #$00
sta $2000
sta $2001
sta $2006
lda #$0F
sta $2006
lda $2007 ; priming read
ldx #$10
; on the NES this reads from $2F07 then $3007
; without dummy reads, this reads only from $3007
ldy $2ff7,x ; read $00 $FF on NES or $00 on bad emus
beq is_bad_emulator
And it's not just LDA a,X or LDA a,Y that needs your dummy-read hack; it's also LDX a,Y, LDY a,X, LAX a,Y, LDA (d),Y, and any of the ALU instructions that use a,X, a,Y, or (d),Y. Someone with more time than I could probably whip up an example for each of the 18 unofficial RMW instructions that use these modes.
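The address arithmetic behind the $2F07/$3007 comment can be sketched in Python like this. For indexed reads the 6502 only makes the extra read when adding the index crosses a page, and that extra read goes to the address with the old high byte. The function name is made up for illustration.

```python
def abs_indexed_read_addresses(base, index):
    """Bus addresses read by an absolute-indexed load such as LDY base,X."""
    target = (base + index) & 0xFFFF
    # Address formed before the carry into the high byte is applied.
    uncarried = (base & 0xFF00) | (target & 0x00FF)
    if uncarried != target:
        return [uncarried, target]  # page crossed: dummy read, then real read
    return [target]                 # no page cross: a single read

print([f"${a:04X}" for a in abs_indexed_read_addresses(0x2FF7, 0x10)])
# ['$2F07', '$3007']
```

Since $2000-$3FFF mirrors the PPU registers every 8 bytes, both of those addresses land on $2007.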
tepples wrote:
Code:
; on the NES this reads from $2F07 then $3007
; without dummy reads, this reads only from $3007
ldy $2ff7,x ; read $00 $FF on NES or $00 on bad emus
beq is_bad_emulator
I have to make sure my test is testing the right thing, but this doesn't appear to be true on real hardware. It will more frequently return 00 than FF.
In that case we might have a situation like that of a few games that use INC to init the MMC1, where multiple reads or writes in consecutive cycles might not have the expected side effect.
Does pressing the reset button affect the cpu/ppu clock alignment on a USA NES? I am testing it multiple times by hitting reset so maybe that affects it. Just guesses at this point, need to get some actual test code.
I found some other unexpected results from reading the $2007 latch too, but that will be another thread!
Pressing reset on a US NES resets both the PPU and CPU. However, I don't know if this affects their relative timing.
On a Famicom it only resets the CPU, which implies that the resulting timing offset could be practically anything.
Karatorian wrote:
Pressing reset on a US NES resets both the PPU and CPU. However, I don't know if this affects their relative timing.
On a Famicom it only resets the CPU, which implies that the resulting timing offset could be practically anything.
I believe so...