How do I shot MMC3 IRQ?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.

by on (#92290)
About MMC3 in puNES:

I'd like to know about your MMC3 IRQ implementation, since it passes all the tests. I want to know about PPU A12 rising edges:

a) how do you treat A12 with 8x16 sprites? FCEUX doesn't clock the IRQ if PPU $2000 AND $38 != $18.
b) how do you handle the IRQ at cycle 324 of the previous scanline? (assuming no timestamp system)
c) do you clock the IRQ at PPU cycle 260 with A12 high, if BG=$0000 and SP=$1000? (read a) again)

My old code fails for the scanline timing test, "scanline 0 IRQ should occur later/sooner when $2000:$08".

Would you mind giving me a hand? :)

by on (#92313)
Zepper wrote:
I'd like to know about your MMC3 IRQ implementation, since it passes all the tests. I want to know about PPU A12 rising edges:

a) how do you treat A12 with 8x16 sprites? FCEUX doesn't clock the IRQ if PPU $2000 AND $38 != $18.
b) how do you handle the IRQ at cycle 324 of the previous scanline? (assuming no timestamp system)
c) do you clock the IRQ at PPU cycle 260 with A12 high, if BG=$0000 and SP=$1000? (read a) again)

My old code fails for the scanline timing test, "scanline 0 IRQ should occur later/sooner when $2000:$08".

Would you mind giving me a hand? :)

Hi Zepper, I have tried to implement MMC3 in the simplest way possible (for me). Each time the PPU address is updated, I check whether A12 made a 0 -> 1 transition, and if it did, I tick the MMC3 counter. The check is always done (even with 8x16 sprites). The only limitation I have imposed is that the tick can happen only once between cycle 324 (of the previous scanline) and cycle 255 (of the next scanline), and only once between cycles 256 and 323. In the 324-255 window, PPU rendering must be turned on. Maybe it's a bit slow, but it works. I hope I managed to explain it properly; English is not exactly my strong point.

by on (#92320)
I thought the correct timing was that every rise on A12 causes a tick unless there was another such rise within the past 16 dots.

by on (#92321)
edit: if you smell a topic split, please, do it. :)

It looks like an endless race for MMC3 IRQ. Disch's doc for map4 mentions only two cycles (or dots): 260 (sprite fetch) and 324 (BG fetch). I looked into other emulators' source code, but most of them have hacks, tons of hacks, so I gave up.

If sprites are set to 8x16, a tile can be fetched from $0000 or $1000, depending on bit 0 of the tile index. Here's my info:
  • A12 of the PPU fetch address first goes high at dot 260 with the default setup: BG at $0000 and sprites at $1000. With the reverse setup, this happens at dot 324.
  • It's possible to toggle A12 through the scrolling address (loopy_v) by reading or writing $2007, which increments it, or by writing $2006, which reloads it (loopy_v = loopy_t), but games rarely do this.
  • FCEUX handles 8x16 sprites by clocking the IRQ only when background or sprites are enabled and (PPU $2000 AND $38) = $18 (bits 3 and 4 set, bit 5 clear: both pattern tables at $1000, 8x8 sprites).
Could someone help me with this thing, please?

by on (#92322)
There are two rules:
  • A12 rise within less than 16 dots of last A12 rise: no tick.
  • Any other A12 rise: tick.
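Here is a minimal C sketch of these two rules. It assumes the emulator reports the A12 level on every PPU bus change, together with an absolute PPU dot count. `rise_filter` and `rise_filter_step` are hypothetical names, not taken from any emulator in this thread. Every rise restarts the 16-dot window, whether or not it ticked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter for the "16 dots since the last rise" rule. */
typedef struct {
    bool     a12_prev;   /* A12 level at the previous call */
    bool     seen_rise;  /* false until the first rise has been observed */
    uint64_t last_rise;  /* absolute PPU dot of the most recent A12 rise */
} rise_filter;

/* Feed the A12 level at absolute dot `now`; `now` must never decrease.
   Returns true when the MMC3 scanline counter should be clocked. */
static bool rise_filter_step(rise_filter *f, bool a12, uint64_t now) {
    assert(!f->seen_rise || now >= f->last_rise);
    bool tick = false;
    if (a12 && !f->a12_prev) {
        /* Rule 1: a rise less than 16 dots after the last rise is ignored.
           Rule 2: any other rise ticks. */
        tick = !f->seen_rise || now - f->last_rise >= 16;
        f->last_rise = now;   /* ignored rises still count as "the last rise" */
        f->seen_rise = true;
    }
    f->a12_prev = a12;
    return tick;
}
```

With BG at $1000, this is what suppresses the rise at dot 4. That rise comes only 13 dots after the rise at dot 332 of the previous line, even though the last rise that actually ticked (dot 324) was 21 dots earlier.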

Perhaps what you misunderstand is the pattern of rises. I'll try to explain it; if that doesn't help, I'll have to draw some waveforms in GIMP. Should I?

If $1000 sprites and $0000 BG, which is the canonical setup, then the first detected rise will be at x=260. There are eight rises, at 260, 268, ..., 316, but only the first ordinarily triggers a tick because they're 8 dots apart, which is less than 16.

If $0000 sprites and $1000 BG, then the first detected rise will be at x=324. There are 34 rises, at 324, 332, 4, 12, ..., 252, but only the first ordinarily triggers a tick because they're 8 or 13 dots apart, which is less than 16. (The 13 occurs at the end of a line between the second and third pairs of pattern fetches, when the PPU rests from 336 to 340.)

In either case, mixing 8x16 sprites from $0000 and $1000 will often cause multiple detected rises in one hblank, because the pattern fetches from $1000 are, for example, 16 or 24 dots apart with intervening pattern fetches from $0000.
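The rise pattern above can be checked with a small C model. It rests on several assumptions:

- It uses the dot numbering from this thread: pattern fetch slots start at 4, 12, ..., 252 (BG), 260, ..., 316 (sprites) and 324, 332 (BG).
- A12 stays high for 4 dots of each $1000 pattern fetch and is low during nametable/attribute fetches ($2000-$2FFF).
- Sprites are 8x8 from a single table.
- Ticks follow the 16-dot rise-to-rise rule.

All names are hypothetical. The simulation starts with A12 low, as after vblank, and records where the counter would tick.

```c
#include <assert.h>
#include <stdbool.h>

enum { DOTS_PER_LINE = 341 };

/* Assumed A12 level at `dot`: high for the first 4 dots of each pattern fetch
   slot whose table is $1000, low otherwise. */
static bool a12_at(int dot, bool bg_at_1000, bool spr_at_1000) {
    assert(dot >= 0 && dot < DOTS_PER_LINE);
    if (dot < 4) return false;
    int off = (dot - 4) % 8;
    if (off >= 4) return false;              /* nametable/attribute fetch: A12 low */
    int start = dot - off;                   /* slot start: 4, 12, ..., 332 or 340 */
    if (start <= 252 || start == 324 || start == 332) return bg_at_1000;
    if (start >= 260 && start <= 316) return spr_at_1000;
    return false;                            /* dot 340: no pattern fetch */
}

/* Runs `lines` rendered scanlines and stores (line, dot) of each tick.
   Returns the total number of ticks (entries beyond max_ticks are not stored). */
static int simulate(int lines, bool bg_at_1000, bool spr_at_1000,
                    int *tick_line, int *tick_dot, int max_ticks) {
    bool prev = false;
    long last_rise = -1000;                  /* long ago: the first rise ticks */
    int n = 0;
    for (int line = 0; line < lines; line++) {
        for (int dot = 0; dot < DOTS_PER_LINE; dot++) {
            long now = (long)line * DOTS_PER_LINE + dot;
            bool a12 = a12_at(dot, bg_at_1000, spr_at_1000);
            if (a12 && !prev) {
                if (now - last_rise >= 16) { /* rise-to-rise rule */
                    if (n < max_ticks) { tick_line[n] = line; tick_dot[n] = dot; }
                    n++;
                }
                last_rise = now;
            }
            prev = a12;
        }
    }
    return n;
}
```

In this model, the canonical setup ticks once per line at dot 260. The reversed setup ticks at dots 4 and 324 on the first line, then only at dot 324 on later lines.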

by on (#92323)
tepples wrote:
Perhaps what you misunderstand is the pattern of rises. I'll try to explain it; if that doesn't help, I'll have to draw some waveforms in GIMP. Should I?


Thanks, but it's not necessary for me, at least.

Well, there's one thing you didn't get either: I consider two addresses. One is what you use for $2006/$2007 _and_ for setting up the scrolling for drawing. The other is the address being fetched from the pattern/name tables, and that one provides "A12". Yes, I check A12 when $2006/$2007 is accessed.

Next, I don't understand dot 324 "of the previous scanline". How's that possible? I mean, when the PPU cycle is 324, what should be done?

by on (#92324)
tepples wrote:
There are two rules:
  • A12 rise within less than 16 dots of last A12 rise: no tick.
  • Any other A12 rise: tick.


I don't know how much it helps you Zepper, but this is basically what I did with my hardware MMC3 and I passed all of Blarg's tests. Maybe a hardware explanation will help.

I agree with Kevtris' hypothesis that the actual MMC3 is 'time filtering' by watching M2 clocks. When I get around to it I'll actually verify this with my logic analyzer.

For now I have a counter that gets reset to zero whenever CHR A12 goes high. Then at the end of each CPU clock cycle (M2 falling edge) I clock the counter if CHR A12 is low. If the CPU cycle counter gets above ~5, I allow the next rising edge of CHR A12 to clock the MMC3 scanline counter. This is only slightly different from what Tepples laid out. His method would ALWAYS increment the counter regardless of CHR A12, whereas I assume CHR A12 is an asynchronous reset signal that prevents the counter from incrementing whenever CHR A12 is high.

5 CPU cycles is about 14-16 PPU cycles (dots) so you could alternatively count those and it should work. That's something that the actual MMC3 can't do though...

by on (#92334)
One thing: the scanline counter clocks only if background or sprites are enabled, correct? I read about A12 rising even if sprites are disabled.

Next, bit 12 (A12) of loopy_v is the lowest bit of the fine Y scroll. Is this the bit that shows up on A12 when background (and/or sprites?) is disabled?

by on (#92335)
Both background and sprites are fetched if either is enabled; thus PA12 will rise as usual.

While rendering is off, PA13-PA12 on the cart edge more or less matches loopy_v bits 13-12. But while rendering is on, the rendering circuit controls PA13-PA12, overriding loopy_v.
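Here is a C sketch of that bus-ownership rule. It assumes NTSC line numbering (0-239 visible, 261 pre-render) and uses hypothetical helper names. `fetch_addr` stands for whatever address the rendering circuit is fetching at the current dot. The sketch ignores the "more or less" caveats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rendering is on if either BG or sprites are enabled ($2001 bits 3/4),
   but fetches only happen on visible lines (0-239) and the pre-render line (261). */
static bool ppu_is_fetching(bool bg_on, bool spr_on, int line) {
    assert(line >= 0 && line <= 261);
    return (bg_on || spr_on) && (line <= 239 || line == 261);
}

/* Level seen on the cartridge's PA12: the fetch address while the rendering
   circuit owns the bus, otherwise loopy_v (as set through $2006/$2007). */
static bool ppu_pa12(bool fetching, uint16_t loopy_v, uint16_t fetch_addr) {
    uint16_t bus = fetching ? fetch_addr : loopy_v;
    return (bus & 0x1000) != 0;
}
```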

by on (#92339)
tepples wrote:
There are two rules:
  • A12 rise within less than 16 dots of last A12 rise: no tick.
  • Any other A12 rise: tick.
Perhaps what you misunderstand is the pattern of rises. I'll try to explain it; if that doesn't help, I'll have to draw some waveforms in GIMP. Should I?

Let me help you. In the standard configuration, PPU A12 toggles 8 times during horizontal flyback. If the CPU accesses the PPU, additional A12 toggles may occur. Watch this video: the IRQ (blue) asserts on every 8th toggle of PPU A12 (yellow). You can clearly see how the IRQ shifts with respect to horizontal flyback. So you need a digital filter that treats a burst of rapid PPU A12 toggles as a single one.
In our cartridge we did it more simply (I think it is close to the original):
[Image: schematic]
As a result, the counter strobe STB (inverted on the schematic) is activated by the first PPU A12 toggle and released only after 4 toggles of the 6502's F2 signal. This is more than enough, though of course this value could be increased up to 100 (remember that a scanline is about 113 F2 cycles; this was checked with an oscilloscope).
[Image]

I'm a slowpoke. :3
infiniteneslives wrote:
tepples wrote:
There are two rules:
  • A12 rise within less than 16 dots of last A12 rise: no tick.
  • Any other A12 rise: tick.


I don't know how much it helps you Zepper, but this is basically what I did with my hardware MMC3 and I passed all of Blarg's tests. Maybe a hardware explanation will help.

I agree with Kevtris' hypothesis that the actual MMC3 is 'time filtering' by watching M2 clocks. When I get around to it I'll actually verify this with my logic analyzer.

For now I have a counter that gets reset to zero whenever CHR A12 goes high. Then at the end of each CPU clock cycle (M2 falling edge) I clock the counter if CHR A12 is low. If the CPU cycle counter gets above ~5, I allow the next rising edge of CHR A12 to clock the MMC3 scanline counter. This is only slightly different from what Tepples laid out. His method would ALWAYS increment the counter regardless of CHR A12, whereas I assume CHR A12 is an asynchronous reset signal that prevents the counter from incrementing whenever CHR A12 is high.

5 CPU cycles is about 14-16 PPU cycles (dots) so you could alternatively count those and it should work. That's something that the actual MMC3 can't do though...

by on (#92342)
Quote:
There are two rules:
A12 rise within less than 16 dots of last A12 rise: no tick.
Any other A12 rise: tick.


Based on the last post, I do not think this is entirely correct. It's more like this: if A12 has been low for more than 16 dots and now rises, tick; otherwise, no tick.

This seems to be done by delaying the lowering of the strobe signal while A12 is low, as can be seen in the pictures above.
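Here is a dot-based C sketch of this "low for more than 16 dots" variant, to show how it differs from the rise-to-rise rule. The threshold and names are assumptions. A rise that follows a long high period but only a short low gap does not tick here. The rise-to-rise rule would tick on that same rise if the previous rise was long enough ago.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "More than 16 dots" per the post above; the exact hardware value is not
   settled in this thread. */
enum { LOW_DOTS_THRESHOLD = 16 };

typedef struct {
    bool     a12_prev;
    uint32_t low_dots;   /* consecutive dots A12 has been low */
} low_filter;

/* Call once per PPU dot with that dot's A12 level; returns true on a tick. */
static bool low_filter_step(low_filter *f, bool a12) {
    assert(f);
    bool tick = false;
    if (a12) {
        tick = !f->a12_prev && f->low_dots > LOW_DOTS_THRESHOLD;
        f->low_dots = 0;                   /* any high dot restarts the count */
    } else if (f->low_dots < UINT32_MAX) {
        f->low_dots++;                     /* saturate instead of wrapping */
    }
    f->a12_prev = a12;
    return tick;
}
```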

Zepper, are you doing a pixel-by-pixel emulation or are you using timestamps? Timestamps are a devil with MMC3 IRQs!

by on (#92343)
I use pixel-by-pixel, real time. Well, I'll try again.

EDIT: There's something strange. According to Kevtris:
Quote:
For some reason, yet to be determined, if both bits 3 and 4 of PPU register 2000h are clear, the IRQ counter will not decrement, even if the PPU address is manually manipulated (with 2001h set to 00h to disable rendering) through 02006h. If either or both bits are set, the counter will decrement properly if the PPU address is manually manipulated.


blargg's test writes $00 to $2000 and toggles $2006, expecting the IRQ to be clocked... and it fails here in my emulator.

by on (#92419)
*bump*

Could someone clarify Kevtris's quote?

by on (#92420)
I suspect the PPU A12 filter may be a bit more complicated. For example, several PPU A12 pulses might have to arrive within a short time window for the counter to decrement. That way, software writes are not fast enough, get filtered out completely, and do not interfere with the normal operation of the IRQ counter. Let me remind you that there are several MMC3 revisions (A, B, C, etc.), and they differ in the IRQ counter (someone told me this a long time ago).

by on (#92422)
Zepper wrote:
blargg's test writes $00 to $2000 and toggles $2006, expecting the IRQ to be clocked... and it fails here in my emulator.


It sounds like Kevtris and Blargg don't completely agree here...

For what it's worth my hardware version passes blargg's test, and mine gets clocked from $2006 like blargg would suggest. I never checked to see if blargg's tests passed on a real MMC3 running in the NES but I'm assuming he did...

How are your irq's behaving in actual games Zepper? Do they work for games, but you're just unable to pass blargg's tests?

HardWareMan wrote:
I suspect the PPU A12 filter may be a bit more complicated. For example, several PPU A12 pulses might have to arrive within a short time window for the counter to decrement. That way, software writes are not fast enough, get filtered out completely, and do not interfere with the normal operation of the IRQ counter.


So are you thinking that blargg's tests don't pass on a real MMC3? Because assuming that they do, what you're suggesting doesn't make sense.

by on (#92426)
infiniteneslives wrote:
So are you thinking that blargg's tests don't pass on a real MMC3? Because assuming that they do, what you're suggesting doesn't make sense.

Logically, any toggling of PPU A12 should affect the counter, unless some software-controlled masking of the PPU A12 source is implemented.

by on (#92427)
I have read somewhere that Kevtris information is old and inaccurate. When in doubt, do what Blargg does ;)

by on (#92430)
HardWareMan wrote:
infiniteneslives wrote:
So are you thinking that blargg's tests don't pass on a real MMC3? Because assuming that they do, what you're suggesting doesn't make sense.


Logically, any toggling of PPU A12 should affect the counter, unless some software-controlled masking of the PPU A12 source is implemented.


The MMC3 can't be watching for writes to $2006. And Blargg's tests require that the IRQ is fired at the SAME time as the FIRST edge of CHR A12 (not M2). I know because I kept failing blargg's tests until I realized I wasn't firing the IRQ until the end of the CPU's clock cycle. So, *ASSUMING* blargg's tests pass on a real MMC3, I can't see how some other filtering is possible.

Now, something like the RAMBO-1 could be filtering some other way, since its IRQ is delayed from the first edge of CHR A12. It might be checking for more than just that first positive edge of CHR A12.



Zepper: I'm guessing you aren't having exactly the same issue I did, but it's probably worth explaining the problem I had. You NEED to fire the IRQ the instant CHR A12 goes high to pass Blargg's tests. If you wait until the CPU finishes its current cycle to fire the IRQ, you'll fail his scanline timing tests. Basically, the CPU must get the IRQ *BEFORE* it starts its next cycle; that way the IRQ gets handled instead of the next instruction.




crudelios wrote:
I have read somewhere that Kevtris information is old and inaccurate. When in doubt, do what Blargg does ;)


Yeah, I don't want to discredit any of the great pioneering work Kevtris did early on (or is currently doing). I greatly appreciate how his documentation is one of the few that explains things at the hardware level. But I have found a few mix-ups here and there. With most of this stuff it's best to check around before relying on just one source of documentation.

But I haven't performed these tests myself on the real MMC3 so until I get off my rear and do, take what I'm saying with a grain of salt... ;)

by on (#92455)
Discussion here and here. :)

EDIT: well, it's finally working after re-reading blargg's information. However, I get the scanline timing error "should occur sooner when $2000=$10". Is this related to cycle 324 of the previous scanline???
Code:
4)  BG and Sprites must use opposing pattern tables for CHR.  EG:
   a)  if 8x16 sprites, BG must use $0xxx, *ALL* sprites must use $1xxx
   b)  if 8x8 sprites, if BG is using $0xxx, sprites must use $1xxx
   c)  if 8x8 sprites, if BG is using $1xxx, sprites must use $0xxx   (slightly abnormal)

With settings 'a' and 'b', the IRQ will occur after dot 260.  With setting 'c', it will occur after dot 324 of the *previous* scanline.

How "previous"? Is this triggered when the IRQ counter is 1?

by on (#92522)
With the reversed set (BG at $1000 and SP at $0000), I can't get scanline_timing test to pass.

On scanline 1, at cycle 322 prints "should occur later"; at cycle 324 prints "should occur sooner". Setting at 323 gives "no IRQ occurred".

blargg never liked that I analyse the test source when emulating it, but always suggested "to adjust the things until it passes". Anyway, it's something like...
Code:
   cli
   nop
           ; <-- IRQ requested
   nop
           ; <-- IRQ triggered, which should be fired...
   inc irq_flag
           ; <-- here!
   delay 1000
   sei
   nop
   lda irq_flag
   cmp #$11
   beq @no_irq
   rts

by on (#92555)
Zepper wrote:
With the reversed set (BG at $1000 and SP at $0000), I can't get scanline_timing test to pass.

On scanline 1, at cycle 322 prints "should occur later"; at cycle 324 prints "should occur sooner". Setting at 323 gives "no IRQ occurred".


Well I don't know how much my hardware understanding will help your emulator version Zepper. But I'm having a hard time understanding what you were saying in your last post. Mostly what you meant by the cycle 322/323/324 differences.

Also, are your comments in Blargg's code basically saying that your IRQ is getting fired one instruction too late?

by on (#92563)
PPU clock cycles 322, 323 and 324. Sorry about language barrier... :(

Anyway, here's a recap: when $2000:$18 is $08 (BG at $0000 and SP at $1000) the IRQ is working ok. For the other setting (BG at $1000 and SP at $0000), I'm adjusting the timing. Looks like the IRQ fires at PPU clock cycle 323 on scanline 1. My diagram shows that the IRQ is firing one instruction earlier when the test ROM fails.

I'm using the test "scanline_timing.nes".

by on (#92565)
Zepper wrote:
My diagram shows that the IRQ is firing one instruction earlier when the test ROM fails.


How many CPU cycles are you off by?

Is there a positive edge of CHR A12 on that instruction? If you're triggering early (by a few clock cycles not scanlines) then it sounds like you're triggering off the wrong thing. Unless you've got some time misalignment between your PPU and CPU.

I know I've said it before but... Your IRQ MUST come in at the exact same time as CHR A12 rises (in the middle of the CPU clock cycle etc). If you round off to the nearest CPU clock cycle you'll end up getting it late/early depending on how you round. Round out to next and you'll be late at times, round in to beginning of CPU cycle and you'll be early. When the CPU and PPU are aligned the IRQ should be found by the CPU on the next cycle, because it didn't get there early enough for the current cycle.

by on (#92818)
I also never managed to get the "abnormal" behavior working properly, nor did I ever get that test to pass. I've even whipped up simple models to test that my code is working the way it's described in all the docs I've been able to find on MMC3 IRQs. The only way I've managed to get the desired result (IRQ occurring on the scanline before it's supposed to at dot 324) is if I start the simulation at dot 256 (bypassing all the rendering clocks).

With the normal behavior I get this output from my simulation:

[Image: simulation output]

With abnormal settings:

[Image: simulation output]

With the hack, abnormal settings still causes an IRQ at scanline 19, 39 .. 219, 239. But the normal settings start firing IRQs at scanline 20, 40 .. 220. I don't see how the IRQ is supposed to fire on an earlier scanline without some trickery.

by on (#92821)
beannaich wrote:
I don't see how the IRQ is supposed to fire on an earlier scanline without some trickery.


Scan line early? norm and alt fire IRQ at the same time. When counter == 0 AND 'allowed' posedge of CHR A12.

The only thing that is different between the two is what to do if 0 is loaded into the reload register $C000. alt will ONLY fire once after being cleared by $C001. Norm will fire on EVERY 'allowed' posedge of CHR A12 assuming IRQs are enabled.
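Here is a C sketch of that counter logic. The register roles are the standard MMC3 ones: $C000 reload value, $C001 counter clear/reload, $E000/$E001 IRQ disable/enable. The struct, the function names, and the `alt` switch are hypothetical. The alt rule is implemented as "fire only when the counter becomes 0", which reproduces the fire-once-after-$C001 behavior described above when 0 is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t latch;        /* reload value written to $C000 */
    uint8_t counter;
    bool    reload;       /* set by a write to $C001 */
    bool    irq_enabled;  /* $E001 enables, $E000 disables */
    bool    alt;          /* true: "alt" behaviour, false: "norm" */
} mmc3_irq;

/* $C001 write: clear the counter so the next clock reloads it. */
static void mmc3_write_c001(mmc3_irq *m) { m->counter = 0; m->reload = true; }

/* Called on each filtered ('allowed') rising edge of CHR A12.
   Returns true if the IRQ should be asserted. */
static bool mmc3_clock(mmc3_irq *m) {
    assert(m);
    uint8_t before = m->counter;
    bool was_reload = m->reload;
    if (m->counter == 0 || m->reload) {
        m->counter = m->latch;
        m->reload = false;
    } else {
        m->counter--;
    }
    if (m->counter != 0 || !m->irq_enabled) return false;
    if (!m->alt) return true;            /* norm: fire whenever the counter is 0 */
    return before != 0 || was_reload;    /* alt: fire only when it becomes 0 */
}
```

With a reload value of 0, the normal variant returns true on every clock. The alt variant returns true once after `mmc3_write_c001` and then stays quiet.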

by on (#92825)
...unless the scanline does NOT start at cycle 0, but on 260.

EDIT: I was doing it all wrong! All MMC3 tests have passed now. Sweet at maximum. :)

by on (#92841)
Zepper wrote:
...unless the scanline does NOT start at cycle 0, but on 260.


260??? It's only an 8-bit register, 0-255 (up to 256 scanlines).

Congrats on getting it working Zepper!

by on (#92848)
infiniteneslives wrote:
Scan line early? norm and alt fire IRQ at the same time.


I should have clarified. By "normal" and "abnormal" I meant:

Normal: BG at $0000, SP at $1000
Abnormal: BG at $1000, SP at $0000

According to Disch's mapper docs, the abnormal settings cause the IRQ to fire at dot 324 of the scanline before when it's supposed to fire. Normal settings fire at dot 260.

by on (#92849)
Ahhhh DOT 260... :)

Well if you allow clocking of the counter properly (digitally filtering CHR A12) then you shouldn't have to concern yourself with background and sprite setup. The effect of it firing early will happen naturally just like it does in the hardware. I'm not sure how easy it is to do in an emulator but it's dead simple in hardware.

Another way to think about it is have a 'clock allow' flag that's controlled by the CPU every clock cycle.

On rising edge of CHR A12:
*If 'clock allow' flag is set decrement counter
*else counter unchanged

Set the 'clock allow' flag:
When CHR A12 has been low for ~5 *consecutive* CPU clock cycles (counting ~15 PPU cycles should also work, but that's not what the hardware senses). If CHR A12 goes high before 5 cycles, you have to start counting again from 0 (not from where you were interrupted by CHR A12).

Clear the 'allow clock' flag:
Any time CHR A12 is high

In hardware you just have a counter that is clocked by negedges of M2. And the counter is reset asynchronously when CHR A12 is high.
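Here is the same 'clock allow' logic as a C sketch. It assumes the emulator can call one function at every M2 falling edge and another whenever CHR A12 changes level. `M2_LOW_CYCLES = 5` is the approximate figure from the post, not a measured constant, and all names are hypothetical. The rise is evaluated at the instant A12 goes high, so the IRQ can be raised in the middle of a CPU cycle, as discussed earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { M2_LOW_CYCLES = 5 };   /* "~5" consecutive low M2 cycles */

typedef struct {
    uint8_t low_count;        /* consecutive M2 falling edges with CHR A12 low */
    bool    a12_prev;
} m2_filter;

/* Call at every M2 falling edge with the current CHR A12 level. */
static void m2_filter_m2_fall(m2_filter *f, bool a12) {
    assert(f);
    if (a12) f->low_count = 0;                    /* held in reset while A12 is high */
    else if (f->low_count < M2_LOW_CYCLES) f->low_count++;
}

/* Call whenever CHR A12 changes level. Returns true when the scanline
   counter should be clocked (the 'clock allow' flag was set at the rise). */
static bool m2_filter_a12(m2_filter *f, bool a12) {
    assert(f);
    bool tick = a12 && !f->a12_prev && f->low_count >= M2_LOW_CYCLES;
    if (a12) f->low_count = 0;                    /* clearing is immediate */
    f->a12_prev = a12;
    return tick;
}
```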

by on (#92851)
beannaich wrote:
I should have clarified. By "normal" and "abnormal" I meant:

Normal: BG at $0000, SP at $1000
Abnormal: BG at $1000, SP at $0000

According to Disch's mapper docs, the abnormal settings cause the IRQ to fire at dot 324 of the scanline before when it's supposed to fire. Normal settings fire at dot 260.


Normal: clock at dot 260.
Abnormal: clock at dot 324. "Forget" the word previous and just do the things.

by on (#92852)
Zepper wrote:
Normal: clock at dot 260.
Abnormal: clock at dot 324. "Forget" the word previous and just do the things.


It's kind of hard to "forget" that when your IRQs are firing at the right dots, but still failing the tests, meaning they are firing at the wrong scanlines. :)

by on (#92859)
beannaich wrote:
meaning they are firing at the wrong scanlines. :)


False: Blargg's tests will fail if the IRQ is fired ONE CPU cycle off from where it belongs. So you can have the right scanline but the wrong split second (CPU cycle). It sounds like your IRQ may only be firing at the wrong time in relation to M2, which I discussed a few posts back. This is easy to do if you handle your enabling/disabling improperly.

by on (#92888)
infiniteneslives wrote:
False: Blargg's tests will fail if the IRQ is fired ONE CPU cycle off from where it belongs. So you can have the right scanline but the wrong split second (CPU cycle). It sounds like your IRQ may only be firing at the wrong time in relation to M2, which I discussed a few posts back. This is easy to do if you handle your enabling/disabling improperly.


This turned out to be true, but the problem wasn't in my MMC3 emulation. Instead it was a problem with the PPU losing sync at just the right moment, causing the counter to not be clocked, and making the test fail.

So, for future readers of this topic who may be having troubles with that test failing, make sure your PPU is always synced AFTER executing instructions and BEFORE you check for IRQs.
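Here is a toy C model of that ordering issue; it is not taken from any emulator in this thread. On each CPU cycle the PPU advances 3 dots (NTSC), and an A12 rise at `irq_dot` raises the IRQ line. If the CPU samples the line after the PPU has caught up, it sees the IRQ one CPU cycle earlier than if it samples stale PPU state. Real 6502 IRQ polling happens at specific points within each instruction, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

enum { DOTS_PER_CPU_CYCLE = 3 };   /* NTSC */

/* Returns the index of the first CPU cycle whose IRQ check sees the line high. */
static long first_cycle_seeing_irq(long irq_dot, bool sync_before_poll) {
    assert(irq_dot >= 0);
    bool irq_line = false;
    long dot = 0;
    for (long cycle = 0;; cycle++) {
        if (!sync_before_poll && irq_line) return cycle;   /* polls stale PPU state */
        for (int i = 0; i < DOTS_PER_CPU_CYCLE; i++, dot++)
            if (dot == irq_dot) irq_line = true;           /* A12 rise -> IRQ line */
        if (sync_before_poll && irq_line) return cycle;    /* PPU caught up first */
    }
}
```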

by on (#92891)
I'd say BEFORE an instruction.

by on (#92892)
beannaich wrote:
make sure your PPU is always synced AFTER executing instructions and BEFORE you check for IRQs.


Yeah I think that's basically what I was trying to explain when I said this. I just didn't understand how to put it in terms of emulation :)

infiniteneslives wrote:
Unless you've got some time misalignment between your PPU and CPU.

I know I've said it before but... Your IRQ MUST come in at the exact same time as CHR A12 rises (in the middle of the CPU clock cycle etc). If you round off to the nearest CPU clock cycle you'll end up getting it late/early depending on how you round. Round out to next and you'll be late at times, round in to beginning of CPU cycle and you'll be early. When the CPU and PPU are aligned the IRQ should be found by the CPU on the next cycle, because it didn't get there early enough for the current cycle.

by on (#92900)
basically... we have an abstraction...

Code:
ppu_run -> apu/irq_clock -> cpu_opcode/next_byte -> ppu_run ->...
... -> poll_nmi/irq -> ppu_run -> ...


infiniteneslives wrote:
False, Blargg's tests will fail if the IRQ is fired ONE CPU cycle off from where it belongs. So you can have right scanline but wrong split second (CPU cycle).


Right. :)

by on (#93046)
OK, I rewrote the IRQ code here, but something's bugging me... regarding the "previous scanline" quote from Disch's MMC3 document, I believe.

- if register $2000 and $18 = $10 (BG at PPU $1000 and SP at PPU $0000), the IRQ is expected to be clocked at ppu cycle 4 AND 324 on scanline zero (20 lines after the VBlank). That's odd, because it's like the counter had been clocked twice. I can't get a "Passed" if I don't setup the IRQ to trigger this way. The test considers cycle 324 of the scanline zero as if the IRQ had been triggered on scanline 1. o.O

Am I doing something wrong???

by on (#93052)
So I'm not sure I'm quite following you, Zepper. I wasn't sure exactly which quote you were referring to. But I think I might see where your problem is.

So with BG at $1000 and sprites at $0000 the MMC3 is clocked when the background is being fetched. But the PPU fetches the background TWICE per line. However it's normally only clocked once per line, the exception is the first line where it gets clocked twice.

More detailed explanation---

Disch's docs wrote:
there are 42 opportunities for A12 to rise. These
opportunities occur on the following dots:

4, 12, 20, ..., 244, 252 (32 BG tiles)
260, 268, 276, 284, 292, 300, 308, 316 (8 Spr tiles)
324, 332 (2 BG tiles)

(You might be able to see now how I came up with those 260, 324 numbers I threw at you earlier)

MMC3 seems to ignore rises that are too close together. This is why the 8 sprite fetches will only clock
the counter once. Exactly how far apart the rising edges have to be is unknown, but it is somewhere between
14 and 16 dots. So any two consecutive opportunities are too close together (including the most distant
332->4), but any two non-consecutive opportunities will both be acknowledged.


I'm no PPU expert or anything, but this is how I understand it, and I believe it answers your question. The basic sequence for each scanline is: fetch 32 BG tiles, 8 sprite tiles, 2 BG tiles, go to the next scanline. In this setting the scanline counter gets clocked every time we start fetching BG tiles. The key here is that the last 2 BG fetches of one scanline and the 32 BG fetches at the start of the next scanline count as only one collective 'clock' of the scanline counter, because they are too close together to be seen as two. So in reality the first scanline clocks the counter TWICE (once at the beginning with the 32 BG tiles and once at the end for the 2 BG tiles), and each subsequent scanline clocks the counter only ONCE (at the end, for the 2 BG tile fetch).

That's why Disch says "the IRQ will occur after dot 324 of the *previous* scanline": the first scanline clocked the counter twice, so the counter can be considered 'off by one' scanline. EDIT: in hindsight, I think this might have been the "previous scanline" quote you were referring to. ;)
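The "off by one" can be made concrete with a little arithmetic, sketched here in C. The sketch assumes the clock positions described above:

- Canonical setup: one clock per line at dot 260.
- BG at $1000: clocks at dot 4 and dot 324 on the first fetching line, then only at dot 324.

It also assumes that line 0 is the first fetching line after vblank and that the counter behaves normally, with the first clock reloading it from $C000. Names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { int line; int dot; } clock_pos;

/* Position of the k-th counter clock (k = 0, 1, 2, ...) after vblank. */
static clock_pos kth_clock(int k, bool bg_at_1000) {
    assert(k >= 0);
    clock_pos p;
    if (!bg_at_1000) {          /* BG $0000, sprites $1000: dot 260 of every line */
        p.line = k;
        p.dot = 260;
    } else if (k == 0) {        /* BG $1000: extra clock at dot 4 of the first line */
        p.line = 0;
        p.dot = 4;
    } else {                    /* ...then dot 324 of every line, starting at line 0 */
        p.line = k - 1;
        p.dot = 324;
    }
    return p;
}

/* Clock 0 reloads the counter with `latch`; each later clock decrements it,
   so the counter reaches 0 (and the IRQ fires) on clock number `latch`. */
static clock_pos irq_pos(uint8_t latch, bool bg_at_1000) {
    return kth_clock(latch, bg_at_1000);
}
```

With a latch of 20, the canonical setup fires at line 20, dot 260. With BG at $1000, it fires at line 19, dot 324, because the extra clock at dot 4 makes line 19's dot-324 clock the 21st one.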