What's going on with the MMC3 counter?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.

What's going on with the MMC3 counter?
by Drag on 2011-04-07 (#76351)

All of the PPU rendering information I've been looking over lately has got me interested in scanline counting.

However, the currently-available information regarding the MMC3 IRQ counter seems to conflict with how the PPU works.

First and foremost, Brad Taylor's doc states that the following fetches occur:
2 Nametable fetches (2xxx)
2 tile fetches (two parts of one tile) (0xxx or 1xxx)
-- 32 times

2 Discarded Nametable fetches (2xxx)
2 Tile fetches (two parts of one tile) (1xxx or 0xxx)
-- 8 times

2 Nametable fetches (2xxx)
2 tile fetches (two parts of one tile) (0xxx or 1xxx)
-- 2 times

2 Nametable fetches (2xxx)

Supposedly, the MMC3 watches A12 (not A13 like Taylor's doc says), but this is confusing, because A12 determines the difference between 0xxx-1xxx, and 2xxx-3xxx.
Given the above data about fetches, A12 will rise 8 times per scanline if sprites use CHR $1000, 34 times if BG uses CHR $1000, and 42 times if both use CHR $1000.
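
Those counts can be reproduced with a short sketch. It assumes only the simplified fetch order listed above (it is not a cycle-accurate PPU model), and the function names are made up for illustration.

```python
def fetch_addresses(bg_table, sprite_table):
    """Build one scanline's fetch addresses in the order listed above.

    bg_table and sprite_table are pattern table bases (0x0000 or 0x1000).
    Only bit 12 of each address matters here, so the nametable and tile
    addresses are placeholders.
    """
    seq = []

    def tile(table):
        # Two nametable-region fetches, then the two halves of one tile.
        seq.extend([0x2000, 0x23C0, table, table + 8])

    for _ in range(32):          # background tiles
        tile(bg_table)
    for _ in range(8):           # sprite tiles (nametable fetches discarded)
        tile(sprite_table)
    for _ in range(2):           # first two background tiles of the next line
        tile(bg_table)
    seq.extend([0x2000, 0x2000])  # final two nametable fetches
    return seq


def count_a12_rises(addresses):
    """Count unfiltered 0 -> 1 transitions of address bit 12."""
    rises = 0
    prev = 0
    for addr in addresses:
        a12 = (addr >> 12) & 1
        if a12 and not prev:
            rises += 1
        prev = a12
    return rises


if __name__ == "__main__":
    for bg, spr in [(0x0000, 0x1000), (0x1000, 0x0000), (0x1000, 0x1000)]:
        print(hex(bg), hex(spr), count_a12_rises(fetch_addresses(bg, spr)))
```

With no filtering at all, this gives 8, 34, and 42 rises for the three arrangements.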

However, each rising edge of A12 supposedly decrements the MMC3's scanline counter, so something isn't right here.

Where am I making the mistake?

by thefox on 2011-04-07 (#76352)

It ignores rises which occur too close together (i.e. those last 7 ones).

by Drag on 2011-04-07 (#76353)

thefox wrote:
It ignores writes which occur too close together (i.e. those last 7 ones).

If that's the case, then why does the scanline counter decrement twice every scanline if both sprites and background use CHR $1000? There's no pause between the different blocks of memory fetches, so theoretically, this would stop the scanline counter, just like if both BG and sprites used CHR $0000

by thefox on 2011-04-07 (#76355)

Drag wrote:
thefox wrote:
It ignores writes which occur too close together (i.e. those last 7 ones).

If that's the case, then why does the scanline counter decrement twice every scanline if both sprites and background use CHR $1000? There's no pause between the different blocks of memory fetches, so theoretically, this would stop the scanline counter, just like if both BG and sprites used CHR $0000

Maybe it uses a different method of ignoring the successive writes... counter maybe. I don't think anybody really knows.

by Drag on 2011-04-08 (#76366)

I dunno, I'm looking over this backwards and forwards, and it seems the common way to explain how the counter works is that it just monitors A12, and uses some kind of filter to eliminate the quick toggling of A12 when it goes between fetching a tile bitmap at $1xxx and fetching a nametable byte at $2xxx over and over (which toggles A12 quickly).

However, the explanation always breaks down when you have to consider the fact that the counter clocks twice per scanline when both sprites and bg use tiles at $1xxx. This is the part where I get stuck when I try to wrap my head around it; there's no gap between when the tiles finish rendering and the sprite fetches begin; A12 would just keep toggling the same way the whole way through, which would mean the scanline counter would only ever see one rise, and effectively be halted, like what happens when the BG and Sprite tiles are both fetched from $0xxx.

Either there's some kind of undocumented thing going on with the PPU fetches (like, maybe there is a gap), or there's more to the counter than just watching A12.

by qbradq on 2011-04-08 (#76370)

Drag, are you testing this in an emulator or on hardware? If on hardware are you using the PowerPak or a donor cart? If a donor cart, which one?

I have been studying the MMC3 a lot recently (because I am trying to implement it in a CPLD, wish me luck). Here are some of the finer points that have an impact on your observations:

1. Games using MMC3 are only supposed to use $0XXX for background and $1XXX for sprites. As far as I can tell this is the only configuration supported by the official MMC3.
2. MMC3 boards use a 220pF capacitor to ground to provide a low-pass filter for the A12 line. This means that when the PPU toggles A12 rapidly the MMC3 only sees a single rising edge.
3. The MMC3 implementation on the PowerPak is known to have issues with its implementation of the IRQ counter. This is likely due to the lack of an external low-pass filter for A12.
4. Emulators seem to be based on Blargg's behavioral description of the MMC3, which states that if BG is used at $1XXX and sprites at $0XXX that the IRQ counter is clocked during the scanline, and if the other way around it is clocked at the pre-render scanline (during sprite fetch).

I think the above should clear up your confusion about MMC3 operation. As for using $1xxx as BG and sprites, this is 100% not supported by the MMC3 hardware. Chances are if you run this in an emulator (see point 4) the emulator will think both cases are true and clock the IRQ counter twice, once during the scanline and once again during the pre-scanline sprite fetch.

by tepples on 2011-04-08 (#76372)

Is it really a low-pass filter on PA12 with a period longer than one fetch, or is it just to deglitch the signal? I ask because PA12 is one of the inputs to the bankswitching function (the others being PA11 and PA10). So wouldn't a low-pass filter on PA12 also affect bankswitching? At this point, I'd propose a decap and photomicrograph of the MMC3 IC so that we can see exactly what's going on inside that chip.

by qbradq on 2011-04-08 (#76374)

Good point. Now that I look at the math on that, the break frequency of that low-pass is in the hundreds of MHz. That would be fine if the MMC3 was used as intended (with BG at $0xxx and Sprites at $1xxx), but it would not explain Blargg's observations. Perhaps the clock signal to the counter is significantly slower than the PPU clock.

by kevtris on 2011-04-08 (#76379)

qbradq wrote:
Good point. Now that I look at the math on that, the break frequency of that low-pass is in the hundreds of MHz. That would be fine if the MMC3 was used as intended (with BG at $0xxx and Sprites at $1xxx), but it would not explain Blargg's observations. Perhaps the clock signal to the counter is significantly slower than the PPU clock.

Yeah that capacitor seems to just be for deglitching. If it truly did filter A12, then nothing would work right, since the signal was slowed down too much.

On my FPGA NES, I have a counter that acts as the low pass filter. It ignores multiple pulses of A12 using the counter. It's possible that the counter has a period of 1/2 scanline on a real MMC3.

This way, if A12 is held high most of the time, it could fire twice?

There is another thing too that might've been forgotten about.

When the PPU is fetching nametables, A12 is low. So it's alternating between fetching nametable/attrib table data and character data so A12 will keep toggling in this case.

In my FPGA case, I got 100% compatibility (at least all the mmc3 games in goodnes and all the derivatives that use it such as multicarts and things) by filtering using the counter approach.

It might be useful to hook an MMC3 up "bare" and watch what happens when A12 is toggled at different rates. BTW I suspect it'd use M2 clocks for the counter if I had to guess.

If you get something that seems to work, I suggest the following games for testing:

* SMB3 (pretty bog-standard, but sometimes the status bar will jump a scanline or two)

* Megaman 4 (apparently they write to the registers "out of order" and older emulators failed in a few places).

* Klax (Japanese version only! US version is RAMBO-1). Both games heavily abuse the IRQ counter, making it fire every 4 or 8 scanlines. This is done to get more attribute entries effectively for the blocks at the bottom of the screen.

* one of the SMB3 total conversions/hacks. I can't quite remember the name of it, but this game SHOULD fail on a proper MMC3 emulation, because they didn't select the proper bank for the tiles or sprites. This causes interrupts to occur way too early from what I recall. Not sure now what the name of it was.

by Dwedit on 2011-04-08 (#76381)

It's Mario Adventure by DahrkDaiz.

by Drag on 2011-04-08 (#76388)

Also, what happens when you try to access $1000-1FFF in the PPU RAM, like if you wanted to write to CHR-RAM, or access data you put in the CHR-ROM? Technically, this would cause A12 to rise, so does this affect the scanline counter in reality?

by Dwedit on 2011-04-08 (#76389)

Blargg's demos for testing the MMC3 used this method of setting A12 almost exclusively.

by Drag on 2011-04-12 (#76574)

Ok, if the scanline counter reacts to just simple $2006 writes, I have an idea.

Leave PPU rendering off, and just manually manipulate the PPU A lines through writes to $2006, to simulate what the PPU does during a scanline.

First, set the IRQ counter to 02, and then do the following:
$2006 <- 2000 ;Tile 1 = 00
$2006 <- 23C0
$2006 <- 0000
$2006 <- 0008

$2006 <- 2001 ;Tile 2 = 01
$2006 <- 23C0
$2006 <- 0010
$2006 <- 0018

$2006 <- 2002 ;Tile 3 = 00
$2006 <- 23C1
$2006 <- 0000
$2006 <- 0008

$2006 <- 2003 ;Tile 4 = 01
$2006 <- 23C1
$2006 <- 0010
$2006 <- 0018

...and so on, until you "fetch" 32 tiles. Then "fetch" 8 sprite tiles:

$2006 <- 2000
$2006 <- 23C0
$2006 <- 1000 ;Fetch tile 00 for "sprite 0"
$2006 <- 1008

$2006 <- 2000
$2006 <- 23C0
$2006 <- 1010 ;Fetch tile 01 for "sprite 1"
$2006 <- 1018

etc...

Technically, the first two fetches for the sprites are "garbage nametable fetches" which Brad Taylor's doc says are somewhat unpredictable, but seem to relate to the first tile to be rendered for the next scanline. Thus, I just naively set it to the first tile again to accommodate that. I've left out the rest of the scanline, because this alone should be enough to give us some information.

When you get to the simulated sprite fetches, A12 will rise 8 times. If the scanline IRQ doesn't fire sometime during the simulated sprite fetches, then we'll know that something more is going on than just simply watching A12.

If the scanline IRQ does fire in the middle of the simulated sprite fetches, then this test is inconclusive.
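
The address list for this test is regular enough to generate rather than type out. The following is a sketch only: it follows the pattern shown in the listing above (tiles alternating between 00 and 01, the $23Cx address advancing every second tile, sprite tile n fetched from $1000 + 16n), and the function names are hypothetical. Continuing the pattern past the entries shown is an assumption.

```python
def simulated_scanline_addresses(bg_tiles=32, sprite_tiles=8):
    """Return the PPU addresses to load through $2006, in order."""
    addrs = []
    for i in range(bg_tiles):
        tile = i % 2                      # the listing alternates tile 00 / 01
        addrs += [0x2000 + i,             # nametable byte
                  0x23C0 + i // 2,        # second $2xxx fetch, as in the listing
                  tile * 16,              # tile bitmap, low plane ($0xxx)
                  tile * 16 + 8]          # tile bitmap, high plane
    for s in range(sprite_tiles):
        addrs += [0x2000, 0x23C0,         # stand-ins for the garbage fetches
                  0x1000 + s * 16,        # sprite tile, low plane ($1xxx)
                  0x1000 + s * 16 + 8]    # sprite tile, high plane
    return addrs


def as_2006_writes(addrs):
    """Split each address into the two bytes written to $2006 (high, low)."""
    return [(a >> 8, a & 0xFF) for a in addrs]


def count_a12_rises(addrs):
    """Count 0 -> 1 transitions of address bit 12 across the sequence."""
    rises, prev = 0, 0
    for a in addrs:
        a12 = (a >> 12) & 1
        rises += a12 and not prev
        prev = a12
    return rises


if __name__ == "__main__":
    seq = simulated_scanline_addresses()
    print(len(seq), "addresses,", count_a12_rises(seq), "A12 rises")
```

The sequence has 160 addresses and A12 rises 8 times, all of them in the simulated sprite fetches.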

If necessary, I could probably come up with a test rom, but someone will need to test this with an actual MMC3, because I don't have the means to do it myself.

by tepples on 2011-04-13 (#76589)

I seem to remember this having been done, but the IRQ fired multiple times during the scanline because the PPU controlled by the CPU is so much slower than the PPU controlled by the rendering circuit.
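
A rough estimate shows how large the speed difference is. The sketch below assumes the CPU test is fully unrolled as LDA #imm / STA $2006 pairs with no other overhead (that code shape is an assumption, not something taken from an actual test ROM) and NTSC timing of 3 PPU dots per CPU cycle.

```python
DOTS_PER_CPU_CYCLE = 3        # NTSC
LDA_IMMEDIATE_CYCLES = 2      # 6502 timing for LDA #imm
STA_ABSOLUTE_CYCLES = 4       # 6502 timing for STA $2006

# Each PPU address takes two byte writes to $2006.
cycles_per_address = 2 * (LDA_IMMEDIATE_CYCLES + STA_ABSOLUTE_CYCLES)

# One simulated tile is four addresses, so consecutive A12 rises in the
# simulated sprite fetches are four addresses apart.
ADDRESSES_PER_TILE = 4
cpu_driven_gap_dots = cycles_per_address * ADDRESSES_PER_TILE * DOTS_PER_CPU_CYCLE

# During real rendering one tile's worth of fetches takes 8 dots.
ppu_driven_gap_dots = 8

if __name__ == "__main__":
    print("CPU-driven rise spacing:", cpu_driven_gap_dots, "dots")
    print("PPU-driven rise spacing:", ppu_driven_gap_dots, "dots")
    print("ratio:", cpu_driven_gap_dots / ppu_driven_gap_dots)
```

Under these assumptions the rises arrive 144 dots apart instead of 8, so any filter tuned to rendering-speed toggling would see each one as a separate rise.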

by Drag on 2011-04-13 (#76620)

tepples wrote:
I seem to remember this having been done, but the IRQ fired multiple times during the scanline because the PPU controlled by the CPU is so much slower than the PPU controlled by the rendering circuit.

I considered that. However, there's a number of reasons why this specific test might fail, which is why nothing can be concluded if it does. In addition to the CPU's driving being much slower than the PPU's driving, it could also fail because this test doesn't actually read or write anything (so two pins aren't being driven). It could also fail because the garbage nametable addresses I put in are incorrect.

If it fails from the timing, then there may be an internal countdown counter like kevtris uses, or the path from A12 to the scanline counter may have a capacitor on it to even out the toggling (though that doesn't explain the double-clocks when BG and Sprites both use pattern table 1), or maybe M2 is used somehow (though I doubt it, since that's the CPU clock and not the PPU clock, how would that be useful?).

If it fails from incorrect driving (for example, the garbage nametable fetches during sprites), then we may need to perform the PPU decapping you proposed in another thread before we can figure out what's going on, or the Brad Taylor document (which I'm using as reference) may have outdated info. Or, the problem could be as simple as the fact that this test doesn't make any actual reads, it just drives the A lines.

If the test passes though, then we know it must be the specific sequence of addresses put on the A-line, and that'd be one giant mystery solved.

by tepples on 2011-04-13 (#76664)

It is claimed that the scanline counter goes twice as fast when using the MMC3 with the pattern tables backward. I have a hypothesis about what causes this.

The fetches for the first two tiles are at x=320-327 and 328-335. Then the PPU waits 5 cycles[1] before starting the next tiles at x=0-7, 8-15, ... The MMC3 sees A12 go high at 324, low at 328, high at 332, low at 336, and then not high again until 345 (x=4 of the next line). It's low for nine dots, which are three whole M2 cycles (on NTSC), compared to 1.333 cycles for the rises between fetches.

So it counts at x=324, then it counts again at x=4.
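
The arithmetic behind those figures, as a small sketch. It uses the dot numbers quoted above and assumes 341 dots per scanline and 3 dots per M2 cycle (NTSC); the variable names are illustrative.

```python
DOTS_PER_LINE = 341
DOTS_PER_M2 = 3                     # NTSC

last_fall = 336                     # A12 goes low after the second prefetched tile
next_rise = DOTS_PER_LINE + 4       # x=4 of the next line, i.e. dot 345

end_of_line_gap_dots = next_rise - last_fall
end_of_line_gap_m2 = end_of_line_gap_dots / DOTS_PER_M2

between_fetch_gap_dots = 332 - 328  # low time between two adjacent tile fetches
between_fetch_gap_m2 = between_fetch_gap_dots / DOTS_PER_M2

if __name__ == "__main__":
    print(end_of_line_gap_dots, "dots =", end_of_line_gap_m2, "M2 cycles")
    print(between_fetch_gap_dots, "dots =", round(between_fetch_gap_m2, 3), "M2 cycles")
```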

I have a way to test this hypothesis: put both sprites and background at $1000. If my hypothesis is correct, the scanline counter should run at normal speed, clocking only at x=4.

Once this is tested, I can think of ways to generate several different patterns of A12 on a line using 8x16 sprites. The way I understand sprite evaluation, where secondary OAM is cleared to $FF, the PPU will fetch the tiles for a sprite whose tile number is $FF ($0FF0-$0FFF, $1FF0-$1FFF, or $1FE0-$1FFF depending on PPUCTRL settings).

[1] One cycle less between the pre-render scanline and first picture scanline of every other field on NTSC.

by Drag on 2011-04-13 (#76667)

tepples wrote:
It is claimed that the scanline counter goes twice as fast when using the MMC3 with the pattern tables backward. I have a hypothesis about what causes this.

Was this mentioned on the board somewhere? I was going off of Kevtris's MMC3 doc on his website, since that's the only place the double-clock quirk was mentioned, and it was mentioned for BG and Sprites both using page 1. If this was incorrect, and the double-clock is actually when BG uses page 1, and sprites use page 0, then that makes a lot more sense, meaning I was probably scratching my head over nothing. :S

Either way, I agree with your hypothesis. I'm just frustrated that I focused so much on figuring out why an extra clock would occur between BG and Sprite fetches that I failed to realize that there were 5 PPU cycles at the end of each scanline (except the prefetch every other frame) where A12 is invariably low. >_<

by tepples on 2011-04-13 (#76668)

If someone has an MMC3 devcart, I'll volunteer to make a test program. I can only test it on FCEUX, Nintendulator, Nestopia, and PowerPak.

by Drag on 2011-04-13 (#76674)

Ok, I just got some updated information from a certain Mr. Disch.

So, I made this thread originally because I was confused over a possible quirk with the MMC3 where the scanline counter clocks twice if both BG and sprites use tiles in the 1xxx range. I read this on Kevtris's site, because it's the only place I knew it was mentioned.

However, this was disproved by Blargg some time ago, and nobody cared to point this out, so this entire thread was a complete waste of time for all involved.

The scanline counter only ever clocks once in a scanline, and that's ONLY when either sprites use page 1 with bg on page 0, or the other way around. It does watch A12, and it does ignore rapid toggling of A12, so some kind of filtering is going on. The A12 rises have to be far enough apart to be recognized, and although we don't know how far, it's estimated to be something like 14-16 pixels apart.

So unfortunately Tepples, this would debunk your hypothesis, because A12 is only low for the last 5 pixels of the scanline, which isn't long enough for the MMC3 to stop ignoring A12 toggles. In fact, Armadillo uses this backwards configuration, so this must be true, and must prove that the scanline counter counts the scanlines correctly.

Anyway, I'll stop now because this information is plainly available in Disch's mapper docs, and in the future, I'm going to recommend we update the information on the wiki using these docs.

by tepples on 2011-04-13 (#76678)

In that case, it'd still warrant a test case to see exactly how far apart the clocks must be to trigger a reclock. I think I know how to generate the 8x16-mode OAM patterns needed to trigger rises spaced at 8, 16, 24, or 32 dots apart. Then we could nail it down to "somewhere between 8 and 16" or "somewhere between 16 and 24", and anyone with a devcart could verify it.

It might be using M2 to make sure at least four CPU cycles have elapsed between rises. It might even vary on PAL vs. NTSC due to different dot:M2 ratios (3:1 on NTSC, 3.2:1 on PAL).
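
To see what an M2-based minimum would mean in dots on each system, here is a small conversion sketch using the ratios above. The helper name is made up, and exact fractions are used so the PAL ratio of 3.2 (16/5) does not pick up rounding error.

```python
from fractions import Fraction

# PPU dots per M2 (CPU) cycle, from the ratios quoted above.
DOTS_PER_M2 = {"NTSC": Fraction(3), "PAL": Fraction(16, 5)}


def cycles_to_dots(cycles, region):
    """Convert a number of M2 cycles to PPU dots for the given region."""
    return cycles * DOTS_PER_M2[region]


if __name__ == "__main__":
    for region in ("NTSC", "PAL"):
        print(region, "4 M2 cycles =", float(cycles_to_dots(4, region)), "dots")
```

Four M2 cycles come to 12 dots on NTSC and 12.8 dots on PAL, so a fixed cycle count would correspond to slightly different dot spacings on the two systems.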

Yes, I have been transcribing Disch's docs occasionally.

by Drag on 2011-04-13 (#76679)

Ok, that's a good plan, I'm glad something will come out of all of this, because I thought I wasted everyone's time with this thread. If I sounded frustrated in my post, that's why. :S

by thefox on 2011-04-14 (#76687)

tepples wrote:
In that case, it'd still warrant a test case to see exactly how far apart the clocks must be to trigger a reclock. I think I know how to generate the 8x16-mode OAM patterns needed to trigger rises spaced at 8, 16, 24, or 32 dots apart. Then we could nail it down to "somewhere between 8 and 16" or "somewhere between 16 and 24", and anyone with a devcart could verify it.

If somebody comes up with a test suite that runs in RAM, and nobody else volunteers, I can test this on my NES by hotswapping (PAL only, though).

by tepples on 2011-04-14 (#76696)

What's the best practice for code that runs on a hotswap? How do I get notified that the cart has been inserted, and how do I get notified of vblank?

Which cart do you plan to swap to? Is it one of the five CHR RAM carts (Mega Man 4, Mega Man 6, Ninja Crusaders, Pinbot, High Speed)? If not, I'll need to know which cart's CHR ROM will be connected to the PPU so that I can know which character encoding I'm dealing with.

by thefox on 2011-04-14 (#76702)

tepples wrote:
What's the best practice for code that runs on a hotswap? How do I get notified that the cart has been inserted, and how do I get notified of vblank?

I've never actually run any code like that before (have to disable the CIC first), but what I'm planning to do is load blargg's bootloader from PowerPak, then hotswap an MMC3 cart in and transfer the test suite to NES over the controller port. I guess you'd have to poll for vblank, unless there happens to be an NMI vector in the game that points to RAM/WRAM.

Since I'm transferring the test suite after inserting the cart, it shouldn't be absolutely required to detect when the cart is inserted, however this could still be useful to detect if there are problems with the connector.

I guess I have to run some tests of my own first.

Quote:
Which cart do you plan to swap to? Is it one of the five CHR RAM carts (Mega Man 4, Mega Man 6, Ninja Crusaders, Pinbot, High Speed)? If not, I'll need to know which cart's CHR ROM will be connected to the PPU so that I can know which character encoding I'm dealing with.

Yeah, I forgot about that, although audio/background color etc. could be used for signaling success/fail as well. The cart is SMB3, so CHR ROM. It should also be possible to read the results from RAM over the controller port cable, but I'll have to look into that first.

EDIT: Hmm, IRQ detection would be problematic as well, but there are usually ways to work around it.

by Drag on 2011-04-14 (#76707)

tepples wrote:
What's the best practice for code that runs on a hotswap? How do I get notified that the cart has been inserted, and how do I get notified of vblank?

If you can control the software on one cartridge, here's a way to do it:

Let's say you have Cart A and Cart B.

In Cart A, disable all interrupts (the DMC IRQ, the vblank NMI, everything), then copy some code to the internal ram ($0000-$07FF), and then jump to the code in RAM.

The code to copy to RAM:
Run a loop that spins on $2002, waiting for the vblank flag to be set. Then, read the joypad. Check for a certain button combo (like, pressing start, or start/select). If the combo isn't being pressed, branch back to the vblank wait.

The idea is for you to press buttons on the controller when you've inserted Cart B. While the NES is running the code you put in RAM, it'll be completely safe to remove Cart A from the cartridge slot. Just keep in mind that any graphics you put on the screen will disappear while there's nothing in the slot. However, they'll reappear once you insert Cart B, so maybe you can copy the entire BG pattern table to the nametable as an indication to whether the cart's inserted properly or not?

There's a small problem though, if you use a commercial game, you'll be forced to use its IRQ vector and routine. There's no way to change this without modifying the rom.

by thefox on 2011-04-14 (#76708)

Drag wrote:
There's no way to change this without modifying the rom.

Unless you have Game Genie (which I don't). =)

by Drag on 2011-04-14 (#76710)

thefox wrote:
Drag wrote:
There's no way to change this without modifying the rom.

Unless you have Game Genie (which I don't). =)

That's risky though, but I guess if you can perform a SEI before the DMC IRQ occurs, then it wouldn't matter. Also, you'd need to be able to yank the cartridge out of the genie without removing the genie from the connector. Otherwise, I think whatever codes you put in will be erased.

by thefox on 2011-04-14 (#76712)

Drag wrote:
That's risky though, but I guess if you can perform a SEI before the DMC IRQ occurs, then it wouldn't matter. Also, you'd need to be able to yank the cartridge out of the genie without removing the genie from the connector. Otherwise, I think whatever codes you put in will be erased.

True. EDIT: I guess it should be possible to reprogram the Game Genie from the code in RAM by writing to some magic registers?

If somebody is willing to write some tests and wants me to run them, please .org the code at $200. You can use $0-$2F and $1F0-$7FF for RAM (of course be careful not to overwrite any of your code). The stack pointer is $FD at entry. Preferably there should be an RTS at the end of the code (this goes back to the bootloader). This would be helpful so that I can run the code using NRPC, which makes it easy for me to run several tests and read back the RAM in between them.

by Drag on 2011-04-14 (#76715)

thefox wrote:
I guess it should be possible to reprogram the Game Genie from the code in RAM by writing to some magic registers?

I don't think the genie has any magic registers; it has to retain 100% compatibility with all games. As such, every single write has to pass through to the cartridge. Only the genie itself would be able to modify its internal state.

by thefox on 2011-04-14 (#76716)

Drag wrote:
thefox wrote:
I guess it should be possible to reprogram the Game Genie from the code in RAM by writing to some magic registers?

I don't think the genie has any magic registers; it has to retain 100% compatibility with all games. As such, every single write has to pass through to the cartridge. Only the genie itself would be able to modify its internal state.

It has to have some so that the code entry screen which is run at boot time can pass the codes to the GG hardware. I couldn't find any technical information about that however.

by tepples on 2011-04-14 (#76727)

thefox wrote:
It has to have some so that the code entry screen which is run at boot time can pass the codes to the GG hardware. I couldn't find any technical information about that however.

As I understand it, the last write to the register turns off the registers' write enable.

If it's SMB3, with plenty of RAM, then ORG at $6000 would work as well, wouldn't it? I wonder to what extent its NMI and IRQ handlers can be hijacked. Does anyone have a list of games whose NMI and IRQ entry points are in RAM?

by Drag on 2011-04-14 (#76732)

tepples wrote:
thefox wrote:
It has to have some so that the code entry screen which is run at boot time can pass the codes to the GG hardware. I couldn't find any technical information about that however.

As I understand it, the last write to the register turns off the registers' write enable.

If it's SMB3, with plenty of RAM, then ORG at $6000 would work as well, wouldn't it? I wonder to what extent its NMI and IRQ handlers can be hijacked. Does anyone have a list of games whose NMI and IRQ entry points are in RAM?

That's right.

Also, be careful, $6000 is on the cart, not the NES, so it won't be preserved when you remove the cart and stick a new one in.

by tepples on 2011-04-14 (#76734)

I thought the procedure was going to be 1. load bootloader, 2. swap, 3. load test code into SMB3 cart, 4. run. Or would that not work?

by thefox on 2011-04-14 (#76735)

tepples wrote:
If it's SMB3, with plenty of RAM, then ORG at $6000 would work as well, wouldn't it?

Yeah but like I said, having the entry point at $200 (and maximum code size $600 bytes) would make things a lot easier for me.

tepples wrote:
thefox wrote:
It has to have some so that the code entry screen which is run at boot time can pass the codes to the GG hardware. I couldn't find any technical information about that however.

As I understand it, the last write to the register turns off the registers' write enable.

Is there any info about this somewhere? What is its power on state? It has to somehow have the registers and the GG firmware ROM enabled on init. If I was unclear, I thought about first putting PowerPak in without GG, then SMB3 with GG. But this is just speculation as I don't have a GG to test with.

by Drag on 2012-06-04 (#95011)

Bump!

Anyway, if it hasn't been done yet, I have a way we can test to see approximately what the latency of the scanline counter's A12 monitoring is.

Enable 8x16 sprite mode
Put 8 sprites (OAM0-OAM7) on the same y-coordinate on the screen, disable the rest (move to Y=$FF)
Set sprites 0 and 7 to tile $00, set the rest to tile $FF
Repeat, but with 1 and 7 = $00
Repeat, but with 2 and 7 = $00
Keep repeating until 6 and 7 = $00

Check how this affects the scanline counter, I've predicted a couple of possibilities:
The scanline counter will clock twice when the tile $00 sprites are too far apart. (This would mean that the MMC3 only filters out a small amount of pixels, enough to ignore the two dummy nametable reads that occur before each sprite tile fetch, but not enough to cause the MMC3 to ignore a single read to $1xxx)
The scanline counter won't clock at all, unless sprites 6 and 7 are the $00 sprites. (This would mean the MMC3 is filtering more than a couple of pixels, enough that a single $1xxx read isn't enough to clock the counter, but two $1xxx reads (with a $2xxx read between them) is)
The scanline counter just won't clock at all, period. (This would mean that two $1xxx reads (with a $2xxx read between them) isn't enough to clock the counter)

If the first option (scanline counter clocks twice) happens, then we can figure out how close the sprites need to be in order for the scanline counter to clock correctly. There are 4 pixels of A12=0 between two sprites, and for each tile=$FF sprite between the two tile=$00 sprites, add 8 pixels of A12=0. This will help us figure it out within a granularity of 8 pixels (or 8/3=2.667 CPU cycles).
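
The gap for each step of the test follows directly from that rule. This sketch takes the post's counting at face value (4 dots between adjacent sprites, plus 8 dots per tile=$FF sprite in between) and NTSC timing. It only reproduces the arithmetic; it does not check which tile numbers actually drive A12 high in 8x16 mode.

```python
DOTS_PER_CPU_CYCLE = 3  # NTSC


def low_gap_dots(ff_sprites_between):
    """Dots of A12=0 between the two tile=$00 sprites, per the rule above."""
    return 4 + 8 * ff_sprites_between


def test_steps():
    """One entry per test step: sprite `first` and sprite 7 are tile $00."""
    steps = []
    for first in range(7):                 # first = 0 .. 6
        between = 7 - first - 1            # tile=$FF sprites between the pair
        dots = low_gap_dots(between)
        steps.append((first, dots, dots / DOTS_PER_CPU_CYCLE))
    return steps


if __name__ == "__main__":
    for first, dots, cycles in test_steps():
        print(f"sprites {first} and 7: {dots} dots, about {cycles:.2f} CPU cycles")
```

The gaps run from 52 dots (sprites 0 and 7) down to 4 dots (sprites 6 and 7) in steps of 8.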

If the third option happens (scanline counter just doesn't clock at all), then repeat the test, but this time, set all 8 sprites to tile $FF, and then one-by-one, set sprite 7 to tile $00, 6 to tile $00, and so on until the scanline counter starts clocking normally.

This is all assuming that the filtering is done with a capacitor, meaning that in addition to A12 needing to be low for a sufficient amount of time before a rising edge of A12 can be detected, A12 would also need to stay high for a sufficient amount of time for the rising edge to be detected. It's possible that the filtering is somehow being done a different way, eliminating the need for A12 to stay high, but requiring A12 to stay low between rising edges.

by infiniteneslives on 2012-06-04 (#95012)

I'm interested to find out the results of your test. My guess is it's only filtering a few pixels. I've been meaning to fully test this as well but been too lazy.

Drag wrote:
This is all assuming that the filtering is done with a capacitor, meaning that in addition to A12 needing to be low for a sufficient amount of time before a rising edge of A12 can be detected, A12 would also need to stay high for a sufficient amount of time for the rising edge to be detected.

I can tell you that the filtering is most definitely not all done by the capacitor on the PCB and most likely no internal capacitor in the MMC3.

First off, the on-board capacitor is too small to filter the frequency of CHR A12, which was already pointed out in this thread. I've actually kind of realized why that capacitor is there. Turns out the CHR A12 signal is a bit noisy, which isn't surprising. Using a noisy signal for clocking functions causes unpredictable problems. I noticed this when running SMB3 on the NESDEV1. Usually it was good, but if you did something that produced a fair amount of sound in the game the menu bar would jitter a bit. This was most noticeable at the end of the first level where you can use the turtle shell to bust up a bunch of bricks. I added a small capacitor to the PCB and it was clean as a whistle. I suspect you could see the same result on a few actual games if you removed the capacitor.

Secondly there is NO delay between the FIRST posedge of CHR A12 and the generation of the IRQ signal. If a capacitor was performing the filtering you'd expect the edge from CHR A12 to be delayed by the capacitor by at least a few pixels. But it comes IMMEDIATELY, which implies that the first edge is sensed and the subsequent edges are the ones getting filtered. Not very easy to get that sort of filtering with a mere capacitor. That's why I agree with Kevtris' take that it's probably filtered based on M2 cycles.

One thing that would be interesting to see is if you just removed M2 from the MMC3. If that signal is only used for sensing $6000-$7FFF then the MMC3 scanline counter would work like normal and there is some kind of strange time filtering being done by some other means. If M2 is used for BOTH WRAM addressing and CHR A12 filtering and NOTHING else, then the MMC3 would most likely be stuck in 'allow chr A12 clockings' mode and EVERY positive edge of CHR A12 would clock the counter. The MMC1 behaves similarly by filtering consecutive writes from RMW operations. If you remove the M2 signal from the MMC1 it will work aside from WRAM and EVERY write gets written to the shift register so RMW operations result in TWO writes. The shift register basically is stuck in 'allow writes' state. Point being the MMC1 filters writes based on PRG R/W during the PREVIOUS CPU cycle sensed by M2; it makes sense, to me at least, that they'd do things in a similar manner for the MMC3.

by Drag on 2012-06-04 (#95017)

Ah, interesting. That would rule out all but the first of my predicted outcomes, so that saves a lot of headache.

I'd just like there to be a method we can use that allows a granularity better than 8 pixels, but playing with the sprites is the best I can come up with for now; the CPU won't be much help because each register write takes 12 pixels. :\

Either way, that's not a terrible estimate, but it'd likely require direct pin manipulation to get something more precise.

It was mentioned earlier in this thread, but using a counter to temporarily disable edge detection might be what the MMC3 is actually doing, and if that counter is clocked by M2, then it'd decrement every 3 pixels, so taking the result from the test and rounding the pixel amount to the nearest 3rd pixel might be a way to add some precision.

by infiniteneslives on 2012-06-04 (#95024)

Drag wrote:
Either way, that's not a terrible estimate, but it'd likely require direct pin manipulation to get something more precise.

Yeah that's what I did with the MMC1. I'll do it with the MMC3 as well. It's just monotonous and time consuming, so I've been delaying sitting down and working on it. I should do it in the next month or two and I'll post the results here.

Your test would still be good verification though. So don't keep yourself from performing your test just because of what I've said.

For what it's worth when I was replicating the MMC3 I tried a few values for how many CPU cycles to filter out posedges of CHR A12. I looked back through my notes and looks like I tried 3-10 CPU cycles before adding my CHR A12 filter cap. I also changed when the counter was reset.

My conclusions:
*The counter should reset on the negative edge of CHR A12. Effectively it looks like the counter has an asynchronous active-high reset driven by CHR A12. I tried making it a synchronous reset, where the counter was reset if CHR A12 was high on each rising edge of M2, but it always messed up on SMB3, Bucky O'Hare, and Klax.

*I tried varying the counter output value at which a nearby CHR A12 posedge is allowed to clock the counter. I went from 3-10; the extremes were definitely broken, with the sweet spot being around 4-6 cycles. 5 was the best choice but still goofed up from time to time on SMB3 and Klax (J), about a half dozen times for one level's worth of play. Basically the bottom portion of the screen would jump up a scanline or two.

*Adding the CHR A12 capacitor was key to clearing up all glitches, with the counter reset asynchronously by CHR A12 and nearby CHR A12 rises filtered out until my counter had clocked 5 CPU cycles, which equates to 14-16 pixels.

After adding the capacitor I'm not sure if I went back and played around with how many cycles to filter out (there's nothing in my notes). It's possible that more or less would have worked just fine meaning that my major problem was noise when I found 5 to be the sweet spot.
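The filter described in the conclusions above can be sketched as a small Python model. This is only an illustration of the reimplementation described in this post, not a claim about the real MMC3: the function names, the one-entry-per-pixel trace format, the fixed M2 phase, and the simplified fetch layout (taken from the fetch list at the start of this thread) are all assumptions.

```python
def count_irq_clocks(a12_levels, threshold=5, dots_per_m2=3, m2_phase=0):
    """Count how many CHR A12 rises would clock the scanline counter.

    a12_levels: one 0/1 entry per PPU pixel (hypothetical trace format).
    The filter counter is held at 0 while A12 is high (asynchronous reset)
    and counts M2 rising edges while A12 is low. A rise only clocks the
    scanline counter once the filter counter has reached threshold.
    """
    clocks = 0
    low_cycles = threshold  # assume A12 was low for a long time before the trace
    prev = 0
    for dot, level in enumerate(a12_levels):
        if level:
            if not prev and low_cycles >= threshold:
                clocks += 1
            low_cycles = 0
        elif dot % dots_per_m2 == m2_phase:
            low_cycles += 1
        prev = level
    return clocks


def sprite_1000_scanline():
    """Simplified 341-pixel line: BG at $0000, sprites at $1000."""
    line = [0] * 256                   # 32 BG tiles, A12 low
    for _ in range(8):                 # 8 sprite fetch groups
        line += [0] * 4 + [1] * 4      # 2 nametable fetches, then 2 pattern fetches
    line += [0] * (341 - len(line))    # remaining BG and nametable fetches
    return line


two_lines = sprite_1000_scanline() * 2
print(count_irq_clocks(two_lines, threshold=5))  # filtered
print(count_irq_clocks(two_lines, threshold=0))  # no filtering at all
```

With a threshold of 5, only the first rise of each sprite fetch group gets through in this model; with a threshold of 0, all 8 rises per line clock the counter.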

by cpow on 2012-06-04 (#95026)

With all this talk about PPU and CPU cycles there seems to be the implicit conclusion that 1 CPU cycle is 3 PPU cycles. This would imply NTSC. Now I wonder...are there any PAL games that use MMC3?

Is it just that people are relating the PPU and CPU cycles using the NTSC codification because it's convenient? Is there a reason why CPU cycles need to be discussed at all that I'm missing? M2 would cycle differently relative to CHR A12 in a PAL MMC3 game would it not?

by Drag on 2012-06-04 (#95033)

cpow wrote:
With all this talk about PPU and CPU cycles there seems to be the implicit conclusion that 1 CPU cycle is 3 PPU cycles. This would imply NTSC. Now I wonder...are there any PAL games that use MMC3?

Is it just that people are relating the PPU and CPU cycles using the NTSC codification because it's convenient? Is there a reason why CPU cycles need to be discussed at all that I'm missing? M2 would cycle differently relative to CHR A12 in a PAL MMC3 game would it not?

I was only using pixels because 1 pixel = 1 ppu cycle, 1 memory fetch = 2 ppu cycles, and that's basically what drives A12. However, if it turns out that M2 is what drives the filtering, then the MMC3's scanline counter would behave different on NTSC vs PAL. The difference would only be by a small amount of PPU cycles (maybe not even 1), but it still should be accounted for.
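For a rough sense of scale, here is a small Python sketch that converts a filter length in CPU cycles to PPU cycles for each region. The helper name and the table are made up for illustration; the ratios are 3 PPU cycles per CPU cycle on NTSC and 3.2 on PAL, and the unknown CPU/PPU phase is ignored.

```python
from fractions import Fraction

# PPU cycles per CPU (M2) cycle for each region.
DOTS_PER_CPU_CYCLE = {"ntsc": Fraction(3), "pal": Fraction(16, 5)}


def filter_window_in_dots(cpu_cycles, region):
    """Length of an M2-clocked filter window, expressed in PPU cycles."""
    return cpu_cycles * DOTS_PER_CPU_CYCLE[region]


for region in ("ntsc", "pal"):
    print(region, float(filter_window_in_dots(5, region)))
```

For the 5-cycle filter mentioned earlier in the thread, that is 15 PPU cycles on NTSC versus 16 on PAL, so the difference grows with the length of the filter.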

by infiniteneslives on 2012-06-04 (#95037)

cpow wrote:
Is it just that people are relating the PPU and CPU cycles using the NTSC codification because it's convenient?

I think the main reason is that it's not well understood what drives the filtering other than time, so depending on what you're considering, one would discuss time relative to either the PPU or the CPU. I've been relating PPU and CPU cycles because I consider the time to be measured by the CPU. However, most documentation I've seen measures this time in PPU cycles, so I've converted to NTSC PPU cycles in most of my discussions because that's the 'norm'. I think it has more to do with the knowledge of what's actually going on than with convenience. But I admit I've been ignorant of PAL in my conversions during discussion.

Quote:
Is there a reason why CPU cycles need to be discussed at all that I'm missing?

If the time filtering is done by CPU cycles, then there IS a reason to discuss CPU cycles. I have good reason to suspect this is what the actual MMC3 hardware is doing, and I've gotten it to work in my implementation as well. However, if my presumptions are wrong, then yes, my discussion of CPU cycles is moot in regards to the actual MMC3, yet it's still valid for all of the MMC3 reimplementations I'm aware of.

Quote:
M2 would cycle differently relative to CHR A12 in a PAL MMC3 game would it not?

Yes it would. Keep in mind, though, that M2 cycle timing relative to CHR A12 differs on any NES due to PPU/CPU alignment unknowns. So there is some uncertainty in ANY system, but like Drag said, I don't think that uncertainty is much greater in PAL systems. Whatever the slight difference is, it must be manageable, since PAL games use the MMC3 with no problem.

EDIT: oh and I did find some further checking in my notes. With the capacitor installed I found that <4 CPU cycles of filtering was good enough for SMB3 but a little glitchy on Klax (J). So my guess is the minimum required filtering would be 4-5 CPU cycles, mine works great with 5 currently.