While creating a repro of Super Mario All Stars for MMC5, I had to implement the MMC5 in an FPGA. For some unknown reason, the creators of this cartridge used the MMC5's ability to supply different CHR banks for sprites and background. Of course they also use the IRQ, so I am back to the problem of figuring out how the MMC5 scanline detector works.
Fortunately I have a Just Breed MMC5 cartridge, so using KrzysioKazzo I was able to research MMC5 features with cycle accuracy.
Before I started any research, I reverse-engineered the board to check whether the wiki data is correct. I found two discrepancies:
1. pin 28 is PPU-!A13, not PPU-A13 as the wiki states
2. one of the resistors in the amplifier circuit is 6.8k (but I think it might vary from board to board)
I am also very curious about the unconnected pins. They are routed to the edge of the board and cut off there, which is super crazy. Also, a multimeter test shows an internal connection. I think they might be related to the WRAM.
----
Ok, now some research about the MMC5 itself:
0. Multiplier ($5205/$5206) -> after writing A and B, low(A*B) and high(A*B) can be read immediately on the next CPU cycle (no need to wait 8 cycles like in mapper 90). This means that the whole product is calculated as a combinational function, which requires quite a lot of ASIC resources.
The 8-cycle delay in mapper 90 exists because on every cycle one successive bit of B is multiplied by A and added to the result, which needs far fewer resources.
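To make the difference between the two multiplier styles concrete, here is a minimal software sketch. The class and method names are made up for illustration and say nothing about how either chip is really built; the sequential version only mirrors the "one bit of B per cycle" description above.

```csharp
public static class MultiplierSketch
{
    // Combinational style: the whole 16-bit product is available at once,
    // which in hardware costs a full 8x8 multiplier array.
    public static ushort MultiplyCombinational(byte a, byte b)
    {
        return (ushort)(a * b);
    }

    // Sequential shift-and-add style: one bit of b is handled per step,
    // so the result is only complete after 8 steps.
    public static ushort MultiplySequential(byte a, byte b, out int steps)
    {
        ushort acc = 0;
        steps = 0;
        for (int bit = 0; bit < 8; ++bit)
        {
            if ((b & (1 << bit)) != 0)
            {
                acc = (ushort)(acc + (a << bit)); // add a shifted by the bit position
            }
            ++steps; // one step per bit of b, set or not
        }
        return acc;
    }
}
```

Both methods return the same 16-bit product; the low byte corresponds to low(A*B) and the high byte to high(A*B). The sequential one always reports 8 steps, which corresponds to the 8-cycle wait mentioned above.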
1. Check whether a sequence of two PPU reads in a row from the same address will trigger the MMC5's scanline detector
It won't
Code:
public byte[] cpu_read(long start_address, int bytes_to_read);
public void cpu_write(long start_address, byte[] bytes_to_write);
public byte[] ppu_read(long start_address, int bytes_to_read);
public void ppu_write(long start_address, byte[] bytes_to_write);
public int read_irq(); //reads !IRQ line (0=irq asserted, 1=not asserted), this does not produce any cpu cycle
//if clocking_enabled=false -> there won't be any cpu/ppu cycles during idle time,
//if clocking_enabled=true -> there will be a CPU read at $0000 as idle cycle
public void cpu_m2_constant_clocking(bool clocking_enabled);
cpu_m2_constant_clocking(false);
cpu_read(0x5204, 1); //clear any interrupt if pending
cpu_write(0x5203, new byte[] {10}); //generate irq at scanline 10
string result = "";
byte r0x5204;
for (int i = 0; i < 20; ++i) {
    r0x5204 = cpu_read(0x5204, 1)[0];
    result += String.Format("{0:x2} ", r0x5204);
    ppu_read(1 << 13, 1);
    ppu_read(1 << 13, 1);
    ppu_read(0 << 13, 1);
}
Output:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2. Check whether a sequence of three PPU reads in a row from the same address will trigger the MMC5's scanline detector
It does
Code:
string result = "";
byte r0x5204;
cpu_m2_constant_clocking(false);
cpu_read(0x5204, 1); //clear any interrupt if pending
cpu_write(0x5203, new byte[] {10}); //generate irq at scanline 10
for (int i = 0; i < 20; ++i) {
    r0x5204 = cpu_read(0x5204, 1)[0];
    result += String.Format("{0:x2} ", r0x5204);
    ppu_read(1 << 13, 1);
    ppu_read(1 << 13, 1);
    ppu_read(1 << 13, 1); // <- this was added
    ppu_read(0 << 13, 1);
}
Output:
00 00 40 40 40 40 40 40 40 40 40 40 c0 40 40 40 40 40 40 40
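For reading these dumps: bit 6 ($40) of $5204 is the in-frame flag and bit 7 ($80) is the IRQ pending flag, so c0 means "in-frame, IRQ pending". A small helper along these lines can decode a dump; the class and method names are hypothetical and not part of the KrzysioKazzo API.

```csharp
using System;

public static class IrqStatusDump
{
    public const int InFrameBit = 0x40;    // bit 6 of $5204
    public const int IrqPendingBit = 0x80; // bit 7 of $5204

    // Turns one $5204 value into a short human-readable description.
    public static string Describe(byte value)
    {
        bool inFrame = (value & InFrameBit) != 0;
        bool pending = (value & IrqPendingBit) != 0;
        return (inFrame ? "in-frame" : "not in-frame") + (pending ? ", IRQ pending" : "");
    }

    // Returns the index of the first value in a space-separated hex dump
    // that has the IRQ pending bit set, or -1 if there is none.
    public static int FirstIndexWithIrq(string dump)
    {
        string[] tokens = dump.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < tokens.Length; ++i)
        {
            if ((Convert.ToByte(tokens[i], 16) & IrqPendingBit) != 0) return i;
        }
        return -1;
    }
}
```

For the output above, FirstIndexWithIrq returns 12, i.e. the pending flag shows up in the 13th read of $5204.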
---
03. Will 3 reads with A13=0 trigger it?
It won't
Code:
string result = "";
byte r0x5204;
cpu_m2_constant_clocking(false);
cpu_read(0x5204, 1); //clear any interrupt if pending
cpu_write(0x5203, new byte[] {10}); //generate irq at scanline 10
for (int i = 0; i < 20; ++i) {
    r0x5204 = cpu_read(0x5204, 1)[0];
    result += String.Format("{0:x2} ", r0x5204);
    ppu_read(0 << 13, 1); // <- change was made here
    ppu_read(0 << 13, 1); // <- change was made here
    ppu_read(0 << 13, 1); // <- change was made here
    ppu_read(1 << 13, 1); // <- change was made here
}
Output:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-------------------------------------------------------------------------------------------------
04. Does the whole PPU address matter, or is only A13 checked?
There must be three consecutive fetches of the same address with A13=1 (all bits are checked)
Code:
string result = "";
byte r0x5204;
cpu_m2_constant_clocking(false);
cpu_read(0x5204, 1); //clear any interrupt if pending
cpu_write(0x5203, new byte[] {10}); //generate irq at scanline 10
for (int b = 0; b < 14; ++b) {
    for (int i = 0; i < 20; ++i) {
        r0x5204 = cpu_read(0x5204, 1)[0];
        result += String.Format("{0:x2} ", r0x5204);
        ppu_read((1 << 13) | (1 << b), 1); // <- change was made here
        ppu_read(1 << 13, 1);
        ppu_read(1 << 13, 1);
        ppu_read(0 << 13, 1);
    }
    result += "\r\n";
}
Output:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 40 40 40 40 40 40 40 40 40 40 c0 40 40 40 40 40 40 40
-------------------------------------------------------------------------------------------------
05. What if there are more than 3 fetches in a row?
The interrupt is generated earlier.
Code:
string result = "";
byte r0x5204;
cpu_m2_constant_clocking(false);
cpu_read(0x5204, 1); //clear any interrupt if pending
cpu_write(0x5203, new byte[] {10}); //generate irq at scanline 10
for (int i = 0; i < 20; ++i) {
    r0x5204 = cpu_read(0x5204, 1)[0];
    result += String.Format("{0:x2} ", r0x5204);
    ppu_read(1 << 13, 1);
    ppu_read(1 << 13, 1);
    ppu_read(1 << 13, 1);
}
Output:
00 00 40 40 40 c0 40 40 40 40 40 40 40 40 40 40 40 40 40 40
06. I found this by accident: if I add one additional CPU read cycle before reading 0x5204 or after writing 0x5203,
the output will be:
00 40 40 40 40 40 40 40 40 40 40 c0 40 40 40 40 40 40 40 40
instead of:
00 00 40 40 40 40 40 40 40 40 40 40 c0 40 40 40 40 40 40 40
so the scanline detector starts working one scanline earlier.
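The detector rules from tests 1 to 5 can be put into a small software model. This is only a sketch of my reading of the measurements, not the real MMC5 logic, and all names are made up. Assumptions: a "scanline event" fires on every PPU read whose address has A13=1 and equals the two previous read addresses; the first event sets in-frame and zeroes the counter; later events increment it and set IRQ pending when it equals the $5203 value; reading $5204 clears the pending flag. The one-scanline offset described in 06 and the in-frame timeout are not modelled.

```csharp
using System.Text;

public sealed class ScanlineDetectorModel
{
    private int prev1 = -1, prev2 = -1; // last two PPU read addresses
    private bool inFrame, irqPending;
    private int scanline;
    public int Target;                  // value written to $5203

    public void PpuRead(int address)
    {
        address &= 0x3FFF;
        bool a13 = (address & 0x2000) != 0;
        // Third (or later) consecutive read of the same address with A13=1.
        if (a13 && address == prev1 && address == prev2)
        {
            if (!inFrame) { inFrame = true; scanline = 0; }
            else if (++scanline == Target) irqPending = true;
        }
        prev2 = prev1;
        prev1 = address;
    }

    public byte Read5204()
    {
        byte value = (byte)((irqPending ? 0x80 : 0) | (inFrame ? 0x40 : 0));
        irqPending = false; // assumed: reading $5204 acknowledges the IRQ
        return value;
    }
}

public static class DetectorDemo
{
    // Repeats the given PPU read pattern 20 times, reading $5204 before each
    // repetition, like the test loops above (IRQ target is scanline 10).
    public static string Run(params int[] pattern)
    {
        var mmc5 = new ScanlineDetectorModel { Target = 10 };
        var sb = new StringBuilder();
        for (int i = 0; i < 20; ++i)
        {
            sb.AppendFormat("{0:x2} ", mmc5.Read5204());
            foreach (int address in pattern) mmc5.PpuRead(address);
        }
        return sb.ToString().TrimEnd();
    }
}
```

With the pattern from test 2, DetectorDemo.Run(0x2000, 0x2000, 0x2000, 0x0000), the model gives "00 40 40 40 40 40 40 40 40 40 40 c0 40 40 40 40 40 40 40 40", which is the sequence with the extra CPU read, not the one without it. The patterns from tests 1, 3 and 4 give all zeros, as measured.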
07. What makes the MMC5 think that frame rendering has ended?
CPU read from $fffa? -> yes
CPU read from $fffb? -> yes
CPU write to $fffa/$fffb? -> no
CPU write of $00 to $2001 -> no
If there are 3 or more CPU reads with no PPU read between them, the MMC5 starts thinking PPU rendering has ended:
ppu read
cpu read
cpu read
cpu read <- at the beginning of this cycle, the MMC5 sets in-frame to 0 (so if this read were from $5204, it would return 0)
PPU writes do not matter; the following sequence will still set in-frame to 0:
ppu read
cpu read
ppu write
cpu read
cpu read <- at the beginning of this cycle, the MMC5 sets in-frame to 0 (so if this read were from $5204, it would return 0)
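The rule above boils down to a counter of CPU reads since the last PPU read. A minimal sketch, with made-up names and two assumptions of mine: in-frame starts out set, and CPU writes are left out entirely because the measurements above do not say whether they count.

```csharp
public sealed class InFrameTimeoutModel
{
    // Assumed starting state: the scanline detector has already set in-frame.
    public bool InFrame { get; set; } = true;

    private int cpuReadsSincePpuRead;

    // Any PPU read restarts the count.
    public void PpuRead()
    {
        cpuReadsSincePpuRead = 0;
    }

    // PPU writes neither restart the count nor add to it.
    public void PpuWrite()
    {
    }

    // The third CPU read without a PPU read in between clears in-frame.
    // Call this before evaluating the read itself, so that a $5204 read
    // on that same cycle already sees in-frame = 0.
    public void CpuRead()
    {
        if (++cpuReadsSincePpuRead >= 3)
        {
            InFrame = false;
        }
    }
}
```

Feeding it either of the two sequences above leaves InFrame false after the last CPU read, while alternating PPU and CPU reads keeps it true.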
---
Later I will check the memory protection bits, because I roughly tested them a few days ago and I think that when M2 stops toggling, these bits are automatically set (as if a reset had happened). I have no idea how the MMC5 detects this without the help of an external detector.