Mapper idea - NESdev BBS

Mapper idea
by lidnariq on 2010-06-18 (#63085)

So, the PPU has a lot of cycles in which it's not actively using its bus. On each rendering scanline, of the 341 pixels (cycles) per scanline, 170 are spent reading the bus, 1 is idle (on almost all scanlines), and 170 are spent latching the lower 8 bits of the address bus (because A0..7 and D0..7 are multiplexed; I'll call these ALE cycles because that's the 8051 term). ALE cycles are one pixel long, or ~180ns, so it's definitely cheap to get RAM fast enough to give us a whole extra access cycle.
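As a rough sanity check of those numbers, here is a small Python sketch of the per-scanline bus budget. The master clock value and the divide-by-4 are the usual NTSC figures and are assumptions not stated in the post; one byte per ALE cycle is an idealized upper bound, not a measured rate.

```python
# Rough PPU bus budget for one rendering scanline.
NTSC_MASTER_HZ = 21_477_272   # assumption: NTSC master clock
PPU_DIVIDER = 4               # assumption: pixel clock = master / 4

CYCLES_PER_SCANLINE = 341     # from the post
READ_CYCLES = 170             # PPU is actually reading the bus
ALE_CYCLES = 170              # PPU is only latching A0..7
IDLE_CYCLES = CYCLES_PER_SCANLINE - READ_CYCLES - ALE_CYCLES

# Length of one pixel (and so one ALE cycle) in nanoseconds.
pixel_ns = 1e9 * PPU_DIVIDER / NTSC_MASTER_HZ


def max_dma_bytes_per_scanline(bytes_per_ale_cycle=1):
    """Upper bound if every ALE cycle carried a DMA access."""
    return ALE_CYCLES * bytes_per_ale_cycle


print(f"idle cycles per scanline: {IDLE_CYCLES}")
print(f"pixel period: {pixel_ns:.1f} ns")
print(f"max DMA bytes per scanline: {max_dma_bytes_per_scanline()}")
```

Under these assumptions the pixel period comes out slightly longer than the ~180ns quoted, which only helps the RAM speed argument.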

So we could build a mapper that could DMA data to or from the CHRRAM by disconnecting CHRRAM from the PPU on these ALE cycles. Plus we could do this while rendering is enabled, although we'd have to be careful about not tearing.

The downsides:
* We could only write to RAM that was on the cart itself, so we'd either have to supply our own RAM for the nametables, or not be able to DMA to them.
* (for PRG->CHR) We have no way to force the CPU to stop reading, so we either have to piggyback on the existing read cycles (and feed the CPU fake data like "ORA #0")
* (for CHR->PRG) We'd have to force PRGRAM off the bus while we were writing to it -- i.e. code that read/wrote/ran from it during DMA wouldn't work.
* (either direction) or have fast PRGRAM+ROM on the cart (~70ns) and do our reads/writes in the first quarter of ph2. Regardless, external hardware can't rapidly read or write to the 2kB of RAM inside the NES, so if we're not just blitting data out of ROM (and if we are, why not use CHRROM with nice bankswitching?) we need PRGRAM too.

In any case, the point of all of this is: I've been trying and failing to brainstorm how you'd use this ability. Any ideas?

Re: Mapper idea
by tepples on 2010-06-18 (#63086)

lidnariq wrote:

* We could only write to RAM that was on the cart itself, so we'd either have to supply our own RAM for the nametables

Such a mapper might have a 32 KiB RAM and twelve 5-bit registers, each controlling a 1 KiB chunk from $0000 to $2C00.
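A minimal sketch of how that address translation could look, in Python for readability. The class and method names are hypothetical, and PPU addresses at $3000 and above are simply rejected here because they fall outside the twelve windows.

```python
RAM_SIZE = 32 * 1024   # 32 KiB of cart RAM
CHUNK_SIZE = 1024      # 1 KiB windows
NUM_REGS = 12          # windows at PPU $0000, $0400, ... $2C00


class ChunkMapper:
    """Hypothetical model: twelve 5-bit bank registers over 32 KiB of RAM."""

    def __init__(self):
        self.regs = [0] * NUM_REGS
        self.ram = bytearray(RAM_SIZE)

    def set_bank(self, slot, bank):
        # 32 KiB / 1 KiB = 32 banks, so only 5 bits are kept.
        self.regs[slot] = bank & 0x1F

    def translate(self, ppu_addr):
        """Map a PPU address to an offset in the 32 KiB RAM."""
        ppu_addr &= 0x3FFF
        slot = ppu_addr >> 10          # which 1 KiB window
        if slot >= NUM_REGS:
            raise ValueError("address outside the twelve mapped windows")
        return (self.regs[slot] << 10) | (ppu_addr & 0x3FF)


mapper = ChunkMapper()
mapper.set_bank(8, 31)                 # window at $2000 -> last RAM chunk
print(hex(mapper.translate(0x2005)))
```

Because the nametable windows ($2000-$2C00) go through the same registers as the pattern table windows, any chunk of the RAM can serve as either tiles or nametable data.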

Quote:

or have fast PRGRAM+ROM on the cart (~70ns) and do our reads/writes in the first quarter of ph2.

If what you want to do is allow direct CPU writes to VRAM, you can do that without hooking into ALE. Maintain a queue of RAM locations that the CPU wrote and execute the writes during the dummy nametable fetches during sprite rendering. It limits bandwidth to 16 bytes per scanline, but the CPU can't do much more than that during vblank anyway.
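A behavioral sketch of that queue in Python, assuming one queued write can be retired per dummy fetch (the 16-per-scanline figure is from the post). The class name, the 8 KiB VRAM size, and the unbounded queue are all hypothetical; real hardware would have a small fixed-depth FIFO.

```python
from collections import deque

DUMMY_FETCHES_PER_SCANLINE = 16   # figure from the post


class WriteQueue:
    """Hypothetical model of deferred CPU writes into cart VRAM."""

    def __init__(self, vram_size=0x2000):
        self.vram = bytearray(vram_size)
        self.pending = deque()        # (address, value) pairs, oldest first

    def cpu_write(self, addr, value):
        # The CPU side only records the write; nothing touches VRAM yet.
        self.pending.append((addr % len(self.vram), value & 0xFF))

    def run_scanline(self):
        """Retire up to one queued write per dummy fetch; return the count."""
        done = 0
        while self.pending and done < DUMMY_FETCHES_PER_SCANLINE:
            addr, value = self.pending.popleft()
            self.vram[addr] = value
            done += 1
        return done


queue = WriteQueue()
for i in range(20):
    queue.cpu_write(i, i + 1)
print(queue.run_scanline(), "writes retired,", len(queue.pending), "left")
```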

Quote:

so if we're not just blitting data out of ROM (and if we are, why not use CHRROM with nice bankswitching?)

Because "nice" bankswitching isn't perfect bankswitching. Some people want it even finer than 1 KiB chunks, such as a 256-byte area for a single 32x32 pixel sprite cel.

Quote:

In any case, the point of all of this is: I've been trying and failing to brainstorm how you'd use this ability. Any ideas?

One idea: Bankswitchable ExRAM. Map $5C00-$5FFF to a selectable 1 KiB chunk of the 32 KiB CHR RAM. Execute reads and writes during the next ALE cycle; it might be cheaper to require double reads to load the latch.
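One way to picture the double-read scheme, as a Python sketch: the first read of an address only schedules a fetch, the next ALE cycle fills a latch, and the second read returns it. The names are hypothetical, writes are left out for brevity, and the 5-bit bank select is inferred from 32 KiB divided into 1 KiB chunks.

```python
class ExRamWindow:
    """Hypothetical model of a $5C00-$5FFF window onto 32 KiB of CHR RAM."""

    BASE = 0x5C00

    def __init__(self, chr_ram):
        self.chr_ram = chr_ram
        self.bank = 0
        self.latch = 0
        self.pending_read = None      # CHR RAM offset waiting for an ALE cycle

    def select(self, bank):
        self.bank = bank & 0x1F       # 32 chunks of 1 KiB

    def cpu_read(self, cpu_addr):
        # Returns whatever is in the latch now and schedules a fresh fetch.
        offset = (cpu_addr - self.BASE) & 0x3FF
        self.pending_read = (self.bank << 10) | offset
        return self.latch

    def ale_cycle(self):
        # The RAM is free on this cycle, so the scheduled fetch can happen.
        if self.pending_read is not None:
            self.latch = self.chr_ram[self.pending_read]
            self.pending_read = None


ram = bytearray(32 * 1024)
ram[3 * 1024 + 7] = 0xAB
window = ExRamWindow(ram)
window.select(3)
stale = window.cpu_read(0x5C07)       # first read: loads nothing useful
window.ale_cycle()
fresh = window.cpu_read(0x5C07)       # second read: latched value
print(hex(stale), hex(fresh))
```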

by Memblers on 2010-06-18 (#63088)

I drew up a schematic to allow the NES CPU to write to CHR-RAM whenever, but using discrete logic it needed 6 chips (5 74xx573 and 1 74xx245) just to handle switching the RAM between buses. There were plenty of other problems left in that. If you use a CPLD or FPGA it's still a lot of I/Os.

With the PowerPak, I'm not sure at the moment if there is any reason that one couldn't make a mapper to do stuff similar to that.

In my Squeedo redesign, I'm actually planning to use just a single fast SRAM for PRG/CHR/MCU. Won't be a problem, and there could be some pretty interesting results when you can push that many pixels around. Blast processing!

by Bregalad on 2010-06-19 (#63099)

I think dual ported SRAMs do exist. Yes you can only access them one "side" at a time, but it could be possible to do the following :
- Writes go to a mapper register which latches the value
- On the next ALE cycle, the value latched is written to the SRAM.

In fact it's likely how the MMC5 works internally, I guess.
Another problem is to detect ALE cycles. Normal cycles can be detected by ANDing /RD and /WR, but as far as I know there is no way to detect ALE cycles, since the ALE signal isn't driven to the cart connector (and not even the bottom connector).

by tepples on 2010-06-19 (#63106)

Bregalad wrote:
Another problem is to detect ALE cycles. Normal cycles can be detected by ANDing /RD and /WR, but as far as I know there is no way to detect ALE cycles

Would /ALE = /RD NAND /WR work?
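Spelled out as a truth table in Python, treating every signal as active-low (0 = asserted). This is a purely logical sketch with hypothetical function names: it says nothing about the PPU's actual timing, and it would also assert /ALE whenever the bus is simply idle with both /RD and /WR high.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def ale_n(rd_n, wr_n):
    """Proposed active-low ALE: asserted (0) only when /RD and /WR are both high."""
    return nand(rd_n, wr_n)


print("/RD /WR /ALE")
for rd_n in (0, 1):
    for wr_n in (0, 1):
        print(f"  {rd_n}   {wr_n}    {ale_n(rd_n, wr_n)}")
```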

by Memblers on 2010-06-19 (#63111)

tepples wrote:
Bregalad wrote:
Another problem is to detect ALE cycles. Normal cycles can be detected by ANDing /RD and /WR, but as far as I know there is no way to detect ALE cycles

Would /ALE = /RD NAND /WR work?

That's what I've wondered, but unlike the 6502, we don't have a detailed datasheet to show all the timing characteristics of the PPU.

The dual-port SRAMs available are pretty expensive, not really worth looking at (especially since, as mentioned, you can only write one side at a time, which seems nearly useless).

I bet if one searches "dual-port RAM" on here, there would be all sorts of discussions that would turn up for it (could try on the old forum too).