Double Buffered CHR-RAM or DMA Steal?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Double Buffered CHR-RAM or DMA Steal?
by on (#192339)
Do any of the mappers have double-buffered CHR-RAM? CHR-RAM sounds great, but you still have to pump everything through the PPU data port ($2007) during vblank, which is a tiny, tiny window. It seems to me you could have two CHR-RAM chips, map one to the PPU and bank the other into CPU space, then toggle between them. You could possibly even add mappers in front of them so they look at different "windows", but by leaving the lower 256 patterns fixed, it seems you can fit the rest into CPU space.
I.e.
Code:
PPU                  CPU
8K Pattern Table     
8K Pattern Table     8K Pattern Table
4K Name Table        4K Name Table
256B Pal             256B Pal
You could put the palette at $4300, then the nametables at $4400, $4800, $4C00 and $5000, then the 8K pattern table at $6000, with your normal ROM at $8000-$FFFF.
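As a rough sketch of how a mapper would decode that proposed CPU-side map, the Python below returns which window a CPU address lands in and its offset within it. The region boundaries come from the post; the region names and the `decode` function are hypothetical, and anything outside the listed windows (internal RAM, PPU/APU registers, the unused gaps) simply returns `None`.

```python
from typing import Optional, Tuple

# CPU-side windows from the proposed map; names are illustrative labels only.
REGIONS = [
    (0x4300, 0x43FF, "palette"),
    (0x4400, 0x47FF, "nametable 0"),
    (0x4800, 0x4BFF, "nametable 1"),
    (0x4C00, 0x4FFF, "nametable 2"),
    (0x5000, 0x53FF, "nametable 3"),
    (0x6000, 0x7FFF, "pattern table"),
    (0x8000, 0xFFFF, "PRG ROM"),
]


def decode(addr: int) -> Optional[Tuple[str, int]]:
    """Return (region, offset within region) for a CPU address, or None.

    None covers everything this sketch doesn't map: internal RAM, PPU/APU
    registers, and the unused gaps at $4020-$42FF and $5400-$5FFF.
    """
    for lo, hi, name in REGIONS:
        if lo <= addr <= hi:
            return name, addr - lo
    return None
```

For example, `decode(0x4C10)` gives `("nametable 2", 0x10)`, while `decode(0x2000)` (a PPU register) gives `None`.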

Alternatively, looking at the timing diagrams for the PPU, its memory accesses seem to go:
Set latch
read RAM
Set latch
read RAM

While the PPU is setting the latch, it doesn't need the RAM chip ;) so you could "DMA steal": when the RAM's CE goes low, invert it after a small delay and use some buffers to drive the address and data lines and write to the CHR-RAM, then give it back to the PPU none the wiser ;) Or use the slot for a RAM refresh and use cheaper DRAMs.
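A minimal Python sketch of that arbitration, assuming the PPU strictly alternates an "ALE" cycle (address being latched, RAM idle) with a "READ" cycle, and that the CPU posts at most one write into external latches at a time. On each idle cycle the pending write is committed; if nothing is pending, the slot refreshes the next row of a hypothetical DRAM instead. The function name and the 128-row count are assumptions for illustration.

```python
def run_slots(phases, cpu_writes, ram, dram_rows=128):
    """Simulate who owns CHR-RAM on each PPU cycle.

    phases:     sequence of "ALE" (PPU latching an address, RAM idle) or "READ"
    cpu_writes: dict mapping cycle index -> (address, value) posted by the CPU
    ram:        bytearray standing in for the CHR-RAM
    dram_rows:  hypothetical DRAM row count for the refresh counter
    """
    pending = None  # one posted write, held in the external latches
    row = 0         # next DRAM row to refresh
    log = []
    for cycle, phase in enumerate(phases):
        if cycle in cpu_writes:
            pending = cpu_writes[cycle]
        if phase == "READ":
            log.append("ppu read")            # PPU owns the RAM this cycle
        elif pending is not None:
            addr, value = pending
            ram[addr] = value                 # steal the idle slot for the CPU write
            pending = None
            log.append("cpu write")
        else:
            log.append(f"refresh row {row}")  # nothing to write: refresh instead
            row = (row + 1) % dram_rows
    return log
```

A write posted during a READ cycle lands in the following ALE cycle, so the PPU never sees a conflict; a single latch is enough here because the 6502 can't post writes faster than the PPU produces idle cycles.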
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192341)
Looking at the wiki, several mappers support more than 8 KiB of CHR RAM: MMC3 and VRC7, for starters. The compo cart mapper has 32 KiB.

Edit: but not double-buffered or CPU-accessible.
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192344)
The closest thing to this in licensed games is the 1024-byte dual-port ExRAM in the MMC5, but it's used for a third nametable (or attributes on the other nametables), not a pattern table.

Both the CPU and the PPU drive an address at the same time. To connect two memories, one attached to the CPU and the other to the PPU, your mapper would have to multiplex all of their address and data inputs.
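To make the double-buffering idea concrete, here is a behavioral Python model of two 8 KiB CHR-RAM chips behind a hypothetical swap bit: whichever chip is not on the PPU side is the one the CPU sees. It only models the logical behavior; in hardware, each `swap()` is exactly the full address-and-data multiplexing described above. The class and method names are assumptions.

```python
class DoubleBufferedChr:
    """Two 8 KiB CHR-RAM chips; one faces the PPU, the other faces the CPU."""

    SIZE = 0x2000

    def __init__(self):
        self.chips = [bytearray(self.SIZE), bytearray(self.SIZE)]
        self.ppu_side = 0  # index of the chip currently on the PPU bus

    def ppu_read(self, addr):
        return self.chips[self.ppu_side][addr & 0x1FFF]

    def cpu_read(self, addr):
        return self.chips[self.ppu_side ^ 1][addr & 0x1FFF]

    def cpu_write(self, addr, value):
        # The CPU only ever touches the back buffer, so no timing conflict.
        self.chips[self.ppu_side ^ 1][addr & 0x1FFF] = value & 0xFF

    def swap(self):
        # Presumably done during vblank so the flip isn't visible mid-frame.
        self.ppu_side ^= 1
```

Writes made by the CPU stay invisible to the PPU until `swap()` is called, at which point the old front buffer becomes the CPU's scratch copy.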

It'd be easier to use just one memory and implement a write FIFO like the TMS9918 and its descendants have. The PPU reads video memory only during every other pixel, giving you plenty of time (in SRAM terms) to stick a write in between. Your mapper would still need several dozen I/O pins to control all of the CHR RAM's address and data lines, not unlike the MMC5, but at least it wouldn't need to control two memories.
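A minimal Python model of such a write FIFO in front of a single CHR-RAM, assuming a small fixed depth (4 entries by default; the depth is an assumption, not from the thread) and treating each PPU cycle as either reading video memory or idle. When the FIFO is full, `cpu_write` reports failure, which in hardware would mean the CPU has to wait or poll a status bit. The class and method names are hypothetical.

```python
from collections import deque


class ChrWriteFifo:
    """Mapper-side write FIFO drained during the PPU's idle cycles."""

    def __init__(self, depth=4, size=0x2000):
        self.queue = deque()
        self.depth = depth
        self.ram = bytearray(size)

    def cpu_write(self, addr, value):
        """Queue a write; return False if the FIFO is full."""
        if len(self.queue) >= self.depth:
            return False
        self.queue.append((addr, value & 0xFF))
        return True

    def ppu_cycle(self, ppu_is_reading):
        """Advance one PPU cycle, committing one queued write if the RAM is free."""
        if not ppu_is_reading and self.queue:
            addr, value = self.queue.popleft()
            self.ram[addr] = value
```

Compared with a single set of latches, the queue lets the CPU post several writes back to back without caring where the PPU is in its fetch pattern.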
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192401)
DMA theft has been proposed (incl. by me) but I don't believe anyone's implemented one yet.
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192415)
I had this idea a few years ago, but never did anything with it.

The timing of the CPU and PPU is such that, for every CPU cycle, there's a point in time where the PPU isn't accessing memory and the access can instead go to the CPU, by feeding a latch or something similar. I'm not an electrical engineer, so there may be a physics reason for this idea not to work, but nevertheless, the concept seems plausible.
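The "every CPU cycle" claim can be checked with a little arithmetic. The PPU runs 3 cycles per CPU cycle on NTSC and 3.2 (16/5) on PAL, so if the PPU strictly alternates idle (ALE) and read cycles, each CPU cycle should fully contain at least one idle PPU cycle. The Python sketch below assumes that idealized alternation and whole-cycle alignment; real hardware has a few irregular dots per scanline, so treat it as a plausibility check only. The function name is hypothetical.

```python
import math
from fractions import Fraction


def every_cpu_cycle_has_idle_slot(ppu_per_cpu, n_cpu_cycles=1000, phase=0):
    """True if each CPU cycle fully contains at least one idle PPU cycle.

    Assumes PPU cycle c is idle when (c + phase) is even, i.e. a strict
    ALE/read alternation. ppu_per_cpu is 3 for NTSC, Fraction(16, 5) for PAL.
    """
    for k in range(n_cpu_cycles):
        start = k * ppu_per_cpu
        end = start + ppu_per_cpu
        first_whole = math.ceil(start)     # first PPU cycle starting inside this CPU cycle
        last_whole = math.floor(end) - 1   # last PPU cycle ending inside it
        if not any((c + phase) % 2 == 0 for c in range(first_whole, last_whole + 1)):
            return False
    return True
```

Both ratios pass for either alternation phase, while a hypothetical 1:1 ratio fails, which is why the 3:1 (or 3.2:1) clock relationship matters here.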
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192420)
There's totally enough idle time during the PPU's ALE cycles to stuff a write in, especially given modern RAM speeds.
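A quick Python calculation of how much room an idle PPU cycle leaves. The master clock frequencies and PPU dividers (÷4 on NTSC, ÷5 on PAL) are standard; the 55 ns and 70 ns SRAM write-cycle times used below are assumed speed grades, not figures from the thread, and the result ignores setup/hold margins around ALE, so read it as an upper bound.

```python
NTSC_MASTER_HZ = 236_250_000 / 11  # ~21.477 MHz
PAL_MASTER_HZ = 26_601_712.5       # ~26.602 MHz
NTSC_PPU_DIVIDER = 4               # PPU clock = master / 4
PAL_PPU_DIVIDER = 5                # PPU clock = master / 5


def ppu_cycle_ns(master_hz, divider):
    """Length of one PPU cycle in nanoseconds."""
    return divider * 1e9 / master_hz


def writes_per_idle_cycle(master_hz, divider, sram_write_ns):
    """How many back-to-back SRAM write cycles fit in one idle PPU cycle."""
    return int(ppu_cycle_ns(master_hz, divider) // sram_write_ns)
```

A PPU cycle is roughly 186 ns (NTSC) or 188 ns (PAL), so even a 70 ns part could in principle fit two writes into one idle cycle, and a 55 ns part three.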

The problem is that it's still ridiculously I/O heavy. At the point that you've added all the hardware for multiplexing all the address and data lines, you basically need an FPGA, and at that point you probably "should" just use the FPGA's internal SSRAM to fake a dual-ported RAM instead.

It oughtn't be hard to use the PowerPak's XC2S30's 3KiB of internal SSRAM for this purpose... if I could figure out how to get a key for the toolchain from Xilinx I probably would have already tried it.
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192428)
It doesn't have to be I/O heavy. You have, what, 12 bits of address and 8 bits of data, so two 8-bit muxes and one 4-bit mux to pull the SRAM off the PPU and switch it to your latch, all controlled by one signal. You then have your 3 buffers (either three 8-bit or two 8-bit and one 4-bit), which you can load chip by chip, so you need an 8-bit bus to the 3 chips and a 2-bit select line. The 6502 won't be able to hit the latches more often than once every 4 clocks, so you easily have time for a
set upper addr
set lower addr
set data
DMA steal
that you can run off PHI2 (seems you get PHI0, not PHI2, though?)
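The latch sequence above can be modeled as a tiny register interface. In this Python sketch the register numbering (0 = upper address, 1 = lower address, 2 = data) is made up for illustration, writing the data latch is assumed to arm the steal, and `idle_slot()` stands for the moment the PPU is latching an address and not using the RAM. The 12-bit address width follows the post, giving 4 KiB of RAM.

```python
class StealLatches:
    """Hypothetical CPU-visible latches feeding a stolen CHR-RAM write."""

    def __init__(self, size=0x1000):  # 12-bit address -> 4 KiB
        self.ram = bytearray(size)
        self.addr_hi = 0
        self.addr_lo = 0
        self.data = 0
        self.armed = False

    def cpu_write(self, reg, value):
        if reg == 0:
            self.addr_hi = value & 0x0F  # only 4 upper bits for a 12-bit address
        elif reg == 1:
            self.addr_lo = value & 0xFF
        elif reg == 2:
            self.data = value & 0xFF
            self.armed = True            # data written: steal the next idle slot
        else:
            raise ValueError(f"no such latch register: {reg}")

    def idle_slot(self):
        """Commit the armed write while the PPU is not using the RAM."""
        if self.armed:
            self.ram[(self.addr_hi << 8) | self.addr_lo] = self.data
            self.armed = False
```

Because each `STA abs` takes 4 CPU cycles (12 PPU cycles on NTSC), an idle slot always comes along before the CPU can rewrite the latches.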

The 6502 will have a valid address by the rising edge of PHI2 and valid write data by the falling edge, so you could multiplex a 16-bit bus into your latches.
This way you have a bunch of chips, but a very small I/O requirement for the controlling CPLD: all it has to do is detect the address range and then drive the mux lines and the CS line when needed. But if you want to put as much as possible into a larger CPLD, you could use:
16-bit multiplexed bus into the CPLD + 1 control bit to switch address/data
8-bit bus to the latches + 2 control bits
1 bit for the CHR-RAM mux
That is 28 pins. You can get the XC9572XL in:
44-pin PLCC (34 user I/O pins)
44-pin VQFP (34 user I/O pins)
48-pin CSP (38 user I/O pins)
64-pin VQFP (52 user I/O pins)
100-pin TQFP (72 user I/O pins)
If you go with the 64-pin (52 user I/O) or 100-pin (72 user I/O) version, you can do 24 bits in and 20 bits out directly.
With 72 I/O you could do 24 bits of CPU, 20 bits of SRAM, 20 bits of PPU, chip select and PPU R/W.
Right?
I would think for price though jungle logic and a controlling CPLD would be cheaper.
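The pin arithmetic in this post can be checked mechanically. This Python sketch sums each signal list from the post and filters the package I/O counts listed above; the group labels are descriptive only, and the totals deliberately leave out clocks, JTAG and any extra mapper pins, which a real design would also need.

```python
# User I/O per package, from the list above.
PACKAGES = {
    "44-pin PLCC": 34,
    "44-pin VQFP": 34,
    "48-pin CSP": 38,
    "64-pin VQFP": 52,
    "100-pin TQFP": 72,
}

# Signal groups from the post (bit counts only).
LATCH_DESIGN = {"multiplexed CPU bus": 16, "addr/data select": 1,
                "latch bus": 8, "latch select": 2, "CHR-RAM mux": 1}
DIRECT_CPU_SRAM = {"CPU address+data": 24, "SRAM address+data": 20}
DIRECT_FULL = {"CPU address+data": 24, "SRAM address+data": 20,
               "PPU address+data": 20, "chip select": 1, "PPU R/W": 1}


def fitting_packages(signals):
    """Return (pins needed, packages with enough user I/O)."""
    need = sum(signals.values())
    return need, [name for name, io in PACKAGES.items() if io >= need]
```

The latch design needs 28 pins and fits every package; the direct CPU-to-SRAM variant needs 44 (64- or 100-pin only); the fully direct variant needs 66, leaving only the 100-pin TQFP.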
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192432)
Oziphantom wrote:
I would think for price though jungle logic and a controlling CPLD would be cheaper.


Quite the opposite: you can now get a modest "self-initializing FPGA" marketed as a CPLD for less than the cost of the XC9572XL you chose. That CPLD is no longer recommended for new designs and started seeing EOL price hikes 2 years ago. The current Lattice/Altera FPGAs I'm referring to have more logic and SRAM available than the PowerPak. Also, that jungle logic ain't free; it can quickly add up to a few dollars and inflate your PCB size.

I'm actually in the process of prototyping a design similar to what you guys are discussing here. Been waiting to make my own thread explaining details once I get the proof of concept fully working with both SRAM and flash.
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192439)
Oziphantom wrote:
It doesn't have to be I/O heavy.
You've studied the differences between the VIC-20 and the C64, right? How there's all that extra hardware in the VIC-20 relative to the C64 because the VIC-20 can't ever stop driving its address bus?

The NES's CPU/PPU memory spaces are basically the same problem, only they don't even run synchronously.

Quote:
You have, what, 12 bits of address and 8 bits of data, so two 8-bit muxes and one 4-bit mux to pull the SRAM off the PPU and switch it to your latch, all controlled by one signal. You then have your 3 buffers (either three 8-bit or two 8-bit and one 4-bit), which you can load chip by chip, so you need an 8-bit bus to the 3 chips and a 2-bit select line.
The break-even point (cost-wise) for a discrete logic design without any programmable logic is somewhere around 3 to 6 ICs, depending on the specific ICs (and volume) you're talking about. Once you need to throw in a PAL or anything better, you should really just do the entire thing in a modern FPGA.

You're already talking about five 4-bit-wide multiplexers just for the necessary crossbar to get the SRAM attached to either address and data bus. That's what I'm talking about by "ridiculously I/O heavy".

Quote:
(seems you get PHI0, not PHI2, though?)
The 2A03 generates φ0 internally by dividing the 21.5 MHz (NTSC) or 26.6 MHz (PAL) master clock with a 6- or 8-stage twisted-ring counter, respectively. The internal 6502 then generates φ1. Meanwhile, at least in the 2A03, another twisted-ring counter is ORed with the first one to produce the external "M2" signal, such that M2 falls at the same time as φ0.
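Why a 6- or 8-stage twisted-ring (Johnson) counter: an n-stage Johnson counter cycles through 2n states, so it divides its input by 12 or 16, which gives the usual NTSC and PAL CPU clocks. The Python below simulates the counter to count its period and derives the CPU clock from the master clock; it models only the division ratio, not the φ0/M2 duty cycle. The function name is an illustrative assumption.

```python
NTSC_MASTER_HZ = 236_250_000 / 11  # ~21.477 MHz
PAL_MASTER_HZ = 26_601_712.5       # ~26.602 MHz


def johnson_period(stages):
    """Number of clocks before an n-stage twisted-ring counter repeats."""
    start = (0,) * stages
    state = list(start)
    steps = 0
    while True:
        # Shift by one stage, feeding the inverted last stage back into the first.
        state = [1 - state[-1]] + state[:-1]
        steps += 1
        if tuple(state) == start:
            return steps
```

With this, `NTSC_MASTER_HZ / johnson_period(6)` is about 1.79 MHz and `PAL_MASTER_HZ / johnson_period(8)` about 1.66 MHz, matching the familiar 2A03/2A07 CPU clocks.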
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192446)
infiniteneslives wrote:
Quite the opposite: you can now get a modest "self-initializing FPGA" marketed as a CPLD for less than the cost of the XC9572XL you chose. That CPLD is no longer recommended for new designs and started seeing EOL price hikes 2 years ago. The current Lattice/Altera FPGAs I'm referring to have more logic and SRAM available than the PowerPak. Also, that jungle logic ain't free; it can quickly add up to a few dollars and inflate your PCB size.



Yeah, XC95144XLs are now an eye-watering $30 a pop, while the 36s are $1.34; it was under those conditions that I was thinking a 36er plus jungle logic would come in under $30. Any info or model numbers for the Altera/Lattice parts would be greatly appreciated.
Re: Double Buffered CHR-RAM or DMA Steal?
by on (#192447)
You'll be hard-pressed to orchestrate all that with 36 macrocells. Maybe it's possible if you can do most of it with combinational logic alone, but you certainly won't have enough logic left for even a modestly featured, MMC1-scale mapper to go along with your fancy dual-port RAM. A full-fledged MMC1 doesn't even fit in 36 macrocells.

Paying the premium for 5 V tolerance in programmable logic is only worthwhile if you need 36 macrocells or fewer. For a design of this nature you'll have to level-shift the entire board down to 3 V, but it'll be about a dollar well spent. You'll recover that dollar and then some with every part on the board supplied at 3 V. Then you can get some substantial logic in there with a Lattice MachXO/XO2 or an Altera MAX V/MAX 10 while still on a budget.