Do any of the mappers have double-buffered CHR-RAM? CHR-RAM sounds great, but you still have to pump the data through the PPU data port ($2007) during vblank, which is a tiny window. It seems to me you could have two CHR-RAM chips, connect one to the PPU and bank the other into CPU space, then toggle them. You could possibly even put mapper logic in front of them so the two sides can look at different "windows", and by leaving the lower 256 patterns fixed, it seems you could fit the rest into CPU space.
I.e.
Code:
PPU                  CPU
8K Pattern Table
8K Pattern Table     8K Pattern Table
4K Name Table        4K Name Table
256B Pal             256B Pal
You could put the palette at $4300, the name tables at $4400, $4800, $4C00 and $5000, and the 8K pattern table at $6000, with your normal ROM at $8000-$FFFF.
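To make the layout concrete, here is a toy Python model, only a sketch under my own assumptions. It has two identical sets of video RAM (palette, name tables, pattern table), and the CPU-side windows sit at the addresses given above. The names `decode_cpu_address`, `DoubleBufferedVideoRam` and `swap()` are hypothetical. `swap()` stands in for a mapper register write. It leaves out the PPU-only fixed pattern table. The 256-byte palette window is taken as given, even though the NES palette RAM itself lives inside the PPU.

```python
# Hypothetical CPU-side windows, following the addresses proposed above.
PAL_BASE, PAL_SIZE = 0x4300, 0x100    # 256 B palette window: $4300-$43FF
NT_BASE, NT_SIZE = 0x4400, 0x1000     # four 1 KiB name tables: $4400-$53FF
PAT_BASE, PAT_SIZE = 0x6000, 0x2000   # 8 KiB pattern table: $6000-$7FFF


def decode_cpu_address(addr):
    """Return (region, offset) for a CPU address inside one of the video-RAM
    windows, or None for anything else (PRG ROM at $8000-$FFFF, RAM, I/O)."""
    for name, base, size in (("palette", PAL_BASE, PAL_SIZE),
                             ("nametable", NT_BASE, NT_SIZE),
                             ("pattern", PAT_BASE, PAT_SIZE)):
        if base <= addr < base + size:
            return name, addr - base
    return None


class DoubleBufferedVideoRam:
    """Toy model: one RAM set is wired to the PPU (front), the other is
    banked into CPU space (back). swap() is a hypothetical mapper register
    write that exchanges them, e.g. once per frame during vblank."""

    def __init__(self):
        self.sets = [self._blank_set(), self._blank_set()]
        self.front = 0  # index of the set the PPU currently sees

    @staticmethod
    def _blank_set():
        return {"palette": bytearray(PAL_SIZE),
                "nametable": bytearray(NT_SIZE),
                "pattern": bytearray(PAT_SIZE)}

    def cpu_write(self, addr, value):
        """CPU writes go to the back set, at full speed, at any time."""
        decoded = decode_cpu_address(addr)
        if decoded is None:
            raise ValueError(f"${addr:04X} is not a video-RAM window")
        region, offset = decoded
        self.sets[1 - self.front][region][offset] = value & 0xFF

    def ppu_read(self, region, offset):
        """PPU fetches always come from the front set."""
        return self.sets[self.front][region][offset]

    def swap(self):
        """Flip which set each side sees."""
        self.front = 1 - self.front
```

After a swap, the back set still holds the previous frame's data. The CPU therefore has to either rewrite what changed or copy it across, which is the usual cost of double buffering.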
Alternatively, looking at the timing diagrams for the PPU, it looks like it goes:
Set latch
read RAM
Set latch
read RAM
While the PPU is setting the latch, it doesn't need the RAM chip, so you could "DMA steal": when the RAM's CE goes low, invert it with a small delay and use some buffers to drive the address and data lines and write to the CHR-RAM, then give it back to the PPU none the wiser. Or use the stolen cycle to do a RAM refresh and use cheaper DRAMs.
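Here is a toy model of the cycle-stealing idea. It assumes every PPU access is the two-cycle latch/read pair described above. The function `run_fetches` and its parameters are hypothetical. It only shows the scheduling: each latch cycle gets either one queued CPU write or, if nothing is queued, one DRAM row refresh. It ignores the electrical side. On the real PPU, the low address byte travels over the same AD0-AD7 pins that carry data. That is why the CHR-RAM would need its own buffered address and data lines during the latch cycle.

```python
from collections import deque


def run_fetches(ppu_addrs, chr_ram, pending_writes, dram_rows=0, refresh_row=0):
    """Toy model of stealing the PPU's address-latch cycles.

    ppu_addrs:      addresses the PPU fetches, one latch/read pair each
    chr_ram:        bytearray standing in for the CHR-RAM chip
    pending_writes: deque of (address, value) writes queued by the CPU
    dram_rows:      number of DRAM rows to refresh (0 = plain SRAM, no refresh)
    refresh_row:    next DRAM row due for refresh

    Returns (bytes read by the PPU, next refresh row, rows refreshed).
    """
    reads = []
    refreshed = 0
    for addr in ppu_addrs:
        # Latch cycle: the PPU is only setting up the address, so the RAM is free.
        if pending_writes:
            waddr, value = pending_writes.popleft()
            chr_ram[waddr] = value & 0xFF
        elif dram_rows:
            # Nothing to write: refresh the current row, then advance the counter.
            refresh_row = (refresh_row + 1) % dram_rows
            refreshed += 1
        # Read cycle: the PPU gets the byte at the latched address.
        reads.append(chr_ram[addr])
    return reads, refresh_row, refreshed


if __name__ == "__main__":
    ram = bytearray(0x2000)
    queue = deque([(0x0010, 0xAA)])
    print(run_fetches([0x0010, 0x0011], ram, queue, dram_rows=128))
```

A stolen write lands before the read in the same pair, so writing to the address the PPU is about to fetch shows up immediately. Real glue logic would have to get that ordering right as well.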