Scroll splits and PPU fetching tiles for the next scanline

Scroll splits and PPU fetching tiles for the next scanline
by tokumaru on 2013-12-30 (#123097)

Back when most people didn't know (or didn't care) about the minute details of how the PPU renders pictures, I assumed we had 85 PPU cycles (around 28 CPU cycles, NTSC) to mess with $2006, $2005 (and possibly $2007) during HBlank. But now that I think of it, the PPU's address register must have its final value some time before the end of HBlank, since the first couple of tiles of the next scanline are fetched during the last few cycles of HBlank.

I'm thinking about this because I'm considering the feasibility of changing one palette color per scanline during HBlank. This would be done only on scanlines 0-3 of each row of tiles, because then it's possible to reset the scroll for those scanlines with only two $2006 writes ($2005/$2006 trickery is necessary for scanlines 4-7), meaning you could redefine 4 colors every 8 scanlines. This would allow for richer static (and not so static) backgrounds, without compromising the integrity of sprites (rendering would never be disabled mid-frame), which could be interesting in certain kinds of games (point-and-click adventures, for instance). This would obviously be limited to games that don't need a lot of CPU time, since a lot of it will be dedicated to managing the palettes (although with a scanline counter you could easily make use of the scanlines where no palette updates take place).

But even the timing for a single color seems pretty tight when you do the math:

1- You need to set the address of the palette entry to be changed, but that can mostly be done before HBlank (and we can pre-load the registers with the values we'll be using later), as only the end of the second $2006 write has to fall within HBlank (bless the temporary VRAM address register), so the time for this task doesn't really count.

2- Now we have to write the color (STX $2007 ;4 cycles). Since we already used 2 registers with pre-loaded values, there's only one left, so in order to restore the scroll for the next scanline we need to get the second byte of the address from somewhere (TSX ;2 cycles - we could avoid using the stack pointer here with self-modifying code, but there wouldn't be any speed gains). The final writes will need 8 cycles (stx $2006, sty $2006).

So, a total of 14 (or is it 15, considering the last cycle of the $2006 write that selects the palette entry to be changed?) cycles must be between the time when the PPU auto increments/restores its address register and the time when the first tiles of the next scanline are fetched. Technically, there are that many cycles available, but AFAIK, the difficulty in syncing the CPU with the PPU could mean an error of up to 7 cycles, which could really screw things up.

What do you think about this? Could changing one palette entry per scanline possibly be a safe operation to perform without the risk of graphical glitches? Maybe if the VBlank wait loop is composed only of short instructions (i.e. no JSRs to a random number generator that will RTS) we can reduce the sync error enough so that the color change will always fall within the safe area?

Re: Scroll splits and PPU fetching tiles for the next scanli
by Dwedit on 2013-12-30 (#123099)

Indiana Jones and the Last Crusade changed the background color 16 times at the title screen, but if you look closely at the screen, you'll notice that the rightmost 40 pixels are blank in scanlines where the palette changes. It also shows some minor glitches in Nintendulator periodically.

Also, you can't have sprites on scanlines where you change the palette.

Re: Scroll splits and PPU fetching tiles for the next scanli
by blargg on 2013-12-30 (#123100)

tokumaru wrote:

Technically, there are that many cycles available, but AFAIK, the difficulty in syncing the CPU with the PPU could mean an error of up to 7 cycles, which could really screw things up.

There's that NMI-tolerant method of synchronizing down to a pair of dots I posted a while back, can't remember whether problems were found with it.

Re: Scroll splits and PPU fetching tiles for the next scanli
by tokumaru on 2013-12-30 (#123101)

blargg, your ways of synchronizing with the PPU are amazing, I remember looking at some of your code in the past. I'll be sure to study the method you used again, and hopefully I'll be able to understand it and consider it usable in an actual game. Thanks.

Re: Scroll splits and PPU fetching tiles for the next scanli
by lidnariq on 2013-12-30 (#123102)

Can you write to palette memory without disabling rendering? Otherwise I think you've omitted the time taken by two writes to $2001.

Re: Scroll splits and PPU fetching tiles for the next scanli
by tokumaru on 2013-12-30 (#123103)

lidnariq wrote:

Can you write to palette memory without disabling rendering?

I'm not sure. We can change the scroll during HBlank without disabling rendering, that's for sure, but for palettes we need to use $2007, and the PPU would be fetching sprite patterns at the time of the $2007 write... I'm really not sure whether that's a problem.

I'm assuming that since the address register remains unchanged during the sprite pattern fetches (meaning it's not used for that purpose) and palette RAM is internal to the PPU (meaning that no external VRAM access is made to access it), a palette write probably wouldn't interfere with the sprite patterns being fetched.

Quote:

Otherwise I think you've omitted the time taken by two writes to $2001.

I just hope those aren't necessary if you make sure your $2006/$2007 writes fall within the time when the PPU is not using its address register, but I'll let someone with better knowledge of the NES' internals (or the ability to use Visual 2C02) give the final word on this.

If disabling rendering ends up being necessary, the whole benefit of this technique (no image data sacrificed at the sides, sprites constantly enabled) is lost.

EDIT: I might just have to stop being lazy and test this myself! I'll see if I can find time to code a test ROM.