Back when most people didn't know (or didn't care) about the minute details of how the PPU renders pictures, I assumed we had 85 PPU cycles (around 28 CPU cycles, NTSC) to mess with $2006, $2005 (and possibly $2007) during HBlank. But now that I think of it, the PPU's address register must have its final value some time before the end of HBlank, since the first couple of tiles of the next scanline are fetched during the last few cycles of HBlank.
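For reference, here is where the 85 / ~28 figures come from, as a quick arithmetic sketch. The constants are the commonly documented NTSC numbers (341 PPU cycles per scanline, 256 of them visible, 3 PPU cycles per CPU cycle), and the function names are just made up for illustration.

```python
# NTSC figures assumed here: 341 PPU cycles (dots) per scanline,
# 256 of them visible, and 3 PPU cycles per CPU cycle.
PPU_DOTS_PER_SCANLINE = 341
VISIBLE_DOTS = 256
PPU_DOTS_PER_CPU_CYCLE = 3


def hblank_ppu_dots():
    """PPU cycles left over after the visible part of a scanline."""
    return PPU_DOTS_PER_SCANLINE - VISIBLE_DOTS


def ppu_dots_to_cpu_cycles(dots):
    """Convert PPU cycles to (fractional) CPU cycles."""
    return dots / PPU_DOTS_PER_CPU_CYCLE


print(hblank_ppu_dots())                                     # 85
print(round(ppu_dots_to_cpu_cycles(hblank_ppu_dots()), 2))   # 28.33
```

Whatever the PPU spends fetching the first tiles of the next scanline has to come out of those 28 or so CPU cycles.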
I'm thinking about this because I'm considering the feasibility of changing one palette color per scanline during HBlank. This would be done only on scanlines 0-3 of each row of tiles, because then it's possible to reset the scroll for those scanlines with only two $2006 writes ($2005/$2006 trickery is necessary for scanlines 4-7), meaning you could redefine 4 colors every 8 scanlines. This would allow for richer static (and not so static) backgrounds, without compromising the integrity of sprites (rendering would never be disabled mid-frame), which could be interesting in certain kinds of games (point-and-click adventures, for instance). This would obviously be limited to games that don't need a lot of CPU time, since a lot of it will be dedicated to managing the palettes (although with a scanline counter you could easily make use of the scanlines where no palette updates take place).
But even the timing for a single color seems pretty tight when you do the math:
1- You need to set the address of the palette entry to be changed, but that can mostly be done before HBlank (and we can pre-load the registers with the values we'll be using later), as only the end of the second $2006 write has to fall within HBlank (bless the temporary VRAM address register), so the time for this task doesn't really count.
2- Now we have to write the color (STX $2007 ;4 cycles). Since we already used 2 registers with pre-loaded values, there's only one left, so in order to restore the scroll for the next scanline we need to get the second byte of the address from somewhere (TSX ;2 cycles - we could avoid using the stack pointer here with self-modifying code, but there wouldn't be any speed gain). The final writes will need 8 cycles (STX $2006, STY $2006 ;4 cycles each).
So, a total of 14 cycles (or is it 15, counting the last cycle of the $2006 write that selects the palette entry to be changed?) must fit between the time when the PPU auto-increments/restores its address register and the time when the first tiles of the next scanline are fetched. Technically, there are that many cycles available, but AFAIK, the difficulty in syncing the CPU with the PPU could mean an error of up to 7 cycles, which could really screw things up.
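To put numbers on the above, here is a rough budget sketch. The instruction timings and the 7-cycle sync error are the ones from this post; `prefetch_dots` is a hypothetical parameter for however many PPU cycles at the end of HBlank are taken by the next scanline's tile fetches. The value 16 used below assumes two tiles at 8 PPU cycles each, which is my assumption, not something I've measured.

```python
# Cycle budget for the HBlank-critical part of the palette update.
# Timings are the ones quoted in the post: absolute stores take 4 CPU
# cycles, TSX takes 2.
CRITICAL_SEQUENCE = [
    ("STX $2007", 4),  # write the new color
    ("TSX", 2),        # fetch the scroll byte stashed in the stack pointer
    ("STX $2006", 4),  # first scroll-restoring write
    ("STY $2006", 4),  # second scroll-restoring write
]

PPU_DOTS_PER_CPU_CYCLE = 3   # NTSC
HBLANK_DOTS = 85             # figure used in the post
SYNC_ERROR_CPU_CYCLES = 7    # the post's "AFAIK" worst case, not a measurement


def critical_cpu_cycles(count_palette_select_cycle=False):
    """CPU cycles that must land inside the safe part of HBlank."""
    total = sum(cycles for _, cycles in CRITICAL_SEQUENCE)
    if count_palette_select_cycle:
        total += 1  # last cycle of the $2006 write that selects the entry
    return total


def slack_dots(prefetch_dots=0, count_palette_select_cycle=False):
    """PPU cycles to spare after the sequence plus worst-case sync error.

    prefetch_dots is a hypothetical reservation for the tile fetches at
    the end of HBlank; a negative result means the budget is blown.
    """
    needed = critical_cpu_cycles(count_palette_select_cycle) + SYNC_ERROR_CPU_CYCLES
    return HBLANK_DOTS - prefetch_dots - needed * PPU_DOTS_PER_CPU_CYCLE


print(critical_cpu_cycles())                                      # 14
print(critical_cpu_cycles(True))                                  # 15
print(slack_dots())                                               # 22
print(slack_dots(prefetch_dots=16))                               # 6
print(slack_dots(prefetch_dots=16, count_palette_select_cycle=True))  # 3
```

Under those assumptions the margin shrinks to a couple of CPU cycles once the prefetch and the sync error are both taken out, so anything else that eats into HBlank would tip it over.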
What do you think about this? Could changing one palette entry per scanline possibly be a safe operation to perform without the risk of graphical glitches? Maybe if the VBlank wait loop is composed only of short instructions (e.g. no JSRs to a random number generator that will RTS) we can reduce the sync error enough so that the color change will always fall within the safe area?