The problem, I think, is with the implementation of ppu_on_all().
Code:
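_ppu_on_all:
    lda <PPU_MASK_VAR   ; load the shadow copy of PPU_MASK
    ora #%00011000      ; set bits 3 and 4 (show background, show sprites)
ppu_onoff:
    sta <PPU_MASK_VAR
    sta PPU_MASK        ; rendering is switched on right here
    jsr _ppu_waitnmi    ; wait for an NMI to complete
    lda #$00
    sta PPU_ADDR        ; two writes set the VRAM address to $0000
    sta PPU_ADDR
    lda <PPU_CTRL_VAR
    sta PPU_CTRL        ; rewrite PPU_CTRL from its shadow copy
    rts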
Please correct me if I'm wrong in this analysis, but this turns on rendering immediately, which means it might happen in the middle of a frame somewhere, giving you half a blank screen and half of an incorrectly scrolled screen for the first frame it is turned on. Next it waits for an NMI to complete and immediately sets the VRAM address to 0, which will force the scroll to 0,0 for the second frame (exactly what happens here depends on whether the NMI routine finishes before the end of vblank, which in turn might depend on whether or not music is playing). It will only start applying your scroll correctly on the third frame, as far as I can tell.
The purpose of ppu_off() and ppu_on_all() is generally to leave rendering disabled for an extended period of time (i.e. multiple frames) while you make some changes to the nametable (direct VRAM writes). If you're only turning off rendering occasionally, the two frames of incorrect appearance might not be a big deal, and for a game where the scrolling is supposed to be at 0,0 anyway, you're only talking about one frame of incorrect appearance.
You may want to remove these 3 lines from _ppu_on_all:
Code:
    lda #$00
    sta PPU_ADDR
    sta PPU_ADDR
These lines serve no purpose, as the correct scroll will already have been applied by the NMI that runs during the line: jsr _ppu_waitnmi
This should cut the incorrect visual down to a single half-rendered frame.
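For reference, here is a sketch of how the routine would read with those three lines removed. I haven't tested it, and it assumes the rest of neslib (the NMI handler, _ppu_waitnmi, and the PPU_* symbols) is left exactly as it is. Note that the removed lines sit after the ppu_onoff label, so anything else that jumps to that label loses the address reset too.

```asm
_ppu_on_all:
    lda <PPU_MASK_VAR   ; shadow copy of PPU_MASK
    ora #%00011000      ; enable background and sprite rendering
ppu_onoff:
    sta <PPU_MASK_VAR
    sta PPU_MASK        ; still takes effect immediately, possibly mid-frame
    jsr _ppu_waitnmi    ; the NMI that runs during this wait applies the scroll
    lda <PPU_CTRL_VAR   ; PPU_ADDR is no longer touched here, so the scroll
    sta PPU_CTRL        ; written by the NMI is left alone
    rts
```

The write to PPU_MASK is unchanged, so the first frame can still be half-rendered; this only removes the frame that was forced to 0,0.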
In the examples shiru provided, this half-rendered frame seems to be hidden by an all-black palette. The new palette doesn't get applied until the _ppu_waitnmi subroutine, so the bad frame is not seen. The second problem frame is also not an issue, since all the examples start at 0,0 scroll anyway. This might explain why _ppu_on_all does not appear very robust: in the way shiru was using it, there really is no problem to be seen. You could hide the problem yourself by clearing the palette and rendering one frame before calling ppu_off():
Code:
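pal_clear();    // clear the palette so the screen goes black
ppu_waitnmi();  // render one frame so the cleared palette is actually applied
ppu_off();
// ... do your vram writes here
// ... setup your new palette
ppu_on_all();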
This adds an extra frame of latency to turning off the PPU (probably negligible), but should hide the half-rendered frame.