The longest instruction takes 8 CPU cycles. If the CPU processes each instruction atomically, it can be out of sync with the PPU by as much as 24 PPU cycles (8 CPU cycles at 3 PPU cycles each), which is 3 tiles at 8 pixels per tile. This might explain why it is difficult to properly emulate games that switch the nametable mid-scanline, such as Marble Madness, without introducing an NMI delay hack.
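To make the arithmetic concrete, here is a small sketch of the worst-case skew. It assumes the NTSC ratio of 3 PPU cycles per CPU cycle and one pixel per PPU cycle; the function and constant names are just mine for illustration.

```python
PPU_CYCLES_PER_CPU_CYCLE = 3  # NTSC ratio (assumption: NTSC timing)
PIXELS_PER_TILE = 8           # the PPU outputs one pixel per PPU cycle


def worst_case_skew(longest_instruction_cpu_cycles):
    """Return (ppu_cycles, tiles) the PPU can drift while one
    instruction is executed atomically."""
    ppu_cycles = longest_instruction_cpu_cycles * PPU_CYCLES_PER_CPU_CYCLE
    return ppu_cycles, ppu_cycles / PIXELS_PER_TILE


skew_cycles, skew_tiles = worst_case_skew(8)
print(skew_cycles, skew_tiles)  # 24 3.0
```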
Would a CPU that processes instructions cycle-by-cycle, doing the memory reads/writes on exactly the right cycles, actually resolve this? Would it perform well?
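As a rough sketch of what I mean by cycle-by-cycle, every memory access could advance the PPU by 3 cycles before it touches the bus, so a register write lands on the PPU dot where the real write cycle would occur. Everything below (`Ppu`, `Bus`, `sta_absolute`) is a hypothetical toy, not taken from any existing emulator, and only models `STA` absolute (opcode `0x8D`, 4 cycles).

```python
PPU_CYCLES_PER_CPU_CYCLE = 3  # NTSC ratio (assumption: NTSC timing)


class Ppu:
    """Toy PPU that only counts dots and logs register writes."""

    def __init__(self):
        self.dot = 0
        self.writes = []  # (dot, address, value) for each register write

    def tick(self):
        self.dot += 1

    def write_register(self, addr, value):
        self.writes.append((self.dot, addr, value))


class Bus:
    """Every access costs one CPU cycle and catches the PPU up first."""

    def __init__(self, ppu, ram):
        self.ppu = ppu
        self.ram = ram  # dict of address -> byte, enough for a sketch

    def _advance_one_cpu_cycle(self):
        for _ in range(PPU_CYCLES_PER_CPU_CYCLE):
            self.ppu.tick()

    def read(self, addr):
        self._advance_one_cpu_cycle()
        return self.ram.get(addr, 0)

    def write(self, addr, value):
        self._advance_one_cpu_cycle()
        if 0x2000 <= addr <= 0x3FFF:  # PPU register range, incl. mirrors
            self.ppu.write_register(addr, value)
        else:
            self.ram[addr] = value


def sta_absolute(bus, pc, a):
    """STA abs as four bus cycles: opcode, low byte, high byte, write."""
    bus.read(pc)           # cycle 1: fetch opcode
    lo = bus.read(pc + 1)  # cycle 2: fetch address low byte
    hi = bus.read(pc + 2)  # cycle 3: fetch address high byte
    bus.write((hi << 8) | lo, a)  # cycle 4: the actual write
    return pc + 3


ppu = Ppu()
bus = Bus(ppu, {0x8000: 0x8D, 0x8001: 0x00, 0x8002: 0x20})
next_pc = sta_absolute(bus, 0x8000, 0x55)
print(ppu.writes)  # the write is logged at dot 12, not dot 0
```

An atomic CPU that applies the write first and then runs the PPU for 12 dots would have the write take effect at dot 0 instead, which is the kind of skew I'm worried about. The cost is a few extra calls per memory access; I haven't measured whether that is acceptable.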
Alternatively, the PPU could be modified to read from the nametables at the last possible moment, just before the tile is drawn, as opposed to preparing shift registers with the values. I'm not sure what the side effects of such a change would be.