On the PPU rendering page, an explanation is given of how the PPU renders scanlines. I'm trying to implement a scanline-based PPU, but when I look at other emulators' sources, they seem to set the PPU's current VRAM address (loopyV) to the temporary VRAM address (loopyT) on scanline 0. Why is this done? The page doesn't clearly explain it, and while it seems logical (I think), I haven't found an explanation anywhere else either. So why do emulators update this address on scanline 0?
EDIT:
Not every emulator even does this. I don't know when I'm supposed to copy LoopyT into LoopyV other than on the second write to $2006. Can anyone help me with this?
If rendering is enabled, the video memory address v is updated during the sync pulse before line 0, which ends at dot 304 of the pre-render line.
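In emulator terms, the copy tepples describes can be sketched as a per-dot check on the pre-render line. This is a minimal Python sketch, assuming v and t are plain integers owned by the caller; the function name is hypothetical, and the starting dot 280 is taken from the NESdev wiki's PPU scrolling page rather than from this thread.

```python
# Hypothetical helper: the pre-render line's "vert(v) = vert(t)" copy.
# Assumption: v and t are 15-bit integers stored by the caller.

VERT_BITS = 0x7BE0  # fine Y (bits 12-14), nametable Y (bit 11), coarse Y (bits 5-9)


def prerender_vertical_copy(v: int, t: int, dot: int, rendering_enabled: bool) -> int:
    """Return the new v after one dot of the pre-render line (scanline 261)."""
    if rendering_enabled and 280 <= dot <= 304:
        # Only the vertical scroll bits are reloaded here; coarse X and the
        # horizontal nametable bit (bit 10) are left untouched.
        v = (v & ~VERT_BITS) | (t & VERT_BITS)
    return v
```

Calling it for every dot of line 261 with rendering enabled leaves the vertical bits of v equal to those of t; with rendering disabled, v is not touched at all.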
Loopy_V is actually the PPU's own counter as it iterates through the tilemap and fetches the tile numbers. Loopy_T is the PPU's reference to where the top-left corner of the screen is.
If you look at how the bits are related to scrolling, and compare the bits for how you select a tile on the tilemap, you'll find they match exactly. 3 extra bits are used as "fine Y" that don't correspond to bits addressing a tile on the map.
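The bit correspondence is easiest to see by splitting v into fields. The helpers below are illustrative, not from any emulator; they assume the usual yyy NN YYYYY XXXXX layout (fine Y, nametable, coarse Y, coarse X) and the tile and attribute address formulas given on the NESdev wiki.

```python
# Hypothetical helpers; layout assumed: yyy NN YYYYY XXXXX (15 bits).

def decode_v(v: int) -> dict:
    return {
        "coarse_x": v & 0x1F,          # bits 0-4: tile column (0-31)
        "coarse_y": (v >> 5) & 0x1F,   # bits 5-9: tile row (0-29 on screen)
        "nametable": (v >> 10) & 0x3,  # bits 10-11: which of the 4 nametables
        "fine_y": (v >> 12) & 0x7,     # bits 12-14: pixel row within the tile
    }


def tile_address(v: int) -> int:
    # The low 12 bits of v index the tilemap directly; fine Y is dropped.
    return 0x2000 | (v & 0x0FFF)


def attribute_address(v: int) -> int:
    # One attribute byte per 4x4-tile block, at offset $3C0 of the
    # nametable selected by bits 10-11.
    return 0x23C0 | (v & 0x0C00) | ((v >> 4) & 0x38) | ((v >> 2) & 0x07)
```

For example, coarse X 3, coarse Y 2, nametable 1 and fine Y 5 give v = $5443, whose tile lives at $2443: the nametable 1 base $2400 plus row 2 times 32 plus column 3.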
Loopy_V is incremented 34 times as it fetches the tiles for a scanline, then it snaps back to the start of that scanline (bits from T copied to V) so it can fetch the tile numbers for the next line. It also increments fine Y, then possibly tile Y at that time.
Then at the start of the frame, it needs to apply both the X and Y scrolling position, so it copies all bits from T to V.
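The per-scanline behavior described above comes down to three operations on v. This Python sketch follows the increment and copy pseudocode on the NESdev wiki's PPU scrolling page; the function names are assumptions. In a dot-based renderer, increment_coarse_x would run every 8 dots, increment_y at dot 256, and copy_horizontal at dot 257 of each rendering line.

```python
def increment_coarse_x(v: int) -> int:
    if (v & 0x001F) == 31:
        v &= ~0x001F      # wrap coarse X to 0
        v ^= 0x0400       # and switch to the horizontally adjacent nametable
    else:
        v += 1
    return v


def increment_y(v: int) -> int:
    if (v & 0x7000) != 0x7000:
        return v + 0x1000             # fine Y < 7: just bump fine Y
    v &= ~0x7000                      # fine Y wraps to 0, so move to the next tile row
    coarse_y = (v & 0x03E0) >> 5
    if coarse_y == 29:
        coarse_y = 0
        v ^= 0x0800                   # last tile row: switch vertical nametable
    elif coarse_y == 31:
        coarse_y = 0                  # rows 30-31 (attribute area): wrap, no switch
    else:
        coarse_y += 1
    return (v & ~0x03E0) | (coarse_y << 5)


def copy_horizontal(v: int, t: int) -> int:
    # "Snap back": coarse X and the horizontal nametable bit come from t.
    return (v & ~0x041F) | (t & 0x041F)
```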
tepples wrote:
If rendering is enabled, the video memory address v is updated during the sync pulse before line 0, which ends at dot 304 of the pre-render line.
Due to my lack of understanding, it took me a while to see what the timing diagram (http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png) can do for me. It turns out that at the end of vertical blanking, on the pre-render line, the horizontal position bits get reloaded first (dot 257), and a couple dozen dots later (dots 280-304) the vertical position bits get reloaded. It makes much more sense now that I've discovered how helpful the diagram can be.
Dwedit wrote:
Loopy_V is actually the PPU's own counter as it iterates through the tilemap and fetches the tile numbers. Loopy_T is the PPU's reference to where the top-left corner of the screen is.
If you look at how the bits are related to scrolling, and compare the bits for how you select a tile on the tilemap, you'll find they match exactly. 3 extra bits are used as "fine Y" that don't correspond to bits addressing a tile on the map.
Loopy_V is incremented 34 times as it fetches the tiles for a scanline, then it snaps back to the start of that scanline (bits from T copied to V) so it can fetch the tile numbers for the next line. It also increments fine Y, then possibly tile Y at that time.
Then at the start of the frame, it needs to apply both the X and Y scrolling position, so it copies all bits from T to V.
Why 34 times, though? The rest of your explanation makes sense, but if you're talking about the horizontal tiles of a scanline, shouldn't it be incremented 32 times, since the PPU draws 32 8x8 tiles per line?
ArsonIzer wrote:
Why 34 times, though? The rest of your explanation makes sense, but if you're talking about the horizontal tiles of a scanline, shouldn't it be incremented 32 times, since the PPU draws 32 8x8 tiles per line?
I'm not sure why 34 tiles are fetched, but if the horizontal scroll is not 0, the PPU will display 33 tiles.
tokumaru wrote:
ArsonIzer wrote:
Why 34 times, though? The rest of your explanation makes sense, but if you're talking about the horizontal tiles of a scanline, shouldn't it be incremented 32 times, since the PPU draws 32 8x8 tiles per line?
I'm not sure why 34 tiles are fetched, but if the horizontal scroll is not 0, the PPU will display 33 tiles.
33? I assume you mean the fine X scroll, which means 1 extra tile will be displayed, with 2 tiles only partially shown, right? That still doesn't explain the 34 thing, although maybe he's talking about the last 2 increments of a scanline, which grab the correct tiles for the next scanline. On the timing diagram (or whatever it's called), it says there are 2 additional increments of the horizontal position after the vertical position has been incremented, to grab the first 2 tiles of the next scanline. Would that make sense? But if that's the case, wouldn't a scanline still only make 32 increments per line, i.e. 30 for itself and 2 for the next line?
The PPU's memory reader operates on an 8-dot sequence. Because of how scrolling and tile data decoding work through a shift register, the PPU needs to prefetch about two tiles. The first two tiles are fetched during the back porch (end of horizontal blanking) of the previous line, and the fetches continue throughout the display portion of the current line, after which sprite tile fetches begin. The PPU only needs to fetch 33 tiles, but it fetches 34 tiles because it would have cost a few gates to design the hardware to skip fetching during the last 8-dot sequence of the line.
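A quick way to check the count is to list the dots at which coarse X is incremented on one rendering line. The schedule below is assumed from the NESdev wiki timing diagram: once every 8 dots from 8 through 256 (32 increments during the visible part), plus dots 328 and 336, which prefetch the first two tiles of the next line. The function names are hypothetical.

```python
def coarse_x_increment_dots() -> list:
    """Dots of a rendering scanline at which coarse X is incremented (assumed schedule)."""
    visible = list(range(8, 257, 8))   # 8, 16, ..., 256: tiles 3-34 of this line
    prefetch = [328, 336]              # tiles 1-2 of the *next* line
    return visible + prefetch


def tiles_needed(fine_x: int) -> int:
    # 256 pixels cover exactly 32 tiles when aligned; any nonzero fine X
    # exposes part of one more tile at the right edge.
    return 32 + (1 if fine_x & 7 else 0)
```

So a line makes 32 increments during its own visible dots plus 2 at the end for the next line, giving 34, while at most 33 of the tiles fetched for a line ever reach the screen.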
In other words, the PPU uses a pipeline. It must be full when the scanline starts, so it needs to start filling ahead of time. Once it gets almost to the end of the scanline, the filling that occurs after that is unnecessary since it'll stop before it empties the pipeline. But, as tepples said, it takes more hardware to stop filling at this point rather than just let it keep filling and abandon the pipeline once the visible pixels for the scanline are rendered.
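The pipeline can be pictured with the two 16-bit background shift registers described on the NESdev wiki. The class below is a simplified, hypothetical model: attribute bits and the exact reload dots are left out. The high byte holds the tile being drawn, the low byte holds the tile fetched after it, and fine X selects which bit is output. Because the selected bit reaches into the second tile within the first 8 pixels, two tiles must already be loaded when the line starts.

```python
class BackgroundPipeline:
    """Hypothetical model of the PPU's two 16-bit pattern shift registers."""

    def __init__(self) -> None:
        self.lo = 0  # bit plane 0 of two tiles: current tile in the high byte
        self.hi = 0  # bit plane 1, same arrangement

    def reload(self, pattern_lo: int, pattern_hi: int) -> None:
        # A freshly fetched tile goes into the low byte, behind the tile
        # that is currently being shifted out of the high byte.
        self.lo = (self.lo & 0xFF00) | pattern_lo
        self.hi = (self.hi & 0xFF00) | pattern_hi

    def pixel(self, fine_x: int) -> int:
        # Fine X selects a bit 0-7 places behind the head of the register.
        bit = 15 - fine_x
        return (((self.hi >> bit) & 1) << 1) | ((self.lo >> bit) & 1)

    def shift(self) -> None:
        # One shift per dot; bits leaving the top are simply dropped.
        self.lo = (self.lo << 1) & 0xFFFF
        self.hi = (self.hi << 1) & 0xFFFF
```

Priming it with two tiles before the first pixel mirrors the prefetch at the end of the previous line; after that, one reload every 8 shifts keeps it full.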
blargg wrote:
In other words, the PPU uses a pipeline. It must be full when the scanline starts, so it needs to start filling ahead of time. Once it gets almost to the end of the scanline, the filling that occurs after that is unnecessary since it'll stop before it empties the pipeline. But, as tepples said, it takes more hardware to stop filling at this point rather than just let it keep filling and abandon the pipeline once the visible pixels for the scanline are rendered.
Thanks for the explanation. This isn't something that's necessary to make an accurate emulator, right? I understand that if I want to create a very accurate emulator (one that handles split-screen effects and whatnot), I need to emulate the PPU so that it draws pixels as the cycles go by, rather than drawing a whole scanline at once. Would I have to emulate this hardware "garbage-filling" quirk as well, or is it irrelevant to the rendering functionality of the real NES?
MMC2 and MMC4 rely on the quirk.
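One way the quirk can matter: these mappers watch the PPU's pattern table reads and flip a CHR bank latch when particular tiles are fetched, so even a fetch that never reaches the screen can switch banks. Below is a hypothetical sketch of such a latch. The trigger ranges are assumed from the NESdev wiki's MMC4 description ($xFD8-$xFDF and $xFE8-$xFEF); MMC2's left-table latch reportedly reacts only to $0FD8 and $0FE8 exactly, and the actual bank switching is omitted.

```python
class ChrLatch:
    """Hypothetical MMC4-style CHR latch that snoops PPU pattern fetches.

    Assumed triggers: reads of $xFD8-$xFDF set latch x to $FD, reads of
    $xFE8-$xFEF set latch x to $FE (x = 0 for $0xxx, 1 for $1xxx).
    """

    def __init__(self) -> None:
        self.latch = [0xFE, 0xFE]  # arbitrary starting state for this sketch

    def on_ppu_read(self, addr: int) -> None:
        addr &= 0x3FFF
        if addr >= 0x2000:
            return  # nametable/palette reads never trigger the latch
        which = addr >> 12          # 0 for the left table, 1 for the right
        low = addr & 0x0FFF
        if 0xFD8 <= low <= 0xFDF:
            self.latch[which] = 0xFD
        elif 0xFE8 <= low <= 0xFEF:
            self.latch[which] = 0xFE
```

An emulator that skipped the unneeded last fetch of each line would never call on_ppu_read for that tile, so a latch change a game expects from it would not happen.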
ArsonIzer wrote:
tepples wrote:
If rendering is enabled, the video memory address v is updated during the sync pulse before line 0, which ends at dot 304 of the pre-render line.
Due to my lack of understanding, it took me a while to see what the timing diagram (http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png) can do for me. It turns out that at the end of vertical blanking, on the pre-render line, the horizontal position bits get reloaded first (dot 257), and a couple dozen dots later (dots 280-304) the vertical position bits get reloaded. It makes much more sense now that I've discovered how helpful the diagram can be.
When I try to write to the PPU on "scanline -1" (261?), the address changes by +$1000 on each write in Nintendulator. Is that correct?
So writing to the PPU isn't allowed on that line? Other emulators allow it.
In order to write to video memory on the pre-render line, you need to keep rendering turned off (PPUMASK=0) until you're done writing, and you need to turn rendering back on by x=304. I ran into this problem when trying to fit NES15 in the same bank of the multicart as another NROM-128 game because adding the NMI dispatcher pushed it just over the limit. It was relying on the fact that the PPU stays off the bus during vblank time, but some updates were overflowing into the pre-render line. Turning rendering off explicitly allowed updates to continue into the pre-render line with no problem.
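This advice can be expressed as a predicate an emulator might consult on each $2007 access. The sketch assumes NTSC line numbering (0-239 visible, 240 post-render, 241-260 vblank, 261 pre-render) and that PPUMASK bits 3 and 4 enable background and sprite rendering; the function name is hypothetical. Accesses made while it returns False use the normal address increment; accesses made while it returns True hit the rendering-time address logic.

```python
def rendering_active(scanline: int, ppumask: int) -> bool:
    """True when the PPU is driving the VRAM bus for rendering (NTSC numbering)."""
    rendering_enabled = (ppumask & 0x18) != 0            # show background or sprites
    on_render_line = scanline < 240 or scanline == 261   # visible lines + pre-render
    return rendering_enabled and on_render_line
```

With PPUMASK cleared, the pre-render line is as safe as vblank, which is why turning rendering off explicitly let the NES15 updates run into that line.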
Ti_ wrote:
When I try to write to the PPU on "scanline -1" (261?), the address changes by +$1000 on each write in Nintendulator. Is that correct?
So writing to the PPU isn't allowed on that line? Other emulators allow it.
Accessing $2007 while rendering is active causes the VRAM address to increment in a rather strange way: it activates both "increment horizontal" and "increment vertical", and because the carry bits are configured differently during rendering, the result is that both "tile X" and "scanline Y" are incremented. Young Indiana Jones Chronicles relies on this behavior to make the screen shake vertically.
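This can be sketched with the same increment routines the PPU uses while rendering. The helpers below repeat the NESdev wiki's coarse X and Y increment pseudocode, and the rendering flag is assumed to come from the emulator's own state (for example a check of PPUMASK and the current scanline). Outside rendering, PPUCTRL bit 2 selects the usual +1 or +32 step. All names are hypothetical.

```python
def increment_coarse_x(v: int) -> int:
    if (v & 0x001F) == 31:
        return (v & ~0x001F) ^ 0x0400  # wrap coarse X, switch horizontal nametable
    return v + 1


def increment_y(v: int) -> int:
    if (v & 0x7000) != 0x7000:
        return v + 0x1000              # bump fine Y
    v &= ~0x7000
    coarse_y = (v & 0x03E0) >> 5
    if coarse_y == 29:
        coarse_y = 0
        v ^= 0x0800                    # switch vertical nametable
    elif coarse_y == 31:
        coarse_y = 0
    else:
        coarse_y += 1
    return (v & ~0x03E0) | (coarse_y << 5)


def after_2007_access(v: int, ppuctrl: int, rendering: bool) -> int:
    """New v after a $2007 read or write (hypothetical emulator hook)."""
    if rendering:
        # Both increments fire at once; they touch disjoint bits of v,
        # so applying them one after the other gives the same result.
        return increment_y(increment_coarse_x(v))
    step = 32 if ppuctrl & 0x04 else 1
    return (v + step) & 0x7FFF         # v is a 15-bit register
```

With fine Y below 7, each access during rendering adds $1000 to v and also advances coarse X by one, which is consistent with the +$1000 jumps Ti_ saw in Nintendulator.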