PPU shifting and associated timings

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
PPU shifting and associated timings
by on (#116073)
Hi everyone,

Like many here, I am developing an NES emulator and I am now going forward with my PPU implementation (only BG for now). First of all, I would like to thank the nesdev wiki contributors for the excellent work!

From my understanding, there are two 16-bit registers and two 8-bit shift registers used during background rendering:
1) One 16-bit reg which contains only high BG tile information with lower 8 bits to be rendered first
2) One 16-bit reg which contains only low BG tile information with lower 8 bits to be rendered first
3) One 8-bit shift reg containing the attributes of the first tile to be rendered
4) One 8-bit shift reg containing the attributes of the second tile to be rendered
I am not sure whether the above description is correct and would like some feedback.

Also, I am having trouble with some PPU timings, and I would like to check if my understanding is correct. I am following this great diagram as a reference:
http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png

For the first tile, here's what happens:
- At cycle 321, an NT byte is placed in an internal latch
- At cycle 323, an AT byte is placed in an internal latch
- At cycle 325, low BG tile byte is placed in an internal latch
- At cycle 327, high BG tile byte is placed in an internal latch
- At cycle 329 (8 cycles later than first NT fetch), the AT byte and both low and high BG tiles are placed in corresponding shift registers (higher 8 bits for BG data).
While all this is happening, the existing contents of shift registers are shifted by one every cycle. Please correct me if I'm wrong.

Now the process continues for the second tile:
- Still at cycle 329, an NT byte is placed in an internal latch
- At cycle 331, an AT byte is placed in an internal latch
- At cycle 333, low BG tile byte is placed in an internal latch
- At cycle 335, high BG tile byte is placed in an internal latch
- At cycle 337 (8 cycles later than NT fetch), the AT byte and both low and high BG tiles are placed in corresponding shift registers (higher 8 bits for BG data).
As the first tile has been entirely shifted by then, the contents for the first tile are now present in the lower byte of each 16-bit register, and the contents for the second one are placed in the higher byte of these 16-bit registers. Again, I would like to verify if this is correct or not.

We now have data corresponding to the first two tiles to be drawn on the next scanline. Do the shift registers STOP shifting at that point? As there are two unused fetches happening right after, I would assume they stop, otherwise on-screen pixel data would be lost. Are the shifters only "shifting" between cycle 2 and cycle 256? Are they even shifting during cycles 321-336 or are the shift registers purely loaded during that time frame? The wiki (http://wiki.nesdev.com/w/index.php/PPU_rendering) mentions that every cycle, a bit is fetched from the 4 background shift registers in order to create a pixel on screen. The wiki also says that afterwards, the shift registers are shifted once, to the data for the next pixel. Every 8 cycles/shifts, new data is loaded into these registers. This information is a bit misleading to me, and I assume that shifting is not happening all the time, but I would like somebody to confirm exactly when shifting happens. I'm also generally pretty confused as to why the two 8-bit AT registers are shift registers in the first place - are these also shifted by 1 bit every cycle? The diagram at the beginning of the PPU rendering wiki page is not perfectly straightforward to me.

Now, regarding drawing - why would "4 background shift registers" (mentioned in the wiki) be needed for drawing one pixel? I assume only three would be needed (low BG, high BG, and one AT byte corresponding to the tile pixel you are rendering). Is this correct?

I think that's all the questions I have for now! I'm sorry for the long post, I just want to make sure I follow the original design as close as I can. Looking forward to your replies.

Regards,
Sebastien
Re: PPU shifting and associated timings
by on (#116075)
sronsse wrote:
why would "4 background shift registers" (mentioned in the wiki) be needed for drawing one pixel? I assume only three would be needed (low BG, high BG, and one AT byte corresponding to the tile pixel you are rendering).

The attribute is two bits deep. These shift registers are used to delay bit 1 and bit 0 of the attribute by 0-7 pixels depending on the scroll value.
Re: PPU shifting and associated timings
by on (#116081)
Hi tepples,

The part I didn't understand was the fact that the 8 bit registers do not contain the AT byte but actually palette information per pixel (the AT info has been processed already).

Thank you for your answer, I understand this part now. Any insights on some of my other questions?

Regards,
Sebastien
Re: PPU shifting and associated timings
by on (#116085)
Hi tepples,

Actually there is still one part I don't quite get. As the palette attribute shift registers are only 8-bit wide, they can only contain information to render 8 pixels, correct? But the BG data shift regs can contain data for 2 tiles, meaning 16 pixels. Does that mean that there are actually two internal latches for storing two different AT bytes? When is the 8-bit register pair loaded with tile 0 palette attributes, and with tile 1?

Regards,
Sebastien
Re: PPU shifting and associated timings
by on (#116092)
At the beginning of a tile, the 8-bit attribute shift registers contain the attribute bits for that tile, while the next tile's attribute bits are simply connected to the shift registers' inputs - since it's all the same bit, the result is the same as if the shift registers were 16 bits wide with the upper 8 bits being set to the next tile's value. The actual attribute byte for the next tile is stored in a nearby 8-bit latch, and VRAM address bits A1 and A6 are fed into 4-to-1 multiplexers to select which bits go into the shift register inputs.

If you want to see the actual circuitry, you can load up Visual2C02 and search for node 10251 - it's right in the middle of the logic you're implementing.
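The equivalence described above (an 8-bit attribute shifter with a constant serial input behaving like a 16-bit one) can be checked with a small sketch. This is only an illustration, not the actual circuit: the function names are made up here, and the right-shift direction is an assumption chosen to match the "lower bits are rendered first" model used earlier in the thread.

```python
def shift_attr(reg8, next_tile_bit):
    """Shift right by one pixel; the next tile's attribute bit enters at bit 7."""
    return (reg8 >> 1) | ((next_tile_bit & 1) << 7)


def as_16_bit(current_bit, next_bit):
    """Notional 16-bit register: current tile in the low byte, next tile in the high byte."""
    low = 0xFF if current_bit else 0x00
    high = 0xFF if next_bit else 0x00
    return (high << 8) | low


def models_agree(current_bit, next_bit):
    """Compare the 8-bit-plus-input model with the 16-bit model over one tile (8 shifts)."""
    reg8 = 0xFF if current_bit else 0x00
    reg16 = as_16_bit(current_bit, next_bit)
    for _ in range(8):
        reg8 = shift_attr(reg8, next_bit)
        reg16 >>= 1
        if reg8 != (reg16 & 0xFF):
            return False
    return True
```

For every combination of current and next attribute bit, `models_agree` returns `True`, which is why one extra latched bit per register is enough in this model.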
Re: PPU shifting and associated timings
by on (#116095)
sronsse wrote:
Hi everyone,

Like many here, I am developing an NES emulator and I am now going forward with my PPU implementation (only BG for now). First of all, I would like to thank the nesdev wiki contributors for the excellent work!

From my understanding, there are two 16-bit registers and two 8-bit shift registers used during background rendering:
1) One 16-bit reg which contains only high BG tile information with lower 8 bits to be rendered first
2) One 16-bit reg which contains only low BG tile information with lower 8 bits to be rendered first
3) One 8-bit shift reg containing the attributes of the first tile to be rendered
4) One 8-bit shift reg containing the attributes of the second tile to be rendered
I am not sure whether the above description is correct and would like some feedback.


Almost, but the two 8-bit attribute shift registers both hold attribute bits for the same tile, not different tiles (one holds the high attribute bit, the other the low attribute bit).

sronsse wrote:
Also, I am having trouble with some PPU timings, and I would like to check if my understanding is correct. I am following this great diagram as a reference:
http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png

For the first tile, here's what happens:
- At cycle 321, an NT byte is placed in an internal latch
- At cycle 323, an AT byte is placed in an internal latch
- At cycle 325, low BG tile byte is placed in an internal latch
- At cycle 327, high BG tile byte is placed in an internal latch
- At cycle 329 (8 cycles later than first NT fetch), the AT byte and both low and high BG tiles are placed in corresponding shift registers (higher 8 bits for BG data).
While all this is happening, the existing contents of shift registers are shifted by one every cycle. Please correct me if I'm wrong.


Sounds right. Glad you like the diagram. :)

sronsse wrote:
Now the process continues for the second tile:
- Still at cycle 329, an NT byte is placed in an internal latch
- At cycle 331, an AT byte is placed in an internal latch
- At cycle 333, low BG tile byte is placed in an internal latch
- At cycle 335, high BG tile byte is placed in an internal latch
- At cycle 337 (8 cycles later than NT fetch), the AT byte and both low and high BG tiles are placed in corresponding shift registers (higher 8 bits for BG data).
As the first tile has been entirely shifted by then, the contents for the first tile are now present in the lower byte of each 16-bit register, and the contents for the second one are placed in the higher byte of these 16-bit registers. Again, I would like to verify if this is correct or not.


Sounds right.

sronsse wrote:
We now have data corresponding to the first two tiles to be drawn on the next scanline. Do the shift registers STOP shifting at that point? As there are two unused fetches happening right after, I would assume they stop, otherwise on-screen pixel data would be lost. Are the shifters only "shifting" between cycle 2 and cycle 256? Are they even shifting during cycles 321-336 or are the shift registers purely loaded during that time frame? The wiki (http://wiki.nesdev.com/w/index.php/PPU_rendering) mentions that every cycle, a bit is fetched from the 4 background shift registers in order to create a pixel on screen. The wiki also says that afterwards, the shift registers are shifted once, to the data for the next pixel. Every 8 cycles/shifts, new data is loaded into these registers. This information is a bit misleading to me, and I assume that shifting is not happening all the time, but I would like somebody to confirm exactly when shifting happens. I'm also generally pretty confused as to why the two 8-bit AT registers are shift registers in the first place - are these also shifted by 1 bit every cycle? The diagram at the beginning of the PPU rendering wiki page is not perfectly straightforward to me.


The shifters seem to shift between dots 2...257 and dots 322...337 (inclusive, with a shift on each of those dots). You're right that it wouldn't make sense for it to keep shifting after it fetches the second tile at the end of the line, since data for the first tile would be shifted out and lost then. It also seems the shifts before the first tile load at 329 are useless. Should add this to the wiki...
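As a rough sanity check of the dot ranges just given, here is a minimal sketch of the end-of-line prefetch for one 16-bit tile register. The names are hypothetical, and two things are assumptions of this model rather than confirmed hardware behavior: the register shifts right with new data loaded into the upper byte, and on a reload dot the shift is applied before the reload.

```python
def shifts_on_dot(dot):
    """True on dots where the background shifters advance (ranges from the post above)."""
    return 2 <= dot <= 257 or 322 <= dot <= 337


# Reload dots for the two prefetched tiles, as discussed in the thread.
PREFETCH_RELOAD_DOTS = (329, 337)


def prefetch_two_tiles(first_tile, second_tile, reg16=0xFFFF):
    """Run dots 321..340 and return the final contents of the 16-bit register."""
    for dot in range(321, 341):
        if shifts_on_dot(dot):
            reg16 >>= 1
        if dot in PREFETCH_RELOAD_DOTS:
            tile = first_tile if dot == 329 else second_tile
            # Assumed order: shift first, then load the new byte into the upper half.
            reg16 = (reg16 & 0x00FF) | (tile << 8)
    return reg16


print(hex(prefetch_two_tiles(0xAA, 0xBB)))  # 0xbbaa
```

With shifting stopped after dot 337, the first tile ends up intact in the low byte and the second in the high byte, whatever garbage the register held before dot 322.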

16-bit tile regs are needed to accommodate fine x scrolling, which works by selecting a bit from the lower 8 bits of the shift register. If fine x is placed on bit 7 (counting from 0), it must be possible to shift 8 times without running out of bits, and that requires 16 bits. For attribute bits you can get away with just 8 bits + 1 bit, since the bit shifted in from the "upper" part is always the same for the entire tile.
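To illustrate why 16 bits are needed, here is a small sketch of fine X selection on one tile register. It is a simplified model with hypothetical names: bit 0 is treated as the first pixel to be drawn, as in the description earlier in the thread, so a real emulator would have to decide how to map the pattern byte (whose bit 7 is the leftmost pixel) onto that order.

```python
def load_next_tile(reg16, tile_byte):
    """Place the next tile's byte in the upper half; the lower half is kept."""
    return (reg16 & 0x00FF) | ((tile_byte & 0xFF) << 8)


def select_bit(reg16, fine_x):
    """Fine X (0-7) picks one bit out of the lower 8 bits."""
    return (reg16 >> fine_x) & 1


def render_tile_span(reg16, fine_x):
    """Emit 8 pixels' worth of bits, shifting right once after each pixel."""
    bits = []
    for _ in range(8):
        bits.append(select_bit(reg16, fine_x))
        reg16 >>= 1
    return bits, reg16


# Current tile 0x80 in the low byte, next tile 0x55 in the high byte, fine X = 7.
reg = load_next_tile(0x0080, 0x55)
bits, reg = render_tile_span(reg, 7)
print(bits)  # [1, 1, 0, 1, 0, 1, 0, 1]
```

The first bit comes from the current tile (its bit 7) and the remaining seven come from the next tile, which would already be lost if the register were only 8 bits wide.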

sronsse wrote:
Now, regarding drawing - why would "4 background shift registers" (mentioned in the wiki) be needed for drawing one pixel? I assume only three would be needed (low BG, high BG, and one AT byte corresponding to the tile pixel you are rendering). Is this correct?


It probably turned out to be simpler to split the two AT bits into their own shift regs like for the tile data. Makes sense with how fine x works.
Re: PPU shifting and associated timings
by on (#116097)
Hi tepples, Quietust, ulfalizer,

Thank you for your responses, everything is clear now! This will allow me to start coding away now.

Regards,
Sebastien
Re: PPU shifting and associated timings
by on (#116099)
Updated the diagram to be clearer about the shift and reload locations.
Re: PPU shifting and associated timings
by on (#116213)
Hi ulfalizer,

I decided to go ahead and start implementing cycle-correct PPU events and in the process realized that the code I was producing was pretty hard to read (basically a lot of if statements, switch/case etc.). I changed my strategy to have a static array defined during initialization with a list of all events which happen per your diagram. Every cycle I can just look up this table and figure out which events to process (NT fetch, VBLANK clear, etc.), and this also easily allows me to sleep until the next action (in case there are some idle cycles within that array). I am not sure if anyone on these forums took that approach before, but it's a pretty good technique to be able to easily have cycle-correct emulation and keep things fast as well.

I attached a visual representation of all these events using a similar color scheme as your diagram, and they do look pretty similar :) I wanted to thank you for your time producing this diagram - I don't think I would have been able to come up with clean code without it.

Regards,
Sebastien
Re: PPU shifting and associated timings
by on (#116217)
I do something similar, though it's still switches in my code.

For a table of events it's probably a good idea to have separate arrays for lines and dots, since the events are the same on many of the lines. With one huge array you risk running out of cache as you scan through the table each frame.
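A minimal sketch of the "separate arrays for lines and dots" idea might look like the following. Everything here is hypothetical (event names, table names, helper functions), and the tables are deliberately incomplete: only the shift ranges and the two prefetch fetch groups mentioned in this thread are filled in, while the visible-area fetches, the pre-render line and the VBLANK flag events are left out.

```python
DOTS_PER_LINE = 341
LINES_PER_FRAME = 262  # NTSC


def build_render_line():
    """Build the per-dot event list shared by all visible scanlines."""
    events = [[] for _ in range(DOTS_PER_LINE)]
    for dot in range(DOTS_PER_LINE):
        if 2 <= dot <= 257 or 322 <= dot <= 337:
            events[dot].append("shift")
    for first in (321, 329):  # the two tiles prefetched for the next line
        events[first].append("fetch_nt")
        events[first + 2].append("fetch_at")
        events[first + 4].append("fetch_bg_lo")
        events[first + 6].append("fetch_bg_hi")
        events[first + 8].append("reload")
    return tuple(tuple(e) for e in events)


RENDER_LINE = build_render_line()
IDLE_LINE = tuple(() for _ in range(DOTS_PER_LINE))

# One small entry per scanline pointing at a shared dot table, instead of a
# single 262 x 341 array of events.
LINE_TABLE = tuple(
    RENDER_LINE if line < 240 else IDLE_LINE for line in range(LINES_PER_FRAME)
)


def events_at(line, dot):
    """Events to process on the given scanline and dot."""
    return LINE_TABLE[line][dot]


def next_busy_dot(line, dot):
    """First dot after `dot` on this line that has events, or None if the rest is idle."""
    for d in range(dot + 1, DOTS_PER_LINE):
        if LINE_TABLE[line][d]:
            return d
    return None
```

All 240 visible lines share the same `RENDER_LINE` object, so the data scanned each frame stays small, and `next_busy_dot` is one way to find how long the PPU can sleep during idle stretches such as dots 258 to 320 in this partial table.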
Re: PPU shifting and associated timings
by on (#116267)
Hi ulfalizer,

You're right regarding potential cache issues, and I originally only had 4 arrays for different scanlines. I would either need to add special cases (flags for instance), or extra arrays for this. I will see what works/looks best.

Regards,
Sebastien
Re: PPU shifting and associated timings
by on (#116326)
Hello again,

Quick question: when palette attributes shift registers are reloaded, how does the PPU know which bits to grab from the AT byte? Does it actually do some arithmetic based on the lower 2 bits of coarse X scroll and lower 2 bits of coarse Y scroll?

Regards,
Sebastien
Re: PPU shifting and associated timings
by on (#116329)
sronsse wrote:
Does it actually do some arithmetic based on the lower 2 bits of coarse X scroll and lower 2 bits of coarse Y scroll?

Yes. When the attribute byte comes back, the PPU feeds the bits through a multiplexer selected by coarse Y bit 1 and coarse X bit 1.
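In an emulator, that multiplexer can be sketched as a shift and mask. The function names below are hypothetical, and the code assumes the usual attribute byte layout (bits 1-0 top-left, 3-2 top-right, 5-4 bottom-left, 7-6 bottom-right) and the usual VRAM address layout (coarse X in bits 0-4, coarse Y in bits 5-9), under which address bits A1 and A6 mentioned earlier in the thread are the same two select bits.

```python
def attribute_bits(at_byte, coarse_x, coarse_y):
    """Pick the 2-bit palette attribute for a tile out of its attribute byte."""
    # Coarse X bit 1 selects left/right, coarse Y bit 1 selects top/bottom.
    shift = ((coarse_y & 0x02) << 1) | (coarse_x & 0x02)
    return (at_byte >> shift) & 0x03


def attribute_bits_from_vram_addr(at_byte, vram_addr):
    """Same selection, driven by VRAM address bits A1 and A6."""
    a1 = (vram_addr >> 1) & 1
    a6 = (vram_addr >> 6) & 1
    return (at_byte >> ((a6 << 2) | (a1 << 1))) & 0x03


# 0xE4 = 0b11_10_01_00: quadrants hold 0, 1, 2, 3 in the assumed order.
print([attribute_bits(0xE4, x, y) for y in (0, 2) for x in (0, 2)])  # [0, 1, 2, 3]
```

The two result bits are what would then be latched as the serial inputs of the high and low attribute shift registers.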