PPU questions

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
PPU questions
by on (#174509)
Hello NES developers :D

I have some questions about the PPU that are confusing me.

1- As far as I understand, each pixel is output as 4 bits: 2 from the pattern table and 2 from the attribute table, concatenated to form a 4-bit index to a color in the palette memory.
Is my understanding correct?

2- If I am right, I think there must be another memory fetch that takes those 4 index bits and fetches the colour itself from the palette memory.

3- In this PDF (page 39) http://web.mit.edu/6.111/www/f2004/proj ... report.pdf

the diagram shows the BG buffer. My question is: why are there 4 registers? Aren't two enough for the pattern tile's 2 bit planes?
What are the other two doing?

I hope someone can answer me quickly.
Thanks in advance.
Re: PPU questions
by on (#174510)
Muhammad_R4 wrote:
Hello NES developers :D

I have some questions about the PPU that are confusing me.

1- As far as I understand, each pixel is output as 4 bits: 2 from the pattern table and 2 from the attribute table, concatenated to form a 4-bit index to a color in the palette memory.
Is my understanding correct?


Close. The full pixel color index into the color palette RAM is 5 bits, not 4: there are four palettes for sprites as well as four for the background, so one more bit selects between the two groups. What you describe is correct for the background output generator, before sprite data is multiplexed in.
Re: PPU questions
by on (#174514)
Can you please explain in more detail, and tell me whether I am right or not? Thanks.
Re: PPU questions
by on (#174515)
Muhammad_R4 wrote:
1- As far as I understand, each pixel is output as 4 bits: 2 from the pattern table and 2 from the attribute table, concatenated to form a 4-bit index to a color in the palette memory.
Is my understanding correct?

Yes.

Quote:
2- If I am right, I think there must be another memory fetch that takes those 4 index bits and fetches the colour itself from the palette memory.

Like OAM, palette memory (also called Color Generator Random Access Memory, or CGRAM) is on a separate bus. The eight fetches from CGRAM happen in parallel to the fetches from video memory, one for each pixel.
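To make that lookup concrete, here is a minimal Python sketch of forming the index and reading palette memory. The function names, the sample palette contents, and the exact bit layout (sprite-select bit on top, then the two attribute bits, then the two pattern bits) are assumptions added for illustration; they are not spelled out in this thread.

```python
def palette_index(pattern_bits, attribute_bits, is_sprite=False):
    """Build the 5-bit palette memory index for one pixel.

    Assumed layout: bit 4 = sprite select, bits 3-2 = attribute
    (palette number), bits 1-0 = pattern (color within the palette).
    """
    if not (0 <= pattern_bits <= 3 and 0 <= attribute_bits <= 3):
        raise ValueError("pattern and attribute values are 2 bits each")
    return (int(is_sprite) << 4) | (attribute_bits << 2) | pattern_bits


def lookup_color(palette_ram, pattern_bits, attribute_bits, is_sprite=False):
    """Second step: use the index to read the color from palette memory."""
    # Simplification: real hardware treats pattern bits 00 as transparent
    # and shows the shared backdrop color instead; that is ignored here.
    return palette_ram[palette_index(pattern_bits, attribute_bits, is_sprite)]


# Placeholder contents: 32 entries, one per possible 5-bit index.
palette_ram = list(range(0x20, 0x40))

index = palette_index(pattern_bits=0b10, attribute_bits=0b01)
color = lookup_color(palette_ram, 0b10, 0b01)
print(index, hex(color))
```

The point of the sketch is only that the index is built first and the color read second, as two separate steps.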

Quote:
3- In this PDF (page 39) http://web.mit.edu/6.111/www/f2004/proj ... report.pdf

the diagram shows the BG buffer. My question is: why are there 4 registers? Aren't two enough for the pattern tile's 2 bit planes?
What are the other two doing?

There are a total of six 8-bit shift registers:
  a. A parallel-in, serial-out (PISO) shift register for pattern data bit plane 0, to extract individual pixel values from the bytes
  b. A PISO shift register for pattern data bit plane 1, to extract individual pixel values from the bytes
  c. A serial-in, parallel-out (SIPO) shift register fed by shift register A, to delay bit 0 of pixel data for fine scrolling
  d. A SIPO shift register fed by shift register B, to delay bit 1 of pixel data for fine scrolling
  e. A SIPO shift register fed by attribute bit 0, to delay bit 2 of pixel data for fine scrolling
  f. A SIPO shift register fed by attribute bit 1, to delay bit 3 of pixel data for fine scrolling
Pattern data fetched from CHR ROM or CHR RAM is fed into shift registers A and B. These in turn feed into C and D, while the attribute data feeds into E and F. Together, shift registers C, D, E, and F act as a delay line, feeding a set of four 8 to 1 multiplexers (or "muxes" for short) that select a pixel from the shift registers based on the fine horizontal (X) scroll value interpreted as a delay amount.

Together, the pattern_bitmap_one line represents what I call registers A and C, and the pattern_bitmap_two line represents what I call registers B and D. This makes palette_data_one equivalent to E and palette_data_two analogous to F. The palette_data_one and palette_data_two registers are 16-bit in the cited diagram, but they really only need to be 9-bit (1 bit of source and 8 bits of shift register), as a single value applies to all eight pixels of an 8x1 pixel background sliver.
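As a rough illustration of what registers A and B do, here is a small Python sketch of a PISO pair turning two bit plane bytes into eight 2-bit pixel values. The class and function names are made up for this example, and it assumes the most significant bit is the leftmost pixel and comes out first.

```python
class Piso:
    """8-bit parallel-in, serial-out shift register (MSB shifts out first)."""

    def __init__(self):
        self.bits = 0

    def load(self, byte):
        # Parallel load: all eight bits arrive at once.
        self.bits = byte & 0xFF

    def shift_out(self):
        # Serial output: one bit per pixel clock, starting with bit 7.
        bit = (self.bits >> 7) & 1
        self.bits = (self.bits << 1) & 0xFF
        return bit


def decode_sliver(plane0_byte, plane1_byte):
    """Shift both planes out together, giving eight 2-bit pixel values."""
    reg_a, reg_b = Piso(), Piso()
    reg_a.load(plane0_byte)
    reg_b.load(plane1_byte)
    pixels = []
    for _ in range(8):
        low = reg_a.shift_out()   # bit 0 of the pixel (plane 0)
        high = reg_b.shift_out()  # bit 1 of the pixel (plane 1)
        pixels.append((high << 1) | low)
    return pixels


sliver = decode_sliver(0x41, 0x09)  # sample bytes chosen for the example
print(sliver)
```

In the real PPU the serial outputs would feed C and D rather than a list; the list just stands in for the pixel stream.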
Re: PPU questions
by on (#174518)
Quote:
Like OAM, palette memory (also called Color Generator Random Access Memory, or CGRAM) is on a separate bus. The eight fetches from CGRAM happen in parallel to the fetches from video memory, one for each pixel.


So do I understand correctly that after the 4 bits are fetched, they are used again to address CGRAM, which is on a separate bus, and all of this happens before going to the buffer?
If so, why does the documentation treat the palette RAM as part of VRAM? (memory mapping section on nesdev)
Re: PPU questions
by on (#174520)
Quote:
c. A serial-in, parallel-out (SIPO) shift register fed by shift register A, to delay bit 0 of pixel data for fine scrolling
d. A SIPO shift register fed by shift register B, to delay bit 1 of pixel data for fine scrolling
e. A SIPO shift register fed by attribute bit 0, to delay bit 2 of pixel data for fine scrolling
f. A SIPO shift register fed by attribute bit 1, to delay bit 3 of pixel data for fine scrolling
Pattern data fetched from CHR ROM or CHR RAM is fed into shift registers A and B. These in turn feed into C and D, while the attribute data feeds into E and F. Together, shift registers C, D, E, and F act as a delay line, feeding a set of four 8 to 1 multiplexers (or "muxes" for short) that select a pixel from the shift registers based on the fine horizontal (X) scroll value interpreted as a delay amount.


Can you help me understand why I would want a delay?

Why is it not like sprite rendering, where the output comes straight out of the shift registers?

Sorry, I didn't get this part.
Re: PPU questions
by on (#174523)
Muhammad_R4 wrote:
Why does the documentation treat the palette RAM as part of VRAM? (memory mapping section on nesdev)

Though video memory and CGRAM are separate areas of memory, they are both accessed through the PPU data ports. Writes to $3F00-$3FFF go to CGRAM; writes anywhere else (that is, $0000-$3EFF) go to video memory.
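A minimal Python sketch of that routing, for illustration only. The class name, the flat `video` array, the 14-bit address mask, and the folding of the $3F00-$3FFF range onto 32 palette entries are assumptions of this sketch rather than details stated in this thread.

```python
class PpuMemory:
    """Toy model of the two memories behind the PPU data port."""

    def __init__(self):
        # Simplification: video memory as one flat array for $0000-$3EFF.
        self.video = bytearray(0x3F00)
        # Assumption: 32 palette entries, repeated across $3F00-$3FFF.
        self.cgram = bytearray(32)

    @staticmethod
    def target(addr):
        """Name the memory that a given PPU address refers to."""
        addr &= 0x3FFF  # assumption: 14-bit PPU address space
        return "cgram" if addr >= 0x3F00 else "video"

    def write(self, addr, value):
        addr &= 0x3FFF
        if self.target(addr) == "cgram":
            self.cgram[addr & 0x1F] = value & 0xFF
        else:
            self.video[addr] = value & 0xFF


mem = PpuMemory()
mem.write(0x3F01, 0x16)  # lands in palette memory
mem.write(0x2000, 0x24)  # lands in video memory
print(mem.target(0x3F01), mem.target(0x2000))
```

So one port, two destinations: the address alone decides which memory is written.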

Quote:
Can you help me understand why I would want a delay?

The delay is how the PPU performs fine horizontal scrolling. When you set the horizontal scroll position to 0, 1, 2, 3, 4, 5, 6, or 7, you're actually delaying the pixel output by 7, 6, 5, 4, 3, 2, 1, or 0 pixels. More delay = move to the right; less delay = move to the left.
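The relationship between scroll position and delay can be sketched in a few lines of Python. The function names and the "x" filler for not-yet-valid pixels are invented for this illustration.

```python
def delay_for_fine_scroll(fine_x):
    """Fine scroll 0..7 corresponds to a delay of 7..0 pixels."""
    if not 0 <= fine_x <= 7:
        raise ValueError("fine scroll is a 3-bit value")
    return 7 - fine_x


def delayed(stream, delay, filler="x"):
    """Delay a pixel stream by pushing it 'delay' positions to the right."""
    return [filler] * delay + list(stream)


stream = list("01002003")  # arbitrary sample pixels
for fine_x in range(8):
    shifted = delayed(stream, delay_for_fine_scroll(fine_x))
    print(fine_x, "".join(shifted))
```

Reading the printed rows from top to bottom, each increase in fine scroll removes one pixel of delay, so the picture moves one pixel to the left.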
Re: PPU questions
by on (#174528)
I know this, but why can't it go as follows:

I fetch the pattern bits and attribute bits (both for 8 pixels), put them in a shift register, then output them.

Why are all of the other registers there? (Especially the SIPO: why would I need a parallel output?)

Sorry, I didn't get it again.
Re: PPU questions
by on (#174529)
The pattern slivers are always fetched on 8 pixel boundaries. Without the SIPO/mux, you would be able to scroll horizontally only at multiples of 8 pixels.
Re: PPU questions
by on (#174531)
I am sorry again, but I did not get it :(

Could you explain why, in more detail?
Re: PPU questions
by on (#174553)
Please give us more details on your requests for "more details". What parts of the post did you understand, and what parts of the post did you not understand? If you are unable to express what parts of the posts you did and did not understand, I will have to break up the explanation into smaller posts, as if this were a chat room instead of a forum, and ask you whether you understood each part. Here goes:

Consider the following pixel values:
Code:
0100200303300330

They are encoded into bit planes as follows:
Code:
01000001 01100110 = $41 $66 (bit plane 0)
00001001 01100110 = $09 $66 (bit plane 1)


Did you understand this so far?
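For anyone who wants to check the encoding by hand, here is a short Python sketch that reproduces it. The function name is made up for this example; the pixel string and the expected bytes are the ones shown above.

```python
def encode_sliver(pixels):
    """Split eight 2-bit pixel values into (plane 0 byte, plane 1 byte)."""
    if len(pixels) != 8:
        raise ValueError("a sliver is exactly 8 pixels wide")
    plane0 = plane1 = 0
    for p in pixels:
        # The leftmost pixel ends up in the most significant bit.
        plane0 = (plane0 << 1) | (p & 1)         # low bit of each pixel
        plane1 = (plane1 << 1) | ((p >> 1) & 1)  # high bit of each pixel
    return plane0, plane1


pixels = [int(c) for c in "0100200303300330"]
left = encode_sliver(pixels[:8])
right = encode_sliver(pixels[8:])
print([f"${b:02X}" for b in left], [f"${b:02X}" for b in right])
```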
Re: PPU questions
by on (#174572)
I understand this, and how the data in the pattern tables and attributes is translated into sprites. That is not what is confusing me.

I drew this to illustrate what I am thinking of:

Image

Here we have four 8-bit SIPO shift registers, loaded with the data during the fetch phases,

and the SIPO should provide the appropriate delay.

I didn't understand what you meant when you said:
Quote:
The pattern slivers are always fetched on 8 pixel boundaries. Without the SIPO/mux, you would be able to scroll horizontally only at multiples of 8 pixels.
Re: PPU questions
by on (#174574)
The SIPOs are used as a delay line to delay the decoded pixel data by 0 to 7 pixels.

Let's say you get the following pixel stream out of the PISOs:
Code:
0100200303300330

Now let's feed it into the SIPOs. Here I show the contents of the shift register while it receives the 16 pixels from above. In reality, there are four shift registers, each containing one bit plane of the pixels. But for the sake of pedagogical clarity, I will show them as a single shifter containing pixels. I will use the symbol x to represent a "don't care" value, meaning a pixel that appears before (to the left of) or after (to the right of) the 16 pixels from above.

Code:
     .----- Contents of SIPO
    |
,++++++. ,- Output from PISO stage
|||||||| |
xxxxxxxx 0  "Preroll" fetches at end of
xxxxxxx0 1  previous line's horizontal
xxxxxx01 0  blanking period.  These
xxxxx010 0  pixels are not rendered.
xxxx0100 2
xxx01002 0
xx010020 0
x0100200 3

01002003 0  Active display starts here
10020030 3
00200303 3
02003033 0
20030330 0
00303300 3
03033003 3
30330033 0
03300330 x
3300330x x
300330xx x
00330xxx x
0330xxxx x
330xxxxx x
30xxxxxx x
0xxxxxxx x
^^^^^^^^
ABCDEFGH

A: 0100200303300330
B: 100200303300330x
C: 00200303300330xx
D: 0200303300330xxx
E: 200303300330xxxx
F: 00303300330xxxxx
G: 0303300330xxxxxx
H: 303300330xxxxxxx

Notice how lines B-H appear to be scrolled to the left compared to line A.
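The table above can be reproduced with a short Python sketch. As in the table, a single shifter holding whole pixels stands in for the four real shift registers; the function name and the use of `collections.deque` are choices made for this illustration only.

```python
from collections import deque

TAPS = "ABCDEFGH"  # A is the oldest (most delayed) pixel, H the newest


def sipo_taps(stream, preroll=8, filler="x"):
    """Feed a pixel stream through an 8-stage SIPO and record each tap.

    Recording starts after 'preroll' clocks, matching the point where
    active display begins in the table.
    """
    sipo = deque([filler] * 8, maxlen=8)  # starts out as "don't care"
    feed = list(stream) + [filler] * 8    # don't-care pixels follow the stream
    taps = {name: [] for name in TAPS}
    for clock, incoming in enumerate(feed):
        if clock >= preroll:
            for position, name in enumerate(TAPS):
                taps[name].append(sipo[position])
        sipo.append(incoming)  # shift left by one, new pixel enters at H
    return {name: "".join(seen) for name, seen in taps.items()}


result = sipo_taps("0100200303300330")
for name in TAPS:
    print(f"{name}: {result[name]}")
```

Each tap sees the same picture, just starting one pixel later than the tap before it, which is all the mux needs in order to pick a fine scroll position.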

Did you understand this so far? If not, how did you think fine scrolling (by distances less than 8 pixels) would be accomplished?
Re: PPU questions
by on (#174579)
I have traced it, and I understand where those values come from and how they are manipulated in the example,

but I still can't connect this with how they are rendered :( :(

Quote:
Did you understand this so far? If not, how did you think fine scrolling (by distances less than 8 pixels) would be accomplished?


To answer your question: I think there is a counter (call it the pixel counter: 8 bits, rolling over at 256).
This counter clocks the PISO, which is loaded with the data (say: 13023310).
1st clock: 0 comes out
2nd clock: 1 comes out
3rd clock: 3 comes out
and so on.

There must be a problem with this idea that I can't see, so I want to know what it is, plus how the SIPO solves it.

And I really appreciate that you are still helping me :)

[edit: Is this related to the fact that the tile is fetched 16 clock cycles before being rendered on the screen?]
Re: PPU questions
by on (#174589)
Pixels are loaded into the PISO only once every eight pixels. And all pixels loaded into the PISO at one time must come from one pair of bytes. The PPU cannot fetch, say, pixels 1-8 into the PISO because they cross two pairs of bytes: the pair of bytes containing pixels 0-7 and the pair of bytes containing pixels 8-15.

If you set the horizontal scroll position to 0, 1, 2, 3, 4, 5, 6, or 7, the PPU will begin by making the following background pattern fetches:

At x=321-328, fetch the pixels at horizontal positions 0-7 of the background
At x=329-336, fetch the pixels at horizontal positions 8-15 of the background
At x=1-8, fetch the pixels at horizontal positions 16-23 of the background
At x=9-16, fetch the pixels at horizontal positions 24-31 of the background
At x=17-24, fetch the pixels at horizontal positions 32-39 of the background
At x=25-32, fetch the pixels at horizontal positions 40-47 of the background
[More fetches for later pixels on the same line omitted]

If you set the horizontal scroll position to 8, 9, 10, 11, 12, 13, 14, or 15, the PPU will begin by making the following background pattern fetches:

At x=321-328, fetch the pixels at horizontal positions 8-15 of the background
At x=329-336, fetch the pixels at horizontal positions 16-23 of the background
At x=1-8, fetch the pixels at horizontal positions 24-31 of the background
At x=9-16, fetch the pixels at horizontal positions 32-39 of the background
At x=17-24, fetch the pixels at horizontal positions 40-47 of the background
At x=25-32, fetch the pixels at horizontal positions 48-55 of the background
[More fetches for later pixels on the same line omitted]

If you set the horizontal scroll position to 16, 17, 18, 19, 20, 21, 22, or 23, the PPU will begin by making the following background pattern fetches:

At x=321-328, fetch the pixels at horizontal positions 16-23 of the background
At x=329-336, fetch the pixels at horizontal positions 24-31 of the background
At x=1-8, fetch the pixels at horizontal positions 32-39 of the background
At x=9-16, fetch the pixels at horizontal positions 40-47 of the background
At x=17-24, fetch the pixels at horizontal positions 48-55 of the background
At x=25-32, fetch the pixels at horizontal positions 56-63 of the background
[More fetches for later pixels on the same line omitted]
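The pattern in these three lists can be written down as a small Python sketch. The slot table only covers the six fetch windows listed above, and the names are invented for this illustration.

```python
# The first six fetch windows on a line, as (first x, last x).
FETCH_SLOTS = [(321, 328), (329, 336), (1, 8), (9, 16), (17, 24), (25, 32)]


def first_fetches(scroll_x, slots=FETCH_SLOTS):
    """List (fetch window, background pixel range) for a scroll position.

    Only the scroll position divided by 8 matters here: every fetch
    starts on an 8-pixel boundary of the background.
    """
    coarse = scroll_x >> 3
    plan = []
    for i, window in enumerate(slots):
        first = (coarse + i) * 8
        plan.append((window, (first, first + 7)))
    return plan


for window, pixels in first_fetches(13):
    print(f"At x={window[0]}-{window[1]}, fetch background pixels "
          f"{pixels[0]}-{pixels[1]}")
```

Note that scroll positions 8 through 15 all produce exactly the same plan, which is why something extra is needed to tell them apart.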

Do you understand the above? If you do, I will try to continue to bridge this explanation to the SIPO.
Re: PPU questions
by on (#174599)
Yes, I understand, but I have a question:

What is the horizontal scroll, really? I think it is a counter ranging from 0 to 255 for the 256 pixels on the screen. Is this true?

I will fully understand your comment once this question is answered.
Re: PPU questions
by on (#174601)
If by "horizontal scroll" you mean the value written to $2005:

Bits 7-3 of the horizontal scroll are the "coarse horizontal scroll", which corresponds to bits 4-0 of internal registers t and v in PPU registers.

Bits 2-0 of the horizontal scroll are the "fine horizontal scroll", which select a line coming out of the SIPO.
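In code, the split is just a shift and a mask. This is a minimal Python sketch with a made-up function name; it models only the bit fields, not the actual register write.

```python
def split_scroll(value):
    """Split an 8-bit horizontal scroll value into (coarse, fine)."""
    value &= 0xFF
    coarse = value >> 3   # bits 7-3: which 8-pixel column to start fetching at
    fine = value & 0x07   # bits 2-0: which SIPO output line to select
    return coarse, fine


for value in (0, 7, 8, 13, 255):
    print(value, split_scroll(value))
```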
Re: PPU questions
by on (#174613)
OK, you can continue to the SIPO. I understand.
Re: PPU questions
by on (#174615)
So the coarse X bits of the starting address (t) scroll the background horizontally by 0, 8, 16, 24, 32, 40, ..., 240, or 248 pixels, and the SIPO adds between 0 and 7 pixels (x) to this scroll amount. Together, this adds up to a scroll between 0 and 255 pixels, the whole width of one nametable.
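Putting the two parts together, here is a hedged Python sketch of one scanline: fetches start at an 8-pixel boundary chosen by the coarse scroll, and the fine scroll picks one output of an 8-stage delay line. It is a simplified model, not the real PPU timing. The names, the single whole-pixel shifter, the test pattern, and the wrap-around at the end of the `background` list are all assumptions of the sketch.

```python
from collections import deque


def render_scanline(background, scroll_x, width=256):
    """Return 'width' pixels of one background row, scrolled by scroll_x."""
    coarse, fine = scroll_x >> 3, scroll_x & 7
    n = len(background)
    start = coarse * 8  # fetching always begins on an 8-pixel boundary

    # Preroll: the delay line already holds the first 8 fetched pixels
    # when active display starts.
    sipo = deque((background[(start + i) % n] for i in range(8)), maxlen=8)

    out = []
    for x in range(width):
        out.append(sipo[fine])  # the mux selects one tap (0 = oldest pixel)
        # The next pixel from the PISO stage enters; the oldest falls out.
        sipo.append(background[(start + 8 + x) % n])
    return out


# Arbitrary test pattern, 512 pixels wide so no wrap-around is needed.
background = [(i * 7 + 3) % 4 for i in range(512)]
line = render_scanline(background, scroll_x=13)
print(line[:8], background[13:21])
```

The two printed lists match: the first pixel on screen is background pixel 13, that is, 8 from the coarse part plus 5 from the selected tap.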