Sprite Rendering

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Sprite Rendering
by on (#185209)
As I currently understand it, when rendering sprites the SNES first scans through OAM to see which sprites have V/H position values that match the current line. It then fetches tiles for those sprites during H-Blank so that they can (potentially) be displayed when the next line is output.

When comparing the V position, is the SNES comparing against current_scanline or [current_scanline+1]? It seems like it should compare against [current_scanline+1], since the sprite tile data it is fetching will (potentially) be displayed on the _next_ scanline, not the current one.

Admittedly I could figure this out programmatically by writing a test ROM with some sprites enabled (if I actually knew how to do that, which I don't... yet) and running it in bsnes/higan. But I think it's useful to have this information on the forums, as I was unable to find the answer elsewhere.
Re: Sprite Rendering
by on (#185213)
Are details from Anomie's timing document not enough? Quoted/underlined relevant part:

Quote:
DETAILED RENDERER TIMING
------------------------

Most of this is conjecture, based on a NES timing experiment conducted by Brad Taylor (big_time_software@hotmail.com) around Sept 25, 2000.

All SNES VRAM memory access cycles are 4 master cycles long (the same amount of time it takes to output one pixel), compared to 8 (2 pixels) for the NES. The scanline is thus 340 memory accesses long. Also, the SNES PPU can access 2 bytes at a time (one from each VRAM chip) where the NES could only access one. Like the NES, 'rendering' begins on scanline 0, however nothing is actually output for scanline 0.

Beginning when the PPU begins outputting the first pixel on the scanline (just after H-Blank), we load the data for 32 tiles. For Modes 0-4, each BG takes 1 memory access for the tilemap word, and either 1, 2, or 4 accesses for 8 pixels of character data (depending on if the BG is 2, 4, or 8 bits, and see now why the bitplanes are stored the way they are?). For Modes 5 and 6, 2 or 4 memory accesses (again, if the BG is 2 or 4 bits) are required for 16 half-pixels of character data. Since this is at most 8 memory accesses for any BG mode, and 8 pixels are loaded at a time, you can see we just break even. For Mode 7, a tilemap entry is read from the low-byte VRAM chip and a pixel from the high-byte chip. During the rendering of the first tile, the third tile is being loaded from VRAM. This is a total of 256 memory access cycles.

Also during this time, OAM is being examined to determine the first 32 sprites on the next scanline. This (and the later loading during H-Blank) is the reason for the dummy scanline 0, otherwise there would be no sprite data for the first scanline on the screen.

During H-Blank, 68 memory access cycles are devoted to loading the next scanline from 34 4-bit sprite tiles, and 16 memory access cycles to loading the scanline for the first two tiles of the next scanline (recall that the third tile is being loaded while the first is being rendered). This totals 340 memory access cycles.
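The "we just break even" arithmetic and the 340-access total in the quote can be checked in a few lines of Python. This is a sketch that assumes the standard per-mode BG bit depths (which the quote does not list) and ignores offset-per-tile fetches in modes 2, 4 and 6 as well as Mode 7; `BG_BPP` and `accesses_per_column` are made-up names.

```python
# Bits per pixel of each BG layer in modes 0-6 (standard SNES mode table;
# an assumption here, since the quote doesn't list it). Offset-per-tile
# reads and Mode 7 are not modeled.
BG_BPP = {
    0: [2, 2, 2, 2],
    1: [4, 4, 2],
    2: [4, 4],
    3: [8, 4],
    4: [8, 2],
    5: [4, 2],
    6: [4],
}


def accesses_per_column(mode: int) -> int:
    """VRAM accesses per 8-pixel column, following the quoted rules."""
    total = 0
    for bpp in BG_BPP[mode]:
        total += 1  # one access for the tilemap word
        if mode in (5, 6):
            total += bpp  # 16 half-pixels: 2bpp -> 2, 4bpp -> 4 accesses
        else:
            total += bpp // 2  # 8 pixels at 2 bytes/access: 1, 2 or 4
    return total


worst_case = max(accesses_per_column(m) for m in BG_BPP)  # 8

visible = 32 * worst_case  # 32 tiles loaded while pixels are output: 256
sprites = 34 * 2           # 34 4bpp sprite slivers, 4 bytes = 2 accesses each: 68
prefetch = 2 * worst_case  # first two BG tiles of the next scanline: 16
total = visible + sprites + prefetch

print({m: accesses_per_column(m) for m in BG_BPP})
print(visible, sprites, prefetch, total)  # 256 68 16 340
```

No mode needs more than 8 accesses per column, which is why 32 columns fit exactly into the 256 accesses available while the line is being output.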

There are a lot of comparisons to the NES here, and we'll need to wait for byuu to comment on some of this, but if it's like the NES, then yes, it should be scanline+1. That's based on this (last line) and this (first item). In other words: the whole "scanline 0" concept (or "pre-render scanline") I believe can explain this. byuu et al, do I have this wrong?
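If evaluation works like the NES, each line's OAM scan targets the following line. A minimal Python sketch of that idea, assuming a hypothetical `oam` list of `(y, height)` pairs in OAM order; X position, priority rotation, the 34-tile limit and vertical wraparound are all deliberately left out.

```python
from typing import List, Tuple

MAX_SPRITES_PER_LINE = 32  # limit described in the quoted document


def evaluate_next_line(oam: List[Tuple[int, int]], current_line: int) -> List[int]:
    """Return indices of the first 32 OAM entries covering current_line + 1.

    current_line + 1 is the line whose sprite slivers get fetched during
    this line's H-Blank. oam is a hypothetical list of (y, height) pairs.
    """
    target = current_line + 1
    found = []
    for index, (y, height) in enumerate(oam):
        if y <= target < y + height:
            found.append(index)
            if len(found) == MAX_SPRITES_PER_LINE:
                break  # any further sprites on this line are dropped
    return found


oam = [(100, 16), (40, 8), (99, 8)]
print(evaluate_next_line(oam, 99))  # [0, 2]
```

Under this model a sprite whose first row should appear on line N is picked up while line N - 1 is being drawn, which is why a dummy line is needed before the first visible one.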
Re: Sprite Rendering
by on (#185217)
Quote:
This (and the later loading during H-Blank) is the reason for the dummy scanline 0, otherwise there would be no sprite data for the first scanline on the screen.

I remember people saying this about the NES some time ago, which I always found really weird, because on the NES there are indeed no sprites on the first scanline of the screen! Now I believe the consensus is that the pre-render scanline on the NES exists because the first couple of background tiles of the next scanline are fetched during the H-Blank of the current one. I wonder why they didn't take the opportunity to process sprites during the pre-render scanline too. Interesting that they fixed this on the SNES.
Re: Sprite Rendering
by on (#185218)
If I'm understanding this right, then setting a sprite's V position to 0 would mean the first row of pixels for that sprite is "cut off", since no sprites are displayed on scanline 0. Is that correct?

And to clarify, I guess it would be better to say that each sprite's V position isn't just checked for equality with [current_scanline+1]; really, the sprite is in range whenever [sprite_v_pos <= current_scanline+1 < sprite_v_pos+sprite_height]
Re: Sprite Rendering
by on (#185223)
jwdonal wrote:
I guess it would be better to say that each sprite's V position isn't just checked for equality with [current_scanline+1]; really, the sprite is in range whenever [sprite_v_pos <= current_scanline+1 < sprite_v_pos+sprite_height]

I'm totally guessing here, but rather than comparing to a range (not sure if you actually suggested comparing against multiple values for each sprite or were just giving a high-level description of what would conceptually be happening), the math that would make more sense would be to compute current_scanline+1 - sprite_v and compare that result to sprite_height. If it's less, that's the index of the sprite row that must be drawn; if it's equal or greater, the sprite is not in range (negative results, which become large values in unsigned arithmetic, also count as out of range).
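The subtract-then-compare idea can be sketched in Python as follows. It assumes 8-bit line and Y values that wrap at 256 (an assumption, not something stated in the thread), so a "negative" difference shows up as a large unsigned number and fails the same `< sprite_height` test; `sprite_row` is a made-up name.

```python
from typing import Optional


def sprite_row(next_line: int, sprite_y: int, sprite_height: int) -> Optional[int]:
    """Return the sprite row to draw on next_line, or None if out of range.

    Assumes 8-bit values that wrap at 256: lines above the sprite's top
    give a large unsigned difference instead of a negative one.
    """
    diff = (next_line - sprite_y) & 0xFF  # unsigned 8-bit subtraction
    if diff < sprite_height:
        return diff   # index of the sprite row to fetch
    return None       # out of range, including "negative" differences


# A 16-pixel-tall sprite at Y=100:
print(sprite_row(100, 100, 16))  # 0 (top row)
print(sprite_row(115, 100, 16))  # 15 (bottom row)
print(sprite_row(116, 100, 16))  # None
print(sprite_row(99, 100, 16))   # None: -1 wraps to 255
# Under the wraparound assumption, a sprite at Y=250 continues at the top:
print(sprite_row(2, 250, 16))    # 8
```

One subtraction gives both the in-range decision and the row index, and for sprites that don't cross line 255 it agrees with the `sprite_v_pos <= line < sprite_v_pos + sprite_height` range test.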
Re: Sprite Rendering
by on (#185399)
I seriously doubt there is any complex addition and subtraction going on.


In a chip design:
It's easiest to have a flip-flop reset or set on an event.
It's easiest to test for equal or not equal.
It's easy to count, and clear a count.
It's easy to test when a counter reaches a value.


So I would imagine the sprite row counting alone looks something like this:

1) End of VBlank, New Scanline: every given sprite slot's internal "visible" flag cleared, each internal "sprite row" cleared.
2) compare the value of the current scanline to each sprite_v_pos: If equal, set its "visible" flag.
3) if "visible" flag is set, fetch the data from "sprite row" offset + address base, then increment "sprite row".
4) compare each "sprite row" to global "sprite height". If equal, clear its "visible" flag.
5) next scanline, go to step 2.
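A rough Python model of the five steps, following the post in comparing against the current scanline. `SpriteSlot` and `run_counter_model` are hypothetical names, the 224-line frame height is an assumed default, and X position, the priority encoder from step 3 and sprites that wrap past the bottom of the frame are left out. Within those limits it fetches the same rows as computing line - sprite_v_pos directly, using only equality tests and an incrementing counter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpriteSlot:
    """Hypothetical per-sprite state for the counter-based model."""
    y: int
    visible: bool = False
    row: int = 0


def run_counter_model(slots: List[SpriteSlot], sprite_height: int,
                      lines: int = 224) -> List[Tuple[int, int, int]]:
    """Return (line, slot index, fetched row) tuples for one frame."""
    # Step 1: end of VBlank, clear every "visible" flag and row counter.
    for slot in slots:
        slot.visible = False
        slot.row = 0

    fetches = []
    for line in range(lines):  # Step 5: repeat for every scanline
        for index, slot in enumerate(slots):
            # Step 2: exact-equality test against the current scanline.
            if line == slot.y:
                slot.visible = True
            # Step 3: fetch the current row, then increment the counter.
            if slot.visible:
                fetches.append((line, index, slot.row))
                slot.row += 1
                # Step 4: exact-equality test against the global height.
                if slot.row == sprite_height:
                    slot.visible = False
    return fetches


slots = [SpriteSlot(y=10), SpriteSlot(y=12)]
print(run_counter_model(slots, 8)[:3])  # [(10, 0, 0), (11, 0, 1), (12, 0, 2)]
```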


Step 3 is obviously more complicated with a priority encoder determining what gets fetched.
It's most likely not exactly this process, but my point is that there are not going to be these adders and subtractions and mathematical comparisons for less than or greater than. Just exact-value tests (using parallel xor gates).
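The "exact-value test" mentioned here is easy to model: one XOR per bit flags a mismatch, and the two inputs are equal when no XOR output is high (a wide NOR). A Python sketch; `equal_by_xor` is a made-up name.

```python
def equal_by_xor(a: int, b: int, width: int = 8) -> bool:
    """Exact-value comparator modeled as parallel XOR gates plus a NOR.

    Each XOR outputs 1 where the two bits differ; the values are equal
    when no XOR output is 1.
    """
    differs = [((a >> bit) & 1) ^ ((b >> bit) & 1) for bit in range(width)]
    return not any(differs)


print(equal_by_xor(0x5A, 0x5A))  # True
print(equal_by_xor(0x5A, 0x5B))  # False
```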
Re: Sprite Rendering
by on (#185400)
Both the NES PPU and various discrete logic arcade games test for "is this sprite in range vertically" using a subtractor. It's really not that complicated, and probably actually smaller than the array of XORs and ANDs and addressing necessary to handle an array of RAM/latches.

Furthermore, you need to perform the subtraction (up to 6 bits out of the 8 possible) in order to figure out which sliver to fetch anyway.
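To see why the subtraction result is needed anyway: the row index (six bits for sprites up to 64 pixels tall) splits directly into which 8x8 tile row to fetch and which line within that tile. A Python sketch assuming the usual SNES OBJ character layout of 16 tiles per row; `sliver_address` is a hypothetical helper, vertical flip is treated as mirroring over the whole sprite height, and wraparound within the name table is ignored.

```python
from typing import Tuple


def sliver_address(name: int, row: int, height: int,
                   vflip: bool = False) -> Tuple[int, int]:
    """Split a sprite row (0-63) into (tile number, line within tile).

    Assumes 16 tiles per character row, so each 8-line step down a
    tall sprite moves 16 tile numbers forward.
    """
    if vflip:
        row = height - 1 - row      # mirror over the full sprite height
    tile = name + (row >> 3) * 16   # bits 3-5 choose the 8x8 tile row
    fine_y = row & 7                # bits 0-2 choose the line in that tile
    return tile, fine_y


# Row 37 (0b100101) of a 64-pixel-tall sprite whose top-left tile is 0x20:
print(sliver_address(0x20, 37, 64))              # (96, 5)
print(sliver_address(0x20, 37, 64, vflip=True))  # (80, 2)
```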
Re: Sprite Rendering
by on (#185411)
lidnariq wrote:
Both the NES PPU and various discrete logic arcade games test for "is this sprite in range vertically" using a subtractor. It's really not that complicated, and probably actually smaller than the array of XORs and ANDs and addressing necessary to handle an array of RAM/latches.

Furthermore, you need to perform the subtraction (up to 6 bits out of the 8 possible) in order to figure out which sliver to fetch anyway.


doubt it.
work it out in your head.

and checking for a counter reaching 8, 16, 32, 64, or 128 only involves one AND gate.
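The power-of-two point can be checked directly, assuming the counter counts up from 0 and is cleared as soon as it reaches the limit: below 2^k, bit k is always 0, so that single bit going high means the counter just hit 2^k. In hardware that bit would be ANDed with a "this size is selected" line. A Python sketch; `reached` is a made-up name.

```python
SIZES = (8, 16, 32, 64, 128)


def reached(counter: int, size: int) -> bool:
    """Detect counter == size by looking at a single bit.

    Only valid for a counter that starts at 0 and is cleared once it
    hits size, so it never passes that value.
    """
    assert size in SIZES
    return (counter & size) != 0


print(reached(15, 16), reached(16, 16))  # False True
```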
Re: Sprite Rendering
by on (#185413)
Your skepticism is duly noted.

It doesn't change the fact that they still use a full adder, because you still need the rest of its output (in the case of the SNES, all six lower bits) to determine which row of the sprite to render on this scanline.