I went a bit overboard and got greedy with using sprites for the screen-filling logo experiment.
Once I've fixed a few minor mistakes (there are currently 9 sprites on 3 of the rows), it'll be 118 to 119 sprites on screen at the same time; nearly double the 64-sprite limit.
This means the project got a little bigger than I originally had in mind. Now I must multiplex the sprites (early-gen console style), which specifically means timed updates of sprite position and tile reference during the draw of one screen.
Before I dig into it:
- Am I correct that sprite tile animation and position updates would need to happen during hblank?
And
- Is there a shorthand answer as to whether this is even viable?
I essentially have a time buffer of 64 scanlines before the scanner hits the next set of 'virtual' / multiplexed sprites, but updates can't begin until at least one row (8 scanlines) has passed, so I don't move sprites before they've been rendered. Since the total number of sprites is less than double the hardware limit, there are fewer updates to be made halfway down the screen. Attaching a sprite count list.
I can of course reset the sprites back to their initial settings during vblank.
Code:
Row  Sprite count
 1   8
 2   7
 3   7
 4   5
 5   9 (needs a fix, cap at 8)
 6   8
 7   8
 8   8
 9   6-7 (depending on a choice)
10   6
11   9 (needs a fix, cap at 8)
12   7
13   9 (needs a fix, cap at 8)
14   8
15   7
16   4
17   3
18   2
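A quick sanity check of these numbers (a sketch; it assumes rows 5, 11 and 13 get capped at 8 and row 9 lands on 7):

```python
# Row-by-row sprite counts from the list above, with the three overfull
# rows capped at 8 and row 9 assumed to resolve to 7 (an assumption).
row_counts = [8, 7, 7, 5, 8, 8, 8, 8, 7, 6, 8, 7, 8, 8, 7, 4, 3, 2]

total = sum(row_counts)            # 119 virtual sprites in all

# How many whole 8-scanline rows fit inside the 64 hardware sprites?
rows_in_first_batch = 0
running = 0
for count in row_counts:
    if running + count > 64:
        break
    running += count
    rows_in_first_batch += 1
# rows_in_first_batch comes out to 8: the first batch spans 8 rows
# = 64 scanlines (using 59 of the 64 hardware sprites), which matches
# the 64-scanline buffer estimate above.
```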
Uploading a new display list to OAM must take place while rendering is off, which means a blank bar across the screen at least five scanlines in height. I think Bigfoot does this, but not much else.
That's using OAMDMA, right?
I was thinking of using unrolled lda/sta (even if it takes longer) to do partial, finely sliced updates. If completely unrolled, I'd also get to decide which bytes go where. Since I've taken care to align most sprites to an 8-pixel grid, sometimes I just need to change the y position, most often also the tile reference, but not always. Sometimes there's still a need to change the x position and palette. Ideally, I'd reassign a sprite to become the next 'virtual' sprite with the same x position and, as a second priority, whatever other property might stay the same: subpalette and/or tile number.
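For scale, here's a rough cycle comparison between OAM DMA and unrolled immediate lda/sta writes, using standard 6502/2A03 instruction timings (a back-of-the-envelope sketch, not NES code):

```python
# Assumed per-instruction costs: LDA #imm = 2 cycles, STA $2004 = 4 cycles.
# OAM DMA always copies the whole 256-byte page, in 513 (or 514) cycles.
LDA_IMM = 2
STA_ABS = 4
OAM_DMA = 513

def unrolled_cost(n_bytes):
    """Cycles for a fully unrolled lda #imm / sta OAMDATA sequence."""
    return n_bytes * (LDA_IMM + STA_ABS)

# One 4-byte sprite entry is cheap (24 cycles), but a full page by hand
# (1536 cycles) loses badly to DMA -- partial updates are the only win.
one_sprite = unrolled_cost(4)
full_page = unrolled_cost(256)
```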
My aim is to do this 'clean', but this illegal opcode looks interesting (quoting this nesdev wiki article):
AXS #i ($CB ii, 2 cycles)
Sets X to {(A AND X) - #value without borrow}, and updates NZC. One might use TXA AXS #-element_size to iterate through an array of structures or other elements larger than a byte, where the 6502 architecture usually prefers a structure of arrays. For example, TXA AXS #$FC could step to the next OAM entry or to the next APU channel, saving one byte and four cycles over four INXs. Also called SBX.
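The quoted behavior can be sanity-checked with a tiny model (an assumption-level simulation of the arithmetic, not cycle-accurate):

```python
# Model of AXS/SBX as quoted: X = (A AND X) - imm, with 8-bit wraparound.
def axs(a, x, imm):
    return ((a & x) - imm) & 0xFF

# TXA copies X into A, so A AND X == X, and AXS #$FC computes X - 0xFC,
# which mod 256 is X + 4: stepping to the next 4-byte OAM entry.
x = 0x08            # sprite #2's OAM offset
a = x               # after TXA
x = axs(a, x, 0xFC) # x is now 0x0C, the next OAM entry
```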
It doesn't work, even with LDA/STA. PPU is constantly accessing OAM during rendering scanlines, even during hblank (it's preparing for the next scanline).
See http://wiki.nesdev.com/w/index.php/PPU_ ... evaluation.
Just as a quick test, I fed your image to NES Image Converter 2, and while the result did lose some detail, I think it came out quite nice. So I think you should be fine without extra sprites. This tool in particular uses 16x8 attribute areas and 8x16 sprite overlays. (It tries to intelligently decide how to allocate the attributes and where sprites are needed the most. Sometimes it succeeds, sometimes it doesn't.)
BTW, I really like how the image looks with NTSC filter!
Hm. That's a nice idea! Using a different attribute grid (even though that 'breaks' the frame for the original experiment of running this 'vanilla' on something like UNROM) should lower the need for sprites considerably without any visually noticeable sacrifices. Thanks! Sprite mode is a tough one: trading the total number of sprites against hogging per-scanline sprite bandwidth in unnecessary places...
Also thanks for the clarification on how sprite evaluation works.
Tepples, your hint on how Bigfoot does it might be of use for another theme / still image. Thanks!
FrankenGraphics wrote:
(even though that 'breaks' the frame for the original experiment; running this 'vanilla' on something like UNROM)
16x8 attributes can be done on UNROM (or even NROM) with timed code by switching between two nametables (with identical tile indices, but differing attribute tables) every 8 scanlines. Of course this hogs all the CPU time.
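A rough budget for that timed trick (NTSC numbers: 341 PPU dots per scanline, 3 dots per CPU cycle; a sketch only):

```python
# Assumed NTSC timing constants.
SCANLINES = 240          # visible picture height
ROW_HEIGHT = 8           # one 16x8 attribute row

switches_per_frame = SCANLINES // ROW_HEIGHT            # 30 nametable toggles
cpu_cycles_per_scanline = 341 / 3                        # ~113.67 on NTSC
cycles_between_switches = ROW_HEIGHT * cpu_cycles_per_scanline  # ~909
# ~909 CPU cycles between toggles, all of which must be spent in timed
# code waiting for the right dot -- which is why this hogs all the CPU time.
```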
It's interesting to see how your NES converter's idea of how palette and sprite data should be organized differs from mine. I could probably learn a thing or two from studying it.
It works pretty well, like you said! I could perhaps take it from there, hand-edit the nametable/pattern table where it got a little too mangled, and adopt your solution for handling it in code, if you don't mind me trying.
Edit: I also wonder how your converter handles sprite generation (so that I can hand-edit the output optimally). For instance, the top line used for highlights was drawn at the bottom of a sprite tile so it wouldn't interfere with the scanlines directly below it. This doesn't seem to be the case in the converter. Does it iterate over solutions and select the best one for minimal scanline interference / maximum coverage (for example, testing different offsets for where to begin, like NESST does for its new nametable import), or are the priorities different?
I could probably trick it by placing a same-coloured pixel as a reference point any distance above the first sprite line.
Trying to coax the converter into doing my bidding... (if that sounds demonic, well, it's DOOM) by feeding it slight changes to the trig/untrig decisions being made. A few turns at it and I got pretty satisfied with the output. Just a bit of brushing up to do, and learning how to do 16x8 attributes. Your tools probably just saved me a couple of hours.
FrankenGraphics wrote:
Edit: I also wonder how your converter handles sprite generation (so that I can hand-edit the output optimally). For instance, the top line used for highlights was drawn at the bottom of a sprite tile so it wouldn't interfere with the scanlines directly below it. This doesn't seem to be the case in the converter. Does it iterate over solutions and select the best one for minimal scanline interference / maximum coverage (for example, testing different offsets for where to begin, like NESST does for its new nametable import), or are the priorities different?
I could probably trick it by placing a same-coloured pixel as a reference point any distance above the first sprite line.
I can't remember the exact details about what it does, but yeah at least coverage was one part of it. Probably it just greedily selects the sprite that minimizes error in the output (as long as it doesn't break the 8 sprites per scanline rule), and repeats this 64 times. The sprites are always placed on an 8x16 grid (arbitrary placement is a much more difficult problem).
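Here's a toy version of that greedy loop (my guess at the shape, not the converter's actual code; the per-line cap is simplified to one count per 8x16 grid row):

```python
# Greedy overlay selection: repeatedly take the grid-aligned 8x16 sprite
# candidate with the largest error reduction, skip any placement that would
# overfill its row, and stop at the 64-sprite hardware budget.
def greedy_place(candidates, max_sprites=64, per_line=8):
    """candidates: list of (error_reduction, row, col) for 8x16 grid cells."""
    line_load = {}            # grid row -> sprites already placed on it
    placed = []
    for gain, row, col in sorted(candidates, reverse=True):
        if len(placed) == max_sprites:
            break
        if line_load.get(row, 0) < per_line:
            placed.append((row, col))
            line_load[row] = line_load.get(row, 0) + 1
    return placed
```

With ten candidates crammed onto one row, only the eight best survive; with plenty of rows, placement stops at 64.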
BTW, you can right click the toolbar in the converter to display a few debug windows. (Hover over the images in the Debug Images window to see a tooltip explaining what each one of them shows.)
... Ok, ridiculous question. Would the timing work out such that one could disable rendering, rely on OAMADDR being 0, write 1 to 4 bytes to OAMDATA to change sprite 0, and re-enable rendering? Only being able to multiplex one sprite isn't great, but ...
If it works, the greatness would depend a bit on how many times you'd be able to multiplex it. If it's as good as per-scanline, you might be able to replace more than just one sprite with sprite 0 on a per-8px-row basis, depending on how little their content overlaps.
The bright dotted lines in the picture example posted above (rathergoodspr.png) could be rearranged to interfere as little as possible on a sliver basis, maybe realistically letting a per scanline-changing sprite 0 cover up to 3 or rarely even 4 such sprites in a single row.
Here's another ridiculous question:
Has anyone tried writing to PPUCTRL to switch sprite mode mid-screen? This project *does* have pattern space to spare to accommodate both modes for one screen. The gain would be less rigid placement where needed, and sprite conservation otherwise.
I'm pretty certain that disabling rendering to upload a byte to OAM—if it works at all—would still impose a dead band of one scanline.
On the other hand, I'm about 99% confident that switching sprite height mid-render would work fine. On every scanline, OAM evaluation has to evaluate the difference of OAM[n*4] and the current scanline to establish both whether the sprite is in range and what scanline of the sprite to draw ... and switching between 16 row and 8 row is just a matter of whether the number is allowed to differ by 0-7 or 0-15.
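The range check being described can be sketched like this (a model of the comparison only, with `tall` standing in for the PPUCTRL sprite-size bit):

```python
# Per-scanline in-range test: the sprite is drawn on this scanline if the
# scanline falls within `height` rows of the sprite's Y coordinate.
# Flipping the height mid-screen just widens the allowed difference.
def sprite_in_range(scanline, sprite_y, tall):
    height = 16 if tall else 8
    return 0 <= scanline - sprite_y < height

# A sprite at Y=10, checked on scanline 20:
# 10 rows in is out of range for an 8px sprite, in range for a 16px one.
```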
It looks to me like the logic that determines this in Visual2C02 is specifically node 1052, because that node is grounded when both node 1036 is high and sprite_size_out is low. (Also if any of nodes 5853, 5839, 5800, 5801, 5802, 5803 are high.)
Before I try making something overly complex like that, can someone remedy my confusion as to which of the two "B" cases is true?
Three examples of nine 8x16 sprites intersecting the same scanline.
#1 to #8 have high (H) priority and will always be shown.
#9 has low (L) priority and will always be canceled.
R is for row, C is for spriteCount.
B1 is true. It all happens on a per-scanline basis; it doesn't matter where sprites start or end. Every scanline, the PPU scans all 64 sprites and picks the ones with the highest priority for display.
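In code form, that evaluation might look like this sketch (OAM index order stands in for priority; lower index wins):

```python
# Per-scanline evaluation: walk all 64 OAM Y coordinates in index order and
# keep the first 8 that are in range; a 9th in-range sprite is dropped
# regardless of where it starts or ends.
def sprites_on_scanline(oam_y, scanline, height=16):
    hits = [i for i, y in enumerate(oam_y) if 0 <= scanline - y < height]
    return hits[:8]

# Nine 8x16 sprites all covering scanline 100: index 8 (the 9th) loses.
ys = [96] * 9 + [200] * 55
print(sprites_on_scanline(ys, 100))   # [0, 1, 2, 3, 4, 5, 6, 7]
```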
Thanks, that's super helpful, especially in this case.
Updated the infographic, for future reference
tepples wrote:
Uploading a new display list to OAM must take place while rendering is off, which means a blank bar across the screen at least five scanlines in height. I think Bigfoot does this, but not much else.
Now that you mention Bigfoot: there's an interview with the game's programmer about the tricks he used, including the sprite thingy -
http://www.retrogamingtimes.com/rgtimes10/