I could get up to 5kB of sprite patterns in a loop, with variable sized updates.
I could only get up to 3kB of sprite patterns in an unrolled loop which updates every 16x16 sprite individually.
Misleading, isn't it?
Code or it didn't happen.
psycopathicteen wrote:
I could get up to 5kB of sprite patterns in a loop, with variable sized updates.
I could only get up to 3kB of sprite patterns in an unrolled loop which updates every 16x16 sprite individually.
Apples and oranges.
He did say it was misleading.
I found this while searching for unrolling, and I want to clarify what's going on here in case anyone else comes upon this.
Super NES games commonly use sprites of 16x16 pixels or larger. Because sprite patterns in VRAM are laid out as a sprite sheet 16 tiles wide, the top and bottom halves of a 16x16 pixel sprite are not contiguous and have to be copied separately. But if a bunch of top halves sit together and the matching bottom halves sit together, you can copy them to VRAM more quickly than by copying each top half and each bottom half individually. For example, a 32x32 pixel image made of four 16x16 pixel pieces can be copied as four top halves (one 8-tile copy) and four bottom halves (another 8-tile copy). What psycopathicteen is saying is that the reduced overhead of bunched copies provides more of a speedup in practice than unrolling a loop that performs one 2-tile copy at a time.
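To make the counting concrete, here is a rough Python sketch of the two copy plans. It is only an illustration, not anyone's actual update routine: the slot numbering (eight 16x16 slots per pair of tile rows), the function names, and the use of byte offsets from the start of the sprite sheet are all assumptions of mine, and it assumes the source data is stored in the same order as the destination so that merged runs really can go out as one copy.

```python
TILE_BYTES = 32        # one 8x8 tile at 4 bits per pixel
SHEET_TILES_WIDE = 16  # sprite sheet is 16 tiles wide in VRAM

def individual_copies(slots):
    """Two 2-tile copies (top half, bottom half) per 16x16 slot.

    Returns (byte offset into sprite sheet, byte length) pairs.
    The slot numbering is hypothetical: slot 0 is the top-left
    16x16 cell, slot 1 the cell to its right, 8 slots per row.
    """
    copies = []
    for slot in slots:
        col = (slot % 8) * 2    # leftmost tile column of the slot
        row = (slot // 8) * 2   # top tile row of the slot
        for half in (0, 1):     # 0 = top half, 1 = bottom half
            first_tile = (row + half) * SHEET_TILES_WIDE + col
            copies.append((first_tile * TILE_BYTES, 2 * TILE_BYTES))
    return copies

def bunched_copies(slots):
    """Same data, but contiguous destination ranges are merged."""
    merged = []
    for dest, size in sorted(individual_copies(slots)):
        if merged and merged[-1][0] + merged[-1][1] == dest:
            # This run starts where the previous one ends: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((dest, size))
    return merged

# Four adjacent 16x16 pieces, as in the 32x32 example above.
pieces = [0, 1, 2, 3]
print(len(individual_copies(pieces)), "copies of 2 tiles each")
print(len(bunched_copies(pieces)), "copies:", bunched_copies(pieces))
```

The same number of bytes moves either way; what changes is how many times the per-copy setup cost is paid, which is the overhead the bunched approach saves.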
Do I understand right?
On processors with branch prediction, a loop can be faster than an unrolled loop.