When a loop is faster than an unrolled loop.

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
When a loop is faster than an unrolled loop.
by on (#113102)
I could get up to 5kB of sprite patterns in a loop, with variable sized updates.
I could only get up to 3kB of sprite patterns in an unrolled loop which updates every 16x16 sprite individually.

Misleading, isn't it?
Re: When a loop is faster than an unrolled loop.
by on (#113104)
Code or it didn't happen. :P
Re: When a loop is faster than an unrolled loop.
by on (#113111)
psycopathicteen wrote:
I could get up to 5kB of sprite patterns in a loop, with variable sized updates.
I could only get up to 3kB of sprite patterns in an unrolled loop which updates every 16x16 sprite individually.

Apples and oranges.
Re: When a loop is faster than an unrolled loop.
by on (#113123)
He did say it was misleading.
Re: When a loop is faster than an unrolled loop.
by on (#114531)
I found this while searching for unrolling, and I want to clarify what's going on here in case anyone else comes upon this.

Super NES games commonly use 16x16 pixel or larger sprites. Because sprite VRAM on this platform is laid out as a 16-tile-wide sprite sheet, the top and bottom tile rows of a 16x16 pixel sprite are not adjacent in VRAM, so the two halves have to be copied separately. But if you have a bunch of top halves stored together and a bunch of bottom halves stored together, you can copy those to VRAM more quickly than by copying each top half and each bottom half individually. For example, a 32x32 pixel image made of four 16x16 pixel pieces can be copied as 4 top halves (one 8-tile copy) and 4 bottom halves (another 8-tile copy). What psycopathicteen is saying is that the reduced per-copy overhead of bunched copies provides more of a speedup in practice than unrolling a loop that performs one 2-tile copy at a time.
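To put rough numbers on that, here is a small Python cost model of the two strategies. It is only a sketch: the function names and the `setup_cost` and `cost_per_byte` parameters are made up for illustration and are not measured Super NES timings. The only hardware facts it assumes are that a 4bpp 8x8 tile is 32 bytes and that each half of a 16x16 sprite is 2 tiles wide.

```python
TILE_BYTES = 32       # one 8x8 tile at 4 bits per pixel
TILES_PER_HALF = 2    # top (or bottom) row of a 16x16 sprite is 2 tiles wide


def individual_copies(num_sprites):
    """Tile counts when each 16x16 sprite gets its own top and bottom copy."""
    return [TILES_PER_HALF] * (2 * num_sprites)


def bunched_copies(num_sprites):
    """Tile counts when all top halves go in one copy and all bottoms in another."""
    return [TILES_PER_HALF * num_sprites] * 2


def total_cost(copies, setup_cost, cost_per_byte):
    """Hypothetical cost: a fixed setup charge per copy plus a per-byte charge."""
    return sum(setup_cost + tiles * TILE_BYTES * cost_per_byte for tiles in copies)


if __name__ == "__main__":
    # The 32x32 example from the post: four 16x16 pieces.
    for name, copies in (("individual", individual_copies(4)),
                         ("bunched", bunched_copies(4))):
        # setup_cost=50 and cost_per_byte=1 are arbitrary illustrative units.
        print(name, len(copies), "copies,", sum(copies), "tiles,",
              "cost", total_cost(copies, setup_cost=50, cost_per_byte=1))
```

Both strategies move the same 16 tiles; only the number of copies changes (8 versus 2), so in this model the bunched version saves exactly six copies' worth of setup overhead, whatever that overhead turns out to be on real hardware.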

Do I understand right?
Re: When a loop is faster than an unrolled loop.
by on (#114544)
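On processors with branch prediction, a loop can be faster than an unrolled loop: a correctly predicted loop branch costs very little, while the larger unrolled code puts more pressure on the instruction cache.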