Metasprite compression - my take

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
View original topic
Metasprite compression - my take
by on (#235142)
I'm trying my hand at compressing metasprite data and I would be very interested in your thoughts on it.

I tried finding other threads about this issue but could only find Bregalad's old thread on his/her custom format https://forums.nesdev.com/viewtopic.php?f=2&t=9724 which seemed a bit too complex for me to dare try implementing.

My take is to cut down the data from 4bytes/tile entry to 3 by taking away a big range from the x and y positions and scattering the bits for the tile character in with the other values like this:

X = x position
Y = y position
T = tile character
A = attributes


Uncompressed data:
Code:
1: XXXX xxxx
2: YYYY yyyy
3: TTTT tttt
4: AAAA aaaa


Compressed data step 1:
Code:
1: **XX xxxx
2: **YY yyyy
3: *TTT tttt
4: AAA* **aa
So what I do is I sacrifice bits I think I can work my way around not having. This means I can only store values 0-63 ($00-$7F) for x and y position. Same goes for tile character but I make sure to put an offset in the header for the metasprite to be able to display tiles beyond that range. As for the attributes, I only take away the unimplemented bits. When that's done I rearrange the data like this:

Code:
1: TTXX xxxx
2: TtYY yyyy
3: AAAt ttaa


And for unpacking this data in my drawing routine, here's a simplified version of that:
Code:
; ZP stuff

OAM_sprite_ptr_lo   .db 1
OAM_sprite_ptr_hi   .db 1

temp_var_1         .db 1
temp_var_2         .db 1
temp_var_3         .db 1
temp_var_4         .db 1

tile_offset         .db 1


; drawing routine

DrawMetaSprite:

   ldy #0
   lda (OAM_sprite_ptr_lo), y
   sta OAM_currSpriteByteSize      ; this is a value I use unrelated to the compression.

   iny
   lda (OAM_sprite_ptr_lo), y
   sta tile_offset
   dey

   ldx #0

   @drawing_loop:

         
      @temp_x    = temp_var_1
      @temp_y    = temp_var_2
      @temp_tile   = temp_var_3
      @temp_attr   = temp_var_4

      lda (OAM_sprite_ptr_lo), y
      sta @temp_x      ; we'll save this for later to do offsets to the x-position.
      and #%11000000   ; shave off all but the tile char bits (bit 6-5).
      sta @temp_tile   ; store them for later.

      iny
      lda (OAM_sprite_ptr_lo), y
      sta @temp_y      ; save y-position for later just like the x-position.
      and #%11000000   ; shave off all but the tile char bits (bit 4-3).
      lsr
      lsr            ; shift the bits.
      ora @temp_tile   ; merge with bits 6-5.
      sta @temp_tile
      ; TTTt 0000

      iny
      lda (OAM_sprite_ptr_lo), y
      sta @temp_attr   ; save for later.
      and #%00011100   ; shave off all but the tile char bits (bit 2-0).
      lsr
      ora @temp_tile   ; shift and merge with the other bits
      ; TTTt ttt0
      lsr            ; shift the result once more to get all bits in thei proper place.
      ; 0TTT tttt
      adc tile_offset   ; add the offset stored in the header for the metasprite data, incase we're using tiles within $80-$FF.
      sta $0201, x   ; store the final uncompressed tile char byte.


      ;;;; Do position offsets, attribute manipulations and so on...
      
      
   rts


Here's an example of compressed metasprite data:
Code:
test_metasprite_offset:
   .db -16, -16   ; y, x

test_metasprite_0:
   .db 16   ; header: amount of unpacked bytes in this sprite
   .db 128   ;    ; tile char offset value
   
   .db $88, $08, $1f         ;;;; .db $08, $08, $47, $03
   .db $90, $48, $03         ;;;; .db $10, $08, $48, $03
   .db $88, $90, $1f         ;;;; .db $08, $10, $57, $03
   .db $90, $d0, $03         ;;;; .db $10, $10, $58, $03


So by doing this I use 75% of the space I used before to store the metasprite data (apart from the extra byte for the offset in the header), but of course I'm adding a lot of new instructions to do the unpacking for each tile.
I've implemented all this in my engine now and have it sort of working apart from some issues I have on existing metasprites with values going outside of the range that my compression allows.

I would be very interested in hearing if my take on this makes sense and if there's anything I can do to speed up the decompression of the tile character. My idea with compressing the data this way is that it will not be too hard to understand the decompression code as well as being compatible with metasprites with a maximum width of 63x63 (actually (63+8) * (63+8), given the width and height of an individual tile).

Sorry if this post is hard to follow. I omitted most of my drawing routine to put the focus on the unpacking. Obviously the real code does a lot more in terms of flipping sprites and offsetting positions before finally storing them..

Cheers!
Re: Metasprite compression - my take
by on (#235147)
If you know your cels will often use horizontally adjacent sprites, spaced 8 pixels apart horizontally, you can do the equivalent of run-length encoding on the X, Y, and color attributes, as I did in The Curse of Possum Hollow (NES) and Libbet and the Magic Floor (GB).
Code:
1: XXXX xxxx  ; x position relative to hotspot
2: YYYY yyyy  ; y position relative to hotspot
3: 000L LLcc  ; length and color attributes
; followed by L+1 bytes of the form
4: VHTT tttt  ; individual sprite flip bits and tile number

Here, X and Y occupy a full byte, with a particular X value used as the sentinel for the end of a cel. L is the number of sprites in this horizontal strip, where 0 means one and 7 means eight. Sprite flip bits are set for individual tiles instead of a whole strip because I've found that it improves tile reuse, particularly in scenarios like the bottom half of Mario or the top half of a Goomba in the first Super Mario Bros. The tile number is 6 bits because 64 tiles or 1 KiB is a fairly good range. It's the size of MMC3 CHR windows 2-5 ($1000, $1400, $1800, and $1C00) or the starting offset of a particular sprite sheet loaded into CHR RAM.

Haunted: Halloween '85 uses a simpler format that specifies the tile number of only the first in a strip. Tile number always increases by 1 for later tiles in a strip. It gets away with this because its VRAM update engine is oriented toward reusing 16x16 pixel pieces, with four pieces allocated to each actor, instead of individual 8x8 or 8x16 pixel tiles.
Code:
1: XXXX xxxx  ; x position relative to hotspot
2: YYYY yyyy  ; y position relative to hotspot
3: VH0L LLcc  ; flip, length, and color attributes
4: TTTT tttt  ; tile number
Re: Metasprite compression - my take
by on (#235183)
Is it really necessary to store all four values of a sprite, even if they are compressed?

I do the following:

First of all, I only store the width and height of each meta sprite instead of the offset of every single tile. (So, my tiles inside the meta sprite are always positioned rectangular.)

Next thing: If you need a separate palette for each tile, you can store four palette values in one byte. Otherwise, I would simply set the palette value once, so that all tiles use the same value.

Flipping isn't stored in my case at all.
If you have characters that can be shown from the front, then you could do it like this: The left half of your meta sprite always gets drawn normally. The right half gets drawn horizontally mirrored.
This allows for symmetrical characters when you need them, but you can still draw anything else as well without any overhead. You just need to store some graphics in a mirrored way in the graphics data.

After all, do you really need a single tile that can be drawn with independent, arbitrary flipping?

Vertical flipping isn't considered at all here. When do we really need it for separate tiles, i.e. not for flipping the whole character at once? (If you have a pure top down game, you could use the same method as for horizontal flipping.)

And background priority is usually something based on the game situation anyway and not something that belongs to the sprite data itself, much less to individual tiles.

So, basically, this means we have:
width
height
n * palettes / 4 (or simply: one palette for everything)
n * tile

and that's it.
Feel free to store width and height into the same byte (since a width and height of 16 should be enough for most cases).
Re: Metasprite compression - my take
by on (#235185)
Very interesting to hear your takes on this! In my case I reuse a lot of tiles and also layer them quite alot to simulate more than 3 colors per tile, so the ability to have pixel perfect positioning, flipping and palette choice is something I aim to preserve. Maybe I could look into having tiles sorted by palette and taking away one more bit from the tile character byte, though. Maybe taking away vertical flipping even. Perhaps it would even be a good idea to have multiple compression techniques depending on the data? The compression type could be stored in the header data so that the engine can unpack it appropriately.
Re: Metasprite compression - my take
by on (#235189)
pwnskar wrote:
I would be very interested in hearing if my take on this makes sense and if there's anything I can do to speed up the decompression of the tile character.


My only advice is to consider whether the compression is even necessary. Maybe you've already done that and my post here is unnecessary, but depending on the goals of the game, saving ROM space at the expense of sprite rendering speed may not be worth it.

For example, in my game, I do my best to optimize my metasprite storage format for speed. It's stored a little bit unintuitively in ROM because it's marginally faster for rendering, and rendering speed is where my bottleneck is (I'm targeting a 512k rom, so I have plenty of ROM space). Your mileage may vary completely depending on the goals of your game. (a tiny NROM game might need every space optimization available!)
Re: Metasprite compression - my take
by on (#235190)
How many bytes are you saving with this anyway?
Re: Metasprite compression - my take
by on (#235198)
gauauu wrote:
My only advice is to consider whether the compression is even necessary. Maybe you've already done that and my post here is unnecessary, but depending on the goals of the game, saving ROM space at the expense of sprite rendering speed may not be worth it.

You might be right. I'm doing this in part to stay ahead of the problem of running out of storage or having to implement complex bank swaps to get around the issue that way. Also it just seems like a fun thing to try out.

pubby wrote:
How many bytes are you saving with this anyway?

I'm reducing every tile entry from four bytes to three so that should save me about 25% of metasprite data.
Re: Metasprite compression - my take
by on (#235213)
pwnskar wrote:
In my case I reuse a lot of tiles and also layer them quite alot to simulate more than 3 colors per tile, so the ability to have pixel perfect positioning, flipping and palette choice is something I aim to preserve.

Depending on how much layering you really need, it could be accomplished by simply storing two meta sprites in the way I described. Then you might do something like this:

If the first value in the array is not 0, then the value is the width and you process the meta sprite normally.
But if the first value in the array is 0, then this is an indicator that your meta sprite is actually a layered meta sprite. So, the next value is the number of the meta sprites that are layered. And then come the normal values for each meta sprite:
Code:
0
NumberOfMetaSprites
for i = 1 to NumberOfMetaSprites
{
    Width/Height
    Palette
    for x = 1 to Width
        for y = 1 to Height
            Tile
}

If most of your meta sprites are layered, the whole checking for 0 could be omitted and you could simply store the number of meta sprites at the start.
But if the layering is just an exception and 99% of the time you only have single-palette sprites, then the 0 is there, so that your 99% of normal sprites don't waste another byte for an indicator that would mostly have the value 1. Therefore, you avoid overhead for 99% of the meta sprites.


If you really definitely do need exact positioning for every tile, and you can ensure that no new tile will be more than seven pixels away from the previous one, then you might be able to store x and y in the same byte: Each half byte from $0 to $F is then supposed to be a signed value from -8 to +7. And your current tile is then calculated based on the position of the previous tile plus that new value.
Then you can put seven bits of the tile value plus the horizontal bit into the next byte, so you have two bytes per tile instead of three.

Whenever you need an attribute change other than horizontal flipping, you might do the thing I described above: If your value is $FF, then the next value is the new attributes value. Otherwise, you simply process normally.
This way, you only have overhead in the situations where some attribute actually changes.

O.k., if every second tile has an attribute change, this won't pay off. But if you need to change vertical flipping and palettes only once per meta sprite (after the initial value in the beginning), this might be a good compression: You have two additional bytes for every change (1: indicator that a change occurs. 2: attribute value), but zero additional bytes for every tile that uses the same attributes as the previous one.

Depending on how many times you need the horizontal flipping, you have to consider whether you put it into the normal tile value and use a seven bit tile ID, so horizontal flipping has no overhead. Or whether you use a regular eight bit tile ID and include the horizontal flipping into the attribute change mechanism.
Re: Metasprite compression - my take
by on (#235325)
Thanks for the detailed suggestions DRW!

I've had another look at my player avatar's animations and I'm using fine positioning of tiles quite a bit to get more frames and color overlays out of less tiles, so just to keep thing maintainable (for my brain) I think I will just opt to keep that feature for now. One thing you bring up that I think I can do some RLE on is the attribute bytes. I found that most of my animation frames could have tiles sorted by palette choice without breaking much of the layering. Tile flipping is also not used very much, so I could probably store the tiles in groups of palette and have most of them store only x and y positions. I could even bump the range back up to 255 again that way and not have to worry about that.

I won't have time and energy to do any of that tonight though, but I'll give post an update when I've got it working.

Thanks for all the ideas and comments, you guys are the best! :beer:
Re: Metasprite compression - my take
by on (#235326)
Here's another hint: ROM space is relatively cheap and MMC3 even allows for on-the-fly bank switching in granular slices of 64 tiles.

The eight sprites per scanline limit on the other hand is a hard limit.

So, if I had the choice between sprite layering to save individual tiles or no layering and wasting more tiles, I would choose the latter because of less potential flickering.

When it's about using more than one palette for your sprite, "Mega Man"-style, this rule of course isn't applicable.
But if you layer sprites that have the same palette, just to save PPU tiles: I say take the tile waste penalty.
Making sure that a sprite with a width of 16 pixels indeed only ever has two sprite tiles next to each other and not three is a bit more important than the fact that you might have to double the space of your graphics chip to store all graphics. Flickering is noticeable by the player. The internal chip size isn't.
Re: Metasprite compression - my take
by on (#235327)
DRW wrote:
Here's another hint: ROM space is relatively cheap and MMC3 even allows for on-the-fly bank switching in granular slices of 64 tiles.


I just looked on Infinite NES lives, and the difference in cost between a board with 128 Kb CHR and 256 Kb CHR is $0.40 USD.

Just to put a number on what you're actually looking at.

I'd add that having more than 128 Kb of graphics for a homebrew is a lot. You may not get close to that threshold anyway.

Apologies for not reading the entire thread, but if the debate is between saving CHR and using more scanlines, I would wait and see if you'll actually have any benefit from saving those tiles. If your CHR size ends up being 129, then maybe look for something to cut. If it's not though, the "benefit" of saving CHR space isn't even going to lead to a cost savings of 40 cents. So there would be no actual benefit at the end of the day.
Re: Metasprite compression - my take
by on (#235328)
Say you're using MMC3 with the C bit off, or equivalently FME-7. This means sprite address space ($1000-$1FFF) is divided into four 1K windows, each of which contains 64 8x8-pixel tiles or 32 8x16-pixel tiles. You might allocate one window for the player, one for power-ups and other common objects in levels, and two for enemies. If all cels of an enemy type's sprite sheet fit in a single 1K bank, multiple enemies of that type can share one of the two sprite windows allotted to enemies. Otherwise, such as for bosses, they would have to go in separate windows.

If the cels of an enemy type sum to over 32 unique tiles, even after having discarded flipped tiles, and you have more than two such enemies on screen at once, you will have to do one of four things:

A. Discard the enemy, which is inadvisable for important enemies such as bosses
B. Flicker, swapping some enemies' cels into the mapper's CHR windows each frame and hiding the others
C. Use an IRQ to change banks at the bottom of one sprite, provided they don't share a scanline
D. Delay an enemy from spawning, allowing it to pop into existence at its spawn point once a suitable window opens
E. Give each enemy its own block of CHR RAM and stream cels one by one into that block, attempting to predict which cel to double-buffer based on which cel is currently showing

I'm aware of games that use each of the above strategies. Super Mario Bros. uses A. A Chinese port of the beat-em-up Final Fight 3 uses B. Batman uses C for its status bar. If a particular enemy type's sprite sheet is just over 32 8x16 pixel tiles, you may have to prioritize not wasting tiles over staying under 8 on a line so that you don't get even worse flicker from employing option B.

128 KiB is easy to hit with background graphics if you're doing an art style that aggressively hides the tile grid and rarely reuses tiles from one map to the next.
Re: Metasprite compression - my take
by on (#235590)
This is a great idea, the advantage is that you have a fixed 75% rate of compression without the loss of any feature, and that the decompression might be (a little) simpler as opposed to my approach. The disadvantage is that you cannot insert any uncompressed sprites in compressed data (only an issue initially), and that you will never be able to compress more than 25%.

Discarding one bit of tile number is probably a bad idea (tm), discarding more bits in the status bit, such as the "behind background" bit, would make more sense. Whenever a sprite is behind background is something structural in the game, and not part of metasprite data itself, IMO.

In my cases I could easily reduce sprite size by half or sometimes even less by using the appropriate prediction.

DRW's approach, which is also what many games probably used, compress even more but disallows to have multiple colour per sprites or non-rectangualrly aligned sprites. This is OK when you have a large game with lots of spce in the CHR-ROM, but not in the case where you try to be tight and use CHR-ROM as efficiently as possible.
Re: Metasprite compression - my take
by on (#235651)
Bregalad wrote:
This is a great idea, the advantage is that you have a fixed 75% rate of compression without the loss of any feature, and that the decompression might be (a little) simpler as opposed to my approach. The disadvantage is that you cannot insert any uncompressed sprites in compressed data (only an issue initially), and that you will never be able to compress more than 25%.

Discarding one bit of tile number is probably a bad idea (tm), discarding more bits in the status bit, such as the "behind background" bit, would make more sense. Whenever a sprite is behind background is something structural in the game, and not part of metasprite data itself, IMO.

In my cases I could easily reduce sprite size by half or sometimes even less by using the appropriate prediction.

DRW's approach, which is also what many games probably used, compress even more but disallows to have multiple colour per sprites or non-rectangualrly aligned sprites. This is OK when you have a large game with lots of spce in the CHR-ROM, but not in the case where you try to be tight and use CHR-ROM as efficiently as possible.


Thanks!

I've had some more thought put into how I can RLE this a bit more, trying to come up with a way to have tile entries stored in just two bytes.

My thought is to have set of tiles stored like this:

Code:
Metasprite_00:
   @header:      .db XX              ; no of tiles for this set.
                 .db %AAAo ooaa      ; attributes for all tiles in set and an offset for tile chars.
   
   @tile_00:     .db %XXXX xxxx      ; x-position relative to hotspot.
                 .db %YYYY yyyy      ; y-position relative to hotspot.
                 .db %TTTT tttt      ; tile character.
               
   @tile_01:     .db %TTtX xxxx      ; bit 5-3 of tile char and bit 5-0 of x-pos.
                 .db %tttY yyyy      ; bit 2-0 of tile char and bit 5-0 of y-pos.
                                     ; tile char limited to 0-63 but offset by adding ( 64 * o )
                                     ; x- and y-pos limited to 0-31 but position is relative to last tile entry.
                                     ; 16 would have to added before compression and then subtracted again to allow
                                     ; having negaitve values. This would limit the range to -15 to 15.

If the compression comes across a new attribute value or if any of the other values are outside the range of compression, the group is ended and a new one would be made. I think this compression could work well as long as the tiles are sort of chained together, meaning the positions of a tile is not further than -15 or 15 pixels away from the previous one. The total size of one metasprite should no longer have to be limited to any total width or range of tiles.

But I'm also tempted to store tile entries like @tile_00 for all the following tiles in each group. That way I'd stay with 75% compression rate (or, slightly above, each time a new header is written) but have full range for positions and tile characters. The decompression would also be taken out of factor, as the values would be stored exactly as they are.

However, comparing these methods on a real example of 16 tile entries that actually have quite a few attribute changes, I find a difference of 44 vs 54 bytes. I'll attach an image of the example..
My initial technique that I've already implemented ends up at 48 bytes, which isn't far off from my theoritcal new one, but I don't think it should be more efficient in decompression and it would limit the total size of a sprite to 71*71 as all position values are independent and need to fit within a range of 0-63.

So I think I'll try to implement the example given in the code block above. If I'm right it should be about as expensive to decompress as my current technique but with a less limited range for metasprite dimensions and a smaller ROM footprint. For now I'll have to rearrange all tiles to an optimal order in NES Screen Tool but I suppose that could be automized in the compression later on.

EDIT: I'm afraid I forgot to factor in the relative distances between tiles when I did my calculations. The method I described in the above code block might not prove to be very effective on the example image after all. I think it would actually end up being closer to 59 with the tiles ordered with neighboring positions in mind. So in this case my current technique would be better.... hmm..

Well, most of my sprites don't use that many flips and stuff so maybe it's worth trying it anyway to see if the overall compression gets better.
Re: Metasprite compression - my take
by on (#235841)
OK, so after having spent a lot of time doing a python script to sort all tiles into groups and finally doing all the compression it turns out the results were quite disappointing....

The average compression on all my player avatar sprites ended up at only 77%. The only benefit I will have gained compared to my initial method would be that the total x and y size of a metasprite is no longer limited to a range of 71*71.
I will probably not bother trying to implement the unpacking in my actual game code though, as I don't feel that feature is enough to make up for the poor compression rate.

I think I might just settle for the initial technique for all sprites that fit within its' limitations and just have bigger ones uncompressed or split up in parts.