Bagel NES CHR compression.

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Bagel NES CHR compression.
by on (#177293)
Bagel is a new NES CHR compression codec. It decompresses to 64-byte blocks, saves roughly 43.3% in data size, and is only 3x as slow as a raw memory copy. The block decoder compiles to 137 bytes of ROM. Attached is a zip file containing the encoder, decoder, and a demonstration of the various setups the main block decoder can be used in. The demo also includes a game. :)


Prerelease discussion follows:
---

Last time on PBJ, tepples wrote:
Can your decoder break the RLEINC2 runs into fixed-size output packets? Say I wanted to decode 128 bytes, send that to VRAM, decode the next 128 bytes, send that to VRAM, etc. That was my original motivation behind PB8: to be able to decode while rendering is on.

So lately I've had a few compression ideas that I might try out (and possibly silently fail at). If I were to make another compression scheme that had a fixed decode window of 128 bytes and an optional dual-segment setup like pb53, would this fulfill 99% of the use cases of how pb53 is currently used?
Re: PBJ compression format
by on (#177294)
JRoatch wrote:
So lately I've had a few compression ideas that I might try out (and possibly silently fail at). If I were to make another compression scheme that had a fixed decode window of 128 bytes and an optional dual-segment setup like pb53, would this fulfill 99% of the use cases of how pb53 is currently used?

Probably. But the Stock and Shop sections in RHDE used it for a bunch of 64-byte furniture images. And it might also end up being used for a 960-byte nametable, or other multiples of 64 bytes (64 bytes being two nametable rows), followed by rows of attribute data.
Re: Another attempt at CHR compression
by on (#177915)
Some updates on my experiments. I'm mostly in a statistics-gathering stage, where my test data is "all-nrom.chr".

Initial tests found that matching from 64 bytes back performs marginally better than 128 bytes back. I guess that's because it's the size of a metatile (4 tiles). From that I decided that the block and window size will be 64, and as you said, 64 bytes is also the size of an attribute table.

Organizing the data so that all the first planes get decoded before any of the second planes do could help reduce the number of bytes taken up by "duplicate plane" commands, but I'm still trying to find the best way to code that.

Out of the various match distances I tried, 2 bytes back did not perform as well as I hoped (I wanted it for the dithering cases). However, matching to an implied plane of all 0x00 far exceeded my expectations. I guess the reason there are so many 0x00 bytes scattered everywhere is monochrome tiles and sprite transparency.

I made up this simple scheme as a demonstration of that last finding:
Code:
0xxxxxxx:
    ASL the control byte to get the bit pattern for the next 8 bytes.
    This will force the last byte to be zero, but statistically that's OK for fonts and such.
    0 = 0x00, 1 = read new byte
1xxxxxxx:
    For each bit in the control byte
    0 = use previous byte, 1 = read new byte

Example input:
00 00 00 00 00 30 30 00  00 00 00 00 00 00 00 00
ff ff ff ff ff ff ff ff  00 00 00 00 00 00 00 ff
38 6c 6c 6c 6c 6c 38 00  38 00 00 00 00 00 38 00

Example output (with added indenting):
03 30 30
00
80 ff
81 00 ff
c3 38 6c 38 00
41 38 38
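
A minimal Python model of a decoder for this scheme; one assumption here is that the mode-1 control byte contributes all 8 of its bits, so the high mode bit doubles as the first "read new byte" flag, which is what makes the example above round-trip:
Code:
def decode_simple(data, length):
    """Decode `length` output bytes from the bytes object `data`."""
    out = bytearray()
    it = iter(data)
    prev = 0x00                        # last literal byte read
    while len(out) < length:
        ctrl = next(it)
        zero_mode = ctrl < 0x80
        if zero_mode:
            ctrl = (ctrl << 1) & 0xFF  # ASL: forces the 8th bit to 0
        for i in range(8):             # walk the control bits MSB first
            if ctrl & (0x80 >> i):
                prev = next(it)        # 1 = read a new byte
                out.append(prev)
            else:                      # 0 = 0x00 (mode 0) or repeat (mode 1)
                out.append(0x00 if zero_mode else prev)
    return bytes(out)

packed = bytes.fromhex("033030 00 80ff 8100ff c3386c3800 413838")
assert decode_simple(packed, 48) == bytes.fromhex(
    "0000000000303000 0000000000000000"
    "ffffffffffffffff 00000000000000ff"
    "386c6c6c6c6c3800 3800000000003800")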

Obviously this simple scheme is generally worse than pb53, *but* it still happens to compress SMB1 and DABG better (to 5968 and 4746 bytes, respectively). So I think this is a good start.

Note to admins: Is it ok to get the last three posts of this thread split into a new thread titled "Another attempt at CHR compression"?
Re: Another attempt at CHR compression
by on (#177945)
JRoatch wrote:
However, matching to an implied plane of all 0x00 far exceeded my expectations. I guess the reason there are so many 0x00 bytes scattered everywhere is monochrome tiles and sprite transparency.

I made up this simple scheme as a demonstration of that last finding:
Code:
0xxxxxxx:
    ASL the control byte to get the bit pattern for the next 8 bytes.
    This will force the last byte to be zero, but statistically that's OK for fonts and such.
    0 = 0x00, 1 = read new byte
1xxxxxxx:
    For each bit in the control byte
    0 = use previous byte, 1 = read new byte

Example input:
00 00 00 00 00 30 30 00  00 00 00 00 00 00 00 00
ff ff ff ff ff ff ff ff  00 00 00 00 00 00 00 ff
38 6c 6c 6c 6c 6c 38 00  38 00 00 00 00 00 38 00

Example output (with added indenting):
03 30 30
00
80 ff
81 00 ff
c3 38 6c 38 00
41 38 38

Obviously this simple scheme is generally worse than pb53, *but* it still happens to compress SMB1 and DABG better (to 5968 and 4746 bytes, respectively). So I think this is a good start.

It's similar to the common-byte encoding used in The Legend of Zelda: Oracle games for Game Boy Color, where each 16-byte packet starts with a common byte along with a 16-bit bitfield of which bytes in that packet match the common byte (and thus don't need to be coded). As for your particular choice of $00 as the fixed common byte, I tried this before in another demo that I didn't manage to bring to a full release before I got the contract to make Haunted: Halloween '85 back in March of last year.

How well would the RLE mode do if the definition of a "run" is changed from "use the same byte" to "use the byte that most commonly follows the previous byte"? That'd need a 256-byte table of predictions.
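
A rough Python sketch of that idea (hypothetical names; this is not part of any released codec). A 0 bit emits the predicted successor of the previous byte instead of repeating it:
Code:
from collections import Counter

def build_prediction_table(corpus):
    """256-byte table: table[b] = the byte that most often follows b."""
    counts = [Counter() for _ in range(256)]
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return bytes(c.most_common(1)[0][0] if c else 0 for c in counts)

def decode_predicted_runs(data, table, length):
    out = bytearray()
    it = iter(data)
    prev = 0x00
    while len(out) < length:
        ctrl = next(it)
        for i in range(8):
            if ctrl & (0x80 >> i):
                prev = next(it)       # 1 = literal byte from the stream
            else:
                prev = table[prev]    # 0 = predicted byte
            out.append(prev)
    return bytes(out)

A plain RLE is then the special case where table[b] == b for every b.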

Quote:
Note to admins: Is it ok to get the last three posts of this thread split into a new thread titled "Another attempt at CHR compression"?

I'm still trying to figure out a workable policy for when to grant a split request going forward. Here, where the OP, the poster of the split point, and the requester are the same user, and few other users have posted in the topic (and thus few are likely to be subscribed to replies through email), it looked like a strong case.
Re: PBJ revisited: Another attempt at CHR compression
by on (#178109)
tepples wrote:
How well would the RLE mode do if the definition of a "run" is changed from "use the same byte" to "use the byte that most commonly follows the previous byte"? That'd need a 256-byte table of predictions.

I'm not sure if I understood what you meant by this. So I plotted out some bitmaps that mapped block position, byte value at match distance, and byte value read.

I couldn't make sense of the noise in those bitmaps, even though it seemed to exhibit some patterns.
I did see that the first byte of each 8-byte plane likely had a zero byte preceding it, and that the first plane of each tile tended to be zero while the second plane tended to match the first plane.

So I tried this scheme out, and while it has obvious shortcomings (namely all-0xff planes, inverted planes, CHR bank duplication, and expansion of incompressible data), it slightly beats pb53 for "all-nrom.chr":
Code:
For each 64 byte block:
    block header: Each bit in the control byte specifies the type for the following 8 planes.
    8 planes:
        type 0 control byte: 0 = use previous byte (first previous byte is 0x00), 1 = read new byte
        type 1 control byte:
            if odd plane; 0 = 0x00, 1 = read new byte
            else if even plane; 0 = byte from previous plane, 1 = read new byte

Example input:
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
ff ff ff ff ff ff ff ff  ff ff ff ff ff 30 70 ff
00 81 00 ff ff 00 81 00  00 00 00 00 00 00 00 00
38 6c 6c 6c 6c 6c 38 00  38 6c 6c 6c 6c 6c 38 00

Example output (with indenting):
19
  00    00
  80 ff    06 30 70
  5a 81 ff ff 81    00
  c3 38 6c 38 00    00
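
A Python sketch of a block decoder for this scheme (hypothetical names). Plane numbering is taken as 1 to 8, so the "odd" planes are the first plane of each tile and the "even" planes the second; that reading makes the example round-trip:
Code:
def decode_block(it):
    """Decode one 64-byte block from the byte iterator `it`."""
    out = bytearray()
    header = next(it)
    prev = 0x00                        # tracks the last output byte
    for plane in range(8):             # 0-based here; "plane 1" is plane == 0
        plane_type = (header >> (7 - plane)) & 1
        ctrl = next(it)
        start = len(out)
        for i in range(8):
            if ctrl & (0x80 >> i):
                prev = next(it)        # 1 = read new byte
            elif plane_type == 0:
                pass                   # 0 = repeat previous byte
            elif plane % 2 == 0:       # odd plane (1, 3, 5, 7)
                prev = 0x00            # 0 = 0x00
            else:                      # even plane (2, 4, 6, 8)
                prev = out[start - 8 + i]  # 0 = byte from previous plane
            out.append(prev)
    return bytes(out)

packed = bytes.fromhex("19 0000 80ff 063070 5a81ffff81 00 c3386c3800 00")
assert decode_block(iter(packed)) == bytes.fromhex(
    "0000000000000000 0000000000000000"
    "ffffffffffffffff ffffffffff3070ff"
    "008100ffff008100 0000000000000000"
    "386c6c6c6c6c3800 386c6c6c6c6c3800")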
Re: PBJ revisited: Another attempt at CHR compression
by on (#178196)
OK, I have a thing that's a nice compromise between high complexity and small file size. It's just like the previous post, but with 2 special codes: one to signal an inverted or all-0xff plane, and one for duplication of the previous 64-byte block.
Code:
First, 1 byte for the block count, ranging from 1 to 128 (or more if you want nametables as well).
    "0 blocks" is a special mode where 64 blocks are interleaved with 64 more blocks.
For each 64 byte block:
    block header: Each bit in the control byte specifies the type for the following 8 planes.
        if block header is 0x00 and first plane control byte is 0x00: duplicate previous 64 byte block.
    8 planes:
        type 0 control byte:
            if odd plane; 0 = 0x00, 1 = read new byte
            else if even plane; 0 = byte from previous plane, 1 = read new byte
            if whole control byte is 0xff:
                on odd planes, 0xff 8 times
                on even planes, inverted previous plane
        type 1 control byte: 0 = use previous byte (first previous byte is 0x00), 1 = read new byte
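
A Python sketch of this revision (hypothetical names; note the plane types here are swapped relative to the previous post, and prev is assumed to track the last written byte):
Code:
def decode_block_v2(it, prev_block):
    """One 64-byte block; `prev_block` is the previously decoded block."""
    header = next(it)
    out = bytearray()
    prev = 0x00
    for plane in range(8):
        ctrl = next(it)
        if plane == 0 and header == 0x00 and ctrl == 0x00:
            return prev_block          # duplicate-block escape (2 bytes total)
        plane_type = (header >> (7 - plane)) & 1
        start = len(out)
        if plane_type == 0 and ctrl == 0xFF:
            if plane % 2 == 0:         # odd plane (1-based): solid 0xff
                out += b'\xff' * 8
            else:                      # even plane: inverted previous plane
                out += bytes(b ^ 0xFF for b in out[start - 8:start])
            prev = out[-1]             # assumed: prev follows the output
            continue
        for i in range(8):
            if ctrl & (0x80 >> i):
                prev = next(it)        # 1 = read new byte
            elif plane_type == 1:
                pass                   # 0 = repeat previous byte
            elif plane % 2 == 0:
                prev = 0x00            # odd plane: 0 = 0x00
            else:
                prev = out[start - 8 + i]  # even plane: byte from prev plane
            out.append(prev)
    return bytes(out)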

Out of the 193 CHR files I tested, this scheme lost to pb53 on only 6 of them (including "Nuts & Milk" and "Othello"). On average the scheme saves 41.7% in file size. For comparison, tokumaru's compression saves 48.1% and pb53 saves 38.3%.

The next step is for me to implement the NES decoder and the Python encoder/decoder. For the NES I plan to use only 0x0140~0x017f for the decode buffer. The duplicate block command will be implemented by simply not touching the decode buffer.

Questions:
What should I name this thing?
What do I need to consider if I'm implementing this as a drop in replacement for pb53?
How should I handle input data that's not a multiple of 64 bytes?
Re: PBJ revisited: Another attempt at CHR compression
by on (#178213)
JRoatch wrote:
For the NES I plan to use only 0x0140~0x017f for the decode buffer. The duplicate block command will be implemented by simply not touching the decode buffer.
[...]
What do I need to consider if I'm implementing this as a drop in replacement for pb53?

An issue related to real-time CHR loading: If your decode buffer is 64 bytes, and you can produce only one output per frame (as implied by the fixed decode buffer address), you won't be able to fill the entire vblank. Something that expects 128 bytes per frame, such as loading screenshots in the Action 53 menu, would become slower.
Re: PBJ revisited: Another attempt at CHR compression
by on (#178218)
tepples wrote:
JRoatch wrote:
For the NES I plan to use only 0x0140~0x017f for the decode buffer. The duplicate block command will be implemented by simply not touching the decode buffer.
[...]
What do I need to consider if I'm implementing this as a drop in replacement for pb53?

An issue related to real-time CHR loading: If your decode buffer is 64 bytes, and you can produce only one output per frame (as implied by the fixed decode buffer address), you won't be able to fill the entire vblank. Something that expects 128 bytes per frame, such as loading screenshots in the Action 53 menu, would become slower.

This is precisely why I'm choosing 0x0140~0x017f and not 0x0100~0x013f. To fill up 128 bytes at a time, decode a 64-byte block, copy that to the lower 64 bytes, then decode another block. While implementing this, maybe I'll find that requiring a buffer pointer to be set is better. Alternatively, if it's technically feasible, there could be parallel text rendering. Besides that, for blank-screen bulk uploads I don't think there should be a need to use the whole 128-byte range.
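
A rough Python model of that staging idea (hypothetical names, reusing the block-decoder sketch from the earlier post):
Code:
compressed = open("data.bagel", "rb").read()   # hypothetical input file
stream = iter(compressed)
buf = bytearray(128)                           # stands in for $0100-$017F
buf[64:] = decode_block_v2(stream, bytes(buf[64:]))  # decode at $0140
buf[:64] = buf[64:]                            # copy down to $0100-$013F
buf[64:] = decode_block_v2(stream, bytes(buf[64:]))  # decode the next block
# ...now upload all 128 bytes to the PPU during vblank...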
Re: PBJ revisited: Another attempt at CHR compression
by on (#178624)
After doing some back-of-the-envelope calculations to determine the decoder complexity of the previous scheme, I decided to improve the codec some more. The details are a bit cumbersome to explain in English. With this new codec I now get 43.3% savings in file size (almost exactly in the middle between pb53 and tokumaru's compression), but the decompression speed is theoretically the same as pb53's. For each 64-byte block the minimum compressed size is 1 byte (100% black tiles) and the maximum is 65 bytes.

I still don't know what to name this thing. Any good short names based around the themes of squishing and food dehydration?
Re: PBJ revisited: Another attempt at CHR compression
by on (#178799)
I am so close to releasing this thing. The tentative name is PBH. I've implemented the block decoder. The code size is 133 bytes, and it decompresses at around 36 cycles per output byte (+14 for uploading to the PPU). In dabg (discounting the time for PPU uploading), pb53 takes 26 cycles per byte vs. 31 for this. The slightly large code size is due to making sure it'll behave sanely with random input. As currently implemented, it takes only 2 parameters: a stream pointer in zero page, and X as the offset from $0100, which is assumed to hold the contents of the previously decoded block.

I still have a few interoperability details to figure out, such as:
- Normal bulk mode of X number of 64-byte blocks
- Parallel segment decoding bulk mode for action53
- Streaming from a 128-byte buffer
- Encoding blocks independently of each other, and maybe printing out a list of block offsets in the file
- Reworking PBJ by gutting the pb8 mode and instead having a signaled exit, so the stream pointer can be reused by other things like this PBH
- Encoder options for all of the above
- Documentation
- (secretly replace pb53 in 240p test suite)

Anything else I might need to consider?
Re: PBJ revisited: Another attempt at CHR compression
by on (#178810)
JRoatch wrote:
I still don't know what to name this thing. Any good short names based around the themes of squishing and food dehydration?

PBJR continues the naming theme, while being disgusting and squishy.
Re: PBJ revisited: Another attempt at CHR compression
by on (#178870)
Myask wrote:
JRoatch wrote:
I still don't know what to name this thing. Any good short names based around the themes of squishing and food dehydration?

PBJR continues the naming theme, while being disgusting and squishy.

I don't understand what that means, or how it relates to peanut butter and jelly.

In any case, I'm thinking it might be better to use a name that's distinct from PBJ altogether. As it is, the PBH ('H' is for honey) file extension visually clashes with PBJ a bit too much (at least PB53 is four characters and has numbers). Maybe something that starts with the letter 'C'.
Re: PBJ revisited: Another attempt at CHR compression
by on (#178900)
PBJR: Pino Batch, J Roatch.
Which leads to the secondary thought of Peanut Butter, Jelly, and Roach.
Re: PBJ revisited: Another attempt at CHR compression
by on (#178901)
PB53's name came from its use as a packet-oriented alternative to Apple PackBits, which I had used in previous NES games. But that works too.
Re: Bagel NES CHR compression.
by on (#179789)
Sorry for the wait. The first post has been updated with the new finished codec. For naming I decided to follow a theme of bread foods.

The zip file also contains a more optimized form of my nametable compression, but I think I'll see if I can make a slightly better one that decompresses to some form of PPU strings before I call the nametable compression good enough.
Re: Bagel NES CHR compression.
by on (#186921)
So now I'm finally getting around to seeing where I could use Bagel.

My next project is tools to manipulate an NES Stripe Image stored starting at $0108. (I chose $0108 because of how the 6502's only autoincrementing read instruction interacts with interrupts.) This would allow me to combine CHR and nametable updates in one vblank without needing a plethora of specific update formats such as the seven different bgup routines in RHDE. An update containing eight tiles would need two packet headers:

Code:
$0100-$0107: scratch space in case the uploader is interrupted, enough for PC, P, A, X, Y
$0108-$010A: address and length for first four tiles
$010B-$014A: first four tiles
$014B-$014D: address and length for second four tiles
$014E-$018D: second four tiles
$018E: $FF terminator
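
A Python sketch of building that buffer layout (hypothetical helper; assumes a 2-byte PPU address, high byte first, followed by a plain 1-byte length, which matches the 3-byte headers in the layout above; a real stripe format may pack flags into the length byte):
Code:
def build_stripe(packets):
    """Build the $0108+ portion; each packet is (ppu_addr, data)."""
    out = bytearray()
    for addr, data in packets:
        out.append(addr >> 8)          # PPU address, high byte first
        out.append(addr & 0xFF)
        out.append(len(data))          # length byte
        out += data
    out.append(0xFF)                   # terminator
    return bytes(out)

tiles = bytes(128)                     # placeholder: eight 16-byte tiles
stripe = build_stripe([(0x1000, tiles[:64]), (0x1040, tiles[64:])])
assert len(stripe) == 135              # fills $0108-$018E exactly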


I looked at bagel.s and found this doc comment for bagel_decompress_block:
Code:
;;
; bagel_stream_ptr = the input stream pointer,
;     points to after stream on exit.
; X = output buffer offset, restored on exit.
; Returns: flags of decrementing bagel_block_count


Does that mean I could call it with X=$0B to decompress four tiles and X=$4E to decompress four more?
Re: Bagel NES CHR compression.
by on (#186923)
Yes. That will work given that the output buffer in bagel's context starts at $0100.

Also, disregard the "[X] restored on exit" part. When bagel_decompress_block returns, X has been incremented by 64. I use this fact to have my own string-writing system compute string lengths from differences in the X register.
Re: Bagel NES CHR compression.
by on (#186924)
re: the demo

Incidentally (?), blanking and overwriting characters when the letters there are in alphabetical order at that speed has a really nice animation effect on text written in the nametable.
Re: Bagel NES CHR compression.
by on (#186926)
WheelInventor wrote:
re: the demo

Incidentally (?), blanking and overwriting characters when the letters there are in alphabetical order at that speed has a really nice animation effect on text written in the nametable.

It's kind of a throwback to the demo I posted in the first post I made on this forum.

Bonus undocumented feature of the demo: While holding B press Left or Right to see it decompress from the wrong parts of the ROM.
Re: Bagel NES CHR compression.
by on (#186927)
Oh, I see. Well, it has a nice substance to it when it's either caused by (like in this case) or based on a real process, like decompressing and uploading tiles. :beer: :) For comparison, many "text scrambler" effects for appearing/disappearing text I've seen used in animation/SFX seem to just take a seed and dish out some RND product based on that and the current frame. I think random functions often feel a little cheap and superficially imposed, because they often are. Such effects can be done with After Effects scripts, but there are also commercial plugins for the same, for those who don't have the time. Sorry for the side note.
Re: Bagel NES CHR compression.
by on (#186958)
I have some ideas for improving the encoder's command-line options.

In main_encode_blocks, the encoder treats the file as a set of independent banks, each in turn made up of pages that are swizzled together. Currently you have hardcoded 64 blocks (4096 bytes) per page and 2 pages per bank, but I have ideas for options to change this. For example, a bank of four 2048-byte pages might be helpful for Super Mario Bros. 3-style CHR rotation in an MMC3 cart with 32K CHR RAM. Or 1024-byte pages might be helpful when loading things that remain constant across all pages of a character's sprite animation, such as a projectile that remains on screen as the player switches to other CHR banks.

I'd also like an option to write out a header listing the offset to the start of each bank's compressed data. This is important for things like RHDE, which need random access to 64-byte units (which would be 1 block in Bagel) to decode furniture graphics in the Furnish menu.

Code:
optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -d, --decompress      decode the input file
  -b, --bytes=SIZE      put SIZE bytes (multiple of 64) per page
  -I, --interleave=NUM  interleave NUM pages together (requires -b)
  --header              write the block count, interleave size, and byte
                        offset to subsequent pages (or interleaved sets)
  -o FILE, --output FILE
                        output to FILE instead of last positional arg
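
A sketch of what -b and -I might do on the encoder side; the interleave semantics here are an assumption (alternate 64-byte blocks from NUM consecutive pages, so parallel streams decode in lockstep, like pb53's dual-segment mode):
Code:
def interleave_pages(data, page_size, num_pages):
    """Yield 64-byte blocks: block 0 of each page, then block 1, etc."""
    set_size = page_size * num_pages
    for base in range(0, len(data), set_size):
        for b in range(0, page_size, 64):
            for p in range(num_pages):
                off = base + p * page_size + b
                yield data[off:off + 64]

# e.g. four 2048-byte pages per interleaved set, for MMC3 CHR rotation:
# blocks = list(interleave_pages(chr_data, 2048, 4))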


I'm willing to add these features myself. Do you have a public version control repository for Bagel? Or should I just attach diffs in replies to this topic?
Re: Bagel NES CHR compression.
by on (#186996)
It might be faster for you to improve on the encoder's features.

I can try today and we'll see how far I can get; maybe this next weekend I can try something. I would also need an option to disable inner block referencing, as some streaming setups can't guarantee an intact buffer.

tepples wrote:
I'd also like an option to write out a header listing the offset to the start of each bank's compressed data.

I'm assuming this would be the same format as the address list in pb53.

tepples wrote:
I'm willing to add these features myself. Do you have a public version control repository for Bagel? Or should I just attach diffs in replies to this topic?

I do not have a public repository. Feel free to place it in any of yours. I believe I got the software licensing attached correctly: GPLv3 for the Python encoder, and all-permissive for the decoder.
Re: Bagel NES CHR compression.
by on (#189875)
I should post this now even though it's missing decoding, but this is my idea for accommodating your encoding options.

The header, if used, is encoded in 2 bytes: YYYYYIII nnnnnnnn
n = number of blocks per interleave (0 means 256)
I = number of interleaves per segment minus 1
Y = number of segments (unimplemented)

For the case of non-interleaved pages, the first byte is then 0.
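
A Python sketch of packing and unpacking that header (hypothetical helper names; Y is shown even though it's unimplemented):
Code:
def pack_header(segments, interleaves, blocks):
    assert 1 <= interleaves <= 8 and 1 <= blocks <= 256
    return bytes([((segments & 0x1F) << 3) | (interleaves - 1),
                  blocks & 0xFF])      # 256 blocks is stored as 0

def unpack_header(hdr):
    segments = hdr[0] >> 3             # YYYYY
    interleaves = (hdr[0] & 0x07) + 1  # III, stored minus 1
    blocks = hdr[1] or 256             # nnnnnnnn, 0 means 256
    return segments, interleaves, blocks

assert pack_header(0, 1, 64)[0] == 0   # non-interleaved: first byte is 0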