PBJ compression format

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#161617)
Attachment: pbj-image.png [ 8.49 KiB ]

In August I spent a lot of time crunching numbers to try to find a better way to compress CHR data. Ultimately that effort failed because the LZMPi-like bit-streams I was coming up with would bloat the decoder too much for my tastes, but I did find out that the core PB8 routine of PB53 was surprisingly effective. So in late November I decided to upgrade my nametable compression format by adding a PB8 mode. After I posted that quick hack, I rearranged the codespace to allow a much more efficient decoder.

Code:
## PBJ Stream format:
  Initially starts in PB8 mode.
  In PB8 mode:
    00000000        : 0x00 8 times.
    01111111        : 0xff 8 times.
    0nnnnnnn ...    : For each bit in the pb8 control byte,
                      0 is a new byte, 1 is the previous byte.
  In BG mode:
    0nnnnnnn        : 128-N Run of BG
  In both modes:
    10011111 xx     : Switch to BG mode and set BG to X.
    100nnnnn xx     : 32-N incrementing run starting at X.
    10111111        : Switch to PB8 mode.
    10111110 xx yy  : Set PPU_ADDR to yyxx
    101nnnnn xx yy  : For 32-N times, emit alternately X and Y.
    110nnnnn ...    : 32-N literal bytes.
    11111111        : End stream.
    111nnnnn xx     : 32-N run of X.
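
For readers who want to experiment offline, the table above can be sketched as a small Python decoder. This is not the 6502 decoder from the post, only an illustration of the listed commands. Several details are assumptions because the table does not state them: PB8 control bits are read most significant bit first, the alternating command emits 32-N bytes in total (not 32-N pairs), the BG byte is 0 until the stream sets it, and the name `decode_pbj` and its list-of-segments output are made up for this example.

```python
def decode_pbj(data, start_addr=0x0000):
    """Decode a PBJ stream into a list of (ppu_address, bytearray) segments.

    Hypothetical helper: a new segment starts at each "Set PPU_ADDR" command.
    """
    segments = [(start_addr, bytearray())]
    out = segments[-1][1]
    pos = 0
    pb8_mode = True   # the stream initially starts in PB8 mode
    bg = 0x00         # assumption: BG is 0 until a 10011111 command sets it

    def take(count):
        """Read `count` bytes from the stream, failing on truncation."""
        nonlocal pos
        chunk = data[pos:pos + count]
        if len(chunk) != count:
            raise ValueError("truncated PBJ stream")
        pos += count
        return chunk

    while True:
        cmd = take(1)[0]
        if cmd < 0x80:                        # 0nnnnnnn depends on the mode
            if not pb8_mode:
                out += bytes([bg]) * (128 - cmd)
            elif cmd == 0x00:
                out += bytes(8)               # 0x00 eight times
            elif cmd == 0x7F:
                out += b"\xff" * 8            # 0xff eight times
            else:
                prev = 0
                for bit in range(7, -1, -1):  # assumption: MSB first
                    if not (cmd >> bit) & 1:  # 0 = new byte from the stream
                        prev = take(1)[0]
                    out.append(prev)          # 1 = repeat the previous byte
            continue
        n = 32 - (cmd & 0x1F)
        kind = cmd & 0xE0
        if cmd == 0xFF:                       # end of stream
            return segments
        elif cmd == 0x9F:                     # switch to BG mode, set BG
            bg = take(1)[0]
            pb8_mode = False
        elif cmd == 0xBF:                     # switch to PB8 mode
            pb8_mode = True
        elif cmd == 0xBE:                     # set PPU_ADDR to yyxx
            lo, hi = take(2)
            segments.append(((hi << 8) | lo, bytearray()))
            out = segments[-1][1]
        elif kind == 0x80:                    # incrementing run from X
            x = take(1)[0]
            out += bytes((x + i) & 0xFF for i in range(n))
        elif kind == 0xA0:                    # alternate X and Y
            x, y = take(2)
            out += bytes(x if i % 2 == 0 else y for i in range(n))
        elif kind == 0xC0:                    # literal bytes
            out += take(n)
        else:                                 # 111nnnnn: run of X
            x = take(1)[0]
            out += bytes([x]) * n
```

The special cases (0x9F, 0xBE, 0xBF, 0xFF) are tested before the general 32-N forms, mirroring the order in which the table lists them.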


PB8 is the peanut butter, RLEINC2 the jelly, and my old nametable compression the bread that combines it all together into .pbj files.

The decoder assembles to 159 bytes of ROM and uses 5 bytes of RAM.
The current encoder is a bit dumb: it can use either PB8 planes or all of the other commands, but not both.

Example file sizes:
The PBJ picture in this post: 4513 bytes
The Gus portrait from the 240p Test Suite: 2297 bytes
The title screen from Zooming Secretary: 1423 bytes

EDIT: oops, I uploaded the wrong decoder.
Re: PBJ compression format
by on (#161620)
I assume "BG" is a common byte, used for runs of long lengths without needing to re-specify the run byte.

I see you're including a "seek in output" command (%10111110). That complicates an offline decoder, as it has to buffer the entire output in case the stream seeks backwards. What was the reasoning behind that command?

Can your decoder break the RLEINC2 runs into fixed-size output packets? Say I wanted to decode 128 bytes, send that to VRAM, decode the next 128 bytes, send that to VRAM, etc. That was my original motivation behind PB8: to be able to decode while rendering is on.
Re: PBJ compression format
by on (#161625)
tepples wrote:
I assume "BG" is a common byte, used for runs of long lengths without needing to re-specify the run byte.

That is correct.

tepples wrote:
Can your decoder break the RLEINC2 runs into fixed-size output packets? Say I wanted to decode 128 bytes, send that to VRAM, decode the next 128 bytes, send that to VRAM, etc. That was my original motivation behind PB8: to be able to decode while rendering is on.

I say continue to use PB53 for that. PBJ was designed only for blank-screen bulk uploads. In fact, PB53 beats PBJ at CHR compression (PBJ has to code a duplicate 8-byte plane twice). I suppose a decoder could be made to keep a partial RLEINC2 run command in memory for the next partial decode, but such a decoder would probably be larger than just including PB53 as well.
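
To make the "keep a partial run in memory" idea concrete, here is a rough Python sketch, not anything from the actual PBJ decoder. It assumes the runs have already been parsed into hypothetical `(value, length)` pairs, and `packetize` is an invented name. The leftover length of a run that straddles a packet boundary is the state a resumable decoder would have to carry between calls.

```python
def packetize(runs, packet_size=128):
    """Yield fixed-size packets from (value, length) runs.

    A run that does not fit in the current packet is split, and the
    remainder is carried over into the next packet.
    """
    packet = bytearray()
    for value, length in runs:
        while length:
            # Emit only as much of the run as the current packet can hold.
            step = min(length, packet_size - len(packet))
            packet += bytes([value]) * step
            length -= step
            if len(packet) == packet_size:
                yield bytes(packet)
                packet.clear()
    if packet:
        yield bytes(packet)   # final, possibly short, packet
```

On the NES the extra bookkeeping (remaining length, current value, current command type) would cost both RAM and code, which is the size trade-off described above.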

tepples wrote:
I see you're including a "seek in output" command (%10111110). What was the reasoning behind that command?

So that a complete PPU picture can be rendered. Sometimes I also want to be able to write an ASCII tile set to an offset other than $0000, or maybe I need to write just the bottom of the screen for a status bar. The double-byte command already reads in two bytes, so it's not much of a stretch to write those two bytes to PPU_ADDR instead of PPU_DATA (12 bytes of code in the decoder).

tepples wrote:
That complicates an offline decoder, as it has to buffer the entire output in case the stream seeks backwards.

It does complicate things, but I plan to have some options to read/write a list of PPU addresses with corresponding file offsets, and to process files like a bag of bytes only when the "seek in output" command is not present. I'll also include a separate option to write out a sav file for your editor.
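
As a sketch of what buffering the entire output could look like in an offline tool, the following Python assumes the stream has already been split into hypothetical `(ppu_address, data)` pieces, one per "seek in output" command. The function name `render_ppu_image` is made up for this example, and the 16 KiB size reflects the PPU's $0000-$3FFF address space.

```python
def render_ppu_image(segments, size=0x4000):
    """Paint (address, data) segments into a flat PPU memory image.

    Later segments overwrite earlier ones, so a stream that seeks
    backwards still produces the final state of PPU memory.
    """
    image = bytearray(size)
    for addr, data in segments:
        for i, value in enumerate(data):
            image[(addr + i) % size] = value   # wrap at the end of PPU space
    return bytes(image)
```

A stream without any seek command reduces to a single segment, which is the plain bag-of-bytes case.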