Generic compression on NES - *done*

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Generic compression on NES - *done*
by on (#82823)
*** EDIT ***
Now this project has been completed. The RHDN page of this project is here : http://www.romhacking.net/utilities/882/

*** Original post ***
I'd like to create tools to help compress generic data on the NES (or any other platform, for that matter).
ROM space is limited on the NES, which makes compression especially important. Data that could be compressed includes (but is not restricted to) :
- Game level maps (contain some repetitiveness)
- Tilesets (I'm pretty sure tokumaru already released a great algorithm specifically for tilesets, but any generic algorithm should be applicable too)
- Text data (as there are only 26 letters and predictable entropy, it can compress very well)
- Meta-sprite data (with 4 bytes per hardware sprite it takes a lot of space)
- Music sequence data (contains some repetitiveness / is predictable)
- 6502 code (predictable entropy)

Then there are 2 ways I can think of for data to be decompressed :
1) The obvious one : A whole "file" is compressed as a whole and decompressed as a whole. Of course, you need to use WRAM to decompress the file unless it's really small (and if it's really small, then why compress it?). So this method is useful only if you want to compress large files.
2) The useful one : A whole "file" is compressed but you only decompress a small section of it at a time, typically less than 256 bytes.
For example, if you have a compressed script for the game, you only want to decompress a sentence at a time; if you compress a map you only need one row or column at a time, etc... and therefore no WRAM is needed.

This leads to an important point : compression algorithms used MUST allow breaks in the compressed sequence and sub-entry points, so that you can decompress only a small section of the "file" when it's needed.
Most algorithms I've seen don't take this into account. More specifically, LZ77 requires references to previously decoded data to work. This typically makes it unusable.

So, the tool I'd like to build would have to be practical, general purpose and flexible. I'd like it to try to compress a file with multiple algorithms, and let the user choose the best one (the one that compresses best). Also, it should be easy for anyone to add their own algorithms to the list, provided they write the compression and decompression code themselves.

To be practical, the tool should work not only for binary data but also for assembly code, so that it could turn this :
Code:
Label:
   .db $ab, $7f, $1b, $7a, $02, $99, $00    ; Uncompressed data


To this :
Code:
Label:
    .db $3f, $d4, $eb         ; Compressed data

(I just wrote down random data).

This will be crucial if there are many labels inside a file, so that the program can access (and decompress) a small chunk of data within the file (instead of decompressing the whole file).

All compression algorithms could be used if they follow these 3 conditions :
- Pseudo-randomly accessible data (with labels in the middle of the file) that doesn't require references to previous data.
- Don't require too much RAM to decompress
- Don't require too much CPU to decompress

The size of the decompression code should also be taken into account when comparing different algorithms to determine the most efficient one for a particular piece of data.

by on (#82828)
A few years ago, I played around with "Huffword" compression, which is like metatiles for text. A compressed text would be broken into "pages", such as lines of dialogue in an RPG or screens of text in the game's built-in operations guide. It would consist of the following:
  • Huffman-compressed stream of word IDs
  • Pointers to start of page in the word stream
  • Huffman decoding table for word IDs
A dictionary would consist of the following:
  • Huffman-compressed character streams
  • Pointers to each word ID in the character stream (I have some ideas on how to store these pointers efficiently)
  • Huffman decoding table for characters

Thus there would be two Huffman decoding tables, one for words and one for characters. I know of an efficient way to store "canonical Huffman" decoding tables if you're curious.

With some kinds of data, the only kind of compression you can do is to rearrange data with little internal fragmentation. My music engine stores music sequence data in what I believe is a fairly efficient format: 5-bit interval from a pattern's base note and 3-bit duration (1, 2, 3, 4, 6, 8, 12, or 16 rows). In-between durations are stored as two bytes, representing two notes tied together. I'm not so sure as to the most efficient way to store sound effect sequence data or instrument envelope data though.
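
To illustrate, here's a minimal C sketch of unpacking one such note byte; the exact bit layout (interval in the high 5 bits) is an assumption, not something specified above:
Code:
/* Assumed layout: bits 7-3 = interval from the pattern's base note,
   bits 2-0 = index into the duration table. */
static const unsigned char duration_rows[8] = {1, 2, 3, 4, 6, 8, 12, 16};

void decode_note(unsigned char note, unsigned char base,
                 unsigned char *pitch, unsigned char *rows)
{
    *pitch = base + (note >> 3);          /* 5-bit interval above the base note */
    *rows  = duration_rows[note & 0x07];  /* 3-bit duration index               */
}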

What kind of predictable entropy is in 6502 code? I've read code for 8-bit architectures is already a lot more dense than code for 32-bit RISC architectures such as MIPS or ARM, apart from things like Thumb.

The kind of compression you'd use for game level maps depends on how the maps are structured. For something like SMB1 (inf by 13), SMB2 (inf by 15), SMB3 (inf by 27), or Super Mario World (inf by 32), the level could be treated as a list of objects roughly sorted from left to right, and scrolling into a new column would involve finding which objects intersect a given column.

by on (#82833)
I have no idea what I'm talking about. That being said, here is a solution used by Genesis homebrew developers:
https://github.com/sikthehedgehog/mdtools

I heard it wasn't a terrible leap from C64 assembly to Amiga. Maybe some of the C or assembly code will give you ideas. I think the author mentioned it can decompress a single tile from a set just fine.

by on (#82834)
Bregalad wrote:
This leads to an important point : compression algorithms used MUST allow breaks in the compressed sequence and sub-entry points, so that you can decompress only a small section of the "file" when it's needed.
Most algorithms I've seen don't take this into account. More specifically, LZ77 requires references to previously decoded data to work. This typically makes it unusable.
Only if you need random access. For something like a one-way scroller or a music track an LZ-77 coder with a small sliding-window can work quite nicely. At any rate small block sizes and contexts tend to really hurt most compression algorithms, so be careful.

tepples wrote:
What kind of predictable entropy is in 6502 code? I've read code for 8-bit architectures is already a lot more dense than code for 32-bit RISC architectures such as MIPS or ARM, apart from things like Thumb.
There's still a fair bit of redundancy. My LZ coder typically saves 30-40% on C64 "code", though YMMV. Aside from unrolled loops and little tables there's usually a lot of repeated variable addresses and opcodes, assuming that encoding nearby one or two-byte matches is a win.

Also even when you do have a "general-purpose" compressor working there's usually much to be gained by transforming your content to better fit it. So think hard about how you interleave your data streams and experiment with simple delta-coding and the like to expose more redundancy.

by on (#82835)
Oh I forgot to mention that I wanted to base my work on this library :

http://www.romhacking.net/docs/134/
I'm pretty sure it contains quite a few different algorithms, although each algorithm can have its variants.
The problem will be to code all the parsing of data in C, but I guess I should first come up with something that works with purely binary files, and then do something that parses text files, detects labels, and .db, .dw and .incbin statements to be able to compress the data, while keeping the rest intact.
Also I'd have to compile decompression routines for all algorithms and compute how many bytes they take, and write a comparative table on the screen to tell the user how efficient all implemented algorithms are.
This might not be easy but it will be SOOO much worth it !

tepples, of course I'm interested in your efficient way to store the binary tree !! The only time I've tried to implement Huffman I used 16-bit words for the binary tree, which obviously is NOT efficient. Each node contained two pointers to its child nodes, the first in case a '0' is met and the second if a '1' is met.
Then when you eventually reach a leaf, the pointer's high byte is $00, which indicates the low byte is the literal to insert into the decompressed stream.

Unfortunately, this is efficient only for data such as text where only a small part of all 256 byte values is actually used. If all 256 values were used at least once, it would result in a very large tree.
I guess it would be more efficient to only use 8-bit words, but then how do you know if it's a literal or a pointer ?
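
For reference, here's a small C sketch of that 16-bit-node walk (treating each pointer as a node index, with a hypothetical next_bit() bit reader):
Code:
/* tree[i][b] is the 16-bit pointer followed from node i on bit b.
   A pointer whose high byte is $00 is a leaf; its low byte is the literal. */
unsigned char huff_decode(const unsigned short tree[][2], int (*next_bit)(void))
{
    unsigned short node = tree[0][next_bit()];
    while (node & 0xFF00)               /* high byte nonzero -> internal node */
        node = tree[node][next_bit()];
    return (unsigned char)node;         /* high byte $00 -> low byte literal  */
}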

Quote:
Only if you need random access. For something like a one-way scroller or a music track an LZ-77 coder with a small sliding-window can work quite nicely. At any rate small block sizes and contexts tend to really hurt most compression algorithms, so be careful.

Interesting. I bet a lot of stuff would not compress well with a small window though.
For example if your window is 64 bytes, you'd need to keep a circular buffer of 64 bytes constantly "open" in your compressed stream, and you can only "advance" in it, never go back or anything.
For example in the case of a song that loops, you'd have to tell the compressor to reset everything at the start of the loop, and never use references to data from before the loop point, or else it would work on the first play of the song and then screw up after the first loop !

In other words, all "sub-sections" that are between labels (every sentence in the case of text, every metasprite in the case of metasprites, every routine in the case of 6502 code) would have to be compressed with a blank sliding window in mind, which would annihilate any efficiency of the compression.

Not that I have anything against Lxxx algorithms. If anyone has a variant of them that would work on a very small amount of data at a time (using a common dictionary for the whole file) then I'm all for it too.

Quote:


What kind of predictable entropy is in 6502 code? I've read code for 8-bit architectures is already a lot more dense than code for 32-bit RISC architectures such as MIPS or ARM, apart from things like Thumb.

I'm pretty sure some opcodes and arguments will be much more frequent than others.
$00, $ff, $a9, $a5, $85, $4a and $0a will be very frequent bytes and could be compressed with fewer bits, while, say, $de will be rare and be compressed with more bits.

Quote:
I have no idea what I'm talking about. That being said, here is a solution used by Genesis homebrew developers:
https://github.com/sikthehedgehog/mdtools

Nice, but unfortunately it looks like it's specifically for graphics. I'm no expert, but a codec made for graphics might not work well with other kinds of data. I might be wrong though.
Anyway my goal would be to make a tool that would be completely free and open source, so that anyone could write their own compression and decompression routines in C to be added to the list my tool tries, and then it's up to the user to know which ones to actually use in their games/demos.

@tepples : about efficient ways to encode music, I don't know. I use a format which is 1 byte per note (4-bit pitch and 4-bit length; some values act as commands instead), but then I have to use a command byte for each octave change. I also can repeat sections and call "subroutines" (only one level, I don't have a stack, although a stack could be implemented).
Also, I wonder: if you are going to be compressing the data anyway, maybe it is sometimes better to store data in an inefficient way that compresses efficiently than in an efficient way that compresses inefficiently. The latter could end up like trying to compress already-compressed data, which often doesn't work well (although I might be wrong).

by on (#82838)
SMB3 maps are not infinitely wide. The game loads the entire map into RAM. There are 0x1950 bytes reserved for the map in RAM. At a height of 27, that's up to 240 tiles wide.

As for the main point of the thread, how to compress stuff, never overlook explicit dictionary compression (LZ78). I saw Zelda Oracles use it, and was amazed by how well it worked. You get random-access, and a low RAM requirement.
But if you have enough RAM to hold the entire decompressed thing, an LZ77-based algorithm always beats LZ78.

by on (#82839)
Bregalad wrote:
Quote:
Only if you need random access. For something like a one-way scroller or a music track an LZ-77 coder with a small sliding-window can work quite nicely. At any rate small block sizes and contexts tend to really hurt most compression algorithms, so be careful.

Interesting. I bet a lot of stuff would not compress well with a small window though.
For example if your window is 64 bytes, you'd need to keep a circular buffer of 64 bytes constantly "open" in your compressed stream, and you can only "advance" in it, never go back or anything.

For example in the case of a song that loops, you'd have to tell the compressor to reset everything at the start of the loop, and never use references to data from before the loop point, or else it would work on the first play of the song and then screw up after the first loop !

In other words, all "sub-sections" that are between labels (every sentence in the case of text, every metasprite in the case of metasprites, every routine in the case of 6502 code) would have to be compressed with a blank sliding window in mind, which would annihilate any efficiency of the compression.

By a "small" window I was thinking more along the lines of a kilobyte or two stored in SRAM. And, yes, you'd only do it with one-way streaming data. If your average block is smaller than a couple of kilobytes then it's the wrong way to go.

This approach is particularly useful if for streaming off of an external medium which isn't random-access, like a floppy, tape or network. Imagine an FDS shoot'em up continually streaming in maps, objects and music.

Of course random-access to compressed data is often amortized by caching larger blocks, and hoping that the next access will tend to fall into the same block. Think packing text in RPGs into kilobyte-sized chunks.

Bregalad wrote:
Not that I have anything against Lxxx algorithms. If anyone has a variant of them that would work on a very small amount of data at a time (using a common dictionary for the whole file) then I'm all for it too.

I too would be interested in any such methods and I wouldn't be surprised if it's been studied extensively, but I've never seen such a method. Generating an optimal "phrasebook" for a file is bound to be computationally infeasible but there's got to be decent heuristics out there.

How did you intend to do this for huffword, Tepples?

by on (#82843)
Bregalad wrote:
tepples, of course I'm interested in your efficient way to store the binary tree !!

In a canonical Huffman code, the codes are always lexicographically ordered by their bit strings, which increase in length. Consider this code for text:
Code:
00    a
010   t
011   i
100   n
1010  (space)
1011  u
1100  m
1101  p
1110  l
11110 k
11111 EOF

It turns out you can reconstruct the bit patterns from the code lengths (0, 1, 3, 5, 2) fairly efficiently, and then mapping code indices to code words is as easy as "atin umplk\0".
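
Here's a minimal C sketch of that reconstruction, on the assumption that (0, 1, 3, 5, 2) counts the codes of each length from 1 to 5 bits; it reprints the table above:
Code:
#include <stdio.h>

int main(void)
{
    const int counts[] = {0, 1, 3, 5, 2};  /* number of codes of length 1..5 */
    const char *syms = "atin umplk";       /* symbols in code order; 11th is EOF */
    unsigned code = 0;
    int sym = 0;
    for (int len = 1; len <= 5; len++) {
        for (int i = 0; i < counts[len - 1]; i++, sym++, code++) {
            for (int b = len - 1; b >= 0; b--)   /* print the code MSB first */
                putchar('0' + ((code >> b) & 1));
            if (sym < 10) printf("  %c\n", syms[sym]);
            else          printf("  EOF\n");
        }
        code <<= 1;  /* canonical rule: first code of the next length = doubled */
    }
    return 0;
}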

Quote:
Also, I wonder: if you are going to be compressing the data anyway, maybe it is sometimes better to store data in an inefficient way that compresses efficiently than in an efficient way that compresses inefficiently. The latter could end up like trying to compress already-compressed data, which often doesn't work well (although I might be wrong).

In a few oddball cases, a round of RLE before a round of LZ can help, especially if the LZ's sliding window is short. I'd look it up in more detail, but I have pressing housework.

by on (#82844)
Dwedit wrote:
As for the main point of the thread, how to compress stuff, never overlook explicit dictionary compression (LZ78). I saw Zelda Oracles use it, and was amazed by how well it worked. You get random-access, and a low RAM requirement.

That still leaves the question of what to add to the dictionary and what to leave out. Traditional "streaming" LZ-78 builds it incrementally and adds every non-match to it until it gets full anyway, which would be rather inefficient in this case.
Also, are their dictionaries recursively structured (like LZW), or unordered blobs of memory (like LZ-77), or built some other way?

tepples wrote:
Bregalad wrote:
Also, I wonder: if you are going to be compressing the data anyway, maybe it is sometimes better to store data in an inefficient way that compresses efficiently than in an efficient way that compresses inefficiently. The latter could end up like trying to compress already-compressed data, which often doesn't work well (although I might be wrong).

In a few oddball cases, a round of RLE before a round of LZ can help, especially if the LZ's sliding window is short. I'd look it up in more detail, but I have pressing housework.
RLE is just a special case of LZ with a single-byte offset, so it's going to be a loss unless encoding an RLE run uses significantly fewer bits than the equivalent LZ match.
Anyway, Bregalad makes a good point. It's often worth enlarging data to make it compress better, so e.g. it's probably a bad idea to bit-stuff your data before feeding it through a general-purpose compressor. This is also why EXE-packers like to transform short relative branches into absolute jump addresses.

by on (#82845)
Quote:
As for the main point of the thread, how to compress stuff, never overlook explicit dictionary compression (LZ78). I saw Zelda Oracles use it, and was amazed by how well it worked. You get random-access, and a low RAM requirement.
But if you have enough RAM to hold the entire decompressed thing, an LZ77-based algorithm always beats LZ78.

I currently have no idea how '78 works but I'll look into it.

The problem is that most info pages I can find with Google insist on how the data is encoded, when I'm more interested in how it's decoded, but oh well...

Quote:
It turns out you can reconstruct the bit patterns from code lengths (0, 1, 3, 5, 2) fairly efficiently, and then mapping code indices to code words is as easy as "atin umprk\0".

I don't really understand what you mean, but that library on RHDN that I linked also does something like that, so I'll look into it.

@Dwedit : SMB3 uses WRAM, which I don't want to always use, as I said in my first post. Of course I have nothing against methods that require decompressing a large file to WRAM, but I think there is a total lack of info about compressing really small chunks of data (< 256 bytes), which would end up more useful in games.

by on (#82858)
Look into pucrunch tools for 6502?

by on (#82860)
Pucrunch and Aplib are both LZ77-based. So is zip, 7-zip, etc. When I say that, I mean any system that looks at previous data with a sliding window. (sometimes the sliding window can extend to the entire block)
They need the entire block of ram available to do decompression, and require sequential access if you don't want to use the entire block of ram.

LZ78-style systems have random access, as long as you have a pointer table that gets you to the correct starting location.

by on (#82864)
Dwedit wrote:
Pucrunch and Aplib are both LZ77-based. So is zip, 7-zip, etc. When I say that, I mean any system that looks at previous data with a sliding window. (sometimes the sliding window can extend to the entire block)
They need the entire block of ram available to do decompression, and require sequential access if you don't want to use the entire block of ram.


The block of RAM only needs to be the size of the sliding window, if you use a circular buffer. But then again, that limits you to turning off the display for large updates to the PPU, or only moving data from the ring buffer to port access (like the PPU) in small amounts during the vblank period.

Just nitpicking, but LZ77 is the original method and uses run lengths of raw data. LZSS uses a bitmask to select whether the byte read is a raw/literal or part of a command for operating the LZ window access. I know pucrunch uses LZSS. I can't really see any LZ-based compression algos using the original LZ77 spec as I previously described it.

by on (#82865)
tomaitheous wrote:
The block of RAM only needs to be the size of the sliding window, if you use a circular buffer. But then again, that limits you to turning off the display for large updates to the PPU, or only moving data from the ring buffer to port access (like the PPU) in small amounts during the vblank period.

And it also limits the sliding window to the dinky little RAM in the NES unless you're paying extra for PRG RAM on the cart.

Quote:
Just nitpicking, but LZ77 is the original method and uses run lengths of raw data. LZSS uses a bitmask

I guess the meaning of "LZ77" has been generalized from the original version that strictly alternated one-codeword literals with history matches to new variants that indicate literal vs. match in some other way. Nintendo since at least the GBA has used "LZ77" to refer to an implementation of LZSS almost identical to that in the Allegro library except for a few bitfield ordering differences. LZSS has an 8-bit control word whose bits indicate one match (with fixed-width distance and length) or one literal codeword. Pucrunch doesn't use that exact format either; it uses a lot of gamma codes for lengths and such.
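
As a concrete model of the scheme just described, here's a rough C decoder; the LSB-first control bits and the 12-bit distance / 4-bit length split are assumptions for illustration, not the exact GBA or Allegro layout:
Code:
/* Decodes LZSS: each control byte governs 8 items; a set bit means a
   (distance, length) match against already-written output, a clear bit
   a literal byte. Assumes a valid stream. */
void lzss_decode(const unsigned char *src, unsigned char *dst, int out_len)
{
    int out = 0;
    while (out < out_len) {
        unsigned ctrl = *src++;
        for (int bit = 0; bit < 8 && out < out_len; bit++, ctrl >>= 1) {
            if (ctrl & 1) {                            /* match */
                int dist = (((src[0] & 0x0F) << 8) | src[1]) + 1;
                int len  = (src[0] >> 4) + 3;          /* lengths 3..18 */
                src += 2;
                while (len--) { dst[out] = dst[out - dist]; out++; }
            } else {
                dst[out++] = *src++;                   /* literal */
            }
        }
    }
}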

by on (#82866)
Pucrunch has a design flaw that Apack corrected: byte values are not guaranteed to be aligned in Pucrunch, but they are in Apack.

by on (#82916)
Dwedit, your claim that LZ78 allows random access might be true, but you still need the whole dictionary in memory. In other words, you can immediately forget this if you don't want to use SRAM.

However, I found on CSDb someone mentioning an LZ77 variant that, instead of using (offset, length) references to previously decompressed output data, uses (offset, length) references into the compressed input data. This allows random access and removes the need for a RAM buffer larger than the small part of the file you want to decode.

However, chances are that you'll have a harder time finding matching (offset, length) pairs in compressed data, but who knows.
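
To make the idea concrete, here's a sketch of such a decoder; the token format is invented for illustration (a real scheme would escape the marker byte in literals and bound the recursion depth):
Code:
/* A $FF byte marks a match, followed by a 16-bit offset into the compressed
   stream and a length in compressed bytes; anything else is a literal.
   Matches are resolved by re-decoding the earlier compressed span, so no
   history buffer is needed and any block can be decoded independently. */
void decode_span(const unsigned char *rom, unsigned pos, unsigned end,
                 unsigned char **dst)
{
    while (pos < end) {
        unsigned char c = rom[pos++];
        if (c == 0xFF) {                              /* match token */
            unsigned off = rom[pos] | (rom[pos + 1] << 8);
            unsigned len = rom[pos + 2];
            pos += 3;
            decode_span(rom, off, off + len, dst);    /* recurse into ROM */
        } else {
            *(*dst)++ = c;                            /* literal */
        }
    }
}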

So, I'm thinking of the following compression schemes list (for now, might add some later) :
- LZ77 variant with references to compressed data
- TinyHuff : Huffman variant where the 7 most common values are represented with 2-8 bits, and all the others with 9 bits.
- Xip 2,3,4,5 : The 2^n most common values are encoded on n+1 bits, while others are encoded with 9 bits (see the sketch after this list).
- Full Huffman
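
Here's a sketch of the Xip idea, encoding one byte; the flag polarity (0 = common) and the put_bits() writer are assumptions for illustration:
Code:
/* top[] holds the 2^n most common byte values, most frequent first. */
void xip_encode_byte(unsigned char b, const unsigned char *top, int n,
                     void (*put_bits)(unsigned value, int count))
{
    for (int i = 0; i < (1 << n); i++) {
        if (top[i] == b) {            /* among the 2^n most common values */
            put_bits(i, n + 1);       /* flag bit 0, then n-bit index     */
            return;
        }
    }
    put_bits((1u << 8) | b, 9);       /* flag bit 1, then the raw byte    */
}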

I'm also willing to write 6502 code that is as compact as possible, while still being decently fast. The size of the code will optionally be taken into account when comparing file sizes.

I'm still hesitating between coding the compression in C, Java, or maybe something else...

by on (#82918)
The dictionary sits in ROM, not RAM. That's the whole purpose behind an explicit dictionary, you save RAM by only decoding what needs to exist in RAM.

by on (#82920)
All this talk about LZ77 gave me some interesting inspiration:

If you have linear levels (as in, you can only scroll "forwards" or "backwards", whichever directions those may be), you can design a variant of LZ77 that'll unpack the level as you're scrolling forwards, using the contents in ram as the sliding window.

If you compress the stage twice (once from beginning to end, and again from end to beginning), you can allow the player to freely move forwards and backwards through a stage of virtually unlimited size, using the "forward stream" when moving forward, and the "backwards stream" when moving backwards.

Ideally, the decompressor will unpack individual bytes as necessary, pausing when the screen stops scrolling, and stepping backwards if you allow backscrolling, so the position isn't lost.

As for the actual compression ratio, you'd need to compress each level twice if you want 2-way scrolling, so it might not be terribly good storage wise, but it'd be incredible RAM wise, since you would only really need one page to store an entire screen's worth of metatiles for any given moment.

by on (#82921)
Or just use LZ78 and not need a sliding window at all.

by on (#82922)
Bregalad wrote:
- TinyHuff : Huffman variant where the 7 most common values are represented with 2-8 bits, and all the others with 9 bits.

In other words, an entropy code with fixed probabilities. Other such codes include Golomb codes and gamma/exp-Golomb codes. The CHR codec I used for the MGC compo cart uses gamma codes for run lengths, except it swaps the code for the most common run length with the code for 1.
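
For readers who haven't met them, here's a quick C sketch of a plain Elias gamma coder (the swap mentioned above would just remap values before encoding); put_bit() is a hypothetical bit writer:
Code:
/* Gamma code for v >= 1: floor(log2(v)) zero bits, then v itself MSB-first.
   So 1 -> "1", 2 -> "010", 3 -> "011", 4 -> "00100", ... */
void gamma_encode(unsigned v, void (*put_bit)(int))
{
    int bits = 0;
    for (unsigned t = v; t > 1; t >>= 1)
        bits++;                         /* floor(log2(v)) */
    for (int i = 0; i < bits; i++)
        put_bit(0);                     /* unary length prefix */
    for (int i = bits; i >= 0; i--)
        put_bit((v >> i) & 1);          /* the value, MSB first */
}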

Quote:
I still hesitate between coding compression in C or Java or maybe something else...

I have some example full-Huffman code in Python whose decoder shouldn't be too hard to translate into 6502. You said before that you're interested, but do you know Python?

Dwedit: I haven't read about LZ78-style methods with a static dictionary other than metatiling, SMB-style object compression, and Huffword. Have you any interesting sources for me to read?

by on (#82923)
Compressing the same data twice seems like a huge waste.

by on (#82924)
tokumaru wrote:
Compressing the same data twice seems like a huge waste.

Yet Blaster Master compresses its map data several times into metametametatiles.

EDIT: Oh wait, you were talking about the bidirectional scrolling map compression. I guess object coding or metametametatiles might be a better option.

by on (#82937)
Dwedit wrote:
The dictionary sits in ROM, not RAM. That's the whole purpose behind an explicit dictionary, you save RAM by only decoding what needs to exist in RAM.

That would be sweet. I've heard of static dictionary coders but I've never spent the time to track down and read up on the specifics. Can you describe or give us a reference on a good algorithm for crafting the dictionary of such a coder?

Anyway, I don't see how LZ-78 is any better suited to static coding than LZ-77 is. I suppose you could write down the final LZ-78 dictionary to ROM but that would be much akin to just writing down the first LZ-77 window for use as a static dictionary. Besides, I rather think the NES is better suited to copying strings from a LZ-77 window than chasing down LZ-78 trees.

by on (#82939)
tokumaru wrote:
Compressing the same data twice seems like a huge waste.

I know, it was only an idea; if you had extremely tight RAM constraints, and had to have 2-way scrolling, then this is how I'd do it. You could just as easily drop the second compression if you were willing to have 1-way scrolling, like SMB.

Metametameta tiles aren't too bad if you think about it, Tepples. If I recall correctly, Atlantis no Nazo compresses its stages into a sequence of screens that are sequences of metacolumns that are sequences of metatiles. It's pretty fast, too; the game will happily scroll at 16 pixels per second, if I remember right.

This is probably a better option, I only posted my inspiration because it sounded neat. :P

by on (#82940)
Rygar did that too.

by on (#82941)
Multilayer metatiling, or quadtree map data storage, is fairly widespread in NES games. Mega Man series has a 2-layer quadtree (32x32 made up of four 16x16s, each made up of four 8x8s), Blaster Master has a 4-layer quadtree according to this document, and I seem to remember reading that one of tokumaru's Sonic-related projects has a 5-layer quadtree (256x256 down to 8x8).
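
A C sketch of what a lookup through such a 2-layer quadtree might look like; the table layouts are assumptions, not taken from any of these games:
Code:
#define MAP_W 16                           /* map width in 32x32 blocks (assumed) */
extern const unsigned char map[];          /* grid of 32x32 block IDs             */
extern const unsigned char blocks[][4];    /* each block: four 16x16 metatile IDs */
extern const unsigned char metatiles[][4]; /* each metatile: four 8x8 tile IDs    */

unsigned char tile_at(unsigned x, unsigned y)   /* x, y in 8x8-tile units */
{
    unsigned char block = map[(y >> 2) * MAP_W + (x >> 2)];
    unsigned char meta  = blocks[block][((y >> 1) & 1) * 2 + ((x >> 1) & 1)];
    return metatiles[meta][(y & 1) * 2 + (x & 1)];
}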

by on (#82942)
Quote:
I know, it was only an idea; if you had extremely tight RAM constraints, and had to have 2-way scrolling, then this is how I'd do it. You could just as easily drop the second compression if you were willing to have 1-way scrolling, like SMB.

It's not like I'd do it. Remember, true LZ77 compression is not possible on the NES unless you have enough RAM for the whole sliding window; for small files that is basically the whole file.

Instead, you can use any compression scheme that allows random access, and code the level as a set of screens or rows or columns that you decode only one at a time, progressively advancing as the player scrolls in any direction.

Even if you went and stored the data compressed twice, unless you can achieve more than 50% compression you'd be better off not compressing at all ! Sorry, but your idea wasn't a good one this time.

Metametatiles aren't compressing the same data twice, it's just two layers of metatiles. I don't consider metatiles really compression, as the game designer really makes the game with metatiles in mind (instead of plain tiles).

Quote:
I have some example full-Huffman code in Python whose decoder shouldn't be too hard to translate into 6502. You said before that you're interested, but do you know Python?

I don't know a thing about Python, but why not. I'm open to anything; if you know a language you basically know them all. Although I'm typically confused when classes are used, which is why I prefer C over C++, as I feel like C++ is just a huge mess that didn't really improve C that much. Classes in Java are more organized though, but I don't like how it forces all integer types to be signed and big endian.
I'd really like to do all coders and decoders in a high-level language, in addition to the 6502 decoders.

by on (#82948)
Quote:
It's not like I'd do it. Remember, true LZ77 compression is not possible on the NES unless you have enough RAM for the whole sliding window; for small files that is basically the whole file.


I've used 256-byte and 512-byte sliding window sizes with pucrunch before and had good results on the compressed size. Again, I also used a ring/circular buffer, so that's all the RAM it took (the sliding window size). Well, pucrunch also takes a small 16-byte array for a decoding table IIRC, other than the LZ stuff.

by on (#82970)
If you don't use SRAM, even 512 bytes is a HUGE portion of the available memory. If you use SRAM however that will do the trick, but you'd have to decompress the file from the beginning and go forward to the area you need, just because you can't resolve the backwards (length, offset) references otherwise... Unless you use the variant I mentioned that makes references into the ROM compressed data instead of into decompressed data, which allows random access and uses no more RAM than the buffer for the data you need.

by on (#82979)
Bregalad wrote:
Unless you use the variant I mentioned that makes references into the ROM compressed data instead of into decompressed data, which allows random access and uses no more RAM than the buffer for the data you need.
Random access is a bit of a stretch. You'd need to store (bit-)pointers to all potential access points and make sure they aren't straddled by match/literal runs.
This is probably fine for dialogues in an RPG but it renders any sort of pointer arithmetic rather tricky, such as to display arbitrary chunks out of large two-dimensional tilemap arrays.

by on (#83055)
There are different kinds of compression suitable for different kinds of games. In many cases you could use nybbles instead of bytes as units, I suppose.

In the case of Sokoban (as one example), the player piece is in exactly one position, which does not contain a crate or box. Walls remain where they are, are never moved, and nothing else is ever in their place; and the level can be considered surrounded by walls. The number of targets is equal to the number of crates, and no crate is in a position where it can never be moved (if there is one, replace it with a wall while compiling if it is already on a target, and display an error message if it is not on a target). (I have used this method; recently, in fact.)

In the case of stroker.gb, the maximum size of levels is 8x8 (not counting the borders), and each cell inside is in one of two states, so I store each row as one byte. Currently I store the size information as two bytes, but it should be one byte, since two nybbles are enough space for that. I also compress the tile set to use one bit per pixel, and there are only fifteen tiles (digits 0 to 7 (the level numbers are displayed in octal), the "Lv" symbol, black square, white square, and two copies of the border tile (so that the code to flip the tiles can be simplified)). The program decompresses them and duplicates them using different palettes so that there are three colors (actually, shades of gray) which they can appear in during the game. I also use tile 0x9C as the tile to clear the screen with; if you know GameBoy programming, can you figure out why I have done this?
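
As a sketch of the 1bpp-to-2bpp expansion described here (the shade-to-bitplane mapping is my assumption; the GB stores each tile row as a bitplane-0 byte then a bitplane-1 byte):
Code:
/* Expand a 1bpp tile to 2bpp, painting set pixels in shade 1, 2 or 3. */
void expand_tile(const unsigned char src[8], unsigned char dst[16],
                 unsigned shade)
{
    for (int row = 0; row < 8; row++) {
        dst[row * 2]     = (shade & 1) ? src[row] : 0;  /* bitplane 0 */
        dst[row * 2 + 1] = (shade & 2) ? src[row] : 0;  /* bitplane 1 */
    }
}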

by on (#83074)
Quote:
Random access is a bit of a stretch. You'd need to store (bit-)pointers to all potential access points and make sure they aren't straddled by match/literal runs.

In most cases you'd have to store those pointers even if you left the data uncompressed.

Quote:
This is probably fine for dialogues in an RPG but it renders any sort of pointer arithmetic rather tricky, such as to display arbitrary chunks out of large two-dimensional tilemap arrays.

True. As you said, the problem comes from the fact that it's two-dimensional. So you can either do random access in one of the directions (for example, use pointers for each row OR each column of the map, although it'll end up wasting significant space), or separate the large map into screens, which are compressed separately (probably the solution I'd go with).

Quote:
There are different kinds of compression suitable for different kinds of games. In many cases you could use nybbles instead of bytes as units, I suppose.

This will work only if one type of nybble is much more frequent than the others, which is less likely to happen than with bytes, but it really all depends on the data you're using.

by on (#84377)
I've been reversing the game Shatterhand for a bit and it uses a very basic system.
There's a table that defines 'tiles' (2x2 blocks of 8-pixel tiles) and then another table that defines metatiles (2x2 tiles), and each room in the game is composed of 8x8 metatiles.

There's a byte associated with each tile which determines the tile type, i.e. solid, water, climbable, pain tile, and so on. The upper two bits are the palette for the tile.

The game doesn't require extreme variation, so it works well. Although I've seen similar stuff described on the forum before, I just thought I'd mention how this particular commercial game did it. More knowledge can't hurt.

by on (#84378)
It's always interesting to know what the various games used. The scheme used by Shatterhand (or Solbrain, which I prefer, because that tokusatsu series was actually broadcast here in Brazil back when I was a kid) seems very standard, and works well for most platformers.

I'm a little curious to know how NES games handled slopes. Even though the NES has tons of platformers, not many of them have slopes. Of course I'm more interested in how the physics handles the slopes than in how they are stored in the ROM, so this doesn't have much to do with compression! =)

by on (#84379)
@tokumaru: I guess you've read http://games.greggman.com/game/programming_m_c__kids/ already? It contains some info on how they implemented slopes.

by on (#84380)
heh, yeah, I think that many things in that game were implemented poorly, including slopes. Don't get me wrong, that M.C. Kids article is an interesting read for anyone interested in NES programming, but IMO the product itself isn't a particularly good example of a well coded game. Some of the solutions mentioned in the article are pretty rushed and far from ideal.

I just bought a cart of the game a couple of weeks ago though, so I don't think it's a terrible game, just... mediocre I guess, both as a game and as the program behind it.

by on (#84381)
I love that game. I mean, how is it badly coded? What pieces? Yeah, it was rushed, but most games were, and couldn't be finely tuned at all. Bayou Billy, anyone? Also, if anyone else had done it or had even been asked to do a McDonald's game, I'm sure it would have been way worse. I think they hit it out of the park with what they were given.

by on (#84382)
tokumaru wrote:
I just bought a cart of the game a couple of weeks ago though, so I don't think it's a terrible game, just... mediocre I guess, both as a game and as the program behind it.

Which makes it good for subdividing engines into those better than M.C. Kids and those worse than M.C. Kids. If only there were a few more such post-mortems, we could even set up a "Mohs scale" of platformer engine sophistication, analogous to those for mineral hardness, science fiction realism, rock music noisiness, and more, with more points than just "Action 52" and "M.C. Kids".

by on (#84385)
A bit off topic but do you have something public and playable yet tokumaru?

by on (#84386)
Nope... Real life sucks!

by on (#84690)
I played around with a couple different types of compression implementations in NES projects. One was an adaptation of pucrunch with a 256-byte sliding window, which enabled a number of useful optimizations. This was useful for a tracker-based music engine I was working on.

Straight-up RLE also had a number of useful properties. It provided surprisingly decent compression for typical world maps comprised of 16x16 metatiles. I split game maps into screens of 16x13 metatiles in size and separately RLE-compressed each one. The background engine was able to pull individual rows and columns from the compressed screens directly, without needing to decompress the whole thing into a buffer first, by efficiently skipping around within the compressed block. Not quite fast random access, but it worked well in this scenario.
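
A sketch of the row-extraction idea, assuming a plain (count, value) run format; it skips output bytes inside the compressed block rather than decompressing to a buffer first:
Code:
/* Copy 'want' output bytes starting 'skip' bytes into the RLE stream. */
void rle_read_span(const unsigned char *src, unsigned skip, unsigned want,
                   unsigned char *dst)
{
    unsigned count = 0, value = 0;
    while (want) {
        if (count == 0) {                  /* fetch the next run */
            count = *src++;
            value = *src++;
        }
        if (skip) {                        /* still before the span */
            unsigned n = skip < count ? skip : count;
            skip -= n; count -= n;
        } else {                           /* inside the span */
            unsigned n = want < count ? want : count;
            want -= n; count -= n;
            while (n--) *dst++ = value;
        }
    }
}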
Re: Generic compression on NES - *done*
by on (#98317)
This is a bump to say I eventually had the time to complete the project.

Download link here : http://dl.dropbox.com/u/23465629/NES_junk/CompressTools.7z

Unique features :
- Can compress data in multiple data blocks.
- Uses advanced input scripts (similar to assembly language) to define the data to be compressed
- Open source, the encoding algorithms are well commented and explained.
- Algorithms use a plugin-like system, where anyone can program their own algorithms and add them to the package
- Can output compressed data blocks to assembly language file, C header file, single binary or multiple binaries, easy to include in any project.

I made this mostly for my own use but I hope it will be useful for other people too since I took the time to make it clean and well commented.
Re: Generic compression on NES - *done*
by on (#98324)
Interesting, seems very good. Are the 6502 decompressors available anywhere?

Had a slight problem when compressing the readme.txt with the example.txt script:
Code:
C:\Users\f\Downloads\CompressTools\bin>"C:\Program Files (x86)\Java\jre7\bin\java.exe" compressTools.Main example.txt
Data parsed sucessfully !
Number of blocks : 1 Longest block : 19599 bytes.
Total size : 19599 bytes.
Minimum value : 9 Maximum value : 122
Amount of values used : 89
Largest gap of unused values : 123 to 256 of size 133

Total size after RLE encoding : 19540 bytes. Savings : 0%

Total size after BitpackRLE encoding : 19036 bytes. Savings : 2%

Total size after EscapeRLE encoding : 19412 bytes. Savings : 0%

Total size after BytePair encoding : 12275 bytes. Savings : 37%

Total size after RecursiveBytePair encoding : 10675 bytes. Savings : 45%

Total size after TinyHuff encoding : 15882 bytes. Savings : 18%

Total size after TinyHuffFixed encoding : 14277 bytes. Savings : 27%

Total size after Huffman encoding : 11728 bytes. Savings : 40%

Total size after ROM_LZ77 encoding : 15569 bytes. Savings : 20%

Total size after ROM_7bit_LZ77 encoding : 12811 bytes. Savings : 34%

Total size after RAM_LZ77 encoding : 11329 bytes. Savings : 42%

After this the program hung, eating 100% CPU. I waited for a couple of minutes until I killed the process.
Re: Generic compression on NES - *done*
by on (#98326)
Yes, this is perfectly normal.
That's because the last encoding (Static Dictionary) has a complexity of O(n^3) and performs very poorly on large data. It works fine on small data though, which is why I kept it at all.

I really wish I could fix the algorithm to be more efficient, but I have no idea how.

The algorithm itself is pretty simple : I look for "string" matches of length 16, 15, 14, ... down to 2 that are repeated at least 2 times, replace the one that saves the most bytes, and repeat.
Unfortunately the amount of time it takes on larger inputs is crazy, because there are so many possible "strings"; in an example 1 kB block, there are :
1024 - 16 + 1 = 1009 strings of 16 bytes
1010 strings of 15 bytes
...
1023 strings of 2 bytes

(a total of ~15'000 strings)
and then for all those strings I have to count how many times they occur...
The readme is 19 kB, so it takes far longer still.

Fortunately the other algorithms I implemented don't have such problems.
Re: Generic compression on NES - *done*
by on (#98327)
Whenever I need to search a big string for multiple strings, I always turn to Rolling Hash functions. Would anything like that help here?
Re: Generic compression on NES - *done*
by on (#98328)
Maybe ? I have no idea what it is. :?
Re: Generic compression on NES - *done*
by on (#98329)
Here's the quick version: (from Rsync's source code, LGPL. Somewhat based on adler32)
Code:
/* the Rollsum struct type*/
typedef struct _Rollsum {
    unsigned long count;               /* count of bytes included in sum */
    unsigned long s1;                  /* s1 part of sum */
    unsigned long s2;                  /* s2 part of sum */
} Rollsum;

/* macro implementations of simple routines */
#define RollsumInit(sum) { \
    (sum)->count=(sum)->s1=(sum)->s2=0; \
}

#define RollsumRotate(sum,out,in) { \
    (sum)->s1 += (unsigned char)(in) - (unsigned char)(out); \
    (sum)->s2 += (sum)->s1 - (sum)->count*((unsigned char)(out)+ROLLSUM_CHAR_OFFSET); \
}

#define RollsumRollin(sum,c) { \
    (sum)->s1 += ((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
    (sum)->s2 += (sum)->s1; \
    (sum)->count++; \
}

#define RollsumRollout(sum,c) { \
    (sum)->s1 -= ((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
    (sum)->s2 -= (sum)->count*((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
    (sum)->count--; \
}

#define RollsumDigest(sum) (((sum)->s2 << 16) | ((sum)->s1 & 0xffff))


Rolling hash function: not that great for resisting hash collisions, but very fast to compute, and very fast to compute incrementally.
You roll bytes in and out. So for a length of 16, you'd roll in 16 bytes, then after that, old bytes get rolled out when you roll new bytes in.
Combine this with a hash table (that uses this hash algorithm), and you can easily find which strings match what you are looking for. They must be the same length as your current window. Since hash collisions will be common, combine hash matching with a full memory comparison.
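
A usage sketch for the macros above, hashing every 16-byte window of a buffer; ROLLSUM_CHAR_OFFSET comes from rsync (31 there) and just biases each byte:
Code:
#define ROLLSUM_CHAR_OFFSET 31

void hash_windows(const unsigned char *buf, unsigned long n)
{
    Rollsum sum;
    RollsumInit(&sum);
    for (unsigned long i = 0; i < 16 && i < n; i++)
        RollsumRollin(&sum, buf[i]);                  /* prime the first window */
    for (unsigned long i = 16; i <= n; i++) {
        unsigned long h = RollsumDigest(&sum);        /* hash of buf[i-16..i)   */
        /* ...insert (h, i - 16) into a hash table here... */
        (void)h;
        if (i < n)
            RollsumRotate(&sum, buf[i - 16], buf[i]); /* slide by one byte */
    }
}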
Re: Generic compression on NES - *done*
by on (#98330)
It looks interesting, and it sounds like this hash thing is probably the optimisation I was looking for. Also I remember that Java has a built-in hash function, even though I have no idea what it is and how exactly it is used.
Sorry for sucking so much, but I'll have to study this some other day, as right now I have to go to bed.
Re: Generic compression on NES - *done*
by on (#98331)
Edit: I'm not sure that I understood dwedit's suggestion when I wrote this. I think it was a reference to the Rabin-Karp algorithm which does indeed require a rolling hash?

Original message continues below:


The generation of the hash itself need not be so optimized; a rolling hash is a fast way to make a hash, but it's not necessarily the best hash to use. The part that you need to speed up the most is the searching, for which any hash will do. Here's an alternative to a rolling hash:

Code:
input: string[n]
output: hash_strings[16][n]  // hash_strings[k][j] = hash of string[j .. j+k]

for j=0; j < n; ++j
  hash = HASH_INIT // appropriate starting value
  for k=0; k < 16; ++k
    p = j + k
    if (p >= n) break
    hash = hash_next( hash, string[p] ) // hash function that takes a byte, appends to hash
    hash_strings[k][j] = hash


Since you need to hash for multiple lengths, something like this might be just as efficient as a rolling hash, and might allow you to choose a stronger hash function. The hashing part is only O(m * n) anyway, where m is the number of string lengths you need. The real savings come from the efficient search you do next.

Once you have your hashed strings, you can build a hash table or just some sort of sort or balanced tree (either way, easy to accomplish in O(n log n)), and all your matching strings will be grouped together in the same hash bucket / contiguous sort region / branch.
Re: Generic compression on NES - *done*
by on (#98369)
thefox wrote:
Interesting, seems very good. Are the 6502 decompressors available anywhere?

I second this.
It would be really handy to have the decompression implementations for the actual NES, or 6502 in general.
Re: Generic compression on NES - *done*
by on (#98414)
Sorry, but the lack of any 6502 code was intentional.

Probably because I want to force the users to fully understand how the encodings actually work (instead of just copying some code and not understanding anything) - and because there are several ways the same data could be decoded, each one with its pros & cons, and I don't see why a particular decoding routine should be imposed on the user.

I'll have to study hash codes harder and optimize my encodings a little - currently I don't really understand how this hash thing works at all, so don't expect the revised version to come for some time.
Re:
by on (#98433)
Just now looking at this thread.

Bregalad wrote:
Quote:
I have no idea what I'm talking about. That being said, here is a solution used by Genesis homebrew developers:
https://github.com/sikthehedgehog/mdtools

Nice, but unfortunately it looks like it's specifically for graphics. I'm no expert, but a codec made for graphics might not work well with other kinds of data. I might be wrong though.

SLZ is a generic compression format, but it's LZ77 so it doesn't meet the "random access" criterion (on the Mega Drive you usually can afford to decompress entire chunks of data anyway). Also it doesn't meet the "real-time" criterion (by this I mean you probably should avoid using it in the middle of gameplay as it may cause framerate drops).

UFTC was designed for decompressing sprites on the fly, and of course it does allow random access... for tiles. The format is designed specifically for Mega Drive tiles, and moreover it was designed to play to the 68000's strengths, meaning it'd do horribly on the NES. Although I suppose you could adapt the general idea to match whatever requirements you need. It's just a glorified dictionary-based format, after all.

In practice, if you really need real-time random access your best bet is to just make a dedicated format for the specific situation (as is the deal with UFTC and sprites). Dictionaries help a lot though, since you can use them as a replacement for sliding windows. And if you wonder, no, neither of those two algorithms uses any RAM other than what's required to store the decompressed data =P
Re: Generic compression on NES - *done*
by on (#98456)
Yes, metasprites take SO much space !
It's 4 bytes per sprite, and usually there are 4-6 sprites in a metasprite -> 16-24 bytes per metasprite, and usually 2-3 frames for enemies in 4 directions; that makes between 128 and 288 bytes per enemy, not counting the several pointers needed to decode the metasprites !
Also bosses are larger, which makes this even bigger.

I tried to compress some of my enemy and player metasprite sheets; apparently it compresses about 18% with ROM LZ77, which is not super-ultra cool but better than nothing.
Re: Generic compression on NES - *done*
by on (#98457)
Sounds to me like figuring out how to reverse the sprite data would be worth it. I mean, since the hardware offers H/V sprite flips, just find the coords for each sprite and basically mirror it.
Re: Generic compression on NES - *done*
by on (#98460)
I do sprite flipping dynamically, but that's in a platformer. Other types of games, such as RPGs, usually require different sprite definitions for facing up and down, and only left/right can take advantage of flipping. If I'm not mistaken, the game Bregalad is working on has an overhead view.
Re: Generic compression on NES - *done*
by on (#98477)
If an overhead game supports 8-directional movement, northeast and southeast can be flipped to make northwest and southwest.
Re: Generic compression on NES - *done*
by on (#98629)
Don't forget, you have 3 unused bits in the attribute byte of each sprite. If you seriously needed to have compact metasprite data, here's something you can try:

A lot of times, a metasprite's sprite tiles are arranged back-to-back in a rectangle or a square:
Code:
[1][2]
[3][4]
[5][6]

If you omit the X and Y positions of the sprite tiles, you can have your metasprite routine "assume" that you want each sprite tile to be 8 pixels to the right of the previous one. That way, you only need to store a tile and an attribute byte.
Code:
[1][2][3][4] --->

One of the unused bits in the attribute byte can act as a "carriage return", placing the sprite tile on a new row 8 pixels underneath the previous one.
Code:
[1][2]! |
[3][4]! |
[5]     v

Finally, if you needed to be fancy and have overlapping sprite tiles, you can have another unused bit specify that the sprite tile comes with explicit X and Y coordinates.

If most of your sprites were simple squares/rectangles, storing your data like this would offer huge improvements on space, but it would add some overhead in your metasprite-decoding routine.
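
A C sketch of a decoder for this packed format; which unused attribute bit acts as the carriage return, and the terminator value, are arbitrary choices here:
Code:
#define ATTR_NEWLINE 0x04   /* assumed: one of the 3 unused OAM attribute bits */
#define END_MARKER   0xFF   /* assumed list terminator */

/* def = stream of (tile, attribute) pairs; oam = OAM buffer (y, tile, attr, x). */
unsigned char *draw_metasprite(const unsigned char *def,
                               unsigned char x0, unsigned char y0,
                               unsigned char *oam)
{
    unsigned char x = x0, y = y0;
    while (*def != END_MARKER) {
        unsigned char tile = *def++;
        unsigned char attr = *def++;
        *oam++ = y;
        *oam++ = tile;
        *oam++ = attr & ~ATTR_NEWLINE;   /* strip the control bit */
        *oam++ = x;
        x += 8;                          /* next sprite: 8 pixels right */
        if (attr & ATTR_NEWLINE) {       /* carriage return */
            x = x0;
            y += 8;
        }
    }
    return oam;
}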

Edit: Diagrams.
Re: Generic compression on NES - *done*
by on (#98639)
That's a pretty good idea, Drag. I once thought about saving some space by adding a second sprite decoding routine that would use the more compact rectangle representation, for simpler sprites. Each object would then call one routine or the other depending on which kind of sprites they used, but your idea is better. I already had planned something else for the unused attribute bits though (some sort of priority system), but I guess I can find another solution for that. Thanks a lot for the idea!
Re: Generic compression on NES - *done*
by on (#98655)
No problem! I'm just glad I finally had an idea that was actually helpful to someone. :P
Re: Generic compression on NES - *done*
by on (#104266)
Just wanted to say thank you for making this tool since I actually used it for something recently.

Here are some things you might want to consider changing (but are in no way critical):

* It would be nice if the tool could be called from the command line so that it would compress with just the specified algorithm, not all of them. The output filename should also be specifiable. This way the tool would fit well into build rules (such as a Makefile).

* For automatically generated labels, using e.g. "bpe_left:" instead of "bpe_left" is probably more compatible (I don't know if there are assemblers out there which CAN'T handle the ":" in there).

* Some of the output data, such as the two MinMax values of BytePair algorithm would better fit being symbols instead of actual (.db) data.

* Speaking of .db, I don't know what assembler you're using (WLA DX?), but CA65 doesn't understand .db (needs .byte), so it would be nice if this could be configured.

* For whatever reason, this doesn't work:
Code:
.define foo2 2

Output:
Code:
Error in foo.txt at line 30 :
For input string: "2 2"

.define foo1 2 works so I don't know what it's doing there.

* The mini assembler is kind of picky about whitespaces, this doesn't work:
Code:
.db 12,34
Re: Generic compression on NES - *done*
by on (#104267)
The old assemblers followed this rule:
1. Labels MUST NOT follow whitespace; they MUST be the first thing on the line. A space, not a colon, denotes a label's end.
2. Anything other than a label MUST have whitespace before it on the line.

For example, .db is an instruction and .db 12,34 at the start of a line thus violates 2.

Newer assemblers use a colon to distinguish a label from an instruction.
Re: Generic compression on NES - *done*
by on (#104268)
tepples wrote:
The old assemblers followed this rule:
1. Labels MUST NOT follow whitespace; they MUST be the first thing on the line. A space, not a colon, denotes a label's end.
2. Anything other than a label MUST have whitespace before it on the line.

For example, .db is an instruction and .db 12,34 at the start of a line thus violates 2.

Newer assemblers use a colon to distinguish a label from an instruction.

Those are not the problems in this case, though; the problem is that there's no whitespace after the comma.
Re: Generic compression on NES - *done*
by on (#104367)
Thank you very much for the feedback. I am very pleased that this project was useful to someone else !
I shall do the improvements you mention when I get the time to do it. Most of them sound pretty simple.

The reason the .db parser is so picky is that I coded it all by myself and didn't really know how to do it, so I hacked together something that worked for typical cases and didn't bother to stress test it.
I think a comma without a white space should definitely be acceptable, and I don't know why there is an error when you use foo2. Perhaps "foo" is already defined, and I replace it by its value even when it's followed by a 2, which makes no sense ?
Re: Generic compression on NES - *done*
by on (#104526)
OK I've made several of the changes you suggested, and have at least semi-valid reasons not to have changed the others :

- Now all labels are followed by : in output files. This will create weird labels ending with :: if you try to chain algorithms, but at least it will be visible that you chained algorithms that way.
- You can compress using only a single algorithm with the "fix algorithm" -fa option, followed by the number of the algorithm you want to use. I know numbers suck, but I wanted to keep it generic, so that any algorithm can be removed or added in a completely generic way. Calling them by name would have been possible but I'm lazy. In another version perhaps.
- You can force the output to the file you desire with the -o option. It's recommended to use this alongside -fa, or else you'd end up with a single file compressed with the last algorithm, each algorithm overwriting the output of the previous one.
- You can change .db into .byte and .dw into .word with the -byte option

Now what was NOT changed :
- .db is still picky about the format. The reason is that if I replaced commas with white spaces and treated white spaces as separators, then .db 12 34 would have been the same as .db 12, 34, which is a problem. I'd have to put some thought into fixing this properly, so I can't do it as just a quick fix. For a future version maybe.

- I have no idea what is causing the foo2 bug. I'd have to investigate this and release a fix.

- MinMax is still defined as data. The reason for this is that I really do a one-to-one compression: I compress a List<List<Byte>> into another List<List<Byte>>, both with associated labels. If I had to distinguish between parameters and data, it would not be a one-to-one compression, and things would not be as generic as they are now (i.e. some coding and decoding algorithms would need more parameters than just a List<List<Byte>> to work, which I did not want, because then they could not implement the same Encoding interface).

The update is temporarily available at Dropbox, and will be released on Romhacking.net very soon.

I hope this will be useful to everyone, and I am looking forward to other feedback in order to improve this tool.
Re: Generic compression on NES - *done*
by on (#104535)
Bregalad wrote:
OK I've made several of the changes you suggested, and have at least semi-valid reasons not to have changed the others :

Thanks! The ones that you didn't change weren't a big deal anyways, I can work around them easily.
Re: Generic compression on NES - *done*
by on (#104590)
OK I tried the new version, here are some further suggestions.

Bregalad wrote:
- Now all labels are followed by : in output files. This will create weird labels ending with :: if you try to chain algorithms, but at least it will be visible that you chained algorithms that way.

Yes, but then the files will be unusable without modification, so I don't think it's a good thing. This also happens if the input file has labels ending with ":" in them. So I think you should at least strip the ":" from the end of labels read by the mini assembler, or simply (as a quick fix) strip all extra ":" from the end of a label when outputting it.

Quote:
- You can compress using only a single algorithm with the "fix algorithm" -fa option, followed by the number of the algorithm you want to use. I know numbers suck, but I wanted to keep it generic, so that algorithms can be removed or added in a completely generic way. Calling them by name would have been possible, but I'm lazy. In another version, perhaps.

If -fa is used, it would be nice if it also didn't output "no_coding.asm" file.

Quote:
- You can force the output to the file you desire with the -o option. It's recommended to use this alongside -fa, otherwise you'd end up with a single file compressed with the last algorithm, each algorithm overwriting the output of the previous one.

I'd prefer if it was possible to set the extension of the file on the command line (e.g. "-o foo.s") instead of it always appending ".asm" at the end of the filename.
Re: Generic compression on NES - *done*
by on (#108264)
Here are some test results profiling Bregalad's CompressTools on actual data from my Simon's Quest Retranslation.

The methods in the first section of this table are from CompressTools version 1.1. The ones marked with plain version numbers (1.3.8, 2.3.1) are the compressors I wrote for the specified Retranslation versions.
Code:
FILE#:          0       1       2       3       4       5       6       7       9       10      11      13      14      15              16
FILE:           blank   title   ending  passwd  passwd2 menu    crPalFI crNtaFI crNtaEN srPalFI srNtaFI srNtaEN mapPAL  mapNTA  TOTAL:  +VRAM
--------------- --------------------------------------------    ----------------------- ----------------------- --------------  ------
Uncompressed:   2048    2048    1024    2048    1024    1016    32      1024    1024    32      2048    2048    32      2048    17496   74752
--------------- --------------------------------------------    ----------------------- ----------------------- --------------  ------
RLE:            32      802     531     1524    705     545     33      635     644     32      920     919     33      1266    8621    57280
BitpackRLE:     16      -       -       -       -       -       32      -       -       27      -       -       32      -       -       -
EscapeRLE:      24      803     547     1530    709     549     32      640     649     32      922     921     32      1266    8656    -
BytePair:       1026    -       685     1275    662     632     23      681     690     31      1247    1247    31      1585    -       -
RecBytePair:    22      -       469     865     386     344     21      464     484     31      682     682     31      1294    -       -
TinyHuff:       520     949     610     1409    536     567     23      621     630     27      1155    1155    29      1476    9707    -
TinyHuffFixed:  514     985     612     1449    575     577     18      682     691     24      1293    1292    26      1486    10224   -
Huffman:        -       1070    799     1450    717     734     67      721     731     53      1185    1184    56      1686    -       -
ROM_LZ77:       128     772     690     1148    542     619     32      648     669     34      1040    1036    34      923     8315    63620
ROM_7bit_LZ77   68      -       -       -       -       -       28      -       -       32      -       -       32      -       -       -
RAM_LZ77:       34      554     557     728     410     395     26      555     572     34      704     703     34      808     6114    54844
StaticDic:      145     -       524     943     500     428     24      550     565     32      799     799     32      1191    -       -
--------------- --------------------------------------------    ----------------------- ----------------------- --------------  ------
Tokumaru:       147     823     569     1663    757     559*    -       686     702     -       113     1129    -       1385    -       44472*
pucrunch**:     301     710     676     779     605     555     314     689     710     325     787     786     325     951     8513**  -
appack.exe:     31      469     435     533     343     295     43      435     460     55      519     518     54      718     4908    35196
--------------- --------------------------------------------    ----------------------- ----------------------- --------------  ------
1.3.8:          33      679     467     1487    613     484     34      616     625     33      880     879     34      525     7389    57341
2.3.1:          45      377     467     1168    532     311     21      616     625     33      870     869     34      525     6493    57312
--------------- --------------------------------------------    ----------------------- ----------------------- --------------  ------
Deflate:        16      435     418     468     324     259     20      384     397     32      492     491     703     435     4523    35054
--------------- --------------------------------------------    ----------------------- ----------------------- --------------  ------


Deflate is provided for curiosity only. Like RAM_LZ77, it is way too complex to be implemented on NES.
Tokumaru's compressor crashed on the 32-byte files, and failed to work on the 1016-byte file; increasing the size to 1024 produced the 559-byte result. It also complained that the 74-kilobyte VRAM file contained too many tiles, but hacking the source code to increase the limit from 512 tiles to 10000 produced a successful compression (I don't know whether it is decompressible).
pucrunch includes a decompressor with each compressed output, and thus has a large fixed starting cost.

From the CompressTools selection, quite a few codecs failed to apply to one or more files, and as such they cannot be called general-purpose. Also, EscapeRLE got a "verify error" for files 0 and 1 (harmless, as the data was valid).

The compression methods marked with a mere version number (1.3.8 and 2.3.1) are the compressors I wrote myself for Simon's Quest. It turns out mine is not too bad as a general-purpose compressor, though it is mostly oriented towards compressing nametables. I also profiled it on the game's VROM data (with duplicate tiles removed), and it achieved a ratio somewhere between ROM_LZ77 and RAM_LZ77.

The compression format, basically RLE with tweaks, is documented below:
Code:
        ; LIT: Input byte c = 0..3F:
        ;        Put next (c+0x01) bytes verbatim, except BACKWARDS
        ; END: Input byte c = 40:
        ;        End stream
        ; SEQ: Input byte c = 41..7F:
        ;        Read byte b
        ;        Put b, (c-0x3F) times; add 1 to b after each iteration
        ; DBL: Input byte c = 80..9F:
        ;        Read byte b1
        ;        Read byte b2
        ;        Put b1, (c-0x7D) times; swap b2 and b1 after each iteration
        ; RUN: Input byte c = A0..FF:
        ;        Read byte b
        ;        Put b, (0x101-c) times

The earlier version was almost the same, except that RUN was 80..FF and DBL did not exist.
You can find the source code of the decompressor for the 1.3.8 version here. It assembles to 92 bytes, and uses 4 bytes of zeropage for parameters. It decompresses directly into the PPU's $2007 port.
The 2.3.1 version can be found here. It assembles to 123 bytes.
You can find an example compressor here: http://bisqwit.iki.fi/src/nes-ppu_rlein ... ss.php.txt
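
To cross-check the format description above, here is a straightforward decoder sketch written from the spec (a Java illustration, not the actual 6502 decompressor):
Code:
// Decoder for the 2.3.1 format as documented above, written from the spec
// (illustration only; the real decompressor is 6502 assembly).
static byte[] decode(byte[] in) {
    java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
    int pos = 0;
    while (true) {
        int c = in[pos++] & 0xFF;
        if (c <= 0x3F) {                      // LIT: next (c+1) bytes, backwards
            for (int i = c; i >= 0; --i) out.write(in[pos + i]);
            pos += c + 1;
        } else if (c == 0x40) {               // END: end of stream
            return out.toByteArray();
        } else if (c <= 0x7F) {               // SEQ: incrementing run
            int b = in[pos++] & 0xFF;
            for (int i = 0; i < c - 0x3F; ++i) out.write((b + i) & 0xFF);
        } else if (c <= 0x9F) {               // DBL: alternating two-byte run
            int b1 = in[pos++] & 0xFF, b2 = in[pos++] & 0xFF;
            for (int i = 0; i < c - 0x7D; ++i) {
                out.write(b1);
                int t = b1; b1 = b2; b2 = t;  // swap after each iteration
            }
        } else {                              // RUN: repeat one byte
            int b = in[pos++] & 0xFF;
            for (int i = 0; i < 0x101 - c; ++i) out.write(b);
        }
    }
}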

If you wish to perform independent tests, you can download the files I used for testing here.
Re: Generic compression on NES - *done*
by on (#108266)
I'd say test out APLIB, pucrunch, and LZMA (same compression as 7-zip) as well. APLIB is fast for decompressing, since it uses a nice trick where 8-bit values are always byte aligned. Obviously not as fast as RLE, but still good.
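
To illustrate the byte-alignment idea, here is a toy decoder where control bits come from a separate shift register, so literal and offset bytes are always fetched on byte boundaries (an invented mini-format for illustration, NOT aPLib's real bitstream):
Code:
// Toy byte-aligned LZ decoder. The format is invented for illustration and
// is not aPLib's actual bitstream.
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ByteAlignedLz {
    static int bitBuf, bitCnt;

    static int getBit(InputStream in) throws IOException {
        if (bitCnt == 0) { bitBuf = in.read(); bitCnt = 8; } // refill reservoir
        return (bitBuf >> --bitCnt) & 1;                     // top bit first
    }

    static List<Byte> decode(InputStream in) throws IOException {
        List<Byte> out = new ArrayList<>();
        while (true) {
            if (getBit(in) == 0) {            // flag 0: literal, byte aligned
                out.add((byte) in.read());
            } else {                          // flag 1: back-reference
                int off = in.read();          // offset byte, byte aligned
                if (off == 0) return out;     // offset 0 ends the stream
                int len = in.read();          // length byte, byte aligned
                for (int i = 0; i < len; ++i)
                    out.add(out.get(out.size() - off));
            }
        }
    }
}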

APLIB is one of the best I've seen that has very low system requirements to decompress.

Also test out "tokumaru_tile_compression" for graphics.
Re: Generic compression on NES - *done*
by on (#108268)
LZMA actually produced a total just 10 bytes smaller than Deflate when I tested it on this data.
I should test APLIB, or rather my reimplementation thereof. I'm not sure how efficient it would be to rewrite it for the 6502, though. The decompressor is neat and tidy in 8086 assembler, but...
And I'll see about Tokumaru's project, as well. Thanks for the suggestion.

EDIT: Added Tokumaru.
APLIB's algorithm (which is LZ-based, by the way) is a bit of trouble due to its thousands of options, and the fact that the choice of those options very much determines how large the decompressor will be. My COM file packer tests those options in conjunction with the decompressor size and chooses the smallest combination. Oh, and LZ-based algorithms are trouble for the 6502 in general.
(Disclaimer: My knowledge of APLIB comes almost entirely through reverse engineering the executable compressor of aPACK.)
Re: Generic compression on NES - *done*
by on (#108277)
If pucrunch was displaying "standalone decompressor required", then it wasn't including any decompression code.
Re: Generic compression on NES - *done*
by on (#108284)
Bisqwit wrote:
APLIB's algorithm (which is LZ-based, by the way) is a bit of trouble due to its thousands of options, and the fact that the choice of those options very much determines how large the decompressor will be. My COM file packer tests those options in conjunction with the decompressor size and chooses the smallest combination. Oh, and LZ-based algorithms are trouble for the 6502 in general.
(Disclaimer: My knowledge of APLIB comes almost entirely through reverse engineering the executable compressor of aPACK.)

There's a 6502 decompressor for apLib at http://jiggawatt.org/badc0de/decrunch/a ... h_6502.asm; it works well enough (I used it in STREEMERZ). I'm not sure what compression options you're talking about; the appack.exe I've been using (from apLib's "examples" directory) didn't give me any.
Re: Generic compression on NES - *done*
by on (#108291)
Dwedit wrote:
If pucrunch was displaying "standalone decompressor required", then it wasn't including any decompression code.

It never did, unless I missed it.

thefox wrote:
There's a 6502 decompressor for apLib at http://jiggawatt.org/badc0de/decrunch/a ... h_6502.asm; it works well enough (I used it in STREEMERZ). I'm not sure what compression options you're talking about; the appack.exe I've been using (from apLib's "examples" directory) didn't give me any.

Interesting.
The compression options may have been my own addition, then. It has been years.

You can find my reimplementation of aPack, suitable for small .com files, here: http://bisqwit.iki.fi/src/apacker-com.cc
To compile it, you need a C++ compiler and a libc implementation that supports positional argument syntax for printf (e.g. %4$d). At runtime, it requires nasm.

Some tests here, using my apacker reimplementation, aPACK v1.00, and appack.exe from aPLib v1.01, on a few DOS programs I've written over the last few years. All results include the decompressor code (except the appack ones, marked *; see the note below the table).

Code:
|| What     || Original || apacker-com || aPACK || appack
|  synhili  |  3961     |  2610        |  2638  |  2759*
|  jainput  |  9679     |  5384        |  5826  |  5915*
|  inputter |  1620     |  1396        |  1403  |  1313*

Links: synhili, jainput, inputter
(Note: appack results do NOT include a decompressor. A decompressor weighs about 80-100 bytes.)

The options include, if I read my code correctly:
- Whether a byte that is just the previous byte plus an increment of +1, -1, or +2 is encoded compactly as a gamma value (a minimal gamma sketch follows this list)
- Whether x86 near jumps (JMP NEAR) are translated into absolute format or not
- Whether x86 near calls (CALL NEAR) are translated into absolute format or not
- Whether type-2 codepairs (appends) are supported or not
- 0, 1 or 2 special, efficiently encoded, choosable constant bytes (aPLib always has exactly one, which is a zero byte). Removing it extends the range of the special short-offset copy by one byte.
- Whether all literals are stored in a separate table, with literals in the actual data encoded as gamma offsets into that table.
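
For reference, "gamma" above means an Elias-gamma-style variable-length integer code. A minimal textbook sketch (the packer's actual variant packs its bits differently):
Code:
// Minimal textbook Elias-gamma encoder, returning the code as a bit string.
// For reference only; real packers emit these bits into a stream.
static String eliasGamma(int n) {                 // requires n >= 1
    String bits = Integer.toBinaryString(n);      // e.g. 5 -> "101"
    return "0".repeat(bits.length() - 1) + bits;  // 5 -> "00101"
}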
Re: Generic compression on NES - *done*
by on (#108347)
I just want to say I released version 1.2, which will be available on Romhacking.net in a day or so.

I implemented what thefox suggested (I am really sorry it took me more than 2 months to do this, especially considering it was about 1 hour of actual work, but you know, December-January-February is usually one of those super busy periods of the year), and I fixed a bug Bisqwit had found.

I want to thank you both for having helped to improve CompressTools.

Also,
Quote:
Like RAM_LZ77, it is way too complex to be implemented on NES.

RAM_LZ77 is not too complex for the NES; it just uses a lot of RAM, which would probably require external SRAM.
If the source code is modified to force 8-bit references to the sliding window, then the codec would eat only 256 bytes of RAM, which is acceptable even without external SRAM (see the sketch below). However:
1) the codec is likely to compress very poorly with only 8-bit references
2) in order to decode block N you have to decode blocks 0, 1, ..., up to N, so if N is large it can be very time consuming. It is currently the only algorithm in CompressTools that doesn't allow random access to individual blocks when decoding.
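
To illustrate, here is a sketch of decoding with just a 256-byte ring buffer plus a few variables. The token format is invented for illustration and is not the actual RAM_LZ77 stream:
Code:
// Sketch of the 8-bit-window idea: the window is a 256-byte ring buffer.
// Invented token format: 0/EOF ends; length 1 = literal; else offset follows.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TinyWindowLz {
    static void decode(InputStream in, OutputStream out) throws IOException {
        byte[] win = new byte[256];          // the entire decoder RAM
        int pos = 0, len;
        while ((len = in.read()) > 0) {
            if (len == 1) {                  // literal byte follows
                int b = in.read();
                out.write(b);
                win[pos++ & 0xFF] = (byte) b;
            } else {                         // 8-bit offset follows
                int off = in.read();
                for (int i = 0; i < len; ++i) {
                    int b = win[(pos - off) & 0xFF] & 0xFF;
                    out.write(b);
                    win[pos++ & 0xFF] = (byte) b;
                }
            }
        }
    }
}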

Quote:
The compression methods marked with a mere version number (1.3.8 and 2.3.1) are the compressors I wrote myself for Simon's Quest. It turns out mine is not too bad as a general-purpose compressor, though it is mostly oriented towards compressing nametables. I also profiled it on the game's VROM data (with duplicate tiles removed), and it achieved a ratio somewhere between ROM_LZ77 and RAM_LZ77.

This sounds interesting. It seems to work very well with nametables, since it can also encode alternating patterns like ABABAB very efficiently.
I think daisy-chaining any RLE variant over a BytePair encoding would give a very similar result.

I am worried about the verify error you got with EscapeRLE.
It means the data was encoded, then decoded again, and the result differed from the original data. At least my algorithms perform this check, instead of only encoding like some compressors do ^^

EDIT: Also, please note that to force an algorithm you now have to use names, not numbers (which makes more sense). For example, what was "-fa 0" in version 1.1 becomes "-fa RLE" in version 1.2.
Re: Generic compression on NES - *done*
by on (#108359)
Bregalad wrote:
I am worried about the verify error you got with EscapeRLE.
It means the data was encoded, then decoded again, and the result differed from the original data. At least my algorithms perform this check, instead of only encoding like some compressors do ^^

The problem with EscapeRLE appears to be with runs that are >= 129 bytes long (i.e. they encode as $80..$FF). Your decoder reads them wrong.
A minimal test case that invokes this error is a file that contains 129 bytes of NUL.
Re: Generic compression on NES - *done*
by on (#108360)
Yeah, the good old Java "bug" of forcing all bytes to be signed.
I'll fix this right away, thank you very much.
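
For the record, the fix is the usual masking idiom:
Code:
// Java bytes are signed, so a run length stored as $80..$FF reads back
// negative unless masked back to an unsigned value.
byte b = (byte) 0x81;   // encoder wrote a run length of 129
int wrong = b;          // -127: loop counts and comparisons go wrong
int right = b & 0xFF;   // 129: the mask recovers the unsigned value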
Re: Generic compression on NES - *done*
by on (#108366)
Thanks for fixing it.
I tried compiling CompressTools (1.1) using GCJ, and it compiles, but I get the following error when I run it:

Code:
block = 0
java.util.ConcurrentModificationException
   at java.util.AbstractList$3.checkMod(libgcj.so.13)
   at java.util.AbstractList$3.next(libgcj.so.13)
   at java.util.AbstractList$1.next(libgcj.so.13)
   at java.util.ArrayList.addAll(libgcj.so.13)
   at java.util.ArrayList.addAll(libgcj.so.13)
   at compressTools.RAM_LZ77.addToSlidingWin(RAM_LZ77.java:276)
   at compressTools.RAM_LZ77.DecodeBlock(RAM_LZ77.java:260)
   at compressTools.RAM_LZ77.Decode(RAM_LZ77.java:204)
   at compressTools.Main.main(Main.java:185)

This happens in the RAM_LZ77 method only, and it happens on nearly all data that I tried (but not all).

That aside, here's my contribution, RLEINC1 and RLEINC2 for CompressTools. http://bisqwit.iki.fi/src/rleinc_compresstools.zip
I would submit a patch instead, if you had a Git repository for it.
This marks the first time ever that I have written something in Java and published it.
Re: Generic compression on NES - *done*
by on (#108400)
Quote:
This marks the first time ever that I have written something in Java and published it.

Congratulations on that! Java is a really cool language; I hope you'll continue to use it for other things.

Also, thank you for your submission. The format is not described in the source, which is a problem, but you explained it very well in your post, so I guess I could just copy/paste your post into the source before including it with CompressTools officially. Is that OK with you?

Quote:
I would submit a patch instead, if you had a Git repository for it.

Well, I'm not familiar with this kind of stuff, but you're right that it would be a neat idea to have a repository if several people start to use this and it gets updated regularly. It would also allow people to contribute to the project without contacting me explicitly, and to continue the project if I die of a heart attack (it's probably the first "compression compiler" ever made; I've never heard of anything like it before).

It is funny how the project evolved, because originally I just made this to see which compression algorithms work better on which data, and eventually thefox had the idea of making it usable directly in the build process, by fixing the algorithm used and forcing the output file name.
The next logical extension would be to scrap those "header blocks" which are made of just one byte and replace them with define statements. It would be nicer; however, I did it the way it currently is in order to have a "one-to-one" compression, i.e. each algorithm compresses a bidirectional byte array into another bidirectional byte array of different dimensions. I like this orthogonal approach. However, I'd definitely have to think of a way to take the "define" approach and still keep things nice and clean. I am open to suggestions, too.

Quote:
This happens in the RAM_LZ77 method only, and it happens on nearly all data that I tried (but not all).

I don't quite understand how you managed to get that problem, because I've tried the compiler on lots of data and never ran into anything similar (as far as I can remember).
Apparently the problem is with the "addAll" function. I don't see how it can be a problem for Java, as it's always possible to add data to an existing list (i.e. no possible index-out-of-bounds exception or any crap like that).
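
From what I can tell, that exception is Java's fail-fast iterator check firing when a list is structurally modified while being iterated. A minimal demo of the general mechanism (not the actual CompressTools code path):
Code:
// Minimal illustration of how a ConcurrentModificationException arises:
// a fail-fast iterator notices the list changed underneath it.
import java.util.ArrayList;
import java.util.List;

public class CmeDemo {
    public static void main(String[] args) {
        List<Byte> win = new ArrayList<>(List.of((byte) 1, (byte) 2));
        for (byte b : win)   // the iterator records the list's modCount
            win.add(b);      // structural change -> CME on the next step
    }
}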
Re: Generic compression on NES - *done*
by on (#108413)
Bregalad wrote:
Java is a really cool language, I hope you'll continue to use it for other things.


Nah, I felt dirty enough using it for just this. ;-)


Quote:
Also, thank you for your submission. The format is not described in the source which is a problem but you explained it very well in your post so I guess I could just copy/paste your post in the source before including it with CompressTools officially. Is it ok with you ?


That is fine.

Quote:
Quote:
This happens in the RAM_LZ77 method only, and it happens on nearly all data that I tried (but not all).

I don't quite understood how you managed to get that problem, because I tried the compiler on many data and never encountered into a similar problem. (as far as I can remember)
Apparently this is a problem with the "addAll" function. I don't see how it can be a problem for java as it's always possible to add data to an existing vector (i.e. no possible index out of bounds exception or any crap like this).


I can't even begin to guess the cause of that problem, so you're on your own with this one, sorry. I just reported what I got. If it helps to know, it was a 64-bit system that I compiled it on, and this was the command line: "gcj --main=compressTools.Main -o com -g compressTools/*.java" (optimization/debugging options made no difference). gcj is version 4.7.2, and it is the default installation that comes with "apt-get install gcj-jre" on Debian testing. It also installed ecj, but I don't know for what purpose.
Re: Generic compression on NES - *done*
by on (#108689)
I just realized that appack.exe does not actually produce self-extracting executables like aPACK does. I have thus added appack.exe results to my earlier table, and added a note about this to the executable compression test table.
Re: Generic compression on NES - *done*
by on (#108711)
By the way, appack.exe results also have a 24 byte header that's not required for decompression.
Re: Generic compression on NES - *done*
by on (#115498)
Version 2.0 of CompressTools is now released!

Improvements that were made :
- Strings and chars can use any Unicode characters, and they can be mapped to bytes.
- Escape sequences are now implemented within strings.
- The StaticDictionary algorithm is much improved. It is still the slowest, especially on large data, where it can take a dozen minutes to compress; however, it is now "usable", as opposed to its previous version.
- A tool to convert plaintext into input scripts has been added.
- Single-byte blocks are no longer added for compression parameters; they are now handled as defines.

However, be careful: there are some incompatibilities with scripts from versions 1.x, notably with special characters in strings and with .map directives.
Re: Generic compression on NES - *done*
by on (#116573)
Hey, this seems pretty neat. I ran the example (which spat out some warnings about an unknown escape sequence \*), but I don't understand how I'm supposed to use the files it outputs. Where is the 6502 decompression code?
Re: Generic compression on NES - *done*
by on (#116589)
You are supposed to fully understand how the algorithm you pick works, and to write your own decompression code. It's not called an "advanced" tool for nothing :) Java decompression code is available, though.

There are multiple reasons I did it this way. First, I don't want to restrict the usage of CompressTools to the NES. Second, there is more than one way to decompress data with a given algorithm; for example, there might be a fast way that uses a lot of RAM and a slow way that uses little RAM. Finally, it's extremely hard to make 6502 ASM code generic, as it won't even port from one assembler to another, let alone be easily included in someone's project.

The generated files can, however, easily be .include'd with most assemblers, as there are options to change how the data is defined (with .db, .byte, etc.), and at worst you can always include it as binary (.incbin) with a table of pointers.
Re: Generic compression on NES - *done*
by on (#116595)
Bregalad wrote:
You are supposed to fully understand how the algorithm you pick works, and to write your own decompression code. It's not called an "advanced" tool for nothing :) Java decompression code is available, though.

There are multiple reasons I did it this way. First, I don't want to restrict the usage of CompressTools to the NES. Second, there is more than one way to decompress data with a given algorithm; for example, there might be a fast way that uses a lot of RAM and a slow way that uses little RAM. Finally, it's extremely hard to make 6502 ASM code generic, as it won't even port from one assembler to another, let alone be easily included in someone's project.

The generated files can, however, easily be .include'd with most assemblers, as there are options to change how the data is defined (with .db, .byte, etc.), and at worst you can always include it as binary (.incbin) with a table of pointers.


I guess that makes sense... Did you choose only algos that are feasible on the 6502?
Re: Generic compression on NES - *done*
by on (#116598)
Because the 6502 is Turing complete, any algorithm that is feasible on a computer is feasible on the 6502.

However, I picked algorithms that are easily feasible and that take relatively little memory.
Re: Generic compression on NES - *done*
by on (#116601)
Bregalad wrote:
Because the 6502 is Turing complete, any algorithm that is feasible on a computer is feasible on the 6502.

I'm pretty sure that by "feasible" he meant algorithms with reasonable memory consumption that won't hog the CPU for long periods of time, and that can coexist with a game engine.
Re: Generic compression on NES - *done*
by on (#116604)
Bregalad wrote:
Because the 6502 is Turing complete

Nothing is Turing complete, as a Turing machine has unbounded memory. Adding a physical limitation to a Turing machine results in a linear bounded automaton, and what you probably meant is that the 6502 is LBA-complete. This brings me to the second half of your comment:

Quote:
I picked algorithms [...] that take relatively little memory.

Even computer vs. NES is a big difference. The Apple II, Commodore 64, Atari 800, and FDS have relatively large RAM that allows fairly distant references to previously decompressed data in LZ77-style algorithms.
Re: Generic compression on NES - *done*
by on (#116610)
tepples wrote:
Bregalad wrote:
Because the 6502 is Turing complete

Nothing is Turing complete, as a Turing machine has unbounded memory. Adding a physical limitation to a Turing machine results in a linear bounded automaton, and what you probably meant is that the 6502 is LBA-complete. This brings me to the second half of your comment:

Quote:
I picked algorithms [...] that take relatively little memory.

Even computer vs. NES is a big difference. The Apple II, Commodore 64, Atari 800, and FDS have relatively large RAM that allows fairly distant references to previously decompressed data in LZ77-style algorithms.


This. But my biggest concern is ROM: can the decompression code be small enough to still yield a net saving? This project seems to presume that you're willing to use 2 or more compression algorithms for your different types of data. All the more reason to be careful about cost versus return, not only in net byte savings but in man-hours spent researching and developing the code.

And my second biggest concern is: can it use little to no RAM? I'm using extended SRAM but I'm *still* miserly about it. I'm less concerned about speed. If you're decompressing text, you're usually decompressing 20-100 characters at a time; that could even be staggered across frames so there are no hiccups. For a map, you're probably doing it between map switches. Unless you're Metroid or SMB, in which case you're using an efficient but restrictive proprietary format. In my mind, a generic compression algorithm will always either need more RAM or be slow.
Re: Generic compression on NES - *done*
by on (#116622)
I'm still a huge fan of APLIB. It is very good at what it does; it often beats ZIP. It's a compression format whose decoder relies on references to previously decompressed data, so you need enough RAM to hold the complete decompressed file, but not much more than that besides a few variables.
When you're constrained for RAM, you might not want to use compression formats that require you to hold the entire decompressed data in memory.
Re: Generic compression on NES - *done*
by on (#116624)
I was half kidding with the Turing-complete sentence. I just like to remind people who seem to think the 6502 CPU is a limitation of that fact. Just like I like to correct people who claim the 6502 can't do multiplication and division 8-)

Back on topic, @sonder, your concerns are very close to mine when I decided to create CompressTools (see my original post), because there was no tool available that came anywhere close to this.

The constraints that appear when compressing something for a NES game are completely different from the constraints for a C64 tech demo, for instance, even if both use the same CPU.

Almost all of the algorithms can decompress with almost no RAM if you divide your data into small blocks (the whole point of CompressTools was to provide this "block" system).

The blocks could represent a sentence of text, a screen of map data, a row of map data, a column of map data, a musical track, or even a 6502 routine (you'd have to pre-assemble it or write it directly in hex, though; it would be amazing to do compile-and-compress as a single step, but that would increase the complexity of CompressTools dramatically and make it too NES dependent).

With all algorithms except RAM_LZ77, you only need RAM for a single block plus several variables, in the worst case. In some cases you could even drop the single-block buffer if you access bytes sequentially, but then it's no longer pseudo-random access.

When it comes to ROM, the arrays needed for decompression are already taken into account in the compression ratio, but the decompression code is not. It should be doable in a few hundred bytes at the very worst. LZ-based algorithms are probably the most complicated of the lot to implement, so they would probably be the worst in this regard, but I made sure to implement simple variants of them.

Quote:
it often beats ZIP

(I'm just being a bastard on purpose.) ZIP is not a compression algorithm but an archive packaging format that supports multiple algorithms, chosen per file. You could have a completely uncompressed ZIP, for instance.
I doubt any ZIP compressor does the trick of trying each algorithm on each file and keeping the best one, trashing the others. Well, CompressTools does that internally for some algorithms that have variable bit sizes for things; it just uses the optimal bit size.
Re: Generic compression on NES - *done*
by on (#116666)
Bregalad, big ups for releasing this suite of compression algorithms. It was very helpful to me when I was putting together my text engine.

A few posters have asked for an example of a decompression algorithm, so I'm posting one that I'm using in my current project.

Each compression scheme best fits a different set of use cases. This is an example of a decompression routine for BytePairEncoded Recursive, which is a great scheme for large amounts of text: I was able to reduce the entire uncompressed script of Final Fantasy 6 (~167kb) to ten 8kb banks, a 53% reduction of the original size.

The BPE decompression routine included below occupies 57 bytes of ROM and uses 9 bytes of zero-page RAM. It can decompress the included example statement 0118 (expanding 56 bytes to 105 bytes) in ~2280 CPU cycles (a little less without active DPCM DMA).

Code:
; =============================================================================
;   TextEngine_LoadString: Loads a string
;   In:      Pointer to text statement to load in A (lo) and Y (hi)
;   Out:     Writes string and list of chars used by string
;   Notes:   A,X,Y are wiped out
;         9 temp bytes in zp
_TextEngine_LoadStringToMemory:
.scope
; temp variables in zp
.alias   _stringLength       $00
.alias   _numChars            $01
.alias   _Recursion           $02
.alias   _TempA               $03
.alias   _StringPtr           $04
.alias   _BPE_Left            $06
.alias   _BPE_Right           $08
; these constants are true for strings with characters from 0x00 - 0x60
.alias   _HighR               $ff
.alias   _LowR                $61
; Load the string and decompress it into RAM
   sta _StringPtr                     ; Save address of the string.
   sty _StringPtr+1                   ;   +
   lda #<[$c000-[_HighR-_LowR+1]*2]   ; Get pointer to BPE_Left
   sta _BPE_Left                      ;   |
   lda #>[$c000-[_HighR-_LowR+1]*2]   ;   |
   sta _BPE_Left+1                    ;   +
   lda #<[$c000-[_HighR-_LowR+1]]     ; Get pointer to BPE_Right
   sta _BPE_Right                     ;   |
   lda #>[$c000-[_HighR-_LowR+1]]     ;   |
   sta _BPE_Right+1                   ;   +
   ldx #$01                           ; X is index of current decoded byte
   ldy #$00                           ; Y is index of current encoded byte
   sty _Recursion                     ; Recursion = 0
   lda (_StringPtr),y                 ; Save length of encoded data
   sta _numChars                      ;   +
_getCode:
   cpy _numChars                      ; IF Y == Num Encoded Bytes
   beq _return                        ; THEN all data is decoded, return
   iny                                ; y++
   lda (_StringPtr),y                 ; A = an encoded byte
_decode:
   cmp #_LowR                         ; IF A < LowR
   bcc _writeCode                     ; THEN goto WriteCode ELSE decode byte
   sbc   #_LowR                       ; A = A - _LowR. Carry is already set.
   sta _TempA                         ; Save Y
   tya                                ;   |
   pha                                ;   |
   lda _TempA                         ;   +
   pha                                ; Save A
   tay                                ; Y = A
   inc _Recursion                     ; Recursion++
   lda (_BPE_Left),y                  ; A = BPE_LEFT[Y]
   jsr _decode                        ; Decode this byte
   pla                                ; Restore A
   tay                                ; Y = A
   lda (_BPE_Right),y                 ; A = BPE_RIGHT[Y]
   jsr _decode                        ; Decode this byte
   pla                                ; Restore Y
   tay                                ;   +
   dec _Recursion                     ; Recursion--
   beq _getCode                       ; IF (Recursion == 0) THEN get next code
_return:
   rts                                ; ELSE return to recursion thread
_writeCode:
   sta TEXT_String,x                  ; TEXT_String[x] = a
   inx                                ; x++
   lda _Recursion                     ; IF (Recursion == 0)
   beq _getCode                       ; THEN get next code
   rts                                ; ELSE return to recursion thread
(I use Ophis as my assembler, but I'm certain you could easily port this code to any other.)

Here is some test data to run the routine on. I did some pre-processing on this data before running it through Bregalad's compressor: BPE performs best when it has a large range of values to use as indexes to byte pairs, so I took the ASCII script, reduced it to only the first 97 characters (0x20-0x80), and then subtracted 0x20 from each character (this remapping step is sketched after the data below). This leaves the compression scheme 159 possible indexes ($61-$FF) for compressed byte pairs. If you're willing to get creative with reordering ASCII for your script, I'm sure it would be possible to reduce this count from 97 to 80 or 64, for even better compression results. (I use character 0x7F for '...', so this text may look like it's missing its ellipses.)

Code:
; Bank 00 contains text statements 0 through 215. 8186 bytes. ; TRUNCATED for NESDEV to first 4 statements + 0118

; Labels for statements:
.word Statement0000, Statement0001, Statement0002, Statement0003, Statement0118

Statement0000:
; World of Balance
; Length: 9 (compressed) 16 (decompressed). Charcount: 12
   .byte $09
   .byte $37, $75, $b6, $b5, $22, $94, $69, $43, $45

Statement0001:
; "The ancient War of the
; Magi When its flames at
; last receded, only the
; charred husk of a world
; remained. Even the power
; of magic was lost In the
; thousand years that
; followed, iron, gunpowder,
; and steam engines took the
; place of magic and life
; slowly returned to the
; barren land. Yet now rises
; one who would reawaken
; magic and use its dreaded
; power as a means by which
; to conquer all the world.
; Could anyone truly be
; foolish enough to repeat
; that mistake?"
; Length: 254 (compressed) 454 (decompressed). Charcount: 38
   .byte $fe
   .byte $02, $e4, $69, $43, $49, $76, $65, $37, $6c, $00, $82, $6e, $78, $ef, $d9, $7d, $37, $48, $de, $88, $68, $46, $fe, $a4, $68, $9e, $60, $fe, $53, $65, $52, $f5, $b3, $b3, $74, $6d, $4c, $59, $6e, $78, $ae, $6c, $52, $8e, $48, $92, $e5, $b5, $8b, $d1, $a8, $60, $52, $45, $a9, $64, $b3, $7f, $25, $56, $76, $9f, $50, $7a, $63, $60, $b5, $a9, $d9, $43, $00, $8d, $68, $4c, $4f, $86, $7d, $29, $4e, $6e, $78, $62, $66, $53, $c7, $59, $ee, $68, $62, $9e, $60, $46, $4f, $73, $7a, $b3, $74, $9a, $6d, $74, $47, $f2, $50, $7a, $44, $63, $0c, $c7, $86, $80, $c3, $76, $47, $64, $45, $68, $6b, $4f, $4b, $6e, $45, $50, $fe, $43, $61, $b5, $a9, $d9, $43, $00, $c7, $a5, $46, $78, $53, $4c, $7a, $4c, $77, $52, $99, $b9, $4e, $8e, $6b, $6e, $78, $42, $6c, $52, $de, $4c, $ce, $7f, $39, $e2, $d8, $00, $52, $9b, $6f, $6d, $61, $da, $95, $57, $66, $b6, $e0, $8d, $4b, $76, $60, $a9, $d9, $43, $00, $c7, $92, $61, $88, $68, $44, $e0, $44, $b3, $60, $50, $7a, $98, $41, $68, $8b, $a4, $69, $68, $42, $77, $da, $49, $ae, $60, $7c, $43, $6d, $51, $55, $98, $f7, $9f, $d1, $a8, $0e, $60, $23, $66, $b6, $69, $59, $6d, $61, $ba, $55, $4c, $77, $42, $78, $46, $4f, $4f, $4c, $9b, $48, $00, $76, $66, $91, $00, $7c, $52, $45, $50, $80, $d0, $62, $c1, $4d, $49, $86, $41, $4b, $45, $1f, $02

Statement0002:
; Wedge:
; There's the city.
; Length: 14 (compressed) 24 (decompressed). Charcount: 17
   .byte $0e
   .byte $37, $b3, $47, $45, $67, $83, $63, $45, $87, $96, $43, $88, $59, $0e

Statement0003:
; Biggs:
; Hard to believe an esper's
; been found frozen there a
; thousand years after the
; War of the Magi.
; Length: 57 (compressed) 100 (decompressed). Charcount: 29
   .byte $39
   .byte $22, $49, $47, $47, $53, $67, $28, $6c, $6a, $7c, $42, $7b, $49, $45, $90, $d2, $6f, $d4, $07, $53, $d5, $de, $46, $66, $4e, $6a, $46, $52, $4f, $5a, $76, $6e, $63, $61, $41, $60, $62, $66, $53, $c7, $59, $ee, $68, $41, $46, $54, $63, $6e, $78, $37, $6c, $00, $82, $9f, $ef, $d9, $0e

Statement0118:
; Edgar:
; I hear you've been busy
; down south, taking over a
; country or three! Just
; what is the Empire up to?
; Length: 56 (compressed) 105 (decompressed). Charcount: 30
   .byte $38
   .byte $db, $81, $48, $ee, $00, $70, $07, $90, $d5, $de, $42, $92, $cd, $44, $7a, $4e, $00, $53, $66, $62, $74, $54, $41, $4b, $89, $4f, $56, $98, $41, $60, $43, $66, $4e, $ba, $77, $75, $6e, $52, $c0, $79, $2a, $55, $86, $60, $57, $ca, $8c, $96, $25, $b1, $9a, $61, $ed, $00, $6b, $1f

; BPE_LEFT table
BPE_LEFT:
   .byte $45, $54, $45, $49, $54, $4f, $1a, $53, $41, $44, $54, $41, $4f, $00, $45, $59, $64, $48, $4c, $0c, $4f, $45, $59, $45, $01, $4f, $45, $6b, $5f, $52, $0e, $45, $29, $4f, $34, $43, $4f, $53, $07, $49, $71, $73, $41, $49, $57, $45, $70, $56, $47, $55, $41, $41, $4f, $62, $2c, $63, $45, $49, $49, $97, $45, $41, $6e, $9c, $29, $47, $07, $4d, $4c, $52, $46, $4c, $4d, $47, $a0, $48, $39, $43, $34, $af, $4d, $57, $45, $49, $82, $4c, $1f, $4b, $55, $54, $25, $bb, $53, $2b, $23, $45, $41, $33, $4d, $be, $bf, $c2, $69, $07, $4f, $72, $4b, $53, $59, $69, $70, $54, $57, $69, $42, $50, $42, $4e, $4d, $4e, $47, $57, $bc, $45, $6b, $76, $45, $52, $c5, $45, $c8, $83, $4b, $c6, $23, $91, $92, $b0, $b2, $e7, $55, $45, $2d, $4c, $45, $55, $64, $72, $45, $54, $41, $57, $62, $6d, $c4, $54, $46, $4c, $4d

; BPE_RIGHT table
BPE_RIGHT:
   .byte $00, $48, $52, $4e, $00, $55, $60, $00, $4e, $00, $4f, $52, $4e, $62, $53, $66, $47, $41, $4c, $00, $52, $4e, $00, $60, $00, $57, $4c, $00, $00, $41, $00, $41, $00, $46, $48, $4b, $4d, $54, $68, $54, $00, $00, $00, $68, $41, $6a, $00, $61, $48, $53, $42, $4c, $00, $61, $4f, $00, $54, $52, $53, $84, $46, $54, $61, $45, $07, $4f, $65, $45, $49, $49, $75, $44, $41, $6c, $67, $63, $66, $48, $63, $7e, $50, $49, $44, $44, $00, $6a, $00, $41, $52, $52, $44, $aa, $48, $9d, $7b, $45, $65, $93, $00, $b8, $6f, $64, $6a, $52, $53, $65, $61, $60, $60, $44, $52, $60, $75, $00, $61, $63, $45, $4f, $61, $7a, $49, $48, $67, $56, $60, $00, $01, $80, $67, $65, $61, $61, $00, $67, $59, $54, $65, $67, $62, $69, $50, $6c, $41, $61, $5f, $4e, $00, $90, $43, $49, $73, $61, $45, $a3, $67, $7e, $00, $41, $77
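
As a footnote, the character-remapping step described above could be sketched like this (a hypothetical helper for illustration, not my actual pre-processing tool):
Code:
// Hypothetical sketch of the pre-processing described above: keep only ASCII
// 0x20-0x80 and rebase to 0x00-0x60, freeing codes $61-$FF for byte pairs.
static byte[] rebaseScript(String script) {
    byte[] out = new byte[script.length()];
    for (int i = 0; i < script.length(); ++i) {
        char c = script.charAt(i);
        if (c < 0x20 || c > 0x80)
            throw new IllegalArgumentException("char out of range: " + (int) c);
        out[i] = (byte) (c - 0x20);   // 0x20..0x80 -> 0x00..0x60
    }
    return out;
}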
Re: Generic compression on NES - *done*
by on (#116804)
Nice, thanks for posting this. My game may or may not use a lot of text, but if it does, this might be a life saver :)