*** EDIT ***
This project has now been completed. The RHDN page for it is here: http://www.romhacking.net/utilities/882/
*** Original post ***
I'd like to create tools to help with compressing generic data on the NES (or any other platform, for that matter).
ROM space is limited on the NES, which makes compression especially important. Data that could be compressed includes (but is not restricted to):
- Game level maps (contain some repetitiveness)
- Tilesets (I'm pretty sure tokumaru already released a great algorithm specifically for tilesets, but any generic algorithm should be applicable too)
- Text data (since there are only 26 letters and the entropy is predictable, it compresses very well)
- Meta-sprite data (at 4 bytes per hardware sprite, it takes a lot of space)
- Music sequence data (contains some repetitiveness / is predictable)
- 6502 code (predictable entropy)
Then there are 2 ways I can think of for data to be decompressed:
1) The obvious one: A whole "file" is compressed as a whole and decompressed as a whole. Of course, you need to use WRAM to decompress the file unless it's really small (and if it's really small, then why compress it?). So this method is useful only if you want to compress large files.
2) The useful one: A whole "file" is compressed, but you only decompress a small section of it at a time, typically less than 256 bytes.
For example, if you have a compressed script for the game, you only want to decompress one sentence at a time; if you compress a map, you only need one row or column at a time; and so on. Therefore, no WRAM is needed.
This makes for an important point: any compression algorithm used MUST allow breaks in the compressed sequence and sub-entry points, so that you can decompress just a small section of the "file" when it's needed.
Most algorithms I've seen don't take this into account. Most notably, LZ77 requires references to previously decoded data in order to work, which typically makes it unusable here.
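As a concrete illustration, here is a minimal sketch of one way data could be laid out to satisfy this; the pointer table and labels are my own hypothetical convention, not a fixed design:

Code:
MapRows:                        ; pointer table, one entry per row
    .dw Row0, Row1, Row2

; Each row is compressed independently, with no references to the
; other rows, so any single row can be decompressed on its own.
Row0:   .db $85, $24, $00              ; (arbitrary compressed bytes)
Row1:   .db $83, $11, $01, $47, $00
Row2:   .db $82, $00, $00

To fetch row N, the engine reads MapRows+N*2 into a zero-page pointer and runs the decompressor on that block alone, writing into a small fixed buffer, so no WRAM is needed.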
So, the tool I'd like to build would have to be practical, general-purpose and flexible. I'd like it to try compressing a file with multiple algorithms, and let the user choose the best one (the one that compresses best). Also, it should be easy for anyone to add their own algorithms to the list, provided they write the compression and decompression code themselves.
To be practical, the tool should work not only on binary data but also on assembly code, so that it could turn this:
Code:
Label:
.db $ab, $7f, $1b, $7a, $02, $99, $00 ; Uncompressed data
.db $ab, $7f, $1b, $7a, $02, $99, $00 ; Uncompressed data
to this:
Code:
Label:
.db $3f, $d4, $eb ; Compressed data
.db $3f, $d4, $eb ; Compressed data
(I just wrote down random data).
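For a less random illustration, here is what the correspondence could look like under a simple hypothetical RLE scheme, where $8n means "repeat the next byte n times", a byte $01-$7F means "that many literal bytes follow", and $00 marks the end of the block:

Code:
Label:
    .db $02, $02, $02, $02, $02, $02, $02, $02, $05 ; Uncompressed data

would become:

Code:
Label:
    .db $88, $02, $01, $05, $00 ; Run of 8 x $02, one literal $05, end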
This will be crucial when there are many labels inside a file, so that the program can access (and decompress) a small chunk of data within the file (instead of decompressing the whole file).
Any compression algorithm could be used as long as it meets these 3 conditions:
- The data is pseudo-randomly accessible (with labels in the middle of the file) and doesn't require references to previous data
- It doesn't require too much RAM to decompress
- It doesn't require too much CPU time to decompress
The size of the decompression code should also be taken into account when comparing different algorithms to determine the most efficient one for a particular piece of data.
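To make those conditions concrete, here is a minimal sketch of a decoder that would meet all three, using a simple byte-oriented RLE format ($00 ends the block, $01-$7F means that many literal bytes follow, $81-$FF means a run of 1-127 copies of the next byte, and $80 is left reserved). The format and the zero-page locations are my own assumptions, not a finished design:

Code:
; Caller sets up src = address of a compressed block (e.g. a label
; in the middle of the file) and dst = a small output buffer.
src = $F0            ; zero-page pointer to compressed data
dst = $F2            ; zero-page pointer to output buffer

Decompress:
    ldy #0           ; Y stays 0; we move the pointers instead
DecLoop:
    lda (src),y      ; fetch control byte
    jsr IncSrc
    cmp #0
    beq DecDone      ; $00 terminates the block
    bmi DecRun       ; $81-$FF: run follows ($80 is reserved)
    tax              ; $01-$7F: X literal bytes follow
LitLoop:
    lda (src),y
    jsr IncSrc
    sta (dst),y
    jsr IncDst
    dex
    bne LitLoop
    beq DecLoop      ; branch always (Z is set here)
DecRun:
    and #$7F         ; run length 1-127
    tax
    lda (src),y      ; byte value to repeat
    jsr IncSrc
RunLoop:
    sta (dst),y
    jsr IncDst
    dex
    bne RunLoop
    beq DecLoop      ; branch always
DecDone:
    rts

IncSrc:              ; advance the 16-bit source pointer
    inc src
    bne IncSrcEnd
    inc src+1
IncSrcEnd:
    rts
IncDst:              ; advance the 16-bit destination pointer
    inc dst
    bne IncDstEnd
    inc dst+1
IncDstEnd:
    rts

The whole routine fits in well under 64 bytes and needs only the two zero-page pointers as RAM, which is the kind of footprint that should be weighed against each algorithm's compression ratio.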