Sorry if this has been brought up before, but the search function on this board is broken (at least from here).
A friend of mine and I are waist-deep in a small RPG project, and, well, I don't see many of our goals getting accomplished without some sort of text compression.
I'm not familiar with any methods of text compression. I imagine maybe some sort of substitution method. However, I understand the limitations of an NES game. So....
What (if any) method would be viable for compressing raw text on the NES?
mbrenaman wrote:
A friend of mine and I are waist-deep in a small RPG project, and, well, I don't see many of our goals getting accomplished without some sort of text compression.
I'm not familiar with any methods of text compression.
Go look up LZ77 and LZ78 on Wikipedia.
Quote:
I imagine maybe some sort of substitution method. However, I understand the limitations of an NES game. So....
What (if any) method would be viable for compressing raw text on the NES?
A lot of games use "dual tile encoding", where common pairs of characters are represented as a single byte. This is the special case of LZ78 where the dictionary is precomputed and limited to strings of length 2. Some games use word-wise Huffman encoding, where each word is given a number and the compressed text consists of a stream of numbers.
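Tepples's description of DTE as the special case of LZ78 with a precomputed, length-2 dictionary can be sketched in a few lines of Python. The pair table here is a made-up example, not from any actual game:

```python
# A minimal sketch of dual tile encoding. The pair table is a
# hypothetical precomputed dictionary; a real game would build it from
# the script's most frequent character pairs.
PAIRS = ["th", "he", "in", "er", "e ", " t"]  # entries 0..5 -> codes $80..$85

def dte_encode(text):
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PAIRS:
            out.append(0x80 | PAIRS.index(pair))  # one byte for two characters
            i += 2
        else:
            out.append(ord(text[i]))              # plain character, bit 7 clear
            i += 1
    return bytes(out)

def dte_decode(data):
    out = []
    for b in data:
        if b & 0x80:
            out.append(PAIRS[b & 0x7F])  # expand a code to its two characters
        else:
            out.append(chr(b))
    return "".join(out)
```

With this table, `"the end"` (7 characters) encodes to 5 bytes and round-trips back to the original string.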
The only compression I've done until now is run-length encoding (which will never work on text, unless you have a lot of "heeeeeeeey", "haaaaaaaaaa", etc., but that would be stupid), and DTE, dual tile encoding, which is especially efficient for text, because the same groups of two or more letters come up QUITE regularly. Also, the space before/after a word can be DTE-encoded together with the word's first/last letter.
Finally, DTE is VERY easy to implement, unlike other compression schemes. From the programmer's viewpoint, it is almost as if no compression were used.
The only difference is that when you read the input buffer, if the value is in a certain range, instead of copying it to the output buffer you read the values from two separate tables and send those to the output buffer.
The only con is that no assembler supports DTE encoding in its ASCII tables.
Good luck for your project.
Thank you for the replies.
DTE seems simple enough, but I'd have to set up a directory of text snippets saved in individual text files, write a program (C probably) to go through all of them, and .incbin the results, plus the 2-char table used to decode the compressed text.
Quite a pain but maybe well worth the results.
Once again, thanks for the info. I have something to work from now.
Thanks.
mbrenaman wrote:
I'd have to set up a directory of text snippets saved in individual text files, write a program (C probably) to go through all of them, and .incbin the results, plus the 2-char table used to decode the compressed text.
This is the real downside, and the same applies to all other compression schemes. I have no idea how to code a C program to convert/compress data right now (but I probably will someday), and that is somewhat limiting in terms of development methods.
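For what it's worth, the PC-side conversion tool being discussed doesn't have to be complicated. Here's a rough Python sketch that counts the most common character pairs in a script and emits the two decode tables as assembler .db lines. The 128-entry limit, the `DTEfirst`/`DTEsecond` label names, and the naive overlapping pair count are all assumptions made for the example:

```python
# Sketch of a DTE table-building tool. The pair count is naive: it
# counts overlapping pairs and ignores how a greedy encoder would
# actually consume the text, so it's an approximation.
from collections import Counter

def build_dte_tables(text, max_entries=128):
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return [pair for pair, _ in counts.most_common(max_entries)]

def emit_tables(pairs):
    # one table for the first character of each pair, one for the second
    first = ", ".join("$%02X" % ord(p[0]) for p in pairs)
    second = ", ".join("$%02X" % ord(p[1]) for p in pairs)
    return "DTEfirst:  .db %s\nDTEsecond: .db %s" % (first, second)

pairs = build_dte_tables("the cat and the hat")
print(emit_tables(pairs))
```

A real tool would read the text-snippet files from disk and write the encoded strings alongside the tables; some assemblers also limit line length, so very long .db lines may need splitting.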
Bregalad wrote:
I have no idea how to code a C program to convert/compress data right now (but I probably will someday)
You could make an NES program that compresses ROM to SRAM
tepples wrote:
Bregalad wrote:
I have no idea how to code a C program to convert/compress data right now (but I probably will someday)
You could make an NES program that compresses ROM to SRAM
Ha, yeah, that's what I do. That's how my RLE compressor works. The upside is that you then have compression code you could also put in your program for.. something.
I was bored and I found this site
Code:
http://www.cs.tut.fi/~albert/Dev/pucrunch/
There's some 6502 coding going on there that has to do with compression, I'm guessing. Not sure, but I thought it might help.
Pucrunch and other LZ77-class compressors are intended for machines with more RAM than ROM, such as the Commodore 64, the Apple II, or the Famicom Disk System.
DTE is a good idea. Here's how I would do it:
Assuming that the game text in the ROM is in ASCII format (which I always use, as it allows me to simply .db "text goes here",) none of the top 128 characters are used for normal text, so we can use bit 7 of the text character as a DTE flag (I'm sure 128 DTE table entries should be all you need.)
Since bit 7 is the 6502's Negative flag, any time a byte with bit 7 set is loaded into the accumulator, the Negative flag will be set. This allows us to use a BMI to jump to a DTE decompression routine, saving us a compare instruction, and just output the character normally otherwise.
In the DTE decompression code, assuming that the DTE table is always at the same place in memory (which it should be,) we can use indexed addressing to get the appropriate values, so we will do an ASL A (more on this in a minute,) and a TAX. We do an ASL A to get rid of the now-superfluous bit 7 and multiply the DTE entry number by 2 (the size of a DTE table entry.) Thus we have our index into the DTE table.
So we can now LDA table,X to get our first character, and then output that character (whether directly or by JSRing to a special character-out routine, it doesn't matter,) then INX and LDA table,X again to get our second character, output that, and return to the normal string-printing loop.
So, if I were programming this, here's how it would look:
Code:
textout:
; This routine assumes that a two-byte pointer to the string to print is loaded into $00-$01, and destroys the contents of A, X, and Y.
; Also, it assumes that the string is zero-terminated (allowing us to check for the end of the string with a BEQ.)
LDY #$00
loop:
LDA ($00),Y ; In some assemblers, the parentheses should be replaced with brackets []
BMI doDTE
BEQ done
JSR charout
increment:
INY
BEQ fixup
JMP loop ; I don't have my 6502 manual handy, so I don't remember whether it would
; be faster to BNE loop (I don't think so.)
done:
RTS
doDTE:
ASL A
TAX
LDA DTEtable,X
JSR charout
INX
LDA DTEtable,X
JSR charout
JMP increment
fixup:
INC $01
JMP loop
Of course, I'm not the best 6502 programmer out there, and I'm sure there's something I missed that would make it even faster, but that should give you a general idea of how to do it. The bit 7 thing is of course dependent on your text characters all being in the lower half of the background tile table, so you may have to go with a different method for detecting DTE characters.
commodorejohn wrote:
We do an ASL A to get rid of the now-superfluous bit 7 and multiply the DTE entry number by 2 (the size of a DTE table entry.) Thus we have our index into the DTE table.
Speed-up tips to save yourself a few cycles:
1. Don't get rid of bit 7, displace the base address of the table 128 bytes back ( LDA DTEtable-128, X ) instead.
2. Make 2 tables, one with the first character and one with the second, so that you don't have to multiply by 2, nor INX for the second char. Just load one character from a table and the other from another table.
Not much, but this kind of speed-up can make a difference when things get bigger and more complicated.
This DTE encoding seems interesting. Does anyone know how efficient it really is? Well, I know it can never do better than 50%, since each code encodes at most 2 characters. I don't know... the idea is interesting, but doesn't seem very efficient.
I think the main point is that it's fast and easy to implement, even if it's not terribly space-reducing. And there are a number of commonly-used letter pairs in English.
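One quick way to settle the efficiency question empirically is to run a greedy DTE pass over a sample of your own script and measure the ratio. The sample string below is invented, so the number it prints is only illustrative:

```python
# Rough empirical check of how much a greedy DTE pass saves on a given
# script: build the pair table from the text itself, then walk the text
# greedily and count output bytes.
from collections import Counter

def dte_ratio(text, table_size=128):
    pairs = set(p for p, _ in Counter(
        text[i:i + 2] for i in range(len(text) - 1)).most_common(table_size))
    i = compressed = 0
    while i < len(text):
        if text[i:i + 2] in pairs:
            i += 2          # two characters become one byte
        else:
            i += 1          # plain character, one byte
        compressed += 1
    return compressed / len(text)

sample = "the knight entered the castle and the king greeted the knight"
print("compressed to %.0f%% of original" % (100 * dte_ratio(sample)))
```

On real RPG scripts the ratio depends heavily on how repetitive the text is, which matches the 50% theoretical floor mentioned above.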
There are also a number of commonly-used letter triples and quadruples, etc. They're called "words." Has anyone investigated separating the text into words, assigning a code number to each word, and then on decompression looking up the code number in a table to get characters?
tepples wrote:
There are also a number of commonly-used letter triples and quadruples, etc. They're called "words." Has anyone investigated separating the text into words, assigning a code number to each word, and then on decompression looking up the code number in a table to get characters?
I thought that's basically what Huffman compression does. I'm sure that'd work well, especially when the word is much longer than the code number that represents it. And give the most repeated ones the smallest code numbers.
Memblers wrote:
I'm sure that'd work well, especially when the word is much longer than the code number that represents it. And give the most repeated ones the smallest code numbers.
If you used 16-bit codes you could index 65536 words. That's more than enough for most languages if you ask me. And 2 bytes per word is very good, as most words are larger than 2 characters. However, defining the dictionary is what would be more space consuming, killing the efficiency of the compression.
Since it's just text though, you've at least got bit 7 free to do whatever with. So if you took 127 of the most repeated words (including leading and trailing spaces, punctuation) and had bit 7 represent that the byte is a code number, you'd have a decent amount of the script represented in single bytes. That would save memory even from the word "a" (remembering there's spacing). And common stuff like "the, of, is, it, to".
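Memblers's bit-7 word-code idea can be sketched like this. The word list (with the leading space folded in) is a made-up example, and a real tool would pick the 127 most frequent words from the actual script:

```python
# Sketch of the bit-7 word-code scheme: common words (leading space
# included) get one-byte codes with bit 7 set; everything else stays
# as plain ASCII. The word list is hypothetical.
WORDS = [" the", " of", " is", " it", " to", " a"]  # code = $81 + index

def word_encode(text):
    out = bytearray()
    i = 0
    while i < len(text):
        for n, w in enumerate(WORDS):
            if text.startswith(w, i):   # greedy: first matching word wins
                out.append(0x81 + n)
                i += len(w)
                break
        else:
            out.append(ord(text[i]))    # no word matched: plain character
            i += 1
    return bytes(out)

def word_decode(data):
    return "".join(WORDS[b - 0x81] if b & 0x80 else chr(b) for b in data)
```

For example, `"go to the store"` (15 bytes) encodes to 10 bytes, with `" to"` and `" the"` each collapsing to a single byte.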
DTE is pretty good, I don't know if this is better. Maybe if there are some big, commonly used words. Or if you reduced your charset to 64 (6-bit) and combined a dictionary with DTE. Could use DTE on the dictionary too, heheh.
Memblers wrote:
Could use DTE on the dictionary too, heheh.
Building the romhackers' nightmare, one compression layer at a time.
Well, it was either DTE or Huffman. In my situation, I wanted the tiny dictionary needed for DTE (SNROM RPG game). My dictionary of 64 entries (128 bytes) was conveniently put in the fixed bank. It was very easy to implement and I saw reductions in different text areas of about 20%-40%.
If I had a whole bank to spare, I would have dedicated it to words instead of small character pairs. But most of the banks have been dedicated to other things from the original design. If I had planned for text compression from the beginning, I probably would have gone with word compression.
I'm satisfied though, and see our game's goals being more realistic.
Thanks again for everyone's input.
DTE can be more or less efficient depending on how effective the chosen letter pairs are. I think it is very easy to implement, and quite flexible.
Using a dictionary 100% of the time wouldn't be so good, because long codes would come up often for heavily used words, and rare words would consume a lot of dictionary space. Also, it would be a pain to compile the text.
You're not stuck with ASCII (at least I think so). ASCII on the NES is a terrible waste of tiles. You can just keep your own letter mapping, and do something like this (just built as an example):
$00-$0F : Special commands in text (space, return, tab, etc...)
$10-$2F : Used for special codes such as location words, people's names, or simply common long words
$30-$7F : Simply one letter each
$80-$FF : 128 different dual-letter combinations.
That's not a bad idea. I always keep the ASCII text characters at their default location to save a conversion step (I just fill the unused characters such as $00-1F with commonly-used tiles,) but I like the idea of using unused text bytes for special purposes.
Would implementing something like bzip2 be practical on the NES?
I've never implemented the Burrows-Wheeler transform or the move-to-front transform, but they don't look like they require kilobytes of dictionaries to work properly.
BWT handles an entire block (several kilobytes) in RAM at a time. MTF takes a RAM table of up to 256 bytes, one entry for each byte value.
I was just researching those 2! I haven't tried any implementations yet, but I think that traditional BWT will use too much memory to be implemented on the NES. I wonder if a version that works on smaller blocks would provide efficient compression...
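To make the memory cost concrete, here's a naive Python sketch of both transforms on a small block. Sorting all rotations like this is fine on a PC; the point is that the whole block plus the 256-entry MTF table must sit in RAM, which is exactly the NES's problem:

```python
# Naive BWT: sort all rotations of the block, output their last
# characters plus the index of the original block (needed to invert).
def bwt(block):
    rotations = sorted(block[i:] + block[:i] for i in range(len(block)))
    return "".join(r[-1] for r in rotations), rotations.index(block)

# Move-to-front: each symbol is replaced by its position in a 256-entry
# table, and the symbol is then moved to the table's front.
def mtf(data):
    table = [chr(c) for c in range(256)]
    out = []
    for ch in data:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

transformed, origin = bwt("bananabar")
print(transformed, origin, mtf(transformed))
```

After BWT, runs of identical characters cluster together, so MTF output is dominated by small numbers, which is what the entropy coder in bzip2 then exploits.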
The "Dick Tracy" and "Magician" games use some nice variations of Huffman compression... The first encodes symbols as indexes into a symbol array arranged by usage frequency. The second uses Huffman-tree-style codes that are matched against a code table, and the index into that table is then used as the index of the symbol in an array similar to the above...
Both need pretty little memory (just reading a bit-stream and decoding)... The compression routines are pretty simple to write on a PC, and the decoding routines you can rip from the games.
The way I'd go is I'd have $Ex (I'm not sure how many x is) combinations of letters that are very common. It'd be $Ex, and not $FF because there are letters that can be defined as single letters ($40-$5A). It's obvious that you can't have DTE for every combination, because eventually you'd be using 2 bytes to define dual tiles, and we can all just see that's pretty pointless. I was thinking TTE (Triple Tile Encoding) MAY be more effective. I wouldn't have a table though, I'd have a routine that decodes every tile manually. It'd be a huge table, and I don't even know if it would make sense. It'd use 2 bytes for 3 letters. It'd be hard to work in cubes though, but I've always found working in cubes on a system such as this was interesting. It's just an idea that came to mind.
Now that I think about it, DTE is possible for 24 bits for 2 letters, as opposed to 32 bits. I'm sure that's what many of you were thinking. It'd still only cut the size down by 1/4. Which in the grand scheme of things can really help. Instead of using 8 banks for text, you can use 6. You know? It really does help to have at least SOME sort of compression. I'm going to think a little more about TTE, or maybe a different method of compression.
You know, if 64 characters is enough for all symbols, you could even try a mix of DTE and TTE. The top 2 bits could be used to define what the other 6 are: an index to a single character, an index to a pair of characters, or an index to a group of 3 characters. Of course, you'd also have only 64 entries in each table, so I really don't know if that'd perform well. And there is still one code left, but I don't know what that could be used for, since I don't think that groups of 4 characters would be very common.
Here's a thought I had today on combining word-lookup with DTE and regular ASCII.
Suppose we have each word in our text be represented by a two-byte pointer to a zero-terminated string in the ROM. We'll assume that nobody's going to store a text lookup in zero-page, so if the high byte of the pointer is 0, that signals that a DTEed and zero-terminated string follows, and the pointer itself is discarded. From there, a DTE algorithm takes over and returns control to the word-lookup algorithm when it hits the terminator character (0, not Ahnuld.) You could even have DTE in the word-table entries.
Granted, it's a little over-wrought, but I'll bet it provides pretty decent compression.
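The pointer-stream idea is easier to see in code. Here's a Python sketch of the decode side; the "ROM", the word-table address, and the message contents are all made up, and the inline strings are kept as plain ASCII (the DTE expansion step is omitted for brevity):

```python
# Sketch of the word-pointer stream: text is a sequence of two-byte
# pointers to zero-terminated strings. A high pointer byte of zero
# signals an inline zero-terminated string instead.
ROM = b"\x00" * 0x100 + b"dragon\x00sword\x00"  # word table at $0100 (made up)

def read_string(rom, addr):
    end = rom.index(0, addr)               # find the zero terminator
    return rom[addr:end].decode("ascii")

def decode_stream(stream, rom):
    out = []
    i = 0
    while i < len(stream):
        lo, hi = stream[i], stream[i + 1]  # pointers low byte first, as on the 6502
        i += 2
        if hi == 0:                        # high byte 0: inline text follows
            end = stream.index(0, i)
            out.append(stream[i:end].decode("ascii"))
            i = end + 1
        else:                              # otherwise: pointer into the word table
            out.append(read_string(rom, hi * 256 + lo))
    return "".join(out)

msg = b"\x00\x00" + b"the \x00" + bytes([0x00, 0x01])  # inline "the ", then word at $0100
print(decode_stream(msg, ROM))
```

Note the two bytes of overhead per inline string (the discarded pseudo-pointer), which is the cost of over-wrought part.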
Tokumaru, your idea looks quite interesting. I think the fourth combination could be for variable-length things, such as location words and people's names, along with special characters such as tabbing, returning, and message-ending.
Now that I think of it, you don't NEED to have only 64 symbols to use a mix of DTE and TTE (although I don't see why anyone would want to use more than 1/4 of the tileset just for text - unless when doing midscreen CHR swapping).
Numbers 0 to 127 could mean uncompressed characters, like said before. But if the 8th bit is set (> 127), that would mean compression. Half of the codes could be for strings that are 2 characters long and the other half for 3 characters long, so you have 64 of each. Or this distribution could be arranged differently, with more codes indexing pairs of characters, if this works best.
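The decode side of that split is only a range check. A Python sketch, with placeholder table contents (real tables would come from frequency analysis of the script):

```python
# Sketch of the 128/64/64 split: codes $00-$7F are plain characters,
# $80-$BF index a 64-entry pair table, $C0-$FF a 64-entry triple table.
# Table contents here are placeholders.
PAIRS   = ["th", "he", "e "] + [".."] * 61      # 64 entries, codes $80-$BF
TRIPLES = ["the", "ing", "and"] + ["..."] * 61  # 64 entries, codes $C0-$FF

def decode(data):
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))             # plain character
        elif b < 0xC0:
            out.append(PAIRS[b - 0x80])    # one byte -> two characters
        else:
            out.append(TRIPLES[b - 0xC0])  # one byte -> three characters
    return "".join(out)
```

On the 6502 the same three-way split falls out of one BMI (bit 7) and one compare against $C0, so it stays nearly as cheap as plain DTE.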
But then you don't have that "4th mode" that Bregalad talked about.
EDIT: commodorejohn had a nice idea. Combining a dictionary with DTE might work quite well.
Another possibility is using DTE codes within the DTE table itself.
That gives you a way to have single byte encodings of 3 or even 4-letter words (though it takes up more than one code, so you would only want to do it for very commonly-used words).
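Decoding such a recursive table is just the normal DTE expansion applied again to each table byte. A Python sketch with a made-up table, where entry $82 expands to $80 ("th") plus "e", and $83 to $82 plus a space:

```python
# Sketch of recursive DTE: table entries may themselves contain DTE
# codes, so one byte can expand to 3- or 4-letter strings. The table
# contents are a hypothetical example.
TABLE = {0x80: b"th", 0x81: b"er", 0x82: b"\x80e", 0x83: b"\x82 "}

def expand(b):
    if b < 0x80:
        return chr(b)                    # plain character
    # recursively expand each byte of the table entry
    return "".join(expand(x) for x in TABLE[b])

def decode(data):
    return "".join(expand(b) for b in data)
```

With this table, the single byte $83 expands to `"the "`, at the cost of $80 and $82 being spent building up to it, which is why it only pays off for very common words.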
Edit: re tokumaru's idea, is there any reason it has to be a power of 2 range? Just partition the 256 codes into whatever ranges give you the best compression, even if that means 49 single characters, 97 two-byte DTE codes and 110 three-byte codes.
The reason we've been going with powers-of-two (specifically, 128 normal characters and 128 DTE characters,) is because, when the value is loaded into the accumulator, its high bit gets copied to the status register's Negative flag (i.e. the high bit is treated as a sign bit,) allowing us to simply BMI to detect DTE entries rather than having to use compare instructions. You certainly don't have to do it that way, it just saves some cycles.
But seriously, why would you need to worry about saving a few cycles per letter in a NES game? How often do you plan to write a letter to the screen anyway? Most games even spit 'em out at a typewriter's pace for aesthetic reasons.
The only situation I can think of is if you would need to do a lot of random accesses to specified lines/positions in the text. But then, you're probably better off re-thinking the message engine itself. CPU usage ain't much of an issue in text compression for video games. Memory usage, on the other hand...
Like I said, you really don't have to if you don't want to. I did so in my example code because it was written assuming that the text characters were located at the ASCII positions in the tile table, and therefore printed text (excepting foreign-language text which uses the accented characters,) wouldn't use the top 128 tiles. Writing an LDA and a BMI was just a whole lot simpler than writing a series of compares and BEQs, especially since I was only demonstrating how DTE would work, before we got into discussions on other algorithms and combining algorithms. My method works. Your method would also work. You're right in that text display isn't processor-intensive; I just did it that way because it was easier and because I believe efficiency is always a good thing, especially when easily achieved.