Bregalad wrote:
Quote:
On the other hand, with Huffman coding you can get random access without much trouble; I've even exploited this fact in the past to parallelize an x86 decoder. Especially if you can afford to store the raw trees directly in ROM.
It really doesn't allow random access at all. We mustn't be talking about the same thing, I guess.
What I meant was that you can start decompressing directly from any random point in a Huffman stream, as long as you've got a bit-pointer to it. The symbols are completely independent of each other, as opposed to LZ coding where any given piece of data depends on everything coded before it.
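To make that concrete, here's a toy sketch (my own made-up example, not from any real format) using the fixed prefix code 'a' = 0, 'b' = 10, 'c' = 11. Given nothing but a stored bit position into the stream, decoding can start right there:

```c
#include <stddef.h>
#include <stdint.h>

/* read bit number 'bitpos' from a big-endian bit stream */
static int read_bit(const uint8_t *buf, size_t bitpos) {
    return (buf[bitpos >> 3] >> (7 - (bitpos & 7))) & 1;
}

/* decode one symbol of the toy code and advance the bit pointer */
static char decode_symbol(const uint8_t *buf, size_t *bitpos) {
    if(read_bit(buf, (*bitpos)++) == 0)
        return 'a';
    return read_bit(buf, (*bitpos)++) ? 'c' : 'b';
}
```

For instance, the byte $D4 encodes "cabba" (bits 11 0 10 10 0): decoding from bit pointer 0 yields the whole string, while decoding from a saved bit pointer of 3 drops straight into the middle and yields "bba".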
Bregalad wrote:
I don't say Huffman is better than the LZ series, just that personally I have less trouble understanding and implementing it. In the couple of books and tutorials I've seen around, LZ seemed like such a headache. It would be all right to make a decoder, but horrible to do an encoder.
To be honest I had quite a hard time wrapping my head around Huffman coding myself, the whole tree-building thing and why it works is just plain weird ;)
Basic LZ77 coding is straightforward by comparison. The idea is to go through the file from start to end and at each byte search for the longest match (behind the cursor) to the current data. With a brute-force search you'd start at each byte before the cursor and test just how many bytes match, then keep track of the longest one found. If nothing is found, or if the longest match is still too short to save any space, then write a literal byte instead.
You've undoubtedly seen some pseudo-code already in the aforementioned tutorials but in practice an implementation might look kind of like this:
Code:
void lz(const char *begin, const char *end) {
    for(const char *cursor = begin; cursor != end; ) {
        const char *match = NULL;
        size_t match_len = 0;
        /* brute force: try every position before the cursor */
        for(const char *seek = cursor; seek-- != begin; ) {
            size_t i; /* declared out here so it survives the inner loop */
            for(i = 0; &cursor[i] != end; ++i) {
                if(seek[i] != cursor[i]) {
                    break;
                }
            }
            if(i > match_len) {
                match = seek;
                match_len = i;
            }
        }
        if(match_len < MIN_MATCH_LEN) {
            emit_literal(*cursor++);
        } else {
            emit_match(cursor - match, match_len);
            cursor += match_len;
        }
    }
}
Here I've ignored the issue of how the matches and literals are coded. There are quite a few options, and whatever scheme you choose largely dictates your compression ratio and decoder performance. A simple byte-oriented encoding might be to start each token with a signed run-length byte, where a positive value indicates a match and a negative value a literal run. Matches are then followed by a two-byte offset, and literal runs by the actual literal data.
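A decoder for that kind of byte-oriented scheme is almost trivial. Here's a rough sketch; the zero terminator and the little-endian offset order are my own assumptions, not part of any established format:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* decode a stream of byte-oriented tokens: a signed run-length byte,
   positive = match (two-byte offset follows), negative = literal run;
   assumes a length byte of zero terminates the stream */
size_t lz_decode(const uint8_t *src, uint8_t *dst) {
    uint8_t *out = dst;
    for(;;) {
        int len = (int8_t)*src++;
        if(len == 0)
            break;
        if(len > 0) {                           /* match */
            size_t off = src[0] | (size_t)src[1] << 8;
            src += 2;
            const uint8_t *from = out - off;
            while(len--)                        /* byte by byte so overlapping
                                                   matches work correctly */
                *out++ = *from++;
        } else {                                /* literal run of -len bytes */
            memcpy(out, src, (size_t)-len);
            out += -len;
            src += -len;
        }
    }
    return (size_t)(out - dst);
}
```

Note the byte-by-byte match copy: a match is allowed to overlap the output cursor (offset 3, length 6 repeats a three-byte pattern twice), which a single memcpy would get wrong.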
To achieve better compression than that, most coders resort to bit-packing things such that nearer match offsets and shorter run lengths use fewer bits. Formats like ZIP and gzip even apply Huffman coding on top.
You may think that a triple loop like this would be unbearably slow, but a naive search like this is actually what most native C64 packers use. It's fine as long as you put some limit on match lengths and match offsets, and don't try to compress a megabyte of zeros.
In the end, even with a fair bit of bit-twiddling, some simple hashing to speed up the search process, and so-called optimal parsing to get slightly better matches, my own encoder (the one I linked to above) only weighs in at 700 lines or so. Believe me, it's not exactly rocket science.
Bregalad wrote:
If someone could prove me wrong I'd be the happiest, as Huffman is really annoying with its non-byte-aligned data.
Most people on this forum probably know this old trick already, but I figured I'd repeat it once more since it's immensely useful when dealing with bit-streams. The idea is to set up an 8-bit shift register in memory from which we can fetch bits into carry with ASL/LSR instructions. The clever part is that if we shift a one into the least-significant bit whenever the register is refilled, we can detect when the register is empty simply by checking the zero flag after a shift: it'll only be zero once that one we forced in at the beginning has been shifted all the way out.
Code:
bitbuf = $00        ;zero-page shift register, initialized to $80 ("empty")
                    ;(for the LSR/ROR variant it would be $01 instead)
getbit:
    asl bitbuf      ;next bit -> carry
    bne +           ;register not empty yet -> done
    jsr getbyte     ;refill: fetch the next byte into A
    sec             ;shift the sentinel one into the bottom...
    rol             ;...and the byte's top bit into carry
    sta bitbuf
+   rts
Typically you'd inline the shift itself and BEQ to a refill routine to speed things up. When testing for a zero bit you can even do a BCC immediately, without bothering with the refill test (BEQ), since the last bit shifted out before a refill is always the sentinel one. Another neat trick is to store bit-pointers into constant data as bit-buffer states rather than as bit indices.
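For anyone who'd rather see the mechanism without the 6502 flags, here's a rough C model of the ASL variant (the struct and names are mine; in this direction the "empty" resting value works out to $80, since the register reads zero right after the sentinel has been shifted out the top):

```c
#include <stdint.h>

struct bitreader {
    const uint8_t *src;  /* next byte to fetch */
    uint8_t bitbuf;      /* start at 0x80: the first shift forces a refill */
};

int getbit(struct bitreader *r) {
    int carry = r->bitbuf >> 7;              /* ASL bitbuf: top bit -> carry */
    r->bitbuf = (uint8_t)(r->bitbuf << 1);
    if(r->bitbuf == 0) {                     /* sentinel shifted out: refill */
        uint8_t a = *r->src++;               /* JSR getbyte */
        carry = a >> 7;                      /* SEC + ROL: top bit to carry, */
        r->bitbuf = (uint8_t)(a << 1) | 1;   /* sentinel one into the bottom */
    }
    return carry;
}
```

After a refill the register holds the remaining seven data bits with the sentinel below them, so exactly eight calls consume one input byte.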
(Writing can be handled similarly of course, that is, by rotating carry bits into a shift register initialized with only the least-significant bit set and flushing the byte when the carry comes out set afterwards, but I doubt writing is of much use on the NES.)
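The write side can be modeled the same way; again this is my own sketch of the idea, with the sentinel bit acting as the fill counter:

```c
#include <stdint.h>

struct bitwriter {
    uint8_t *dst;
    uint8_t bitbuf;  /* start at 0x01: the sentinel tracks how full we are */
};

void putbit(struct bitwriter *w, int bit) {
    int carry = w->bitbuf >> 7;                      /* ROL bitbuf */
    w->bitbuf = (uint8_t)(w->bitbuf << 1) | (bit & 1);
    if(carry) {                                      /* sentinel came out:  */
        *w->dst++ = w->bitbuf;                       /* byte complete, flush */
        w->bitbuf = 0x01;
    }
}
```

After eight calls the sentinel has been rotated out the top, the register holds exactly the eight data bits, and it gets flushed and reset in one go.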