Map data and compression for scrollable map?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Map data and compression for scrollable map?
by on (#37558)
Just curious, what are you doing to store your in game level maps?
Do you have separate collision detection data and character data?
Do you have special data to tell where enemies spawn from?
Did you make your own map editing tools?

For me, I'm just using a tileset of metatiles up to 4x4 tiles each, with one attribute byte per tileset entry. The map is just a 2D array of tile indices, so a map that is 8x9 screens is about 4K for the map, and 4.25K for the tiles + attributes if I define all 256 of them.

Different ranges of my character set define their usage, e.g. solid, platform, wall, empty, ladder, etc.

For collision detection I'm debating mirroring the nametable in 1K of work RAM (MMC3) as a quick lookup table, since I'll need to test objects against the background quite a bit. I can copy into this temporary RAM buffer outside of vblank, too.

Also, since I can copy from this buffer to the nametable more efficiently, the update will take less time in vblank.
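As a rough illustration of that lookup (a minimal C sketch; the buffer and function names are made up, and a real NES version would be 6502 using shifts or a row-pointer table instead of a multiply):
Code:
/* Hypothetical sketch: find the background tile under a point, assuming a
   32x30 nametable mirror kept linearly in a 1K work-RAM buffer. */
unsigned char nametable_mirror[32 * 30];

unsigned char tile_at(unsigned char pixel_x, unsigned char pixel_y)
{
    unsigned char col = pixel_x >> 3;        /* 8x8 tiles: divide by 8 */
    unsigned char row = pixel_y >> 3;
    return nametable_mirror[row * 32 + col]; /* 32 tiles per nametable row */
}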

by on (#37560)
For my game, which contains one giant map like Metroid and the non-stage-based Castlevania games, I will have a map that points to rooms, and those rooms will point to screens. There are 1024 different possible screens, and each screen is 256x256 pixels, made up of 16 blocks of 8x8 tiles each. Each 8x8-tile block is made up of 4 4x4 blocks, which are made up of 4 2x2 blocks, which are made up of 4 single tiles. There are 256 different 8x8, 4x4, 2x2, and 1x1 definitions. I can swap in different stacks of tile definitions to make more interesting/different areas.

I plan to have different ranges in my 2x2 metatile defs to represent different tile types, like solids or solid diagonals, etc.

I haven't quite figured out the exact format of the whole map, but I know there is a 16k bank full of the different metatile definitions, 16k for the screen definitions, and I'm sure a bank or so for the room definitions.

For collision detection, I plan to have a copy of the screen in RAM that's 16x16 metatiles big, and it'll be easy to do detection here. But this is something that has always concerned me, as there are objects off screen that sometimes I want to remain active, and they can't act properly if they don't collide with anything.

To spawn enemies, I load all of them as I enter a room. They all have 16-bit X/Y coordinates, and they are always somewhat active. When they are killed, they are removed from existence so there are no annoying respawns.

I plan to make my own map editor in Visual Basic, but I haven't gotten around to it yet. I need to make this as typing in levels is a nightmare.

by on (#37564)
That's a good question. I'm researching the subject at the moment. Since my project is still not that far along, I've put that part on hold until I figure out other things about the NES.

I'm planning to write my own map editor in C# since there is no generic one that I can easily use. I'm already able to load the CHR data, but now I need to create the editor itself. After that comes the format of the data, where I'm in the same situation as you (i.e. I don't know how to approach it yet).

Once I find the right format and can make my editor, it will be quite useful. I would love to be able to make a generic one so other people could benefit from it.
Re: Map data and compression for scrollable map?
by on (#37565)
please think of Sonic games while you read this.

CrashTest wrote:
Just curious, what are you doing to store your in game level maps?

The basic building blocks are the metatiles (256 of them per zone); they have 2x2 tiles, a palette index (0 to 3), a type (solid, empty, platform, water, etc.), a height map (the height of each pixel column) and an angle.

Then the crazy compression starts: The 16x16-pixel blocks can't be directly placed in the level. They are used to form 256 32x32-pixel blocks. Those are combined to form 256 64x64-pixel blocks. Those are combined to form 256 128x128-pixel blocks. Those are combined to form 256 256x256-pixel blocks, which can be directly placed in the level map.

This is so compact that the levels can be really huge, with up to 1536 256x256-pixel blocks (such as 128x12 blocks, or 32768x3072 pixels), although there must be a decent level of block reuse for it to work well (not hard in a platformer, but it would suck for an RPG). It looks complicated, but the code that decompresses blocks is very efficient, so it's possible to read any 16x16-pixel block from the level without having to decompress anything to RAM.
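Something like the following C sketch shows the idea of that direct lookup (the array names and the 2x2-child layout per table are assumptions; the real thing is 6502 and works with pointer high/low bytes, as described later in the thread):
Code:
/* Hypothetical sketch: read one 16x16-pixel block straight from the nested
   tables, no decompression to RAM.  Each table holds 256 blocks, each block
   being a 2x2 arrangement of child indices. */
unsigned char blocks256[256][2][2];  /* entries are 128x128-block indices */
unsigned char blocks128[256][2][2];  /* entries are 64x64-block indices   */
unsigned char blocks64 [256][2][2];  /* entries are 32x32-block indices   */
unsigned char blocks32 [256][2][2];  /* entries are 16x16-block indices   */

unsigned char read_block16(unsigned char big, unsigned char x, unsigned char y)
{
    /* big = index of the 256x256 block from the level map; x, y = pixel
       position inside it (0..255).  Each level of the hierarchy consumes
       one more bit of each coordinate. */
    unsigned char b;
    b = blocks256[big][(y >> 7) & 1][(x >> 7) & 1];
    b = blocks128[b]  [(y >> 6) & 1][(x >> 6) & 1];
    b = blocks64 [b]  [(y >> 5) & 1][(x >> 5) & 1];
    return blocks32[b][(y >> 4) & 1][(x >> 4) & 1];
}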

Quote:
Do you have separate collision detection data and character data?

These are both attributes of the 16x16-pixel blocks.

Quote:
Do you have special data to tell where enemies spawn from?

My object definitions are completely separated from the level map, so that the blocks can be reused across different levels. Each object in the list has the following information: type, coordinates and an extra parameter (which each type of object uses differently). Having the objects separate from the map even allows for some layering effects, because the objects can draw tiles on top of the level map, making the basic blocks more unique, and not so obvious they are reused so much.

Quote:
Did you make your own map editing tools?

It would be next to impossible to encode my maps like that by hand, so I made a command line utility that creates all the intermediary blocks based on images of the level I want to create and an image that contains all 16x16-pixel blocks. It builds a dictionary of blocks based on that image and then tries to recognize the blocks in the other images, creating all the intermediate blocks in the process. When finished, it reports back how many blocks of each type were used, so that I know when I need to start repeating stuff more often.

by on (#37591)
I have not changed my game rendering code yet, but I have decided upon a map compression scheme. As with everything else, it is subject to change at any time.

I'm still writing the 'C' utility that will convert PNG files into compressed map data. Documentation on the map format is here [1] and the source code of the compressor program is here [2].

In the end, I suspect that each of us will implement whatever is the most efficient for our game engine to decompress and display based on our game's requirements.

I plan to decompress the map into metatiles in SRAM while the PPU is drawing the screen and then convert metatiles into PPU tiles during VBLANK. If that does not work, I'll fully decompress the map section into PPU tiles during PPU rendering and blit the required columns and rows into the PPU during VBLANK. I haven't gotten that far yet.

[1] https://www.ecoligames.com/svn/nes-game/trunk/NOTES.txt

[2] https://www.ecoligames.com/svn/nes-game/trunk/tools/map-maker/
Re: Map data and compression for scrollable map?
by on (#37599)
In my current 6502 game project, a single-directionally scrolling shoot'em up, I've tried to make things as simple as possible for the artist. That is I've got a tool which takes one big bitmap and spits out individual tile indices and attributes, and packs them using general-purpose LZ compression. Furthermore the tileset isn't fixed but is refilled automatically as the level scrolls (though that feature is still buggy.)
The idea here is that my potential level designer shouldn't have to deal with funky tilemap editors, meta-tiles or budgeting tilesets. That's not to say that you can ever forget about tiles and palettes, but I'm hoping that minimizing the amount of arbitrary limitations and being able to use standard tools will pay off in the end.
Oh, and since it's a vertical shoot'em up the background is purely decorative.

On the other hand the actual level script, that is, a timeline primarily controlling how the enemies are spawned, is very simple. There are no reused formations, or run-time scripting, or even an editor. Just a bunch of commands of the form 'spawn enemy of class A at point B', 'wait C ticks' or 'start effect D', as generated by macros through the normal assembler (thus giving you access to the assembler's somewhat clunky preprocessor facilities for compile-time scripting.) The one nice thing about it is that it's all embedded into the same LZ-compressed stream as the backgrounds, so you don't have to pay much attention to the size of the script.

Frankly I've been looking to co-opt some sort of visual editor for laying out the enemies. Any ideas? All I really need is to be able to place enemies (represented by some bitmap for each class) over the level bitmap and set the attributes controlling their behavior, plus a margin or something for miscellaneous commands. Being able to test play the current section of the level at the press of a button would be nice too.

by on (#37603)
I like having 32x32-pixel "blocks", themselves made of 4 16x16-pixel "metatiles", themselves made of 4 tiles plus color/collision information.

The day I make an RPG or SRPG I'll probably code my maps directly with 16x16 metatiles, though, for more flexibility (unless I come up with really large characters and all).

by on (#37658)
Here's my level editor; it's not finished because I haven't needed it yet. It would save to WRAM and/or the controller port (XMODEM). It should be able to save now, but it doesn't load.

http://www.parodius.com/~memblers/nes/editron/

by on (#37715)
Having a lot of experience in Romhacking made me immediately implement map compression on my first real ASM program. I used 4-bit RLE just like Dragon Warrior.

by on (#37718)
Dwedit wrote:
Having a lot of experience in Romhacking made me immediately implement map compression on my first real ASM program. I used 4-bit RLE just like Dragon Warrior.
I'm not familiar with how Dragon Warrior handles this. How do you scroll around within a compressed map efficiently?

by on (#37727)
I'd imagine that these RPGs that use RLE maps decode a row of tiles at a time and keep them in a buffer.

by on (#37733)
Dragon Warrior, in addition to the RLE-compressed map data, also stored a lookup table which maps each row to that row's start address.
So you have random access to rows, and sequential access within a row.

Dragon Warrior 4 took it one step further, and split the map vertically down the middle, so there is random access to either the first or second half of a row, presumably because DW4 needed more CPU time in the overworld than the other games.

Only DW1 had RLE runs extending beyond the boundaries of a row; that was so they could have water on the right side of a row and save a byte by letting the run continue into the water on the left side of the next row.


DW1's map format:
Code:
xxxx.... = tile number
....xxxx = length

Zelda 2 almost uses the exact same format for the overworld, just with the nibbles swapped.
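For illustration, decoding one row under that scheme might look like this in C (a sketch only: the names are made up, runs that spill across row boundaries are ignored, and the length nibble is assumed to store the run length minus one):
Code:
/* Sketch: random access to a row via a per-row pointer table, then walk the
   RLE data sequentially within the row. */
void decode_row(const unsigned char *rle_data,
                const unsigned short *row_start,  /* one offset per map row */
                unsigned row, unsigned char *out, unsigned width)
{
    const unsigned char *p = rle_data + row_start[row];
    unsigned x = 0;
    while (x < width) {
        unsigned char tile = *p >> 4;          /* xxxx.... = tile number */
        unsigned char len  = (*p & 0x0F) + 1;  /* ....xxxx = length (minus-one encoding assumed) */
        p++;
        while (len-- && x < width)
            out[x++] = tile;
    }
}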

by on (#37749)
That seems rather inefficient though. Scrolling involves decompressing up to N screens of data (where N is the width of the level), and storing the start pointers eats up precious bytes, especially considering the generally meager RLE savings.
I suppose it could be sped up with various techniques. For instance the pointer table could be generated at runtime when first entering a region. A buffer of a couple of columns could be stored on the sides of the screen, shifted horizontally relative to each other, such that you only need to decode a subset of the rows when scrolling.

But to be honest I don't see why they didn't just unpack the whole thing into memory using better compression schemes, kind of like SMB3 except with Exomizer or something instead of that funky 'vector' format. Or at least store the entire columns as tepples suggests. After all a few save games couldn't use up much of the SRAM, right?

Dwedit wrote:
DW1's map format:
Code:
xxxx.... = tile number
....xxxx = length

Zelda 2 almost uses the exact same format for the overworld, just with the nibbles swapped.
Wait, you mean there's only 16 tiles in a map?

by on (#37751)
I guess it's 16 metatiles. The overworld is very simple graphically in Dragon Warrior games, although very large.
I also use a similar format for my game, but I allow up to 32 metatiles (using 5 bits) and only runs of up to 8 of the same metatile (using 3 bits).
I have random access because I place everything in RAM (I only have 1 screen loaded at a time). I guess RPGs use SRAM to do this, but definitely not Dragon Warrior 1 & 2, because they originally shipped without SRAM in Japan.
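Unpacking one byte of such a format could look something like this (purely a sketch: which bits hold the metatile and which hold the count isn't stated above, so %mmmmmccc with a count-minus-one encoding is assumed):
Code:
/* Sketch of a 5-bit metatile / 3-bit run byte, layout assumed. */
unsigned decode_byte(unsigned char b, unsigned char *out)
{
    unsigned char tile = b >> 3;        /* top 5 bits: metatile 0..31 */
    unsigned len = (b & 0x07) + 1;      /* bottom 3 bits: run of 1..8 */
    for (unsigned i = 0; i < len; i++)
        out[i] = tile;
    return len;                         /* metatiles written */
}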

by on (#37781)
https://www.ecoligames.com/svn/FariaEdi ... rWorld.cpp

Faria uses RLE, but with three different types of runs:

1) The three most common metatiles (water, grass, forest) can have runs of 2 to 17 metatiles. Bit pattern is %11mmcccc (m = metatile index into a lookup array and c = count - 2).

2) Then there are 8 lesser types (generally horizontal boundaries between grass and other terrain types). These runs can be from 2 to 9 in count. Bit pattern is %10mmmccc.

3) Other metatiles don't have a repeat count. This is also how one would represent a single water tile. Bit pattern is %0mmmmmmm.

The overworld is 128x128 metatiles, and compresses down to 7443 bytes in size. Uncompressed it would be 16K. There are 128 possible metatiles, but only about 120 are actually used.
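A decoder for those three run types might look roughly like this in C (a sketch; the two small lookup tables for the common metatiles are placeholders):
Code:
/* Sketch of decoding the three Faria-style run types described above. */
unsigned decode(const unsigned char *in, unsigned in_len, unsigned char *out)
{
    static const unsigned char common3[4] = { 0, 1, 2, 0 };  /* water, grass, forest (placeholder indices) */
    static const unsigned char common8[8] = { 0 };           /* boundary metatiles (placeholders) */
    unsigned o = 0;
    for (unsigned i = 0; i < in_len; i++) {
        unsigned char b = in[i];
        if ((b & 0xC0) == 0xC0) {                  /* %11mmcccc: run of 2..17 */
            unsigned len = (b & 0x0F) + 2;
            while (len--) out[o++] = common3[(b >> 4) & 3];
        } else if ((b & 0xC0) == 0x80) {           /* %10mmmccc: run of 2..9 */
            unsigned len = (b & 0x07) + 2;
            while (len--) out[o++] = common8[(b >> 3) & 7];
        } else {                                   /* %0mmmmmmm: single metatile */
            out[o++] = b & 0x7F;
        }
    }
    return o;  /* number of metatiles produced */
}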

The skyworld and cave world are not RLE compressed. There are only 4 possible map tiles in each, so those maps are just stored as 4 metatiles per byte.

And I disagree that RLE compression doesn't compress enough. It is enough to fit the entire overworld into a single MMC1 or MMC3 PRG-ROM bank. That makes decompression significantly easier (and faster) to implement. Typical MMC1 bank switching (LSR, STA 5 times) requires 5*(2+4) cycles plus a JSR and RTS, about 42 CPU cycles in all.

Not shown in the ROM editor is another fact: Faria does decompress the visible portion of the overworld into temp space. It will decompress rows as needed as you scroll north and south. I don't remember if it buffers a few off-screen tiles in the east/west direction or not. If it doesn't, it would need to decompress 15 rows each time you move east or west.

by on (#37791)
I'm not sure I would even compress my tilemaps. Since all the methods described so far use sequential access for the decompression, that would add some overhead, not to mention requiring a marker system for entry points (I had to do this for a block of RLE-compressed fonts). For 8-way scrolling, anyway.

Just setting up a 16x16 metatile tilemap has some nice savings already. You'd only need 6 bits to represent any given metatile, and the upper bits could be used for the palette association of that metatile. 8 bits per 16x16 entry is pretty decent, plus the compression ratio is fixed and the data stays randomly accessible. If you broke larger tilemaps down into 256x240-pixel maps/screens, then you could compress away redundant screens that way.
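In C terms that byte layout is just (the macro names are invented here):
Code:
/* 6-bit metatile index + 2-bit palette packed into one byte per 16x16 cell. */
#define CELL(metatile, palette)  ((unsigned char)(((palette) << 6) | ((metatile) & 0x3F)))
#define CELL_METATILE(c)         ((c) & 0x3F)   /* low 6 bits: one of 64 metatiles */
#define CELL_PALETTE(c)          ((c) >> 6)     /* high 2 bits: attribute palette  */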

Some examples that I've seen in games:

Bonk's tilemap engine uses 16x16 metatiles. The palette association is hardcoded to the tile number for the stage. The overall tilemap is made up of screen-sized tilemaps, and these tilemaps are stored in vertical strips. I don't remember offhand whether the vertical strips are RLE-compressed or not (they probably are).

by on (#37792)
Crystalis does something like that. "Maps" are composed of a small M x N grid of block numbers and some palettes and music data. Each grid # represents a 16x15 screen of metatiles. This is why so much of the game looks repetitive.

by on (#37799)
clueless wrote:
Crystalis does something like that. "Maps" are composed of a small M x N grid of block numbers and some palettes and music data. Each grid # represents a 16x15 screen of metatiles.

Animal Crossing does much the same thing, building the town map out of acre-sized chunks.

by on (#37833)
clueless wrote:
And I disagree that RLE compression doesn't compress enough. It is enough to fit the entire overworld into a single MMC1 or MMC3 PRG-ROM bank. That makes decompression significantly easier (and faster) to implement. Typical MMC1 bank switching (LSR, STA 5 times) requires 5*(2+4) cycles plus a JSR and RTS, about 42 CPU cycles in all.
If it's enough, then it's enough. RLE is certainly better than nothing. *shrug*

The thing about RLE compression isn't so much that it compresses poorly as that it forces you to use a particular graphical style. This works nicely for some games, such as SMB3 with its straight lines and open spaces, but to be honest I think it makes RPGs look rather plain. A bit of dithering in the grass and the occasional flower or tree stump makes all the difference for a game's look.

Granted, a general-purpose compression algorithm is unlikely to rival the compression rates of a well-designed custom scheme with data designed to make the best use of it. But I consider the advantage in freedom for the artist, as well as not having to work out half a dozen specialized packers for different parts of the game, well worth the cost. The other drawback here is, of course, that you'd have to split up the overworld and other huge maps. Keeping the whole map in RAM is not without certain advantages though, such as being able to keep track of updated tiles or allowing NPCs to move around off-screen. Not to mention that a gigantic 128x128 map (73 screens?) could surely need a few tileset and palette changes.

As for bank-switching overhead, well, that would depend on your mapper and how the decompressor is designed. For MMC3 you can simply set up both slots to straddle the pair of banks crossed. And some compressors (my own included) have to deal with 256-byte boundaries anyway in order to support floppy streaming. Besides, anything which saves you from having to juggle half the data in the game in order to fill up banks as best as possible is a good thing.

by on (#37854)
RLE is good for lots of things, I think. For example, the overworld in Final Fantasy is probably more than 50% water. RLE compresses these rows to be two or so bytes instead of 140 in some spots.

I made an RPG map scroller that loads an RLE compressed map like this. The major disadvantage with it is that it can take FOREVER to work through. The problem is that you have to go through every individual byte to get to the tile you want. There can be no calculating, as there's no way to tell what's compressed and what's not without checking every individual byte. It can take longer than a frame to get a column of metatiles depending on where you are in the map.

by on (#37856)
For big maps that can be scrolled in any direction, recognizing long "runs" in one direction can be hard to deal with. Perhaps a quadtree subdivision method like tokumaru's might help: if a node is entirely water, entirely grass, etc., we can replace it with one byte.

by on (#37861)
It might be a little hard to make an RPG map with block metatiles. I use 64x64-pixel metatiles for my platformer, compressed similarly to tokumaru's metatiles, but RLE works better for RPG maps since the land needs more unique-looking areas. Though it'd be nice to compress blocks of water to 1 byte.

by on (#37863)
Celius wrote:
It might be a little hard to make an RPG map with block metatiles.

I figured that. I didn't mean to apply tokumaru's solution directly, but instead to decompose each square area of the map (starting at 16x16-cell areas) into four subsquares. If a subsquare is solid, stop; otherwise, subdivide it and repeat.

Perhaps it might be easier for me to express my idea by drawing a diagram of how it would be applied to an actual RPG map. Which map should I try it on?

by on (#37867)
Final Fantasy III. Or I, if it's easier to find.

by on (#37882)
tepples wrote:
Celius wrote:
It might be a little hard to make an RPG map with block metatiles.

I figured that. I didn't mean apply tokumaru's solution directly, but instead to decompose each square area of the map (starting at 16x16 cell areas) into four subsquares. If a subsquare is solid, stop; otherwise, subdivide it and repeat.
Hm. Presumably you'd pre-calculate/store a pointer to the start of each such 16x16 block for "random access." And avoid decoding partial blocks by extending the screen buffer by 16 tiles on each axis, perhaps combined with some convoluted scheme to spread out the cost of decompressing a row of blocks over multiple frames. Besides that you'd need some way of indicating whether to fill up the current subsection with a fixed pattern or further subdivide it. Like a bit vector or an escape tile.

Here's my counter-proposal. Why not do nearly the same thing, that is divide the map into 16x16 blocks with a pointer to the start of each one, but use RLE within these blocks instead? And to improve locality let's store the tiles within these blocks in an alternative order, say according to a space-filling curve (note that the rearrangement can be trivially implemented with some unrolled copying code.)
That ought to compress about as well as the subdivision approach and be quite a bit easier to implement. Well, hopefully anyway...
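One concrete curve that would do the job is Morton/Z-order (just a sketch of the idea; doynax doesn't name a specific curve here, and the Hilbert curve mentioned further down works on the same principle):
Code:
/* Map tile (x, y) inside a 16x16 block to a Z-order storage index, so that
   aligned 2x2, 4x4 and 8x8 squares occupy contiguous index ranges. */
unsigned morton_index(unsigned x, unsigned y)   /* x, y in 0..15 */
{
    unsigned index = 0;
    for (unsigned bit = 0; bit < 4; bit++) {
        index |= ((x >> bit) & 1) << (2 * bit);
        index |= ((y >> bit) & 1) << (2 * bit + 1);
    }
    return index;                               /* 0..255 */
}
At runtime nothing like this would run on the 6502; the reordering would be baked into the unrolled copy (or PLA/STA series) doynax mentions below.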

by on (#37885)
doynax wrote:
Here's my counter-proposal. Why not do nearly the same thing, that is divide the map into 16x16 blocks with a pointer to the start of each one, but use RLE within these blocks instead? And to improve locality lets store the tiles within these blocks in an alternative order, say according to a space-filling curve (note that the rearrangement can be trivially implemented with some unrolled copying code.)
That ought to compress about as well as the subdivision approach and be quite a bit easier to implement. Well, hopefully anyway...


That is what I'm doing right now in my own project. My maps are arrays of 8x8 metatile blocks. Each block is RLE compressed. Identical blocks share the same RLE sequence. I plan to buffer more than a screen's worth of blocks in SRAM.

Except that I've not thought of the 'space filling curve' thing. I'm not quite sure what you mean by that.

My goal right now is not to get the most optimal compression. My goal is to get a reasonable trade-off between size and decompression speed and jump into actual scrolling and character actions (jumping, fighting, interacting with objects, etc...)

Can you please elaborate on the 'space filling curve'? Sounds like theoretical physics :)

by on (#37889)
clueless wrote:
My goal right now is not to get the most optimal compression. My goal is to get a reasonable trade-off between size and decompression speed and jump into actual scrolling and character actions (jumping, fighting, interacting with objects, etc...)
Again, I've got to wonder why you don't just make it easy for yourself and decompress the whole thing into memory when entering a region. Together with 2x2 metatiles you'll still get more than 18 screens if you set aside half of SRAM for the map.
Just plug in Exomizer and let it do its thing and I promise you that you'll get excellent compression rates.

clueless wrote:
Can you please elaborate on the 'space filling curve'? Sounds like theoretical physics :)
There's nothing remotely scary about it.
Essentially there's no reason why you have to save the tiles within each 16x16 block in left-to-right, top-to-bottom order. And one possible set of such alternative orders are the so-called space-filling curves. Think of it as a puzzle where you have to cover every square of a grid without ever lifting your pencil.

With the right curve you can make sure that the starting point of a long RLE run is as close to the endpoint as possible, as opposed to the traditional scheme where the opposite is true. The idea here is much the same as for tepples' subdivision approach, that is to exploit the fact that four tiles in a square are (presumably) more likely to share the same pattern than four tiles on a line.

You should also notice that you don't have to work out the curve itself at runtime. Rather you can simply reorder the tiles within a decompressed block through a series of LDA/STA opcodes (or let the RLE routine push onto the stack and use a PLA/STA series.)

by on (#37898)
Quote:
My goal right now is not to get the most optimal compression. My goal is to get a reasonable trade-off between size and decompression speed and jump into actual scrolling and character actions (jumping, fighting, interacting with objects, etc...)


You could set up the decompression routine as something similar to a processing thread. You just have to make sure to decompress enough rows into RAM/a buffer that the game tilemap engine can start to read, and you would have to request the tilemap decompression at an earlier stage. It's tricky and requires knowing/calculating the max scrolling speed and some other variables. Plus, since you'd more than likely base it off a number of bytes instead of a "time slice", you'd want to calculate for an almost worst-case scenario in CPU cycles. This would lend itself nicely to LZSS or even more layered compression variants like "pucrunch", or adding Huffman to LZSS, etc.

If you limit your compression to strips of a defined horizontal or vertical size, you can build in small headers for fast skipping, i.e. speed up "seeking" in sequentially accessed streams. Add a coarser layer (a seek map/array) on top of that to really speed up the seek time. RLE works great with this approach, and "token pointer" compression also works pretty well, albeit slower than RLE. Do a variable-length control code system and use both ;)

I'm curious if anyone else has seen fixed "seeded" buffers for LZSS decompression, set up before the decompressor starts? The decompression engine knows the LZ window is primed with a series of values beforehand and can/will access them from the very start. I saw it once in a game (it was a 4k ring buffer) and it looked like it helped out for structured patterns and data like tilemaps.

Also, if anyone's interested, pucrunch sets up nicely for a ring buffer setup. The standalone decompressor is already written in 6502; you just need to make some minor modifications. Chris Covell has already adapted it for ROM-based projects, so it shouldn't be too much trouble to move it back over to the 6502 from the 6280. It works fairly decently for 256- and 512-byte ring buffers. If you don't know, a ring buffer setup allows you to decompress an LZ stream directly to an I/O port.

by on (#37913)
doynax wrote:
tepples wrote:
Celius wrote:
It might be a little hard to make an RPG map with block metatiles.

I figured that. I didn't mean apply tokumaru's solution directly, but instead to decompose each square area of the map (starting at 16x16 cell areas) into four subsquares. If a subsquare is solid, stop; otherwise, subdivide it and repeat.
Hm. Presumably you'd pre-calculate/store a pointer to the start of each such 16x16 block for "random access." And avoid decoding partial blocks by extending the screen buffer by 16 tiles on each axis

That wouldn't be entirely necessary. You might traverse the quadtree a tile at a time, much like in tokumaru's decoder.

Quote:
Besides that you'd need some way of indicating whether to fill up the current subsection with a fixed pattern or further subdivide it. Like a bit vector or an escape tile.

$00-$7F: subdivide this section and jump several nodes ahead
$80-$FF: fill this section with constant tile
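A depth-first decoder for that two-range encoding could be sketched like this in C (the "jump several nodes ahead" field that would give random access to siblings is ignored here, and the child order is an assumption):
Code:
/* Sketch: decode a quadtree node stream into a size x size square of cells.
   $00-$7F = subdivide into four children, $80-$FF = fill with tile (value & $7F).
   Returns a pointer past the consumed nodes. */
const unsigned char *decode_node(const unsigned char *p, unsigned char *map,
                                 unsigned pitch, unsigned x, unsigned y,
                                 unsigned size)
{
    unsigned char node = *p++;
    if (node >= 0x80 || size == 1) {     /* leaves are assumed to be fill bytes */
        unsigned char tile = node & 0x7F;
        for (unsigned r = 0; r < size; r++)
            for (unsigned c = 0; c < size; c++)
                map[(y + r) * pitch + (x + c)] = tile;
    } else {                             /* subdivide: TL, TR, BL, BR assumed */
        unsigned h = size / 2;
        p = decode_node(p, map, pitch, x,     y,     h);
        p = decode_node(p, map, pitch, x + h, y,     h);
        p = decode_node(p, map, pitch, x,     y + h, h);
        p = decode_node(p, map, pitch, x + h, y + h, h);
    }
    return p;
}
Calling decode_node(data, map, 16, 0, 0, 16) would fill one 16x16-cell area.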

Quote:
Here's my counter-proposal. Why not do nearly the same thing, that is divide the map into 16x16 blocks with a pointer to the start of each one, but use RLE within these blocks instead?

You just described how Animala stores ACWW towns for copying them between the PC and DS.

Quote:
And to improve locality lets store the tiles within these blocks in an alternative order, say according to a space-filling curve (note that the rearrangement can be trivially implemented with some unrolled copying code.)

RLE on a space-filling curve, such as the Hilbert curve, tends to look a lot like a quadtree.

Quote:
Again, I've got to wonder why you don't just make it easy for yourself and decompress the whole thing into memory when entering a region. Together with 2x2 metatiles you'll still get more than 18 screens if you set aside half of SRAM for the map.

That's still a lot of SRAM. I'm trying to explore different points on the time-space tradeoff, especially for projects that can't use much more than the 2048 bytes in the NES itself.

by on (#37917)
tepples wrote:
You might traverse the quadtree a tile at a time, much like in tokumaru's decoder.

You know, I intentionally worked with quadtrees before for CHR compression, but I didn't realize I was doing it with my map compression! O_o

Although I don't take advantage of completely filled blocks in my engine (they all must have 4 sub-blocks), I could easily add a flag for each square, indicating if that square uses a sub-block or is simply filled with a single type of metatile.

And yeah, you can easily read rows and columns from this map on the fly, without having to decompress a whole screen. All you need to do is set up the high bytes of the pointers to each type of block. Once you have the index of the huge (256x256 pixels) block, you use it as the lower byte of the addresses that point to the next set of blocks (128x128).

When rendering rows or columns, you can be sure that only 2 sub-blocks of each block will be read (for rows it's either the top 2 or the bottom 2, and for columns it's either the left 2 or the right 2). So I need 2 pointers for each type of block (256x256, 128x128, 64x64, 32x32).

Here are some details about my decompression method:

Code:
Metatiles are 16x16 pixels. The largest block is 16x16 metatiles, or 256x256 pixels. It is repeatedly divided into 4 pieces, each divided into 4 others, and so on until the metatiles are reached.

How to decode a 256x256 block:

X Coordinate (used for decoding columns): DCBAPPPP

PPPP: pixel coordinates, not used for decoding the map;
A: left or right of the 32x32 block;
B: left or right of the 64x64 block;
C: left or right of the 128x128 block;
D: left or right of the 256x256 block;

Y coordinate (used for decoding rows): DCBAPPPP
PPPP: pixel coordinates, not used for decoding the map;
A: top or bottom of the 32x32 block;
B: top or bottom of the 64x64 block;
C: top or bottom of the 128x128 block;
D: top or bottom of the 256x256 block;

These bits can be shifted right into the high bytes of the pointers. This will be handled differently depending on whether you're rendering rows or columns, but once the pointers are ready the same code can be used to read rows and columns. The necessary pointers are the following:

Block256A, Block256B: 2 128x128 blocks of the 256x256 block;
Block128A, Block128B: 2 64x64 blocks of the 128x128 block;
Block64A, Block64B: 2 32x32 blocks of the 64x64 block;
Block32A, Block32B: 2 metatiles of the 32x32 block;

The pointers are reused as the parent blocks are completely decoded. The code looks basically like this:

   ;decode 128x128 first block
   lda (Block256A), y ;get the index of the first 128x128 block
   sta Block128A+0 ;use it to read the first 128x128 block
   sta Block128B+0 ;use it to read the second 128x128 block

   ;DECODE 128x128 BLOCK HERE

   ;THE FIRST BLOCK WAS DECODED, REUSE POINTERS TO DECODE THE NEXT ONE!

   ;decode 128x128 second block
   lda (Block256B), y ;get the index of the second 128x128 block
   sta Block128A+0 ;use it to read the first 128x128 block
   sta Block128B+0 ;use it to read the second 128x128 block

   ;DECODE 128x128 BLOCK HERE

Y must be loaded with 0, because we don't use it (although it is used when reading single metatiles for collision purposes, because it's faster). This code is repeated for each block. The blocks are read from the largest to the smallest, and 2 of each type are processed. You can see that the index taken from the parent block is used to read the indexes of 2 child blocks, until the metatiles are reached, and they are read 2 at a time into a buffer.


This is what I do when scrolling, and for the amount of compression I get I'd say it's not that complicated or slow. With little modification, the blocks could be flagged as "compressed blocks", in which case the index read would not point to a child block, but to a metatile that is used to fill the entire block. This would prevent one from wasting blocks if there are no details inside them.

Thanks tepples, for suggesting this expansion. I don't know if I'll use it in my current project, but it's good to keep in mind that there is an interesting compression method that offers somewhat random access to the data.

by on (#37918)
tomaitheous wrote:
I'm curious if anyone else has seen fixed "seeded" buffers for LZSS decompression, before the decompressor starts? The decompression engine knows the LZ window is primed with a series of values before hand and can/will access them from the very start. I saw it once in a game (it was a 4k ring buffer) and looked like it helped out for structured patterns and data like tilemaps.
You could use the LZ78/LZW family of algorithms with fixed dictionaries, as opposed to LZ77/LZSS sliding-window variants.
To get fine-grained random access to a large map you probably want to share a fixed dictionary in ROM between all of the blocks anyway, as small 256-byte (or thereabouts) blocks are too small for LZ77 to function effectively.

tomaitheous wrote:
Also, if anyone's interested, pucrunch sets up nicely for a ring buffer setup. The standalone decompresser is already written in 6502, you just need to make some minor modifications. Chris Covell has already adapted it for rom based projects, shouldn't be to much trouble to move it back over to 6502 from 6280. It works fairly decent for 256 and 512 byte ring buffers. If you don't know, a ring buffer setup allows you to decompress an LZ stream directly to an I/O port.
I offer my sliding-window compressor as well, if anyone is interested. It decompresses quite a bit faster than Pucrunch or Exomizer and gets compression rates somewhere between the two.
Though, again, if all you can spare is a 256-512 byte buffer then a fixed dictionary is probably a better solution.

tepples wrote:
doynax wrote:
Hm. Presumably you'd pre-calculate/store a pointer to the start of each such 16x16 block for "random access." And avoid decoding partial blocks by extending the screen buffer by 16 tiles on each axis
That wouldn't be entirely necessary. You might traverse the quadtree a tile at a time, much like in tokumaru's decoder.
I don't follow. Are you talking about decoding individual tiles within the block without having to process the whole thing?
You'd need to put in pointers to the internal section to do that.

tepples wrote:
doynax wrote:
And to improve locality lets store the tiles within these blocks in an alternative order, say according to a space-filling curve (note that the rearrangement can be trivially implemented with some unrolled copying code.)
RLE on a space-filling curve, such as the Hilbert curve, tends to look a lot like a quadtree.
Well, yeah, that was the general idea ;)

by on (#37925)
Quote:
I offer my sliding-window compressor as well, if anyone is interested. It decompresses quite a bit faster than Pucrunch or Exomizer and gets compression rates somewhere between the two.


I'm always on the look out for different compression schemes and variants :D Got a link?

by on (#37928)
tomaitheous wrote:
Quote:
I offer my sliding-window compressor as well, if anyone is interested. It decompresses quite a bit faster than Pucrunch or Exomizer and gets compression rates somewhere between the two.

I'm always on the look out for different compression schemes and variants :D Got a link?
I just took another look at it and it's not trivially ROMable. I'd have to put some thought into it first to come up with a reasonably fast scheme, sorry. Anyone thinking about doing some FDS development?

I've been thinking about writing an LZW coder instead. I don't remember seeing one for the 6502 before, presumably because it wastes memory on an RAM-based system.

http://doynax.googlepages.com/lz.zip (some credit should go to ByteBoozer which the format is based on)

by on (#37929)
doynax wrote:
I've been thinking about writing an LZW coder instead. I don't remember seeing one for the 6502 before, presumably because it wastes memory on an RAM-based system.

LZ78/LZW need extra memory just for the dictionary, while LZ77/LZSS make direct copies of previously decompressed data. I doubt any 6502 system has enough RAM to implement a LZW dictionary.

by on (#37932)
tokumaru wrote:
doynax wrote:
I've been thinking about writing an LZW coder instead. I don't remember seeing one for the 6502 before, presumably because it wastes memory on an RAM-based system.

LZ78/LZW need extra memory just for the dictionary, while LZ77/LZSS make direct copies of previously decompressed data. I doubt any 6502 system has enough RAM to implement a LZW dictionary.
The point here is that the dictionary can be stored in ROM rather than RAM. So you could unpack directly to CHR-RAM without a buffer on the NES-side. The other advantage is that you can decompress small blocks in the middle of a file without decompressing everything up to that point, or splitting it up into independent blocks, as with LZ77.

Then again I've never actually implemented such a coder, so just how effective it would be remains to be seen.

by on (#37934)
tokumaru wrote:
LZ78/LZW need extra memory just for the dictionary

So does any game engine that uses metatiles, including yours.

tokumaru wrote:
I doubt any 6502 system has enough RAM to implement a LZW dictionary.

In other words, you doubt that giffy.prg, a C64 GIF viewer, works.

by on (#37935)
tepples wrote:
So does any game engine that uses metatiles, including yours.

I meant RAM, lots of RAM.

tepples wrote:
In other words, you doubt that giffy.prg, a C64 GIF viewer, works.

Yeah, maybe I was a little extreme. Once I tried to create an LZW decoder (not 6502-related at all) and I managed to make the dictionary use around 20KB. Maybe the C64 can handle that... doesn't it have 64KB of RAM?

I haven't looked into LZW in a while, but I guess it may be possible to make something even for the NES, if the amount of data is kept within a limit or something, to control how large the dictionary can get.

Having a static dictionary in ROM is pretty weird for LZW...

by on (#37937)
But I do know of a couple dictionary-based codecs that use a static dictionary. One is metatiles. Another is the text codec "huffword", which first encodes each possible sequence of non-whitespace characters to an integer and then uses an entropy coder (e.g. Huffman) on those integers.

by on (#37940)
tokumaru wrote:
tepples wrote:
So does any game engine that uses metatiles, including yours.
Having a static dictionary in ROM is pretty weird for LZW...
Ah, it appears that LZ78/LZW still dynamically rebuild the dictionary at runtime. I simply meant a static dictionary coder, whatever the proper term for that may be.

Whether a static dictionary method can ever hope to compete with the dynamic ones is another matter. Presumably someone, somewhere, has researched this already and I just have to figure out the magic Google keywords.

by on (#37946)
Quote:
I simply meant a static dictionary coder, whatever the proper term for that may be.


I've seen similar token-style setups like that. A string is replaced with a token that points to the source location / dictionary instance (in ROM). The first instance in memory has a terminator at the end of it. On a normal read, the terminator is ignored. On a token call, the terminator ends the call. If set up correctly, you can get full random access and all without the need for a separate buffer. Though you'll need a marker/position map to know where the offset starts for the specific strings you need to read out.

It's faster than LZ variants and doesn't necessarily require a temp buffer. Though it doesn't have the advantage of I/O to I/O decompression that LZSS variants offer.

by on (#37954)
As far as I know, LZ-series compression is quite complex: not too hard to decode but very hard to encode. Also it's not too efficient unless you have a lot of data repeating itself, which isn't likely to be the case in data that is already organized to be compact.

Huffman is more intuitive and easier to understand (for me at least), and it's not too hard to encode and decode (but it's not really easy either). All it needs to decompress is an array of pointers into ROM. If the compressed data is smaller than the original data by an amount greater than the array's size, then it's worth implementing I guess.

by on (#37956)
Bregalad wrote:
LZ series compression are quite complex, not too hard to decode but very hard to encode.

You should know that LZ77 is way different than LZ78, so you can't really put them in the same "series". The compression in both schemes is not necessarily hard, it only gets more complex as you try to produce optimal compressed streams. The basic implementation is pretty simple.

Quote:
Also they're not too efficient unless you have lot of data repeating itself, which isn't likely to be the case in data that is already organized to be compact.

But that's the case with pretty much every compression scheme. Most won't perform well on data that is already compressed in some other way.

Quote:
Huffman is more intuitive and easier to understand (for me at least), and it's not too hard to encode and decode (but it's not really easy either).

IMO, the gains from Huffman compression are very modest. This is why it's rarely used by itself, while other schemes (mostly dictionary-based) are (were?) often used by themselves. Huffman is usually used to get that last drop of compression you can squeeze out of data that was already compressed in some other way.

Quote:
If compressed data is smaller than original data by an amount greater than the array's size, then it's work implementing I guess.

...and there's still the major overhead that is the table needed for decoding. There is an adaptive variation of huffman that doesn't need a table, but it'd probably be hard to implement in systems with little RAM.

by on (#37957)
Quote:
...and there's still the major overhead that is the table needed for decoding. There is an adaptive variation of huffman that doesn't need a table, but it'd probably be hard to implement in systems with little RAM.

Well, that's the array I was referring to. In an 8-bit implementation it would typically not be more than about 300-600 bytes I guess. Is it much worse than wasting bytes to say "0, 1, 2, 3, 4, 5" etc. up to 256 like you said you liked to do?

Quote:
You should know that LZ77 is way different than LZ78, so you can't really put them in the same "series". The compression in both schemes is not necessarily hard, it only gets more complex as you try to produce optimal compressed streams. The basic implementation is pretty simple.

Well, I don't remember very well the algorithms. I just remember you replace a string of bytes by a pointer to a previous reference of the same string, which isn't going to happen very often on a map.

Huffman works well on text, but not very well for other data. I guess the LZ series were also developed with text in mind. Data is hard to compress with more than one algorithm (except maybe if one of them is RLE). And of course the best is to come up with your own compression scheme, but we're not all doctors in computer science.

by on (#37958)
Bregalad wrote:
tokumaru wrote:
You should know that LZ77 is way different than LZ78, so you can't really put them in the same "series". The compression in both schemes is not necessarily hard, it only gets more complex as you try to produce optimal compressed streams. The basic implementation is pretty simple.
Well, I don't remember very well the algorithms. I just remember you replace a string of bytes by a pointer to a previous reference of the same string, which isn't going to happen very often on a map.
I've gotten very good results from LZ on the maps I've tried. For one thing LZ77 handles RLE sequences quite efficiently (with overlapping references.)
Aside from that I've never managed to write even a semi-efficient 6502 Huffman decoder, nothing close to a good LZ implementation anyway. The usual table-based optimizations just aren't applicable here. Perhaps with a small alphabet, something on the order of a few dozen tiles, a hand-rolled decoder would be fast enough.
As for complexity, well, that would depend on the kind of refinements you want I suppose. But if you check out a tutorial I think you may discover that LZ coding is less scary than you think.

On the other hand with Huffman coding you can get random access without much trouble; I've even exploited this fact in the past to parallelize an x86 decoder. Especially if you can afford to store the raw trees directly in ROM.

by on (#37959)
Bregalad wrote:
Well, I don't remember very well the algorithms. I just remember you replace a string of bytes by a pointer to a previous reference of the same string, which isn't going to happen very often on a map.

But as doynax pointed out, run-length encoding is just a special case of LZ77 with copy distances restricted to 1 word. So your horizontal run of grass tiles will show up as a single byte, then a run whose pointer is to the tile you just wrote.
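For instance, a run of eight grass tiles followed by four water tiles might come out as (the token names are illustrative, not any specific coder's format):
Code:
input:  G G G G G G G G W W W W
output: literal G
        match(offset = 1, length = 7)   ; copies G seven times, overlapping itself
        literal W
        match(offset = 1, length = 3)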

Quote:
Huffman works well on text, but not very well for other data.

Huffman works great for any data where some integer values are significantly more common than others. For example, in an RPG overworld map, there are more spaces set to water than road, and more spaces set to road than village. And if you dedicate one encoded value to "copy the tile north of this one", there will be a metric booty-load of that value, which represents elements in vertical runs.

by on (#37961)
Don't any of you guys have an old RPG project or something lying around? Having some test data to experiment with couldn't hurt.

by on (#37963)
Quote:
On the other hand with Huffman coding can get random access without much trouble, I've even exploited this fact in the past to parallelize an x86 decoder. Especially if you can afford to store the raw trees directly in ROM.

It really doesn't allow random access at all. We must not be talking about the same thing, I guess. Huffman uses fewer bits for common elements, but more bits for uncommon elements. This should work well for maps as well, as some elements are more common. I'm not saying Huffman is better than the LZ series, just that personally I have less trouble understanding and implementing it. In the couple of books and tutorials I've seen around, LZ seemed like such a headache. It would be all right to make a decoder, but horrible to do an encoder.

If someone could prove me wrong I'd be the happiest, as Huffman is really annoying with its non-byte-aligned data.
(I use neither in my project anyway right now, I'll probably compress text a little if I have enough of it).

by on (#37965)
Bregalad wrote:
In the couple of books and tutorials I've seen arround, LZ seemed so much a headache. It would be allright to make a decoder, but horrible to do an encoder.

That's why you borrow someone else's LZ encoder and change the output format to suit your decoder design. That's probably what Nintendo did when adapting an LZSS implementation identical to that from the Allegro library for the GBA BIOS.

by on (#37968)
Bregalad wrote:
Is it much worse than wasting bytes to say "0, 1, 2, 3, 4, 5" etc up to 256 like you said you liked to do ?

That's different, because that's a trade off. I chose to sacrifice space in order to improve performance. In the case of compression you'd be sacrificing space in order to... well, save space! It's not really a trade off! =)

And what's worse about the huffman tables is that you'd most likely need different ones for different sets of data, because the same code will simply not work for all pieces of data.

Quote:
Well, I don't remember very well the algorithms. I just remember you replace a string of bytes by a pointer to a previous reference of the same string,

That's LZ77. LZ78 doesn't point to previously output data; instead, it builds an actual dictionary as the data is processed. Each entry is made up of another entry + a new character (making the dictionary very easy to represent in RAM). The dictionary is maintained in a way that the compressor and the decompressor build the same dictionary as they work through the data. The algorithm is pretty interesting, you should look it up.
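A decoder built on that idea might look roughly like this (a simplified LZ78-style sketch, not any particular library's format: tokens are assumed to arrive as separate prefix-index and new-byte arrays):
Code:
#include <stddef.h>

/* Each dictionary entry = an earlier entry + one new byte; entry 0 is the
   empty string.  The dictionary itself is the RAM cost being discussed. */
struct entry { unsigned short prefix; unsigned char ch; };
static struct entry dict[4096];

/* Recursively write out the string for one dictionary entry. */
static size_t emit(unsigned short code, unsigned char *out)
{
    if (code == 0)
        return 0;
    size_t n = emit(dict[code].prefix, out);
    out[n] = dict[code].ch;
    return n + 1;
}

size_t lz78_decode(const unsigned short *prefixes, const unsigned char *chars,
                   size_t ntokens, unsigned char *out)
{
    unsigned short next = 1;
    size_t o = 0;
    for (size_t t = 0; t < ntokens; t++) {
        o += emit(prefixes[t], out + o);    /* the referenced string...   */
        out[o++] = chars[t];                /* ...plus its new byte       */
        dict[next].prefix = prefixes[t];    /* becomes the next entry     */
        dict[next].ch = chars[t];
        next++;
    }
    return o;
}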

Quote:
which isn't going to happen very often on a map.

Of course it happens often. Maps have many repeating patterns, as well as lots of empty (filled with the same tile) areas, both vertically and horizontally. As it's been said, LZ77 even has built-in RLE, so you can be sure it will perform at least as well as RLE, but often much better. RLE is terrible for 2D data, but LZ can easily copy data from the previous row, taking some advantage of repeating vertical patterns.

I feel that Huffman is pretty bad when it comes to maps, because there is a lot of spatial redundancy that is simply not used. Huffman cares about how many times each value is used, but doesn't give a damn about where they are used, totally ignoring spatial redundancy. Maybe a hybrid of Huffman and LZ would work well. And I don't mean one algorithm applied on top of the other like is commonly seen, but actually using variable-sized codes along with repeat counts and pointers.

Bregalad wrote:
In the couple of books and tutorials I've seen arround, LZ seemed so much a headache. It would be allright to make a decoder, but horrible to do an encoder.

An encoder is not hard to make at all if you don't care about speed. And why would you? I see nothing wrong with letting a completely unoptimized encoder work for a couple of minutes (although even that sounds like too much) while you surf the web or something, waiting for it to finish. What matters is decompression, because that's what will be in your final product.

by on (#37980)
Bregalad wrote:
Quote:
On the other hand with Huffman coding can get random access without much trouble, I've even exploited this fact in the past to parallelize an x86 decoder. Especially if you can afford to store the raw trees directly in ROM.
It really doesn't allow random acess at all. We shouldn't be talking about the same thing I guess.
What I meant was that you can start decompressing directly from any random byte in a Huffman stream as long as you've got a bit-pointer to it. The bytes are completely independent of each other, as opposed to LZ coding where any given piece of data depends on everything coded before it.

Bregalad wrote:
I don't say huffman is better than LZ series, just that personally I have less trouble understanding and implementing it. In the couple of books and tutorials I've seen arround, LZ seemed so much a headache. It would be allright to make a decoder, but horrible to do an encoder.
To be honest I had quite a hard time wrapping my head around Huffman coding myself, the whole tree-building thing and why it works is just plain weird ;)
Basic LZ77 coding is straightforward by comparison. The idea is to go through the file from start to end and at each byte search for the longest match (behind the cursor) to the current data. With a brute-force search you'd start at each byte before the cursor and test just how many bytes match, then keep track of the longest one found. If nothing is found, or if the longest match is still too short to save any space, then write a literal byte instead.

You've undoubtedly seen some pseudo-code already in the aforementioned tutorials but in practice an implementation might look kind of like this:
Code:
void lz(const char *begin, const char *end) {
   for(const char *cursor = begin; cursor != end; ) {
      const char *match;
      size_t match_len = 0;

      for(const char *seek = cursor; seek-- != begin; ) {
         size_t i; /* length of the candidate match starting at seek */
         for(i = 0; &cursor[i] != end; ++i) {
            if(seek[i] != cursor[i]) {
               break;
            }
         }

         if(i > match_len) {
            match = seek;
            match_len = i;
         }
      }

      if(match_len < MIN_MATCH_LEN) {
         emit_literal(*cursor++);
      } else {
         emit_match(cursor - match, match_len);
         cursor += match_len;
      }
   }
}

Here I've ignored the issue of how the matches and literals are coded. There's quite a few options and whatever scheme you choose largely dictates your compression rate and decoder performance. A simple byte-oriented encoding might be to start each token with a run length byte, for which a positive value indicates a match whereas a negative value is a literal run. Then follow matches by a two-byte offset and literals by the actual literal data.
To achieve better compression than that, most coders resort to bit-packing things such that nearer match offsets and shorter run lengths use fewer bits. Formats like ZIP and GZIP even use Huffman coding on them.
You may think that a triple-loop like this would be unbearably slow but a naive search like this is actually what most native C64 packers use. As long as you put some limit on match lengths and match offsets and don't try to compress a megabyte of zeros anyway.

In the end even with a fair bit of bit-twiddling, some simple hashing to speed up the search process, and so-called optimal parsing to get slightly better matches my own encoder (the one I linked to above) only weighs in at 700 lines or so. Believe me, it's not exactly rocket science.

Bregalad wrote:
If someone could proof me wrong I'll be the happyest, as huffman is really annoying having not byte aligned data.
Most people on this forum probably know this old trick already but I figured I'd repeat it once more since it's immensely useful when dealing with bit-streams. The idea is to set up an 8-bit shift-register in memory from which we can fetch bits into carry with ASL/LSR instructions. The clever part is that if we shift in a one in the least-significant bit whenever the register is refilled we can detect when the register is empty simply by checking the zero flag after a shift, that is it'll only be zero if that one we forced in at the beginning has been shifted all the way out.
Code:
bitbuf = $00 ;initialized to $01

getbit:
   asl bitbuf
   bne +
   jsr getbyte
   sec
   rol
   sta bitbuf
 + rts

Typically you'd inline the shift itself and BEQ to a refill function to speed things up. You can even do a BCC immediately to test for a zero bit without bothering with the refill test (BEQ) since the last bit out will always be a one. Another neat trick is to store bit-pointers to constant data as bit-buffer states rather than as bit indices.
(Writing can be handled similarly of course, that is by rotating carry bits into a shift register initialized with the least-significant bit set and flushing the byte when carry is set afterwards, but I doubt whether writing is of much use on the NES.)

by on (#37988)
LZ77 isn't as efficient as LZSS. Having to encode uncompressed data with run lengths can really bulk up smaller runs of data. I like the LZSS method of one extra bit per element of uncompressed data better. I don't think I've actually used the original spec of LZ77.

Not so much for the NES, but the SNES, Genesis, and Turbo Duo and others that have word-size tilemap elements really benefit from a variable element size RLE system. One control code for BYTE RLE and another for WORD RLE.

by on (#37990)
tomaitheous wrote:
LZ77 isn't as efficient as LZSS. Having to encode uncompressed data with run lengths can really bulk up smaller runs of data. I like LZSS +1bit per element of uncompress data method better. I don't think I've actually used the original spec of LZ77.
Whether run-lengths for literals makes sense or not would depend on precisely how you encode things. For something like ZIP/GZIP DEFLATE where the literal and match-lengths are combined into a single Huffman symbol it seems natural not to bother with run lengths.

In my project the run lengths are written using elias gamma coding, and the LZSS-style indicator bit is skipped after a literal run since it wouldn't make sense to have two consecutive literal sequences. The end result is that I break even or save space for anything except two-byte literals, and save quite a bit of space for longer sequences (not to mention speeding up the decoder.)
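For reference, textbook Elias gamma codes look like this (the actual bit order in doynax's packer may differ):
Code:
length 1  ->  1
length 2  ->  010
length 3  ->  011
length 4  ->  00100
length 7  ->  00111
length 8  ->  0001000
That is, floor(log2 N) zero bits followed by N in binary, so short, common run lengths get the shortest codes.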

For the record the match offsets are encoded as a two-bit prefix indexing a table of possible offset widths, followed by the actual offset data of course. Also two-byte matches are special-cased to use a shorter offset table since the largest offset lengths wouldn't actually save any space otherwise.

(By the way this encoding is straight from ByteBoozer, a C64 packer. Though I've shuffled the bits around and tweaked things to allow me to write an optimized decoder.)

by on (#37992)
doynax wrote:
In my project the run lengths are written using elias gamma coding, and the LZSS-style indicator bit is skipped after a literal run since it wouldn't make sense to have two consecutive literal sequences.

Classic LZSS (as seen in Allegro and GBA) does the same thing; it just encodes the lengths of literal runs in unary, a special case of Golomb coding ;-)

by on (#37993)
tepples wrote:
doynax wrote:
In my project the run lengths are written using elias gamma coding, and the LZSS-style indicator bit is skipped after a literal run since it wouldn't make sense to have two consecutive literal sequences.

Classic LZSS (as seen in Allegro and GBA) does the same thing; it just encodes the lengths of literal runs in unary, a special case of Golomb coding ;-)
Wouldn't that be equivalent to simply writing a single bit to indicate whether a match or literal byte follows?

by on (#37994)
doynax wrote:
tepples wrote:
doynax wrote:
In my project the run lengths are written using elias gamma coding, and the LZSS-style indicator bit is skipped after a literal run since it wouldn't make sense to have two consecutive literal sequences.

Classic LZSS (as seen in Allegro and GBA) does the same thing; it just encodes the lengths of literal runs in unary, a special case of Golomb coding ;-)
Wouldn't that be equivalent to simply writing a single bit to indicate whether a match or literal byte follows?

Yes. I just presented an alternate interpretation that shows exactly how your method differs from the most common LZSS implementation. Gamma and unary codes just imply different assumed distributions of literal run lengths. A universal code, such as Elias gamma, Fibonacci, or ternary, implies a power law, while a Golomb code such as unary implies an exponential distribution. Would it be worth it to try compressing some real data with LZSS and taking statistics on the distribution of literal run lengths?

by on (#37995)
tepples wrote:
Would it be worth it to try compressing some real data with LZSS and taking statistics on the distribution of literal run lengths?
I tried this when trying out different encodings for my packer. I didn't keep the statistics but gamma coding was a clear win on the files I tested, not that it made much of a difference either way (perhaps a single percent or so.)

by on (#38001)
So I decided to hack the independent literals back into the encoder (further proof that I've got way too much free time on my hands.) Anyway, it seems like I did a poor job of testing things originally.

It turns out that the literal runs expand every single file in the Calgary Corpus, except for 'obj1', on which they save four bytes, and 'pic', which I didn't bother to wait for. On the other hand it consistently saves space (typically between 0.5% and 2%) on my old 8-bit projects, not to mention the half dozen NES ROMs I checked.

So now I don't know what to think anymore. It would be awesome if I could strip out this feature as it complicates things quite a bit and bloats the decruncher. Still, that extra kilobyte or two might come in handy one day, and it's nice to be able to cope with incompressible data gracefully. It couldn't hurt to know how it affects the decoder's performance either.

http://doynax.googlepages.com/lz_run_test.zip

by on (#38011)
Oh my, all of that is such a headache!!

After all you're right: Huffman can allow semi-random access; if you keep a pointer (with bit precision) you can continue reading data. But it's still sequential access, because you need to read all the earlier bytes at least once in order to build your pointer.

I'd definitely look into the LZ series again, but they seem like such a headache! If you use one more bit to represent whether data is a literal or a string, and 80% of the data will be literals (this is likely to be your case unless you intentionally repeat things a lot in the original), then I doubt anything gets compressed. You'd better reserve a special value for a string. But then how do you know what the string is referring to?

by on (#38013)
Apack takes 7 bits to encode any single byte which has appeared up to 15 bytes ago (or a literal zero). Of course, you can optimize a compression system to reduce the number of decision bits before you take 4 bits of near history.

But I think that explicit dictionary compression is a good idea since you get random access that way.

by on (#38014)
Bregalad wrote:
Oh my all of that is so much a headache !!
No one claimed compression was easy. Well, I might have at one time or another, but I don't think I ever truly meant it.

Quote:
After all you're right huffman can allow semi-random acess if you keep a pointer (with bit precision) you can continue read data. But it's still sequential acess because you need to read all the first bytes at least once in order to make your pointer.
Random access might have been the wrong word, stateless access might be a better description.
You'd save a set of "bookmarks" at regular intervals ahead of time, from which decompression can be resumed. Perhaps one per row as in the RLE schemes mentioned earlier. Note that you don't necessarily have to save these pointers in the ROM itself. It's entirely possible to scan through the whole map when loading a zone and save the proper pointers that way.

Quote:
I'd definitely look into the LZ series again, but they seem so much a headache ! If one use one more bit to represent if data is a literal or a sting and that 80% of the data will be literals (this is likely to be your case unless you intentionally repeat things a lot in the original), then I dout anything gets compressed. You'd better reserve a special value for a string. But then how do you know what the string is refering to ?
That depends on the data. Certain file types are simply not compressible with lossless dictionary coders, such as previously compressed data, sampled audio or photographic images. On the other hand the types of data commonly used in 8-bit projects (like machine code, pixelled graphics, synthesized audio or text) tend to compress quite well.
For instance my "game" is currently reduced nearly by half (from 13,428 down to 6,977 bytes) of which 63% are matches (9798 match vs. 3628 literal bytes.) YMMV, so feel free to try out this LZ coder or another one on your own data before deciding anything.

I'm not quite clear on what you meant by "strings." A match, that is a backwards reference as opposed to literal data, is stored as a length together with an offset pointing backwards in the file. While literals are stored as a length together with the actual literal data. And before each match/literal there's a bit indicating which one to expect (technically only after matches actually.) If instead I switch to only allowing single literals instead of runs I still only lose 70 bytes or so, and apparently many larger files save space on such a coding.