Audio compression techniques

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#872)
Several NES games play back audio waveforms through timed writes to $4011. You can identify these in an NES debugger by catching dozens of writes to $4011 in one frame, or you can identify them without a debugger by finding missing samples in the NSF, incompatibility with PocketNES, etc.

Many of these games, such as Smash TV, Skate Or Die 2, and one of the Wheel of Fortune games, use raw 4-bit samples at a rather low sampling rate. This sounds bad, but sometimes better than a DPCM sample at 4x the rate (which is the same bitrate, since DPCM spends 1 bit per sample). It also allows playback of longer samples on mappers that hardwire $C000-$FFFF, the only region DPCM samples can be fetched from, albeit with pausing in the action while the CPU does the timed writes.
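
For illustration, here is a minimal Python sketch of what unpacking raw 4-bit samples for $4011 could look like. The nibble order (high nibble first) and the scaling by a left shift of 3 are my assumptions, not something traced from any of the games above, and `unpack_4bit` is a made-up name.

```python
def unpack_4bit(packed: bytes) -> list[int]:
    """Expand packed 4-bit samples into values that fit the 7-bit $4011 DAC."""
    out = []
    for byte in packed:
        # Assumed order: high nibble is played first, then low nibble.
        for nibble in (byte >> 4, byte & 0x0F):
            # Assumed scaling: 0..15 becomes 0..120, inside the 0..127 range.
            out.append(nibble << 3)
    return out


if __name__ == "__main__":
    # Two bytes hold four samples.
    print(unpack_4bit(bytes([0x0F, 0xA3])))
```

On the NES itself each output value would be written to $4011 at a fixed interval, which is where the pausing comes from.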

Hi Tech's Sesame Street games with sampled voices, Big Bird's Hide and Speak and Sesame Street Countdown, use a different audio compression engine. It runs at 8 kHz and roughly 48 kbps (average 6 bits per sample). I've traced part of it, not having the patience to trace through the whole thing, and it seems to involve variable-length coding of sample differences using 8 different word formats. Since $4011 only takes 7-bit values, an average of 6 bits per sample isn't far below raw, so based on the bitrate I'd almost guess that this is actually a lossless codec.

I'm in the middle of developing a codec that uses sample-to-sample differences but on a logarithmic scale with 15 distinct differences (one 4-bit code per sample), so that it doesn't slope-overload but doesn't get too much granular noise either. It sounds much better than raw 4-bit audio but somewhat worse than the Sesame Street games.
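
To make the idea concrete, here is a rough Python sketch of a logarithmic difference codec with 15 steps. The step table (0 and plus or minus powers of two up to 64), the starting level of 64, and the greedy nearest-match encoder are all assumptions picked for illustration; they are not the actual tables or search used in the codec described above.

```python
# Hypothetical step table: 0 plus +/- powers of two, 15 entries in total.
STEPS = [0, 1, 2, 4, 8, 16, 32, 64]
DELTAS = [-s for s in reversed(STEPS[1:])] + STEPS  # -64..-1, 0, 1..64


def clamp(value: int) -> int:
    """Keep the running level inside the 7-bit range accepted by $4011."""
    return max(0, min(127, value))


def encode(samples: list[int], start: int = 64) -> list[int]:
    """Greedily pick, per sample, the 4-bit code that lands closest to it."""
    level = start
    codes = []
    for target in samples:
        code = min(range(len(DELTAS)),
                   key=lambda c: abs(clamp(level + DELTAS[c]) - target))
        codes.append(code)
        # Track the decoder's state so errors do not accumulate.
        level = clamp(level + DELTAS[code])
    return codes


def decode(codes: list[int], start: int = 64) -> list[int]:
    """Rebuild the 7-bit output levels from the 4-bit codes."""
    level = start
    out = []
    for code in codes:
        level = clamp(level + DELTAS[code])
        out.append(level)
    return out


if __name__ == "__main__":
    original = [64, 70, 127, 0, 3]
    print(decode(encode(original)))
```

The large steps let the output follow a sudden jump within a sample or two, while the small steps keep quiet passages from turning into hiss. Code 15 is left unused in this sketch.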

Has anybody else either developed a $4011 codec or disassembled a commercial NES game's codec?