ADPCM codec? - NESdev BBS

by tokumaru on 2007-11-17 (#28397)

In this post, tepples wrote:
Eventually, I said screw it, I'm just writing my own ADPCM playback engine.

That would rock! Any progress?

by tepples on 2007-11-17 (#28398)

It depends on motivation. The bitrate would be the same as for DPCM (about 32 kbit/s), just possibly better quality. If 3/4 of a 512 KB ROM were to be used, that would be about 1.5 minutes of audio. What kind of game would benefit from that much recorded speech?
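As a rough check of that estimate, the arithmetic can be sketched as follows. It assumes "512 KB" means 512 KiB and "about 32 kbit/s" means 32,000 bit/s; both readings are assumptions, and the constant names are only illustrative.

```python
# Rough playback-time estimate for the figures quoted above.
ROM_BYTES = 512 * 1024        # 512 KB ROM, taken here as 512 KiB (assumption)
AUDIO_FRACTION = 3 / 4        # share of the ROM given to audio
BITRATE = 32_000              # "about 32 kbit/s", taken as 32,000 bit/s (assumption)

def playback_seconds(rom_bytes, fraction, bits_per_second):
    """Seconds of audio that fit in the given share of the ROM."""
    return rom_bytes * fraction * 8 / bits_per_second

seconds = playback_seconds(ROM_BYTES, AUDIO_FRACTION, BITRATE)
print(f"{seconds:.1f} s = {seconds / 60:.2f} min")   # prints "98.3 s = 1.64 min"
```

Under these assumptions the result is a little over a minute and a half, in line with the figure quoted in the post.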

Splitting

by tokumaru on 2007-11-17 (#28399)

tepples wrote:
What kind of game would benefit from that much recorded speech?

Yeah, I didn't expect it to be smaller than DPCM. Hum... I can't think of anything very amusing with that little speech, but maybe we should look into speech synthesizing... I've seen programs reproducing speech from just a few different audio clips (I really don't know how that works, but that's why I said "look into it"). Maybe it would be possible to have a bunch of small clips that when correctly combined would produce intelligible sentences. Sure, it'd sound a bit robotic, but it would be your NES talking with a robotic voice, which fits! Maybe some sort of artificial intelligence program? Sure sounds like novelty to me.

EDIT: Seems like we could make the NES speak Spanish: http://en.wikipedia.org/wiki/Diphone

by Memblers on 2007-11-17 (#28400)

tokumaru wrote:
I've seen programs reproducing speech from just a few different audio clips (I really don't know how that works, but that's why I said "look into it"). Maybe it would be possible to have a bunch of small clips that when correctly combined would produce intelligible sentences.

I made a speech synth that works like that, I don't think I'd released it but you can hear it on the last track of my Chipography NSF. Using DPCM (and not at the highest sample rate, I think it was $C, IIRC), the samples barely fit in 16kB and I had to trim a little bit off of them.

So the quality would benefit quite a bit with ADPCM, but it could be a bit more cumbersome. My speech synth is entirely IRQ-driven, so you can do anything you want while it's talking.

by Bregalad on 2007-11-18 (#28414)

I guess the simplest language to synthesize would be Japanese, since it has only 100 diphones or so. French and English would be almost impossible to do, though.

by atari2600a on 2007-11-18 (#28418)

Memblers wrote:
tokumaru wrote:
I've seen programs reproducing speech from just a few different audio clips (I really don't know how that works, but that's why I said "look into it"). Maybe it would be possible to have a bunch of small clips that when correctly combined would produce intelligible sentences.

I made a speech synth that works like that, I don't think I'd released it but you can hear it on the last track of my Chipography NSF. Using DPCM (and not at the highest sample rate, I think it was $C, IIRC), the samples barely fit in 16kB and I had to trim a little bit off of them.

So the quality would benefit quite a bit with ADPCM, but it could be a bit more cumbersome. My speech synth is entirely IRQ-driven, so you can do anything you want while it's talking.

Would you say an NES speech synth could be produced w/ a more analogue approach to it w/ a square or triangle, like Berzerk? Maybe when I get around w/ messing w/ the sound registers I'll try that...

by NotTheCommonDose on 2007-11-18 (#28424)

This is something I've actually been waiting for.

by tepples on 2007-11-18 (#28425)

NotTheCommonDose wrote:
This is something I've actually been waiting for.

The codec, or a speech synthesizer using the codec?

by Celius on 2007-11-18 (#28428)

The NES could speak awkward Japanese if you had sound samples of the following letters being pronounced:

a, i, u, e, o, k, g, s, sh, z, t, ch, ts, n, h, p, b, f, m, r, y

If you could have a sample that was about .06 seconds long of each of these, you could string them together to make Japanese words.

You could maybe even skip the "y" sound, and just use "i", because "ya" sounds pretty much the same as "ia".

by tepples on 2007-11-18 (#28432)

Celius wrote:
The NES could speak awkward Japanese if you had sound samples of the following letters being pronounced:

a, i, u, e, o, k, g, s, sh, z, t, ch, ts, n, h, p, b, f, m, r, y

If you could have a sample that was about .06 seconds long of each of these, you could string them together to make Japanese words.

It would sound like the "animalese" from Animal Crossing.

by Celius on 2007-11-18 (#28436)

Haha, yeah it would. But the animalese is just really fast. Maybe .06 seconds is too fast, but I've worked with a cartoon where every frame is shown for .07 seconds at minimum. The mouth movements were really hard to get right because that was just too slow in some places. However, these sounds are .06 seconds, and two of them make a syllable. So most syllables are .12 seconds long at minimum. This may be a moderate speed for Japanese, but it might be too slow. I'm sure it would be hard to make it sound natural. Every sample would have to be pretty monotonous to make it not sound like animalese.

by NotTheCommonDose on 2007-11-18 (#28437)

The codec.

by Bregalad on 2007-11-18 (#28438)

You would have to play some samples faster than others (for example, playing the end of a sentence lower, or higher if the sentence is a question) to sound slightly more natural.

by NotTheCommonDose on 2007-11-18 (#28445)

But the program will do that, right?

by tepples on 2007-11-18 (#28446)

It's difficult to make cycle-timed code that will smoothly change the pitch of a sample.

by NotTheCommonDose on 2007-11-18 (#28447)

If you're asking me to do that, I can't even subtract or add numbers yet.

by tepples on 2007-11-18 (#28458)

I haven't yet made an NES-side decoder, but I've simulated the compression and decompression process in C. Have a listen:

Audio file
"What should a Nintendo sound like?"
1. original wave; 2. through my 4-bit codec; 3. through DPCM at 4x oversampling (close to rate $F)

by NotTheCommonDose on 2007-11-18 (#28459)

Probably 2. Was that you saying that?

by tepples on 2007-11-18 (#28461)

NotTheCommonDose wrote:
Was that you saying that?

Yes.

Does this rom work at all on NES hardware?

by strangenesfreak on 2007-11-18 (#28463)

tepples wrote:
Yes.

Does this rom work at all on NES hardware?

I dunno about actual NES hardware (I cannot test it myself yet), but it does work on the latest versions of Nestopia and Nintendulator. Is that enough accuracy, or do you really need the NES itself? By the way, your demo's pretty cool - it has a pretty good sound quality.

by tepples on 2007-11-18 (#28464)

strangenesfreak wrote:
I dunno about actual NES hardware (I cannot test it myself yet), but it does work on the latest versions of Nestopia and Nintendulator. Is that enough accuracy, or do you really need the NES itself?

I too tried it on Nestopia and Nintendulator. But I seem to remember a long time ago when someone said that if nobody runs it on an NES, I might as well develop for DirectX instead.

by dXtr on 2007-11-18 (#28466)

Works fine* on my PAL NES using a PowerPak.

* Different pitch or what it is called = sounds like Nintendulator in PAL mode

by tepples on 2007-11-18 (#28470)

Thanks. That's good enough for now. But what else can I do with this engine?

by Bregalad on 2007-12-18 (#29722)

Hey ! I'm now playing with the PCM channel (manual $4011 mode) and I don't feel like starting a new thread for this.
Since using plain 8-bit PCM samples is pure waste (especially since the LSB is ignored) and because the ROM capacity of the NES is small, I did it in 4-bit DPCM (the wave can decrease by up to 8 or increase by up to 7 each step). However, I got terrible results. A lot of the time this slew-rate limit is just too tight for the sampling rate I use, and it sounds terrible, almost as bad as 1-bit hardware DPCM.
I'd like to implement some other kind of compression, such as adaptive DPCM or something like that. I don't know much about ADPCM, but would 2 bits for the number of shits and 6 bits for 2 samples (3 bits per sample), all packed in a byte, do the trick? Would I end up with better quality than before? I'm not sure how to encode for this format efficiently either.
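
For illustration only, decoding the byte format proposed above might look like the sketch below. The post does not say how the bits are ordered, so the layout here (shift count in bits 7-6, first delta in bits 5-3, second delta in bits 2-0), the two's-complement reading of each 3-bit field, and the function names are all assumptions. The output is clamped to 0..127 because $4011 holds a 7-bit value.

```python
def sign_extend_3bit(value):
    """Interpret a 3-bit field (0..7) as a signed number (-4..3)."""
    return value - 8 if value & 0x04 else value

def decode_byte(byte, level):
    """Decode one packed byte into two output levels.

    Assumed layout (not specified in the post): bits 7-6 = shift count,
    bits 5-3 = first delta, bits 2-0 = second delta.
    """
    shift = (byte >> 6) & 0x03
    out = []
    for field in ((byte >> 3) & 0x07, byte & 0x07):
        level += sign_extend_3bit(field) * (1 << shift)
        level = max(0, min(127, level))   # $4011 holds a 7-bit value
        out.append(level)
    return out, level

# Shift 2, deltas +3 and -1, starting from mid-scale 64
levels, last = decode_byte(0b10_011_111, 64)
```

With only two samples sharing each shift count, a quarter of every byte goes to the scale rather than to sample data, which is the trade-off the proposal has to weigh.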

by tepples on 2007-12-18 (#29745)

Bregalad wrote:
Since using plain 8-bit PCM samples is pure waste (especially since the LSB is ignored) and because the ROM capacity of the NES is small, I did it in 4-bit DPCM (the wave can decrease by up to 8 or increase by up to 7 each step). However, I got terrible results. A lot of the time this slew-rate limit is just too tight for the sampling rate I use, and it sounds terrible, almost as bad as 1-bit hardware DPCM.

To avoid slope overload on "loud" portions, use bigger steps in loud portions. You can do this either by varying the step scale (like IMA ADPCM and SNES ADPCM do) or just by using nonlinear stepsizes:
{-64, -49, -36, -25, -16, -9, -4, -1, 0, 1, 4, 9, 16, 25, 36, 49}
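
A minimal sketch of how that nonlinear table could be used, assuming a 7-bit output level (0..127) and a simple greedy encoder that picks whichever table entry lands closest to each target sample. The greedy choice, the starting level of 64, and the function names are assumptions for illustration, not a description of any existing encoder.

```python
# Step table from the post: signed squares, indexed by a 4-bit code.
STEPS = [-64, -49, -36, -25, -16, -9, -4, -1, 0, 1, 4, 9, 16, 25, 36, 49]

def clamp(value, lo=0, hi=127):
    """Keep the level inside the 7-bit output range."""
    return max(lo, min(hi, value))

def encode(samples, start=64):
    """Greedy encoder: pick the code whose step lands closest to each sample."""
    level, codes = start, []
    for target in samples:
        code = min(range(16), key=lambda c: abs(clamp(level + STEPS[c]) - target))
        codes.append(code)
        level = clamp(level + STEPS[code])
    return codes

def decode(codes, start=64):
    """Decoder: one table lookup and one add per 4-bit code."""
    level, out = start, []
    for code in codes:
        level = clamp(level + STEPS[code])
        out.append(level)
    return out

codes = encode([64, 100, 30, 31])
restored = decode(codes)
```

In this example the jump from 64 to 100 is reached in a single step, where a linear table limited to +7 would need several samples to catch up; the cost is coarser resolution for mid-sized changes.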

Quote:
I'd like to implement some other kind of compression, such as adaptive DPCM or something like that. I don't know much about ADPCM, but would 2 bits for the number of shits and 6 bits for 2 samples (3 bits per sample), all packed in a byte, do the trick?

You might want to spread the number of shifts over a longer block of samples. SNES ADPCM encodes each block of 16 samples in 9 bytes: 4 bits for step scale (your "number of shits"), 2 bits for prediction method (in effect, a choice between literal and delta interpretation of the sample values), 2 bits for other shit related to looping, followed by 4*16 bits for sample values. IMA ADPCM (used on Nintendo DS and several other systems) uses a running count based on the delta values to predict the step scale.
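
The 9-byte block layout described above can be sketched as a small unpacking routine. The exact bit positions used here (step scale in the top four bits of the header, prediction method in bits 3-2, loop and end flags in bits 1-0, high nibble first, nibbles read as signed 4-bit values) follow the layout as commonly documented for the SNES format and should be treated as an assumption; the function name is hypothetical, and the prediction filters themselves are not applied.

```python
def unpack_brr_block(block):
    """Split one 9-byte block into its header fields and 16 signed nibbles."""
    if len(block) != 9:
        raise ValueError("a block is 9 bytes: 1 header + 8 data")
    header = block[0]
    fields = {
        "shift": header >> 4,          # 4-bit step scale
        "filter": (header >> 2) & 3,   # 2-bit prediction method
        "loop": (header >> 1) & 1,     # looping-related flags
        "end": header & 1,
    }
    nibbles = []
    for byte in block[1:]:
        for raw in (byte >> 4, byte & 0x0F):             # high nibble first
            nibbles.append(raw - 16 if raw & 8 else raw)  # signed 4-bit value
    return fields, nibbles

fields, nibbles = unpack_brr_block(bytes([0xC3, 0x7F, 0, 0, 0, 0, 0, 0, 0]))
```

One header byte per 16 samples works out to 4.5 bits per sample, compared with 4 bits for a headerless scheme.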

But my question remains: What kind of game design would use a lot of audio that pauses the game?

by Bregalad on 2007-12-19 (#29751)

I'll try to get the parabolic step size thing in, and I'll report whether it gives better results. That does seem like a good idea, however I don't know if it will sound a lot better or just slightly better.

Quote:

You might want to spread the number of shifts over a longer block of samples. SNES ADPCM encodes each block of 16 samples in 9 bytes: 4 bits for step scale (your "number of shits"), 2 bits for prediction method (in effect, a choice between literal and delta interpretation of the sample values), 2 bits for other shit related to looping, followed by 4*16 bits for sample values.

Yeah, I know how SNES ADPCM works. It's fairly good, however not all bits are used and there are those different filters that require advanced math that definitely cannot be done on the NES (in real time). Buffering the samples is out of the question due to low RAM (even with SRAM). And I'd like to avoid having a whole byte as a "header" of a block, as it's hard to make good use of all 8 bits of the header.

Quote:
IMA ADPCM (used on Nintendo DS and several other systems) uses a running count based on the delta values to predict the step scale.

Sounds very interesting. Could the NES perform such a process in real time? If so, where can I find more details about it?

Quote:
But my question remains: What kind of game design would use a lot of audio that pauses the game?

RPG, tactical-RPG or anything that doesn't require the player to interact with the game in real time.

EDIT: Parabolic DPCM seems to give much better results! Thank you tepples! While not sounding absolutely perfect, my stuff now sounds decent. I wonder if there is any way of making it sound even better, but I doubt it.

by Bananmos on 2007-12-25 (#29836)

Tepples:
Nice work. How about trying a 2-bit ADPCM codec as well? Even if most 2-bit codecs I've heard have a noticeable degradation in sound quality, it might still be acceptable if you need lots of speech samples. =)

Then again, for speech-specific compression, there are better codecs to use. But most of them are very CPU-demanding IIRC.

by mic_ on 2007-12-28 (#29869)

Quote:
Then again, for speech-specific compression, there are better codecs to use. But most of them are very CPU-demanding IIRC.

Yeah, try to do an AMR decoder. It might be doable if the NES' CPU was running at, say, 30-40 MHz..

Quote:
Sounds very interesting. Could the NES perform such a process in real time? If so, where can I find more details about it?

http://nocash.emubase.de/gbatek.htm#dssound
http://wiki.multimedia.cx/index.php?title=IMA_ADPCM
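
The "running count" idea behind IMA ADPCM can be sketched roughly as follows: each 4-bit code both moves the predicted sample and nudges an index into a step table, so large deltas make later steps bigger and small deltas make them smaller. The index adjustments and the shift-and-add reconstruction follow the usual IMA scheme, but STEP_TABLE below is only a geometric stand-in generated for illustration, NOT the standard 89-entry table; see the links above for the real values. The function name is hypothetical.

```python
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]   # indexed by the low 3 bits of a code
# Stand-in table growing about 10% per entry; NOT the standard IMA step table.
STEP_TABLE = [int(7 * 1.1 ** i) for i in range(89)]

def decode(codes, predictor=0, index=0):
    """Decode 4-bit codes into 16-bit samples; returns (samples, final index)."""
    out = []
    for code in codes:
        step = STEP_TABLE[index]
        diff = step >> 3                  # built from shifts and adds only
        if code & 4:
            diff += step
        if code & 2:
            diff += step >> 1
        if code & 1:
            diff += step >> 2
        predictor += -diff if code & 8 else diff   # bit 3 is the sign
        predictor = max(-32768, min(32767, predictor))
        index = max(0, min(88, index + INDEX_ADJUST[code & 7]))
        out.append(predictor)
    return out, index

samples, final_index = decode([7])
```

Each sample costs a table lookup, a few shifts and adds, and two clamps, with no multiplication, which suggests (without proving it) that the per-sample work is modest; whether it fits a cycle-timed NES playback loop would still have to be measured.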

Practical application.
by B00daW on 2008-01-03 (#30034)

As I am an infamously useless troll of sorts, I will just make a request for the betterment of the sk3n3 and get the codemonkeys to do the hard work.

1.) jsr and thefox, please hack up a way to apply this into the NES/Fami tracking experience in FamiTracker, NerdTracker, and PornoTracker plz plz, Scandi-niggahs.

2.) Please rewrite the NSF specs, kev so that we may be lovins some good ADPCM vocodings and/or sexamples.

+,

Tidings of lub, j'all.

--

teh b00, crusader for j00z.

Re: Practical application.
by NotTheCommonDose on 2008-01-03 (#30035)

B00daW wrote:
As I am an infamously useless troll of sorts, I will just make a request for the betterment of the sk3n3 and get the codemonkeys to do the hard work.

1.) jsr and thefox, please hack up a way to apply this into the NES/Fami tracking experience in FamiTracker, NerdTracker, and PornoTracker plz plz, Scandi-niggahs.

2.) Please rewrite the NSF specs, kev so that we may be lovins some good ADPCM vocodings and/or sexamples.

+,

Tidings of lub, j'all.

--

teh b00, crusader for j00z.

Are you drunk or something?

by B00daW on 2008-01-03 (#30036)

... o_O ?

Not at the moment, sir. No.

By what misconstrued prejudice has begotten such a calumnious allegation?

...

(Maybe later... Maybe later...)

--

Anyway, to continue the utmost seriousness of this thread, when will the NSF spec support ADPCM; and when can we track with its awesome jihadity?