Sound Emulation, Resources, Tips, Etc?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#88367)
I'm curious about writing my own sound emulation for my NES emulator, but it doesn't seem like there is a whole lot of help aimed at emulating that explains some of the basics, mainly just a lot of technical documents that assume you know things about sound that you don't. So does anyone have any good documents, tutorials, advice, or any suggestions?

by on (#88372)
Go through Bores Introduction to DSP, then when you come back and read our APU docs, things should make more sense.

by on (#88377)
Certain terminology, as it relates to the NES, was not explained very well in the information you linked. I've been talking to someone, though, to try to get a better idea of what is going on. I haven't looked at a huge assortment of APU documents, but the Brad Taylor and Blargg documents seemed quite technical and didn't really help with certain basic aspects I don't understand yet. Makes me wish I had taken on writing my own APU when Blargg was still around.

by on (#88378)
Start with a C program that plays a simple song with square waves. I usually use Twinkle Twinkle Little Star for this purpose.

You know the frequencies of the notes, because the A above middle C is 440 Hz.

Formula for the frequency of any note: F = f0 * 2^((n - n0) / 12)
where f0 is the reference frequency (440 Hz),
n0 is your reference note number (note "A3", 3*12 + 9 = 45), and
n is your note number (12 * octave + semitone within the octave): 0 = C, 1 = C#, 2 = D, 3 = D#, 4 = E, ..., 12 = C of the next octave.
The frequency of middle C (C3) is the frequency of note number 3*12 + 0 = 36,
which is ~261.625... Hz.

You have your sampling rate. (Let's make it 44100Hz)
Dividing the sampling rate by the frequency gives the period measured in samples, so the C3 note takes ~168.561... samples for a full period.
We're making a square wave, so half of the time is max level, and half of the time is min level.

Anyway, let's output 1/60s of audio, middle C note.
1/60s of audio is 735 samples long. With our Middle C note, that's ~4.36 periods long.

Half a period is ~84.28 samples, so make 84 samples of VolHigh, a fractional sample, then 84 samples of VolLow, and another fractional sample. Repeat 4.36 times.
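
One way to handle the "fractional sample" is to keep a running phase as a floating-point number and carry the remainder into the next period. This is only a sketch under that assumption; fill_square and its parameters are hypothetical names, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with `count` samples of a 50% square wave.
   `phase` is the position inside the current period, in samples. It is
   kept between calls so consecutive buffers join without a glitch. */
void fill_square(int16_t *out, size_t count, double period_samples,
                 int16_t vol_high, int16_t vol_low, double *phase)
{
    for (size_t i = 0; i < count; i++) {
        /* First half of each period is high, second half is low. */
        out[i] = (*phase < period_samples / 2.0) ? vol_high : vol_low;
        *phase += 1.0;
        if (*phase >= period_samples)
            *phase -= period_samples;   /* keep the fractional remainder */
    }
}
```

For the middle C example you would call it once per frame with count = 735 and period_samples = 44100 / 261.6256. Because the remainder is carried over, some runs come out one sample longer than others, which is what keeps the average pitch right.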

more to come...

by on (#88380)
the audio stuff confused me at first, too. Dwedit's post is a good simple example of generating a digital square wave based on a frequency. when emulating the NES audio unit, determining what each sample byte should be is actually just derived from the duty cycle loop and timer period, so you don't actually need to know the frequency of any notes being played.

basically, on every square wave clock, if the channel is enabled then its period value is decremented. the current position in the duty cycle loop is stepped if the square channel's period has reached zero. if that happens, then the period value is also reset so the countdown begins again. that is highly oversimplifying it, other variables are involved, but that is the gist of it.

the square channels have four possible duty cycles that could be used. this is what my duty cycle array looks like:

Code:
uint8_t square_duty[4][8] = {
   { 0, 1, 0, 0, 0, 0, 0, 0 },
   { 0, 1, 1, 0, 0, 0, 0, 0 },
   { 0, 1, 1, 1, 1, 0, 0, 0 },
   { 1, 0, 0, 1, 1, 1, 1, 1 }
};


if the current array value for a square channel is 1, then the channel's sample output is equal to that channel's current envelope value. otherwise, silence.
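
to make the description above concrete, here is a rough sketch in C. the struct layout and the names SquareChannel, square_clock and square_output are made up for this example, and it deliberately ignores the exact reload timing of the real hardware, the length counter and the sweep unit.

```c
#include <stdint.h>

static const uint8_t square_duty[4][8] = {
    { 0, 1, 0, 0, 0, 0, 0, 0 },
    { 0, 1, 1, 0, 0, 0, 0, 0 },
    { 0, 1, 1, 1, 1, 0, 0, 0 },
    { 1, 0, 0, 1, 1, 1, 1, 1 }
};

typedef struct {
    int      enabled;
    uint16_t period;     /* reload value written by the game */
    uint16_t counter;    /* counts down toward zero */
    uint8_t  duty;       /* which row of square_duty, 0..3 */
    uint8_t  duty_pos;   /* current column, 0..7 */
    uint8_t  envelope;   /* current envelope value, 0..15 */
} SquareChannel;

/* Called once per square wave clock. */
void square_clock(SquareChannel *ch)
{
    if (!ch->enabled)
        return;
    if (ch->counter > 0)
        ch->counter--;
    if (ch->counter == 0) {
        ch->duty_pos = (uint8_t)((ch->duty_pos + 1) & 7); /* step the duty loop */
        ch->counter = ch->period;                         /* restart the countdown */
    }
}

/* Envelope value when the duty bit is 1, otherwise silence. */
uint8_t square_output(const SquareChannel *ch)
{
    if (!ch->enabled)
        return 0;
    return square_duty[ch->duty][ch->duty_pos] ? ch->envelope : 0;
}
```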

i'm not good at explaining this, and maybe i got some details wrong. i hope it makes a little sense. i can provide more code if you want. my APU is not entirely complete, it doesn't handle the sweeping yet but otherwise sounds pretty good, and i think it's easy to follow when reading.
Basics
by on (#88384)
Here is my attempt at conveying the basics, beginning from fundamentals, in a short but organized manner. There are other tutorials, but I wanted to write one just for the exercise of it :-)

In order to produce sound, you have to generate "PCM sound".
"PCM sound" is a type of signal.
A signal means anything that changes over time.
In the case of sound, the signal is the elevation of the diaphragm of the loudspeaker (which reproduces air pressure waves by pushing and pulling the air in front of it).

Sampling rate is how often it is measured (and emitted).

For example, a PCM signal at 8000 Hz sampling rate is a numeric value that is emitted 8000 times in a second.
If you have an array of 40000 integers, and you know the sampling rate is 8000, you have 5 seconds of signal. (5*8000=40000). If the sampling rate is 22050, you have there about 1.8 seconds of signal.

Signal has two fundamental properties: Frequency and amplitude.
Amplitude is how large the differences are between values. Frequency is how fast the value changes from small to large and back.

For example, a PCM signal, sampled at 22050 Hz rate, that happens to have the amplitude of 20000 and a frequency of 2205 hertz, could look like this:
   -10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
   -10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
   -10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
   (repeated for thousands of times).
Within 22050 samples (which represents 1.0 seconds of audio, because of the sampling rate of 22050), it oscillates 2205 times between -10000 and 10000, hence an amplitude of 20000 and frequency of 2205 Hz. The wave length is 10 samples (sampling rate divided by frequency).
If the amplitude were smaller, it would be quieter (the diaphragm moves very little); if it were larger, it would be louder (the diaphragm moves a lot).
If the frequency were lower, the pitch would be lower (the diaphragm moves slowly). The intervals between the extremes (wave length) would be greater.
If the frequency were higher, the pitch would be higher (the diaphragm moves rapidly). The intervals between the extremes (wave length) would be shorter.

When the signal samples are plotted in a graph, they form a shape. The shape is called a wave. Different wave shapes have different names.
There is the square wave, which goes from maximum value to minimum value and back in an abrupt manner, with no intermediates. For example, 100 100 100 100 100 20 20 20 20 20 100 100 100 100 100 20 20 20 20 20.
There is the triangle wave, which goes from maximum to minimum, and back, in a linear fashion. For example, 100 90 80 70 60 50 40 30 20 30 40 50 60 70 80 90 100.
There is the sine wave, which is a smooth wave that is generated with the mathematical sin() function.
An unlimited number of different wave types exist and can be devised.

Here is example C code that generates ten seconds of 8000 hertz PCM signal, consisting of a 440 hertz sinewave that has the amplitude of 60:
Code:
for(int pos=0; pos<80000; pos++)   putchar( 60*sin(440*pos*2*M_PI/8000) );


To mix different signals together, you usually simply add them. For example, this code outputs a 440 hertz sinewave and a 300 hertz sinewave together:
Code:
for(int pos=0; pos<80000; pos++)   putchar( 60*sin(440*pos*2*M_PI/8000)  +   60*sin(300*pos*2*M_PI/8000));


This covers the basics; the rest is extrapolation. :-)

by on (#88387)
Thanks for that post, Bisqwit. I'm not an emulator developer, but I have been in need of a basic intro to audio and that was it. Now I feel like I have a good starting point for whenever I do decide to do some audio programming. (I have been avoiding it in every program I've ever written.)
Re: Sound Emulation, Resources, Tips, Etc?
by on (#88390)
MottZilla wrote:
(...)but it doesn't seem like there is a whole lot of help aimed at emulating that explains some of the basics, mainly just a lot of technical documents that assume you know things about sound that you don't.


There are three units:
1. the emulated APU generating sound samples,
2. the audio output, usually SDL, DirectX or... Allegro, and
3. the resample unit.

Generating samples is pretty easy. I use a down counter loaded with the frequency value. That value represents the number of CPU cycles until the next sample.
Code:
chan->freq--;
if(0 == chan->freq)
{
   //do stuff
   //...
   chan->output = data; //sound sample
   chan->freq = chan->freq_cache; //value written to freq registers
}


Well, you must find an algorithm to resample the generated NES sound. The simplest that I've found & use is adding the samples and dividing by the number of updates.

I'll write more later.

by on (#88399)
Thank you everyone who's contributed so far and in the future. And please add more if you can. I'll be looking this over as I work on learning this.

by on (#88409)
The way I picture it in my head, is that each channel has a certain "Update" logic, that is executed every x input cycles, where x is the frequency value. For triangle, the update logic is that another step is taken through the pyramid:

Code:
timer += cycles since last update

if (timer >= frequency value)
{
    timer -= frequency value; // do NOT set to 0, otherwise you will lose cycles
    step = (step + 1) % 32;
}
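
Spelled out in C, the triangle update might look like the sketch below. TriangleChannel, triangle_run and triangle_output are illustrative names. The 32-entry table (15 down to 0, then 0 up to 15) is the sequence as I understand the hardware, and the loop is a `while` instead of an `if` so that several steps can be taken when many cycles have passed since the last update.

```c
#include <stdint.h>

/* The "pyramid": 15 down to 0, then 0 back up to 15. */
static const uint8_t triangle_steps[32] = {
    15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0,
     0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15
};

typedef struct {
    uint32_t timer;    /* cycles accumulated since the last step */
    uint32_t period;   /* the "frequency value": cycles per step */
    uint8_t  step;     /* position in triangle_steps, 0..31 */
} TriangleChannel;

void triangle_run(TriangleChannel *ch, uint32_t cycles)
{
    if (ch->period == 0)
        return;                       /* avoid looping forever */
    ch->timer += cycles;
    while (ch->timer >= ch->period) {
        ch->timer -= ch->period;      /* subtract, do not reset, to keep leftover cycles */
        ch->step = (uint8_t)((ch->step + 1) % 32);
    }
}

uint8_t triangle_output(const TriangleChannel *ch)
{
    return triangle_steps[ch->step];
}
```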


Channels also have a configurable way to disable output (register 4015h), as well as internal ways to disable output. For instance, the square channel's duty cycles can disable output while in a low cycle, or the LFSR in the noise channel can disable output when bit 0 is set.

Code:
if (channel can output)
{
    return amplitude;
}
else
{
    return 0;
}
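
As an example of an "internal" way to disable output, here is a sketch of the noise channel's shift register. The feedback taps (bit 0 XOR bit 1, or bit 6 in the short mode) are how I understand the hardware, not something stated in this thread, and the names noise_shift and noise_output are made up for this example.

```c
#include <stdint.h>

/* Advance the 15-bit shift register by one step.
   The register should start out as 1. */
void noise_shift(uint16_t *lfsr, int short_mode)
{
    unsigned tap = short_mode ? 6u : 1u;
    uint16_t feedback = (uint16_t)((*lfsr ^ (*lfsr >> tap)) & 1u);
    *lfsr = (uint16_t)((*lfsr >> 1) | (feedback << 14));
}

/* The channel is silent while it is disabled or while bit 0 is set. */
uint8_t noise_output(uint16_t lfsr, int enabled, uint8_t amplitude)
{
    if (enabled && !(lfsr & 1u))
        return amplitude;
    return 0;
}
```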


One thing to note is that not all channels output 0 when silenced. Triangle, for instance, always outputs its current amplitude, but when it stops counting, there is no more sound wave being generated and the channel flatlines.

Each channel is made up of a few different 'primitive' components (Envelope, Sweep, Duty, etc). For my purposes, I found it easier to code the individual components, and reference them as objects in my channel classes. This keeps the logic for those components static between all channels you make. These components also have an update logic, but their logic is invoked depending on the APU's "frame" sequencer.
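
To illustrate the component idea, here is what a shared envelope component could look like in C. This is a sketch based on my understanding of the envelope unit rather than on anything posted in this thread. The Envelope struct and the function names are hypothetical, and the function is meant to be called on each "quarter frame" clock.

```c
#include <stdint.h>

typedef struct {
    int     start;      /* set when the channel is restarted */
    int     loop;       /* restart the decay at 15 when it reaches 0 */
    int     constant;   /* output `volume` directly instead of the decay level */
    uint8_t volume;     /* 0..15, also the divider reload value */
    uint8_t divider;
    uint8_t decay;      /* current decay level, 0..15 */
} Envelope;

/* Called on every quarter frame clock of the frame sequencer. */
void envelope_quarter_frame(Envelope *e)
{
    if (e->start) {
        e->start = 0;
        e->decay = 15;
        e->divider = e->volume;
        return;
    }
    if (e->divider > 0) {
        e->divider--;
        return;
    }
    e->divider = e->volume;
    if (e->decay > 0)
        e->decay--;
    else if (e->loop)
        e->decay = 15;
}

uint8_t envelope_output(const Envelope *e)
{
    return e->constant ? e->volume : e->decay;
}
```

Both square channels and the noise channel can then each hold one Envelope and call the same two functions, which is the point of writing the components once.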

I hope that wasn't convoluted or confusing, and I also hope it helped. Audio was the biggest problem for me, and it sounds like many others had issues with it as well.

by on (#88420)
beannaich wrote:
The way I picture it in my head, is that each channel has a certain "Update" logic, that is executed every x input cycles.


This is part of the APU, the "quarter" and "half" frames.

by on (#88436)
Zepper wrote:
This is part of the APU, the "quarter" and "half" frames.


Incorrect, the "quarter" and "half" frames, are where the components are updated (Sweep, Envelope, Linear Counter, etc). The channel's output is constantly updating. With every CPU cycle

by on (#88476)
beannaich wrote:
Zepper wrote:
This is part of the APU, the "quarter" and "half" frames.


Quote:
Incorrect, the "quarter" and "half" frames, are where the components are updated (Sweep, Envelope, Linear Counter, etc).


Absolutely, but don't call me "incorrect". That's EXACTLY what I mean... and what I understood from you: The way I picture it in my head, is that each channel has a certain "Update" logic, that is executed every x input cycles.

Quote:
The channel's output is constantly updating. With every CPU cycle


Yes, but you need to resample it to the PC sample rate.

by on (#88480)
You just misunderstood me, is all, Zepper. :)

By channel update logic, I meant what the channel does to actually render its waveforms (taking steps through a duty cycle, shifting the noise register, etc). And by component update logic, I mean what each individual part of a channel does on the APU's "half" and "quarter" frame counter clocks.

It's a very important distinction to make, and I called you "incorrect" so as not to confuse MottZilla and other people in the future. I meant no harm by it :)

And yes, you have to resample for whatever audio rendering API you're using, and most people do so using the 44.1 kHz sample rate. Those calculations weren't included in my first post because they come later, and MottZilla wanted to know about emulation, not so much playback at present.

But now that we're on the topic, the number of cycles between samples is simply:

Code:
sample delay = cpu frequency / sample frequency


In the case of the NES, with a 44.1 kHz sample rate, that is about 40.58 CPU cycles per sample:

Code:
sample delay = 1789772.72 / 44100


and the following logic (simplified) is executed:

Code:
sample timer += cycles since last update;

if (sample timer >= sample delay)
{
    sample timer -= sample delay;
    render sample();
}
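
Since the sample delay is not a whole number, the timer has to hold a fraction. A small sketch in C, with hypothetical names (SampleClock, sample_clock_advance), that reports how many output samples are due after a batch of CPU cycles:

```c
typedef struct {
    double timer;   /* CPU cycles accumulated since the last output sample */
    double delay;   /* CPU cycles per output sample, e.g. 1789772.72 / 44100 */
} SampleClock;

/* Advance by `cycles` CPU cycles and return how many samples are now due.
   The remainder stays in `timer`, so no cycles are lost over time. */
int sample_clock_advance(SampleClock *sc, int cycles)
{
    int due = 0;
    sc->timer += cycles;
    while (sc->timer >= sc->delay) {
        sc->timer -= sc->delay;
        due++;
    }
    return due;
}
```

The caller would invoke its own sample rendering routine once for each sample reported as due.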

by on (#88747)
Quote:
Well, you must find an algorithm to resample the generated NES sound. The simplest that I've found & use is adding the samples and dividing by the number of updates.


Hello, I am new to emulation.

I don't understand that one. E.g. the triangle output ranges from 0-15, and the square too? How exactly does that work with the division?

by on (#88751)
Xampf wrote:
Quote:
Well, you must find an algorithm to resample the generated NES sound. The simplest that I've found & use is adding the samples and dividing by the number of updates.


Hello, I am new to emulation.

I don't understand that one. E.g. the triangle output ranges from 0-15, and the square too? How exactly does that work with the division?


Resampling refers to the sample rate, not to the range of the sample values. In this case, it means going from an audio signal with a sample rate of 1,789,772 samples per second to an audio signal with a sample rate of 44,100 samples per second. For each output sample, you add together the 40 or 41 samples you get from the 1,789,772 samples-per-second signal and divide that sum by the number of samples to get the 'average' sample value for the resampled 44,100 samples-per-second signal.
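
A sketch of that averaging in C, using made-up names (Averager, averager_add, averager_take). Note that this is a plain box average, which is the simple method described above and not a proper band-limited resampler.

```c
typedef struct {
    double sum;     /* sum of the input samples collected so far */
    int    count;   /* how many input samples were collected */
} Averager;

/* Call once per CPU cycle with the current mixed output. */
void averager_add(Averager *a, double sample)
{
    a->sum += sample;
    a->count++;
}

/* Call when an output sample is due: returns the average and starts over. */
double averager_take(Averager *a)
{
    double average = (a->count > 0) ? a->sum / a->count : 0.0;
    a->sum = 0.0;
    a->count = 0;
    return average;
}
```

So if a square channel sat at 15 for half of the 40 cycles and at 0 for the other half, the output sample would come out as 7.5.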

The ranges you're referring to [0-15 for the square, triangle, and noise DACs, 0-127 for the DMC DAC] are the energy that each channel contributes to the overall signal strength at any given sample in the original signal. The square, triangle, and noise channels can contribute 0 [no energy] to 15 [full energy] to the signal strength. No dividing here.

by on (#88753)
Ok, thanks for your quick reply, I think I understood the part about the resampling.

But another question arises:
By energy, do you mean the amplitude of the signal?

Ehm, I started off by trying to produce a triangle wave at a certain frequency.
I managed to get the signal changing from 0 to $f and back at what is probably the correct frequency. I used the formula from http://nesdev.com/apu_ref.txt in the DAC output section. I directly take the computed amplitude from this formula (and scale it to make it audible). Now I try to play back that triangle wave at 200 Hz and it sounds like, ehm, a machine gun or so compared to http://en.wikipedia.org/wiki/Triangle_wave

The sampling part looks like this. I think I got that correct; currently I'm directly taking the value the triangle channel holds at that moment (so no averaging as you described yet).
Code:
      triangleChannel.updateProgrammableTimer(cycles, 253);

      sampleTimer += cycles;
      if (sampleTimer >= sampleDelay) {//sample Delay is around 40
         sampleTimer -= sampleDelay;
         sample();
      }

eh here is the sample() method
Code:
      // formula from apu_ref.txt; the other channels are simply 0, so the square term drops out
      float tndOut = 159.79f / (1f / (triangleChannel.out / 8227f) + 100f);
      int ampl = (int) (tndOut * 20000);
      sampleBuffer[sampleOffset++] = (byte) (ampl & 0xff);//little endian
      sampleBuffer[sampleOffset++] = (byte) (ampl >>> 8);


Then I simply write the buffers into the sourceDataLine (signed, 16-bit, mono, 44.1 kHz).

Edit 2: I managed to find the problem...
I figured out that signed means that the "lower half" of the wave has to be negative, so I subtracted 0.5 from tndOut and that did it.