Here is my attempt at conveying the basics, beginning from fundamentals, in a short but organized manner. There are other tutorials, but I wanted to write one just for the exercise of it :-)
To produce sound, you have to generate "PCM sound" (PCM stands for pulse-code modulation).
"PCM sound" is a type of signal.
A signal is any value that changes over time.
In the case of sound, the signal is the displacement of the loudspeaker's diaphragm (which reproduces air pressure waves by pushing and pulling the air in front of it).
The sampling rate is how often the signal is measured (and emitted).
For example, a PCM signal at 8000 Hz sampling rate is a numeric value that is emitted 8000 times in a second.
If you have an array of 40000 integers, and you know the sampling rate is 8000, you have 5 seconds of signal (5*8000=40000). If the sampling rate is 22050, the same array holds about 1.8 seconds of signal (40000/22050).
A signal has two fundamental properties: frequency and amplitude.
Amplitude is how large the differences are between values. Frequency is how fast the value changes from small to large and back.
For example, a PCM signal sampled at a 22050 Hz rate, with an amplitude of 20000 and a frequency of 2205 hertz, could look like this:
-10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
-10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
-10000 -6000 -2000 4000 7000 10000 6000 1000 -4000 -7000
(repeated thousands of times).
Within 22050 samples (which represent 1.0 seconds of audio, because of the sampling rate of 22050), it oscillates 2205 times between -10000 and 10000, hence an amplitude of 20000 (measured from the lowest value to the highest) and a frequency of 2205 Hz. The wavelength is 10 samples (sampling rate divided by frequency: 22050/2205=10).
If the amplitude were smaller, it would be quieter (the diaphragm moves very little); if it were larger, it would be louder (the diaphragm moves a lot).
If the frequency were lower, the pitch would be lower (the diaphragm moves slowly). The intervals between the extremes (the wavelength) would be greater.
If the frequency were higher, the pitch would be higher (the diaphragm moves rapidly). The intervals between the extremes (the wavelength) would be shorter.
When the signal samples are plotted in a graph, they form a shape. The shape is called a wave. Different shapes of wave have different names.
There is the square wave, which goes from maximum value to minimum value and back in an abrupt manner, with no intermediate values. For example, 100 100 100 100 100 20 20 20 20 20 100 100 100 100 100 20 20 20 20 20.
There is the triangle wave, which goes from maximum to minimum, and back, in a linear fashion. For example, 100 90 80 70 60 50 40 30 20 30 40 50 60 70 80 90 100.
There is the sine wave, which is a smooth wave that is generated with the mathematical sin() function.
An unlimited number of other wave types exist and can be devised.
Here is example C code that generates ten seconds of 8000 hertz PCM signal, consisting of a 440 hertz sine wave that swings between -60 and 60 (a peak amplitude of 60):
Code:
for(int pos=0; pos<80000; pos++) putchar( 60*sin(440*pos*2*M_PI/8000) );
To mix different signals together, you usually simply add them sample by sample. For example, this code outputs a 440 hertz sine wave and a 300 hertz sine wave together:
Code:
for(int pos=0; pos<80000; pos++) putchar( 60*sin(440*pos*2*M_PI/8000) + 60*sin(300*pos*2*M_PI/8000));
This covers the basics; the rest is extrapolation. :-)