Questions regarding SNES sound system

2008-02-13

Hi everyone,

I'm an MD programmer, and I've been working for quite some time on a sound engine for the MD. While I've had lots of headaches (doing insane optimizations), the MD has quite some power in its sound part (it's just that it wasn't utilized by most games). The Z80 is a pain sometimes.
Here's a demo of my work (sounds best on real HW, but Kega Fusion is good enough; other emus certainly aren't): http://www.hot.ee/tmeeco3/TMFPLAY2.RAR
(BTW, my engine is used in "Pier Solar and the Great Architects", the most kickass MD RPG to be released).

I'd like to know what the SNES can do: what are the main limitations, and what causes the most trouble? How easy would it be to make a sound engine and all related tools from scratch?

-Tiido

2008-02-13

The biggest difference is that you use instrument samples instead of FM parameters, so you can't as easily adjust the timbre in real-time. The SNES has a separate sound processor with 64K of private RAM. All communication with the main SNES processor is via four 8-bit I/O ports; RAM is NOT shared. The sound processor is similar to the 6502, but more flexible, and a lot more pleasant than the Z-80 in my opinion. It controls a DSP which can play 8 independent voices of a looped sample at an adjustable rate, with independent left/right panning. The samples are stored in a compressed delta format with an adjustable low-pass filter to remove the high-frequency noise introduced by the compression. Notes get a configurable volume envelope with lots of adjustments. There's also an adjustable-delay echo with low-pass FIR filter and adjustable feedback.

2008-02-13

Quote:

The samples are stored in a compressed delta format with an adjustable low-pass filter to remove the high-frequency noise introduced by the compression.

I'm not sure which filter you're talking about here. The only configurable filter I know of in the SPC700 is the echo filter, which isn't necessarily a low-pass: it's an FIR filter, so you can make it either low-pass or high-pass if you like.
The samples are compressed in blocks of 16 samples, and there are 4 possible modes. The first is "uncompressed": you directly get 4-bit samples. That is the most practical one, but without a doubt the worst-sounding one, as all samples are 4-bit (although the main step can be chosen). The second is delta compressed, and the last 2 are kind of delta-of-delta compressed. That's not quite it, but check any SPC documentation for more details.
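To illustrate the point about the echo filter being usable as either low-pass or high-pass, here is a minimal sketch of a generic FIR filter in Python. The 8-tap length, the 1/128 coefficient scaling, the tap order, and the two example coefficient sets are assumptions chosen for illustration, not values taken from this thread or from any particular game.

```python
def fir(samples, coeffs):
    """Apply an FIR filter. coeffs are signed integers in units of 1/128
    (assumed scaling); coeffs[0] is applied to the newest sample."""
    history = [0] * len(coeffs)
    out = []
    for s in samples:
        history = [s] + history[:-1]          # shift in the newest sample
        acc = sum(c * h for c, h in zip(coeffs, history))
        out.append(acc / 128)
    return out

# Hypothetical coefficient sets: only the sign of the second tap differs.
LOWPASS = [64, 64, 0, 0, 0, 0, 0, 0]    # average of two neighbouring samples
HIGHPASS = [64, -64, 0, 0, 0, 0, 0, 0]  # difference of two neighbouring samples

dc = [100] * 8             # lowest frequency: a constant signal
nyquist = [100, -100] * 4  # highest frequency: alternating signal

print("low-pass  on constant   :", fir(dc, LOWPASS)[-1])
print("low-pass  on alternating:", fir(nyquist, LOWPASS)[-1])
print("high-pass on constant   :", fir(dc, HIGHPASS)[-1])
print("high-pass on alternating:", fir(nyquist, HIGHPASS)[-1])
```

The same structure passes the constant signal and cancels the alternating one, or the reverse, depending only on the coefficients.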

2008-02-13

The sound processor can't get data from the ROM area, right? Everything must be in the tiny 64KB RAM? So, for example, playing a sound stream is quite some pain (if not impossible)? I can't really imagine fitting much into 64KB; even when the samples are ADPCM, they are only 50% smaller... my demo uses almost 400KB of samples; ADPCM them and it's 200KB; remove the differently pitched samples and leave in 1 (i.e. toms, a few cymbals) and it's maybe 100KB then... I would need to do lots of cutting, especially when I need sound effects too. Also there's room needed for the sound engine itself, and music data...

How fast is the sound processor (thinking of software waveforms) ?

2008-02-13

The SNES compression is not 50% of the original size, but in fact 9/32 of the original size, that is, about 28%. Your samples will then fit in 56 KB of memory.
It's possible to update samples in real time while they are playing; Tales of Phantasia does this. It isn't really easy, I guess, however. I guess the SPU runs quite fast.

2008-02-13

K.... Does the SPU feel like an 8-bit CPU or like a 16-bit CPU? If it feels 8-bit like the Z80 (which can't really do 16- or 32-bit arithmetic), software waveforms will be rather a pain to do...

Well, only big limitation is the fact that SPU can't look into ROM itself... I can't do things which I can do on MD on SNES then... damn

I'll cross out SNES from my "going to dev for" list.

2008-02-13

Quote:

The samples are compressed in blocks of 16 samples, and there are 4 possible modes. The first is "uncompressed": you directly get 4-bit samples. That is the most practical one, but without a doubt the worst-sounding one, as all samples are 4-bit (although the main step can be chosen). The second is delta compressed, and the last 2 are kind of delta-of-delta compressed. That's not quite it, but check any SPC documentation for more details.

The last three you describe apply a low-pass IIR filter to get rid of the high frequency noise introduced by the compression. Look more closely at the docs and you'll see that this is the case. And this is why samples on the SNES can be much smaller, because you can lower the sample rate more without introducing stair steps (these are what the low-pass filter removes). And you can lower the playback rate as well, since the gaussian interpolation will also eliminate stair-steps.

2008-02-13

Bregalad wrote:

The SNES compression is not 50% of the original size, but in fact 9/32 of the original size, that is, about 28%. Your samples will then fit in 56 KB of memory.

More like 56 percent. A lot of games for MD and GBA use 8-bit samples.
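Both figures follow from the block size: 16 samples are packed into a 9-byte block (a header byte plus sixteen 4-bit values), so the ratio depends on whether the source was 16-bit or 8-bit PCM. A quick check in Python, as a sketch; the function name is made up for illustration:

```python
BRR_BLOCK_BYTES = 9      # 1 header byte + 8 bytes holding sixteen 4-bit samples
SAMPLES_PER_BLOCK = 16

def brr_ratio(source_bits):
    """Compressed size divided by the size of the uncompressed PCM source."""
    raw_bytes = SAMPLES_PER_BLOCK * source_bits // 8
    return BRR_BLOCK_BYTES / raw_bytes

print(f"vs 16-bit PCM: {brr_ratio(16):.2%}")   # 9/32
print(f"vs  8-bit PCM: {brr_ratio(8):.2%}")    # 9/16
```

So 200 KB of 8-bit samples would shrink to roughly 112 KB, not 56 KB; the 28% figure only applies when the source is 16-bit.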

2008-02-13

The SPC outputs 16-bit samples, even if those could be encoded 8-bit samples.

Quote:

The last three you describe apply a low-pass IIR filter to get rid of the high frequency noise introduced by the compression. Look more closely at the docs and you'll see that this is the case. And this is why samples on the SNES can be much smaller, because you can lower the sample rate more without introducing stair steps (these are what the low-pass filter removes). And you can lower the playback rate as well, since the gaussian interpolation will also eliminate stair-steps.

Yeah, gaussian interpolation is done while resampling, not while decoding the sample. I think what you say is a filter is the weird formula to decode the samples, which is kind of fun to play with. You can really make a lot of different samples with really short string of 3 or 4 blocks looping over themselves by arranging them in some way. Few other SPUs can do this.

2008-02-13

BRR decoding:
Read 4-bit signed nibble.
Shift left by scaling in header.
Apply "weird formula to decode the samples".

That "weird formula to decode the samples" is an IIR filter. The filter combines the current sample with the previous two in four different ways. Below, input is the current sample, prev [-1] is the previous, and prev [-2] is the one before that:

0: out = input
1: out = input + prev [-1] * 0.9375
2: out = input + prev [-1] * 1.90625 - prev [-2] * 0.9375
3: out = input + prev [-1] * 1.796875 - prev [-2] * 0.8125

Looks a lot like an IIR filter, doesn't it? That's because it is.
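The three steps and four filter equations above can be sketched in Python as follows. The coefficients are the ones listed in this post; the header layout (shift in the top four bits, filter number in bits 2-3), the high-nibble-first order, and the simple clamp to 16 bits are assumptions. Real hardware uses integer arithmetic and has edge cases that this sketch skips, so treat it as an illustration of the filter structure rather than a bit-exact decoder.

```python
# (coefficient for prev[-1], coefficient subtracted for prev[-2]), per the post
FILTERS = {
    0: (0.0, 0.0),
    1: (0.9375, 0.0),
    2: (1.90625, 0.9375),
    3: (1.796875, 0.8125),
}

def sign_extend_nibble(n):
    """Interpret a 4-bit value as signed (-8..7)."""
    return n - 16 if n >= 8 else n

def clamp16(v):
    return max(-32768, min(32767, v))

def decode_block(block, prev1=0, prev2=0):
    """Decode one 9-byte block into 16 samples.
    Returns (samples, prev1, prev2) so the state can feed the next block."""
    header = block[0]
    shift = header >> 4          # assumed: scaling lives in the top 4 bits
    filt = (header >> 2) & 3     # assumed: filter number in bits 2-3
    a, b = FILTERS[filt]
    out = []
    for byte in block[1:9]:
        for nibble in (byte >> 4, byte & 0x0F):   # assumed: high nibble first
            x = sign_extend_nibble(nibble) << shift
            y = clamp16(int(x + prev1 * a - prev2 * b))
            out.append(y)
            prev2, prev1 = prev1, y
    return out, prev1, prev2

# Filter 1, shift 4: a single impulse decays by 15/16 per sample.
samples, _, _ = decode_block(bytes([0x44, 0x10, 0, 0, 0, 0, 0, 0, 0]))
print(samples[:4])
```

Because the output of each step is fed back in, a single non-zero nibble keeps ringing for the rest of the block, which is the behaviour that makes this an IIR filter rather than a plain delta decoder.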

2008-02-14

How clear will the samples (or the overall sound) be, and how big is the loss in the high-frequency range (I remember the sound of the SNES as rather "muffled")? For example, FM on the MD is sampled at 52KHz, and you can play digital audio at that rate too (and it's nice and clear, but only with nice code since it's all software, and my code is rather nice). BTW, I'm doing 2 digital channels in my engine, and has anyone cared to listen to my MD ROM?
And what's the highest rate the samples can be played on SNES ?

2008-02-14

The Super NES outputs at 32 kHz.

2008-02-14

I've heard it OUTPUTs something near 32KHz, but I'd like to know, what it can do internally, i.e if you give it a 44KHz sample, will it be able to play it in 1 second ?

2008-02-14

Samples can be resampled up to 4 times their original frequency by setting the freq to $3fff, which means you can play a sample 2 octaves higher than its 32KHz frequency; everything is mixed at 32kHz before output.
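As a rough sketch of the arithmetic: if $3fff is about 4x, then $1000 corresponds to playing a sample at the 32 kHz output rate. That unity value and the helper names below are assumptions for illustration.

```python
OUTPUT_RATE = 32000    # Hz, the mixing rate mentioned in this thread
UNITY_PITCH = 0x1000   # assumed: register value that plays 1 source sample per output sample
MAX_PITCH = 0x3FFF     # maximum value, from the post above

def pitch_for_rate(sample_rate_hz):
    """Register value needed to play a sample recorded at sample_rate_hz at its original speed."""
    pitch = round(sample_rate_hz * UNITY_PITCH / OUTPUT_RATE)
    return min(pitch, MAX_PITCH)

def rate_for_pitch(pitch):
    """Effective source sample rate consumed for a given register value."""
    return pitch * OUTPUT_RATE / UNITY_PITCH

print(hex(pitch_for_rate(32000)))   # unity
print(hex(pitch_for_rate(44100)))   # a 44.1 kHz recording at original speed
print(rate_for_pitch(MAX_PITCH))    # just under 128000 Hz
```

Under these assumptions a 44 kHz sample does play back in real time, but the output is still mixed at 32 kHz, so anything above 16 kHz in the source cannot come through.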

2008-02-15

I don't give a damn about resampling ATM... so it's just 32KHz... I could get a little over 100KHz from the MD (doesn't work stably on real HW).

2008-02-15

Right, a pair of 16-bit samples is sent to the DAC at an approximate rate of 32000 per second, so there's no way to vary the waveform by more than that. The main impact is limiting frequencies to 16 kHz. Is there a noticeable difference if your music is resampled to 32 kHz?

There is a way to feed a 16-bit stereo sample stream to the DAC, without having to go through the BRR compression, in case you want to try generating some of it in real-time (without losing the ability to also play the 8 voices). You could generate the audio on the main processor (65816) and stream it to the SPC, though this would mean your music wouldn't play on SPC music players (SNSF players would support it).

What are your main goals? A music engine for use in SNES games, or something standalone? And overall, are you wanting to try a different sound system than the Mega Drive? The SNES is quite different.

2008-02-15

It's just that I am curious and want to know a few more things about the SNES sound system. SNES lacks nice action games, and for a while I had an idea of attempting to make one (like I'm doing for MD, though it has plenty of action games). SNES seems to have little CPU horse power (and that's the reason I wouldn't use it in sound generation process), and from what I've heard, it's a pretty painful system to program for... I haven't done much research on SNES...

The thing my MD driver mostly relies on can't be done on SNES - the SPC can't look into ROM, and 64KB of RAM shared between code, samples and music data is gonna set me quite some restrictions... The Z80 in the MD has free access to the whole ROM area (though it gives me around a 5% performance hit on the 68K when samples are played, and they're played intensively in my songs).

As for the 32KHz, it wouldn't sound as clear as FM would, especially since there's some low-pass filtering done which makes things sound even less clear, not to mention compression, which would again have some negative effect on clearness. Now I wonder if there's some analog filter in SNES... it might be a nice idea to get rid of it... most (except the really early) MDs need some modding to hear all the 52KHz goodness that comes from the YM2612 (though there are plenty of games with rather crap sound engines and poor sound). Too bad that there's no synth in SNES...

2008-02-15

TmEE wrote:

SNES seems to have little CPU horse power (and that's the reason I wouldn't use it in sound generation process)

Certainly not compared to the 68K, with its sixteen 32-bit registers and built-in multiply and divide support. What sorts of real-time synthesis do you do on the Mega Drive? More than drums I take it. Seems that you'd most likely be playing looped samples at different rates, which the DSP would best do on the SNES.

Quote:

the SPC can't look into ROM

The SNES CPU could act as a ROM server, fulfilling requests made by the SPC-700 via the 4 I/O ports. Most communication between the two has lots of acknowledgement, but it's possible to avoid that by carefully timing both parties. For example, the SNES CPU could constantly be reading the first three output ports as a 24-bit address and placing that byte of ROM into the first input port, so the SPC-700 could just place the address, wait a bit, then retrieve the byte from ROM. Not very practical, but possible.
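The "ROM server" idea can be modelled as a toy simulation in Python. This is not SNES or SPC-700 code; the class and function names are hypothetical, the low-byte-first address order is an assumption, and the carefully timed "wait a bit" is replaced by calling the server function directly.

```python
class Ports:
    """The four 8-bit I/O ports, modelled as one array per direction."""
    def __init__(self):
        self.to_cpu = [0, 0, 0, 0]   # written by the SPC-700, read by the SNES CPU
        self.to_spc = [0, 0, 0, 0]   # written by the SNES CPU, read by the SPC-700

def cpu_serve_once(ports, rom):
    """One pass of the SNES CPU loop: read a 24-bit address from the first
    three ports and place that ROM byte into the first port going back."""
    addr = ports.to_cpu[0] | (ports.to_cpu[1] << 8) | (ports.to_cpu[2] << 16)
    ports.to_spc[0] = rom[addr % len(rom)]

def spc_read_rom(ports, rom, addr):
    """SPC-700 side: place the address, wait, then pick up the byte."""
    ports.to_cpu[0] = addr & 0xFF
    ports.to_cpu[1] = (addr >> 8) & 0xFF
    ports.to_cpu[2] = (addr >> 16) & 0xFF
    cpu_serve_once(ports, rom)       # stands in for "wait a bit"
    return ports.to_spc[0]

rom = bytes(i & 0xFF for i in range(1024))   # dummy ROM contents
ports = Ports()
print(hex(spc_read_rom(ports, rom, 0x0123)))
```

The simulation also makes the cost visible: every byte the SPC-700 wants requires the main CPU to be sitting in its serving loop at that moment.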

Quote:

Now I wonder if there's some analog filter in SNES... it might be a nice idea to get rid of it...

There is some slight low-pass filtering after the DAC (perhaps the reconstruction filter). People have modded their SNES units to output SPDIF (whatever that digital audio standard is), which obviously eliminates this low-pass. I would have figured 16 kHz would be plenty for clarity, though.

Quote:

Too bad that there's no synth in SNES...

You can frequency-modulate a voice based on the output of the previous one. So you could have four independent frequency-modulated voices and four modulator voices. There's a 7-bit register named PMON which tells which voices are modulated.
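A minimal sketch of the idea in Python: one voice's output scales the playback rate of the next. It uses floating-point sine waves and a made-up depth parameter, not the DSP's actual arithmetic or register format, so it only shows the principle.

```python
import math

def render(carrier_step, mod_step, depth, count):
    """Render `count` samples of a sine carrier whose per-sample phase step is
    scaled by (1 + depth * modulator output). Steps are in cycles per sample."""
    out = []
    cphase = 0.0
    mphase = 0.0
    for _ in range(count):
        mod = math.sin(2 * math.pi * mphase)      # modulator voice output, -1..1
        out.append(math.sin(2 * math.pi * cphase))
        mphase = (mphase + mod_step) % 1.0
        cphase = (cphase + carrier_step * (1.0 + depth * mod)) % 1.0
    return out

plain = render(0.01, 0.02, 0.0, 200)       # depth 0: an ordinary sine wave
modulated = render(0.01, 0.02, 0.5, 200)   # depth 0.5: pitch wobbles with the modulator
print(max(abs(a - b) for a, b in zip(plain, modulated)))
```

Changing the modulator's volume corresponds to changing `depth` here, and feeding `modulated` in as the modulator of a third voice would give the layered arrangement.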

2008-02-15

Quote:

Certainly not compared to the 68K, with its sixteen 32-bit registers and built-in multiply and divide support.

But why is there so much slowdown in many games then? Or is it just the low clock?

Quote:

What sorts of real-time synthesis do you do on the Mega Drive? More than drums I take it. Seems that you'd most likely be playing looped samples at different rates, which the DSP would best do on the SNES.

I just play 2 channels of PCM for drums (and speech synthesis), the rest is FM and PSG.... and realtime synthesis would have been something I would have used on SNES to compensate for the lack of sound RAM...

Quote:

The SNES CPU could act as a ROM server, fulfilling requests made by the SPC-700 via the 4 I/O ports. Most communication between the two has lots of acknowledgement, but it's possible to avoid that by carefully timing both parties. For example, the SNES CPU could constantly be reading the first three output ports as a 24-bit address and placing that byte of ROM into the first input port, so the SPC-700 could just place the address, wait a bit, then retrieve the byte from ROM. Not very practical, but possible.

The point is that main CPU is not involved in the process... if you'd use this method in a game, I doubt there would be enough time to process game stuff.... my MD engine is all Z80, so 68K does the game with no other tasks.

Quote:

I would have figured 16 kHz would be plenty for clarity, though.

20...24 would be plenty for clarity

Quote:

You can frequency-modulate a voice based on the output of the previous one. So you could have four independent frequency-modulated voices and four modulator voices. There's a 7-bit register named PMON which tells which voices are modulated.

And how complex sounds could be made ? I guess FM synthesis would sound better....

2008-02-15

Quote:

blargg wrote:

Certainly not compared to the 68K, with its sixteen 32-bit registers and built-in multiply and divide support.

But why is there so much slowdown in many games then? Or is it just the low clock?

Sorry, I was saying that the 65816 certainly wouldn't be as powerful, as the 68K has all those registers listed above, while the 65816 has only three registers and can't even do arithmetic directly between them. Clock-wise, I think the 65816 might do more work per cycle/byte than the 68K does.

Quote:

The point is that main CPU is not involved in the process... if you'd use this method in a game, I doubt there would be enough time to process game stuff.... my MD engine is all Z80, so 68K does the game with no other tasks.

OK, so your goal is to make a music engine usable in a game, as opposed to something standalone. Rules out complex arrangements like this.

Quote:

blargg wrote:

You can frequency-modulate a voice based on the output of the previous one. So you could have four independent frequency-modulated voices and four modulator voices. There's a 7-bit register named PMON which tells which voices are modulated.

And how complex sounds could be made ? I guess FM synthesis would sound better....

Well, at the very least you should be able to do FM modulation with two sine waves. You could change the volume of the modulator to change the depth. I'm not very familiar with FM synthesis so I don't know what all it can do. I guess the general (complete?) lack of SNES games with FM synthesis means this isn't that viable.

2008-02-15

TmEE wrote:

Quote:

Certainly not compared to the 68K, with its sixteen 32-bit registers and built-in multiply and divide support.

But why is there so much slowdown in many games then? Or is it just the low clock?

Because some games are inefficiently coded. This can happen on any platform. On the NES, SMB3 slows down, but Recca does not.

2008-02-16

blargg wrote:

Sorry, I was saying that the 65816 certainly wouldn't be as powerful, as the 68K has all those registers listed above, while the 65816 has only three registers and can't even do arithmetic directly between them. Clock-wise, I think the 65816 might do more work per cycle/byte than the 68K does.

I misunderstood a little... I read that 65816 has 16regs.... oops... everything beats a 68K on work/cycles relation, but nothing beats its flexibility, so far, I've not messed with such good CPU on ASM level.
I hope these regs of 65816 are 32bits, or just 16bits (or 16bit in a way like Z80 can do)?

Quote:

OK, so your goal is to make a music engine usable in a game, as opposed to something standalone. Rules out complex arrangements like this.

Pretty much... but it seems, SNES goes to the end of the list for now.

Quote:

Well, at the very least you should be able to do FM modulation with two sine waves. You could change the volume of the modulator to change the depth. I'm not very familiar with FM synthesis so I don't know what all it can do. I guess the general (complete?) lack of SNES games with FM synthesis means this isn't that viable.

2 operators allows for poor sounds like for example Adlib can do... 4ops is much more fun and with some effort you can get really nice sounding instruments. Most instruments I've made sound almost like my Yamaha's MIDI. Best part is that each instrument is less than 32bytes.

http://www.soundshock.se/phpBB2/viewtopic.php?t=158
One HW recording of a tiny song I made...

2008-02-16

Well, everyone seems to deny how much better the SNES sound is compared to the Megadrive (at least emulated; I have no real Megadrive).
Ironically, the Megadrive can do things that are hard to reproduce on the SNES sound processor; the two are in fact different. The Megadrive SPU is rather good for techno, and can output awesome synth-bass sound, but aside from that most instruments sound horrible. On the SNES it would be pretty hard to reproduce techno, but it is WAY better at reproducing more classic music tunes.
32KHz isn't a lot, but if you keep the sampling rate of all your samples high (something most commercial games don't do, for memory-saving purposes, but some like Chrono Trigger or Dragon Quest 3/6 actually do), it can output very good sound.
The SNES is also bad at reproducing NES/Gameboy sound due to interpolation. It is good, however, at playing music actually intended to be played with real instruments, where both the NES and the Megadrive music fall flat.

I guess if I were to ever code something for the Megadrive I'd just use FM synthesis for synth basses and sound effects, and leave melody for PSG channels. That way you could have Mega Man Battle Network-like music, heh.

2008-02-16

Bregalad wrote:

Ironically, the Megadrive can do things that are hard to reproduce on the SNES sound processor; the two are in fact different. The Megadrive SPU is rather good for techno, and can output awesome synth-bass sound, but aside from that most instruments sound horrible. On the SNES it would be pretty hard to reproduce techno

Listen to the SNES try to do techno: "Rave Dancetune" from Cool Spot (SNES)

Case in point: When Namco decided to port Pac-Attack to the GBA as part of Pac-Man Collection, it used a stream of the Genesis version's music rather than the Super NES version's music.

2008-02-16

Stop giving the SNES sound bad publicity; it is capable of more than this, I'm pretty sure, even when it comes to techno. Most early SNES games use the hardware pretty badly, because good samples at a high sampling rate take a lot of space, and most games have only one set of samples that they write once to memory and never touch again, greatly limiting the number of available samples (and quality too), as opposed to games that change samples for each song as they need them.

You could instead listen to the soundtrack of Chrono Trigger, and then you'll see that it puts the Megadrive to shame.

2008-02-16

I agree with Bregalad - IMO, "Rave Dancetune" used very poor samples, and I believe the SNES version could've sampled waveforms more or less richer than the Genesis's FM synthesized waveforms. Although the Genesis version's "Rave Dancetune" is superior to the SNES version and is one of the better Genesis songs.

2008-02-16

You can get nice sound from SNES, but you can also get the same from MD; in FM it's just a little more pain to get "real" sounding instruments... on SNES you can just use a recording of a real instrument... Also, you may want to hear the demo of the MD game "Pier Solar and the Great Architects" once it's out; then you can hear that the MD can do much more than techno. BTW, the game uses my sound engine.

http://www.hot.ee/tmeeco3/CA-TITLE.OGG - some kind of metal

2008-02-16

Quote:

I hope these regs of 65816 are 32bits, or just 16bits (or 16bit in a way like Z80 can do)?

They're 16 bits (or even 8 bits, depending on some mode bits). Unlike the Z80, true 16 bits.

Quote:

2 operators allows for poor sounds like for example Adlib can do... 4ops is much more fun and with some effort you can get really nice sounding instruments.

I guess pitch modulation can be layered on the DSP, actually, where for example voice 0's amplitude becomes a pitch modulation for voice 1, and voice 1's amplitude in turn becomes a pitch modulation for voice 2. Not sure how useful it would be. I'll have to try it sometime.

Here is some SPC music from Donkey Kong Country 2 & 3 which do some techno timbre-changing effects: Techno_SPC.zip

2008-02-16

Quote:

I guess pitch modulation can be layered on the DSP, actually, where for example voice 0's amplitude becomes a pitch modulation for voice 1, and voice 1's amplitude in turn becomes a pitch modulation for voice 2. Not sure how useful it would be. I'll have to try it sometime.

You are right, in Secret of Mana when you call Flammy with the Drum item (you have to be pretty far in the game to do that), the sound effect plays on all 8 voices, where all of them are daisy-chained to pitch-modulate the next one.

And yeah, Donkey Kong Country games make quite notable use of SPC hardware.

2008-02-17

I just listened to the SoM soundtrack; the instruments are pretty nice, but the music itself is boring, but that's just my opinion. Those 2 DKC songs aren't anything impressive, and it seems most of their instruments are recorded off an FM synth.

Nice listening material :
http://project2612.org/details.php?id=55
http://project2612.org/details.php?id=443
http://project2612.org/details.php?id=61
http://project2612.org/details.php?id=44
http://project2612.org/details.php?id=16
http://project2612.org/details.php?id=172

2008-08-29

Quote:

Well, at the very least you should be able to do FM modulation with two sine waves. You could change the volume of the modulator to change the depth. I'm not very familiar with FM synthesis so I don't know what all it can do. I guess the general (complete?) lack of SNES games with FM synthesis means this isn't that viable.

2 operators allows for poor sounds like for example Adlib can do... 4ops is much more fun and with some effort you can get really nice sounding instruments. Most instruments I've made sound almost like my Yamaha's MIDI. Best part is that each instrument is less than 32bytes.

Except Adlib uses a sterilely produced sound tone for the carrier and operator. The SNES DSP is already using a very complex waveform, more complex than what even 4 operators can deliver from FM synths. It would be akin to having one channel's final output of the 2612 modulate another channel entirely. But the DSP is already working with complex waveforms, so I don't see a real need for this. Stuff like vibrato/LFO and sweeps can be handled by the SPC700 processor.

2011-03-16

From what I know, the Genesis actually has 10 sound channels, of which 4 were originally intended for SMS games, but they can all work simultaneously in MD mode.
Channel 6 can play 8-bit PCM samples, I guess at 54 kHz.
The SNES has 8 digital sound channels. Its samples are in 4-bit BRR format, but they are decoded and reconstructed into 16-bit 32 kHz audio. While the Gaussian filter does remove annoying noise, it sadly also further reduces the quality. The analogue low-pass filter gives a nice mega-bass sound and removes extra noise, but sadly eliminates even more high frequencies. While the 4-bit samples are upsampled to 16-bit to compensate for the degradation, you end up with a somewhat flat, muffled sound. Voice samples can be fairly okay, but sometimes voices sound more like they were recorded behind bars/doors, and even then they can still be scratchy or noisy if the recording was badly done.
But the best advantage of the SNES chip is that it can also stream 100% true 16-bit 32 kHz samples; too bad a 3-minute audio track takes up more space than even the largest SNES game.
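The size claim is easy to check with a little arithmetic. In the sketch below, the 48 Mbit figure for the largest cartridge is an assumption, and the function name is made up for illustration.

```python
def pcm_bytes(seconds, rate_hz=32000, bits=16, channels=2):
    """Size of uncompressed PCM audio in bytes."""
    return seconds * rate_hz * (bits // 8) * channels

LARGEST_CART_BYTES = 48 * 1024 * 1024 // 8   # assumption: 48 Mbit = 6 MiB

stereo = pcm_bytes(180)              # 3 minutes, 16-bit stereo at 32 kHz
mono = pcm_bytes(180, channels=1)

print(f"stereo: {stereo / 2**20:.1f} MiB")
print(f"mono:   {mono / 2**20:.1f} MiB")
print(f"cart:   {LARGEST_CART_BYTES / 2**20:.1f} MiB")
```

Even in mono, three minutes of raw 16-bit audio is close to twice the assumed cartridge size, so streaming like this is only realistic for short clips.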

2011-03-16

johannesmutlu wrote:

But the best advantage of the SNES chip is that it can also stream 100% true 16-bit 32 kHz samples; too bad a 3-minute audio track takes up more space than even the largest SNES game.

That's not the best advantage. :shock:

The advantage of the SNES sound setup is that it's not strictly limited to a specific set of instrument sounds like a game system FM chip. It can have a huge range of sound; soft or muffled ... or not. And it can do complex 4 note chords via a single channel. That's the strength and "best advantage" of sample-based synthesis, as in the SNES audio.

Also, if you're talking about the Genesis channel 6 DAC, you're not gonna get 54khz PCM playback. The chip has a busy/wait flag. You can't simply brute-force writes to the DAC and expect whatever frequency you hard-coded the Z80 to. You'll miss sample updates/writes. TmEE leaves it at a reasonable 20-26khz and does double writes to ensure he doesn't get a missed sample update.

2011-03-16

tomaitheous wrote:

The advantage of the SNES sound setup is that it's not strictly limited to a specific set of instrument sounds like a game system FM chip. It can have a huge range of sound; soft or muffled ... or not.

At a substantial cost in ROM space, which was expensive in the Super NES era. And at a mild cost in loading time; notice how much faster The Lord of the Rings switches between areas when music is turned off (not that it isn't still a piece of poo). One thing samples alone can't do is make a timbre that varies slowly over time, unlike the duty cycle of a pulse channel (think SID, not NES), the amplitude of an FM operator, or the resonant frequency of a TB303 bass.

Quote:

And it can do complex 4 note chords via a single channel.

I've been able to coax fifth, fourth, major third, and minor third intervals out of TFM (tracker for Genesis's FM chip) by running the operators in a 2+2 configuration. Then I can do a 4-note chord on two channels. Doing this in samples would require a lot of sample RAM.

2011-03-16

Quote:

out of TFM (tracker for Genesis's FM chip) by running the operators in a 2+2 configuration. Then I can do a 4-note chord on two channels.

Only channels 3 and 6 allow this on the YM2612 (Genesis FM chip), since they're the only two channels that can be configured so that each operator has its own frequency setting.
Unless you're able to achieve all the desired frequencies of your chord with the MUL and DT parameters.

2011-03-16

mic_ wrote:

Unless you're able to achieve all the desired frequencies of your chord with the MUL and DT parameters.

This is the case. Choices for MUL pairs are 2 and 3 (fifth), 3 and 4 (fourth), 4 and 5 (major third), and 5 and 6 (minor third), and a quick perusal through Wikipedia's article on just intonation will help you figure out why these work. These are also the valid choices for pairs when doing intervals on Game Boy channel 3.
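The reason these pairs work is that two operators with multipliers m and n sound at a frequency ratio of n/m, and small whole-number ratios are the just-intonation intervals. A short Python sketch that compares each ratio with the nearest equal-tempered interval; the table and function names are made up for illustration:

```python
import math
from fractions import Fraction

# MUL pairs from the post above, as (lower, higher) multiplier
MUL_PAIRS = {
    "fifth": (2, 3),
    "fourth": (3, 4),
    "major third": (4, 5),
    "minor third": (5, 6),
}

def cents(ratio):
    """Size of a frequency ratio in cents (100 cents = 1 equal-tempered semitone)."""
    return 1200 * math.log2(ratio)

def interval_table():
    rows = []
    for name, (lo, hi) in MUL_PAIRS.items():
        ratio = Fraction(hi, lo)
        just = cents(ratio)
        equal = round(just / 100) * 100   # nearest equal-tempered interval
        rows.append((name, ratio, just, equal))
    return rows

for name, ratio, just, equal in interval_table():
    print(f"{name:12s} {ratio}  {just:7.2f} cents  (equal temperament: {equal})")
```

The fifth and fourth land within about 2 cents of their equal-tempered counterparts, while the thirds are off by roughly 14 to 16 cents, which is audible but still reads as the intended chord.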

2011-03-17

mic_ wrote:

Only channels 3 and 6 allow this on the YM2612 (Genesis FM chip), since they're the only two channels that can be configured so that each operator has its own frequency setting.
Unless you're able to achieve all the desired frequencies of your chord with the MUL and DT parameters.

Only channel 3 has a per-operator frequency setting; channel 6 does not.