Rate of streaming data to the SPC700?

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Rate of streaming data to the SPC700?
by on (#176847)
This has been beaten to death, but I've never actually understood how it works. I gather that the CPU has to write to audio RAM through the SPC700 and they have to be "synched up", but what does this really mean? I guess it's that because the SPC700 operates at a lower frequency, you need to make sure you're writing to it when a cycle is happening on both, but the frequencies of the main CPU and the SPC700 aren't even related at all, as 1/2.68 = about 0.373, so I have no clue how this works. I guess it's never perfect, but I imagine there's a period of time a cycle takes up, so the SPC700 could get the data toward the end of a cycle and it wouldn't make a difference or something like that. Yeah, a cycle doesn't last until the next one, does it?

What I was really wondering, though, is this: in the 32KHz, 16 bit sound demo that apparently takes up all CPU time, is this because of decompression, or is it because of waiting for the right time to write to the SPC700? I don't even know what bit depth in sound is, but I guess it's just the range the sound wave can be at, so if I were to draw a line graph that represented sound waves, each axis would only have whole numbers and the y axis would be bit depth, and the x axis would be the "KHz" frequency. The thing is though, would you ever even really need to jump from 0 to 65536 in one Hz? Could you do a sort of lossy compression where instead of 16 bit audio, you have 8 bit audio where you can either go up 128 or down 128, or would this sound like crap even at 32KHz (or really, would this sound better than 16 bit 16KHz)? You'd also have to have the SPC700 decompress it, and I'm not too sure how possible that is. I think I heard that the DSP (for the most part, but there's a way to get around it) actually uses lossless "BRR" compression (I don't know how it's formatted, but I can look into it), but if it is lossless, then that means the rate of streaming could be inconsistent, unless you doctored all your samples to work around it and made them kind of lossy, but whatever. Of course, a constant streaming rate really only matters if you're streaming the data to the SPC700 and then streaming that data through the DSP right away, i.e. you're not creating a buffer in audio RAM.

I actually thought of something again... If the SPC700 were completely busy streaming data, couldn't it not even decompress any lossy audio samples? Grr...
Re: Rate of streaming data to the SPC700?
by on (#176851)
Normally, when you're talking to a friend, you know when to listen and when to talk by gestures and gaps in the conversation.

With the SNES, the S-CPU and the S-SMP have a much more restricted means of communication: there are exactly four boxes, each of which can hold a single number between 0 and 255. They can't know when the other side put a new number in: they can only see that the number has changed since the last time they checked.

Streaming uncompressed 32kHz stereo 16-bit audio requires transmitting approximately 128000 bytes per second. But the exact rate is determined by the S-SMP's clock, which isn't a constant speed relative to the S-CPU. (Because the S-SMP originally used a lower-precision "ceramic resonator", it can drift by up to 0.1% from what's intended, while the S-CPU uses a calibrated quartz crystal)

So the S-CPU not only has to push 128000 bytes per second through this cramped communication path, but it has to ask the S-SMP whether it needs more data yet. And the S-SMP has other things it has to do, too, namely copy the data from the S-CPU to the DAC (which is complicated, but we'll ignore that for now)
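As a quick sanity check on those numbers, here is a minimal Python sketch (not from the original post). The 0.1% figure is the worst-case drift quoted above, expressed as 1000 parts per million; the variable names are purely illustrative.

```python
# Back-of-the-envelope bandwidth for uncompressed streaming.
SAMPLE_RATE_HZ = 32_000     # S-DSP output rate
CHANNELS = 2                # stereo
BYTES_PER_SAMPLE = 2        # 16-bit
DRIFT_PPM = 1_000           # 0.1% worst-case resonator drift quoted above

bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE

# How far a sender running at the nominal rate could be off after one second
# if the S-SMP clock is at the edge of that tolerance.
drift_bytes_per_second = bytes_per_second * DRIFT_PPM // 1_000_000

print(bytes_per_second)        # 128000
print(drift_bytes_per_second)  # 128
```

In other words, a sender that assumes the nominal rate could be off by up to 32 whole stereo samples every second, which is why the S-CPU has to keep asking rather than just pushing data blindly.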


As for the other questions, I'd strongly recommend xiphmont's tutorials.
Re: Rate of streaming data to the SPC700?
by on (#176853)
Quote:
If the SPC700 were completely busy streaming data, couldn't it not even decompress any lossy audio samples? Grr...

The audio is always lossily compressed in the SPC700. The hardware decompresses it just in time when it extracts samples from BRR-encoded data. There is not much point in compressing it further with another algorithm, because the data is already close to full randomness, which makes it extremely difficult to compress.

The only way this could be done is by separating the BRR sample data itself (last 8 bytes of each 9-byte BRR packet) from the BRR headers (first byte of each BRR packet): while the audio data is incompressible, the headers are compressible with RLE or Huffman. The problem is that in the absolute best case you reduce the data to 8/9 of its original size, which is not very interesting.
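To illustrate the header/data split described above, here is a small Python sketch. The function name `split_brr` and the dummy block contents are made up for the example; only the 9-byte layout (1 header byte followed by 8 data bytes) comes from the post.

```python
def split_brr(stream: bytes) -> tuple[bytes, bytes]:
    """Split a BRR stream into (header bytes, sample data bytes)."""
    if len(stream) % 9 != 0:
        raise ValueError("BRR data must be a whole number of 9-byte blocks")
    headers = stream[0::9]  # first byte of every 9-byte block
    data = bytes(b for i, b in enumerate(stream) if i % 9 != 0)
    return headers, data

# Two dummy blocks; the byte values are arbitrary placeholders, not real audio.
blocks = bytes([0xB0] + [0x11] * 8 + [0xB3] + [0x22] * 8)
headers, data = split_brr(blocks)

# Even if the headers compressed down to nothing, this is the best possible ratio.
best_case_ratio = len(data) / len(blocks)
print(len(headers), len(data), round(best_case_ratio, 3))
```

The ratio can never get below 8/9, because only one byte in nine is a header.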

Also, please do not use "grr" or anything like that in your post, this is not appropriate. It might not be the intent, but it makes you sound like an aggressive dog threatening us, even though we're not responsible for the issue (in this case SNES hardware development).

One major problem for SPU/CPU sync is that the two run on completely separate crystals, unlike, say, the CPU and PPU in the NES, which run at different rates but are based on different divisors of the same crystal.
Re: Rate of streaming data to the SPC700?
by on (#176857)
There are different degrees of lossy. At times I've thought of how to compress mostly periodic samples, such as speech, by repeating smaller portions. And at times I've thought of how to compress audio at 2 bits per sample instead of 4, unpacking it to proper BRR at runtime. (I can't do any experiments at the moment because of The Curse of Possum Hollow.)


EDIT (2016-11-12): 2-bit BRR is on
Re: Rate of streaming data to the SPC700?
by on (#176860)
Espozo wrote:
you need to make sure you're writing to it when a cycle is happening on both

It looks like you have a fundamental misconception happening here.

Communication between the two processors (the main CPU and the audio CPU) is accomplished by means of I/O ports that hold persistent values. You write to a port, with either processor, and the value you wrote shows up on the other side for all time, or until the system is turned off, or until the port is written again from the same side (values read from these ports are always the values that were written from the other side).

http://wiki.superfamicom.org/snes/show/SPC700+Reference

"Sync" in this case doesn't mean cycle-alignment. It means making sure that (a) the S-CPU doesn't write a new value before the S-SMP has read the previous one, and (b) the S-SMP doesn't read an old value again and mistake it for new data. (Also (c) the S-SMP doesn't end up with corrupted data due to reading during an S-CPU write, or reading the top port after the bottom two ports were written in rapid succession. See the link above for descriptions of these two hardware bugs.) There are lots of ways to accomplish this, from two-way handshake protocols for each byte (slow but reliable in a variety of situations; the SPC700's boot ROM does this) to semi-free-running timed code loops (fast but require intermittent sync protocol to account for the fact that the clock ratio is not reliable). N-Warp Daisakusen's audio streaming uses multi-line HDMA bursts on the S-CPU side and a cycle-counted read loop on the S-SMP side, but its peak bandwidth is limited by the fact that it dumps the data on the stack and copies it to the desired location after the burst. My own (untested) attempt at HDMA streaming uses self-modifying code to pass the data straight to the desired location during the read loop, which should in principle allow much higher bandwidth. (Again, my code is totally untested; for all I know, d4s' method could be the best you can do with HDMA...)

blargg's uncompressed 32 kHz 16-bit stereo streaming technique uses single-sided sync from the S-SMP (since that's the side that knows exactly how fast the audio is playing) and writes the data to a fixed location in audio RAM to avoid the overhead of indexing and page handling. Playback is via an echo buffer exploit: If you set the echo buffer size to zero, it will continually pick up the echo sample from the same four bytes (16-bit stereo = 4 bytes per sample), so if you can write to that location at an absolutely consistent rate of four bytes per 32 S-SMP cycles, you can play arbitrary 32 kHz stereo sound. Unfortunately the S-CPU has to pay very close attention to the I/O ports to pull this off; there's room for extra processing since it's so much faster than its counterpart (blargg used a lossless compression algorithm to jam more audio into the ROM) but the whole thing has to be designed around the audio streaming loop because the S-CPU has to reliably be there for the S-SMP approximately twice per scanline.
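The "32 S-SMP cycles" and "approximately twice per scanline" figures can be checked with a few lines of Python. Note the assumptions, none of which are stated in the post itself: a nominal S-SMP clock of 1.024 MHz, an NTSC master clock of about 21.477 MHz, and 1364 master cycles per scanline.

```python
SMP_CLOCK_HZ = 1_024_000        # assumed nominal S-SMP clock
SAMPLE_RATE_HZ = 32_000         # echo buffer / output sample rate
NTSC_MASTER_HZ = 21_477_272     # assumed NTSC master clock
MASTER_CYCLES_PER_LINE = 1364   # assumed length of a normal scanline

# S-SMP cycles available per 4-byte stereo sample
smp_cycles_per_sample = SMP_CLOCK_HZ // SAMPLE_RATE_HZ

# How many samples the S-CPU has to deliver during one scanline
line_rate_hz = NTSC_MASTER_HZ / MASTER_CYCLES_PER_LINE
samples_per_line = SAMPLE_RATE_HZ / line_rate_hz

print(smp_cycles_per_sample)        # 32
print(round(samples_per_line, 2))   # 2.03
```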

Basically, lidnariq describes the situation pretty well. You have four I/O ports as described, and whatever you do has to work reliably given that constraint.
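For anyone who finds the handshake idea easier to follow as code, here is a toy Python simulation of a two-way, one-byte-at-a-time handshake over the ports. This is only a sketch of the general idea, not the boot ROM's actual protocol: the use of port 0 as a sequence marker and port 1 as data, the 1..255 counter, and the polling period are all assumptions made for the example.

```python
def handshake_transfer(payload: bytes, smp_poll_period: int = 3) -> bytes:
    cpu_out = [0, 0, 0, 0]   # written by the S-CPU, read by the S-SMP
    smp_out = [0, 0, 0, 0]   # written by the S-SMP, read by the S-CPU
    received = bytearray()
    seq = 0                  # marker of the byte currently in flight
    last_seen = 0            # last marker the S-SMP has consumed
    sent = 0
    waiting_for_ack = False
    tick = 0
    while len(received) < len(payload):
        # S-CPU side: never overwrite a byte that has not been acknowledged.
        if waiting_for_ack:
            if smp_out[0] == seq:
                waiting_for_ack = False
        elif sent < len(payload):
            seq = seq % 255 + 1          # 1..255, so the marker always changes
            cpu_out[1] = payload[sent]   # data first...
            cpu_out[0] = seq             # ...then the "new byte" marker
            sent += 1
            waiting_for_ack = True
        # S-SMP side: the slower chip only polls every few ticks.
        if tick % smp_poll_period == 0 and cpu_out[0] != last_seen:
            received.append(cpu_out[1])  # taken exactly once per marker change
            last_seen = cpu_out[0]
            smp_out[0] = last_seen       # acknowledge by echoing the marker
        tick += 1
    return bytes(received)

print(handshake_transfer(b"\x00\x00\x07"))   # repeated values still arrive intact
```

Because the receiver reacts to the marker changing rather than to the data value, repeated data bytes are not lost or duplicated, and the relative speed of the two sides (the polling period here) only affects throughput, not correctness.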

Espozo wrote:
The thing is though, would you ever even really need to jump from 0 to 65536 in one Hz? Could you do a sort of lossy compression where instead of 16 bit audio, you have 8 bit audio where you can either go up 128 or down 128

That sounds like DPCM. The NES uses 1-bit DPCM for its sample channel, which is why it sounds so awful.

BRR is a type of ADPCM.
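A rough Python sketch of the plain DPCM idea from the quoted question (16-bit samples stored as 8-bit deltas) might look like the following. This is not BRR and not the NES format; the function names and the symmetric ±127 step limit are assumptions chosen for the example.

```python
def dpcm_encode(samples, max_step=127):
    """Encode samples as clamped signed deltas from the previous output."""
    deltas = []
    predicted = 0
    for s in samples:
        d = max(-max_step, min(max_step, s - predicted))
        deltas.append(d)
        predicted += d   # track what the decoder will actually reconstruct
    return deltas

def dpcm_decode(deltas):
    """Rebuild samples by accumulating the deltas."""
    out = []
    value = 0
    for d in deltas:
        value += d
        out.append(value)
    return out

slow = [0, 50, 100, 150, 100]   # small steps survive unchanged
fast = [0, 1000, 0]             # a big jump gets flattened

print(dpcm_decode(dpcm_encode(slow)))   # [0, 50, 100, 150, 100]
print(dpcm_decode(dpcm_encode(fast)))   # [0, 127, 0]
```

The second case shows the weakness: any change larger than the step limit is clipped, so loud high-frequency content gets smeared. Adaptive schemes (the "A" in ADPCM) change the step scale as they go to reduce this.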

Quote:
If the SPC700 were completely busy streaming data, couldn't it not even decompress any lossy audio samples?

The nice part about BRR is that it's used natively by the S-DSP. The S-SMP doesn't need to worry about it. (Unfortunately, since the S-DSP can't use uncompressed audio, it's hard to do software synthesis; to avoid having to do BRR compression you'd have to use the echo buffer, and the echo buffer is unavoidably 32 kHz stereo, which is really hard work for these primitive chips...)

...

The nice part about the MSU1's streaming functionality is that it bypasses essentially the entire SNES, passing analog audio directly to the output stage of the APU via a pair of cartridge slot pins designed for exactly this. Eight channels of sound effects, anyone?
Re: Rate of streaming data to the SPC700?
by on (#176876)
Bregalad wrote:
Also, please do not use "grr" or anything like that in your post, this is not appropriate. It might not be the intent, but it makes you sound like an aggressive dog threatening us, even though we're not responsible for the issue (in this case SNES hardware development).


Are ... are you joking? Because you can't possibly be serious. But if you are ... you got it.

93143 wrote:
Communication between the two processors (the main CPU and the audio CPU) is accomplished by means of I/O ports that hold persistent values.


I still need to revise that. My understanding is that there are eight actual byte values, and all of them reside inside the SMP. Not that it makes a difference from a user programming perspective, of course. But for documentation purposes, it'd be good to know for sure and get it right.

Right now I have it that CPU $2140-$2143 (mirrored through $217F) hold four bytes; and SMP $F4-$F7 hold four bytes as well.
* CPU read => return SMP bytes
* CPU write => assign CPU bytes
* SMP read => return CPU bytes
* SMP write => assign SMP bytes
If they are really all inside the SMP, then four of the bytes don't actually exist in RAM anywhere (the SMP ones actually are stored in the APURAM itself.)
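The four rules above translate fairly directly into a small model. This Python sketch is only an illustration of the read/write behaviour as listed, not emulator code; the class and method names are made up, and it says nothing about where the bytes physically live.

```python
class ApuPorts:
    """Eight bytes total: four per direction, each side reads the other's."""

    def __init__(self):
        self.cpu_bytes = [0] * 4   # assigned by CPU writes to $2140-$2143
        self.smp_bytes = [0] * 4   # assigned by SMP writes to $F4-$F7

    def cpu_read(self, addr):
        return self.smp_bytes[addr & 3]          # CPU read => SMP bytes

    def cpu_write(self, addr, value):
        self.cpu_bytes[addr & 3] = value & 0xFF  # CPU write => CPU bytes

    def smp_read(self, addr):
        return self.cpu_bytes[addr & 3]          # SMP read => CPU bytes

    def smp_write(self, addr, value):
        self.smp_bytes[addr & 3] = value & 0xFF  # SMP write => SMP bytes

ports = ApuPorts()
ports.cpu_write(0x2140, 0x12)
ports.smp_write(0xF5, 0x34)
print(hex(ports.smp_read(0xF4)), hex(ports.cpu_read(0x2141)))
```

Masking the address with 3 covers both the $2140-$217F mirroring on the CPU side and the $F4-$F7 range on the SMP side, since $F4 is a multiple of four. Note that neither side can read back what it wrote itself.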
Re: Rate of streaming data to the SPC700?
by on (#176880)
byuu wrote:
If they are really all inside the SMP, then four of the bytes don't actually exist in RAM anywhere (the SMP ones actually are stored in the APURAM itself.)


That's something I really think you need to check against hardware again, because it doesn't seem physically possible to me. The way higan implements the I/O ports, you can put the DSP echo buffer in APURAM page $00 and then the S-CPU can directly read four bytes of the echo buffer via $2140-2143. I don't see how that's possible memory-access-timing-wise. I think it's much more likely that SMP writes to $F4-$F7 also fall through to APURAM (just like writes to $FFC0-$FFFF fall through even when the IPLROM is mapped in), but the ports themselves are internal to the chip so they can be read asynchronously by the S-CPU.

In any case, there are definitely eight 8-bit ports, four which are read-only on the CPU side and write-only on the SMP side and the other four which are write-only on the CPU side and read- or erase-only on the SMP side.
Re: Rate of streaming data to the SPC700?
by on (#176939)
Thanks everyone for commenting. I think I get why it's so difficult, because I was thinking you could set aside a fixed amount of time for the 5A22 to do nothing but send data and the SPC700 to do nothing but receive data, but where the 5A22 would only use about 16 cycles to write 4 bytes (16 bit accumulator, 2 4 cycle lda's, 2 4 cycle sta's) the SPC700, assuming it's like the 6502, would take I think 24 cycles to write 4 bytes (only 8 bit accumulator, 4 3 cycle lda's, 4 3 cycle sta's). Not only that, but the SPC700 is only like 2/5 as fast as the main CPU, so you get 6 cycles (24/4) times 2.68 = 16.08, times 128,000 = 2,058,240. :(

I don't quite get the whole feedback thing though, if you're only sending data to the SPC700, why would you need this whole "handshake" thing? It seems you'd just need to have your audio ram filling code calibrated with the SPC700's speed, and you would be fine, unless that speed fluctuation is that bad, but you could still calibrate the code around it because I'd imagine it's fairly predictable?

Just thinking some more, I imagine HDMA would send something like 1024 bytes over in one frame (it works during vblank, doesn't it? how long is the "screen" on the SNES, including "off the screen"?), and 128,000/60 = 2,133, and 1024 just about knocks half of that out. Use BRR compression instead of the echo buffer thing, and you don't even have to worry. Could you actually fire an interrupt (that would work for the 5A22, and hopefully the SPC700) every line instead of HDMA, so you could still use all 8 HDMA channels for the video hardware that requires you to send it then? I mean, when HDMA is done, fill out the 4 IO ports then. You could even do it so that the CPU fills them out in 16 cycles like I was talking about earlier, waits about 60 (hopefully doing something, in the worst case nops) and then uses another 16 cycles and is done? That would knock out about the rest of the 2,133 bytes. Actually, I just found out, this is now still about half of the CPU time... :lol:

Is there a program out there that will turn a music sample into BRR format, and either back or let you play it, because I'm interested in seeing how lossy it is. It can't be that bad, considering how clear a lot of the Donkey Kong Country music is and I doubt it's at 32KHz either.
Re: Rate of streaming data to the SPC700?
by on (#176955)
Espozo wrote:
calibrated with the SPC700's speed, and you would be fine, unless that speed fluctuation is that bad, but you could still calibrate the code around it because I'd imagine it's fairly predictable?
You can't calibrate just once against something that drifts.

You can probably save a little time by resynchronizing less often, but even so.
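To put a rough number on "resynchronizing less often", here is a small Python estimate. It assumes a constant worst-case clock error of 0.1% (1000 ppm, the figure mentioned earlier in the thread) and that the stream breaks once the sender is off by one 4-byte stereo sample; real drift varies over time, so treat this as an order-of-magnitude sketch only.

```python
def bytes_until_slip(clock_error_ppm: int, slack_bytes: int = 4) -> int:
    """Bytes a free-running sender can push before it is off by slack_bytes."""
    return slack_bytes * 1_000_000 // clock_error_ppm

BYTES_PER_SECOND = 128_000   # uncompressed 32 kHz 16-bit stereo

worst_case = bytes_until_slip(1_000)
seconds = worst_case / BYTES_PER_SECOND

print(worst_case)   # 4000
print(seconds)      # 0.03125
```

Under those assumptions an uncorrected sender could slip a whole sample in about two video frames, so a one-time calibration at startup would not hold for long.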

Quote:
[HDMA] works during vblank, doesn't it?
No.

Quote:
hopefully the SPC700
The S-SMP has no interrupt sources.

Quote:
Is there a program out there that will turn a music sample into BRR format, and either back or let you play it, because I'm interested in seeing how lossy it is. It can't be that bad, considering how clear a lot of the Donkey Kong Country music is and I doubt it's at 32KHz either.
The SNES APU is a wavetable synthesizer. Individual samples tolerate lossy compression noise a lot better than whole audio streams.
Re: Rate of streaming data to the SPC700?
by on (#176963)
My Super NES project template includes a BRR encoder written in Python. I think it can also decode.
Re: Rate of streaming data to the SPC700?
by on (#176965)
lidnariq wrote:
You can probably save a little time by resynchronizing less often, but even so.

That's what I was thinking. The real problem seems to be the fact that the speed varies randomly, not the fact that the speed is different.

lidnariq wrote:
No.

:(

lidnariq wrote:
The S-SMP has no interrupt sources.

Oh... So it constantly has to be doing nothing other than looking at the IO ports? I'm not sure how that's going to work. How does HDMA work for this then? Can the S-SMP generate interrupts for the S-CPU? That would be a miracle, but I bet it's too good to be true.

lidnariq wrote:
Individual samples tolerate lossy compression noise a lot better than whole audio streams.

That's what I would have figured due to the fact the sound wave is much less complicated. I guess one thing you could do is only have the audio be mono, and reduce the 128,000 bytes per second to 64,000 bytes per second. Non mono music really doesn't make sense anyway, considering the music isn't actually coming from the game onscreen anywhere. Sound effects being stereo is important to me though, and that's not hard to do. Yeah, mono sounds fine, just 64,000 divided by 60 for 1,067 bytes, and 224 times 4 is 896. It can't be that hard to send 171 bytes (hopefully) and if it is, I doubt anyone would notice if the sampling rate was brought down to 26KHz, which is about 867 bytes, and HDMA could get rid of that by itself. (I imagine you could play it at a random sample rate like that?) I do really wonder if this sounds better than 32KHz BRR though, and this could be stereo. I'd have to test it on some tracks and see what I think sounds better, although I don't have any kind of tool that will turn a generic music file into a BRR and back.

A new post just came in...

tepples wrote:
My Super NES project template includes a BRR encoder written in Python. I think it can also decode.

Sweet! :) I'll be sure to try it out (and probably even post the results here).

Edit: Unfortunately, it doesn't seem able to decode.
Re: Rate of streaming data to the SPC700?
by on (#176970)
You know, now that you mentioned it, I wonder if the drifting SPC700 speed is why some of the music sounds different from how I remember it.
Re: Rate of streaming data to the SPC700?
by on (#176972)
Espozo wrote:
I was thinking you could set aside a fixed amount of time for the 5A22 to do nothing but send data and the SPC700 to do nothing but receive data

You can. It's just a substantial hit to S-CPU power if you want high bandwidth. I think the Bad Apple demo does this, but with a low-rate mono stream.

The advantage of HDMA is that the S-CPU can go do something else. It's actually slower than a tight handshake loop, but it's automatic on the S-CPU side.

Quote:
128,000

That's well past the theoretical maximum data rate on the S-SMP side. The only reason it works is the fixed destination address in zero page, so it can be written with non-indexed direct-page word writes. You can't fill arbitrary areas of ARAM that way.

Quote:
I imagine HDMA would send something like 1024 bytes over in one frame

It only works in active display. You get 225 lines in normal mode, or 240 in overscan mode (it will fire at the end of line 0, which is blank because sprites, but apparently it will also fire at the end of the last active line when it shouldn't matter any more).

So an HDMA channel can transfer at most 900 bytes, or 960 bytes in overscan mode. But IIRC the observed variance in APU clock speed is such that a cycle-counted read loop could desync over that much time, and the loop is so tight that I doubt it's possible to resync without stopping the data flow temporarily. Besides, d4s is using the stack, and I'm relying on an 8-bit index register, so neither of our methods can handle more than 256 bytes in one chunk no matter what.

My method is designed to use bursts of up to 32 lines separated by resync (and, optionally, unrelated audio engine processing). If it works, it would peak around 750-800 bytes per frame, or less if you wanted to do other audio processing in between chunks so as to allow sub-frame audio timing. I'm hoping for 640 bytes (five chunks), which is enough for 32 kHz stereo BRR, or three 22 kHz streams (enough for Ryu, Ken, and the announcer), with enough wiggle room (hopefully) to allow adaptive chunk sizing and/or scheduling based on live timing checks.
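The per-frame numbers in the last two paragraphs can be reproduced with a short Python calculation. Assumptions for the sketch: one HDMA channel moves 4 bytes per scanline (one byte per port), the frame rate is rounded to 60 Hz, "22 kHz" is taken as exactly 22,000 Hz, and BRR packs 16 samples into 9 bytes.

```python
BYTES_PER_LINE = 4        # one HDMA channel filling all four ports
FRAME_RATE_HZ = 60        # NTSC, rounded
BRR_BLOCK_BYTES = 9
BRR_BLOCK_SAMPLES = 16

def hdma_bytes_per_frame(active_lines: int) -> int:
    """Upper bound for one HDMA channel writing every active line."""
    return active_lines * BYTES_PER_LINE

def brr_bytes_per_frame(rate_hz: int, streams: int) -> float:
    """BRR data consumed per video frame by a number of mono streams."""
    bytes_per_second = rate_hz * streams * BRR_BLOCK_BYTES / BRR_BLOCK_SAMPLES
    return bytes_per_second / FRAME_RATE_HZ

print(hdma_bytes_per_frame(225))       # 900  (normal mode)
print(hdma_bytes_per_frame(240))       # 960  (overscan)
print(brr_bytes_per_frame(32_000, 2))  # 600.0  (32 kHz stereo)
print(brr_bytes_per_frame(22_000, 3))  # 618.75 (three 22 kHz mono streams)
```

Both BRR cases fit under the hoped-for 640 bytes per frame, with only a small margin in the three-stream case.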

Quote:
how long is the "screen" on the SNES, including "off the screen"?

262 lines for NTSC, 312 for PAL. In interlace mode, every other frame has an extra line.

Quote:
128,000/60 = 2,133, and 1024 just about knocks half of that out
Quote:
That would knock out about the rest of the 2,133 bytes

I suppose combining HDMA and CPU writes might work with the stack method, but the timing would be pretty tight, and the recopy gaps would kill the data rate. In any case, HDMA isn't magic; it uses the same I/O ports as manual data, and it takes APU-side processing time to pick it up just like manual data, the only difference being that HDMA doesn't check if the S-SMP is ready before writing data. Since your method would require the S-CPU to stay in a transfer loop anyway, you might as well just do the whole thing manually.

FYI, in d4s' code, the APU-side loop looks like this:
Code:
ReceiveStreamTransferLoop:         ;66 cycles per scanline. this loop must be exactly 66 cycles long while looping.
;26 cycles               ;59 cycles per scanline left for actual transfer

   movw   ya,$f4            
   push a               ;save data on stack cause that's the fastest way possible
   push y
   movw   ya,$f6
   push a
   push y
   
;waste 33 cycles:
;******************************
   mov   a,TempBuffer1         ;real snes/bsnes need a total of 33 cycles here
   mov   a,TempBuffer1
   mov   a,TempBuffer1
   mov   a,TempBuffer1
   mov   a,TempBuffer1
   mov   a,TempBuffer1
   mov   a,TempBuffer1
   mov   a,TempBuffer1
   mov   a,TempBuffer1
   mov   a,TempBuffer1
   mov   a,TempBuffer1

;7 cycles when looping back
   dbnz   TempBuffer5,ReceiveStreamTransferLoop

In my code, it looks like this:
Code:
get_data_HDMA:
   mov A, $F4                ; 3 cycles - get byte 0 of the data shot
   mov !buf+X, A             ; 6 cycles - write it to the current buffer position
   mov A, $F5                ; 3 cycles - get byte 1
   mov !(buf+1)+X, A         ; 6 cycles - write it to the current buffer position plus one
   mov A, $F6                ; 3 cycles - get byte 2
   mov !(buf+2)+X, A         ; 6 cycles
   mov A, $F7                ; 3 cycles - get byte 3
   mov !(buf+3)+X, A         ; 6 cycles
   inc X                     ; 2 cycles - increment the current buffer position four times
   inc X                     ; 2 cycles
   inc X                     ; 2 cycles
   inc X                     ; 2 cycles
   cmp (X), (Y)              ; waste 5 cycles
   cmp (X), (Y)              ; waste 5 cycles
   cmp (X), (Y)              ; waste 5 cycles
   dbnz Y, get_data_HDMA     ; 6/4 cycles - repeat for next scanline, or exit if done
; TOTAL:  65 cycles
Code:
; FOR PAL, REPLACE 15-CYCLE TIME DELAY IN DATA PICKUP LOOP WITH:
   mov A, Y      ; 2 cycles
   and #$01      ; 2 cycles
   beq +         ; 4/2 cycles
   cmp A, (X)    ; 0/3 cycles
+  nop           ; 2 cycles
   cmp (X), (Y)  ; 5 cycles

You will note that my code uses a 65.5-cycle loop in PAL mode, vs. d4s using a 66-cycle loop. This is because 66 cycles is slightly longer than an ideal scanline (65.632 cycles PAL, vs. 65.033 NTSC), and since my method picks up the data more slowly, I have less wiggle room than he did, plus I'm hoping to support longer data bursts.
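For reference, the ideal scanline lengths quoted above can be derived as follows. The clock constants are assumptions not given in the post (nominal S-SMP clock of 1.024 MHz, 1364 master cycles per line, and the usual NTSC/PAL master clock values); the loop lengths are the ones from the two code listings.

```python
SMP_CLOCK_HZ = 1_024_000           # assumed nominal S-SMP clock
MASTER_CYCLES_PER_LINE = 1364      # assumed scanline length in master cycles
MASTER_HZ = {"NTSC": 21_477_272, "PAL": 21_281_370}   # assumed master clocks

def smp_cycles_per_line(region: str) -> float:
    """Ideal number of S-SMP cycles in one scanline."""
    return SMP_CLOCK_HZ * MASTER_CYCLES_PER_LINE / MASTER_HZ[region]

def drift_over_burst(region: str, loop_cycles: float, lines: int = 32) -> float:
    """S-SMP cycles by which a fixed-length loop falls out of step over a burst.

    Positive means the loop runs ahead of the scanlines, negative means behind.
    """
    return (smp_cycles_per_line(region) - loop_cycles) * lines

ntsc = smp_cycles_per_line("NTSC")
pal = smp_cycles_per_line("PAL")
print(round(ntsc, 3), round(pal, 3))   # 65.033 65.632

# The PAL variant alternates between 65 and 66 cycles, averaging 65.5.
print(drift_over_burst("NTSC", 65.0))
print(drift_over_burst("PAL", 65.5))
print(drift_over_burst("PAL", 66.0))
```

Even with nominal clocks the loop and the scanlines slip by a few cycles over a 32-line burst, before any resonator drift is added on top, which is why the bursts have to be bounded and followed by a resync.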

Quote:
Is there a program out there that will turn a music sample into BRR format, and either back or let you play it

BRRTools.

The encoder has an option for a high-frequency boost prefilter to mitigate the muffling effect from the gaussian interpolation (it's not perfect, but it's fast), and the decoder has an option for simulating said interpolation (though it seems that as designed, it's only accurate for full-rate input; if my Matlab work isn't wrong, resampling changes the frequency response somewhat at the top end. Also, use the latest version; 3.11 had a bug that would muffle the output much more strongly than it should have).

(Just to be extra clear, the gaussian interpolation is unrelated to the BRR codec. They are both operational features of the S-DSP, and thus both have an effect on how the output sounds, but that's the only connection.)

Espozo wrote:
Can the S-SMP generate interrupts for the S-CPU?

No.

...

Also note that uncompressed 32 kHz 16-bit stereo is over 7 MB (not Mbit) per minute. Even with BRR it's still above 2 MB per minute. 26 kHz mono BRR is still nearly a megabyte per minute, and stereo does sound better than mono. ROM size limitations are still relevant, unless you're using MSU1 which kinda removes the point of streaming music to the APU in the first place... IMO you'd get much more bang for your buck using the APU the way it was designed to be used, and maybe using streaming to augment its capabilities like Tales of Phantasia, Star Ocean and N-Warp Daisakusen did.
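The storage figures work out as follows in Python. The only inputs are the sample rates mentioned above and the BRR ratio of 9 bytes per 16 samples; the function names are just for illustration.

```python
def pcm_bytes_per_minute(rate_hz: int, channels: int) -> int:
    """Uncompressed 16-bit PCM."""
    return rate_hz * channels * 2 * 60

def brr_bytes_per_minute(rate_hz: int, channels: int) -> int:
    """BRR: every 16 samples become one 9-byte block."""
    return rate_hz * channels * 9 * 60 // 16

print(pcm_bytes_per_minute(32_000, 2))   # 7680000  (over 7 MB)
print(brr_bytes_per_minute(32_000, 2))   # 2160000  (above 2 MB)
print(brr_bytes_per_minute(26_000, 1))   # 877500   (nearly a megabyte)
```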
Re: Rate of streaming data to the SPC700?
by on (#177000)
93143 wrote:
It's actually slower than a tight handshake loop, but it's automatic on the S-CPU side.

So, only slower for the S-SMP side. I really don't care about that, from my understanding, even 1MHz is overkill for just writing a few things to the DSP.

93143 wrote:
I'm hoping for 640 bytes (five chunks), which is enough for 32 kHz stereo BRR

And, according to what you said, you'd have 110-160 bytes unused. It could just be for streaming miscellaneous stuff, like maybe if certain sound effects changed throughout the level. Wait though, the SPC700 still needs to do processing unrelated to streaming though, even if it is little.

93143 wrote:
Even with BRR it's still above 2 MB per minute. 26 kHz mono BRR is still nearly a megabyte per minute, and stereo does sound better than mono. ROM size limitations are still relevant, unless you're using MSU1 which kinda removes the point of streaming music to the APU in the first place...

Are there no other memory mapper chips that have been made for the SNES? I've never been a fan of the MSU1 audio, because I feel it's not much better than just having an audio port come out of the cartridge slot, even if the DAC gets the information.

93143 wrote:
and stereo does sound better than mono.

A little? :lol: (Also, like your choice of music... :roll: )

Yeah though, this has turned into some complex timing mumbo jumbo that I know I'm not yet cut out for. I knew it was stupid to ask this whole thing, but I was curious. I'll probably just end up using someone else's sound engine. :lol:

lidnariq wrote:
Individual samples tolerate lossy compression noise a lot better than whole audio streams.

I don't know, this turned out almost perfectly. Maybe it's not a very good song to do a fair comparison though. I used the BRR encoder and decoder on two different tracks (the left and the right) and then put them back together to be stereo.

Attachment:
Eine Kleine Nachtmusik (WAV vs. BRR).zip [1.55 MiB]
Re: Rate of streaming data to the SPC700?
by on (#177002)
Espozo wrote:
93143 wrote:
I'm hoping for 640 bytes (five chunks), which is enough for 32 kHz stereo BRR

And, according to what you said, you'd have 110-160 bytes unused.

I think it bears repeating that my code is untested. Don't count your chickens before they hatch...

Quote:
Wait though, the SPC700 still needs to do processing unrelated to streaming though, even if it is little.

Yeah, that's why I degraded the performance. I was trying to leave enough space in between chunks for the audio engine to go do something else before having to turn around and get ready for the next chunk. Otherwise all other audio processing would have to happen during VBlank, and having everything happen with 16.7 ms granularity could sound bad in some situations.

Quote:
Are there no other memory mapper chips that have been made for the SNES?

Well, there's the S-DD1, which is supposed to be able to handle 256 MB in theory. In practice, the pinout may be insufficient - it's thought that the real limit was probably no higher than 8 MB.

Quote:
even if the DAC gets the information.

Oh, no, it bypasses the DAC too. It's purely analog. Almost more like the Voicer-kun than traditional SPC700 audio. The Super Game Boy does the same thing - what you hear is real DMG sound output.

Quote:
your choice of music

It was the first example that came to mind. The cartridge version used mono; the 64DD expansion (and the official soundtrack CD) used stereo.

Quote:
I'll probably just end up using someone else's sound engine.

IIRC the version of SNESMod KungFuFurby and Augustus Blackheart are working on offers up to 12 kHz mono streaming for sound effects. It appears to use a high-speed semi-free-running loop (intermittent sync), with manual data feed on the S-CPU side.

Quote:
I used the BRR encoder and decoder on two different tracks (the left and the right) and then put them back together to be stereo.

Did you forget to pan them? What you've got here is an excellent mono/stereo comparison, but as a result it's hard to hear the difference the BRR makes... at least on my laptop's cheap sound card...
Re: Rate of streaming data to the SPC700?
by on (#177003)
93143 wrote:
Well, there's the S-DD1, which is supposed to be able to handle 256 MB in theory.

Isn't that the chip used in Street Fighter Alpha 2?

93143 wrote:
The Super Game Boy does the same thing - what you hear is real DMG sound output.

I never actually knew that...

I can't help but find it funny though that the audio hardware is probably the most advanced part of the system (relative to the video hardware, compared with how other systems balanced the two), yet the easiest to bypass. :lol:

93143 wrote:
Did you forget to pan them?

I'm not entirely sure what panning in the context of audio means, but I think I messed up. When I split the audio in two, they were set as "right" and "left" instead of "mono" and brr_encode said that it had to convert two audio channels into one, but I didn't think about how all that really means is it's muffling it, so I went back and made sure that both tracks said "mono" and brr_encode didn't complain about there being two tracks. The results are so similar now, that I actually had to go and look at a hex editor to make sure the final outputted file was even any different.

Attachment:
Fair Comparison (WAV vs. BRR).zip [1.58 MiB]

I wonder though, is this just not a good sample to try and see quality loss? I might want to try a song with vocals or something, but considering this turned out next to flawless, I can't imagine anything will even sound as muffled as most SNES soundtracks.

It seems the 32KHz demo of this song using uncompressed audio was really just to show off more than anything. :lol: (Not that it's not impressive.)

Edit: I tried it with a couple more samples that I thought were about the polar opposite of this, and still, it doesn't make a difference in that they sound exactly the same. Even songs with vocals don't make a difference. I bet only something deliberately designed to screw this up would make a difference, but it probably wouldn't even qualify as music.
Re: Rate of streaming data to the SPC700?
by on (#177083)
Espozo wrote:
Isn't that the chip used in Street Fighter Alpha 2?

Yes. The largest ROM it has been demonstrated to work with is 48 Mbit (Star Ocean). Anything higher is apparently speculation, though it seems reasonable to expect at least 64 Mbit...

Quote:
I'm not entirely sure what panning in the context of audio means

It means positioning in the stereo field.

https://en.wikipedia.org/wiki/Panning_(audio)

With a stereo recording or mix, the stereo distribution can be very complex in terms of frequency and phase, but ultimately it consists of one signal on the left channel and one signal on the right channel. If you take those signals and mix them in equal amounts on both channels, the result is a mono track; the stereo information is lost.

And apparently that's what BRRTools does if you feed it a stereo input...
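
A minimal sketch of the equal-amounts mix described above, in Python. This is not BRRTools' actual code; the function name and the plain integer averaging are assumptions made for illustration.

```python
def downmix_to_mono(left, right):
    """Mix two equal-length channels in equal amounts into one mono channel."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    # Average each left/right pair; whatever differs between the channels
    # (the stereo information) is thrown away.
    return [(l + r) // 2 for l, r in zip(left, right)]


left = [1000, -2000, 3000]
right = [1000, 2000, -3000]
print(downmix_to_mono(left, right))  # [1000, 0, 0]
```

Note how content that is identical on both channels survives, while content that is opposite on the two channels cancels out entirely.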

Quote:
I wonder though, is this just not a good sample to try and see quality loss? I might want to try a song with vocals or something, but considering this turned out next to flawless, I can't imagine anything will even sound as muffled as most SNES soundtracks.

Did you use the -g option when encoding/decoding? It's the Gaussian anti-aliasing interpolation that causes muffling, not the BRR compression; BRR just adds noise. Even with a treble boost filter applied before encoding (that's what -g does in the encoder), the result won't sound as good as a straight encode/decode with no Gaussian, partly due to the boost filter not exactly mirroring the effect of the interpolator and partly (I imagine) due to the boosted treble consuming more of the BRR format's limited bandwidth than it deserves... (I wonder if this second effect wouldn't be significantly worse with my heavy-duty high-precision prefilter than with the simple 8-tap one in BRRTools, since mine boosts a lot more up near the Nyquist - this could be bad with a DPCM relative, however distantly related...)

I should also mention that even the Gaussian interpolator isn't the whole story; a lot of the "characteristic SNES muffling" is due to lower-rate samples used to save memory. You're using 32 kHz, which can be noticeably less sharp than 44.1 kHz on some material if you've got good ears, but isn't really what you'd call "muffled" (and it's as good as the SNES can do anyway). The vocal samples in Street Fighter Alpha 2 seem to be 22 kHz in the original, which is perfectly adequate, but between 4 and 6 kHz in the SNES version, which really isn't...
Re: Rate of streaming data to the SPC700?
by on (#177084)
How much does your prefilter boost the frequencies above the nyquist frequency?
Re: Rate of streaming data to the SPC700?
by on (#177085)
It doesn't. It's designed to be used on audio that's already at the final sample rate, so there's nothing above the Nyquist to boost.
Re: Rate of streaming data to the SPC700?
by on (#177087)
93143 wrote:
Yes. The largest ROM it has been demonstrated to work with is 48 Mbit (Star Ocean). Anything higher is apparently speculation, though it seems reasonable to expect at least 64 Mbit...

I wonder if byuu knows if it can hold more data. I'm not too keen on the fact that really only two things support the MSU-1 (and I don't like how it completely bypasses the audio hardware) so that's why this seems more appealing to me, but it would be handy to know how it works first. :lol:

93143 wrote:
Did you use the -g option when encoding/decoding? It's the Gaussian anti-aliasing interpolation that causes muffling, not the BRR compression

No, I didn't use it. If all it does is just muffle the thing and (I assume) not lower the sample size like BRR does by itself (assuming again), then what's the point, unless this is specifically for samples less than 32KHz in that it tries to smoothen them out, but I didn't need to do that.

93143 wrote:
I should also mention that even the Gaussian interpolator isn't the whole story; a lot of the "characteristic SNES muffling" is due to lower-rate samples used to save memory.

Isn't the Gaussian interpolator only used because of lower rate samples? I imagine that just because a sample is lower rate doesn't mean it's muffled, doesn't that just mean it's choppy or distorted?

93143 wrote:
You're using 32 kHz, which can be noticeably less sharp than 44.1 kHz on some material if you've got good ears

I didn't know a 17 year old could have a hearing deficit then, because I don't hear the slightest difference. It could be my cheap laptop speakers too.

93143 wrote:
but between 4 and 6 kHz in the SNES version, which really isn't...

That really is shit. The reason I wanted to stream audio in the first place is not because I believe 8 channels aren't sufficient or something, but because if I wanted as many things in audio ram as possible (different instruments and sound effects) it would have to be at sub 8KHz levels unless some miracle happened. I got to thinking about streaming audio samples out so much that I figured it might actually just be easier and be less data to push through (why send data for 4 instruments to audio ram when you can send two tracks for less?). The problem now is that because the bandwidth going to audio ram is maxed out, there is no way to replace sound effects during a level or something, if I can even get them all in there to begin with.

Edit (before even releasing this for the first time, because I now realize this is even more of a stupid idea):

I might try to do something like vram slots where if the sample isn't being used, it can be overwritten, (some data will be reserved for variables and the code for running the audio, but the data for the composition of the song could be overwritten.) but this might be insane for multiple reasons, like if you're trying to stream more data than you're supposed to, with vram, you just bring a little bit of black down the screen for a frame, but you can't even do that here, you'll be missing data, and it won't be nothing, it will be the remnants of the sample under that slot unless you programmed it to where the SPC700 wouldn't make the DSP play there. You could also use code at the beginning of them that tells the SPC700 how they want to be played. What I mean by this is that although this: https://www.youtube.com/watch?v=aeXT5ma3_so is one sound effect, it is clearly a sample being looped multiple times, so you could have the self modifying code (because it's all in ram) that This whole scheme is probably crazy though because of the limited processing power of the SPC700 (mainly due to the fact that you're crippling it by streaming) but I really don't like programming for the worst case scenario, because the worst case scenario almost never happens in games unless you deliberately try and break them by doing something like luring all the enemies in a level into one area. This also eliminates the chance of streaming one long audio sample for a song, but I probably would have been screwed anyway. I don't even know what I'm going to do, but I have other things to worry about.
Re: Rate of streaming data to the SPC700?
by on (#177095)
Espozo wrote:
93143 wrote:
Did you use the -g option when encoding/decoding? It's the Gaussian anti-aliasing interpolation that causes muffling, not the BRR compression

No, I didn't use it. If all it does is just muffle the thing and (I assume) not lower the sample size like BRR does by itself (assuming again), then what's the point, unless this is specifically for samples less than 32KHz in that it tries to smoothen them out, but I didn't need to do that.

The S-DSP runs an anti-aliasing interpolator on decoded BRR data before playback. It does not recover the original audio when it's 32 kHz. In fact it always produces a muffling effect which is uniform when normalized to the input sample rate - for example, at 70% of the Nyquist folding frequency (that is, 35% of the sample rate), the attenuation is about 8 dB, meaning if you play your sample back at 32 kHz, the attenuation will reach 8 dB a little past 11 kHz in the output spectrum, and if you play it back at 8 kHz, you'll get 8 dB of attenuation a little below 3 kHz. (The interpolator also produces a slight brightening effect at 32 kHz in the output spectrum, but it's probably cancelled out by the analog filtering downstream of the DAC, so I wouldn't worry about it.) The fact that the muffling is basically constant in the input frequency domain means that you can counter it with a static treble boost baked into the sample, even if the sample is expected to be played back at lots of different pitches.
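
To put numbers on that scaling, here is a small Python sketch. It only restates the arithmetic above (the 8 dB point sits at 35% of the playback sample rate); the function name is made up for illustration and the 8 dB figure itself is taken from the text, not computed.

```python
def eight_db_point_hz(playback_rate_hz, percent_of_sample_rate=35):
    """Output frequency at which the interpolator's attenuation reaches ~8 dB.

    35% of the sample rate is the same thing as 70% of the Nyquist frequency.
    """
    return playback_rate_hz * percent_of_sample_rate // 100


for rate in (32000, 8000):
    print(rate, "Hz playback ->", eight_db_point_hz(rate), "Hz")
# 32000 Hz playback -> 11200 Hz
# 8000 Hz playback -> 2800 Hz
```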

What the -g option does depends on whether you're encoding or decoding. When you're decoding, it simulates the effect of the S-DSP's interpolator so you can hear what your sample will sound like on an actual SNES (except that it assumes your sample is at 32 kHz, so it includes the secondary brightening effect). When you're encoding, though, what it does is boost the treble before converting the sample to BRR so as to somewhat counteract the muffling effect of the S-DSP's anti-aliasing.

Quote:
Isn't the Gaussian interpolator only used because of lower rate samples?

Yes. But the way it's designed, it affects 32 kHz playback too.

Quote:
I imagine that just because a sample is lower rate doesn't mean it's muffled, doesn't that just mean it's choppy or distorted?

You imagine wrong. Lower sample rate means you're chopping off high-frequency information, which muffles the sound. On a system without anti-aliasing, like the Mega Drive or Game Boy Advance, the output waveform is more jaggy, and this manifests as ugly high-frequency noise that can somewhat compensate for the lack of real treble information. But the SNES was specifically designed to eliminate that noise, so it just sounds muffled. Extra muffled if the input wasn't treble-boosted to compensate for the Gaussian anti-aliasing...

Also, some Mega Drive audio engines have trouble feeding the DAC at a consistent rate, so the audio gets choppy and distorted.

Here, have an example. Original is at 22 kHz. I converted it to 6.5 kHz to show you what that sounds like (through your PC's interpolator, which is generally pretty good). Then I truncated the 6.5 kHz version to 8-bit (which is a basically inaudible change in this case) and upped it to 26 kHz with sample-and-hold (the equivalent of nearest-neighbour in graphics) to show you roughly what it would sound like on the Mega Drive with a good, stable audio engine. Then I ran the 6.5 kHz version through BRRTools (without -g) and used my own implementation of the S-DSP's anti-aliasing interpolator to resample it to 32 kHz, to show you what it would sound like on the SNES. You will note that the SNES version is even more muffled than the baseline 6.5 kHz version; this is because I didn't use any sort of treble boost prefilter...

Attachment:
koryuken.7z [130.04 KiB]
Downloaded 122 times
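
For reference, the 8-bit truncation and sample-and-hold steps can be sketched in a few lines of Python. This is only an illustration of the two operations, not the tool chain actually used for the attachment; the function names and the sample values are invented.

```python
def truncate_to_8_bit(samples_16):
    """Keep only the top 8 bits of each signed 16-bit sample."""
    return [s >> 8 for s in samples_16]


def sample_and_hold(samples, factor):
    """Upsample by repeating each sample `factor` times (nearest neighbour)."""
    return [s for s in samples for _ in range(factor)]


low_rate = [0, 12000, -12000]            # stand-in for 6.5 kHz, 16-bit audio
eight_bit = truncate_to_8_bit(low_rate)  # [0, 46, -47]
held = sample_and_hold(eight_bit, 4)     # 6.5 kHz * 4 = 26 kHz
print(held)
# [0, 0, 0, 0, 46, 46, 46, 46, -47, -47, -47, -47]
```

The flat steps in the output are what produce the jaggy waveform and the high-frequency noise mentioned above.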

Quote:
It could be my cheap laptop speakers too.

I think we've identified the problem.

...

What I figured I'd do with streaming was plan the song so that it never exceeds a certain streaming budget, which would be lower than the maximum available bandwidth so as to leave room for any sound effects that might need to be streamed at the same time. (This isn't as restrictive as it sounds; just about nothing actually needs to be at 32 kHz in terms of individual instruments, and if you're replacing whole samples that aren't playing at the moment, they don't even need to be streamed at their desired playback speed.) This somewhat limits the application of streaming to sound effects, since you have to make sure you can never have more of them playing than will fit in the reserved bandwidth, but (a) you might want to limit them anyway so as to minimize interruptions in the music channels, and (b) not every sound effect needs a long high-fidelity sample that won't fit in audio RAM alongside the music.

This would use far, far less memory than trying to fit whole songs in ROM as massive BRR streams, while still potentially sounding much better than the usual ARAM-bound fare. Especially coupled with advanced techniques like pitch modulation and KungFuFurby's loop switching trick...

But we're getting ahead of ourselves. The state of the art is nowhere near the bandwidth I'm talking about, not without a severe hit to S-CPU time. I need to get my method actually assembled and tested in context before bragging about how much it can do...
Re: Rate of streaming data to the SPC700?
by on (#177104)
Why was the sound quality in SFA so bad?, and why did it take so long to load?
Re: Rate of streaming data to the SPC700?
by on (#177106)
93143 wrote:
The S-DSP runs an anti-aliasing interpolator on decoded BRR data before playback. It does not recover the original audio when it's 32 kHz.

That's unfortunate... :( Could this be why they have the "32KHz, Uncompressed" demo, or does this also affect the echo buffer trick thing?

Well, anyway, here's a comparison of the same track from earlier, but encoded and or decoded:

Attachment:
(Hopefully) The Final Comparison.zip [2.32 MiB]
Downloaded 119 times

Honestly, it's still hard for me to tell the difference. If I had to pick between Decoded or Encoded And Decoded, I'd go with Encoded and Decoded though, because I think it might be marginally better.

I guess the uncompressed demo is still unjustified... :lol:

psycopathicteen wrote:
Why was the sound quality in SFA so bad?

I heard the music in SFA is pretty good: http://www.music.sfasu.edu/
Re: Rate of streaming data to the SPC700?
by on (#177137)
psycopathicteen wrote:
Why was the sound quality in SFA so bad?, and why did it take so long to load?

Because they jammed both fighters' vocals into ARAM alongside the music and SFX samples, and their loading code wasn't very good. That's why the game freezes every time the announcer says something. With the original 22 kHz samples, even BRR compressed, Dan Hibiki alone would be on the order of 300 KB.

For this game, if I were trying to do an audio hack with streaming (assuming my streaming engine pans out as hoped), I'd probably stream the vocals and store everything else in ARAM. Unique instrument sets for each track, with a smart loading engine to avoid freezing the game when switching tracks. I imagine the music would sound a lot better without the memory pressure from the vocals, even if nothing else changed, and the vocals would certainly sound a lot better (it would of course require a ROM expansion).

Reading old reviews, it seems the port wasn't well regarded for gameplay or animation quality either - everyone was damning it with faint praise, talking about how it was such a heroic effort on such a primitive system, how the SNES was really showing its age, etc. Perhaps a wholesale remake with a 64 Mbit ROM is in order...

...say - how does the S-DD1 hook up to the cartridge slot? Might it be possible to exceed the chip's addressable memory limit the same way as with the Super FX, by simply having ROM in parallel with it that the SNES knows about but the special chip doesn't?

Espozo wrote:
does this also affect the echo buffer trick thing?

No, I'm pretty sure it only gets applied to decoded BRR data. The echo buffer is known to always be uncompressed 32 kHz 16-bit stereo, so it would be a waste of circuitry to anti-alias it, besides which the muffling effect would compound during echo feedback...

Quote:
Honestly, it's still hard for me to tell the difference.

That could be the laptop speakers. I can hear the difference pretty clearly on headphones, but honestly it is fairly modest. The sample rate is high, and the material isn't especially ear-piercing, so it's not really a glaringly obvious difference.

How'd you like my examples? The actual SNES version was only 4 kHz or so, and was considerably quicker; it might have been a re-recording... EDIT: if there's a loud buzz when you play this sample, it's not Capcom's fault. Some (old?) media players seem to dislike it, and I have no idea why...
Re: Rate of streaming data to the SPC700?
by on (#177139)
93143 wrote:
...say - how does the S-DD1 hook up to the cartridge slot? Might it be possible to exceed the chip's addressable memory limit the same way as with the Super FX, by simply having ROM in parallel with it that the SNES knows about but the special chip doesn't?
Should always be able to do that (e.g. stuffing in 1MiB from $6000-$7FFF in banks $00-$3F and $80-$BF), but with only two games that used the S-DD1 we don't have much in the way of good documentation about it.

Here's what little I've found.
* http://problemkaputt.de/fullsnes.htm#sn ... ssionchips
* http://romlaboratory.dbwbp.com/romlab/sdd1.htm
It looks like siudym had a more useful pinout for it, but didn't bother drawing it publicly:
* http://romlaboratory.dbwbp.com/romlab/sdd1dev.htm

Byuu has a memory map somewhere in higan, but I only found the one for SFA2:
Code:
cartridge sha256:910a29f834199c63c22beddc749baba746da9922196a553255deade59f4fc127
  :board region=ntsc
  :  sdd1
  :    map address=00-3f,80-bf:4800-480f
  :    rom name=program.rom size=0x400000
  :      map address=00-3f,80-bf:8000-ffff
  :      map address=c0-ff:0000-ffff
  :
  :information
  :  serial:   SNS-AUZE-USA
  :  board:    SHVC-1N0N-01
  :  revision: 1.0
  :  name:     Street Fighter Alpha 2
  :  title:    Street Fighter Alpha 2
Re: Rate of streaming data to the SPC700?
by on (#177508)
I just noticed that byuu had quite recently rechecked the S-DD1's memory map.

Byuu observed that there seems to be four bits for banking, so 16 MiB is addressable, most likely as four 4MiB ROMs.

So I've also re-re-labelled siudym's pinout with a combination of known things and suspected things:
Attachment:
sdd1.png [6.54 KiB]


So, in summary: for an S-DD1 game WITHOUT cart RAM, the entire 3.9 MiB in banks $40-$7D are available for a S-CPU dedicated ROM, plus all the random things one can map in the normally open bus regions of banks $00-$3F and $80-$BF. For a game WITH cart RAM, the practical range is "just" banks $40-$5F or -$6F.
Re: Rate of streaming data to the SPC700?
by on (#177510)
lidnariq wrote:
for an S-DD1 game WITHOUT cart RAM, the entire 3.9 MiB in banks $40-$7D are available for a S-CPU dedicated ROM

Isn't that not any better than Mode 25? I don't know how much the now non open bus stuff will be. Isn't that just any part of the lower $8000 of those banks that isn't being used by hardware registers?

lidnariq wrote:
so 16 MiB is addressable

So, wait, this is extra, right? Even if the SNES can only have access to the data through DMA or HDMA (?) transfers, tile data and sound samples are the only things big enough you'd ever need this for and that's all you need to do to them 99.9% of the time. Now we're talking! :)

Yeah, so if I am correct, that's you have the SNES's original 8MB, plus the 16MB from the SDD1 for an SNES game that (if I'm not mistaken,) is the largest SNES game by about a factor of 4x (wasn't the largest, aside from Super Road Blaster of course, 6MB and used the SDD1?). There's also the decompression this chip can do to effectively give you even more space. I'm not sure how effective it is though, but I heard it can decompress graphics at the speed of DMA. Is there actually a way to turn off the decompression though, because I imagine all you would do is end up making a BRR sample bigger if you tried to compress it though.

Edit: Actually, it appears this is still kind of lame for streaming audio, if BRR compressed audio is 2MB per minute...
Re: Rate of streaming data to the SPC700?
by on (#177512)
Espozo wrote:
Isn't that not any better than Mode 25? I don't know how much the now non open bus stuff will be. Isn't that just any part of the lower $8000 of those banks that isn't being used by hardware registers?

So, wait, this is extra, right? Even if the SNES can only have access to the data through DMA or HDMA (?) transfers, tile data and sound samples are the only things big enough you'd ever need this for and that's all you need to do to them 99.9% of the time. Now we're talking! :)
S-DD1:
Banks 0-$3F and $80-$BF: first 2 MiB of first ROM behind S-DD1 (unless you use the bizarre disable that Byuu discovered, which limits it to just the first 1 MiB)
Banks $C0-$FF: four 1 MiB swappable banks, any 1 MiB of 16 MiB available. Can be DMAed from to use the built-in streaming decompression of tile data.
Banks $40-$6F and $74-$7D: could be available for an extra ROM on the cart that could only be used by the S-CPU, not available to the S-DD1's bankswitching or decompression.
Banks $70-$73: mapped to RAM by the S-DD1 if RAM is available.
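
A rough Python sketch of that layout, useful for counting how much CPU-only space is left. The region names and the function are hypothetical; the bank ranges come from the list above, plus the fact that banks $7E-$7F are the console's own WRAM and that each bank is 64 KiB.

```python
def sdd1_bank_region(bank, has_cart_ram=True):
    """Classify a 65816 bank number according to the S-DD1 layout above."""
    b = bank & 0xFF
    if b <= 0x3F or 0x80 <= b <= 0xBF:
        return "fixed ROM behind S-DD1"
    if b >= 0xC0:
        return "swappable 1 MiB S-DD1 window"
    if b in (0x7E, 0x7F):
        return "console WRAM"
    if has_cart_ram and 0x70 <= b <= 0x73:
        return "cart RAM"
    return "free for CPU-only ROM"


def cpu_only_banks(has_cart_ram):
    return [b for b in range(0x100)
            if sdd1_bank_region(b, has_cart_ram) == "free for CPU-only ROM"]


for ram in (False, True):
    n = len(cpu_only_banks(ram))
    print("cart RAM" if ram else "no cart RAM", n, "banks =", n * 64 / 1024, "MiB")
# no cart RAM 62 banks = 3.875 MiB
# cart RAM 58 banks = 3.625 MiB
```

The 62-bank case is the "3.9 MiB" figure quoted earlier. This sketch ignores the open-bus regions in the lower half of banks $00-$3F and $80-$BF.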

Quote:
Yeah, so if I am correct, that's you have the SNES's original 8MB
No, the Mode 25h layout overlaps the S-DD1 layout in a not-useful way. The simplest combination only makes 19.9 MiB available, of which 14 MiB has to go via a bankswitching mechanism.

This is more than the simple “psychotic granularity” map with its 14.8 MiB, but not tremendously so. And because so much of the "useful" address space is already used by the S-DD1, you can't usefully combine the “psychotic granularity” map with the S-DD1 to get enough more data (22.8 MiB).

In practice, there is no "authentic contemporary" way to do better. The MSU1 is about as good as you could ask for, when using modern tools.

Quote:
I'm not sure how effective it is though, but I heard it can decompress graphics at the speed of DMA.
It seems to be good enough to compress the data for Star Ocean by approximately 50%.

Quote:
it appears this is still kind of lame for streaming audio, if BRR compressed audio is 2MB per minute...
Do keep in mind that when 128 kbit/sec MP3 was the new hot thing, that was 1 MB per minute... streaming prerecorded soundtracks wasn't an option until you could have lots of storage.
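
The arithmetic behind both figures, as a quick Python check. BRR packs 16 samples into each 9-byte block; the function names are just for illustration, and the stereo case simply assumes two independent BRR streams.

```python
BRR_BLOCK_BYTES = 9
BRR_BLOCK_SAMPLES = 16


def brr_bytes_per_minute(sample_rate_hz, channels=1):
    """Size of one minute of BRR data at a fixed sample rate."""
    return sample_rate_hz * BRR_BLOCK_BYTES * channels * 60 // BRR_BLOCK_SAMPLES


def cbr_bytes_per_minute(bits_per_second):
    """Size of one minute of a constant-bitrate stream."""
    return bits_per_second * 60 // 8


print(brr_bytes_per_minute(32000))              # 1080000 (mono)
print(brr_bytes_per_minute(32000, channels=2))  # 2160000 (stereo)
print(cbr_bytes_per_minute(128000))             # 960000
```

So "2MB per minute" corresponds to 32 kHz stereo; a mono stream at the same rate costs about as much as the 128 kbit/sec MP3.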
Re: Rate of streaming data to the SPC700?
by on (#177525)
lidnariq wrote:
The MSU1 is about as good as you could ask

Yeah, I was mostly reluctant to use it because only two things support it, (although the accuracy of some emulators seems questionable that I'm not sure they'd run my game anyway) and I'm not a fan of how it completely bypasses the audio hardware, although I could just not use that feature and maybe advertise it as such... It would also keep the file size down, but with the MSU1, that doesn't seem to be an issue... :lol:

I did find out though, that a non "GIGA POWER" Neo Geo game is at most 330 mega bits, so divide that by 8 to get 41.25 MB, then half that, assuming it will compress as well as Star Ocean, and you get 20.625 MB out of 22.8 MB. Because I think it's pretty safe to say that we'll never be able to draw enough artwork to fill a Neo Geo game, (unless, of course, you're trying to port a Neo Geo game) this is more than enough for if you're not streaming audio.
Re: Rate of streaming data to the SPC700?
by on (#177551)
lidnariq wrote:
Byuu observed that there seems to be four bits for banking, so 16 MiB is addressable, most likely as four 4MiB ROMs.
Quote:
for an S-DD1 game WITHOUT cart RAM, the entire 3.9 MiB in banks $40-$7D are available for a S-CPU dedicated ROM, plus all the random things one can map in the normally open bus regions of banks $00-$3F and $80-$BF. For a game WITH cart RAM, the practical range is "just" banks $40-$5F or -$6F.

Sweet. That sounds like "no effective limit", at least in the context of Street Fighter Alpha 2: SFC vs. Capcom...

The arcade version seems to have 4 MB of audio data and 20 MB of graphics. It also has 3 MB of CPU code and 256 KB of APU code, but I can't imagine it actually needed that much...

Assuming the QSound samples are uncompressed 8-bit PCM, that's a little over 2 MB of BRR, which could easily fit in the CPU-only area regardless of the presence or absence of save RAM. I suspect the game code could fit in there too. That leaves 16 MB for graphics, and 20 MB of CPS2 graphics rescaled to SNES resolution should be well under 16 MB before S-DD1 compression... The CPU-only ROM doesn't look especially necessary in this case, unless one were to run into problems trying to make all resources available simultaneously...

I'd say about 9-10 MB should cover everything nicely, depending on how well the graphics compress. By the time this game came out, the Nintendo 64 was already on the market with an 8 MB standard cartridge size, so perhaps it could even have managed a halfway credible retail price...
Re: Rate of streaming data to the SPC700?
by on (#177603)
About audio, how do MP3s and OGGs keep everything compressed at certain rate, like 128kbps?
Re: Rate of streaming data to the SPC700?
by on (#177606)
They're both lossy.
Re: Rate of streaming data to the SPC700?
by on (#177607)
There are several methods for controlling how much loss is applied over the course of an audio or video stream.

  1. "Average bitrate" uses two-pass encoding. First it reads all the audio once to determine how acoustically complex each frame (split second of audio) is in order to calculate how much audible distortion would occur at various bitrates. Then it uses this information to estimate the overall quality level to apply over a piece of music to keep total data divided by time equal to, say, 128 kbps. This doesn't keep the stream at a rock-solid 128 kbps and is thus not ideal for streaming over a channel with a hard maximum rate, but it's great for downloads or physical media if you have enough rate headroom that overall capacity is the limiting factor.
  2. "Constant bitrate" dials the distortion up and down based on the acoustic complexity of each frame so that each uses exactly the same amount of data. Simpler codecs such as ADPCM (IMA, BRR, VAG, etc.) do this implicitly.
  3. "Bit reservoir" is a method used in MP3 to allow the data for a less complex frame to include part of the data used in later, more complex frames. This method can be thought of as short-term ABR, allowing some of the consistency of average bitrate over a channel with a given peak rate.
Re: Rate of streaming data to the SPC700?
by on (#177608)
Because they don't operate in the time domain.

Unlike BRR or its close relative ADPCM, which take a sample in, apply some simple filters, and emit a sample out, everything newer than MP2 operates in the frequency domain. They usually use the MDCT, and then use simple rules to figure out what data they can throw away (the "lossy" stage) before using some lossless compression algorithm.
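
To make the "sample in, simple filter, sample out" part concrete, here is a simplified Python sketch of decoding one 9-byte BRR block. The header layout and filter coefficients are the commonly documented ones; the hardware's exact rounding, wrap-around and out-of-range shift behaviour are not reproduced, so treat this as an illustration and not as a bit-exact decoder. All names are made up.

```python
# Filter number -> (weight of previous sample, weight of the one before, divisor)
FILTERS = {
    0: (0, 0, 1),
    1: (15, 0, 16),      # 15/16
    2: (122, -60, 64),   # 61/32 and -15/16
    3: (115, -52, 64),   # 115/64 and -13/16
}


def sign_extend_4(n):
    """Interpret a 4-bit value as a signed number in -8..7."""
    return n - 16 if n & 0x8 else n


def decode_brr_block(block, prev1=0, prev2=0):
    """Decode 16 samples from one block, given the two previous outputs."""
    if len(block) != 9:
        raise ValueError("a BRR block is 9 bytes")
    header = block[0]
    shift = header >> 4
    a, b, divisor = FILTERS[(header >> 2) & 3]
    out = []
    for byte in block[1:]:
        for nibble in (byte >> 4, byte & 0x0F):  # high nibble first
            delta = (sign_extend_4(nibble) << shift) >> 1
            sample = delta + (a * prev1 + b * prev2) // divisor
            sample = max(-32768, min(32767, sample))  # simplified clamp
            out.append(sample)
            prev2, prev1 = prev1, sample
    return out


block = bytes([0x20, 0x1F] + [0] * 7)  # shift 2, filter 0
print(decode_brr_block(block)[:4])     # [2, -2, 0, 0]
```

Everything happens one sample at a time, with no transform and no look-ahead, which is why the bitrate is fixed by the format itself.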


JPEG is substantially similar, and someone's written a handy guide of how its compression works: (1), (2).



'Constant bitrate' MP3 actually isn't. Each MP3 frame contains what's called a "bit reservoir" which allows it to reserve some of the bits in the current frame of audio for the next frame. So it's not as good as a "real" VBR encoding, but it allows for a guaranteed bandwidth allocation instead.

Vorbis (and opus) are true VBR codecs and have no bit reservoir. There, the only way to get CBR-like effect is to adjust quality on a frame-by-frame basis, which isn't really all that great, quality-wise.
Re: Rate of streaming data to the SPC700?
by on (#177610)
lidnariq wrote:
'Constant bitrate' MP3 actually isn't. Each MP3 frame contains what's called a "bit reservoir" which allows it to reserve some of the bits in the current frame of audio for the next frame. So it's not as good as a "real" VBR encoding, but it allows for a guaranteed bandwidth allocation instead.

Vorbis (and opus) are true VBR codecs and have no bit reservoir. There, the only way to get CBR-like effect is to adjust quality on a frame-by-frame basis, which isn't really all that great, quality-wise.

As I understand it, the only reason that Vorbis and Opus don't use short-term ABR (which MP3 calls "bit reservoir") is that Xiph is waiting for MP3 patents to expire.
Re: Rate of streaming data to the SPC700?
by on (#177636)
I still wonder how much the SNES can compress graphics on its own. I'll write code for a Huffman decoder right here.

Code:
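huffman:
rep #$10            ; 16-bit X/Y (needed for ldy #$0008 and the tree addresses)
sep #$20            ; 8-bit accumulator
ldx tree            ; X = address of the root node
-;
lda (source)        ; fetch the next compressed byte
sta byte_buffer
inc source          ; note: 8-bit inc, only the low byte of the pointer advances
ldy #$0008          ; 8 bits to consume from this byte
-;
lda $0000,x         ; first byte of the node: zero marks a leaf
beq found_byte
lsr byte_buffer     ; shift the next bit into carry
bcs +
inx #3              ; bit clear: step to the next node, 3 bytes on
dey
bne -
bra --
+;
lda $0002,x         ; bit set: follow the 16-bit pointer stored in the node
xba
lda $0001,x
tax
dey
bne -
bra --

found_byte:
lda $0001,x         ; leaf: second byte is the decoded value
sta (destination)
inc destination
ldx tree            ; restart at the root for the next symbol
dec length
bne -
rts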
Re: Rate of streaming data to the SPC700?
by on (#177641)
Would Huffman-style rANS be any faster than regular Huffman on the SNES? Or, coming at it the other way, can the SNES handle multiplication fast enough to decode full rANS in a reasonable amount of time?
Re: Rate of streaming data to the SPC700?
by on (#177647)
I did some math, and found out the Huffman code would only be able to do about 200 compressed bytes per frame. Not quite enough.

Although I did think up of a fast way to do pixel-wise RLE. Have the runs of pixels vertically instead of horizontally. Instead of drawing the entire vertical runs of pixels, just draw points at the vertical edges with the XOR value of the two surrounding colors. Then do XOR filling to fill in between the lines.
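
A small Python sketch of the idea for a single vertical column of pixels, to show that the XOR fill really does reconstruct the column. The function names are invented, and it assumes the colour "above" the top pixel is 0.

```python
def encode_edges(column):
    """Store, at each pixel, the XOR of its colour with the pixel above it.

    Inside a run the result is 0; only the edges are non-zero.
    """
    edges = []
    prev = 0  # assumed colour above the first pixel
    for colour in column:
        edges.append(prev ^ colour)
        prev = colour
    return edges


def xor_fill(edges):
    """Rebuild the column by keeping a running XOR down the edge values."""
    out = []
    acc = 0
    for e in edges:
        acc ^= e
        out.append(acc)
    return out


column = [3, 3, 3, 7, 7, 0, 0, 5]
edges = encode_edges(column)
print(edges)            # [3, 0, 0, 4, 0, 7, 0, 5]
print(xor_fill(edges))  # [3, 3, 3, 7, 7, 0, 0, 5]
```

Only the non-zero edge values would need to be stored or plotted; the fill pass does the rest.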

Edit:
It would still probably be kinda slow. Maybe an "xor 8x1 slivers and remove blank bytes" would do the trick.
Re: Rate of streaming data to the SPC700?
by on (#177785)
I just did some experimenting with music samples, and I found out it's possible to make even 8khz samples sound good in Audacity. Go into the "equalizer" and make a slope going up 12db from 2khz-3khz, and up to 30db at 4khz.

It's funny how much unnecessary muffling most programs add to samples when they're downsampling, just to prevent aliasing artifacts.
Re: Rate of streaming data to the SPC700?
by on (#177793)
The term "unnecessary muffling" makes me think you have a lot to learn about signal processing. Have you watched the Digital Show & Tell video on Xiph.org Video and read about the Nyquist theorem?

Now that Pocket Heaven is dead, I'll make this the official site for a demo that I call "Fake Highs Off U". When someone asked me why the music in Luminesweeper sounds better than 18 kHz music sounds through GSM Player, I took a short sample of "The legend of MAX" from Dance Dance Revolution Extreme to demonstrate interpolation techniques.

LOM_lerping.ogg (23 seconds)

I think* this is what I did for that demo:
  1. Original, with energy up to 16 kHz
  2. Lowpass to 9 kHz to simulate downsampling to 18157 Hz, a common sample rate for GBA audio and close to the Neo Geo sample rate
  3. Nearest neighbor interpolation of the 18 kHz signal to 36 kHz
  4. Linear interpolation of the 18 kHz signal to 36 kHz


* Wayback Machine by Archive.org is useless. Whenever a domain expires and ends up sold, the new owners can hide the archive from the user using /robots.txt.
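
For anyone who wants to try steps 3 and 4 themselves, here is a minimal Python sketch of the two 2x upsampling methods. It is not the code used for the original demo; the function names and the handling of the final sample are assumptions.

```python
def upsample_2x_nearest(samples):
    """Nearest neighbour: repeat every sample twice."""
    return [s for s in samples for _ in range(2)]


def upsample_2x_linear(samples):
    """Linear: insert the midpoint between each sample and the next one."""
    out = []
    for i, s in enumerate(samples):
        # The last sample has no successor, so it is simply held.
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)
    return out


signal = [0, 100, -100]
print(upsample_2x_nearest(signal))  # [0, 0, 100, 100, -100, -100]
print(upsample_2x_linear(signal))   # [0, 50, 100, 0, -100, -100]
```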
Re: Rate of streaming data to the SPC700?
by on (#177811)
I meant extra muffling as in whatever type of antialiasing filter Audacity uses to resample music, also mutes a large portion of sub-Nyquist frequencies.
Re: Rate of streaming data to the SPC700?
by on (#177829)
Not as much as you're adding back: http://src.infinitewave.ca/

I suspect your method is making the samples sound better by boosting what high frequencies are left so as to somewhat compensate for the loss of everything past the Nyquist. I wouldn't recommend using a single set of filter settings for this in the general case; you could get some really nasty results if you don't twiddle the knobs first to see how the material responds.

I suppose one could design a program that used power spectrum analysis and psychoacoustics to try to figure out how to add back the high-frequency energy lost to the bandlimiting procedure in a form that sounds better than simple Nyquist folding. But that sounds complicated...

...

Oh, hey - SoX is apparently free. How have I never heard of it before? Just look at those graphs...
Re: Rate of streaming data to the SPC700?
by on (#178195)
Since we're talking about sound quality, I felt like analysing what types of harmonic content can be created using different loop lengths.

perfect 5ths require 2 wavelengths
major 4ths require 3 wavelengths
major 3rds require 4 wavelengths
major 2nds require 8 wavelengths
15 cents require 128 wavelengths

Anything between a major 2nd and 15 cents can make your ears bleed.
Re: Rate of streaming data to the SPC700?
by on (#178197)
psycopathicteen wrote:
Since we're talking about sound quality, I felt like analysing what types of harmonic content can be created using different loop lengths.

perfect 5ths require 2 wavelengths
major 4ths require 3 wavelengths
major 3rds require 4 wavelengths
major 2nds require 8 wavelengths
15 cents require 128 wavelengths

Anything between a major 2nd and 15 cents can make your ears bleed.

To be more accurate, there's two numbers of wavelengths in any harmonic ratio like this. E.g. you need 3 wavelengths of the higher pitch in a perfect fifth in the space of 2 wavelengths of the lower note. There are a lot of useful harmonic ratios:
  • major second = 9:8
  • minor third = 6:5
  • major third = 5:4
  • perfect fourth = 4:3
  • perfect fifth = 3:2
  • minor sixth = 8:5
  • major sixth = 5:3
  • minor seventh = 7:4
  • octave = 2:1

These are just intonation intervals, however. The fifth, fourth, and second will sound normal, but just thirds will be slightly out of tune with music using the common equal tempered scale. The minor seventh will be very out of tune compared to equal temperament. (They will, however, be perfectly in tune with each other. These are pure harmonic ratios.)

There are other useful harmonic ratios too, but they can be looked up if you're interested.
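To put rough numbers on "slightly" and "very" out of tune, here is a minimal Python sketch that converts each ratio listed above to cents and compares it with the nearest equal-tempered interval. The semitone counts paired with each ratio and the function names are my own additions for illustration, not something from the post.

```python
import math

# Just-intonation ratios from the list above, paired with the number of
# equal-tempered semitones each interval spans (the pairing is an assumption
# added for this sketch).
JUST_INTERVALS = {
    "major second":   (9, 8, 2),
    "minor third":    (6, 5, 3),
    "major third":    (5, 4, 4),
    "perfect fourth": (4, 3, 5),
    "perfect fifth":  (3, 2, 7),
    "minor sixth":    (8, 5, 8),
    "major sixth":    (5, 3, 9),
    "minor seventh":  (7, 4, 10),
    "octave":         (2, 1, 12),
}

def cents(num, den):
    """Size of the frequency ratio num:den in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(num / den)

def deviation_from_equal_temperament(name):
    """Just interval minus its equal-tempered counterpart, in cents."""
    num, den, semitones = JUST_INTERVALS[name]
    return cents(num, den) - 100.0 * semitones

if __name__ == "__main__":
    for name in JUST_INTERVALS:
        print(f"{name:15s} {deviation_from_equal_temperament(name):+7.2f} cents")
```

By this measure the fifth, fourth, and second stay within about 4 cents of equal temperament, the thirds and sixths are off by roughly 14 to 16 cents, and the 7:4 seventh is about 31 cents flat.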
Re: Rate of streaming data to the SPC700?
by on (#178268)
Questions re: the SPC700:

1) What is the relationship between addw, carry, and half carry? Specifically, can I simply use addw without worrying about the flag values? From what I can tell, it sets them but doesn't use them...

2) It seems that the observed variance in SPC700 clock timing is between about zero and about +0.3%. Is this true of both NTSC and PAL? What sort of safety margin would the experts recommend?
Re: Rate of streaming data to the SPC700?
by on (#178269)
As far as I know the SPC700 always uses a 24576 kHz ceramic resonator. It should have standard ceramic resonator problems (Precision of 5‰; jitter of 100ppm; drift of 1‰).

I thought I had heard someone say that one of the later variants of the SNES used a ÷7 from the master clock, but I definitely saw a ceramic resonator next to the APU for everything I could find PCB photos of.
Re: Rate of streaming data to the SPC700?
by on (#178279)
lidnariq wrote:
Precision of 5‰; jitter of 100ppm; drift of 1‰

Ouch. Is that a standard deviation or a guaranteed performance spec?

Does anyone have better numbers for this oscillator specifically? Or is it safest to just assume it's a cheap generic part?

Quote:
I thought I had heard someone say that one of the later variants of the SNES used a ÷7 from the master clock

I think ZSNES does that, or used to. In any case that's only -1.2‰ or so, which isn't an issue...

...

Anyone know for sure what the deal is with addw? It would be great if I could just use it without worrying about the flags...
Re: Rate of streaming data to the SPC700?
by on (#178284)
lidnariq wrote:
As far as I know the SPC700 always uses a 24576 kHz ceramic resonator. It should have standard ceramic resonator problems (Precision of 5‰; jitter of 100ppm; drift of 1‰).

And if anyone else is unfamiliar with the ‰ symbol, it means "parts per thousand": 0.5% precision, 0.01% jitter, 0.1% drift. That's why a (more expensive) crystal is used instead when oscillators need to stay in phase for a couple hundred cycles, such as the QAM encoder and decoder in NTSC and PAL.

Quote:
I thought I had heard someone say that one of the later variants of the SNES used a ÷7 from the master clock

Dunno if the 1CHIP or Mini actually uses that simplification, but I've long suggested that a clone could use it and remain within spec. And 93143 is right that 135/44 MHz (master÷7) is only 0.12% off from the expected 3.072 MHz.
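For anyone who wants to check the unit conversions and the ÷7 figure, a small Python sketch follows. The 135/44 MHz and 3.072 MHz values come from the posts above; the function names are just illustrative.

```python
from fractions import Fraction

def per_mille_to_percent(value):
    """1 per mille is one part per thousand, i.e. 0.1 percent."""
    return value / 10

def ppm_to_percent(value):
    """1 ppm is one part per million, i.e. 0.0001 percent."""
    return value / 10_000

# Figures quoted in the thread.
MASTER_DIV_7_HZ = Fraction(135_000_000, 44)   # NTSC master clock divided by 7
APU_NOMINAL_HZ = Fraction(3_072_000)          # expected 3.072 MHz

def deviation_per_mille(actual_hz, nominal_hz):
    """Relative frequency error of actual_hz against nominal_hz, in per mille."""
    return float((actual_hz / nominal_hz - 1) * 1000)

if __name__ == "__main__":
    print(per_mille_to_percent(5), ppm_to_percent(100), per_mille_to_percent(1))
    print(f"{deviation_per_mille(MASTER_DIV_7_HZ, APU_NOMINAL_HZ):+.2f} per mille")
```

The ÷7 clock comes out about 1.24‰ low, which sits comfortably inside the 5‰ initial tolerance quoted for the resonator.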
Re: Rate of streaming data to the SPC700?
by on (#178286)
93143 wrote:
1) What is the relationship between addw, carry, and half carry? Specifically, can I simply use addw without worrying about the flag values? From what I can tell, it sets them but doesn't use them...


The addw instruction adds the low bytes without carry, then adds the high bytes with carry (in other words, you don't need to clrc first). The half carry flag might be set as a result, but doesn't affect the value.

(In other words, yes, you can use it without worrying about H/C)
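A rough Python model of that description, for illustration only: it follows the "low bytes without carry, high bytes with carry" wording above and deliberately ignores H, V, N and Z. The helper names are made up for this sketch and are not taken from any emulator.

```python
def adc8(a, b, carry_in):
    """8-bit add with carry in; returns (result, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + (1 if carry_in else 0)
    return total & 0xFF, total > 0xFF

def addw(ya, operand):
    """16-bit add as described above: the incoming carry flag is never used."""
    lo, carry = adc8(ya, operand, False)            # low bytes, no carry in
    hi, carry = adc8(ya >> 8, operand >> 8, carry)  # high bytes, carry from low
    return (hi << 8) | lo, carry

def adc16_manual(ya, operand, carry_in):
    """Two chained adc instructions; a stale carry leaks into the result."""
    lo, carry = adc8(ya, operand, carry_in)
    hi, carry = adc8(ya >> 8, operand >> 8, carry)
    return (hi << 8) | lo, carry

if __name__ == "__main__":
    print(hex(addw(0x12FF, 0x0001)[0]))                # 0x1300
    print(hex(adc16_manual(0x1200, 0x0001, True)[0]))  # 0x1202 without clrc
```

The second call shows why a hand-rolled adc pair needs clrc first, while addw gives the same answer whatever the carry flag held beforehand.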
Re: Rate of streaming data to the SPC700?
by on (#178297)
93143 wrote:
lidnariq wrote:
Precision of 5‰; jitter of 100ppm; drift of 1‰
Ouch. Is that a standard deviation or a guaranteed performance spec?
The latter. But it's not as much consolation as you'd hope, I fear.
More reading from muRata:
https://www.westfloridacomponents.com/m ... 00DETF.pdf ( page 6 contains drift characteristics, but not aging characteristics )
http://www.murata.com/~/media/webrenewa ... /p17e.ashx ( page 18 contains aging characteristics )
From Mobius:
http://web.eecs.umich.edu/~mmccorq/semi ... leUT07.pdf ( pages 32-35 show strong temperature relation )

Quote:
Does anyone have better numbers for this oscillator specifically? Or is it safest to just assume it's a cheap generic part?
muRata's only really been able to get initial as-vended precision to less than 1‰ in the past 15 years, so I think it's safe to assume that a "high precision ceramic resonator" would snarkily be called a quartz crystal.
Re: Rate of streaming data to the SPC700?
by on (#178314)
Revenant wrote:
The addw instruction adds the low bytes without carry, then adds the high bytes with carry (in other words, you don't need to clrc first). The half carry flag might be set as a result, but doesn't affect the value.

Awesome. That's what I figured from the docs, and from looking at higan's source code, but docs can be misleading and I don't trust myself to read someone else's C++, especially when it's so different from my own coding style...

Saves me two cycles on the tight end of the timing window. Or a lot more if you count not having to use clrv, which my code didn't do anyway...

lidnariq wrote:
93143 wrote:
Is that a standard deviation or a guaranteed performance spec?
The latter. But it's not as much consolation as you'd hope, I fear.
More reading from muRata:
https://www.westfloridacomponents.com/m ... 00DETF.pdf ( page 6 contains drift characteristics, but not aging characteristics )
http://www.murata.com/~/media/webrenewa ... /p17e.ashx ( page 18 contains aging characteristics )
From Mobius:
http://web.eecs.umich.edu/~mmccorq/semi ... leUT07.pdf ( pages 32-35 show strong temperature relation )

Okay. So it's probably unwise to assume it never goes past ±6‰.

I think I can deal with that. It looks like most of the variation is manufacturing and history, and it should be possible to detect that at boot (or perhaps after a short warm-up period). I've already got two versions of the pickup loop to deal with the difference between NTSC and PAL scanline lengths, and I can do quarter-cycles as easily as half-cycles. As long as thermal drift by itself doesn't exceed 4‰ or so after the timing check, I should be fine, unless I've screwed up the cycle count...

If someone's been storing their SNES in the freezer, they shouldn't be surprised at a few audio glitches... alternately, I could retest every now and then and reupload the timing code if necessary - pretty sure I need regular sync checks anyway to schedule audio data, and if the streaming routine were double-buffered it could update itself...
Re: Rate of streaming data to the SPC700?
by on (#178343)
This probably feels more at home here than in the froyo thread...

The code below is supposed to be an APU-side HDMA streaming routine; it starts right after a "prepare to receive data" command is detected and processed, and ends right before jumping back to the main audio routine. Setup and communications protocols for managing streaming data requests, timing checks, etc. are out of scope for now...

As with the previous two versions, the HDMA streaming table format is as follows:
- one line containing a "prepare to receive data" command
- enough empty lines for the audio engine to notice the command and start listening for "data start" regardless of what it was doing when "prepare to receive data" was sent
- one line containing a "data start" command in byte 0, the stream ID number in byte 1, and the transfer size in scanlines in byte 2.
- the number of lines indicated by byte 2 above, containing data in all four bytes
- one line containing a "no instruction" command in at least one and possibly all four bytes
- maybe a couple of empty lines, to allow the audio engine to see the "no instruction" and go do something else
- back to top, if another transfer is desired - it should be easy to pack multiple transfers into a single indirect-addressed HDMA table
Alternately, the "no instruction" line can be replaced with a "prepare to receive data" line if back-to-back transfers are desired.

This time, I've dispensed with a layer of indirection. Instead of using the stream ID number to look up the direct-page address of the buffer pointer, I have decided that there is no reason the stream ID number can't be the direct-page address of the buffer pointer.

The result is that this version supports an effectively unlimited number of streaming buffers, without requiring the main streaming loop to be in direct page. It also dispenses with the stringent data format requirements of the second version; all that's necessary for this version is that (a) each transfer be less than 256 bytes, since X is used to increment the buffer position during the main pickup loop, and (b) the end of the buffer coincide with the end of a transfer - there's just not enough time during the pickup loop to detect and handle end-of-buffer in the middle of a transfer, so it's assumed to not happen.
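To make the table layout concrete, here is a hedged Python sketch that lays out one transfer as a list of per-scanline port writes. It is only a model of the format described above, not real HDMA table encoding: the command byte values, the zero padding in unused bytes, the default number of empty lines, and the use of None for "nothing written this scanline" are all assumptions made for illustration.

```python
# Hypothetical command codes; the post does not give actual values.
CMD_PREPARE = 0xF0      # "prepare to receive data"
CMD_DATA_START = 0xF1   # "data start"
CMD_NONE = 0x00         # "no instruction"

def build_transfer(stream_id, payload, lead_in_lines=4, tail_lines=2):
    """Return one transfer as a list of scanlines.

    Each entry is a 4-tuple of bytes for ports 0-3, or None for an empty
    line. stream_id doubles as the direct-page address of the buffer pointer.
    """
    if len(payload) == 0 or len(payload) % 4 != 0:
        raise ValueError("payload must be a non-empty multiple of 4 bytes")
    if len(payload) >= 256:
        raise ValueError("each transfer must be less than 256 bytes")
    data_lines = len(payload) // 4

    lines = [(CMD_PREPARE, 0, 0, 0)]
    lines += [None] * lead_in_lines            # time for the engine to notice
    lines.append((CMD_DATA_START, stream_id, data_lines, 0))
    lines += [tuple(payload[i:i + 4]) for i in range(0, len(payload), 4)]
    lines.append((CMD_NONE, CMD_NONE, CMD_NONE, CMD_NONE))
    lines += [None] * tail_lines               # time to go do something else
    return lines

if __name__ == "__main__":
    table = build_transfer(0x20, bytes(range(128)))
    print(len(table), table[5], table[6])
```

With 4 bytes per scanline, the under-256-byte rule caps a single transfer at 63 data lines in this model.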

As usual, I haven't tested this, so there may be errors. Does the basic concept look okay?

Code:
; BUFFER METADATA STRUCTURE (WIP):
; byte 0-1:  current buffer write position
; byte 2-3:  buffer start address
; byte 4-5:  buffer end address
; This data is stored in zero page.  More complex bookkeeping can be done elsewhere,
; so as not to waste space.

; HDMA STREAMING CODE:
data_incoming_HDMA:
   mov A, #data_start_HDMA      ; 2 cycles - load data start code value
-  cbne $F4, -                  ; 7 cycles - listen for the write
; TOTAL: roughly 3-9 cycles since data start code written to $2140

; Pick up transfer parameters written to $2141 and $2142:
   mov X, $F5                   ; 3 cycles - load direct-page address of buffer pointer
   mov temp, $F6                ; 5 cycles - load transfer size in scanlines
; TOTALS:  8 cycles since start code noticed in $F4, 11-17 cycles since $2140 written

; Now rewrite the pickup loop to target the current buffer position:
   mov A, (X)                   ; 3 cycles - get low byte of buffer address
   mov Y, $01+X                 ; 4 cycles - get high byte of buffer address
   mov !(get_data_HDMA+3), A    ; 5 cycles - write buffer address low byte
   mov !(get_data_HDMA+4), Y    ; 5 cycles - write buffer address high byte
   addw YA, one                 ; 5 cycles - "one" is a constant in zero page
   mov !(get_data_HDMA+8), A    ; 5 cycles
   mov !(get_data_HDMA+9), Y    ; 5 cycles
   addw YA, one                 ; 5 cycles
   mov !(get_data_HDMA+13), A   ; 5 cycles
   mov !(get_data_HDMA+14), Y   ; 5 cycles
   addw YA, one                 ; 5 cycles
   mov !(get_data_HDMA+18), A   ; 5 cycles
   mov !(get_data_HDMA+19), Y   ; 5 cycles
; TOTALS:  62 cycles since parameters loaded, 73-79 since start code written to $2140

; Set Y to the number of scanlines in the current transfer and set X to zero:
   mov Y, temp                  ; 3 cycles - pick up transfer size in scanlines
   mov temp, X                  ; 4 cycles - store buffer pointer address
   mov X, #$00                  ; 2 cycles - set X to zero
; TOTALS:  9 cycles since loop rewritten, 82-88 since start code written to $2140

; Ideally, one NTSC scanline should be almost exactly 65 cycles long.  The port reads
; are between cycles 3 and 30 past this point, putting them between ~18 cycles after
; the first HDMA write and about 13 cycles before the fourth one on the next line.
; That leaves room for about -6‰ to +9‰ worth of clock inaccuracy over 32 lines.
; Putting the buffer metadata in page one adds 4 cycles to the preceding code, which
; reduces the margin to about -4‰ on the slow side.

; STREAMING LOOP:
get_data_HDMA:
   mov A, $F4                   ; 3 cycles - get byte 0 of the data shot
   mov !buf+X, A                ; 6 cycles - write it to the current buffer position
   mov A, $F5                   ; 3 cycles - get byte 1
   mov !(buf+1)+X, A            ; 6 cycles - write it to the current buffer position plus one
   mov A, $F6                   ; 3 cycles - get byte 2
   mov !(buf+2)+X, A            ; 6 cycles
   mov A, $F7                   ; 3 cycles - get byte 3
   mov !(buf+3)+X, A            ; 6 cycles
   inc X                        ; 2 cycles - increment the current buffer position four times
   inc X                        ; 2 cycles
   inc X                        ; 2 cycles
   inc X                        ; 2 cycles
;  [                                           ]
;  [      INSERT TIME DELAY BLOCK HERE         ]
;  [                                           ]
   dbnz Y, get_data_HDMA        ; 6/4 cycles - repeat for next scanline, or exit if done
; ITERATION TIME:  (50+delay) cycles if branch taken
; END STREAMING LOOP

; Allow dead space after this instruction, so the delay section can be hot-swapped
; without requiring uniform code size:
   jmp !data_finished_HDMA      ; 3 cycles

; Add X to the current buffer position and handle page rollover and end-of-buffer:
data_finished_HDMA:
   mov A, X                     ; 2 cycles - load the buffer address index into A
   mov X, temp                  ; 3 cycles - pick up the buffer pointer address
   clrc                         ; 2 cycles - clear carry
   adc A, (X)                   ; 3 cycles - add the index to the low byte of the buffer pointer
   mov (X), A                   ; 4 cycles - store the result back
   mov A, Y                     ; 2 cycles - Y should be zero after the main pickup loop
   adc A, $01+X                 ; 4 cycles - add zero to the high byte of the buffer pointer, with carry
   mov $01+X, A                 ; 5 cycles - store the result back
   cbne $05+X, done_HDMA        ; 8/6 cycles - check high byte against buffer end address
   mov A, (X)                   ; 0/3 cycles - pick up low byte
   cbne $04+X, done_HDMA        ; 0/8/6 cycles - check low byte against buffer end address
   mov A, $02+X                 ; 0/0/4 cycles - if end of buffer reached, load buffer start address low byte
   mov (X)+, A                  ; 0/0/4 cycles - store to buffer pointer low byte and increment X
   mov A, $02+X                 ; 0/0/4 cycles - load buffer start address high byte
   mov (X), A                   ; 0/0/4 cycles - store to buffer pointer high byte
done_HDMA:
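The clock-error margins quoted in the comments before the streaming loop can be reproduced with a few lines of Python. This is a sketch under the assumptions stated in those comments: a 65-cycle scanline, a 32-line transfer, 13 cycles of slack on the slow side and 18 on the fast side. The function name is hypothetical.

```python
LINE_CYCLES = 65           # APU cycles per NTSC scanline, per the comment
LINES_PER_TRANSFER = 32    # transfer length assumed in the comment

def tolerance_per_mille(slack_cycles, lines=LINES_PER_TRANSFER,
                        line_cycles=LINE_CYCLES):
    """Clock error, in per mille, that eats slack_cycles over one transfer."""
    return 1000.0 * slack_cycles / (lines * line_cycles)

if __name__ == "__main__":
    print(f"slow side:            -{tolerance_per_mille(13):.2f}")
    print(f"fast side:            +{tolerance_per_mille(18):.2f}")
    print(f"slow side, page one:  -{tolerance_per_mille(13 - 4):.2f}")
```

Those work out to roughly 6.3‰, 8.7‰ and 4.3‰, in line with the "about -6‰ to +9‰" and "about -4‰" figures in the comments.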
Code:
; TIME DELAY IN DATA PICKUP LOOP (proof of concept)
; very short - 14.5 cycles (APU oscillator -8.3‰ vs. NTSC nominal, or -17.6‰ vs. PAL nominal):
   mov A, Y                     ; 2 cycles
   and A, #$01                  ; 2 cycles - get bottom bit of scanline counter
   beq +                        ; 4/2 cycles (avg. 3)
   cmp A, (X)                   ; 0/3 cycles (avg. 1.5)
+  mov A, [$01+X]               ; 6 cycles

; short - 14.75 cycles (-4.4/-13.6):
   mov A, Y                     ; 2 cycles
   and A, #$03                  ; 2 cycles - get bottom two bits of scanline counter
   beq +                        ; 4/2 cycles (avg. 2.5)
   cmp A, (X)                   ; 0/3 cycles (avg. 2.25)
+  mov A, [$01+X]               ; 6 cycles

; NTSC - 15 cycles (-0.5/-9.7):
   cmp (X), (Y)                 ; 5 cycles
   cmp (X), (Y)                 ; 5 cycles
   cmp (X), (Y)                 ; 5 cycles

; intermediate - 15.25 cycles (+3.3/-5.9):
   mov A, Y                     ; 2 cycles
   and A, #$03                  ; 2 cycles
   bne +                        ; 4/2 cycles (avg. 3.5)
   cmp A, (X)                   ; 0/3 cycles (avg. 0.75)
+  nop                          ; 2 cycles
   cmp (X), (Y)                 ; 5 cycles

; PAL - 15.5 cycles (+7.2/-2.0):
   mov A, Y                     ; 2 cycles
   and A, #$01                  ; 2 cycles
   beq +                        ; 4/2 cycles (avg. 3)
   cmp A, (X)                   ; 0/3 cycles (avg. 1.5)
+  nop                          ; 2 cycles
   cmp (X), (Y)                 ; 5 cycles

; long - 15.75 cycles (+11.0/+1.8):
   mov A, Y                     ; 2 cycles
   and A, #$03                  ; 2 cycles
   beq +                        ; 4/2 cycles (avg. 2.5)
   cmp A, (X)                   ; 0/3 cycles (avg. 2.25)
+  nop                          ; 2 cycles
   cmp (X), (Y)                 ; 5 cycles

; very long - 16 cycles (+14.9/+5.6):
   cmp (X), (Y)                 ; 5 cycles
   cmp (X), (Y)                 ; 5 cycles
   mov A, [$01+X]               ; 6 cycles
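As a cross-check on the averages and offsets annotated above, here is a Python sketch that averages each delay block over four consecutive scanline-counter values and converts the resulting loop length into an APU clock offset. The scanline lengths are my assumptions (1364 master clocks per line, master clocks of 945/44 MHz for NTSC and 21.28137 MHz for PAL, 1.024 MHz SPC700 cycle rate), and the offset is taken as loop/line - 1. The results land within about half a per mille of the figures in the comments, which may have used a slightly different convention or rounding.

```python
LOOP_BASE = 50  # cycles per iteration excluding the delay block

# Assumed scanline lengths in SPC700 cycles.
NTSC_LINE = 1364 / (945e6 / 44) * 1.024e6   # about 65.03
PAL_LINE = 1364 / 21.28137e6 * 1.024e6      # about 65.63

def branch_block(fixed, mask, branch_if_zero):
    """Average cycles of a block with one conditional branch over cmp A,(X)."""
    total = 0
    for y in range(4):                       # one full period of the low bits
        zero = (y & mask) == 0
        taken = zero if branch_if_zero else not zero
        total += fixed + (4 if taken else 2 + 3)  # taken: 4; else 2 + cmp (3)
    return total / 4

DELAYS = {
    "very short":   branch_block(10, 0x01, True),   # mov, and, beq, mov [dp+X]
    "short":        branch_block(10, 0x03, True),
    "NTSC":         15.0,                           # three cmp (X),(Y)
    "intermediate": branch_block(11, 0x03, False),  # mov, and, bne, nop, cmp
    "PAL":          branch_block(11, 0x01, True),
    "long":         branch_block(11, 0x03, True),
    "very long":    16.0,                           # two cmp plus mov [dp+X]
}

def apu_offset_per_mille(delay, line_cycles):
    """APU clock offset at which (50 + delay) cycles span exactly one line."""
    return ((LOOP_BASE + delay) / line_cycles - 1) * 1000

if __name__ == "__main__":
    for name, delay in DELAYS.items():
        print(f"{name:12s} {delay:5.2f} "
              f"{apu_offset_per_mille(delay, NTSC_LINE):+6.1f} "
              f"{apu_offset_per_mille(delay, PAL_LINE):+6.1f}")
```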