Practical audio streaming while limiting kbps and CPU usage

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#89062)
It's a fact that audio streaming is possible on the SNES; blargg proved it.
However I wonder if there is a way to limit the size of the data (kbps) and CPU usage.

I'll explain:
In blargg's demo, he writes data into the echo buffer at 32kHz (the output rate of the SPC). This works well, but it uses raw uncompressed data, which is not acceptable for a system such as the SNES where memory is limited.
Uncompressed 16-bit mono data at 32kHz is 512kbps, so a one-minute song would take about 3.8MB, more than an entire big game like Final Fantasy VI, which is not acceptable.
Also this almost monopolizes CPU usage, similarly to using $4011 with the NES.

The first idea is to use the SNES's native BRR format. It compresses data by a ratio of 9/32, to about 144kbps. That way a one-minute song takes about 1080kB, which is more acceptable.
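To make the arithmetic behind these figures explicit, here is a minimal sketch. It assumes 16-bit mono PCM at the 32kHz output rate and the standard BRR layout of 9 bytes per 16 samples; the function names are made up for illustration.

```python
SAMPLE_RATE_HZ = 32_000   # S-DSP output rate
PCM_BITS_PER_SAMPLE = 16  # assumption: raw 16-bit mono PCM
BRR_BLOCK_BYTES = 9       # one BRR block: 1 header byte + 8 data bytes
BRR_BLOCK_SAMPLES = 16    # samples encoded by one BRR block


def pcm_bitrate_bps(sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Bitrate of uncompressed mono PCM in bits per second."""
    return sample_rate_hz * PCM_BITS_PER_SAMPLE


def brr_bitrate_bps(sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Bitrate of BRR data: 72 bits for every 16 samples."""
    return sample_rate_hz * BRR_BLOCK_BYTES * 8 // BRR_BLOCK_SAMPLES


def bytes_per_minute(bitrate_bps: int) -> int:
    """Storage needed for one minute of audio at the given bitrate."""
    return bitrate_bps * 60 // 8


if __name__ == "__main__":
    for name, rate in (("PCM", pcm_bitrate_bps()), ("BRR", brr_bitrate_bps())):
        print(f"{name}: {rate // 1000} kbps, {bytes_per_minute(rate)} bytes per minute")
```

With these assumptions one minute comes to 3,840,000 bytes raw and 1,080,000 bytes as BRR.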

This can be done by having a large sample that takes a significant part of the SPC's memory, and updating the first half while the second half is playing and vice versa (double buffering).
The problem is syncing the updates to the BRR sample between the CPU and the SPC. If you do it open loop (with carefully timed code), chances are that it will depend on NTSC/PAL settings, and it may not even work well on every SNES, as there are two different crystals for the SPC and the CPU/PPU (I think).
So you'll need some way of keeping track of the playback position to tell the CPU what to update and when. If this is possible, then it will be possible to stream at a more reasonable bitrate without monopolizing the CPU.

The best option would be low-bitrate Ogg Vorbis encoding, which can go as low as 45kbps with an acceptable loss of quality; a one-minute song would then take only about 340kB!
The problem is of course that the SPC can't decode this format natively, so it would be up to the SPC and/or the CPU to handle the decoding and use the echo buffer for playback. I wonder whether the computing power of either is sufficient for Vorbis decoding.

The best would be for the CPU to send compressed data to the SPC, which would handle the decoding itself on the fly and write the result into its echo buffer. However, the SPC is clocked at only 1.024 MHz, while the CPU can reach about 3 MHz. So if the SPC can't decode Vorbis, it'll be up to the CPU to do it, and then of course decoding will monopolize it.

by on (#89063)
You can forget OGG Vorbis on SNES. Chilly Willy has made a decoder for 32X and it struggles on it, and 32X has much more CPU power than several SNES together. But OGG quality is quite acceptable at such low bit rates.

Double buffering of BRR data should not be so difficult, and you can upload it to the SPC faster than it can play it, so you can have chunks of CPU time between uploads to do other things.

by on (#89064)
Quote:
However, the SPC is clocked at only 1.024 MHz, while the CPU can reach about 3 MHz. So if the SPC can't decode Vorbis, it'll be up to the CPU to do it, and then of course decoding will monopolize it.


You'll only get full speed (3.55/3.57 MHz) when accessing certain regions of memory. WRAM is not included among those, so you'll be limited to 2.68 MHz when accessing WRAM.

by on (#89065)
If Vorbis decoding is too CPU intensive, is there some other encoding that compresses better than plain BRR and that could be decoded in real time by the 65816 or the SPC700 (preferably by the SPC700, so that less data has to be transferred and the 65816 is free for gameplay)?

Personally I think Vorbis is quite perfect, as I've never heard any loss even at q-1. With MP3, however, you can hear occasional loss at around 160kbps, mostly on music with saw-wave-ish melodies.

by on (#89068)
You've got a peak MIPS count of ~1.8 on the S-CPU. I haven't heard of any vorbis/mp3 decoder implementations that come even close to meeting that sort of performance constraint, even on far more capable processors.

by on (#89073)
I still think Moero Pro Yakyuu (Japanese Bases Loaded) had the right idea: audio decompression hardware on the cartridge board. Are there any MP3 player chipsets that can be controlled with SPI or I2C?

by on (#89077)
You can probably use HDMA to send data to the SPC700, to save on CPU power.

by on (#89078)
Or use zero CPU power and use the analog inputs on the cart edge. Unlike the NES, the Super NES shares the Super Famicom's pinout, and Super Game Boy depends on this.

by on (#89079)
tepples wrote:
Or use zero CPU power and use the analog inputs on the cart edge. Unlike the NES, the Super NES shares the Super Famicom's pinout, and Super Game Boy depends on this.


That sounds like a good way to bullshit friends.

by on (#89083)
Tepples, I was talking about something that would work with the PowerPak and, if possible, emulators (at least BSNES) of course.

Is it really possible to use HDMA to transfer data to the SPC? Even if it is, I bet the bitrate would be very low (something like 200 bytes per frame), but yes, it would use little CPU power.

by on (#89085)
It's enough for BRR compressed 22kHz mono.
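As a rough check of that claim, the sketch below computes how many bytes per video frame a BRR stream needs. It assumes "22kHz" means 22050 Hz and uses nominal frame rates of 60 Hz (NTSC) and 50 Hz (PAL); both are assumptions, and the function names are made up for illustration.

```python
import math
from fractions import Fraction

BRR_BLOCK_BYTES = 9     # bytes per BRR block
BRR_BLOCK_SAMPLES = 16  # samples per BRR block


def brr_bytes_per_second(sample_rate_hz: int) -> Fraction:
    """Exact byte rate of a mono BRR stream at the given sample rate."""
    return Fraction(sample_rate_hz * BRR_BLOCK_BYTES, BRR_BLOCK_SAMPLES)


def brr_bytes_per_frame(sample_rate_hz: int, frames_per_second: int) -> int:
    """Bytes to transfer each video frame to keep up, rounded up."""
    return math.ceil(brr_bytes_per_second(sample_rate_hz) / frames_per_second)


if __name__ == "__main__":
    # Assumed values: 22050 Hz sample rate, nominal 60 Hz and 50 Hz frame rates.
    print("NTSC:", brr_bytes_per_frame(22_050, 60), "bytes per frame")
    print("PAL: ", brr_bytes_per_frame(22_050, 50), "bytes per frame")
```

Under these assumptions the stream needs about 207 bytes per NTSC frame, which is close to the 200 bytes per frame guessed above.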

by on (#89103)
Bregalad wrote:
Tepples, I was talking about something that would work with the PowerPak and, if possible, emulators (at least BSNES) of course.

Is it really possible to use HDMA to transfer data to the SPC? Even if it is, I bet the bitrate would be very low (something like 200 bytes per frame), but yes, it would use little CPU power.


I did this back in 2008 with N-Warp Daisakusen, so yeah, it is possible.
Makes minimal use of the main CPU and if you do it right, it will work on PAL and NTSC consoles all the same.
I wouldn't recommend using my audio player directly, but you can at least get an idea of how this is achieved by looking at my source code.

Inferior emulators such as ZSNES and Snes9x will have trouble with this kind of timing-critical hardware usage, though.

by on (#89143)
It seems like you can combine the streamed audio with the music... at least that's what I think. N-Warp Daisakusen doesn't quite combine them, but there is a piece of SFX still playing at the end of a round when the streaming sample starts up.

by on (#89170)
d4s wrote:
Inferior emulators such as ZSNES and Snes9x will have trouble with this kind of timing-critical hardware usage, though.


This reply made me think of Tales of Phantasia .... Oh it did ...

by on (#89313)
Tales of Phantasia only streams small BRRs. If you listen to the intro song, there are interruptions in the singing all the time. That's because every sung phrase is a different BRR.

I'm pretty sure the timing is driven from the CPU side, because the intro song doesn't work well on my PAL console. The game even crashes sometimes during the intro. Because the CPU and SPC use different crystal oscillators, there is no way to keep them completely in sync without some kind of synchronization in the communication.

However, I think there is a way to make BRR streaming work on both NTSC and PAL consoles without changing anything. Since the SPC timers increment at a frequency of 8 kHz, they tick exactly once every 4 output samples.

Therefore, if you run your engine in the typical way, once the timer has reached some value N, you know the DSP has output exactly 4*N samples.

Based on this, the SPC can signal the CPU when it needs more data. Of course, since the CPU has other things to do, the SPC will have to wait until it's available, so the "sample" should be long enough to compensate for this.

For example, say you want to stream a BRR sample in chunks of 720 bytes (= 80 BRR blocks = 1280 samples).
Then you'll have to reserve memory for twice that size, that is 1440 bytes, in a special BRR sample reserved for this (which is 160 blocks long, and loops back to block 0). The CPU should also have sent the initial chunk before playback starts.

The duration of a chunk is 1280/32000 = 40ms.
The SPC needs to watch one of its timers: it knows that, after keying on the channel, the timer will increment exactly 1280/4 = 320 times before the SPC needs the CPU to send new data.
The timer is only 8 bits wide, but this part of the timing can be done in software by accumulating smaller timer increments.
The CPU doesn't have to respond immediately, as there are still ~2 PAL frames before it becomes critical, but the sooner the better. Since the SPC's loop presumably runs much faster than the CPU's, the CPU then just has to wait for the SPC to accept data, and the actual transfer can be done.
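The numbers in this example can be checked with a small model. This is only a sketch of the bookkeeping, not SPC700 code: it assumes both halves of the 160-block sample are filled before key-on, and names such as refill_requests are made up for illustration.

```python
OUTPUT_RATE_HZ = 32_000  # DSP output sample rate
TIMER_RATE_HZ = 8_000    # timer tick rate mentioned in the post
BRR_BLOCK_BYTES = 9
BRR_BLOCK_SAMPLES = 16
BLOCKS_PER_CHUNK = 80    # one half of the double buffer

SAMPLES_PER_TICK = OUTPUT_RATE_HZ // TIMER_RATE_HZ     # 4
CHUNK_BYTES = BLOCKS_PER_CHUNK * BRR_BLOCK_BYTES       # 720
CHUNK_SAMPLES = BLOCKS_PER_CHUNK * BRR_BLOCK_SAMPLES   # 1280
TICKS_PER_CHUNK = CHUNK_SAMPLES // SAMPLES_PER_TICK    # 320
CHUNK_MS = 1000 * CHUNK_SAMPLES / OUTPUT_RATE_HZ       # 40.0


def refill_requests(total_ticks: int):
    """Yield (tick, half) whenever a half has finished playing.

    Assumption: both halves were filled before key-on, so the first
    request comes after one full chunk has been played.
    """
    for tick in range(TICKS_PER_CHUNK, total_ticks + 1, TICKS_PER_CHUNK):
        finished_half = (tick // TICKS_PER_CHUNK - 1) % 2
        yield tick, finished_half


if __name__ == "__main__":
    print(CHUNK_BYTES, "bytes per chunk,", CHUNK_MS, "ms per chunk")
    for tick, half in refill_requests(3 * TICKS_PER_CHUNK):
        print(f"tick {tick}: ask the CPU to refill half {half}")
```

Because the requests are derived from the SPC's own timer, nothing in this scheme depends on the video frame rate.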

by on (#89330)
When I mentioned ToP I was thinking about how it "bloops" "burps" and "clicks" a lot on these small BRR samples during the music playback. And that seems to come from inaccuracies on the CPU emulation.

by on (#89335)
I don't know what it's supposed to sound like since I don't have an NTSC console, but here's what ToP sounds like on my PAL SNES, where it has clear artefacts.

by on (#89341)
mic_ wrote:
I don't know what it's supposed to sound like since I don't have an NTSC console, but here's what ToP sounds like on my PAL SNES, where it has clear artefacts.


Yes, that's broken (quite obviously), but emulators have it way worse, even while emulating an NTSC machine... O_O

by on (#89348)
Mmh, really?
I remember Snes9x emulating it more or less properly, even though that emulator is quite inaccurate. Maybe ZSNES is even more inaccurate, though.

Aside from that, I have another idea for practical BRR streaming synchronization.
At all times, one of the communication registers would tell the CPU how many BRR blocks it's supposed to send to the SPC.

This way, the SPC increments this number automatically as the streamed audio plays. It's then up to the CPU to respond by sending its data; the SPC acknowledges the data, overwrites the old buffer, and the count is reset to zero.

This would also work on both PAL and NTSC consoles (and probably emulators) without changing a line of code, and it would be simpler than my previous solution.
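Here is a toy model of that counter handshake, just to show the bookkeeping. It is a sketch under assumptions, not real SPC700 or 65816 code: the class and method names are hypothetical, and it ignores blocks that finish playing while a transfer is in progress.

```python
class StreamPort:
    """Toy model of one communication register plus the SPC-side ring buffer."""

    def __init__(self, ring_blocks: int = 160):
        self.ring_blocks = ring_blocks
        self.ring = [None] * ring_blocks  # BRR blocks currently in audio RAM
        self.pending = 0                  # count exposed to the CPU
        self.write_pos = 0                # next ring slot to overwrite

    def spc_played_blocks(self, count: int) -> None:
        """SPC side: playback consumed `count` blocks, so request as many."""
        self.pending += count
        if self.pending > self.ring_blocks:
            raise RuntimeError("underrun: CPU did not refill in time")

    def cpu_service(self, source) -> int:
        """CPU side: read the count, send that many blocks, then the
        SPC acknowledges and resets the count to zero."""
        sent = self.pending
        for _ in range(sent):
            self.ring[self.write_pos] = next(source)
            self.write_pos = (self.write_pos + 1) % self.ring_blocks
        self.pending = 0
        return sent


if __name__ == "__main__":
    port = StreamPort(ring_blocks=4)
    blocks = iter(range(100))  # stand-in for BRR blocks read from ROM
    port.spc_played_blocks(3)
    print("CPU sent", port.cpu_service(blocks), "blocks:", port.ring)
```

The CPU can poll the count whenever it has time, on any video standard, as long as it services the port before the count reaches the ring size.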

by on (#89424)
Slightly off topic, but since this is an SPC700 topic I'll ask this question.

How fast does the SPC700 respond to changes in its DSP registers? I was thinking of doing FM synthesis by manipulating the channel frequency registers.

by on (#89425)
How fast? I don't think there is any kind of delay; that is, a write will take effect immediately.
Maybe you should check out anomie's docs for details.

by on (#89448)
Bregalad wrote:
Maybe ZSNES is even more inaccurate, though.


In ZSNES, the 12-cycle DIV instruction takes the same time as the 2-cycle NOP instruction, because a cycle-table lookup was slower than just using an average constant cycle value for all opcodes.

So yes. Yes, it most certainly is.

by on (#89450)
ZSNES is a relic of the past from when PCs could not hope to emulate the SNES very accurately and still run at full speed. Plenty of liberties were taken for performance and other things probably just because many details were unknown. This should sound familiar to NES emulation.

by on (#89528)
> Plenty of liberties were taken for performance and other things probably just because many details were unknown.

SMP opcode cycle counts were known before ZSNES was started on. We even had the WDC CPU documentation on per-cycle operation. Many of its decisions were deliberate. And for the time, wise.

What's worse is that even knowing the information now, and even with faster computers, simple one-line fixes are still not added.

> This should sound familiar to NES emulation.

The NES was much more timing-sensitive, and the SNES alleviates a lot of that through HDMA and IRQs. But make no mistake, having emulated both systems myself: NESticle was a more faithful NES emulator than ZSNES is an SNES emulator.

Imagine if NESticle were given a crude Win32 windowed-mode port, and a new minor release every 3-5 years. And saying anything at all bad about it invoked ridicule on most emulation forums. Wouldn't that be a treat? :P

by on (#89531)
What you say is true of ZSNES, or at least it was when ZSNES relied on x86-only assembly language code. (Does it still?) Snes9x might get a free pass because its pure C++ code allows ports to Wii consoles and Android phones, which aren't especially "faster computers".

by on (#89534)
> What you say is true of ZSNES, or at least it was when ZSNES relied on x86-only assembly language code. (Does it still?)

Yes, the parts ported to C were mostly the GUI and path loading code.

> Snes9x might get a free pass because its pure C++ code allows ports to Wii consoles and Android phones, which aren't especially "faster computers".

I don't really mind emulators based around speed hacks aimed at older/slower hardware. Although in most cases it's the owners being penny wise, pound foolish; there are legitimate cases where faster hardware can't be easily obtained. I just wish the people who did have faster hardware cared a bit more about quality. Improving ZSNES would be addressing the symptom rather than the cause, but it'd be better than nothing.

I've had enough of trying myself, but it'd be nice if we had someone like Marty to make a compelling, user-friendly UI and worry about accuracy and worry about making it as quick as possible. I could certainly lend my assistance toward the first two.

by on (#89536)
byuu wrote:
Although in most cases it's the owners being penny wise, pound foolish

Penny wise? What a bozo.

In the case of the Wii, there are a few issues in play, apart from end users' mental set against the home theater PC.
  • A HBC'd Wii is a lot cheaper than a second PC, despite that the PC can do more.
  • Most PCs don't come bundled with input devices meant for 10-foot use.
  • Full-width PC towers look out of place next to a "consumer electronics device", and a lot of people don't know about smaller models such as the Aspire X1.
  • Wii supports SDTV output. PCs generally don't. Though VGA to composite adapters exist (such as those sold on SewellDirect.com), they're sold online, not in stores.
  • Some HDTVs have trouble with VGA or DVI/HDMI video signals from a PC. They might not support the exact resolution that the PC outputs, for instance, and might insist on scaling the image wrong, such as cutting off the menu bar and taskbar.
  • Most gamepads not made by Sony or Nintendo have dodgy directional pads. Even Microsoft's. Adapters like the EMS Dual Shooter are sold online, not in stores.

by on (#89553)
Quote:
But make no mistake, having emulated both systems myself: NESticle was a more faithful NES emulator than ZSNES is an SNES emulator.

See guys: I've always said NESticle wasn't THAT bad. :wink:

I think the NES needs some accuracy because about half of the game library relies on cycle-timed code and mid-frame register writes somewhere.

However, an SNES emulator can get all the timings wrong, emulate IRQs that fire at the wrong time on the right scanline, and emulate HDMA that doesn't interrupt the CPU, and 99% of games will still work fine.

by on (#89563)
Bregalad wrote:
However, an SNES emulator can get all the timings wrong, emulate IRQs that fire at the wrong time on the right scanline, and emulate HDMA that doesn't interrupt the CPU, and 99% of games will still work fine.


Yes, that is all very true. And the games that require the timing are titles such as:
* Mecarobot Golf
* Jumbo Osaki no Hole in One Golf
* Street Racer
* Power Rangers (one of them, anyway)
* Sink or Swim
* Speedy Gonzales
* Battle Blaze
Not exactly A-list games here, so it's no wonder when people play Zelda and Mario alone that they think things are perfect. (Even though there are obvious problems even in both of those, heh.)

In most cases, you can get away with murder by running too fast. Mortal Kombat II breaks completely if you have one extra I/O cycle on WAI (as little as one ten-millionth of a second per frame), but you can run it twice as fast as you should and it works fine.

I think it's really just the PPU that scares people off the SNES. The PPU is a nightmare: 64 registers versus 8, and every one is packed full of flags that all interact and blend with each other. That, and CPU hell. SMP, SuperFX, SA-1, uPD96050, HG51B169 ... unlike mappers, these are full-fledged processors with lots of auxiliary functions built-in. And unlike obscure NES mappers, support for all of these is basically mandatory if you want anyone to use your work.

by on (#89565)
These last few posts discussing emulator (in)accuracies made me think of this great email sent to the Nesticle developers back in 1997:

Quote:
Hello, this is a question.

you do nots think to make a SNESTLICLE or think to make it?

if make it make please it but that they could so that run to a speed 100 % in a pentium to 133 MHz with 16MB of RAM, make it it but similar to nesticle that for my the best emulator.

Ahh and that good has sound as nesticle.

Thanks and I wait a good response.


:P