How to emulate the APU

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
by on (#17971)
I need ideas on how to emulate the APU. Since I must create the mixing buffer myself, it gets much harder. When do you mix all the audio?
I've thought of doing it at the end of each PPU frame, but this makes it hard to keep effects and status reads accurate. Any ideas?

thanks

by on (#17979)
If you're just starting out, mix every time an audio register is written or read or whenever the APU's 240 Hz timer ticks, and decide how many samples to mix by how many CPU cycles have elapsed since the last update.
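To make the "how many samples for how many CPU cycles" step concrete, here is a minimal sketch. It assumes an NTSC CPU clock of about 1,789,773 Hz and a 44,100 Hz output rate; the `SampleClock` class and its method names are made up for illustration and do not come from any emulator. The remainder is carried between calls so that many small updates produce the same total as one big update.

```python
CPU_CLOCK_HZ = 1_789_773   # approximate NTSC NES CPU clock (assumption)
SAMPLE_RATE_HZ = 44_100    # assumed output sample rate


class SampleClock:
    """Hypothetical helper: turns elapsed CPU cycles into whole output samples."""

    def __init__(self, cpu_hz=CPU_CLOCK_HZ, sample_hz=SAMPLE_RATE_HZ):
        self.cpu_hz = cpu_hz
        self.sample_hz = sample_hz
        self.last_cycle = 0   # CPU cycle count at the previous update
        self.remainder = 0    # fractional sample left over, scaled by cpu_hz

    def samples_due(self, now_cycle):
        """Return how many samples to mix for the cycles since the last update."""
        elapsed = now_cycle - self.last_cycle
        self.last_cycle = now_cycle
        total = self.remainder + elapsed * self.sample_hz
        self.remainder = total % self.cpu_hz   # carry the fraction forward
        return total // self.cpu_hz
```

Calling `samples_due()` at every register access and every frame-timer tick, and mixing that many samples each time, keeps the output length in step with emulated time.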

by on (#17980)
What about this: the linear counter is set, and I start to write samples until the next APU tick. The next thing that happens is that it stops the channel. Do I then have to remove everything?

I was thinking about an action buffer that stores all writes, along with the CPU cycle at which each one occurs, and then mixes everything at the end, but this will make it hard to handle the status reads. =/

by on (#17982)
When a status read occurs, emulate the APU up to that point.
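A rough sketch of how the action buffer and the status read can coexist: writes are queued with their CPU cycle, and a status read forces the queue to be drained up to the read's cycle first. Everything here (`CatchUpApu`, `run_until`, the `log` list) is a hypothetical illustration, and the value returned by `read_status` is only a stand-in; a real core would build it from length counters and IRQ flags.

```python
from collections import deque


class CatchUpApu:
    """Toy APU shell that only tracks how far emulation has been run."""

    def __init__(self):
        self.emulated_to = 0     # CPU cycle up to which state is valid
        self.pending = deque()   # queued (cycle, register, value) writes
        self.registers = {}      # last value applied to each register
        self.log = []            # emulated spans, kept for illustration

    def write(self, cycle, register, value):
        """Queue a write; the CPU issues them in increasing cycle order."""
        self.pending.append((cycle, register, value))

    def _advance(self, cycle):
        if cycle > self.emulated_to:
            # A real core would clock the channels and mix samples here.
            self.log.append((self.emulated_to, cycle))
            self.emulated_to = cycle

    def run_until(self, cycle):
        """Apply every queued write up to `cycle`, then advance to `cycle`."""
        while self.pending and self.pending[0][0] <= cycle:
            when, register, value = self.pending.popleft()
            self._advance(when)
            self.registers[register] = value
        self._advance(cycle)

    def read_status(self, cycle):
        """Catch up first, so the read sees the state at this exact cycle."""
        self.run_until(cycle)
        return self.registers.get(0x4015, 0)   # stand-in status value
```

Writes that lie after the read's cycle stay queued, so nothing ever has to be "removed" from the output buffer.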

by on (#17984)
But do you think this is a good idea? What's the typical number of writes to the APU each frame?
Has anyone tried this technique? What do you think?

by on (#17985)
Do you have CPU emulation working apart from the APU?

by on (#17986)
I haven't decided yet.

by on (#17992)
There are two "basic" ways to emulate: using OR not using "timestamps". Personally, I disagree with using them, for a couple of reasons that aren't worth discussing here. But if you're already using such a timestamp system in your emulator, go ahead. -_-;;

About the APU: it's quite easy, but you may want to read blargg's APU reference a couple of times before writing your first beta of the APU emulator. Basically, take care about when the APU changes the sample data, and write the samples into the buffer at the NES frequency; in other words, write 1 sample every CPU cycle. Bump.

by on (#18005)
Yeah, I think I'll stick to it; to mix one sample per CPU cycle would be a real performance killer.

by on (#18023)
n6 wrote:
mix one sample per CPU cycle would be a real performance killer

IIRC, that's what FCE Ultra does in high/highest quality mode, so... well, decide for yourself how you want to balance performance and quality.

by on (#18034)
Nope, that's what it IS supposed to do. You can, however, change the way to RESAMPLE the buffer. ^_^;;

by on (#18037)
That's what every APU emulator does: it effectively generates sound at the 1.79 MHz NES clock rate, then resamples it to 44.1 kHz (or 48 kHz etc.) for the PC. The only difference is how the resampling is done. Contrary to the common view, trading off quality for performance does not give a very large gain if the quality version is implemented using an efficient algorithm.

by on (#18040)
Sorry if I misunderstood something, but I was referring to the following stuff from the FCE Ultra documentation:
FCE Ultra documentation wrote:
Sound channels are emulated with CPU instruction granularity. There are two sound quality options. Low-quality sound, the default sound quality option, generates sound data at 16x the playback rate and averages those samples together to 1 sample. This method works fairly well and is reasonably fast, but there is still some aliasing and sound distortion. All sample rates between 8192Hz and 96000Hz are supported.

The high-quality sound emulation is much more CPU intensive, but the quality is worth it, if your machine is fast enough. Sound data is generated at the NES' CPU clock rate (...), and then resampled to the output sample rate. Custom-designed 483rd order Parks-McClellan algorithm filter coefficients are used. Supported playback rates are 44100Hz, 48000Hz, and 96000Hz. (...)

The "highest" sound quality mode is similar to the normal high-quality mode, but the filters are of a higher order(1024 coefficients). Ripple is reduced, the upper bound of the passband is higher, and the stopband attenuation is slightly higher. The highest-quality mode filter coefficients were created using "gmeteor". The parameters used to create these filters can be found in the source code distribution.
Of course, I'm ignoring the issue of FIR filters, but I was trying to contrast FCEU's low-quality sound mode (which clearly does not "mix one sample per CPU cycle") with its better modes.
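For illustration, the low-quality idea quoted above (generate at 16x the playback rate, then average each group of 16 down to 1 sample) can be sketched as follows. This is not FCE Ultra's code; `average_down` is a made-up name, and any trailing partial group is simply dropped.

```python
def average_down(oversampled, factor=16):
    """Average each full group of `factor` samples into one output sample."""
    out = []
    for start in range(0, len(oversampled) - factor + 1, factor):
        group = oversampled[start:start + factor]
        out.append(sum(group) / factor)
    return out
```

For example, a group that is low for 8 samples and high for 8 samples averages to the midpoint, which is how an edge that falls between two output samples gets softened instead of snapping to one side.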

EDIT - @n6 (post right below mine): I was not trying to make a suggestion; I was simply stating the facts that I knew.

by on (#18045)
But I am not going to skip samples like that. I am going to do exactly what my PPU does: when something interesting happens, update to that point.

by on (#18126)
I know I never post here, but I thought I'd respond to this since I recently finished overhauling my sound core and learned a few things that you might find useful...

Accurate sound emulation (that is, emulating sound at the APU's clock) is not as slow as you might believe, since most of the time you'll be doing little more than counting down timers for various units and whatnot. Very simple (and fast) integer operations that equate to little overhead.
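To show what "counting down timers" amounts to, here is a deliberately simplified square-wave unit: a divider that reloads when it reaches zero and then steps an 8-entry duty sequence. The class name, the fixed 50% duty table and the constant volume are assumptions for the sketch; it leaves out envelopes, sweep and length counters and is not a hardware-accurate model.

```python
class SquareTimer:
    """Simplified divider plus 8-step sequencer, clocked once per tick."""

    DUTY_50 = (0, 1, 1, 1, 1, 0, 0, 0)   # assumed 50% duty pattern

    def __init__(self, period, volume=15):
        self.period = period     # reload value for the divider
        self.counter = period    # counts down to zero
        self.step = 0            # position in the duty sequence
        self.volume = volume     # constant volume, no envelope

    def clock(self):
        """Advance one tick and return the current output level."""
        if self.counter == 0:
            self.counter = self.period
            self.step = (self.step + 1) % 8
        else:
            self.counter -= 1
        return self.volume * self.DUTY_50[self.step]
```

Each tick is one compare plus one decrement or reload, which is why running this at the APU's clock costs less than it might sound.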

For a speed comparison of the various methods used, FakeNES CVS has the following emulation modes:

Fast - Emulate and render at output sample rate. This mode uses delta timers to try and keep the APU emulation from becoming too grossly inaccurate, but it has a huge speed boost. In fact, combined with an all-integer emulation, this is probably as fast as you can emulate the NES' sound efficiently, but results vary by sample rate.
Speed on my system: ~450FPS

Accurate - Emulate at APU clock, subsample and output at sample rate. This seems to be the most common method used.
Speed on my system: ~300FPS.

Ultra - Emulate and render at APU clock and supersample to the output sampling rate using a simple linear mixing scheme. Residue removal allows for fractional input sample per output sample counts. In FakeNES, this method gives the best quality and least aliasing since it emulates the waveform generators logically instead of synthetically.
Speed on my system: ~250FPS
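The idea of averaging a fractional number of input samples per output sample can be sketched like this. It is my reading of the "linear mixing with residue" description, not FakeNES code, and `box_resample` is a hypothetical name. Each input sample is treated as covering `out_rate` time units and each output sample as covering `in_rate` units, so an input sample that straddles a boundary is split between two outputs.

```python
def box_resample(samples, in_rate, out_rate):
    """Average input samples into output samples; the ratio may be fractional."""
    out = []
    acc = 0.0     # weighted sum for the output sample being built
    filled = 0    # time units of the current output sample already covered
    for s in samples:
        remaining = out_rate   # time units of this input sample still unused
        while filled + remaining >= in_rate:
            take = in_rate - filled        # part that completes the output
            acc += s * take
            out.append(acc / in_rate)
            remaining -= take
            acc = 0.0
            filled = 0
        acc += s * remaining               # leftover goes to the next output
        filled += remaining
    return out
```

With `in_rate=3` and `out_rate=2`, every output sample averages exactly one and a half input samples, and the leftover half is carried into the next one rather than discarded.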

Note that in all modes it uses "catch-up" type timing, where it performs delayed processing before reads and writes to the APU's registers to synchronize the interface state with the current CPU/PPU state before it can be modified or tested by software (the game).

Hope this helps to put some things into perspective. ^^