Audio volume/mixing and dithering

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Audio volume/mixing and dithering
by on (#237349)
Regarding DSP, when attenuating audio or mixing audio (which involves attenuation and addition), should audio dithering be applied? It seems like a similar process to bit depth reduction?
Re: Audio volume/mixing and dithering
by on (#237352)
Audio dithering is only really appropriate as part of bit depth reduction. It's approximately delta-sigma modulation on the LSbits to increase the effective bit depth at lower frequency in exchange for more noise at higher frequency.
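As a rough illustration of the delta-sigma comparison, here is a minimal sketch of a first-order error-feedback quantizer, which is noise shaping rather than plain dither. The function name `quantize_error_feedback` and the `step` parameter are made up for this example. Each sample's rounding error is subtracted from the next sample before rounding, so the error averages out at low frequencies and shows up as high-frequency noise instead.

```python
import math

def quantize_error_feedback(samples, step=1.0):
    """Quantize to multiples of `step`, feeding each rounding error
    back into the next sample (first-order noise shaping).

    Hypothetical helper for illustration only.
    """
    out = []
    err = 0.0  # quantization error carried over from the previous sample
    for x in samples:
        v = x - err                            # compensate for the last error
        q = math.floor(v / step + 0.5) * step  # round to the nearest step
        err = q - v                            # error made on this sample
        out.append(q)
    return out

# A constant level of 0.3 LSB would round to 0 every time without feedback.
# With feedback the output toggles between 0 and 1 so that its long-run
# average stays close to 0.3.
shaped = quantize_error_feedback([0.3] * 1000)
average = sum(shaped) / len(shaped)
```

The errors telescope, so the sum of the outputs differs from the sum of the inputs by at most half a step no matter how many samples are processed.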
Re: Audio volume/mixing and dithering
by on (#237353)
Dithering definitely makes a slight improvement for 8-bit output, but it's probably not worth doing for 16-bit or higher, except for very high fidelity applications.

Like it's a reasonable operation for Audacity to do when you're saving a file to disk (the application is entirely about audio, and overhead is drowned by disk access anyway), but for an emulator I wouldn't even bother thinking about it.
Re: Audio volume/mixing and dithering
by on (#237356)
In general high-end music production, you only dither when you need to convert your project from 32-bit float to 16 or 24 bit media targets, as a final step once all the other mastering tweaks are done. A cargo cult idea is that it makes the mix sound more "analog", which in a sense is true since you mitigate the rounding errors which are very digital domain, but the amount of noise added (compared to tape hiss) shouldn't be noticeable.

Elsewhere... I guess the following could be a test. Are you bit depth reducing? Can you hear the rounding error affect the sound as a result (in the case of audio; elsewhere, does it produce significant inaccuracies)? Is it unacceptable? If yes to all three, dithering may be a worthwhile feature.
Re: Audio volume/mixing and dithering
by on (#237357)
Some excellent explanations of dither.

My current emulator has 64-bit float DSP and 8/16-bit integer output, so I add simple TPDF dither at the final downsample. Scale and clamp rail-to-rail signal range to +/- 32766 (or +/- 126), add two random numbers in the range +/- 0.5 and cast to int. Practical performance overhead is nil compared with resampling, and it makes an audible difference because the signal is often very quiet (headroom needed to prevent clipping and/or tracks intentionally not using full range).
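The steps described above could be sketched roughly like this for the 16-bit case. This is not the poster's actual code: the function name `to_int16_tpdf` and the `rng` parameter are invented for the example, input is assumed to be float samples in [-1.0, 1.0], and the final conversion rounds to nearest rather than using a bare cast (an assumption, since a truncating cast rounds toward zero).

```python
import math
import random

PEAK = 32766  # scale target from the post; leaves 1 LSB of room for the dither

def to_int16_tpdf(samples, rng=None):
    """Convert float samples in [-1.0, 1.0] to 16-bit integers with TPDF dither.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clamp to the rail-to-rail range
        scaled = x * PEAK
        # Two uniform values in [-0.5, 0.5) sum to a triangular distribution
        # spanning [-1, 1) LSB.
        dither = (rng.random() - 0.5) + (rng.random() - 0.5)
        out.append(math.floor(scaled + dither + 0.5))  # round to nearest
    return out

# A constant signal at 0.3 LSB rounds to silence without dither. With dither
# the output is noisy, but its average stays near 0.3 LSB.
quiet = [0.3 / PEAK] * 20000
dithered = to_int16_tpdf(quiet, random.Random(1))
average = sum(dithered) / len(dithered)
```

With the signal clamped to +/- 32766 and at most 1 LSB of dither on either side, the result always fits in a signed 16-bit integer, so no second clamp is needed after the dither is added.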