This thread is to discuss the audio sampling rate and method for MSU1 FMV playback.
Initial design goal: hardware flexibility. Audio is a simple system of "play or pause track #n". Video is a raw block of data. No codecs, no lossiness, no fixed playback rates, no fixed resolutions, no fixed color depths or palettes. Strictly speaking, the MSU1 has no concept of video. It's a software implementation detail.
Proposed change: a method to ensure long video sequences can be played without the audio desynchronizing. We want identical behavior between emulator and hardware implementations of MSU1, at least to the extent that it continues to allow for flexible hardware implementations.
Audio current design: audio is stored in a 44100hz 16-bit stereo file. This is not a hard requirement of MSU1 itself. In theory, audio could be a 96000hz 24-bit 7.1 surround MP3 (on an emulator only), and the same SNES-side software would just work with it. But of course, if we deviate from the 44100hz .pcm file format, it complicates compatibility between implementations.
Video current design: due to limited bandwidth, you have to make concessions. You can trade horizontal and/or vertical resolution in return for higher framerates. Or you can limit both for smaller video sizes (and thus, longer maximum video lengths.) For instance, video can be 240x160@15fps, or 224x144@20fps.
Video format timings:
NTSC non-interlace = 21477272/(1364*262-4) = 60.0991482hz
NTSC interlace = 21477272/(1364*525)*2 = 59.9840022hz; / 2 = 29.9920011hz
PAL non-interlace = 21281370/(1364*312) = 50.0069789hz
PAL interlace = 21281370/(1364*625)*2 = 49.9269677hz; / 2 = 24.9634839hz
NTSC standard video = 60hz / 1.001 = 59.9400599hz; / 2 = 29.97003hz
PAL standard video = 50hz; / 2 = 25hz
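For anyone who wants to check the figures above, here is a minimal Python sketch that recomputes them. The clock values and line counts come straight from the list; the helper name refresh_hz and its parameters are just labels I made up for illustration.

```python
from fractions import Fraction

NTSC_CLOCK = 21477272   # master clock in Hz, as listed above
PAL_CLOCK = 21281370
CLOCKS_PER_LINE = 1364

def refresh_hz(clock, lines, short_clocks=0, fields=1):
    """Refresh rate: clock * fields / (clocks per line * lines - short clocks)."""
    return Fraction(clock * fields, CLOCKS_PER_LINE * lines - short_clocks)

rates = {
    "NTSC non-interlace": refresh_hz(NTSC_CLOCK, 262, short_clocks=4),
    "NTSC interlace": refresh_hz(NTSC_CLOCK, 525, fields=2),
    "PAL non-interlace": refresh_hz(PAL_CLOCK, 312),
    "PAL interlace": refresh_hz(PAL_CLOCK, 625, fields=2),
    "NTSC standard": Fraction(60000, 1001),
    "PAL standard": Fraction(50),
}

for name, hz in rates.items():
    # Print the field rate and the half-rate used for video frames.
    print(f"{name:20s} {float(hz):.7f}hz  / 2 = {float(hz / 2):.7f}hz")
```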
Proposed simplification: I don't think we should consider running PAL FMVs on an unmodified NTSC console, or NTSC FMVs on a PAL console. Further, I don't think we should consider interlaced FMVs being played in non-interlaced mode, or vice versa.
ikari's proposed solution: we adjust audio sample playback in the MSU1 to compensate for official NTSC / PAL video.
ikari_01 wrote:
The MSU playback sample rate equals one sample per 59378938/122500 SNES master clocks (rounded to nearest neighbor). Audio output is synced to the SNES master clock in order to achieve frame synchronization even with slightly differently clocked consoles, and especially on modified PAL consoles. 21477270/(59378938/122500) Hz is an empirical value based on frame cycle count measurements (NTSC) and the assumption that audio should be synchronous to a source video material @29.97fps and 44100Hz, played back on the SNES at 2 SNES frames per video frame. On a perfectly tuned NTSC SNES the actual playback rate equals 44308.06Hz, on a perfectly tuned PAL SNES it's 43903.91Hz.
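To make the quoted numbers easier to follow, here is a small Python sketch of that rate. The 59378938/122500 ratio and the two clock values are taken from the quote; the function names are mine, and the "round to nearest" scheduling is only my reading of "rounded to nearest neighbor", not a description of the actual hardware logic.

```python
from fractions import Fraction

# Master clocks per output sample, from the quoted post.
CLOCKS_PER_SAMPLE = Fraction(59378938, 122500)

def playback_hz(master_clock_hz):
    """Effective sample rate for a given master clock frequency."""
    return float(master_clock_hz / CLOCKS_PER_SAMPLE)

def sample_clock(n):
    """Master clock tick at which sample n is due (assumed rounding scheme)."""
    return round(n * CLOCKS_PER_SAMPLE)

print(f"NTSC: {playback_hz(21477270):.2f}hz")
print(f"PAL:  {playback_hz(21281370):.2f}hz")

# Gaps between consecutive samples alternate between 484 and 485 clocks.
print([sample_clock(n + 1) - sample_clock(n) for n in range(8)])
```

The point of the fractional ratio is that the rounding error never accumulates: after 122500 samples, exactly 59378938 master clocks have passed.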
byuu's proposed solution: when we want our videos to be based off of NTSC source material, we perform our own sample rate conversion on the audio track of the initial video. A very simple linear or Hermite filter should be fine. However, we do still have to consider the slight oscillator variance of the SNES CPU. The official specification is 21477272hz. Ceramic oscillators are terrible at +/-5%, but thankfully the SNES CPU oscillator is crystal, with a tolerance of +/-0.00000125%.
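As a rough idea of what "a very simple linear filter" could look like, here is a minimal pure-Python sketch for one channel. The function name and signature are hypothetical, and a real tool would work on interleaved 16-bit stereo data rather than a list of floats.

```python
def resample_linear(samples, src_hz, dst_hz):
    """Linearly interpolate one channel of audio from src_hz to dst_hz."""
    if len(samples) < 2:
        return list(samples)
    step = src_hz / dst_hz  # source samples advanced per output sample
    count = int((len(samples) - 1) / step) + 1
    out = []
    for i in range(count):
        pos = i * step
        left = int(pos)
        frac = pos - left
        # Clamp the right neighbor so the last output never reads past the end.
        right = min(left + 1, len(samples) - 1)
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out

# Doubling the rate fills in the midpoints between input samples.
print(resample_linear([0.0, 1.0, 2.0, 3.0], 2, 4))
```

Only the ratio of the two rates matters. Assuming NTSC source video shown at two SNES frames per video frame, something like resample_linear(track, 60.0991482 / 2, 29.97003) would shorten the audio to match the slightly faster SNES frame rate, with the result still stored and played as 44100hz.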
I think that basing the playback rate at NTSC / PAL video rates rules out the possibility of playing back videos not in NTSC / PAL format. Given that MSU1 is entirely lossless, the possibility exists to render base video at any refresh rate we like. I suspect that nearly all MSU1 video will be from NTSC / PAL sources, but one could conceivably CG render a sequence at the native SNES rate of 60.09hz / 2.
I'm also concerned that if we play back the audio at a different rate than the .pcm file stores, the time lengths of the tracks will shift. E.g. you'll see 10:00 in your media player, but 10:04 will be what you actually get on hardware.
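To put a number on the length shift, here is a small Python sketch. The function name is made up, and the two example rates are the NTSC and PAL figures from ikari's quote, used here purely as sample inputs.

```python
STORED_HZ = 44100  # rate the .pcm file is authored at

def played_seconds(nominal_seconds, playback_hz, stored_hz=STORED_HZ):
    """Real duration of a track when its samples are output at playback_hz."""
    total_samples = nominal_seconds * stored_hz
    return total_samples / playback_hz

for label, hz in (("NTSC", 44308.06), ("PAL", 43903.91)):
    secs = played_seconds(600, hz)  # a track shown as 10:00 in a media player
    print(f"{label}: {int(secs // 60)}:{secs % 60:05.2f}")
```

Whether the track gets longer or shorter depends on which side of 44100hz the real playback rate lands.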
However, I do think there may be strong incentive to base audio playback rate off of the SNES master clock. That +/-0.00000125% tolerance may be minimal, but it may quickly add up into big trouble. So I might propose a hardware design of outputting one sample after every 487 SNES CPU clock ticks, which gives us roughly 44101hz on NTSC (21477272 / 487). PAL is not quite so easy, as 21281370hz isn't very divisible by 44100. But combine tying playback to the SNES CPU clock with audio resampling, and the same video file should play perfectly on every MSU1 implementation, be it hardware or emulator. It does however require access to the SNES CPU clock, and a non-trivial clock divider.
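Here is a quick Python sketch of the integer divider arithmetic, to show why NTSC works out nicely and PAL doesn't. The helper names are hypothetical; the clock values are the ones used throughout this post.

```python
NTSC_CLOCK = 21477272
PAL_CLOCK = 21281370
TARGET_HZ = 44100

def nearest_divider(clock_hz, target_hz=TARGET_HZ):
    """Whole number of master clocks per sample closest to the target rate."""
    return round(clock_hz / target_hz)

def divided_rate(clock_hz, divider):
    """Sample rate produced by emitting one sample every `divider` clocks."""
    return clock_hz / divider

for name, clock in (("NTSC", NTSC_CLOCK), ("PAL", PAL_CLOCK)):
    divider = nearest_divider(clock)
    hz = divided_rate(clock, divider)
    print(f"{name}: /{divider} = {hz:.2f}hz ({hz - TARGET_HZ:+.2f}hz from 44100)")
```

On NTSC the nearest divider is 487, about 1hz off. On PAL the exact ratio sits near the middle between 482 and 483, so either whole divider misses 44100hz by tens of hz, which is where the resampling step would have to make up the difference.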
Other solutions? Feel free to share your input below.