Autocorrelation, cepstrum, and Mozer transform are related

Autocorrelation, cepstrum, and Mozer transform are related
by tepples on 2018-07-14 (#221067)

Autocorrelation, cepstrum, and Mozer transform are three algorithms for processing pitched signals. I realized in the shower this morning how closely related these algorithms are. The only difference is what function is applied to each element of a spectrum in the frequency domain to change the balance of weak and strong signals.

Trigger warning: This follow-up to a post from six years ago will be very math-heavy, as I'm taking notes for a possible future project.

Given two functions f[] and g[] on the integers, their convolution (f * g)[] is defined as

(f * g)[t] = Σ[m=-∞ to ∞](f[m]g[t - m])

The related cross-correlation is

(f *_ g)[t] = Σ[m=-∞ to ∞](f*[m]g[t + m])

where f* is the complex conjugate (real part same, imaginary part negated) of f. This is equivalent to convolution of the time reversal of first signal's complex conjugate (or the signal itself if it is real-valued) with a second signal. If one signal is real and symmetric, correlation and convolution are essentially equivalent.

The support (width of domain where the function is nonzero) of a discrete convolution or cross-correlation equals the sum of the support of the inputs minus one. For example, convolving functions of length 3 and 6 produces a function of length 8.

Convolution with a finite impulse response kernel is often used to filter discrete-time signals, such as audio. Cross-correlation of a signal with itself, or autocorrelation, is used to determine the pitch of a signal. A signal's autocorrelation is time symmetric and has a peak once per period of the signal with an overall simpler shape than the signal itself.

Convolution of two long signals takes O(n^2) operations, where n is the width of their support. But because convolution in the time domain correspods to multiplication in the frequency domain (and vice versa), this can be reduced to O(n log n) through an algorithm using the Fourier transform:

Pad both signals to at least the sum of their support (to avoid circular convolution effects)
Take the Fourier transform of both signals, producing two spectra F{f[]} and F{g[]}
If performing cross-correlation, take the complex conjugate of F{g[]}
Multiply the spectra together
Take the inverse Fourier transform of the resulting spectrum

Thus autocorrelation becomes this:

Pad the signal to at least twice its support
Take the FT of the signal
Multiply each element of the spectrum by its complex conjugate, or equivalently, take the square of the magnitude
Take the inverse FT

Autocorrelation{f[]} = F-1{|F{f[]}|^2}

Time symmetry thus results from loss of phase information, and only the positive half of the autocorrelation is considered.

A power cepstrum is similar to an autocorrelation, except a logarithm is inserted to boost the weak frequencies and separate the formants (timbre) of a periodic signal from its pitch.

PowerCepstrum{f[]} = F-1{log(|F{f[]}|^2)}

Pad the signal to at least twice its support
Take the FT of the signal
Take the logarithm of the square of the magnitude of each element of the spectrum
Take the inverse FT

The power cepstrum discards phase and is suitable for analysis rather than synthesis, such as pitch detection. There also exists a complex cepstrum that preserves phase and is thus useful for synthesis, but the documents I initially found about calculating the unwrapped phase were paywalled at $33.00 per article from IEEE.

Another approach is to replace squared magnitude with magnitude. This results in the Mozer transform, which discards phase information to find a time-symmetric representation for a given timbre. This was used to compress voice in several products using a codec developed by Electronic Speech Systems, such as Impossible Mission for C64 ("Another visitor") and Mito Koumon for Famicom.

Pad the signal to at least twice its support
Take the FT of the signal
Take the magnitude and discard the phase
Take the inverse FT

Mozer{f[]} = F-1{|F{f[]}|}

Because the signal is symmetric, half of it can be discarded and computed from the time reversal of the other half. It also concentrates the signal's power close to zero in the time domain, allowing lossy compression of the timbre by windowing out small components far from zero time. And theoretically, the only difference between this and the power cepstrum is that the logarithm (which boosts weak signals) is replaced with the square root (which also boosts weak signals).

So a speech codec designed for real-time playback on a 1.8 MHz microprocessor without a hardware multiply might break the signal into segments on the order of 25 ms and then do the following to each segment:

Across the previous, current, and next segment, apply a suitable window function to taper out edge discontinuities.
Apply one of these dephasing transforms to the windowed signal.
Pick a peak of the signal with a lag (aka quefrency) greater than 2 ms to find the period. One suggestion is to find the amplitude of the strongest such peak, then find the lowest-lag peak with at least 80% of that amplitude. If needed, use the peaks found when analyzing the previous and next segment to help avoid transient octave errors.
Apply the Mozer transform to the signal.
Crop out the signal more than half a period from zero, and see how much of the signal between a quarter and half period you can crop out.
Store the cropped time domain formants and pitch data.

Re: Autocorrelation, cepstrum, and Mozer transform are relat
by rainwarrior on 2018-07-14 (#221072)

Well, they're related in that they all do some work in the frequency domain, but I think that's true of most sound/music/DSP operations? It's just the natural place to do signal analysis most of the time.

The Fourier Transform step is just the practical way to get to that domain. Like you can do the autocorrelation convolution in the signal domain if you want, but the existence of the Fast Fourier Transform makes it way more practical to go to the frequency domain to apply the convolution there. (Convolutions where one of the kernels is very short, on the other hand, are often faster if done directly in the signal domain.)

Re: Autocorrelation, cepstrum, and Mozer transform are relat
by tepples on 2018-07-14 (#221076)

My point is that all three transforms can be thought of as multiband compressors that discard phase, just with different compression curves. So algorithms using one, particularly pitch detection, can be ported to another.

Re: Autocorrelation, cepstrum, and Mozer transform are relat
by lidnariq on 2018-07-14 (#221094)

The cepstrum does not discard phase, and if you do you get garbage out.

You have to preserve phase, you have to take the complex log of the input function, and you have to "linearize" the phase too so that there are no discontinuities.

If you don't linearize phase, the central assumption (i.e. ifft(log(fft(x)·fft(y)))=ifft(log(fft(x)))+ifft(log(fft(y))) is no longer true.

Re: Autocorrelation, cepstrum, and Mozer transform are relat
by tepples on 2018-07-14 (#221100)

lidnariq wrote:

The cepstrum does not discard phase

The complex cepstrum does not. The power cepstrum does. I will edit the above to clarify the difference.

Re: Autocorrelation, cepstrum, and Mozer transform are relat
by lidnariq on 2018-07-14 (#221104)

I hadn't even heard of the Power cepstrum before. Sorry about the tangent.

(From my experience with the complex cepstrum, I'm not altogether convinced the power cepstrum is actually useful for anything... it's hard enough to get something useful out of the reversible complex cepstrum)