Pascal Wallisch
This chapter introduces the most common method of decomposing a time series into frequency components, Fourier analysis. You will learn about the Fourier transform and the associated amplitude and phase spectra. The MATLAB® implementation of the fast Fourier transform (FFT), an efficient algorithm for calculating Fourier transformations, will be introduced and applied to the analysis of human speech sounds.
Fourier analysis; Fourier transform; fast Fourier transform; Fourier series; power; phase analysis
This chapter introduces the most common method of decomposing a time series into frequency components, Fourier analysis. You will learn about the Fourier transform and the associated amplitude and phase spectra. The MATLAB® implementation of the fast Fourier transform (FFT), an efficient algorithm for calculating Fourier transformations, will be introduced and applied to the analysis of human speech sounds.
Figure 11.1 shows typical recordings of two human vowel sounds. How can you characterize these different sounds? Frequency analysis provides a way to examine the relative contributions of various frequencies to an overall signal. In the case of an auditory signal, a given frequency component would be termed pitch.
Take some continuous function f. We can approximate such a function with a weighted series of sinusoids of various frequencies. Such a series is termed the real Fourier series:
(11.1)
Here, the coefficients an and bn represent the relative strength of each frequency component n/2π. [a0 represents the nonoscillatory component of f(t).] So, given f(t), determining the coefficients an and bn allows for the representation of f(t) as a series sum of sinusoids.
We will exploit two special properties of the sine and cosine functions to find the Fourier series coefficients an and bn. Over the interval −π to π, cosine and sine functions with differing frequencies have the special property of orthogonality. The integral of the product of two mutually orthogonal functions evaluates to zero. So, the integral of the product of cosine or sine functions with differing frequencies results in zero over this interval. Another interesting property of sine and cosine is that the integral of the square of a cosine or sine function over this integral is π.
To find the strength, am, of a cosine component m, multiply by the corresponding cosine function and integrate:
(11.2)
All terms on the right side except the cosine term where m=n yield zero:
(11.3)
The right side integral evaluates to one over the integration range, yielding an expression for the Fourier series term coefficient:
(11.4)
(11.5)
In general, the interval of f(t) will not be −π to π. For an interval centered on x with length 2L, the expression becomes
(11.6)
A similar procedure using sine functions yields the coefficients for the sine terms of the Fourier series.
Euler’s identity,
(11.7)
provides a straightforward way to formulate complex Fourier series representation for a given function, f(t):
(11.8)
Similar to the real transform, coefficients for the complex Fourier transform can be found by
(11.9)
for a given coefficient m over the interval −π to π. Over the interval x−L to x+L, this becomes
(11.10)
Let’s look at how this method of the Fourier transform scales with N. Given a time series with N values, this method requires a multiplication of the series and the corresponding Fourier component and subsequent sum for each coefficient. Assuming a number of coefficients equivalent to N, then you have a process that scales with N2. In other words, as N increases, the time required to compute the Fourier transform increases as N2.
With a few special tricks, a faster algorithm, the fast Fourier transform (FFT) that scales in N log N time can be formulated. One of these tricks involves taking advantage of datasets exactly 2N elements long. The increase in processing speed has made the FFT ubiquitous in signal processing. While a complete derivation of the algorithm is beyond the scope of this book, invoking the MATLAB implementation of the FFT will be discussed.
MATLAB provides an FFT function fft(X), where X is a vector in time space. fft returns the frequency space representation of X.
To visualize the importance of the difference in scaling, execute the following code:
As you might imagine, there is an inverse to the DFT:
(11.12)
MATLAB provides ifft() to perform the inverse discrete Fourier transform.
Often when you are using Fourier analysis, the amplitude spectrum is one of the first analyses performed. The amplitude spectrum graphs amplitude against frequency. In terms of the Fourier series representation, the amplitude spectrum depicts the magnitude of the coefficients at various frequencies. As such, it depicts the relative strengths of the various frequency components.
The following code generates a time series composed of 10 sine waves whose frequencies and amplitudes vary systematically.
Now, the variable Y contains the normalized FFT of X. Note the normalization factor L. Displaying the amplitude spectrum of X requires plotting the amplitudes at various frequencies. Note that fft returns only a single value, the transform coefficients. Now, how do you determine the frequency scale?
The return value of the FFT assumes that frequency is evenly spaced, from 0 to a theoretical result called the Nyquist limit. Nyquist demonstrated that a discrete sampling of a continuous process can capture frequencies no higher than half the sampling frequency. Since the code above has the sampling interval, this Nyquist limit is half the inverse of the sampling interval.
The following code calculates the Nyquist limit for the time series:
When viewing the FFT, it is important to remember that the result is the complex transform. Thus, simply using the result of the FFT as a set of real coefficients can cause a number of problems. To display the amplitude spectrum, the absolute value of the complex coefficients will be used. The values returned by fft are the coefficients for frequencies from the negative Nyquist limit to the positive Nyquist limit. If the time series data are purely real, then the resultant transform will have even symmetry. That is, the transform will be symmetrical across the abscissa. So, in this very frequent case, only the first half of the result of fft is used. The following code employs linspace to generate frequency values and plots the amplitude spectrum. linspace generates a linearly spaced sequence of values given initial and final values. Here, the initial and final values are 0 and 1, with a value count of L/2. The resultant vector is scaled by the Nyquist limit to generate the frequency vector.
Power at a given frequency is defined as
(11.13)
where F* is the complex conjugate of F. To do this in MATLAB, use the function conj to return the complex conjugate of a series of complex values.
Here is a plot of the power spectrum of the time series generated for the amplitude spectrum:
A power spectrum alone is not a complete representation of the information in the original signal. The various Fourier components can have various phases relative to one another, as illustrated in Figure 11.2.
You can plot relative phase by frequency by plotting the inverse tangent of the ratio between the imaginary component and the real component. Why is this the case? Imagine the complex plane, with pure real values along the abscissa (x-axis) and pure imaginary values along the ordinate (y-axis). Any complex value in your 1D Fourier transform can be represented with a coordinate pair. The magnitude of the value is simply the distance from the origin to the coordinates, or the complex modulus. The phase is the angle formed by the abscissa and the line passing through the origin and the complex point. Thus, using basic trigonometry, the phase angle is.
How can you represent this in MATLAB?
In this project, you will be asked to use Fourier decomposition to analyze vowel sounds produced by human speakers. On the companion website, you will find five examples of vowel sounds as produced by male American English speakers. Each sound corresponds to one of the vowel sounds in Table 11.1. The formant frequencies in Table 11.1 note the average formant frequencies as spoken by a male speaker of American English. You will use power spectra of these sounds to classify the recordings as one of these vowel sounds in the table.
Table 11.1
Average First and Second Formant Frequencies for Selected American English Vowels
Vowel sound | First formant | Second formant |
Bit | 342 | 2322 |
But | 623 | 1200 |
Bat | 588 | 1952 |
Boot | 378 | 997 |
(Data from Hillenbrand et al., 1995.)
To complete this project, you need to understand how formants relate to frequency analysis. The human vocal tract has multiple cavities in which speech sounds resonate. As such, most sounds have multiple strong frequency components. In classifying speech sounds, the lowest strong frequency band is termed the first formant. The next highest is termed the second formant, and so on.
Vowels lend themselves to a particularly simple characterization through their formants. Typically, vowel sounds have distinguishable first and second formants. Table 11.1 shows first and second formants for four vowel sounds in American English. Thus, the short “i” sound would have strong frequency representation at 342 Hz and at 2322 Hz.