We now have a way of computing the spectrum for an arbitrary signal: The Discrete Fourier Transform (DFT) computes the spectrum at N equally spaced frequencies from a length-N sequence. An issue that never arises in analog "computation," like that performed by a circuit, is how much work it takes to perform a signal processing operation such as filtering. In computation, this consideration translates to the number of basic computational steps required to perform the needed processing. The number of steps, known as the complexity, becomes equivalent to how long the computation takes (how long must we wait for an answer). Complexity is not so much tied to specific computers or programming languages but to how many steps are required on any computer. Thus, a procedure's stated complexity says that the time taken will be proportional to some function of the amount of data used in the computation and the amount of output demanded.
For example, consider the formula for the discrete Fourier transform. For each frequency we choose, we must multiply each signal value by a complex number and add together the results. For a real-valued signal, each real-times-complex multiplication requires two real multiplications, meaning we have 2N multiplications to perform. To add the results together, we must keep the real and imaginary parts separate. Adding N numbers requires N − 1 additions. Consequently, each frequency requires 2N + 2(N − 1) = 4N − 2 basic computational steps. As we have N frequencies, the total number of computations is N(4N − 2).
In complexity calculations, we only worry about what happens as the data lengths increase, and take the dominant term—here the 4N² term—as reflecting how much work is involved in making the computation. As multiplicative constants don't matter since we are making a "proportional to" evaluation, we find the DFT is an O(N²) computational procedure. This notation is read "order N-squared". Thus, if we double the length of the data, we would expect the computation time to approximately quadruple.
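To make the operation count concrete, here is a minimal sketch of the direct DFT, assuming NumPy is available; the function name naive_dft is ours, and the doubly nested loop makes the O(N²) growth explicit.

```python
import numpy as np

def naive_dft(s):
    """Compute the DFT directly from its definition: O(N^2) operations."""
    N = len(s)
    S = np.zeros(N, dtype=complex)
    for k in range(N):          # N frequencies...
        for n in range(N):      # ...each needing N multiply-adds
            S[k] += s[n] * np.exp(-2j * np.pi * n * k / N)
    return S

# Sanity check against a trusted implementation
s = np.random.randn(16)
assert np.allclose(naive_dft(s), np.fft.fft(s))
```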
Exercise 13.1.1. (Go to Solution)
In making the complexity evaluation for the DFT, we assumed the data to be real. Three questions emerge. First of all, the spectra of such signals have conjugate symmetry, meaning that negative frequency components (those with indices k greater than N/2 in the DFT) can be computed from the corresponding positive frequency components. Does this symmetry change the DFT's complexity?
Secondly, suppose the data are complex-valued; what is the DFT's complexity now?
Finally, a less important but interesting question: suppose we want K frequency values instead of N; now what is the complexity?
Solution to Exercise 13.1.1. (Return to Exercise)
When the signal is real-valued, we may only need half the spectral values, but the complexity remains unchanged. If the data are complex-valued, which demands retaining all frequency values, the complexity is again the same. When only K frequencies are needed, the complexity is O(KN) .
The Fast Fourier Transform (FFT) is an efficient O(N log N) algorithm for calculating DFTs. The FFT exploits symmetries in the W matrix to take a "divide and conquer" approach. We will first derive the actual FFT algorithm, then discuss some of its implications for the DFT, and finish with a speed comparison to drive home the importance of this powerful algorithm.
To derive the FFT, we assume that the signal's duration is a power of two: N = 2^L. Consider what happens to the even-numbered and odd-numbered elements of the sequence in the DFT calculation:

$$S(k) = \sum_{n=0}^{N-1} s(n)\, e^{-\frac{i2\pi nk}{N}} = \left[\sum_{n=0}^{\frac{N}{2}-1} s(2n)\, e^{-\frac{i2\pi nk}{N/2}}\right] + e^{-\frac{i2\pi k}{N}} \left[\sum_{n=0}^{\frac{N}{2}-1} s(2n+1)\, e^{-\frac{i2\pi nk}{N/2}}\right]$$
Each term in square brackets has the form of a length-N/2 DFT. The first one is a DFT of the even-numbered elements, and the second of the odd-numbered elements. The first DFT is combined with the second multiplied by the complex exponential e^{−i2πk/N}. The half-length transforms are each evaluated at frequency indices k ∈ {0, …, N − 1}. Normally, the number of frequency indices in a DFT calculation ranges between zero and the transform length minus one. The computational advantage of the FFT comes from recognizing the periodic nature of the discrete Fourier transform: each half-length DFT is periodic in k with period N/2. The FFT simply reuses the computations made in the half-length transforms and combines them through additions and the multiplication by e^{−i2πk/N}, which is not periodic over N/2, to rewrite the length-N DFT. Figure 13.1 illustrates this decomposition. As it stands, we now compute two length-N/2 transforms (complexity 2O(N²/4)), multiply one of them by the complex exponential (complexity O(N)), and add the results (complexity O(N)). At this point, the total complexity is still dominated by the half-length DFT calculations, but the proportionality coefficient has been reduced.
Now for the fun. Because N = 2^L, each of the half-length transforms can be reduced to two quarter-length transforms, each of these to two eighth-length ones, etc. This decomposition continues until we are left with length-2 transforms. This transform is quite simple, involving only additions. Thus, the first stage of the FFT has N/2 length-2 transforms (see the bottom part of Figure 13.1). Pairs of these transforms are combined by adding one to the other multiplied by a complex exponential. Each pair requires 4 additions and 4 multiplications, giving a total number of computations equaling 8 · N/4 = 2N. This number of computations does not change from stage to stage. Because the number of stages, the number of times the length can be divided by two, equals log₂ N, the complexity of the FFT is O(N log N).
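The following is a minimal recursive sketch of this divide-and-conquer idea, assuming (as in the derivation) that the length is a power of two; the name fft_radix2 is ours.

```python
import numpy as np

def fft_radix2(s):
    """Recursive decimation-in-time FFT for power-of-two lengths."""
    N = len(s)
    if N == 1:
        return np.asarray(s, dtype=complex)
    even = fft_radix2(s[0::2])   # length-N/2 DFT of even-numbered elements
    odd = fft_radix2(s[1::2])    # length-N/2 DFT of odd-numbered elements
    k = np.arange(N // 2)
    twiddle = np.exp(-2j * np.pi * k / N)   # the complex exponential e^{-i2*pi*k/N}
    # Periodicity: the half-length results are reused for indices k and k + N/2
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

s = np.random.randn(8)
assert np.allclose(fft_radix2(s), np.fft.fft(s))
```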
Figure 13.1. Length-8 DFT decomposition
Doing an example will make the computational savings more obvious. Let's look at the details of a length-8 DFT. As shown in Figure 13.1, we first decompose the DFT into two length-4 DFTs, with the outputs added and subtracted together in pairs. Considering Figure 13.1 as the frequency index goes from 0 through 7, we recycle values from the length-4 DFTs into the final calculation because of the periodicity of the DFT output. Examining how pairs of outputs are collected together, we create the basic computational element known as a butterfly (Figure 13.2).
Figure 13.2. Butterfly
By considering together the computations involving common output frequencies from the two half-length DFTs, we see that the two complex multiplies are related to each other, and we can reduce our computational work even further. By further decomposing the length-4 DFTs into two length-2 DFTs and combining their outputs, we arrive at the diagram summarizing the length-8 fast Fourier transform (Figure 13.1). Although most of the complex multiplies are quite simple (multiplying by e^{−iπ} = −1 means negating real and imaginary parts), let's count those for purposes of evaluating the complexity as full complex multiplies. We have N/2 = 4 complex multiplies and 2N = 16 additions for each stage and log₂ N = 3 stages, making the number of basic computations proportional to N log₂ N, as predicted.
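One common tally counts, per stage, N/2 twiddle multiplies and N complex additions; a tiny sketch of that bookkeeping follows (our own convention, one of several that all give O(N log N)):

```python
import math

def fft_op_count(N):
    """Tally complex operations in a radix-2 FFT, treating every
    twiddle multiply as a full complex multiply."""
    stages = int(math.log2(N))
    mults_per_stage = N // 2   # one twiddle multiply per butterfly
    adds_per_stage = N         # two complex additions per butterfly, N/2 butterflies
    return stages * (mults_per_stage + adds_per_stage)   # (3N/2) * log2(N)

assert fft_op_count(8) == (3 * 8 // 2) * 3   # 36 complex operations for N = 8
```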
Exercise 13.2.1. (Go to Solution)
Note that the ordering of the input sequence in the two parts of Figure 13.1 isn't quite the same. Why not? How is the ordering determined?
How much better is O(N log N) than O(N²)?
Figure 13.3.
| N | 10 | 100 | 1000 | 10⁶ | 10⁹ |
| --- | --- | --- | --- | --- | --- |
| N² | 100 | 10⁴ | 10⁶ | 10¹² | 10¹⁸ |
| N log₁₀ N | 10 | 200 | 3000 | 6 × 10⁶ | 9 × 10⁹ |
Say you have a 1 MFLOPS machine (a million floating-point operations per second). Let N = 1 million = 10⁶.
An O(N²) algorithm takes 10¹² flops → 10⁶ seconds ≃ 11.5 days.
An O(N log N) algorithm takes 6 × 10⁶ flops → 6 seconds.
Example 13.1.
A 3-megapixel digital camera spits out 3 × 10⁶ numbers for each picture, so processing an image means working with sequences of that length. Consider two N-point sequences f[n] and h[n]. Computing the convolution (f[n] ⊛ h[n]) directly requires O(N²) operations. Instead:
taking FFTs -- O(N log N)
multiplying FFTs -- O(N)
inverse FFT -- O(N log N)
so the total complexity is O(N log N), as the sketch below illustrates.
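A sketch of this fast-convolution recipe, assuming NumPy; zero padding to a power of two avoids circular wrap-around and keeps the FFTs fast. The function name is ours.

```python
import numpy as np

def fast_convolve(f, h):
    """Linear convolution via the FFT: O(N log N) instead of O(N^2)."""
    L = len(f) + len(h) - 1          # full length of the linear convolution
    M = 1 << (L - 1).bit_length()    # next power of two, for FFT efficiency
    F = np.fft.fft(f, M)             # forward FFTs (zero-padded): O(N log N)
    H = np.fft.fft(h, M)
    y = np.fft.ifft(F * H)           # pointwise multiply O(N), inverse FFT O(N log N)
    return np.real(y[:L])            # real inputs give (numerically) real output

f, h = np.random.randn(1000), np.random.randn(1000)
assert np.allclose(fast_convolve(f, h), np.convolve(f, h))
```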
Other "fast" algorithms have been discovered, most of which make use of how many common factors the transform length N has. In number theory, the number of prime factors a given integer has measures how composite it is. The numbers 16 and 81 are highly composite (equaling 24 and 34 respectively), the number 18 is less so ( 2132 ), and 17 not at all (it's prime). In over thirty years of Fourier transform algorithm development, the original Cooley-Tukey algorithm is far and away the most frequently used. It is so computationally efficient that power-of-two transform lengths are frequently used regardless of what the actual length of the data. It is even well established that the FFT, alongside the digital computer, were almost completely responsible for the "explosion" of DSP in the 60's.
Solution to Exercise 13.2.1. (Return to Exercise)
The upper panel has not used the FFT algorithm to compute the length-4 DFTs while the lower one has. The ordering is determined by the algorithm.
A great many applications in signal processing, image processing, and beyond involve determining the presence and location of a target signal within some other signal. A radar system, for example, searches for copies of a transmitted radar pulse in order to determine the presence of and distance to reflective objects such as buildings or aircraft. A communication system searches for copies of waveforms representing digital 0s and 1s in order to receive a message.
Two key mathematical tools that contribute to these applications are inner products and the Cauchy-Schwarz inequality. As is shown in the module on the Cauchy-Schwarz inequality, the expression |⟨x, y⟩| / (||x|| ||y||) attains its upper bound, which is 1, when y = ax for some scalar a in a real or complex field. The lower bound, which is 0, is attained when x and y are orthogonal. Intuitively, this means that the expression is maximized when the vectors x and y have the same shape or pattern and minimized when x and y are very different. A pair of vectors with similar but unequal shapes or patterns will produce a relatively large value of the expression less than 1, and a pair of vectors with very different but not orthogonal shapes or patterns will produce a relatively small value of the expression greater than 0. Thus, the above expression carries with it a notion of the degree to which two signals are “alike”: it is the magnitude of the normalized correlation between the signals in the case of the standard inner product.
This concept can be extremely useful. For instance, consider a situation in which we wish to determine which signal, if any, from a set X of signals most resembles a particular signal y. In order to accomplish this, we might evaluate the above expression for every signal x ∈ X, choosing the one that results in the maximum, provided that the maximum is above some threshold of “likeness”. This is the idea behind the matched filter detector, which compares a set of signals against a target signal using the above expression in order to determine which is most like the target signal.
The simplest variant of the matched filter detector scheme would be to find the member signal in a set X of signals that most closely matches a target signal y. Thus, for every x ∈ X we wish to evaluate

$$m(x, y) = \left|\left\langle \frac{x}{\|x\|}, \frac{y}{\|y\|} \right\rangle\right|$$
in order to compare every member of X to the target signal y. Since the member of X which most closely matches the target signal y is desired, ultimately we wish to evaluate

$$x_m = \operatorname*{argmax}_{x \in X} m(x, y).$$
Note that normalizing the target signal is not technically necessary to locate the maximum, but it gives m(x, y) the desirable property of being bounded to [0, 1].
The element x m ∈ X that produces the maximum value of m(x,y) is not necessarily unique, so there may be more than one matching signal in X . Additionally, the signal x m ∈ X producing the maximum value of m(x,y) may not produce a very large value of m(x,y) and thus not be very much like the target signal y . Hence, another matched filter scheme might identify the argument producing the maximum but only above a certain threshold, returning no matching signals in X if the maximum is below the threshold. There also may be a signal x ∈ X that produces a large value of m(x,y) and thus has a high degree of “likeness” to y but does not produce the maximum value of m(x,y). Thus, yet another matched filter scheme might identify all signals in X producing local maxima that are above a certain threshold.
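A minimal sketch of these detector variants for finite-length signals, assuming the standard ℓ² inner product; the threshold value and function names are illustrative, not prescribed by the text.

```python
import numpy as np

def likeness(x, y):
    """m(x, y) = |<x, y>| / (||x|| ||y||), bounded to [0, 1]."""
    return abs(np.vdot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))

def matched_filter_detect(X, y, threshold=0.8):
    """Return the index of the member of X most like y, or None if even
    the best match falls below the likeness threshold."""
    scores = [likeness(x, y) for x in X]
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None
```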
Example 13.2.
For example, consider the target signal given in Figure 13.6 and the set of two signals given in Figure 13.7. By inspection, it is clear that the signal g₂ is most like the target signal f. However, to make that conclusion mathematically, we use the matched filter detector with the L² inner product. If we were to actually make the necessary computations, we would first normalize each signal and then compute the necessary inner products in order to compare the signals in X with the target signal f. We would notice that the absolute value of the inner product for g₂ with f when normalized is greater than the absolute value of the inner product of g₁ with f when normalized, mathematically stated as

$$\left|\left\langle \frac{g_2}{\|g_2\|}, \frac{f}{\|f\|} \right\rangle\right| > \left|\left\langle \frac{g_1}{\|g_1\|}, \frac{f}{\|f\|} \right\rangle\right|.$$
Figure 13.6. Template Signal
Figure 13.7. Candidate Signals
A somewhat more involved matched filter detector scheme would involve attempting to match a target time-limited signal y = f to a set X = {w S_t g | t ∈ ℝ} of time shifted and windowed versions of a single signal g, where S_t denotes a shift by t. The windowing function is given by w(t) = u(t − t₁) − u(t − t₂), where [t₁, t₂] is the interval to which f is time limited. This scheme could be used to find portions of g that have the same shape as f. If the absolute value of the inner product of the normalized versions of f and w S_t g is large (this is the absolute value of the normalized correlation for standard inner products), then g has a high degree of “likeness” to f on the interval to which f is time limited but left shifted by t. Of course, if f is not time limited, this means that the entire signal has a high degree of “likeness” to f left shifted by t.
Thus, in order to determine the most likely locations of a signal with the same shape as the target signal f in a signal g, we wish to compute

$$t_m = \operatorname*{argmax}_{t \in \mathbb{R}} \left|\left\langle \frac{f}{\|f\|}, \frac{w S_t g}{\|w S_t g\|} \right\rangle\right|$$

to provide the desired shift. Assuming the inner product space examined is L²(ℝ) (similar results hold for L²(ℝ[a, b)), ℓ²(ℤ), and ℓ²(ℤ[a, b))), this produces

$$\langle f, w S_t g \rangle = \int_{-\infty}^{\infty} f(\tau)\, \overline{w(\tau) g(\tau - t)}\, d\tau$$

Since f and w are time limited to the same interval [t₁, t₂],

$$\langle f, w S_t g \rangle = \int_{t_1}^{t_2} f(\tau)\, \overline{g(\tau - t)}\, d\tau$$

Making the substitution h(τ) = g(−τ)*,

$$\langle f, w S_t g \rangle = \int_{t_1}^{t_2} f(\tau)\, h(t - \tau)\, d\tau$$

Noting that this expression contains a convolution operation,

$$\langle f, w S_t g \rangle = (f * h)(t)$$

where h is the conjugate of the time reversed version of g, defined by h(τ) = g(−τ)*.
In the special case in which the target signal f is not time limited, w has unit value on the entire real line. Thus, the norm can be evaluated as ||w S_t g|| = ||S_t g|| = ||g|| = ||h||. Therefore, the function being maximized reduces to

$$(f \star g)(t) = \frac{(f * h)(t)}{\|f\| \|h\|}$$

where h(τ) = g(−τ)*. The function f ⋆ g is known as the normalized cross-correlation of f and g.
Hence, this matched filter scheme can be implemented as a convolution, so it may be expedient to implement it in the frequency domain. Similar results hold for the L²(ℝ[a, b)), ℓ²(ℤ), and ℓ²(ℤ[a, b]) spaces. It is especially useful to implement the ℓ²(ℤ[a, b]) case in the frequency domain, as the power of the Fast Fourier Transform algorithm can be leveraged to quickly perform the computations in a computer program. In the L²(ℝ[a, b)) and ℓ²(ℤ[a, b]) cases, care must be taken to zero pad the signal if wrap-around effects are not desired. Similar results also hold for spaces on higher dimensional intervals with the same inner products.
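A discrete-time sketch of this frequency-domain implementation, assuming NumPy. It uses whole-signal norms, as in the non-time-limited case above; the windowed scheme would instead need a sliding-window norm. Names are ours.

```python
import numpy as np

def cross_correlate_fft(f, g):
    """Normalized cross-correlation of template f against signal g via the FFT.
    Conjugating in frequency corresponds to the conjugated, time-reversed template."""
    L = len(f) + len(g) - 1
    M = 1 << (L - 1).bit_length()     # zero pad: no circular wrap-around
    F = np.fft.fft(f, M)
    G = np.fft.fft(g, M)
    r = np.fft.ifft(np.conj(F) * G)   # r[k] = sum_n conj(f[n]) g[n + k]
    valid = r[:len(g) - len(f) + 1]   # shifts where the template fully overlaps g
    return np.abs(valid) / (np.linalg.norm(f) * np.linalg.norm(g))

# The shift of the best match: t_hat = np.argmax(cross_correlate_fft(f, g))
```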
Of course, there is not necessarily exactly one instance of a target signal in a given signal. There could be one instance, more than one instance, or no instance of a target signal. Therefore, it is often more practical to identify all shifts corresponding to local maxima that are above a certain threshold.
Example 13.3.
The signal in Figure 13.9 contains an instance of the template signal seen in Figure 13.8 beginning at time t = s₁, as shown by the plot in Figure 13.10. Therefore, the normalized cross-correlation of the two signals attains its maximum at t = s₁.
Figure 13.8. Pattern Signal
Figure 13.9. Longer Signal
Figure 13.10. Absolute Value of Output
Matched filtering is used in image processing to detect a template image within a reference image. This has real-world applications in verifying fingerprints for security or in verifying someone's photo. As a simple example, we can turn to the ever-popular "Where's Waldo?" books (known as Wally in the UK!), where the reader is tasked with finding the specific face of Waldo/Wally in a confusing background rife with look-alikes! If we are given the template head and a reference image, we can run a two dimensional convolution of the template image across the reference image to obtain a convolution map, plotted as a surface in Figure 13.11, whose height is determined by the degree of correlation, higher being more correlated. Finding our target then becomes a matter of determining the spot where the surface reaches its highest point. The process is demonstrated in Figure 13.11. In the field of image processing, this matched filter-based process is known as template matching.
Figure 13.11.
Given the template image of Waldo's head and the reference picture, we could easily develop a program to find the closest resemblance to the image of Waldo's head in the larger picture. We would simply implement our same matched filter algorithm: take the inner products at each shift and see how large our resulting answers are. This idea was implemented on this same picture for a Signals and Systems project at Rice University.
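A sketch of this 2-D template matching, assuming SciPy is available; subtracting the template's mean is our own illustrative choice, made so that bright flat regions do not dominate the map.

```python
import numpy as np
from scipy.signal import fftconvolve

def template_match(image, template):
    """Correlate a template against an image; the brightest spot of the
    returned map is the best match."""
    t = template - template.mean()    # remove DC so uniform regions score low
    # Flipping the kernel in both axes turns convolution into correlation
    return fftconvolve(image, t[::-1, ::-1], mode='same')

# Locate the peak of the correlation surface:
# row, col = np.unravel_index(np.argmax(template_match(img, tpl)), img.shape)
```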
Exercise 13.4.1. Pros and Cons (Go to Solution)
What are the advantages of the matched filter algorithm to image detection? What are the drawbacks of this method?
Matched filter detectors are also commonly used in communications systems. In fact, they are the optimal detectors in Gaussian noise. Signals in the real world are often distorted by the environment around them, so there is a constant struggle to develop ways to receive a distorted signal and then filter it in some way to determine what the original signal was. Matched filters provide one way to compare a received signal with two possible original ("template") signals and determine which one is the closest match to the received signal.
For example, below we have a simplified example of Frequency Shift Keying (FSK), where we have the following coding for '1' and '0':
Figure 13.12.
Based on the above coding, we can create digital signals based on 0's and 1's by putting together the above two "codes" in an infinite number of ways. For this example we will transmit a basic 3-bit number, 101, which is displayed in Figure 13.13:
Figure 13.13.
Now, the signal picture above represents our original signal that will be transmitted over some communication system, which will inevitably pass through the "communications channel," the part of the system that will distort and alter our signal. As long as the noise is not too great, our matched filter should keep us from having to worry about these changes to our transmitted signal. Once this signal has been received, we will pass the noisy signal through a simple system, similar to the simplified version shown in Figure 13.14:
Figure 13.14.
Figure 13.14 basically shows that our noisy signal will be passed in (we will assume that it passes in one "bit" at a time) and this signal will be split and passed to two different matched filter detectors. Each one will compare the noisy, received signal to one of the two codes we defined for '1' and '0.' Then this value will be passed on and whichever value is higher (i.e. whichever FSK code signal the noisy signal most resembles) will be the value that the receiver takes. For example, the first bit that will be sent through will be a '1' so the upper level of the block diagram will have a higher value, thus denoting that a '1' was sent by the signal, even though the signal may appear very noisy and distorted.
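A sketch of this two-branch receiver, assuming each bit lasts L samples; the code frequencies, noise level, and names are illustrative, not taken from the figures.

```python
import numpy as np

L = 100                                  # samples per bit (illustrative)
t = np.arange(L) / L
code1 = np.sin(2 * np.pi * 4 * t)        # the "1" code (illustrative frequency)
code0 = np.sin(2 * np.pi * 2 * t)        # the "0" code (illustrative frequency)

def detect_bit(received):
    """Correlate one bit's worth of noisy samples against both codes;
    whichever branch responds more strongly wins."""
    score1 = abs(np.dot(received, code1))
    score0 = abs(np.dot(received, code0))
    return 1 if score1 > score0 else 0

# Transmit '101' with additive noise and decode bit by bit
bits = [1, 0, 1]
tx = np.concatenate([code1 if b else code0 for b in bits])
rx = tx + 0.5 * np.random.randn(len(tx))
decoded = [detect_bit(rx[i * L:(i + 1) * L]) for i in range(len(bits))]
```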
The interactive example below supposes that our transmitter sends 1000 bits, plotting how many of those bits are received and interpreted correctly as either 1s or 0s, and also keeps a tally of how many are accidentally misinterpreted. You can play around with the distance between the energy of the "1" and the "0" (discriminability), the degree of noise present in the channel, and the location of the criterion (threshold) to get a feel for the basics of signal detection theory.
Example 13.4.
Let's use a matched filter to find the "0" bits in a simple signal.
Let's use the signal s₁(t) from example 1 to represent the bits: s₁(t) represents a 0, while −s₁(t) represents a 1.
0 ⇒ b = 1 ⇒ s₁(t) = s(t) for 0 ≤ t ≤ T
1 ⇒ b = −1 ⇒ s₂(t) = −s(t) for 0 ≤ t ≤ T
Figure 13.15.
The matched filter output clearly shows the location of the "0" bits.
One of the first, and more intriguing, forms of communication that used the matched filter concept was radar. A known electromagnetic signal is sent out by a transmitter toward a target and is reflected off of the target back to the sender with a time delay proportional to the distance between target and sender. This scaled, time-shifted signal is then convolved with the original template signal, and the time at which the output of this convolution is highest is noted.
This technology proved vital in the 1940s for the powers that possessed it. The short set of videos below shows the basics of how the technology works, its applications, and its impact in World War II.
Figure 13.16. History of Radar
See the video in Figure 13.17 for an analysis of the same basic principle being applied to adaptive cruise control systems for the modern car.
Figure 13.17.
As can be seen, the matched filter detector is an important signal processing application, rich both in theoretical concepts and in practical applications. The matched filter supports a wide array of uses related to pattern recognition, including image detection, frequency shift keying demodulation, and radar signal interpretation. Despite this diversity of purpose, all matched filter applications operate in essentially the same way: every member of some set of signals is compared to a target signal by evaluating the absolute value of the inner product of the two signals after normalization. However, the signal sets and result interpretations are application specific.
Solution to Exercise 13.4.1. (Return to Exercise)
This algorithm is very simple and thus easy to code. However, it is susceptible to certain types of noise - for example, it would be difficult to find Waldo if his face was rotated, flipped, larger or smaller than expected, or distorted in some other way.