An audio signal is typically visualized as a bar chart or wave, which is high when the sound is loud and low when it is soft. We recognize that a repetitive sound, such as a metronome's beat, makes repetitive peaks and valleys in the visualization. When audio has multiple channels (as in a stereo or surround sound recording), each channel can be considered as a separate signal and visualized as a separate bar chart or wave.
Similarly, in a video, every channel of every pixel can be considered as a separate signal, rising and falling (becoming brighter and dimmer) over time. Imagine that we use a stationary camera to capture a video of a metronome. Then, certain pixel values rise and fall at a regular interval as they capture the passage of the metronome's needle. If the camera has an attached microphone, its signal values rise and fall at the same interval. Based on either the audio or the video, we can measure the metronome's frequency—its beats per minute (bpm) or its beats per second (Hertz or Hz). Conversely, if we change the metronome's bpm setting, the effect on both the audio and the video is predictable. From this thought experiment, we can learn that a signal—be it audio, video, or any other kind—can be expressed as a function of time and, equivalently, a function of frequency.
Consider the following pair of graphs. They express the same signal, first as a function of time and then as a function of frequency. Within the time domain, we see one wide peak and valley (in other words, a tapering effect) spanning many narrow peaks and valleys. Within the frequency domain, we see a low-frequency peak and a high-frequency peak.
The transformation from the time domain to the frequency domain is called the Fourier transform. Conversely, the transformation from the frequency domain to the time domain is called the inverse Fourier transform. Within the digital world, signals are discrete, not continuous, and we use the terms discrete Fourier transform (DFT) and inverse discrete Fourier transform (IDFT). There are various efficient algorithms to compute the DFT or IDFT, and such an algorithm might be described as a fast Fourier transform (FFT) or an inverse fast Fourier transform (IFFT).
For algorithmic descriptions, refer to the following Wikipedia article: http://en.wikipedia.org/wiki/Fast_Fourier_transform.
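To make the round trip between the two domains concrete, here is a minimal sketch using NumPy's numpy.fft module (one of the implementations surveyed later in this section). The sample rate and component frequencies are arbitrary values chosen for illustration.

import numpy as np

# Sample a 1-second signal at 64 Hz that mixes a low-frequency component
# (2 Hz) with a high-frequency component (12 Hz).
t = np.arange(64) / 64.0
signal = np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Forward DFT: time domain -> frequency domain.
spectrum = np.fft.fft(signal)
freqs = np.fft.fftfreq(len(signal), d=1.0 / 64)

# Inverse DFT: frequency domain -> time domain.
reconstructed = np.fft.ifft(spectrum)

# The round trip recovers the original signal, up to floating-point error.
print(np.allclose(signal, reconstructed.real))  # True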
The result of the Fourier transform (including its discrete variants) is a function that maps a frequency to an amplitude and phase. The amplitude represents the magnitude of the frequency's contribution to the signal. The phase represents a temporal shift; it determines whether the frequency's contribution starts on a high or a low. Typically, amplitude and phase are encoded in a complex number, a+bi, where amplitude=sqrt(a^2+b^2) and phase=atan2(b,a).
For an explanation of complex numbers, refer to the following Wikipedia article: http://en.wikipedia.org/wiki/Complex_number.
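As a brief illustration, the following sketch computes the amplitude and phase of each coefficient in an FFT result. The input values are arbitrary, and the np.abs and np.angle shortcuts noted in the comments are equivalent to the formulas above.

import numpy as np

# For each complex coefficient a + bi in an FFT result, the amplitude is
# sqrt(a^2 + b^2) and the phase is atan2(b, a).
spectrum = np.fft.fft([0.0, 1.0, 0.0, -1.0])  # a tiny example signal

a = spectrum.real
b = spectrum.imag
amplitude = np.sqrt(a ** 2 + b ** 2)  # equivalent to np.abs(spectrum)
phase = np.arctan2(b, a)              # equivalent to np.angle(spectrum)

print(amplitude)
print(phase)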
The FFT and IFFT are fundamental to a field of computer science called digital signal processing. Many signal processing applications, including Lazy Eyes, involve taking the signal's FFT, modifying or removing certain frequencies in the FFT result, and then reconstructing the filtered signal in the time domain using the IFFT. For example, this approach enables us to amplify certain frequencies while leaving others unchanged.
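The following sketch illustrates this workflow using numpy.fft: it takes a signal's FFT, multiplies a chosen band of frequencies by a gain, and reconstructs the result with the IFFT. The function name, band limits, and gain are hypothetical values for illustration, not the parameters used in Lazy Eyes.

import numpy as np

def amplify_band(signal, fps, low_hz, high_hz, gain):
    # Amplify frequencies in [low_hz, high_hz]; leave the rest unchanged.
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(len(signal), d=1.0 / fps)

    # Select the band of interest among both positive and negative
    # frequencies, since the spectrum of a real signal is symmetric.
    band = (np.abs(freqs) >= low_hz) & (np.abs(freqs) <= high_hz)
    spectrum[band] *= gain

    # Reconstruct the filtered signal in the time domain.
    return np.fft.ifft(spectrum).real

# Example: boost the 0.5-2.0 Hz range of a 30 fps signal by a factor of 10.
t = np.arange(300) / 30.0
signal = np.sin(2 * np.pi * 1.0 * t) + np.sin(2 * np.pi * 5.0 * t)
amplified = amplify_band(signal, fps=30, low_hz=0.5, high_hz=2.0, gain=10.0)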
Now, where do we find this functionality?
Several Python libraries provide FFT and IFFT implementations that can process NumPy arrays (and thus OpenCV images). Here are the five major contenders:
- NumPy: This library provides an FFT implementation in the numpy.fft module (for more information, refer to http://docs.scipy.org/doc/numpy/reference/routines.fft.html). The module also offers other signal processing functions to work with the output of the FFT.
- SciPy: This library provides an FFT implementation in the scipy.fftpack module (for more information, refer to http://docs.scipy.org/doc/scipy/reference/fftpack.html). This SciPy module is closely based on the numpy.fft module, but adds some optional arguments and dynamic optimizations based on the input format. It also adds more signal processing functions to work with the output of the FFT.
- OpenCV: This library has implementations of the FFT (cv2.dft) and IFT (cv2.idft). An official tutorial provides examples and a comparison to NumPy's FFT implementation at http://docs.opencv.org/doc/tutorials/core/discrete_fourier_transform/discrete_fourier_transform.html. OpenCV's FFT and IFT interfaces are not directly interoperable with the numpy.fft and scipy.fftpack modules that offer a broader range of signal processing functionality. (The data is formatted very differently.)
- PyFFTW: This library is a Python wrapper around FFTW, a highly optimized FFT implementation written in C. PyFFTW provides optional interfaces that match numpy.fft and scipy.fftpack.
- Reikna: This library provides a GPU-accelerated FFT implementation in the reikna.fft module. Reikna internally uses PyCUDA or PyOpenCL arrays (not NumPy arrays), but it provides interfaces for conversion from NumPy arrays to these GPU arrays and back. The converted NumPy output is compatible with other signal processing functionality as implemented in numpy.fft and scipy.fftpack. However, this compatibility comes at a high overhead cost due to locking, reading, and converting the contents of the GPU memory.

NumPy, SciPy, OpenCV, and PyFFTW are open-source libraries under the BSD license. Reikna is an open-source library under the MIT license.
I recommend PyFFTW because of its optimizations and its interoperability (at a low overhead cost) with all the other functionality that interests us in NumPy, SciPy, and OpenCV. For a tour of PyFFTW's features, including its NumPy- and SciPy-compatible interfaces, refer to the official tutorial at https://hgomersall.github.io/pyFFTW/sphinx/tutorial.html.
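As a taste of these interfaces, here is a brief sketch (not the Lazy Eyes implementation itself) that enables PyFFTW's planning cache and uses its numpy.fft-compatible fft and ifft functions as drop-in replacements:

import numpy as np
import pyfftw.interfaces.cache
from pyfftw.interfaces.numpy_fft import fft, ifft

# Enable the cache so that repeated transforms of same-shaped arrays reuse
# the planning work from the first call.
pyfftw.interfaces.cache.enable()

signal = np.random.rand(1024)

# These functions mirror numpy.fft.fft and numpy.fft.ifft.
spectrum = fft(signal)
reconstructed = ifft(spectrum)

print(np.allclose(signal, reconstructed.real))  # True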
Depending on our platform, we can set up PyFFTW in one of the following ways:
- On Mac with MacPorts, use the following command:
$ sudo port install py27-pyfftw
- On recent versions of Ubuntu and its derivatives, use the following command:
$ sudo apt-get install python-fftw3
In Ubuntu 14.04 and earlier versions (and derivatives thereof), do not use this package, as its version is too old. Instead, use the PyFFTW source bundle, as described in the last bullet of this list.
- On recent versions of Debian and its derivatives, use the following command:
$ sudo apt-get install python-pyfftw
In Debian Wheezy and its derivatives, including Raspbian, this package does not exist. Instead, use the PyFFTW source bundle, as described in the next bullet.
- On any system, download the PyFFTW source bundle, decompress it, and run the setup.py script inside the decompressed folder.

We have our FFT and IFFT needs covered by FFTW (and if we were cowboys instead of secret agents, we could say, "Cover me!"). For additional signal processing functionality, we will use SciPy, which can be set up in the way described in Chapter 1, Preparing for the Mission, Setting up a development machine.
Signal processing is not the only new material that we must learn for Lazy Eyes, so let's now look at other functionality that is provided by OpenCV.