Extracting repeating signals from video using the Fast Fourier Transform (FFT)

An audio signal is typically visualized as a bar chart or wave. The bar chart or wave is high when the sound is loud and low when it is soft. We recognize that a repetitive sound, such as a metronome's beat, makes repetitive peaks and valleys in the visualization. When audio has multiple channels (being a stereo or surround sound recording), each channel can be considered as a separate signal and can be visualized as a separate bar chart or wave.

Consider the following pair of graphs. They express the same signal, first as a function of time and then as a function of frequency. Within the time domain, we see one wide peak and valley (in other words, a tapering effect) spanning many narrow peaks and valleys. Within the frequency domain, we see a low-frequency peak and a high-frequency peak.

Extracting repeating signals from video using the Fast Fourier Transform (FFT)

The transformation from the time domain to the frequency domain is called the Fourier transform. Conversely, the transformation from the frequency domain to the time domain is called the inverse Fourier transform. Within the digital world, signals are discrete, not continuous, and we use the terms discrete Fourier transform (DFT) and inverse discrete Fourier transform (IDFT). There is a variety of efficient algorithms to compute the DFT or IDFT and such an algorithm might be described as a Fast Fourier Transform or an Inverse Fast Fourier Transform.

Note

For algorithmic descriptions, refer to the following Wikipedia article: http://en.wikipedia.org/wiki/Fast_Fourier_transform.

The result of the Fourier transform (including its discrete variants) is a function that maps a frequency to an amplitude and phase. The amplitude represents the magnitude of the frequency's contribution to the signal. The phase represents a temporal shift; it determines whether the frequency's contribution starts on a high or a low. Typically, amplitude and phase are encoded in a complex number, a+bi, where amplitude=sqrt(a^2+b^2) and phase=atan2(a,b).

Note

For an explanation of complex numbers, refer to the following Wikipedia article: http://en.wikipedia.org/wiki/Complex_number.

The FFT and IFFT are fundamental to a field of computer science called digital signal processing. Many signal processing applications, including Lazy Eyes, involve taking the signal's FFT, modifying or removing certain frequencies in the FFT result, and then reconstructing the filtered signal in the time domain using the IFFT. For example, this approach enables us to amplify certain frequencies while leaving others unchanged.

Now, where do we find this functionality?

Choosing and setting up an FFT library

Several Python libraries provide FFT and IFFT implementations that can process NumPy arrays (and thus OpenCV images). Here are the five major contenders:

NumPy: This provides FFT and IFFT implementations in a module called numpy.fft (for more information, refer to http://docs.scipy.org/doc/numpy/reference/routines.fft.html). The module also offers other signal processing functions to work with the output of the FFT.
SciPy: This provides FFT and IFFT implementations in a module called scipy.fftpack (for more information refer to http://docs.scipy.org/doc/scipy/reference/fftpack.html). This SciPy module is closely based on the numpy.fft module, but adds some optional arguments and dynamic optimizations based on the input format. The SciPy module also adds more signal processing functions to work with the output of the FFT.
OpenCV: This has implementations of FFT (cv2.dft) and IFT (cv2.idft). An official tutorial provides examples and a comparison to NumPy's FFT implementation at http://docs.opencv.org/doc/tutorials/core/discrete_fourier_transform/discrete_fourier_transform.html. OpenCV's FFT and IFT interfaces are not directly interoperable with the numpy.fft and scipy.fftpack modules that offer a broader range of signal processing functionality. (The data is formatted very differently.)
PyFFTW: This is a Python wrapper (https://hgomersall.github.io/pyFFTW/) around a C library called the Fastest Fourier Transform in the West (FFTW) (for more information, refer to http://www.fftw.org/). FFTW provides multiple implementations of FFT and IFFT. At runtime, it dynamically selects implementations that are well optimized for given input formats, output formats, and system capabilities. Optionally, it takes advantage of multithreading (and its threads might run on multiple CPU cores, as the implementation releases Python's Global Interpreter Lock or GIL). PyFFTW provides optional interfaces matching NumPy's and SciPy's FFT and IFFT functions. These interfaces have a low overhead cost (thanks to good caching options that are provided by PyFFTW) and they help to ensure that PyFFTW is interoperable with a broader range of signal processing functionality as implemented in numpy.fft and scipy.fftpack.
Reinka: This is a Python library for GPU-accelerated computations using either PyCUDA (http://mathema.tician.de/software/pycuda/) or PyOpenCL (http://mathema.tician.de/software/pyopencl/) as a backend. Reinka (http://reikna.publicfields.net/en/latest/) provides FFT and IFFT implementations in a module called reikna.fft. Reinka internally uses PyCUDA or PyOpenCL arrays (not NumPy arrays), but it provides interfaces for conversion from NumPy arrays to these GPU arrays and back. The converted NumPy output is compatible with other signal processing functionality as implemented in numpy.fft and scipy.fftpack. However, this compatibility comes at a high overhead cost due to locking, reading, and converting the contents of the GPU memory.

NumPy, SciPy, OpenCV, and PyFFTW are open-source libraries under the BSD license. Reinka is an open-source library under the MIT license.

I recommend PyFFTW because of its optimizations and its interoperability (at a low overhead cost) with all the other functionality that interests us in NumPy, SciPy, and OpenCV. For a tour of PyFFTW's features, including its NumPy- and SciPy-compatible interfaces, refer to the official tutorial at https://hgomersall.github.io/pyFFTW/sphinx/tutorial.html.

Depending on our platform, we can set up PyFFTW in one of the following ways:

In Windows, download and run a binary installer from https://pypi.python.org/pypi/pyFFTW. Choose the installer for either a 32-bit Python 2.7 or 64-bit Python 2.7 (depending on whether your Python installation, not necessarily your system, is 64-bit).
In Mac with MacPorts, run the following command in Terminal:
```
$ sudo port install py27-pyfftw
```
In Ubuntu 14.10 (Utopic) and its derivatives, including Linux Mint 14.10, run the following command in Terminal:
```
$ sudo apt-get install python-fftw3
```
In Ubuntu 14.04 and earlier versions (and derivatives thereof), do not use this package, as its version is too old. Instead, use the PyFFTW source bundle, as described in the last bullet of this list.
In Debian Jessie, Debian Sid, and their derivatives, run the following command in Terminal:
```
$ sudo apt-get install python-pyfftw
```
In Debian Wheezy and its derivatives, including Raspbian, this package does not exist. Instead, use the PyFFTW source bundle, as described in the next bullet.
For any other system, download the PyFFTW source bundle from https://pypi.python.org/pypi/pyFFTW. Decompress it and run the setup.py script inside the decompressed folder.
Note
Some old versions of the library are called PyFFTW3. We do not want PyFFTW3. However, on Ubuntu 14.10 and its derivatives, the packages are misnamed such that python-fftw3 is really the most recent packaged version (whereas python-fftw is an older PyFFTW3 version).

We have our FFT and IFFT needs covered by FFTW (and if we were cowboys instead of secret agents, we could say, "Cover me!"). For additional signal processing functionality, we will use SciPy, which can be set up in the way described in Chapter 1, Preparing for the Mission, Setting up a development machine.

Signal processing is not the only new material that we must learn for Lazy Eyes, so let's now look at other functionality that is provided by OpenCV.