Chapter 6. Seeing a Heartbeat with a Motion Amplifying Camera

 

Remove everything that has no relevance to the story. If you say in the first chapter that there is a rifle hanging on the wall, in the second or third chapter it absolutely must go off. If it's not going to be fired, it shouldn't be hanging there.

 
 --Anton Chekhov
 

King Julian: I don't know why the sacrifice didn't work. The science seemed so solid.

 
 --Madagascar: Escape 2 Africa (2008)

Despite their strange design and mysterious engineering, Q's gadgets always prove useful and reliable. Bond has such faith in the technology that he never even asks how to charge the batteries.

One of the inventive ideas in the Bond franchise is that even a lightly equipped spy should be able to see and photograph concealed objects, anyplace, anytime. Let's consider a few of the relevant gadgets that appear over the movies' timeline.

These gadgets deal with unseen wavelengths of light (or radiation) and are broadly comparable to real-world devices such as airport security scanners and night vision goggles. However, it remains difficult to explain how Bond's equipment is so compact and how it takes such clear pictures in diverse lighting conditions and through diverse materials. Moreover, if Bond's devices are active scanners (meaning they emit X-ray radiation or infrared light), they will be clearly visible to other spies using similar hardware.

To take another approach, what if we avoid unseen wavelengths of light but instead focus on unseen frequencies of motion? Many things move in a pattern that is too fast or too slow for us to easily notice. Suppose that a man is standing in one place. If he shifts one leg more than the other, perhaps he is concealing a heavy object, such as a gun, on the side that he shifts more. We also might fail to notice deviations from a pattern. Suppose the same man has been looking straight ahead but suddenly, when he believes no one is looking, his eyes dart to one side. Is he watching someone?

We can make motions of a certain frequency more visible by repeating them, like a delayed afterimage or a ghost, with each repetition fainter (less opaque) than the last. The effect is analogous to an echo or a ripple, and it is achieved using an algorithm called Eulerian video magnification.

By applying this technique, we will build a desktop app that allows us to simultaneously see the present and selected slices of the past. The idea of experiencing multiple images simultaneously is, to me, quite natural because for the first 26 years of my life, I had strabismus—commonly called a lazy eye—that caused double vision. A surgeon corrected my eyesight and gave me depth perception but, in memory of strabismus, I would like to name this application Lazy Eyes.

Let's take a closer look—or two or more closer looks—at the fast-paced, moving world that we share with all the other secret agents.

Of all our apps, Lazy Eyes has the simplest user interface. It just shows a live video feed with a special effect that highlights motion. The parameters of the effect are quite complex and, moreover, modifying them at runtime would carry a significant performance cost. Thus, we do not provide a user interface to reconfigure the effect, but we do expose many parameters in code so that a programmer can create many variants of the effect and the app.

The following is a screenshot illustrating one configuration of the app. This image shows me eating cake. My hands and face are moving, and we see an effect that looks like light and dark waves rippling around the places where moving edges have been. (The effect is more graceful in a live video than in a screenshot.)

Planning the Lazy Eyes app

Regardless of how it is configured, the app loops through the following actions:

  1. Capturing an image.
  2. Copying and downsampling the image while applying a blur filter and optionally an edge-finding filter. We will downsample using so-called image pyramids, which will be discussed in Compositing two images using image pyramids, later in this chapter. The purpose of downsampling is to achieve a higher frame rate by reducing the amount of image data used in subsequent operations. The purpose of the blur filter and the optional edge-finding filter is to create haloes that are useful in amplifying motion. (A pyramid sketch follows this list.)
  3. Storing the downsampled copy in a history of frames, along with a timestamp. The history has a fixed capacity, and once it is full, the oldest frame is overwritten to make room for the new one. (A ring buffer sketch follows this list.)
  4. If the history is not yet full, we continue to the next iteration of the loop.
  5. Decomposing the history into a list of frequencies describing the fluctuations (motion) at each pixel. The decomposition is a Fourier transform, computed by an algorithm called the Fast Fourier Transform (FFT). We will discuss this in the Extracting repeating signals from video using the Fast Fourier Transform section, later in this chapter.
  6. Setting all frequencies to zero except a chosen range that interests us. In other words, we filter out the data on motions that are faster or slower than certain thresholds.
  7. Recomposing the filtered frequencies into a series of images that are motion maps. Areas that are still (with respect to our chosen range of frequencies) become dark, while areas that are moving may remain bright. The recomposition is done with the Inverse Fast Fourier Transform (IFFT), which we will discuss later alongside the FFT. (A temporal filtering sketch follows this list.)
  8. Upsampling the latest motion map (again using image pyramids), intensifying it, and overlaying it additively atop the original camera image. (A compositing sketch follows this list.)
  9. Showing the resulting composite image.
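Before we get into the implementation details, let's make a few of these steps concrete with short sketches. First, for the downsampling and upsampling in steps 2 and 8, the following is a minimal sketch based on OpenCV's cv2.pyrDown and cv2.pyrUp functions. The function names and the num_levels parameter are illustrative, not taken from the finished app:

    import cv2

    def downsample(image, num_levels):
        # Each cv2.pyrDown call applies a Gaussian blur and then
        # halves the image's width and height.
        for _ in range(num_levels):
            image = cv2.pyrDown(image)
        return image

    def upsample(image, num_levels):
        # Each cv2.pyrUp call doubles the width and height and then
        # smooths the result. Pyramids are lossy, so the upsampled
        # image is a blurry approximation of the original.
        for _ in range(num_levels):
            image = cv2.pyrUp(image)
        return image

Note that for the composite in step 8 to line up, the camera image's width and height should be divisible by 2 raised to the power of num_levels; otherwise, the upsampled result must be resized to match the original.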
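Steps 3 and 4 amount to a ring buffer of frames and timestamps. Here is one possible sketch using NumPy; the FrameHistory class and its interface are hypothetical, not from the app's source:

    import time

    import numpy as np

    class FrameHistory:

        def __init__(self, capacity, frame_shape):
            # Preallocate storage for the frames and timestamps.
            self._frames = np.empty((capacity,) + frame_shape,
                                    np.float32)
            self._timestamps = np.empty(capacity, np.float64)
            self._count = 0  # The number of frames stored so far
            self._next = 0   # The index where the next frame goes

        @property
        def full(self):
            return self._count == len(self._frames)

        def add(self, frame):
            # Overwrite the oldest slot and record the capture time.
            self._frames[self._next] = frame
            self._timestamps[self._next] = time.time()
            self._next = (self._next + 1) % len(self._frames)
            self._count = min(self._count + 1, len(self._frames))

        def frames_in_order(self):
            # Once full, the oldest frame sits at index self._next,
            # so rolling by -self._next puts the oldest first.
            return np.roll(self._frames, -self._next, axis=0)

        def timestamps_in_order(self):
            return np.roll(self._timestamps, -self._next)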
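Steps 5 to 7 together form a temporal bandpass filter. The following sketch uses NumPy's FFT module; the real app might use a different FFT library, and min_hz and max_hz are illustrative names for the thresholds mentioned in step 6:

    import numpy as np

    def temporal_bandpass(frames, timestamps, min_hz, max_hz):
        # frames has the shape (num_frames, h, w), oldest first.
        # Estimate the capture rate from the recorded timestamps.
        fps = (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])
        # Step 5: an FFT along the time axis yields, per pixel, the
        # strength of each temporal frequency in the history.
        fft = np.fft.fft(frames, axis=0)
        frequencies = np.fft.fftfreq(len(frames), d=1.0 / fps)
        # Step 6: zero out every frequency outside the chosen band.
        in_band = ((np.abs(frequencies) >= min_hz) &
                   (np.abs(frequencies) <= max_hz))
        fft[~in_band] = 0
        # Step 7: an IFFT recomposes the filtered frequencies into a
        # series of motion maps; we keep only the real part.
        return np.fft.ifft(fft, axis=0).real

For example, to isolate motions around a resting heart rate, we might pass min_hz=1.0 and max_hz=2.0 (60 to 120 beats per minute).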
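Finally, for steps 8 and 9, the compositing might look like the following sketch, assuming a grayscale camera frame and reusing the upsample function from the earlier sketch, with an illustrative gain parameter to intensify the motion map:

    import cv2
    import numpy as np

    def composite_and_show(frame, motion_maps, num_levels, gain):
        # Step 8: upsample the newest motion map back to the camera
        # resolution, intensify it, and add it atop the original.
        motion = upsample(motion_maps[-1], num_levels)
        composite = frame.astype(np.float32) + gain * motion
        composite = np.clip(composite, 0.0, 255.0).astype(np.uint8)
        # Step 9: show the resulting composite image.
        cv2.imshow('Lazy Eyes', composite)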

There it is—a simple plan that requires a rather nuanced implementation and configuration. Let's prepare ourselves by doing a little background research.