This book will show you how to use OpenCV's Python bindings to capture video, manipulate images, and track objects with either a normal webcam or a specialized depth sensor, such as the Microsoft Kinect. OpenCV is an open source, cross-platform library that provides building blocks for computer vision experiments and applications. It provides high-level interfaces for capturing, processing, and presenting image data. For example, it abstracts details about camera hardware and array allocation. OpenCV is widely used in both academia and industry.
Today, computer vision can reach consumers in many contexts via webcams, camera phones, and gaming sensors such as the Kinect. For better or worse, people love to be on camera, and as developers, we face a demand for applications that capture images, change their appearance, and extract information from them. OpenCV's Python bindings can help us explore solutions to these requirements in a high-level language and in a standardized data format that is interoperable with scientific libraries such as NumPy and SciPy.
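As a quick illustration of that interoperability, consider the following minimal sketch (the file name photo.jpg is just a placeholder): an image loaded with OpenCV is an ordinary NumPy array, so SciPy routines can operate on it directly.

    import cv2
    import numpy
    import scipy.ndimage

    # OpenCV loads the image as a NumPy array (rows x columns x BGR channels).
    image = cv2.imread('photo.jpg')
    if image is None:
        raise IOError('Could not read photo.jpg')

    print(type(image))   # numpy.ndarray
    print(image.shape)   # for example, (480, 640, 3)
    print(image.dtype)   # uint8

    # Because the image is a NumPy array, SciPy can process it directly.
    blurred = scipy.ndimage.gaussian_filter(image, sigma=(3, 3, 0))
    cv2.imwrite('photo_blurred.jpg', blurred)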
Although OpenCV is high-level and interoperable, it is not necessarily easy for new users. Depending on your needs, OpenCV's versatility may come at the cost of a complicated setup process and some uncertainty about how to translate the available functionality into organized and optimized application code. To help you with these problems, I have endeavored to deliver a concise book with an emphasis on clean setup, clean application design, and a clear understanding of each function's purpose. I hope you will learn from this book's project, outgrow it, and still be able to reuse the development environment and parts of the modular code that we have created together.
Specifically, by the end of this book's first chapter, you will have a development environment that links Python, OpenCV, depth camera libraries (OpenNI, SensorKinect), and general-purpose scientific libraries (NumPy, SciPy). After five chapters, you will have several variations of an entertaining application that manipulates users' faces in a live camera feed. Behind this application, you will have a small library of reusable functions and classes that you can apply in your future computer vision projects. Let's look at the book's progression in more detail.
Chapter 1, Setting up OpenCV, covers the steps to set up Python, OpenCV, and related libraries on Windows, Mac, and Ubuntu. We also discuss OpenCV's community, documentation, and official code samples.
Chapter 2, Handling Files, Cameras, and GUIs, discusses OpenCV's I/O functionality. Then, using an object-oriented design, we write an application that displays a live camera feed, handles keyboard input, and writes video and still image files.
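To give a taste of what Chapter 2 builds (the chapter itself wraps this logic in an object-oriented design), a bare procedural camera loop might look like the following sketch; the camera index, key bindings, and file name are arbitrary choices for illustration:

    import cv2

    # Open the default camera (device 0; the index may differ on your system).
    capture = cv2.VideoCapture(0)

    while True:
        success, frame = capture.read()
        if not success:
            break
        cv2.imshow('Live Feed', frame)
        key = cv2.waitKey(1) & 0xFF
        if key == ord('s'):    # 's' saves a snapshot
            cv2.imwrite('snapshot.png', frame)
        elif key == 27:        # Esc quits
            break

    capture.release()
    cv2.destroyAllWindows()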
Chapter 3, Filtering Images, shows how to write image filters using OpenCV, NumPy, and SciPy. The filter effects include linear and curve-based color manipulations, blurring, sharpening, and outlining edges. We modify our application to apply some of these filters to the live camera feed.
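To give a sense of the kinds of filters involved, here is a minimal sketch (not the chapter's actual code; photo.jpg and the kernel values are placeholders) showing a blur, a sharpen, and an edge outline with standard OpenCV calls:

    import cv2
    import numpy

    image = cv2.imread('photo.jpg')

    # Blur: Gaussian smoothing with a 9x9 kernel.
    blurred = cv2.GaussianBlur(image, (9, 9), 0)

    # Sharpen: convolve with a kernel that boosts the center pixel.
    sharpen_kernel = numpy.array([[ 0, -1,  0],
                                  [-1,  5, -1],
                                  [ 0, -1,  0]], numpy.float32)
    sharpened = cv2.filter2D(image, -1, sharpen_kernel)

    # Outline edges: Canny edge detection on a grayscale copy.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)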
Chapter 4, Tracking Faces with Haar Cascades, covers writing a hierarchical face tracker that uses OpenCV to locate faces, eyes, noses, and mouths in an image. We also write functions for copying and resizing regions of an image. We modify our application so that it finds and manipulates faces in the camera feed.
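For orientation, detecting faces with a Haar cascade boils down to a few calls like the following sketch; the cascade file path, image name, and detection parameters are placeholders that you would adjust for your own installation and data:

    import cv2

    # Load a pretrained frontal-face cascade (the path depends on your installation).
    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

    image = cv2.imread('group_photo.jpg')
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Detect faces; the parameters trade off speed against sensitivity.
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

    # Draw a rectangle around each detected face.
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)

    cv2.imwrite('faces_found.jpg', image)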
Chapter 5, Detecting Foreground/Background Regions and Depth, introduces the types of data that OpenCV can capture from depth cameras (with the support of OpenNI and SensorKinect). Then, we write functions that use such data to limit an effect to a foreground region. We incorporate this functionality in our application so that we can further refine the face regions before manipulating them.
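The core idea of limiting an effect to a foreground region can be sketched with plain NumPy, assuming a depth map has already been captured (the arrays and the one-meter threshold below are fabricated purely for illustration):

    import numpy

    # A depth map as it might arrive from a depth camera: a 2D array of
    # distances in millimeters (fabricated here for illustration).
    depth = numpy.random.randint(400, 4000, size=(480, 640)).astype(numpy.uint16)

    # A matching color frame (a fabricated mid-gray image for this sketch).
    frame = numpy.full((480, 640, 3), 128, numpy.uint8)

    # Treat anything closer than 1 meter as foreground.
    foreground_mask = depth < 1000

    # Black out the background so that only the foreground region remains visible.
    frame[~foreground_mask] = 0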
Appendix A, Integrating with Pygame, shows how to modify our application to use Pygame instead of OpenCV for handling certain I/O events, since Pygame offers more diverse event-handling functionality.
Appendix B, Generating Haar Cascades for Custom Targets, examines a set of OpenCV tools that enable us to build trackers for any type of object or pattern, not just faces.