The first concept to introduce is related to images, which can be seen as a two-dimensional (2D) view of a three-dimensional (3D) world. A digital image is a numeric representation (normally binary) of a 2D image as a finite set of digital values called pixels (the concept of a pixel will be explained in detail in the Concepts of pixels, colors, channels, images, and color spaces section). The goal of computer vision is to transform this 2D data into one or more of the following:
- A new representation (for example, a new image)
- A decision (for example, performing a concrete task)
- A new result (for example, the correct classification of the image)
- Useful extracted information (for example, object detection)
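To make the idea of a digital image as a finite set of numeric pixel values more concrete, here is a minimal sketch using NumPy, the array library that OpenCV's Python bindings use to store images. The pixel values, the threshold of 128, and the "mostly bright" rule are hypothetical, chosen only to illustrate three of the outputs listed above: a new representation, extracted information, and a decision.

```python
import numpy as np

# A tiny hypothetical 4x4 grayscale image: each entry is one pixel intensity
# (0 = black, 255 = white), stored as 8-bit unsigned integers.
image = np.array([
    [10, 180, 200, 210],
    [15, 190, 220, 230],
    [12, 170, 240, 250],
    [18, 160, 205, 215],
], dtype=np.uint8)

print(image.shape)  # (4, 4): rows x columns

# New representation: a binary image where pixels above the (hypothetical)
# threshold become 255 and all other pixels become 0.
threshold = 128
binary = np.where(image > threshold, 255, 0).astype(np.uint8)

# Extracted information: the fraction of bright pixels in the image.
bright_ratio = np.count_nonzero(binary) / binary.size

# Decision: a toy rule that labels the whole image as mostly bright or mostly dark.
decision = "mostly bright" if bright_ratio > 0.5 else "mostly dark"

print(binary)
print(bright_ratio, decision)
```

Real systems replace the fixed threshold and toy rule with far more robust techniques, but the input is always the same kind of object: an array of numbers.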
When applying image-processing techniques, computer vision has to deal with several common problems (or difficulties):
- Images can be ambiguous because of perspective, which can change the visual appearance of the image. For example, the same object viewed from different perspectives can result in different images.
- Images are commonly affected by many factors, such as illumination, weather, reflections, and movement.
- Objects in the image may also be occluded by other objects, making the occluded ones difficult to detect or classify. Depending on the level of occlusion, the required task (for example, classifying an image into some predefined categories) can be very challenging.
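The illumination problem can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical 3x3 grayscale patch and simulating brighter lighting by scaling every intensity by 1.5 (a simplification; real lighting changes are rarely this uniform). It shows that the same content produces very different pixel values, so a naive pixel-by-pixel comparison would report a large difference, even though the relative ordering of intensities is unchanged.

```python
import numpy as np

# Hypothetical 3x3 grayscale patch of an object under normal lighting.
scene = np.array([
    [50, 60, 70],
    [80, 90, 100],
    [110, 120, 130],
], dtype=np.uint8)

# The same patch under brighter lighting: scale every intensity by 1.5 and
# clip to the valid 8-bit range [0, 255] before converting back to uint8.
brighter = np.clip(scene.astype(np.float64) * 1.5, 0, 255).astype(np.uint8)

# Naive comparison: mean absolute difference between corresponding pixels.
# Signed integers avoid uint8 wrap-around when subtracting.
mad = np.mean(np.abs(scene.astype(np.int32) - brighter.astype(np.int32)))

# The content is identical: sorting the pixels by intensity gives the same
# order in both images, because the uniform scaling is monotonic.
same_order = np.array_equal(np.argsort(scene, axis=None),
                            np.argsort(brighter, axis=None))

print(brighter)
print(mad, same_order)
```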
To put all of these difficulties together, imagine that you want to develop a face-detection system. This system should be robust enough to deal with changes in illumination or weather conditions. Additionally, the system should handle head movements, and even the fact that the user can be farther from or closer to the camera. It should be able to detect the user's head with some degree of rotation about every axis (yaw, roll, and pitch). For example, many face-detection algorithms perform well when the head is near-frontal, but they fail to detect a face that is not frontal (for example, a face in profile). Moreover, you may want to detect the face even if the user is wearing glasses or sunglasses, which occlude the eye region. When developing a computer vision project, you must take all of these factors into consideration. A good approach is to validate your algorithm on many test images that incorporate these difficulties. You can also classify your test images according to the main difficulty each one presents, so that you can easily detect the weak points of your algorithm.
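Classifying test images by their main difficulty pays off when you summarize the results per category. The following is a minimal sketch, assuming hypothetical file names, difficulty labels, and detector outcomes (no real detector is run here); it computes the detection rate for each difficulty and lists the categories from worst to best, so the weakest points of the algorithm appear first.

```python
from collections import defaultdict

# Hypothetical test results for a face detector: each entry records the test
# image, the main difficulty it was labeled with, and whether a face was found.
results = [
    ("img_001.png", "none", True),
    ("img_002.png", "none", True),
    ("img_003.png", "profile", False),
    ("img_004.png", "profile", True),
    ("img_005.png", "low_light", False),
    ("img_006.png", "sunglasses", False),
    ("img_007.png", "sunglasses", True),
    ("img_008.png", "low_light", False),
]


def detection_rate_by_difficulty(results):
    """Return {difficulty: fraction of images in which a face was detected}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for _name, difficulty, detected in results:
        totals[difficulty] += 1
        if detected:
            hits[difficulty] += 1
    return {d: hits[d] / totals[d] for d in totals}


rates = detection_rate_by_difficulty(results)

# Sort from lowest to highest detection rate so the weakest categories come first.
for difficulty, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{difficulty:>10}: {rate:.0%}")
```

With these made-up results, low-light images stand out as the weakest category, which would suggest where to focus further work.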