Image processing includes the following three steps:
- Get the image to work with. This step usually involves functions that read the image from different sources (camera, video stream, disk, or online resources).
- Process the image by applying image-processing techniques to achieve the required functionality (for example, detecting a cat in an image).
- Show the result of the processing step (for example, drawing a bounding box in the image and then saving it to disk).
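The three steps above can be sketched as a tiny pipeline. This is a minimal illustration only: the function names (`get_image`, `process_image`, `show_result`) are hypothetical, the image is a hand-written 4x4 grayscale grid rather than a real camera frame or file, and a fixed threshold stands in for a real processing technique. In practice, a library such as OpenCV would typically handle reading, processing, and displaying.

```python
def get_image():
    """Step 1: obtain an image.

    A tiny synthetic grayscale image (values 0-255) stands in for a
    camera frame, a video frame, or a file read from disk.
    """
    return [
        [10, 10, 10, 10],
        [10, 200, 220, 10],
        [10, 210, 230, 10],
        [10, 10, 10, 10],
    ]


def process_image(image, threshold=128):
    """Step 2: process the image.

    A fixed threshold marks bright pixels as 1 and all others as 0.
    """
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def show_result(mask):
    """Step 3: present the result.

    The mask is rendered as text, '#' for 1 and '.' for 0.
    """
    return "\n".join("".join("#" if v else "." for v in row) for row in mask)


image = get_image()
mask = process_image(image)
rendered = show_result(mask)
print(rendered)
```

Keeping each step in its own function makes it easy to swap the source (for example, a file instead of a synthetic grid) without touching the processing or the display code.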
Furthermore, step two can be broken down into three processing levels:
- Low-level process
- Mid-level process
- High-level process
The low-level process usually takes an image as the input and then outputs another image. Example procedures that can be applied in this step include the following:
- Noise removal
- Image sharpening
- Illumination normalization
- Perspective correction
For example, in a face-detection application, the output could be an illumination-normalized image that compensates for changes caused by sun reflections.
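As a rough illustration of a low-level process (image in, image out), the sketch below applies min-max contrast stretching, which is only one simple way to normalize illumination and is not necessarily the technique meant here. The function name `normalize_illumination` and the sample pixel values are assumptions made for the example.

```python
def normalize_illumination(image, out_min=0, out_max=255):
    """Stretch pixel intensities linearly so they span [out_min, out_max].

    The input and the output are both images (lists of rows of integers),
    which is what characterizes a low-level process.
    """
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch.
        return [[out_min for _ in row] for row in image]
    span = out_max - out_min
    return [
        [round(out_min + (pixel - lo) * span / (hi - lo)) for pixel in row]
        for row in image
    ]


# A dim, low-contrast 2x2 image (made-up values).
dim = [[50, 60], [70, 100]]
normalized = normalize_illumination(dim)
print(normalized)
```

After stretching, the darkest input pixel maps to 0 and the brightest to 255, so images captured under different lighting become more comparable.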
The mid-level process takes the preprocessed image and outputs some kind of representation of it. Think of this as a collection of numbers (for example, a vector containing 100 numbers) that summarizes the main information in the image for further processing. In the face-detection example, the output could be a rectangle containing the detected face, defined by a point (x, y), a width, and a height.
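To make the idea of a mid-level representation concrete, the following sketch reduces an image to four numbers: the (x, y, width, height) of the smallest rectangle enclosing all nonzero pixels of a binary mask. It assumes that an earlier step has already marked the region of interest with 1s. The function name `bounding_rectangle` and the sample mask are hypothetical, and this is not a face detector.

```python
def bounding_rectangle(mask):
    """Return (x, y, width, height) of the smallest rectangle that
    encloses every nonzero pixel, or None if the mask is empty.

    x grows to the right and y grows downward, with (0, 0) at the
    top-left corner of the image.
    """
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x0, y0 = min(xs), min(ys)
    return (x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1)


# Made-up binary mask: 1 marks pixels belonging to the detected region.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
rect = bounding_rectangle(mask)
print(rect)
```

The image itself is no longer needed by later stages for localization; the four numbers summarize where the region is.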
The high-level process takes this vector of numbers (usually called attributes) and outputs the final result. For example, the input could be the detected face and the output could be the result of one of the following tasks:
- Face recognition
- Emotion recognition
- Drowsiness and distraction detection
- Remote heart rate measurement from the face
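A high-level process can be sketched as a function that maps an attribute vector to a final label. The example below uses a nearest-prototype rule with Euclidean distance, chosen purely for brevity; real systems for tasks such as emotion recognition generally use trained models. The three-number attribute vectors, the labels, and the function name `classify` are all made up for illustration.

```python
import math


def classify(attributes, prototypes):
    """Return the label whose prototype vector lies closest to
    `attributes`, using Euclidean distance."""
    return min(
        prototypes,
        key=lambda label: math.dist(attributes, prototypes[label]),
    )


# Hypothetical prototype attribute vectors, one per label.
prototypes = {
    "neutral": [0.1, 0.1, 0.1],
    "happy": [0.9, 0.2, 0.1],
    "surprised": [0.2, 0.9, 0.8],
}

# Hypothetical attributes produced by a mid-level process for one face.
label = classify([0.8, 0.3, 0.2], prototypes)
print(label)
```

The input is a short vector of numbers rather than an image, and the output is a decision rather than another image or vector.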