To make sure there is at least a one-second gap between collecting new faces, we need to measure how much time has passed. This is done as follows:
// Check how long since the previous face was added.
double current_time = (double)getTickCount();
double timeDiff_seconds = (current_time - old_time) / getTickFrequency();
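Note that old_time must hold a valid timestamp before the first comparison. Here is a minimal sketch, assuming it is initialized once before the main camera loop (it is updated again whenever a face is collected, as shown later in this section):
// Hypothetical setup: initialize the timer once, before the camera loop,
// so the very first time difference is meaningful.
double old_time = (double)getTickCount();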
To compare the similarity of two images, pixel by pixel, you can compute the relative L2 error: subtract one image from the other, sum the squared values of the result, and then take the square root of that sum. So if the person has not moved at all, subtracting the current face from the previous face should give a very low number at each pixel, but if they have moved even slightly in any direction, subtracting the pixels will give large numbers, and so the L2 error will be high. Since the result is summed over all pixels, the value depends on the image resolution, so to get a mean error we should divide this value by the total number of pixels in the image. Let's put this in a handy function, getSimilarity(), as follows:
double getSimilarity(const Mat& A, const Mat& B) {
    // Calculate the L2 relative error between the 2 images.
    double errorL2 = norm(A, B, CV_L2);
    // Scale the value, since the L2 error is summed across all pixels.
    double similarity = errorL2 / (double)(A.rows * A.cols);
    return similarity;
}
...
// Check if this face looks different from the previous face.
double imageDiff = DBL_MAX;    // DBL_MAX comes from <cfloat>.
if (old_prepreprocessedFace.data) {
    imageDiff = getSimilarity(preprocessedFace, old_prepreprocessedFace);
}
This similarity will often be less than 0.2 if the image did not move much, and higher than 0.4 if it did move, so let's use 0.3 as our threshold for collecting a new face.
There are many tricks we can use to obtain more training data, such as using mirrored faces, adding random noise, shifting the face by a few pixels, scaling the face by a percentage, or rotating the face by a few degrees (even though we specifically tried to remove these effects when preprocessing the face!). Let's add mirrored faces to the training set, so that we have a larger training set and also reduce the problems caused by asymmetrical faces, or by a user who is always oriented slightly to the left or right during training but not during testing (a sketch of some of the other tricks follows the code below). This is done as follows:
// Only process the face if it's noticeably different from the
// previous frame and there has been a noticeable time gap.
if ((imageDiff > 0.3) && (timeDiff_seconds > 1.0)) {
    // Also add the mirror image to the training set.
    Mat mirroredFace;
    flip(preprocessedFace, mirroredFace, 1);
    // Add the face & mirrored face to the detected face lists.
    preprocessedFaces.push_back(preprocessedFace);
    preprocessedFaces.push_back(mirroredFace);
    faceLabels.push_back(m_selectedPerson);
    faceLabels.push_back(m_selectedPerson);
    // Keep a copy of the processed face,
    // to compare on the next iteration.
    old_prepreprocessedFace = preprocessedFace;
    old_time = current_time;
}
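If you also want to experiment with the other tricks mentioned earlier, here is a minimal sketch of two of them, adding mild Gaussian noise and shifting the face by a couple of pixels. This is not from the original code; the stddev of 10 and the 2-pixel shift are arbitrary illustrative values:
// Hypothetical sketch: create two extra training samples per face.
Mat floatFace, noisyFace, shiftedFace;
// Add Gaussian noise (mean 0, stddev 10) in floating point, then
// convert back to 8-bit, which saturate_casts values to [0,255].
preprocessedFace.convertTo(floatFace, CV_32F);
Mat noise(floatFace.size(), CV_32F);
randn(noise, Scalar::all(0), Scalar::all(10));
Mat noisyFloat = floatFace + noise;
noisyFloat.convertTo(noisyFace, CV_8U);
// Shift the face 2 pixels to the right (edges are padded with black).
Mat shiftM = (Mat_<float>(2,3) << 1, 0, 2,
                                  0, 1, 0);
warpAffine(preprocessedFace, shiftedFace, shiftM, preprocessedFace.size());
// Add both new samples with the same person's label.
preprocessedFaces.push_back(noisyFace);
preprocessedFaces.push_back(shiftedFace);
faceLabels.push_back(m_selectedPerson);
faceLabels.push_back(m_selectedPerson);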
This will collect the std::vector arrays preprocessedFaces and faceLabels, storing each preprocessed face along with the label or ID number of that person (assuming it is in the integer m_selectedPerson variable).
To make it more obvious to the user that we have added their current face to the collection, you could provide a visual notification, either by displaying a large white rectangle over the whole image or by displaying their face for just a fraction of a second so they realize a photo was taken. With OpenCV's C++ interface, you can use the overloaded + operator of cv::Mat to add a value to every pixel in the image and have it clipped at 255 (it uses saturate_cast internally, so values don't overflow from white back to black!). Assuming displayedFrame is a copy of the color camera frame that should be shown, insert this after the preceding face-collection code:
// Get access to the face region-of-interest.
Mat displayedFaceRegion = displayedFrame(faceRect);
// Add some brightness to each pixel of the face region.
displayedFaceRegion += CV_RGB(90,90,90);
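Alternatively, if you prefer the full-frame white flash mentioned above, a minimal sketch follows. The window name "Recognizer" and the 100 ms delay are illustrative assumptions, not from the original code:
// Hypothetical alternative: briefly show an all-white frame so the
// user notices that a photo was taken.
Mat whiteFlash(displayedFrame.size(), displayedFrame.type(),
               CV_RGB(255,255,255));
imshow("Recognizer", whiteFlash);
waitKey(100);    // Display the white frame for roughly 100 milliseconds.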