Image feature matching

SfM, as presented in the last section, relies on understanding the geometric relationship between images as it pertains to the objects visible in them. We saw that we can recover the relative motion between two images, given sufficient information on how the objects in them move. The essential or fundamental matrix, which can be estimated linearly from matched image features, can be decomposed into the rotation and translation elements that define a 3D rigid transform. Thereafter, this transform can help us triangulate the 3D positions of the objects, either from the 3D-2D projection equations or from dense stereo matching over the rectified epipolar lines. It all begins with image feature matching, so we will see how to obtain robust and noise-free matching.
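As a quick reminder of that pipeline, the following is a minimal sketch built on OpenCV's calib3d module. It assumes matched pixel coordinates from two views (pointsA, pointsB) and a 3x3 camera intrinsics matrix K are already available; the function name and variables are illustrative, not part of the project code:

#include <opencv2/calib3d.hpp>
#include <vector>
using namespace cv;

void recoverStructureSketch(const std::vector<Point2f>& pointsA,
                            const std::vector<Point2f>& pointsB,
                            const Matx33d& K)
{
    // Estimate the essential matrix from the matched points with RANSAC.
    Mat inlierMask;
    Mat E = findEssentialMat(pointsA, pointsB, K, RANSAC, 0.999, 1.0, inlierMask);

    // Decompose E into the rotation and (up-to-scale) translation between the views.
    Mat R, t;
    recoverPose(E, pointsA, pointsB, K, R, t, inlierMask);

    // Build the two projection matrices and triangulate homogeneous 3D points.
    Mat P0 = Mat(K) * Mat::eye(3, 4, CV_64F);
    Mat Rt;
    hconcat(R, t, Rt);
    Mat P1 = Mat(K) * Rt;
    Mat points4D;
    triangulatePoints(P0, P1, pointsA, pointsB, points4D);
}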

OpenCV has an extensive offering of 2D feature detectors and descriptor extractors. Features are designed to be invariant to image deformations, so they can be matched across translation, rotation, scaling, and other, more complicated transformations (affine, projective) of the objects in the scene. One of the more recent additions to OpenCV's API is the AKAZE feature detector and extractor, which presents a very good compromise between speed of computation and robustness to transformation. AKAZE was shown to outperform other prominent features, such as ORB (short for Oriented FAST and Rotated BRIEF) and SURF (short for Speeded-Up Robust Features).

The following snippet will detect AKAZE key points and compute AKAZE descriptors for each of the images we collect in imagesFilenames, and save them in the keypoints and descriptors arrays, respectively:

auto detector = AKAZE::create();
auto extractor = AKAZE::create();

for (const auto& i : imagesFilenames) {
    // Detect keypoints and compute their descriptors on the grayscale image.
    Mat grayscale;
    cvtColor(images[i], grayscale, COLOR_BGR2GRAY);
    detector->detect(grayscale, keypoints[i]);
    extractor->compute(grayscale, keypoints[i], descriptors[i]);

    CV_LOG_INFO(TAG, "Found " + to_string(keypoints[i].size()) + " keypoints in " + i);
}

Note that we also convert the images to grayscale; however, this step may be omitted, and the results will not suffer.

Here's a visualization of the detected features in two adjacent images. Notice how many of them repeat; this is known as feature repeatability, which is one of the most desirable properties of a good feature extractor:
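A visualization like this can be generated with cv::drawKeypoints. The following minimal sketch reuses the images, keypoints, and filename i from the preceding snippet; the window title is illustrative:

// Draw the detected keypoints (with size and orientation) over the source image.
Mat visualization;
drawKeypoints(images[i], keypoints[i], visualization,
              Scalar::all(-1), DrawMatchesFlags::DRAW_RICH_KEYPOINTS);
imshow("AKAZE keypoints: " + i, visualization);
waitKey(0);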

Next up is matching the features between every pair of images. OpenCV provides an excellent feature matching suite. AKAZE feature descriptors are binary, meaning they cannot be regarded as binary-encoded numbers when matched; they must be compared at the bit level, with bit-wise operations. OpenCV offers the Hamming distance metric for binary feature matchers, which essentially counts the number of differing bits between two bit sequences:

vector<DMatch> matchWithRatioTest(const DescriptorMatcher& matcher,
                                  const Mat& desc1,
                                  const Mat& desc2)
{
    // Raw match
    vector< vector<DMatch> > nnMatch;
    matcher.knnMatch(desc1, desc2, nnMatch, 2);

    // Ratio test filter
    vector<DMatch> ratioMatched;
    for (size_t i = 0; i < nnMatch.size(); i++) {
        const DMatch first = nnMatch[i][0];
        const float dist1 = nnMatch[i][0].distance;
        const float dist2 = nnMatch[i][1].distance;

        if (dist1 < MATCH_RATIO_THRESHOLD * dist2) {
            ratioMatched.push_back(first);
        }
    }

    return ratioMatched;
}

The preceding function not only invokes our matcher (for example, a BFMatcher(NORM_HAMMING)) in the usual way, it also performs the ratio test. This simple test is a fundamental concept in many computer vision algorithms that rely on feature matching (such as SfM, panorama stitching, sparse tracking, and more). Instead of looking for a single match for a feature from image A in image B, we look for the two best matches in image B and make sure there is no confusion. Confusion in matching may arise if the two potential matching-feature descriptors are too similar (in terms of their distance metric) and we cannot tell which of them is the correct match for the query, so we discard them both to prevent ambiguity.
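For completeness, here is how such a matcher might be instantiated before the per-pair matching loop; this is a sketch, and only the Hamming norm is mandated by the binary AKAZE descriptors:

// Brute-force matcher with Hamming distance, suitable for binary AKAZE descriptors.
// It is reused for every image pair in the snippets that follow.
BFMatcher matcher(NORM_HAMMING);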

Next, we implement a reciprocity filter. This filter only allows feature matches that pass the ratio test both from A to B and from B to A. Essentially, this makes sure there is a one-to-one match between features in image A and those in image B: a symmetric match. The reciprocity filter removes even more ambiguity and contributes to a cleaner, more robust match:

// Match with ratio test filter
vector<DMatch> match = matchWithRatioTest(matcher, descriptors[imgi], descriptors[imgj]);

// Reciprocity test filter: keep a match only if it also appears in the reverse direction
vector<DMatch> matchRcp = matchWithRatioTest(matcher, descriptors[imgj], descriptors[imgi]);
vector<DMatch> merged;
for (const DMatch& dmrecip : matchRcp) {
    for (const DMatch& dm : match) {
        // Only accept match if 1 matches 2 AND 2 matches 1.
        if (dmrecip.queryIdx == dm.trainIdx && dmrecip.trainIdx == dm.queryIdx) {
            merged.push_back(dm);
            break;
        }
    }
}

Lastly, we apply the epipolar constraint. Every two images that have a valid rigid transformation between them comply with the epipolar constraint over their matched feature points, x'ᵀFx = 0, where F is the fundamental matrix; matches that don't satisfy this test (to within a small error) are likely incorrect and may contribute noise. We achieve this by estimating the fundamental matrix with a voting algorithm (RANSAC) and checking the ratio of inliers to outliers. We apply a threshold to discard matches with a low survival rate with respect to the original match:

// Fundamental matrix filter
vector<uint8_t> inliersMask(merged.size());
vector<Point2f> imgiPoints, imgjPoints;
for (const DMatch& m : merged) {
    imgiPoints.push_back(keypoints[imgi][m.queryIdx].pt);
    imgjPoints.push_back(keypoints[imgj][m.trainIdx].pt);
}
findFundamentalMat(imgiPoints, imgjPoints, inliersMask);

vector<DMatch> final;
for (size_t m = 0; m < merged.size(); m++) {
    if (inliersMask[m]) {
        final.push_back(merged[m]);
    }
}

if ((float)final.size() / (float)match.size() < PAIR_MATCH_SURVIVAL_RATE) {
    CV_LOG_INFO(TAG, "Final match '" + imgi + "'->'" + imgj + "' has less than "
                + to_string(PAIR_MATCH_SURVIVAL_RATE) + " inlier survival rate from original. Skip");
    continue;
}
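The call to findFundamentalMat above relies on the default RANSAC settings. If tighter geometric verification is needed, the method, reprojection threshold, and confidence can be passed explicitly; the values below are illustrative rather than taken from the original project:

// Same filter with explicit RANSAC parameters:
// a 3-pixel epipolar distance threshold and 99% confidence.
findFundamentalMat(imgiPoints, imgjPoints, inliersMask, FM_RANSAC, 3.0, 0.99);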

We can see the effect of each of the filtering steps (raw match, ratio test, reciprocity, and epipolar constraint) in the following figure:
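A side-by-side view of the surviving matches, similar to the one in the figure, can be rendered with cv::drawMatches; this is a minimal sketch for one image pair, with an illustrative window title:

// Draw the final filtered matches between the two images side by side.
Mat matchImage;
drawMatches(images[imgi], keypoints[imgi],
            images[imgj], keypoints[imgj],
            final, matchImage);
imshow("Filtered matches: " + imgi + " -> " + imgj, matchImage);
waitKey(0);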