We will write our tracker as an implementation of the Filter interface, which we created in the previous chapter. The tracker's class name will be ImageDetectionFilter. As member variables, this class has instances of FeatureDetector, DescriptorExtractor, and DescriptorMatcher, as well as several Mat instances that store image data and the intermediate or final results of the tracking calculations. Some of these results are stored because they do not change from frame to frame. Others are stored simply because this is more efficient than recreating a Mat instance for each frame. The declarations of the class and its member variables are as follows:
public class ImageDetectionFilter implements Filter {

    // The reference image (this tracker's target).
    private final Mat mReferenceImage;
    // Features of the reference image.
    private final MatOfKeyPoint mReferenceKeypoints = new MatOfKeyPoint();
    // Descriptors of the reference image's features.
    private final Mat mReferenceDescriptors = new Mat();
    // The corner coordinates of the reference image, in pixels.
    // CvType defines the color depth, number of channels, and
    // channel layout in the image. Here, each point is stored as
    // two 32-bit floats.
    private final Mat mReferenceCorners =
            new Mat(4, 1, CvType.CV_32FC2);

    // Features of the scene (the current frame).
    private final MatOfKeyPoint mSceneKeypoints = new MatOfKeyPoint();
    // Descriptors of the scene's features.
    private final Mat mSceneDescriptors = new Mat();
    // Candidate corner coordinates of the target in the scene.
    private final Mat mCandidateSceneCorners =
            new Mat(4, 1, CvType.CV_32FC2);
    // Accepted corner coordinates of the target in the scene.
    private final Mat mSceneCorners = new Mat(4, 1, CvType.CV_32FC2);
    // The accepted corner coordinates, as integers.
    private final MatOfPoint mIntSceneCorners = new MatOfPoint();

    // A grayscale version of the scene.
    private final Mat mGraySrc = new Mat();
    // Tentative matches of scene features and reference features.
    private final MatOfDMatch mMatches = new MatOfDMatch();

    // A feature detector, which finds features in images.
    private final FeatureDetector mFeatureDetector =
            FeatureDetector.create(FeatureDetector.STAR);
    // A descriptor extractor, which creates descriptors of features.
    private final DescriptorExtractor mDescriptorExtractor =
            DescriptorExtractor.create(DescriptorExtractor.FREAK);
    // A descriptor matcher, which matches features based on their
    // descriptors.
    private final DescriptorMatcher mDescriptorMatcher =
            DescriptorMatcher.create(
                    DescriptorMatcher.BRUTEFORCE_HAMMING);

    // The color of the outline drawn around the detected target.
    private final Scalar mLineColor = new Scalar(0, 255, 0);
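Note that STAR and FREAK are not the only possible choices. As a rough sketch (not part of this chapter's project), the following alternative declarations swap in ORB for both detection and description. ORB descriptors are also binary, so they work with the same BRUTEFORCE_HAMMING matcher, though the distance thresholds used later in findSceneCorners() would probably need retuning:

    // An alternative configuration (a sketch, not this chapter's code):
    // ORB detection and description. ORB descriptors are binary, like
    // FREAK, so Hamming-distance matching still applies. The distance
    // thresholds in findSceneCorners() would likely need retuning.
    private final FeatureDetector mFeatureDetector =
            FeatureDetector.create(FeatureDetector.ORB);
    private final DescriptorExtractor mDescriptorExtractor =
            DescriptorExtractor.create(DescriptorExtractor.ORB);
    private final DescriptorMatcher mDescriptorMatcher =
            DescriptorMatcher.create(
                    DescriptorMatcher.BRUTEFORCE_HAMMING);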
We want a convenient way to make an image tracker for any arbitrary image. We can package images with our app as so-called drawable resources, which can be loaded by any Android Context subclass, such as an Activity. Thus, we provide a constructor, ImageDetectionFilter(final Context context, final int referenceImageResourceID), which loads the reference image using the given Context and resource identifier. RGBA and grayscale versions of the image are stored in member variables. The image's corner points are also stored, as are its features and descriptors. The constructor's code is as follows:
public ImageDetectionFilter(final Context context,
        final int referenceImageResourceID) throws IOException {

    // Load the reference image from the app's resources.
    // It is loaded in BGR (blue, green, red) format.
    mReferenceImage = Utils.loadResource(context,
            referenceImageResourceID,
            Highgui.CV_LOAD_IMAGE_COLOR);

    // Create grayscale and RGBA versions of the reference image.
    final Mat referenceImageGray = new Mat();
    Imgproc.cvtColor(mReferenceImage, referenceImageGray,
            Imgproc.COLOR_BGR2GRAY);
    Imgproc.cvtColor(mReferenceImage, mReferenceImage,
            Imgproc.COLOR_BGR2RGBA);

    // Store the reference image's corner coordinates, in pixels.
    mReferenceCorners.put(0, 0,
            new double[] {0.0, 0.0});
    mReferenceCorners.put(1, 0,
            new double[] {referenceImageGray.cols(), 0.0});
    mReferenceCorners.put(2, 0,
            new double[] {referenceImageGray.cols(),
                    referenceImageGray.rows()});
    mReferenceCorners.put(3, 0,
            new double[] {0.0, referenceImageGray.rows()});

    // Detect the reference features and compute their descriptors.
    mFeatureDetector.detect(referenceImageGray,
            mReferenceKeypoints);
    mDescriptorExtractor.compute(referenceImageGray,
            mReferenceKeypoints, mReferenceDescriptors);
}
Recall that the Filter interface declares a method, apply(final Mat src, final Mat dst). Our implementation of this method applies the feature detector, descriptor extractor, and descriptor matcher to a grayscale version of the source image. Then, we call helper functions that find the four corners of the tracked target (if it is present) and draw a quadrilateral outline around it. The code is as follows:
@Override
public void apply(final Mat src, final Mat dst) {

    // Convert the scene to grayscale.
    Imgproc.cvtColor(src, mGraySrc, Imgproc.COLOR_RGBA2GRAY);

    // Detect the scene features, compute their descriptors, and
    // match the scene descriptors to the reference descriptors.
    mFeatureDetector.detect(mGraySrc, mSceneKeypoints);
    mDescriptorExtractor.compute(mGraySrc, mSceneKeypoints,
            mSceneDescriptors);
    mDescriptorMatcher.match(mSceneDescriptors,
            mReferenceDescriptors, mMatches);

    // Attempt to find the target's corners in the scene.
    findSceneCorners();

    // Draw the outline (or a thumbnail, if the target is lost).
    draw(src, dst);
}
The findSceneCorners() helper method is a longer block of code, but much of it simply iterates through the matches to assemble a list of the best ones. If all the matches are very bad (as indicated by a large distance value), we assume that the target is not in the scene, and we clear any previous estimate of its corner locations. If the matches are neither very bad nor very good, we assume that the target is somewhere in the scene, and we keep our previous estimate of its corner locations. This policy helps to stabilize the estimate of the corner locations. Finally, if there are at least four good matches, we find the homography and use it to update the estimated corner locations.
For a mathematical description of finding the homography, see the official OpenCV documentation at http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html?highlight=findhomography#findhomography.
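If some of the "good" matches are still outliers, the computed homography can be skewed, because the default method fits all of the supplied point pairs. As a hedged alternative sketch (not the code used in this chapter), findHomography can be asked to reject outliers with RANSAC. Here, goodReferencePoints and goodScenePoints are the MatOfPoint2f instances assembled in the implementation below, and the 10-pixel reprojection threshold is an assumed value that would need tuning:

    // A sketch of an alternative: estimate the homography with RANSAC
    // so that outlying point pairs are rejected. The reprojection
    // threshold (10 pixels) is an assumption and would need tuning.
    Mat homography = Calib3d.findHomography(
            goodReferencePoints, goodScenePoints,
            Calib3d.RANSAC, 10.0);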
The implementation of findSceneCorners() is as follows:
private void findSceneCorners() {

    List<DMatch> matchesList = mMatches.toList();

    if (matchesList.size() < 4) {
        // There are too few matches to find the homography.
        return;
    }

    List<KeyPoint> referenceKeypointsList =
            mReferenceKeypoints.toList();
    List<KeyPoint> sceneKeypointsList =
            mSceneKeypoints.toList();

    // Calculate the max and min distances between keypoints.
    double maxDist = 0.0;
    double minDist = Double.MAX_VALUE;
    for (DMatch match : matchesList) {
        double dist = match.distance;
        if (dist < minDist) {
            minDist = dist;
        }
        if (dist > maxDist) {
            maxDist = dist;
        }
    }

    // The thresholds for minDist are chosen subjectively
    // based on testing. The unit is not related to pixel
    // distances; it is related to the number of failed tests
    // for similarity between the matched descriptors.
    if (minDist > 50.0) {
        // The target is completely lost.
        // Discard any previously found corners.
        mSceneCorners.create(0, 0, mSceneCorners.type());
        return;
    } else if (minDist > 25.0) {
        // The target is lost but maybe it is still close.
        // Keep any previously found corners.
        return;
    }

    // Identify "good" keypoints based on match distance.
    ArrayList<Point> goodReferencePointsList =
            new ArrayList<Point>();
    ArrayList<Point> goodScenePointsList =
            new ArrayList<Point>();
    double maxGoodMatchDist = 1.75 * minDist;
    for (DMatch match : matchesList) {
        if (match.distance < maxGoodMatchDist) {
            goodReferencePointsList.add(
                    referenceKeypointsList.get(match.trainIdx).pt);
            goodScenePointsList.add(
                    sceneKeypointsList.get(match.queryIdx).pt);
        }
    }

    if (goodReferencePointsList.size() < 4 ||
            goodScenePointsList.size() < 4) {
        // There are too few good points to find the homography.
        return;
    }

    // Convert the good points to the MatOfPoint2f format required
    // by Calib3d.findHomography.
    MatOfPoint2f goodReferencePoints = new MatOfPoint2f();
    goodReferencePoints.fromList(goodReferencePointsList);
    MatOfPoint2f goodScenePoints = new MatOfPoint2f();
    goodScenePoints.fromList(goodScenePointsList);

    // Find the homography and use it to project the reference
    // corner coordinates into scene coordinates.
    Mat homography = Calib3d.findHomography(
            goodReferencePoints, goodScenePoints);
    Core.perspectiveTransform(mReferenceCorners,
            mCandidateSceneCorners, homography);

    // Accept the candidate corners only if they form a convex
    // quadrilateral; otherwise, keep the previous estimate.
    mCandidateSceneCorners.convertTo(mIntSceneCorners,
            CvType.CV_32S);
    if (Imgproc.isContourConvex(mIntSceneCorners)) {
        mCandidateSceneCorners.copyTo(mSceneCorners);
    }
}
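The minDist thresholds of 25.0 and 50.0 were tuned against FREAK descriptors, so they would need revisiting if you change the descriptor type. A common alternative filtering strategy, sketched below under the assumption that the matcher is still BRUTEFORCE_HAMMING, is the ratio test: ask the matcher for each scene descriptor's two nearest reference descriptors, and keep a match only when its distance is clearly smaller than the runner-up's. The 0.75 ratio here is an assumed value:

    // A sketch of an alternative filtering strategy: the ratio test.
    // For each scene descriptor, find its two nearest reference
    // descriptors and keep the best match only if it is clearly
    // better than the second best. The 0.75 ratio is an assumed value.
    List<MatOfDMatch> knnMatches = new ArrayList<MatOfDMatch>();
    mDescriptorMatcher.knnMatch(mSceneDescriptors,
            mReferenceDescriptors, knnMatches, 2);
    List<DMatch> goodMatchesList = new ArrayList<DMatch>();
    for (MatOfDMatch knnMatch : knnMatches) {
        DMatch[] pair = knnMatch.toArray();
        if (pair.length >= 2 &&
                pair[0].distance < 0.75f * pair[1].distance) {
            goodMatchesList.add(pair[0]);
        }
    }
    // goodMatchesList could then replace the distance-threshold
    // logic at the start of findSceneCorners().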
Our other helper method, draw(Mat src, Mat dst), starts by copying the source image to the destination. Then, if the target is not being tracked, we draw a thumbnail of it in a corner of the image so that the user knows what to seek. If the target is being tracked, we draw an outline around it. The code is as follows:
protected void draw(Mat src, Mat dst) {

    if (dst != src) {
        src.copyTo(dst);
    }

    if (mSceneCorners.height() < 4) {
        // The target has not been found.
        // Draw a thumbnail of the target in the upper-left
        // corner so that the user knows what it is.
        int height = mReferenceImage.height();
        int width = mReferenceImage.width();
        int maxDimension = Math.min(dst.width(),
                dst.height()) / 2;
        double aspectRatio = width / (double)height;
        if (height > width) {
            height = maxDimension;
            width = (int)(height * aspectRatio);
        } else {
            width = maxDimension;
            height = (int)(width / aspectRatio);
        }
        Mat dstROI = dst.submat(0, height, 0, width);
        Imgproc.resize(mReferenceImage, dstROI, dstROI.size(),
                0.0, 0.0, Imgproc.INTER_AREA);
        return;
    }

    // Outline the found target in green.
    Core.line(dst, new Point(mSceneCorners.get(0, 0)),
            new Point(mSceneCorners.get(1, 0)), mLineColor, 4);
    Core.line(dst, new Point(mSceneCorners.get(1, 0)),
            new Point(mSceneCorners.get(2, 0)), mLineColor, 4);
    Core.line(dst, new Point(mSceneCorners.get(2, 0)),
            new Point(mSceneCorners.get(3, 0)), mLineColor, 4);
    Core.line(dst, new Point(mSceneCorners.get(3, 0)),
            new Point(mSceneCorners.get(0, 0)), mLineColor, 4);
}
}
Although ImageDetectionFilter has a more complicated implementation than our previous filters, it still has a simple interface: just instantiate it with a Context and a drawable resource identifier, and then apply the filter to source and destination images as needed.
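For example, usage in an activity might look like the following sketch. The names CameraActivity, mImageDetectionFilter, R.drawable.starry_night, and rgba are placeholders, and the filter must be constructed only after the OpenCV library has finished loading:

    // A usage sketch; the class, field, and resource names are
    // placeholders. Construct the filter after OpenCV has loaded.
    private Filter mImageDetectionFilter;

    // ... once the OpenCV library has loaded successfully:
    try {
        mImageDetectionFilter = new ImageDetectionFilter(
                CameraActivity.this, R.drawable.starry_night);
    } catch (IOException e) {
        e.printStackTrace();
    }

    // ... for each camera frame (for example, in onCameraFrame),
    // apply the filter to the RGBA frame, in place or otherwise:
    mImageDetectionFilter.apply(rgba, rgba);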