For 3D tracking, ImageDetectionFilter needs all the same member variables as before, plus several more to store computations about the target's pose. Moreover, the class needs to implement the ARFilter interface. Let's modify ImageDetectionFilter as follows:
public class ImageDetectionFilter implements ARFilter {

    // ...

    private final MatOfDouble mDistCoeffs = new MatOfDouble(
            0.0, 0.0, 0.0, 0.0);

    private final CameraProjectionAdapter mCameraProjectionAdapter;
    private final MatOfDouble mRVec = new MatOfDouble();
    private final MatOfDouble mTVec = new MatOfDouble();
    private final MatOfDouble mRotation = new MatOfDouble();
    private final float[] mGLPose = new float[16];

    private boolean mTargetFound = false;
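For reference, the only part of the ARFilter interface that this section relies on is the pose getter. A minimal sketch of the contract might look like the following; the actual interface is defined earlier in the project, so treat this as an illustration rather than the exact definition:

```java
// Hedged sketch of the ARFilter contract; the project's actual
// interface is defined earlier and may declare additional methods.
interface ARFilter {
    // Returns the target's pose as a 4x4 column-major OpenGL matrix
    // (a float[16]), or null when the target is not being tracked.
    float[] getGLPose();
}
```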
The constructor should require an instance of CameraProjectionAdapter as an additional argument. We store it in a member variable, as seen in the following code:
public ImageDetectionFilter(final Context context,
        final int referenceImageResourceID,
        final CameraProjectionAdapter cameraProjectionAdapter)
        throws IOException {
    // ...
    mCameraProjectionAdapter = cameraProjectionAdapter;
}
To satisfy the ARFilter interface, we need to implement a getter for the OpenGL pose matrix. When the target is lost, this getter should return null because we have no valid data about the pose. We can implement the getter as follows:
@Override
public float[] getGLPose() {
    return (mTargetFound ? mGLPose : null);
}
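The null return is what lets a caller, such as the renderer, decide whether to draw anything for this frame. The following standalone sketch shows the intended usage pattern; the PoseSource interface and class names here are illustrative stand-ins, not the project's actual classes:

```java
// Hypothetical stand-in for any object exposing a getGLPose() method.
interface PoseSource {
    float[] getGLPose();
}

class PoseConsumer {
    // Returns true if there is a valid pose to draw with this frame.
    static boolean shouldDraw(PoseSource source) {
        float[] pose = source.getGLPose();
        if (pose == null) {
            // Target lost: skip drawing.
            return false;
        }
        // In a real renderer, pose would now be loaded as the OpenGL
        // model-view matrix (for example, gl.glLoadMatrixf(pose, 0)).
        return true;
    }
}
```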
Let's rename our findHomography method to findPose. To reflect this name change, the implementation of the apply method changes as follows:
@Override
public void apply(final Mat src, final Mat dst) {
Imgproc.cvtColor(src, mGraySrc, Imgproc.COLOR_RGBA2GRAY);
mFeatureDetector.detect(mGraySrc, mSceneKeypoints);
mDescriptorExtractor.compute(mGraySrc, mSceneKeypoints,
mSceneDescriptors);
mDescriptorMatcher.match(mSceneDescriptors,
mReferenceDescriptors, mMatches);
findPose();
draw(src, dst);
}
After finding keypoints, the implementation of findPose starts to differ from the old findHomography method. We convert the reference keypoints to 3D (with a z value of 0) for use in 3D computations. Then, we get an OpenCV projection matrix from our instance of CameraProjectionAdapter. Next, we solve for the target's position and rotation based on the matching keypoints and the projection. Most of the calculations are done by an OpenCV function called Calib3d.solvePnP(MatOfPoint3f objectPoints, MatOfPoint2f imagePoints, Mat cameraMatrix, MatOfDouble distCoeffs, Mat rvec, Mat tvec). This function puts the position and rotation results in two separate vectors. The y and z directions in OpenCV are inverted compared to OpenGL, so we need to multiply these components of the vectors by -1. We convert the rotation vector into a matrix using another OpenCV function called Calib3d.Rodrigues(Mat src, Mat dst). Last, we manually convert the resulting rotation matrix and position vector into a float[16] array that is appropriate for OpenGL. The code is as follows:
private void findPose() {

    // ...

    // Identify "good" keypoints based on match distance.
    List<Point3> goodReferencePointsList =
            new ArrayList<Point3>();
    ArrayList<Point> goodScenePointsList =
            new ArrayList<Point>();
    double maxGoodMatchDist = 1.75 * minDist;
    for (DMatch match : matchesList) {
        if (match.distance < maxGoodMatchDist) {
            Point point =
                    referenceKeypointsList.get(match.trainIdx).pt;
            Point3 point3 = new Point3(point.x, point.y, 0.0);
            goodReferencePointsList.add(point3);
            goodScenePointsList.add(
                    sceneKeypointsList.get(match.queryIdx).pt);
        }
    }

    if (goodReferencePointsList.size() < 4 ||
            goodScenePointsList.size() < 4) {
        // There are too few good points to find the pose.
        return;
    }

    MatOfPoint3f goodReferencePoints = new MatOfPoint3f();
    goodReferencePoints.fromList(goodReferencePointsList);
    MatOfPoint2f goodScenePoints = new MatOfPoint2f();
    goodScenePoints.fromList(goodScenePointsList);

    MatOfDouble projection =
            mCameraProjectionAdapter.getProjectionCV();
    Calib3d.solvePnP(goodReferencePoints, goodScenePoints,
            projection, mDistCoeffs, mRVec, mTVec);

    double[] rVecArray = mRVec.toArray();
    rVecArray[1] *= -1.0;
    rVecArray[2] *= -1.0;
    mRVec.fromArray(rVecArray);
    Calib3d.Rodrigues(mRVec, mRotation);

    double[] tVecArray = mTVec.toArray();

    mGLPose[0]  =  (float)mRotation.get(0, 0)[0];
    mGLPose[1]  =  (float)mRotation.get(1, 0)[0];
    mGLPose[2]  =  (float)mRotation.get(2, 0)[0];
    mGLPose[3]  =  0f;
    mGLPose[4]  =  (float)mRotation.get(0, 1)[0];
    mGLPose[5]  =  (float)mRotation.get(1, 1)[0];
    mGLPose[6]  =  (float)mRotation.get(2, 1)[0];
    mGLPose[7]  =  0f;
    mGLPose[8]  =  (float)mRotation.get(0, 2)[0];
    mGLPose[9]  =  (float)mRotation.get(1, 2)[0];
    mGLPose[10] =  (float)mRotation.get(2, 2)[0];
    mGLPose[11] =  0f;
    mGLPose[12] =  (float)tVecArray[0];
    mGLPose[13] = -(float)tVecArray[1];
    mGLPose[14] = -(float)tVecArray[2];
    mGLPose[15] =  1f;

    mTargetFound = true;
}
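The final conversion is just a column-major packing: element (row, col) of the rotation matrix goes to index (col * 4 + row) of the float[16], and the translation fills the last column with y and z negated. The following standalone sketch (a hypothetical helper, plain Java with no OpenCV) mirrors that packing so it can be checked in isolation:

```java
// Hypothetical helper mirroring the packing at the end of findPose:
// a 3x3 rotation (row-major double[3][3]) plus a translation vector
// become a 4x4 column-major OpenGL model-view matrix in a float[16].
class PoseConversion {
    static float[] toGLPose(double[][] r, double[] t) {
        float[] glPose = new float[16];
        for (int col = 0; col < 3; col++) {
            for (int row = 0; row < 3; row++) {
                // Column-major order: element (row, col) is stored
                // at index (col * 4 + row).
                glPose[col * 4 + row] = (float)r[row][col];
            }
            glPose[col * 4 + 3] = 0f;
        }
        // The translation occupies the last column; y and z are
        // negated to convert from OpenCV's coordinate convention
        // to OpenGL's.
        glPose[12] = (float)t[0];
        glPose[13] = (float)-t[1];
        glPose[14] = (float)-t[2];
        glPose[15] = 1f;
        return glPose;
    }
}
```

Note that findPose itself negates the y and z components of the rotation vector before calling Rodrigues, so in the real code only the translation still needs negating at this stage.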
Last, let's modify our draw method by removing the code that draws a green border around the tracked image. (Instead, the ARCubeRenderer class will be responsible for drawing a cube atop the tracked image.) After removing the unwanted code, we are left with the following implementation of the draw method:
protected void draw(Mat src, Mat dst) {

    if (dst != src) {
        src.copyTo(dst);
    }

    if (!mTargetFound) {
        // The target has not been found.

        // Draw a thumbnail of the target in the upper-left
        // corner so that the user knows what it is.
        int height = mReferenceImage.height();
        int width = mReferenceImage.width();
        int maxDimension = Math.min(dst.width(),
                dst.height()) / 2;
        double aspectRatio = width / (double)height;
        if (height > width) {
            height = maxDimension;
            width = (int)(height * aspectRatio);
        } else {
            width = maxDimension;
            height = (int)(width / aspectRatio);
        }
        Mat dstROI = dst.submat(0, height, 0, width);
        Imgproc.resize(mReferenceImage, dstROI, dstROI.size(),
                0.0, 0.0, Imgproc.INTER_AREA);
    }
}
}
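The thumbnail's aspect-fit arithmetic is independent of OpenCV, so it can be illustrated (and sanity-checked) in isolation. Here is a hedged, standalone sketch of the same computation; the helper class and method names are hypothetical:

```java
// Hypothetical helper mirroring the thumbnail-sizing logic in draw:
// fit a width x height image inside a square of side maxDimension,
// preserving the aspect ratio. Returns {width, height}.
class ThumbnailSize {
    static int[] fit(int width, int height, int maxDimension) {
        double aspectRatio = width / (double)height;
        if (height > width) {
            // Portrait: the height is the limiting dimension.
            height = maxDimension;
            width = (int)(height * aspectRatio);
        } else {
            // Landscape or square: the width is the limiting one.
            width = maxDimension;
            height = (int)(width / aspectRatio);
        }
        return new int[] { width, height };
    }
}
```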
Next, we look at how to render the cube with OpenGL.