Here is an exercise for sightseers. Choose a famous photo that was taken at a recognizable location, somewhere that should still look similar today. Travel to that site and explore it until you know how the photographer set up the shot. Where was the camera positioned and how was it rotated?
If you found an answer, and if you are sure of it, you must have already known which lens or zoom setting the photographer used. Without that information, you could not have narrowed down the feasible camera poses to the one, true pose.
We face a similar problem when trying to determine the pose of a photographed object relative to a monocular (single-lens) camera. To find a unique solution, we first need to know the camera's horizontal and vertical fields of view and its horizontal and vertical resolution in pixels.
Fortunately, we can get these data via the android.hardware.Camera.Parameters class. Our CameraProjectionAdapter class will allow client code to provide a Camera.Parameters object and then get a projection matrix in either OpenCV or OpenGL format.
Unfortunately, on some devices, the data provided by Camera.Parameters are misleading or just plain wrong:
On a device with a zoom lens, the horizontal and vertical fields of view may be based on the lens's widest (1x) zoom setting. For advice on finding fields of view based on the current zoom setting, see the StackOverflow thread at http://stackoverflow.com/questions/3261776/determine-angle-of-view-of-smartphone-camera.
On some devices, the fields of view are reported as 360 degrees or other invalid values. For example, the Sony Xperia Arc may report 360 degree fields of view. A rough sketch of one way to handle both problems follows.
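As a sketch under our own assumptions (this helper is not part of the book's classes, and its name and fallback value are our own inventions), we could reject invalid view angles and narrow the reported 1x angle trigonometrically for the current zoom setting, along the lines suggested in the StackOverflow thread:

    // Hypothetical helper, not one of this book's classes. It assumes the
    // reported angle corresponds to 1x zoom and narrows it for the current
    // zoom setting: tan(fov'/2) = tan(fov/2) / zoomRatio.
    static float effectiveViewAngle(Camera.Parameters parameters,
            float reportedDegrees, float fallbackDegrees) {
        // Reject obviously bogus values, such as the 360-degree
        // readings seen on some devices.
        if (reportedDegrees <= 0f || reportedDegrees >= 180f) {
            return fallbackDegrees;
        }
        float fov = reportedDegrees;
        if (parameters.isZoomSupported()) {
            // getZoomRatios() reports ratios multiplied by 100,
            // indexed by the current zoom setting.
            float zoom = parameters.getZoomRatios()
                    .get(parameters.getZoom()) / 100f;
            fov = (float)Math.toDegrees(2.0 *
                    Math.atan(Math.tan(Math.toRadians(fov) / 2.0) / zoom));
        }
        return fov;
    }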
As an alternative to relying on Camera.Parameters, we could require the user to calibrate the camera at runtime. OpenCV provides calibration functions that require the user to take pictures of a chessboard. We do not cover these functions in this book, but you can read about them in the official documentation at http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html or in other OpenCV books such as OpenCV 2 Computer Vision Application Programming Cookbook (Packt Publishing), by Robert Laganière.
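For curious readers, here is a rough sketch of what such calibration might look like with OpenCV's Calib3d class. This is our own illustration, not code from this book: the helper class, the 9 x 6 pattern size, and the assumption that several grayscale views of the chessboard have already been captured are all examples.

    import java.util.ArrayList;
    import java.util.List;

    import org.opencv.calib3d.Calib3d;
    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint2f;
    import org.opencv.core.MatOfPoint3f;
    import org.opencv.core.Point3;
    import org.opencv.core.Size;

    // Hypothetical helper, not one of this book's classes.
    public class ChessboardCalibrator {

        // Estimates the 3x3 camera matrix from several grayscale views
        // of a chessboard with 9x6 inner corners (an example size).
        public static Mat calibrate(List<Mat> grayImages) {
            final Size patternSize = new Size(9, 6);

            // Model of the board's inner corners in its own coordinate
            // space (z = 0), one unit per square; reused for every view.
            List<Point3> boardCorners = new ArrayList<Point3>();
            for (int y = 0; y < 6; y++) {
                for (int x = 0; x < 9; x++) {
                    boardCorners.add(new Point3(x, y, 0));
                }
            }
            MatOfPoint3f boardModel = new MatOfPoint3f();
            boardModel.fromList(boardCorners);

            // Keep only the views in which the full board is found.
            List<Mat> objectPoints = new ArrayList<Mat>();
            List<Mat> imagePoints = new ArrayList<Mat>();
            for (Mat gray : grayImages) {
                MatOfPoint2f corners = new MatOfPoint2f();
                if (Calib3d.findChessboardCorners(
                        gray, patternSize, corners)) {
                    imagePoints.add(corners);
                    objectPoints.add(boardModel);
                }
            }

            Mat cameraMatrix = new Mat();
            Mat distCoeffs = new Mat();
            List<Mat> rvecs = new ArrayList<Mat>();
            List<Mat> tvecs = new ArrayList<Mat>();
            Calib3d.calibrateCamera(objectPoints, imagePoints,
                    grayImages.get(0).size(), cameraMatrix, distCoeffs,
                    rvecs, tvecs);
            return cameraMatrix; // focal lengths and center, in pixels
        }
    }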
As member variables, CameraProjectionAdapter stores all the data that it needs to construct the projection matrices. It also stores the matrices themselves, along with Boolean flags to indicate whether each matrix is dirty (that is, whether it must be reconstructed the next time that client code fetches it). Let's write the following declaration of the class and member variables:
public class CameraProjectionAdapter {

    float mFOVY = 43.6f; // 30mm equivalent
    float mFOVX = 65.4f; // 30mm equivalent
    int mHeightPx = 640;
    int mWidthPx = 480;
    float mNear = 1f;
    float mFar = 10000f;

    final float[] mProjectionGL = new float[16];
    boolean mProjectionDirtyGL = true;

    MatOfDouble mProjectionCV;
    boolean mProjectionDirtyCV = true;
Note that we assume some default values, just in case the client code fails to provide a Camera.Parameters instance. Also note that the mNear and mFar variables store the near and far clipping distances, meaning that the OpenGL camera will not render anything nearer or farther than these respective distances. To set the field-of-view and resolution values, we can implement the following method, which takes a Camera.Parameters instance:
    public void setCameraParameters(Parameters parameters) {
        mFOVY = parameters.getVerticalViewAngle();
        mFOVX = parameters.getHorizontalViewAngle();

        Size pictureSize = parameters.getPictureSize();
        mHeightPx = pictureSize.height;
        mWidthPx = pictureSize.width;

        mProjectionDirtyGL = true;
        mProjectionDirtyCV = true;
    }
For the near and far clipping distances, we just need a simple setter, which we can implement as follows:
    public void setClipDistances(float near, float far) {
        mNear = near;
        mFar = far;
        mProjectionDirtyGL = true;
    }
Since the clipping distances are only relevant to OpenGL, we set the dirty flag for only the OpenGL matrix.
Next, let's consider the getter for the OpenGL projection matrix. If the matrix is dirty, we reconstruct it. For constructing a projection matrix, Android's android.opengl.Matrix class provides a function called frustumM(float[] m, int offset, float left, float right, float bottom, float top, float near, float far). The first two arguments are the array and offset where the matrix data should be stored. The rest of the arguments describe the edges of the view frustum, which is the region of space that the camera can see. Although you might be tempted to think that this region is conical, it is actually a truncated pyramid, due to the near and far clipping and the rectangular shape of the user's screen. The near clipping plane forms the frustum's small face, and the far clipping plane forms its large face.
Based on the clipping distances and the fields of view, we can find the view frustum's other measurements by simple trigonometry, as seen in the following implementation:
    public float[] getProjectionGL() {
        if (mProjectionDirtyGL) {
            // Convert each half-angle from degrees to radians and find
            // the frustum's extents at the near clipping plane.
            final float top =
                    (float)Math.tan(mFOVY * Math.PI / 360f) * mNear;
            final float right =
                    (float)Math.tan(mFOVX * Math.PI / 360f) * mNear;
            Matrix.frustumM(mProjectionGL, 0,
                    -right, right, -top, top, mNear, mFar);
            mProjectionDirtyGL = false;
        }
        return mProjectionGL;
    }
The getter for the OpenCV projection matrix is slightly more complicated because the library does not offer a similar helper function for constructing the matrix. Thus, we must understand the contents of the OpenCV projection matrix and construct it ourselves. It has the following 3 x 3 format:
focalLengthXInPixels  0                     centerXInPixels
0                     focalLengthYInPixels  centerYInPixels
0                     0                     1
For a symmetrical lens system (which ought to be the norm), the matrix format simplifies to the following:
focalLengthInPixels  0                    (0.5 * widthInPixels)
0                    focalLengthInPixels  (0.5 * heightInPixels)
0                    0                    1
Focal length is the distance between the lens system's optical center and the camera's sensor, when the lens is focused at infinity. For OpenCV's purposes, the focal length is expressed in pixel-related units. Notionally, we could attribute a physical size to a pixel by dividing the camera sensor's width or height by its horizontal or vertical resolution. However, since we do not know any physical measurements of the sensor or lens system, we instead use trigonometry to determine the pixel-related focal length: focalLengthPx = diagonalPx / (2 * tan(0.5 * diagonalFOV)), where diagonalPx is the image diagonal in pixels and diagonalFOV is the diagonal field of view. For example, with the default values above, diagonalPx = sqrt(480^2 + 640^2) = 800 and diagonalFOV is approximately 78.6 degrees, giving a focal length of roughly 489 pixels. The implementation is as follows:
    public MatOfDouble getProjectionCV() {
        if (mProjectionDirtyCV) {
            if (mProjectionCV == null) {
                mProjectionCV = new MatOfDouble();
                mProjectionCV.create(3, 3, CvType.CV_64FC1);
            }

            double diagonalPx = Math.sqrt(
                    (Math.pow(mWidthPx, 2.0) +
                            Math.pow(mHeightPx, 2.0)));
            double diagonalFOV = Math.sqrt(
                    (Math.pow(mFOVX, 2.0) +
                            Math.pow(mFOVY, 2.0)));
            // The fields of view are in degrees, so convert to radians
            // before taking the tangent.
            double focalLengthPx = diagonalPx /
                    (2.0 * Math.tan(0.5 * diagonalFOV * Math.PI / 180.0));

            mProjectionCV.put(0, 0, focalLengthPx);
            mProjectionCV.put(0, 1, 0.0);
            mProjectionCV.put(0, 2, 0.5 * mWidthPx);
            mProjectionCV.put(1, 0, 0.0);
            mProjectionCV.put(1, 1, focalLengthPx);
            mProjectionCV.put(1, 2, 0.5 * mHeightPx);
            mProjectionCV.put(2, 0, 0.0);
            mProjectionCV.put(2, 1, 0.0);
            mProjectionCV.put(2, 2, 1.0); // bottom-right element must be 1

            mProjectionDirtyCV = false;
        }
        return mProjectionCV;
    }
}
Client code can use CameraProjectionAdapter by instantiating it, calling setCameraParameters whenever the active camera changes, and calling getProjectionGL and getProjectionCV whenever a projection matrix is needed for OpenGL or OpenCV computations.
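As a rough usage sketch (our own illustration, with arbitrary clip distances), the flow might look like the following, where the Camera instance and the GL10 renderer context come from your own application code:

    // Hedged usage sketch; camera selection and the GL10 renderer
    // context (gl) are placeholders for your own application code.
    CameraProjectionAdapter projectionAdapter =
            new CameraProjectionAdapter();

    // Whenever the active camera changes:
    Camera camera = Camera.open();
    projectionAdapter.setCameraParameters(camera.getParameters());
    projectionAdapter.setClipDistances(1f, 10000f); // example distances

    // In an OpenGL ES 1.x renderer, for example in
    // onSurfaceChanged(GL10 gl, ...):
    gl.glMatrixMode(GL10.GL_PROJECTION);
    gl.glLoadMatrixf(projectionAdapter.getProjectionGL(), 0);

    // For OpenCV computations, such as pose estimation with
    // Calib3d.solvePnP, fetch the 3x3 camera matrix:
    MatOfDouble cameraMatrix = projectionAdapter.getProjectionCV();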