Chapter 12. Projection and 3D Vision

In this chapter we'll move into three-dimensional vision, first with projections and then with multicamera stereo depth perception. To do this, we'll have to carry along some of the concepts from Chapter 11. We'll need the camera intrinsics matrix M, the distortion coefficients, the rotation matrix R, the translation vector T, and especially the homography matrix H.

We'll start by discussing projection into the 3D world using a calibrated camera and reviewing affine and projective transforms (which we first encountered in Chapter 6); then we'll move on to an example of how to get a bird's-eye view of a ground plane. We'll also discuss POSIT, an algorithm that allows us to find the 3D pose (position and rotation) of a known 3D object in an image.

We will then move into the three-dimensional geometry of multiple images. In general, there is no reliable way to do calibration or to extract 3D information without multiple images. The most obvious case in which we use multiple images to reconstruct a three-dimensional scene is stereo vision, in which features in two (or more) images taken at the same time from separate cameras are matched and the differences in their image locations are analyzed to yield depth information. Another case is structure from motion, in which we may have only a single camera but multiple images taken at different times and from different places. In the former case we are primarily interested in disparity effects (triangulation) as a means of computing distance; in the latter we compute something called the fundamental matrix (which relates two different views of the scene to each other) as the source of our scene understanding.
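
As a preview of where we are headed, the fundamental matrix F encodes a single, simple constraint. If p_l and p_r are corresponding points in the left and right images, expressed in homogeneous pixel coordinates, then:

$$ p_r^\top \, F \, p_l = 0 $$

We will derive this relation, and see how to compute F, when we get to stereo imaging later in this chapter. For now, let's get started with projection.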

Once we have calibrated the camera (see Chapter 11), it is possible to unambiguously project points in the physical world to points in the image. This means that, given a location in the three-dimensional physical coordinate frame attached to the camera, we can compute where on the imager, in pixel coordinates, an external 3D point should appear. This transformation is accomplished by the OpenCV routine cvProjectPoints2().

void cvProjectPoints2(
    const CvMat* object_points,
    const CvMat* rotation_vector,
    const CvMat* translation_vector,
    const CvMat* intrinsic_matrix,
    const CvMat* distortion_coeffs,
    CvMat*       image_points,
    CvMat*       dpdrot          = NULL,
    CvMat*       dpdt            = NULL,
    CvMat*       dpdf            = NULL,
    CvMat*       dpdc            = NULL,
    CvMat*       dpddist         = NULL,
    double       aspectRatio     = 0
);

At first glance the number of arguments might be a little intimidating, but in fact this is a simple function to use. The cvProjectPoints2() routine was designed to accommodate the (very common) circumstance in which the points you want to project lie on some rigid body. In that case, it is natural to represent the points not as a list of locations in the camera coordinate system but rather as a list of locations in the object's own body-centered coordinate system; a rotation and a translation then specify the relationship between the object coordinates and the camera's coordinate system. In fact, cvProjectPoints2() is used internally by cvCalibrateCamera2(), which organizes its own computation in exactly this way. All of the optional arguments exist primarily for use by cvCalibrateCamera2(), but sophisticated users might find them handy for their own purposes as well.
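
The geometry behind this is the same pinhole model developed in Chapter 11. A point P = (X, Y, Z) given in the object's own coordinate system is first mapped into the camera frame by the rotation and translation, and then projected through the intrinsics matrix M. Ignoring lens distortion (which, in the model of Chapter 11, is applied to the normalized coordinates before multiplying by M), the mapping is:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= M \, \bigl[\, R \;\big|\; T \,\bigr]
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$

Here R is the 3-by-3 rotation matrix corresponding to the Rodrigues rotation vector, T is the translation vector, and s is the scale factor that makes the third component of the left-hand side equal to 1; (u, v) is then the resulting pixel location.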

The first argument, object_points, is the list of points you want projected; it is just an N-by-3 matrix containing the point locations. You can give these in the object's own local coordinate system and then provide the 3-by-1 matrices rotation_vector [190] and translation_vector to relate the two coordinate systems. If, in your particular context, it is easier to work directly in camera coordinates, then you can simply give object_points in that system and set both rotation_vector and translation_vector to contain 0s. [191]
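
For example, if your points are already expressed in the camera's coordinate frame, you can zero out the pose like this (a minimal sketch, assuming single-precision CV_32FC1 matrices):

// Points are already in the camera frame, so use an "identity" pose:
// a zero-magnitude Rodrigues vector and a zero translation.
CvMat* rotation_vector    = cvCreateMat( 3, 1, CV_32FC1 );
CvMat* translation_vector = cvCreateMat( 3, 1, CV_32FC1 );
cvSetZero( rotation_vector );     // zero magnitude means "no rotation"
cvSetZero( translation_vector );  // no offset between the two frames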

The intrinsic_matrix and distortion_coeffs arguments are just the camera intrinsics matrix and the distortion coefficients returned by cvCalibrateCamera2(), as discussed in Chapter 11. The image_points argument is an N-by-2 matrix into which the results of the computation will be written.

Finally, the long list of optional arguments dpdrot, dpdt, dpdf, dpdc, and dpddist are all Jacobian matrices of partial derivatives; they relate the computed image points to each of the different input parameters. Because each of the N projected points contributes two pixel coordinates, each of these matrices has 2N rows. In particular:

- dpdrot is a 2N-by-3 matrix of partial derivatives of the image points with respect to the components of the rotation vector;
- dpdt is a 2N-by-3 matrix of partial derivatives of the image points with respect to the components of the translation vector;
- dpdf is a 2N-by-2 matrix of partial derivatives of the image points with respect to fx and fy;
- dpdc is a 2N-by-2 matrix of partial derivatives of the image points with respect to cx and cy;
- dpddist is a 2N-by-4 matrix of partial derivatives of the image points with respect to the distortion coefficients.

In most cases you will just leave these as NULL, in which case they will not be computed. The last parameter, aspectRatio, is also optional; it matters only when the aspect ratio fx/fy was held fixed in cvCalibrateCamera2() or cvStereoCalibrate(). If this parameter is not 0, then the derivatives in dpdf are adjusted accordingly.
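
To make the calling convention concrete, here is a minimal sketch that projects the four corners of a hypothetical 10-centimeter square held half a meter in front of the camera. The intrinsics and (zero) distortion values below are made-up placeholders standing in for real results from cvCalibrateCamera2():

#include <cv.h>
#include <stdio.h>

int main( void ) {
    // Corners of a hypothetical 10cm square in the object's own frame (meters).
    float corners[12] = {
        0.0f, 0.0f, 0.0f,
        0.1f, 0.0f, 0.0f,
        0.1f, 0.1f, 0.0f,
        0.0f, 0.1f, 0.0f
    };
    CvMat object_points = cvMat( 4, 3, CV_32FC1, corners );

    // Placeholder pose: no rotation, square held 0.5m in front of the camera.
    float rdata[3] = { 0.0f, 0.0f, 0.0f };
    float tdata[3] = { 0.0f, 0.0f, 0.5f };
    CvMat rotation_vector    = cvMat( 3, 1, CV_32FC1, rdata );
    CvMat translation_vector = cvMat( 3, 1, CV_32FC1, tdata );

    // Placeholder intrinsics (fx = fy = 500, principal point at 320,240)
    // and zero distortion; real values come from cvCalibrateCamera2().
    float mdata[9] = {
        500.0f,   0.0f, 320.0f,
          0.0f, 500.0f, 240.0f,
          0.0f,   0.0f,   1.0f
    };
    float ddata[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    CvMat intrinsic_matrix  = cvMat( 3, 3, CV_32FC1, mdata );
    CvMat distortion_coeffs = cvMat( 4, 1, CV_32FC1, ddata );

    float idata[8];
    CvMat image_points = cvMat( 4, 2, CV_32FC1, idata );

    // Project; all of the optional Jacobian arguments are left as NULL.
    cvProjectPoints2( &object_points, &rotation_vector, &translation_vector,
                      &intrinsic_matrix, &distortion_coeffs, &image_points,
                      NULL, NULL, NULL, NULL, NULL, 0 );

    int i;
    for( i = 0; i < 4; i++ ) {
        printf( "corner %d -> (%.1f, %.1f)\n", i,
                CV_MAT_ELEM( image_points, float, i, 0 ),
                CV_MAT_ELEM( image_points, float, i, 1 ) );
    }
    return 0;
}

With the placeholder focal length of 500 pixels, the corner at (0.1, 0, 0) should land 100 pixels to the right of the principal point, at (420.0, 240.0); checking one value by hand like this is a quick sanity test when you substitute your own calibration data.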



[190] The "rotation vector" is in the usual Rodrigues representation.

[191] Remember that this rotation vector is an axis-angle representation of the rotation, so being set to all 0s means it has zero magnitude and thus "no rotation".