In this chapter we'll move into three-dimensional vision, first with projections and then with multicamera stereo depth perception. To do this, we'll have to carry along some of the concepts from Chapter 11. We'll need the camera intrinsics matrix M, the distortion coefficients, the rotation matrix R, the translation vector T, and especially the homography matrix H.
We'll start by discussing projection into the 3D world using a calibrated camera and reviewing affine and projective transforms (which we first encountered in Chapter 6); then we'll move on to an example of how to get a bird's-eye view of a ground plane. [189] We'll also discuss POSIT, an algorithm that allows us to find the 3D pose (position and rotation) of a known 3D object in an image.
We will then move into the three-dimensional geometry of multiple images. In general, there is no reliable way to do calibration or to extract 3D information without multiple images. The most obvious case in which we use multiple images to reconstruct a three-dimensional scene is stereo vision, where features in two (or more) images taken at the same time from separate cameras are matched across those images and the differences are analyzed to yield depth information. Another case is structure from motion, where we may have only a single camera but multiple images taken at different times and from different places. In the former case we are primarily interested in disparity effects (triangulation) as a means of computing distance. In the latter, we compute something called the fundamental matrix (which relates two different views of a scene) as the source of our scene understanding. Let's get started with projection.
Once we have calibrated the camera (see Chapter 11), it is possible to unambiguously project points in the physical world to points in the
image. This means that, given a location in the three-dimensional physical coordinate frame
attached to the camera, we can compute where on the imager, in pixel coordinates, an
external 3D point should appear. This transformation is accomplished by the OpenCV routine cvProjectPoints2().
void cvProjectPoints2(
    const CvMat* object_points,
    const CvMat* rotation_vector,
    const CvMat* translation_vector,
    const CvMat* intrinsic_matrix,
    const CvMat* distortion_coeffs,
    CvMat*       image_points,
    CvMat*       dpdrot      = NULL,
    CvMat*       dpdt        = NULL,
    CvMat*       dpdf        = NULL,
    CvMat*       dpdc        = NULL,
    CvMat*       dpddist     = NULL,
    double       aspectRatio = 0
);
At first glance the number of arguments might be a little intimidating, but in fact this
is a simple function to use. The cvProjectPoints2()
routine was designed to accommodate the (very common) circumstance where the points you want
to project are located on some rigid body. In this case, it is natural to represent the
points not as just a list of locations in the camera coordinate system but rather as a list
of locations in the object's own body-centered coordinate system; then we can add a rotation
and a translation to specify the relationship between the object coordinates and the
camera's coordinate system. In fact, this is exactly how cvCalibrateCamera2() organizes its own internal operation: cvProjectPoints2() is used internally there. All of the optional arguments are there primarily for use by cvCalibrateCamera2(), but sophisticated users might find them handy for their own purposes as well.
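For reference, the projection being computed here is the pinhole model from Chapter 11, written below ignoring lens distortion for brevity (the function itself does apply the distortion model). Here R is the 3-by-3 rotation matrix obtained from rotation_vector by Rodrigues' formula, t is translation_vector, and s is an arbitrary scale factor:

s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
  = M \, [\, R \mid t \,] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
M = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}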
The first argument, object_points
, is the list of
points you want projected; it is just an N-by-3 matrix containing the
point locations. You can give these in the object's own local coordinate system and then
provide the 3-by-1 matrices rotation_vector
[190] and translation_vector
to relate the two coordinate systems. If in your particular context it is easier to work directly in the camera
coordinates, then you can just give object_points
in that
system and set both rotation_vector
and translation_vector
to contain 0s. [191]
The intrinsic_matrix
and distortion_coeffs
are just the camera intrinsic information and the distortion
coefficients that come from cvCalibrateCamera2(), as discussed in Chapter 11. The image_points
argument is an N-by-2 matrix into which the
results of the computation will be written.
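To make the calling convention concrete, here is a minimal sketch of projecting a single point that is already expressed in the camera coordinate frame (so the rotation and translation vectors are all 0s). The intrinsic values and the test point are made-up placeholders, not the output of any real calibration:

#include <cv.h>
#include <stdio.h>

// A minimal sketch: project one 3D point, given in camera coordinates,
// onto the image plane.
float object_pts[1][3] = { { 0.1f, -0.2f, 2.0f } };   // X, Y, Z in camera frame
CvMat object_points = cvMat( 1, 3, CV_32FC1, object_pts );

// Zero rotation and translation: object coordinates == camera coordinates.
float rot[3]   = { 0, 0, 0 };
float trans[3] = { 0, 0, 0 };
CvMat rotation_vector    = cvMat( 3, 1, CV_32FC1, rot );
CvMat translation_vector = cvMat( 3, 1, CV_32FC1, trans );

// Placeholder intrinsics (fx, fy, cx, cy) and zero distortion.
float intr[9] = { 500,   0, 320,
                    0, 500, 240,
                    0,   0,   1 };
float dist[4] = { 0, 0, 0, 0 };
CvMat intrinsic_matrix  = cvMat( 3, 3, CV_32FC1, intr );
CvMat distortion_coeffs = cvMat( 4, 1, CV_32FC1, dist );

// Output: an N-by-2 matrix of pixel coordinates.
float img[1][2];
CvMat image_points = cvMat( 1, 2, CV_32FC1, img );

cvProjectPoints2( &object_points, &rotation_vector, &translation_vector,
                  &intrinsic_matrix, &distortion_coeffs, &image_points );

printf( "projected to (%.1f, %.1f)\n", img[0][0], img[0][1] );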
Finally, the long list of optional arguments dpdrot, dpdt, dpdf, dpdc, and dpddist are all Jacobian matrices of partial derivatives; these matrices relate the image points to each of the different input parameters. Each has 2N rows, because every projected point contributes one row for its x-coordinate and one for its y-coordinate. In particular: dpdrot is a 2N-by-3 matrix of partial derivatives of the image points with respect to the components of the rotation vector; dpdt is a 2N-by-3 matrix of partial derivatives of the image points with respect to the components of the translation vector; dpdf is a 2N-by-2 matrix of partial derivatives of the image points with respect to fx and fy; dpdc is a 2N-by-2 matrix of partial derivatives of the image points with respect to cx and cy; and dpddist is a 2N-by-4 matrix of partial derivatives of the image points with respect to the four distortion coefficients. In most cases, you will just leave these as NULL, in which case they will not be computed.
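If you do need the Jacobians, allocate them yourself at the sizes given above and pass them in place of the NULLs. Here is a hedged sketch, continuing the hypothetical single-point setup from the previous example (so N = 1) and requesting only the rotation and translation Jacobians:

int N = 1;
CvMat* dpdrot = cvCreateMat( 2*N, 3, CV_32FC1 );  // d(image pts)/d(rotation vector)
CvMat* dpdt   = cvCreateMat( 2*N, 3, CV_32FC1 );  // d(image pts)/d(translation)

cvProjectPoints2( &object_points, &rotation_vector, &translation_vector,
                  &intrinsic_matrix, &distortion_coeffs, &image_points,
                  dpdrot, dpdt );   // remaining Jacobians default to NULL

// ... use the Jacobians ...

cvReleaseMat( &dpdrot );
cvReleaseMat( &dpdt );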
The last parameter, aspectRatio, is also optional; it is used for derivatives only when the aspect ratio is fixed in cvCalibrateCamera2() or cvStereoCalibrate(). If this parameter is not 0, then the dpdf derivatives are adjusted accordingly.
[189] This is a recurrent problem in robotics as well as many other vision applications.
[190] The "rotation vector" is in the usual Rodrigues representation.
[191] Remember that this rotation vector is an axis-angle representation of the rotation, so being set to all 0s means it has zero magnitude and thus "no rotation".