Motion templates were invented in the MIT Media Lab by Bobick and Davis [Bobick96; Davis97] and were further developed jointly with one of the authors [Davis99; Bradski00]. This more recent work forms the basis for the implementation in OpenCV.
Motion templates are an effective way to track general movement and are especially applicable to gesture recognition. Using motion templates requires a silhouette (or part of a silhouette) of an object. Object silhouettes can be obtained in a number of ways.
The simplest method of obtaining object silhouettes is to use a reasonably stationary camera and then employ frame-to-frame differencing (as discussed in Chapter 9). This will give you the moving edges of objects, which is enough to make motion templates work.
You can use chroma keying. For example, if you have a known background color such as bright green, you can simply take as foreground anything that is not bright green (see the sketch following this list of methods).
Another way (also discussed in Chapter 9) is to learn a background model from which you can isolate new foreground objects/people as silhouettes.
You can use active silhouetting techniques—for example, creating a wall of near-infrared light and having a near-infrared-sensitive camera look at the wall. Any intervening object will show up as a silhouette.
You can use thermal imagers; then any hot object (such as a face) can be taken as foreground.
Finally, you can generate silhouettes by using the segmentation techniques (e.g., pyramid segmentation or mean-shift segmentation) described in Chapter 9.
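To make the chroma-key option concrete, here is a minimal sketch using OpenCV's legacy C API. The helper's name, the BGR range taken to mean "bright green," and the surrounding setup are our own illustrative assumptions; in practice you would tune the bounds to your actual background.

#include <cv.h>

// Hypothetical helper: return an 8-bit image in which the foreground
// (anything that is not "bright green") is nonzero. The green range
// below is an assumption; tune it for your background.
IplImage* chromaKeySilhouette( IplImage* frame ) {
    IplImage* green      = cvCreateImage( cvGetSize(frame), IPL_DEPTH_8U, 1 );
    IplImage* silhouette = cvCreateImage( cvGetSize(frame), IPL_DEPTH_8U, 1 );

    // Mark pixels whose BGR values fall inside the assumed green range.
    cvInRangeS(
        frame,
        cvScalar(   0, 180,   0, 0 ),   // lower BGR bound (assumed)
        cvScalar( 120, 255, 120, 0 ),   // upper BGR bound (assumed)
        green
    );

    // The silhouette is everything that is NOT background green.
    cvNot( green, silhouette );

    cvReleaseImage( &green );
    return silhouette;
}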
For now, assume that we have a good, segmented object silhouette as represented by the white rectangle of Figure 10-13(A). Here we use white to indicate that all the pixels are set to the floating-point value of the most recent system time stamp. As the rectangle moves, new silhouettes are captured and overlaid with the (new) current time stamp; the new silhouette is the white rectangle of Figure 10-13(B) and Figure 10-13(C). Older motions are shown in Figure 10-13 as successively darker rectangles. These sequentially fading silhouettes record the history of previous movement and thus are referred to as the "motion history image".
Figure 10-13. Motion template diagram: (A) a segmented object at the current time stamp (white); (B) at the next time step, the object moves and is marked with the (new) current time stamp, leaving the older segmentation boundary behind; (C) at the next time step, the object moves further, leaving older segmentations as successively darker rectangles whose sequence of encoded motion yields the motion history image
Silhouettes whose time stamp is more than a specified duration older than the current system time stamp are set to 0, as shown in Figure 10-14. The OpenCV function that accomplishes this motion template construction is cvUpdateMotionHistory():
void cvUpdateMotionHistory(
    const CvArr* silhouette,
    CvArr*       mhi,
    double       timestamp,
    double       duration
);
Figure 10-14. Motion template silhouettes for two moving objects (left); silhouettes older than a specified duration are set to 0 (right)
In cvUpdateMotionHistory(), all image arrays consist of single-channel images. The silhouette image is a byte image in which nonzero pixels represent the most recent segmentation silhouette of the foreground object. The mhi image is a floating-point image that represents the motion template (aka motion history image). Here timestamp is the current system time (typically a millisecond count) and duration, as just described, sets how long motion history pixels are allowed to remain in the mhi. In other words, any mhi pixels that are older (less) than timestamp minus duration are set to 0.
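As a concrete illustration of this update rule, consider the following sketch. The variable names, the one-second duration, and the use of clock() as a time source are our assumptions for illustration, not part of the API:

#include <cv.h>
#include <time.h>

// Sketch: fold the current silhouette into the motion history image.
// Here 'silh' is an 8-bit binary silhouette and 'mhi' is a same-sized
// IPL_DEPTH_32F image; both names and the duration are illustrative.
void update_history( IplImage* silh, IplImage* mhi ) {
    const double MHI_DURATION = 1.0;  // keep one second of motion history
    double timestamp = (double)clock() / CLOCKS_PER_SEC;  // current time (s)

    // For each pixel: if silh != 0, mhi is set to timestamp; otherwise,
    // if mhi is older than (timestamp - MHI_DURATION), it is zeroed;
    // otherwise it is left unchanged.
    cvUpdateMotionHistory( silh, mhi, timestamp, MHI_DURATION );
}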
Once the motion template has a collection of object silhouettes overlaid in time, we can derive an indication of overall motion by taking the gradient of the mhi image. When we take these gradients (e.g., by using the Scharr or Sobel gradient functions discussed in Chapter 6), some gradients will be large and invalid. Gradients are invalid where older or inactive parts of the mhi image have been set to 0, because this zeroing produces artificially large gradients around the outer edges of the silhouettes; see Figure 10-15(A). Because we know the time-step duration with which we've been introducing new silhouettes into the mhi via cvUpdateMotionHistory(), we know how large our gradients (which are just dx and dy step derivatives) should be. We can therefore use the gradient magnitude to eliminate gradients that are too large, as in Figure 10-15(B). Finally, we can collect a measure of global motion; see Figure 10-15(C). The function that effects parts (A) and (B) of the figure is cvCalcMotionGradient():
void cvCalcMotionGradient(
    const CvArr* mhi,
    CvArr*       mask,
    CvArr*       orientation,
    double       delta1,
    double       delta2,
    int          aperture_size = 3
);
Figure 10-15. Motion gradients of the mhi image: (A) gradient magnitudes and directions; (B) large gradients are eliminated; (C) overall direction of motion is found
In cvCalcMotionGradient(), all image arrays are single-channel. The function input mhi is a floating-point motion history image, and the input variables delta1 and delta2 are (respectively) the minimal and maximal gradient magnitudes allowed. Here, the expected gradient magnitude is just the average time-stamp difference (typically a number of milliseconds) between silhouettes in successive calls to cvUpdateMotionHistory(); setting delta1 halfway below and delta2 halfway above this average value should work well. The variable aperture_size sets the width and height of the gradient operator; it can be set to -1 (the 3-by-3 CV_SCHARR gradient filter), 3 (the default 3-by-3 Sobel filter), 5 (the 5-by-5 Sobel filter), or 7 (the 7-by-7 filter). The function outputs are mask, a single-channel 8-bit image in which nonzero entries indicate where valid gradients were found, and orientation, a floating-point image that gives the gradient direction's angle at each point.
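For example, if silhouettes arrive from a 30 fps camera, the average gap between them is about 33 milliseconds, and we might bracket delta1 and delta2 around that value as in the following sketch. The frame rate, the time units, and all names here are our illustrative assumptions:

#include <cv.h>

// Sketch: compute valid motion gradients. The 0.033 s gap assumes a
// 30 fps camera; delta1/delta2 must use the same units as the time
// stamps passed to cvUpdateMotionHistory() (seconds in our sketches).
void motion_gradients( IplImage* mhi, IplImage* mask, IplImage* orient ) {
    double avg_gap = 0.033;          // mean time between silhouettes
    double delta1  = 0.5 * avg_gap;  // halfway below the average
    double delta2  = 1.5 * avg_gap;  // halfway above the average

    // 'mask' (8-bit) marks where gradients are valid; 'orient'
    // (IPL_DEPTH_32F) receives the gradient direction at each pixel.
    cvCalcMotionGradient( mhi, mask, orient, delta1, delta2, 3 );
}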
The function cvCalcGlobalOrientation() finds the overall direction of motion as the vector sum of the valid gradient directions:
double cvCalcGlobalOrientation(
    const CvArr* orientation,
    const CvArr* mask,
    const CvArr* mhi,
    double       timestamp,
    double       duration
);
When using cvCalcGlobalOrientation(), we pass in the orientation and mask images computed by cvCalcMotionGradient() along with the timestamp, duration, and resulting mhi from cvUpdateMotionHistory(); what's returned is the vector-sum global orientation, as in Figure 10-15(C). Together, timestamp and duration tell the routine how much of the motion in the mhi and orientation images to consider. One could compute the global motion from the center of mass of each of the mhi silhouettes, but summing up the precomputed motion vectors is much faster.
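A minimal sketch of this call, reusing the images from the previous steps (the names and the one-second duration are, again, our assumptions):

#include <cv.h>

// Sketch: recover the overall direction of recent motion.
// Returns an angle in degrees; names and duration are illustrative.
double global_motion_angle( IplImage* orient, IplImage* mask,
                            IplImage* mhi, double timestamp ) {
    const double MHI_DURATION = 1.0;  // consider the last second of motion
    return cvCalcGlobalOrientation( orient, mask, mhi,
                                    timestamp, MHI_DURATION );
}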
We can also isolate regions of the motion template mhi image and determine the local motion within each region, as shown in Figure 10-16. In the figure, the mhi image is scanned for current silhouette regions. When a region marked with the most current time stamp is found, the region's perimeter is searched for sufficiently recent motion (recent silhouettes) just outside that perimeter. When such motion is found, a downward-stepping flood fill is performed to isolate the local region of motion that "spilled off" the current location of the object of interest. Once found, we can calculate the local motion gradient direction in the spill-off region, then remove that region and repeat the process until all regions are found (as diagrammed in Figure 10-16).
Figure 10-16. Segmenting local regions of motion in the mhi image: (A) scan the mhi image for current silhouettes (a) and, when found, go around the perimeter looking for other recent silhouettes (b); when a recent silhouette is found, perform downward-stepping flood fills (c) to isolate local motion; (B) use the gradients found within the isolated local motion region to compute local motion; (C) remove the previously found region and search for the next current silhouette region (d), scan along it (e), and perform downward-stepping flood fill on it (f); (D) compute motion within the newly isolated region and continue the process (A)-(C) until no current silhouette remains
The function that isolates and computes local motion is cvSegmentMotion():
CvSeq* cvSegmentMotion(
    const CvArr*  mhi,
    CvArr*        seg_mask,
    CvMemStorage* storage,
    double        timestamp,
    double        seg_thresh
);
In cvSegmentMotion(), the mhi is the single-channel floating-point input. We also pass in storage, a CvMemStorage structure allocated via cvCreateMemStorage(). Another input is timestamp, the value of the most current silhouettes in the mhi from which you want to segment local motions. Finally, you must pass in seg_thresh, which is the maximum downward step (from current time to previous motion) that you'll accept as attached motion. This parameter is provided because there might be overlapping silhouettes from recent and much older motion that you don't want to connect together.
It's generally best to set seg_thresh to something like 1.5 times the average difference in silhouette time stamps. The function returns a CvSeq of CvConnectedComp structures, one for each separate motion found, which delineates the local motion regions; it also fills in seg_mask, a single-channel floating-point image in which each region of isolated motion is marked with a distinct nonzero number (a zero pixel in seg_mask indicates no motion). To compute these local motions one at a time, we call cvCalcGlobalOrientation(), using the appropriate mask region selected from the appropriate CvConnectedComp or from a particular value in the seg_mask; for example:
cvCmpS(
    seg_mask,
    [value_wanted_in_seg_mask],   // the label of the motion region to select
    [your_destination_mask],      // 8-bit mask: 255 where the label matches
    CV_CMP_EQ
);
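Putting these pieces together, a sketch of segmenting local motions and measuring each one might look like the following. Everything here, from the variable names to the use of each CvConnectedComp's value field to select its region in seg_mask, is our illustrative assumption rather than code from the OpenCV samples:

#include <cv.h>

// Sketch: segment local motions, then compute each region's direction.
// Assumes mhi, mask, orient, and segmask are preallocated single-channel
// images of equal size (segmask is IPL_DEPTH_32F); names are illustrative.
void local_motions( IplImage* mhi, IplImage* mask, IplImage* orient,
                    IplImage* segmask, double timestamp ) {
    const double MHI_DURATION = 1.0;
    double seg_thresh = 0.05;  // ~1.5x the average silhouette time gap

    CvMemStorage* storage = cvCreateMemStorage(0);
    CvSeq* seq = cvSegmentMotion( mhi, segmask, storage,
                                  timestamp, seg_thresh );

    IplImage* region = cvCreateImage( cvGetSize(mhi), IPL_DEPTH_8U, 1 );
    for( int i = 0; i < seq->total; i++ ) {
        CvConnectedComp* comp = (CvConnectedComp*)cvGetSeqElem( seq, i );

        // Select the pixels labeled with this component's value in
        // segmask, then keep only those where gradients were valid.
        cvCmpS( segmask, comp->value.val[0], region, CV_CMP_EQ );
        cvAnd( region, mask, region, NULL );

        double angle = cvCalcGlobalOrientation( orient, region, mhi,
                                                timestamp, MHI_DURATION );
        // ...use 'angle' together with comp->rect for this motion...
    }
    cvReleaseImage( &region );
    cvReleaseMemStorage( &storage );
}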
Given the discussion so far, you should now be able to understand the motempl.c example that ships with OpenCV in the …/opencv/samples/c/ directory. We will now extract and explain some key points from the update_mhi() function in motempl.c. The update_mhi() function extracts templates by thresholding frame differences and then passing the resulting silhouette to cvUpdateMotionHistory():
...
cvAbsDiff( buf[idx1], buf[idx2], silh );
cvThreshold( silh, silh, diff_threshold, 1, CV_THRESH_BINARY );
cvUpdateMotionHistory( silh, mhi, timestamp, MHI_DURATION );
...
The gradients of the resulting mhi image are then taken, and a mask of valid gradients is produced using cvCalcMotionGradient(). Then CvMemStorage is allocated (or, if it already exists, it is cleared), and the resulting local motions are segmented into CvConnectedComp structures in the CvSeq containing structure seq:
...
cvCalcMotionGradient( mhi, mask, orient, MAX_TIME_DELTA, MIN_TIME_DELTA, 3 );
if( !storage )
    storage = cvCreateMemStorage(0);
else
    cvClearMemStorage(storage);
seq = cvSegmentMotion( mhi, segmask, storage, timestamp, MAX_TIME_DELTA );
A "for" loop then iterates through the seq->total
CvConnectedComp
structures extracting bounding rectangles
for each motion. The iteration starts at -1
, which has
been designated as a special case for finding the global motion of the whole image. For the
local motion segments, small segmentation areas are first rejected and then the orientation
is calculated using cvCalcGlobalOrientation()
. Instead of
using exact masks, this routine restricts motion calculations to regions of interest (ROIs)
that bound the local motions; it then calculates where valid motion within the local ROIs
was actually found. Any such motion area that is too small is rejected. Finally, the routine
draws the motion. Examples of the output for a person flapping their arms is shown in Figure 10-17, where the output is drawn above the
raw image for four sequential frames going across in two rows. (For the full code, see
…/opencv/samples/c/motempl.c.) In the same sequence,
"Y" postures were recognized by the shape descriptors (Hu moments) discussed in Chapter 8, although the shape
recognition is not included in the samples
code.
for( i = -1; i < seq->total; i++ ) {
    if( i < 0 ) {  // case of the whole image
        // ...[does the whole image]...
    }
    else {         // i-th motion component
        comp_rect = ((CvConnectedComp*)cvGetSeqElem( seq, i ))->rect;
        // ...[reject very small components]...
    }
    // ...[set component ROI regions]...
    angle = cvCalcGlobalOrientation( orient, mask, mhi, timestamp,
                                     MHI_DURATION );
    // ...[find regions of valid motion]...
    // ...[reset ROI regions]...
    // ...[skip small valid motion regions]...
    // ...[draw the motions]...
}