Digital Compositing for Film and Video

Transforms are used to change the position, shape, size, and orientation of images and can be animated over time, as in the case of motion tracking. An understanding of the effects of pivot points and filter choices will protect you from degraded or unexpected results. The discussion of pivot points leads to some helpful techniques for the sometimes-onerous task of lining one image up onto another. And as long as we are pushing images around, a discussion of mesh warps and spline warps – as well as morphs – is in order.

14.1 Geometric Transforms

Geometric transforms are the processes of altering the size, position, shape, or orientation of an image. This might be required to resize or reposition an element to make it suitable for a shot or to animate an element over time. Geometric transforms are also used after motion tracking to animate an element to stay locked onto a moving target as well as for shot stabilizing.

14.1.1 2D Transforms

2D transforms are called “2D” because they only alter the image in two dimensions (X and Y), whereas 3D transforms put the image into a three-dimensional world within which it can be manipulated. The 2D transforms – translation, rotation, scale, skew, and corner pinning – each bring with them hazards and artifacts. Understanding the source of these artifacts and how to minimize them is an important component of all good composites.

14.1.1.1 Translation

Translation is the technical term for moving an image in X (horizontally) or Y (vertically), but may also be referred to as a “pan” or “reposition” by some less-formal software packages. Figure 14.1 shows an example of a simple image translate from left to right, which simulates a camera pan from right to left.

Figure 14.1 Image translating from left to right

Some software packages offer a “wraparound” option for their translation operations. The pixels that get pushed out of frame on one side will “wrap around” to be seen on the other side of the frame. One of the main uses of a wraparound feature is with a paint program. By blending away the seam in a wrapped-around image, a “tile” can be made that can be used repeatedly to seamlessly tile a region.

Figure 14.2 Original

Figure 14.3 Wraparound

Figure 14.4 Blended

Figure 14.5 Tiled

Figure 14.2 shows an original cloud image. In Figure 14.3 it has been translated half its width in wraparound mode which created a seam down the middle of the picture. The seam has been blended away in Figure 14.4 using a paint program then the resulting image scaled down and tiled three times horizontally in Figure 14.5. If you took the blended image in Figure 14.4 and wrapped it around vertically and painted out the horizontal seam you would have an image that could be tiled infinitely both horizontally and vertically.

14.1.1.1.1 Float vs. Integer Translation

You may find that your software offers two different translate operations, one that is integer and one that is floating point. The integer version may go by another name, such as “pixel shift”. The difference is that the integer version can only position the image on exact pixel boundaries, such as moving it by exactly 100 pixels in X. The floating-point version, however, can move the image by any fractional amount, such as 100.73 pixels in X.

The integer version should never be used in animation because the picture would “hop” from pixel to pixel with a jerky motion. For animation the floating-point version is the only choice because it can move the picture smoothly at any speed to any position. The integer version, however, is the tool of choice if you need to simply reposition a plate to a new fixed location. The integer operation will not soften the image with filtering operations like the floating-point version will. The softening of images due to filtering is discussed in Section 14.1.4: Filtering. However, if you only have a floating-point type of transform but carefully enter integer values and do not animate them you can avoid any filtering.

14.1.1.1.2 Source and Destination Movement

Figure 14.6 Source and destination type translate

With some software packages you just enter the amount to translate the image in X and Y and off it goes. We might call these “absolute” translate operations. Others have a source and destination type format, which is a “relative” translate operation. An example is shown in Figure 14.6. The relative translate operation is a bit more obtuse but can be more useful, as we will see in the image-stabilizing and difference-tracking operations later. The source and destination values work by moving the point that is located at source X,Y to the position located at destination X,Y, dragging the whole image with it, of course. In the example in Figure 14.6 the point at source location 100, 200 will be moved to destination location 150, 300. This means that the image will move 50 pixels in X and 100 pixels in Y. It is also true that if the source were 1100, 1200 and the destination were 1150, 1300 it would still move 50 pixels in X and 100 pixels in Y. In other words, the image is moved the relative distance from the source to the destination. So what good is all this source and destination relative positioning stuff?

Suppose you wanted to move an image from point A to point B. With the absolute translate you have to do the arithmetic yourself to find how much to move the image. You will get out your calculator and subtract the X position of point A from the X position of point B to get the absolute movement in X, then subtract the Y position of point A from the Y position of point B to get the absolute movement in Y. You may now enter your numbers in the absolute translate node and move your picture. With the relative translate node, you simply enter the X and Y position of point A as the source, then the X and Y position of point B as the destination, and off you go. The computer does all the arithmetic, which, I believe, is why computers were invented in the first place.

14.1.1.2 Rotation

Rotation appears to be the only image-processing operation in history to have been spared multiple names, so it will undoubtedly appear as “rotation” in your software. Most rotation operations are described in terms of degrees, so no discussion is needed here unless you would like to be reminded that 360 degrees makes a full circle. However, you may encounter a software package that refers to rotation in radians, a more sophisticated and elitist unit of rotation for the true trigonometry enthusiast.

While very handy for calculating the length of an arc section on the perimeter of a circle, radians are not an intuitive form of angular measurement when the mission is to straighten a tilted image. What you would probably really like to know is how big a radian is so that you can enter a plausible starting value and avoid whip-sawing the image around as you grope your way towards the desired angle. One radian is roughly 60 degrees. For somewhat greater precision, here is the official conversion between degrees and radians:

360 degrees = 2π radians

Not very helpful, was it. Two pi radians might be equal to 360 degrees, but you are still half a dozen calculator strokes away from any useful information. Table 14.1 contains a few pre-calculated reference points that you may actually find useful when confronted with a radian rotator:

Table 14.1 Converting degrees and radians

radians to degrees	degrees to radians
1 radian @ 57.3 degrees	10 degrees @ 0.17 radians
0.1 radian @ 5.7 degrees	90 degrees @ 1.57 radians

At least now you know that if you want to tilt your picture by a couple of degrees it will be somewhere around 0.03 radians. Since 90 degrees is about 1.57 radians, then 180 degrees must be about 3.14 radians. You are now cleared to solo with radian rotators.

14.1.1.2.1 Pivot Points

Rotate operations have a “pivot point”, the point that is the center of the rotation operation. The location of the pivot point dramatically affects the results of the transform, so it is important to know where it is, its effects on the rotation operation, and how to reposition it when necessary.

Figure 14.7 Centered pivot point

Figure 14.8 Off-center pivot point

The rotated rectangle in Figure 14.7 has its pivot point at its center so the results of the rotation are completely intuitive. In Figure 14.8, however, the pivot point is down in the lower right-hand corner and the results of the rotation operation are quite different. The degree of rotation is identical between the two rectangles, but the final position has been shifted up and to the right by comparison. Rotating an object about an off-center pivot point also displaces it in space. If the pivot point were placed far enough away from the rectangle it could have actually rotated completely out of frame!

14.1.1.3 Resize vs. Scale

Scaling, resizing, and zooming are tragically interchangeable terms in many software packages. Regardless of the terminology, there are two possible ways that these operations can behave. A “scale” or “resize” operation most often means that the size of the image changes while the composition of the picture is unchanged, like the examples in Figure 14.9. Going forward we will use the unambiguous term “resize” for this type of transform.

Figure 14.9 Image resize operation: image size changes but composition stays constant

A “zoom” or “scale” most often means that the image stays the same dimensions in X and Y, but the picture within it changes framing to become larger or smaller as though the camera were zoomed in or out like the example in Figure 14.10. Going forward we will use the term “scale” for this type of transform.

Figure 14.10 Image scale operation: image size stays constant but composition changes

14.1.1.3.1 Pivot Points

The resize operation does not have a pivot point because it simply changes the dimensions of the image in X and Y. The scale operation, however, is like rotation, in that it must have a center about which the scale operation occurs, which is also referred to as the pivot point.

Figure 14.11 Centered pivot point

Figure 14.12 Off-center pivot point

The scaled rectangle in Figure 14.11 has its pivot point at the center of the outline so the results of the scale are not surprising. In Figure 14.12, however, the pivot point is down in the lower right-hand corner and the results of the scale operation are quite different. The amount of scale is identical with Figure 14.11, but the final position of the rectangle has been shifted down and to the right. Scaling an object about an off-center pivot point also displaces it in space. Again, if the pivot point were to be placed far enough away from an object it could actually scale itself completely out of the frame.

14.1.1.4 Skew

Figure 14.13 Horizontal and vertical skews

The overall shape of an image may also be deformed with a skew or “shear”, like the examples in Figure 14.13. The skew shifts one edge of the image relative to its opposite edge: the top edge of the image relative to the bottom edge in a horizontal skew, or the left and right edges in a vertical skew. Some systems will use one edge as the pivot point, others will use the images pivot point for the skew. The skew is occasionally useful for deforming a matte or alpha channel to be used as a shadow on the ground, but corner pinning will give you more control over the shadow’s shape and allow the introduction of perspective as well.

14.1.1.5 Corner Pinning

Figure 14.14 Corner pinning examples

Corner pinning, shown in Figure 14.14, is an important image deformation tool because it allows you to arbitrarily alter the overall shape of an image. This is especially useful when the element you want to add to a shot isn’t exactly the right shape, or needs a subtle perspective shift. The image can be deformed as if it were mounted on a sheet of rubber where any or all of the four corners can be moved in any direction. The arrows in Figure 14.14 show the direction each corner point was moved in each example. The corner locations can also be animated over time so that the perspective change can follow a moving target in a motion-tracking application such as a monitor insert shot. A corner pin exactly matches the perspective shift you would get if the image were placed on a 3D card and re-photographed with a 3D camera, because 3D cameras uses a “pinhole” camera model with no lens distortion. Real cameras add lens distortion so would not exactly match a corner pin deformation.

Always keep in mind that corner pinning does not actually change the perspective in the image content itself. The only way that the perspective can really be changed is to view the original scene from a different camera position. All corner pinning can do is deform the image plane that the picture is on, which can appear as a fairly convincing perspective change – if it isn’t pushed too far.

Figure 14.15 Original image

Figure 14.16 Perspective removed

An example of a perspective change can be seen starting with Figure 14.15. The building was shot at an up-angle from a distance away so the picture has a noticeable perspective that tapers the building inwards at the top. Figure 14.16 shows the taper taken out of the building by stretching the two top corners apart with corner pinning. The perspective change makes the building appear as though it were shot from a greater distance with a longer lens. Unless you look closely, that is. There are still a few clues in the picture that reveal the camera distance if you know what to look for, but the casual observer won’t notice them, especially if the picture is only on screen briefly.

Figure 14.17 Original graphic

Figure 14.18 Original background

Figure 14.19 Corner pin perspective

Figure 14.20 Perspective graphic on background

Another application of the corner pin perspective change is to lay an element on top of a flat surface, which is illustrated from Figure 14.17 to Figure 14.20. The sign graphic in Figure 14.17 is to be added to the side of the railroad car in Figure 14.18. The sign graphic has been deformed with four-corner pinning in Figure 14.19 to match the perspective. In Figure 14.20 the sign has been comped over the railroad car and now looks like it belongs in the scene.

Corner pinning is really big in monitor insert shots. Motion tracking tools are used to track all four corners of the monitor then that tracking data is applied to the four corners of a corner pin operation. The corner pinned element will then track on the target monitor and change perspective over the length of the shot. This technique is used to place pictures on monitors and TV sets instead of trying to film them on the set.

WWW Corner Pin – this folder contains the two images shown in Figure 14.17 and Figure 14.18 above, so you too can try your hand at corner pinning the “no comping” graphic onto the railcar.

14.1.2 Managing Motion Blur

Figure 14.21 Photographic motion blur

If an object moves while the camera shutter is open the resulting image will be “smeared” in the direction of motion. This motion blur is an essential component of moving pictures because without it the motion will appear jerky with what is called “motion judder” or “motion strobing”. An object will become motion blurred if it moves fast enough but the entire frame will be motion-blurred if the camera itself is moved quickly enough.

The modern digital compositor is expected to manage the motion blur of a shot, and that can mean a couple things. If you add motion to a static element (a still image) then you will have to give it an appropriate motion blur. If you do a speed change, the motion blur from the original shot may be unsuitable for the new speed-changed version.

14.1.2.1 Transform Motion Blur

Figure 14.22 Stochastic motion blur

Figure 14.23 Multi-copy motion blur

Transform motion blur is when the transform operation that moved the item can itself add motion blur, and there are two types. Figure 14.22 illustrates a stochastic motion blur, which generates motion-blurred pixels using a random (stochastic) pattern. It turns out that this is actually more visually appealing than a perfectly uniform motion blur. The multi-copy type of motion blur in Figure 14.23 essentially just stamps a series of semi-transparent copies on top of each other. It is somewhat less convincing but computationally cheaper.

Your transform motion blur will hopefully have motion blur with two parameters you can adjust – quality and quantity. The quality parameter affects how computationally expensive the motion blur is since higher quality means more samples, which requires more computing. The quantity parameter is how “long” the motion blur is – that is, how much it smears the image. This is actually a shutter parameter because the longer you leave the shutter open the more the image moves and the more motion blur it gets. The shutter parameter may also have a “phase” option that allows you to advance or retard the shutter timing, which moves the motion blur forward or backward from its center.

14.1.2.2 Motion UV Motion Blur

Rendering a CGI object with motion blur is very expensive, so a far less expensive 2D motion-blur technique is often used instead. One of the AOVs (Arbitrary Output Variables) that the CGI renderer can generate is a motion UV pass. The idea is that for each RGB pixel rendered, a two-channel pixel is also generated that contains the information on how that pixel is moving relative to the screen. If, for example, the RGB pixel were moving to the right 1.4 pixels and up by 0.3 pixels on that frame, then the motion UV pixel would contain 1.4 in the U channel and 0.3 in the V channel. This motion UV data has several uses, and one of them is to apply an inexpensive 2D motion blur in compositing.

Figure 14.24 Original image

Figure 14.25 Motion UV pass

Figure 14.26 Motion-blurred image

Figure 14.24 shows a freshly rendered CGI character and Figure 14.25 shows the two-channel motion UV data for that frame. To make the motion data visible it has been normalized to between 0 and 1.0 and the UV data has been loaded into the viewer’s red and green channels. The motion UV data is piped to a vector-blur operation that applies the actual blur to the CGI render like the example in Figure 14.26, which is somewhat overdone for illustration purposes. Note that each part of the character is motion-blurred in a unique direction based on its own local motion. Those parts that are not moving on this frame, like the right foot and left hand, have no motion blur at all.

A true 3D motion blur would give somewhat better results but would be far more expensive and require the re-render of all of the lighting and AOV passes just to refine the motion blur, because it affects all of the lighting passes. In an age where one frame of high-end CGI for a feature film could take 24 hours PER FRAME to render this is a vastly cheaper and more practical alternative.

In this example the motion UV data from a CGI render provided the motion data to a vector blur operation but it is entirely possible to use this same technique for live action. The key is to be able to produce high quality motion data from the live action, which can be done by specialized motion analysis software.

14.1.2.3 Speed Changes

Another motion-blur management opportunity arises with speed changes, which we do often in visual effects. The basic problem is that when the clip was initially photographed the motion blur was baked into the frames. If the clip is sped up it should have more motion blur than the initial photography but if slowed down it should have less.

Figure 14.27 Original motion blur

Figure 14.28 Increased motion blur from speed change

So how do we change the motion blur of a running clip? The answer is that it is done in your own excellent speed-change software. Figure 14.27 shows the original motion blur captured during principal photography. The clip was sped up by a factor of 10 in Figure 14.28 and the speed-change software, under the nuanced guidance of a talented vfx artist, has introduced an appropriate amount of motion blur for the much faster speed. The speed-changed clip needs to be inspected at full speed by a practiced eye to make sure the new motion blur is appropriate for the new speed.

14.1.3 3D Transforms

Figure 14.29 Three-dimensional axes

3D transforms are so named because the image behaves as though it exists in three dimensions. You can think of it as though the picture were placed on a card and the card can then be rotated, translated and scaled in any direction then rendered with a new perspective. The traditional placement of the three-dimensional axes are shown in Figure 14.29. The X axis goes left to right, the Y axis up and down, and the Z axis is perpendicular to the screen, going into the picture.

If an image is translated in X it will move left and right. If it is rotated in X (rotated about the X axis) it will rotate like the example in Figure 14.30. If the image is translated in Y it will move vertically. If it is rotated in Y it will rotate like the example in Figure 14.31. When translated in Z the image will get larger or smaller like a zoom as it moves towards or away from the “camera”. When rotated around Z it appears to rotate like a conventional 2D rotation. The 3D transform node will have rotate, scale, and translate operations in the same node so that they can all be choreographed together with one set of motion curves.

Figure 14.30 Rotate in X

Figure 14.31 Rotate in Y

Figure 14.32 Rotate in Z

The time to use 3D transforms is when you want to “fly” something around the screen. Take a simple example like flying a logo in from the upper corner of the screen. The logo needs to translate from the corner of the screen to the center, and at the same time zoom from small to large. Coordinating these two movements so that they look natural together using conventional 2D scales and translations is tough. This is because the problem is in reality a three-dimensional problem and you are trying to fake it with separate two-dimensional tools. Better to use a 3D transform node to perform a true 3D move on the logo.

14.1.4 Filtering

Whenever an image goes through a transform its pixels are resampled, or “filtered” to create the pixels for the new version of the image. These filtering operations soften the image, degrading its sharpness. Therefore an understanding of how they work and which ones to use under what circumstances can be very important. Your professional compositing software will allow you to choose which filter to use for the resampling.

14.1.4.1 The Effects of Filtering

The filtering operation will soften the image because the new version is created by blending together percentages of adjacent pixels to create each new pixel. This is an irreversible degradation of the image. If you rotate an image a few degrees then rotate it back, it will not return to its original sharpness.

Figure 14.33 Original image

Figure 14.34 Rotation

Figure 14.35 Translation

An example can be seen starting with an extreme close-up of a small rectangular detail in the original image in Figure 14.33. It has a one-pixel border of anti-aliasing pixels at the outset. In Figure 14.34 the image has been rotated 5 degrees and the resampled pixels can be seen all around the perimeter. Figure 14.35 illustrates a simple floating-point translation of the original image and shows how the edge pixels have become further blurred by the filtering operation. Of course, scaling an image up softens it even more because in addition to the filtering operation you have fewer pixels to spread out over more picture space.

If you have stacked up several transforms on an image and the softening becomes objectionable, see if you have a multi-function transform operation that has all (or most) of the transforms incorporated in one. These operators concatenate all of the different transforms (rotate, scale, translate) into a single operation, so the amount of softening is greatly reduce because the image is only filtered once. Some professional software will concatenate multiple separate transform operations automatically if they are adjacent. However, if you insert a non-transform operation such as a color correction between them you will break the concatenation and suffer multiple filtering hits. Check your manual.

Simple pixel shift operations do not soften the image because they don’t filter. The pixels are just picked up and placed into their new location without any processing. Of course, this makes it an integer operation, which should never be used for animation but is fine for the overall repositioning of a background plate. The one case where filtering does not soften the image is when scaling an image down. Here the pixels are still being filtered, but they are also being squeezed down into a smaller image space so it tends to sharpen the whole picture. In film work, the scaled-down image could become so sharp that it actually needs to be softened to match the rest of the shot.

14.1.4.2 Twinkling Starfields

One situation where pixel filtering commonly causes problems is the “twinkling starfield” phenomenon. You’ve created a lovely starfield and then you go to animate it – either rotating it or perhaps translating it around – only to discover that the stars are twinkling! Their brightness is fluctuating during the move for some mysterious reason, and when the move stops, the twinkling stops.

What’s happening is that the stars are very small, only 1 or 2 pixels in size, and when they are animated they become filtered with the surrounding black pixels. If a given star were to land on an exact pixel location, it might retain its original brightness. If it landed on the “crack” between two pixels it might be averaged across them and drop to 50% brightness on each pixel. The brightness of the star then fluctuates depending on where it lands each frame, thus twinkling as it moves.

So what’s the fix? Bigger stars, I’m afraid. The basic problem is that the stars are at or very near the size of a pixel, so they become badly hammered by the filtering operation. Make a starfield that is twice the size needed for the shot so that the stars are 2 or 3 pixels in diameter. Perform the motion on the oversized starfield, then size it down for the shot. With the stars several pixels in diameter they are much more resistant to the effects of being filtered with the surrounding black.

14.1.4.3 Choosing a Filter

There are a variety of filters that have been developed for pixel resampling, each with its own virtues and vices. Some of the more common ones are listed here, but you should read your user guide to know what your software provides. The better software packages will allow you to choose the most appropriate filter for your transform operations. The internal mathematical workings of the filters are not described since that is a ponderous and ultimately unhelpful topic for the compositor. The effects of the filter and its most appropriate application is offered instead.

Bicubic – high-quality filter for scaling images up. It actually incorporates an edge-sharpening process so the scaled up image doesn’t go soft so quickly. Under some conditions the edge-sharpening operation can introduce “ringing” artifacts that degrade the results. Most appropriate use is reformatting images up in size or scaling up.

Bilinear – simple filter for scaling images up or down. Runs faster than the bicubic because it uses simpler math and has no edge-sharpening. As a result images get soft sooner. Best use is for scaling images down, since it does not sharpen edges.

Gaussian – another high-quality filter for scaling images up. It does not have an edge-sharpening process. As a result, the output images are not as sharp, but they also do not have any ringing artifacts. Most appropriate use is to substitute for the Mitchell filter when it introduces ringing artifacts.

Impulse – a.k.a. “nearest neighbor”, a very fast, very low-quality filter. Not really a filter, it just “pixel plucks” from the source image – that is, it simply selects the nearest appropriate pixel from the source image to be used in the output image. This filter is commonly used for the viewer in compositing packages because it is fast. It will look fine on photographic images (except for very fine details) but when you put in graphics the deficiencies become apparent. Most appropriate use is for quickly making lower resolution motion test of a shot.

Mitchell – a type of bicubic filter where the filtering parameters have been dialed in for the best look on most images. Also does edge sharpening, but less prone to edge artifacts as a plain bicubic filter. Most appropriate use is to replace bicubic filter when it introduces edge artifacts.

Sinc – a special high quality filter for downsizing images. Other filters tend to lose small details or introduce aliasing when scaling down. This filter retains small details with good anti-aliasing. Most appropriate use is for scaling images down or zooming out.

Lanczos – high-quality filter for sizing images up or down with sharpening. It is considered the best compromise in terms of reducing aliasing, sharpness, with minimal ringing artifacts.

Triangle – simple filter for scaling images up or down. Runs faster than the sharpening filters because it uses simpler math and has no edge-sharpening. As a result, up-sized images get soft sooner. Best use is for scaling images down quickly since it does not sharpen edges.

14.1.5 Lining Up Images

It often comes to pass that you need to precisely line up one image on top of another. Most compositing packages offer some kind of “A/B” image comparison capability that can be helpful for lineup. Two images are loaded into a display window, then you can wipe or toggle between them to check and correct their alignment. This approach can be adequate for many situations, but there are times when you really want to see both images simultaneously rather than wiping or toggling between them.

The problem becomes trying to see what you are doing, since one layer covers the other. A simple 50% dissolve between the two layers (aka “onionskin”) is hopelessly confusing to look at, and overlaying the layers while wiping between them with one hand while you nudge their positions with the other is slow and awkward. What is really needed is some way of displaying both images simultaneously that still keeps them visually distinguished while you nudge them into position. Two different lineup display methods are offered here that will help you to do just that.

14.1.5.1 Offset Mask Lineup Display

Figure 14.36 Embossed effect from offset images

The offset mask method combines two images in such a way that if the images are not perfectly aligned an embossed outline shows up like the example in Figure 14.36. The embossing marks all pixels that are not identical between the two images. If they are exactly lined up on each other the embossing disappears and it becomes a featureless gray image.

To make an offset mask simply invert one of the two images, then mix them together in equal amounts. This can be done with a dissolve node or a mix node set to 50%. To use discrete nodes, scale each image’s RGB values by 50% then add them together. A pictographic flowgraph of the whole process is shown in Figure 14.37. A good way to familiarize yourself with this procedure is to use the same image for both input images until you are comfortable that you have it set up right and can “read” the signs of misalignment. Once set up correctly, substitute the real image to be lined up for one of the inputs.

Figure 14.37 Pictographic flowgraph for making an offset mask

Once set up, the procedure is to simply reposition one of the images until the offset mask becomes a uniform gray in the region you want lined up. While this is a very precise “offset detector” that will reveal the slightest difference between the two images, it suffers from the drawback that it does not directly tell you which way to move which image to line the two up. It simply indicates that they are not perfectly aligned. Of course, you can experimen-tally slide the image in one direction, and if the embossing gets worse, go back the other way. With a little practice you should become clear which image is which.

14.1.5.2 Edge-detection Lineup Display

Lineup method number two is the edge-detection method. The beauty of this method is that it makes it perfectly clear which image to move and in what direction to achieve the lineup. The basic idea is to use an edge-detection operation on a monochrome version of the two images to make an “outline” version of each. One outline is placed in the red channel, the other in the green channel, and the blue channel is filled with black, as shown in the flowgraph in Figure 14.38. When the red and green outline images are correctly lined up on each other, the lines turn yellow. If they slide off from each other, you see red lines and green lines again which tell you exactly how far and in what direction to move in order to line them up. You will see shortly why I like to put the image that needs to be moved in the green channel.

Figure 14.38 Flowgraph of lineup edges

Examples of this lineup method can be seen in Figure 14.39 through Figure 14.42. They show the monochrome image and its edge-detection version, then how the colors change when the edges are misaligned (Figure 14.41) vs. aligned (Figure 14.42).

Figure 14.39 Grayscale image

Figure 14.40 Edge detection

Figure 14.41 Images offset

Figure 14.42 Images aligned

One of the difficulties of an image-lineup procedure is trying to stay clear in your mind as to which image is being moved and in what direction. You have an image being repositioned (the repo image) and a reference image that you are trying to line it up to. If you will put the reference image in the red channel and the repo image in the green channel like the flowgraph in Figure 14.38, then you can just remember the little mnemonic “move the green to the red”. Since the color green is associated with “go” and the color red with “stop” for traffic lights, it is easy to stay clear on what is being moved (green) vs. what is the static reference (red). This method makes it easier to see what you are doing, but it can be less precise than the offset mask method described above.

14.1.5.3 The Pivot Point Lineup Procedure

Now that we have a couple of good ways to see what we are doing when we want to line up two images, this is a good time to discuss an efficient method of achieving the actual lineup itself. The effects of the pivot-point location can be used to excellent advantage when trying to line up two elements that are dissimilar in size, position, and orientation. It can take quite a bit of trial and error in scaling, positioning, rotating, repositioning, rescaling, repositioning, etc. to get two elements lined up. By using a strategically placed pivot point and the procedure in Figure 14.43 (opposite) a complex lineup can be done quickly without all of the trial and error.

Figure 14.43 Pivot point lineup procedure

14.3 Warps and Morphs

The image warp is another one of those magical effects that only computers can do. It is used to deform an image to change its shape and they can be animated as well. There are two types of warpers: mesh warpers, which are best for a general overall deformation; and spline warpers, which are best for fine detail work.

14.3.1 Mesh Warps

Mesh warps are used to perform “non-linear” deformations on images. That is to say, instead of a simple linear (equal amount everywhere) operation such as scaling an image in X, or a skew, this process allows local deformations of small regions within the image. The corner-pinning operation described in Section 14.1.1.5 is a global operation – the corner is repositioned and the entire image is deformed accordingly. With warps, just the region of interest is deformed.

Figure 14.50 Mesh warper

Figure 14.51 The warped image

The mesh warp is one of the earliest and simplest warp technologies, illustrated in Figure 14.50. The mesh is a two-dimensional grid superimposed over the image. The intersections have control points that can be moved and adjusted to determine the desired deformation, and the connections between the points are a spline. If a straight line were used between the control points, the resulting warped image would also have straight section deformations. The splines guarantee gracefully curved deformations that blend naturally from point to point. You basically move the mesh points around to tell the machine how you want to deform the image, then the computer uses your mesh to render the warped version of the image, like the example in Figure 14.51.

Mesh warpers are easy to use but suffer from a lack of local control. Control points are evenly spaced so are not necessarily located where you need them. As a result, the best use of mesh-warping is general overall deformations such as the flag in Figure 14.51. They can be animated with keyframing, so the flag above could be animated to flutter. Mesh warpers are especially appropriate for lens distortions.

14.3.2 Spline Warps

The spline warper is the “second generation” of warpers and works on entirely different principles. It offers much more control over the warp and excellent correlation to the target image, so it is accordingly more complicated to learn and use. To do a high-quality detailed warp you need a way to precisely indicate what part of the source picture is to be affected, as well as precisely where its final destination is. Both of these functions are satisfied with a pair of connected splines: one for the source and one for the destination.

Figure 14.52 Original image

Figure 14.53 Warped image

Figure 14.54 Warp splines

Starting with the original image in Figure 14.52, it was deformed to the warped image in Figure 14.53 using just three splines – two for the eyes and one for the mouth (Figure 14.54). A white boundary spline surrounds the head, which constrains the warp affect to only affect pixels within the boundary.

There are three components to a spline warp – the source spline, the destination spline, and the correspondence points. A spline warp is started by drawing the source spline around the region of the picture to be warped, which defines what pixels are going to be affected. The source spline is shown as orange in Figure 14.54. The source spline is either copied or a second spline drawn, which becomes the destination spline, shown as light pink. The warp program will move the pixels under the source spline to the location marked by the destination spline.

The correspondence points in Figure 14.54 are marked in yellow and control the path of motion of the source pixels to their destination. Correspondence points can be added or removed and shifted around the perimeter of the splines to change the path of motion as needed. The combination of the source and destination splines plus the correspondence points tells the computer unambiguously how to deform the image. Further, the splines can be keyframed to follow a moving character, which, of course, is a critical capability for a visual effects shot.

One of the very big advantages of spline warpers is that you can place as many splines as you want in any location you want. This allows you to place control points exactly where you want them for fine detail work. The mesh warper has control points at pre-determined locations, and if that isn’t where you want them, tough pixels. You also do not waste any control points in regions of the image that are not going to be warped. The other cool thing about the spline warper is that the destination spline can be drawn on a completely separate image. This is the basis of doing a morph where you want one image to “become” a second image – which provides an elegant segue to our next topic.

14.3.3 Morphs

It takes two warps and a dissolve to make a morph. The idea is to warp image “A” to fit over image “B” which is illustrated by the 5 “A” frames in Figure 14.55. In frame 1 we are seeing the original face because the warp is “relaxed” with 0% deformation. Over the next 4 frames the warp is dialed up to 100%, as shown in frame 5 where it now fits over the unwarped image “B”.

Image “B” goes in the opposite direction by starting out 100% deformed to match the “A” image in frame 1. The image “B” warp is relaxed over the length of the shot so by frame 5 it is 0% deformed. This means that, over the length of the shot, images “A” and “B” exactly fit over each other and move in unison. This can be seen in the bottom row labeled “dx”, which is a 50% dissolve between the “A” and “B” images to show how their shapes correlate over the length of the shot.

Figure 14.55 The A and B sides of the warp

Figure 14.56 Typical morph cross-dissolve timing

After the A and B warps are prepared the last step is simply to cross-dissolve between them. A representative timing for the cross-dissolve example is shown in Figure 14.56, where roughly the first third of the morph shows the A side only, the middle third is used for the cross-dissolve, and the last third is on only the B side. Of course, your timing will vary based on your judgment of the best appearance, but this usually is a good starting point. The final results of the A and B warps with their cross-dissolve can be seen in the sequence of pictures in Figure 14.57.

Figure 14.57 Cross-dissolve added between the A and B sides of the warps to make the morph

The warping objects must be isolated from their backgrounds so they can deform without pulling the background with them. As a result, the A and B sides of the morph are often shot on greenscreen, morphed together, then the finished morph composited over a background plate. If not shot on greenscreen then they will have to be isolated by roto.

WWW Morph – this folder contains the original A and B images used in Figure 14.57 to be used for your morphing enjoyment.

14.3.4 Tips, Tricks, and Techniques

The single most important step in creating an excellent morph is to select appropriate images to morph between. What you are looking for in “appropriate images” is feature correlation – features in the two images that correlate with each other. The most mundane example is morphing between two faces – eyes to eyes, nose to nose, mouth to mouth, etc. The closer the two faces are in size, orientation of the head, and hairstyle, the better the morph will look. In other words, the more identical the A and B sides of the morph are, the easier it is to make a good morph.

If a face were to be morphed to a non-face target, say the front of a car, then the issue becomes trying to creatively match the features of the face to the “features” of the car – eyes to headlights, mouth to grill, for example. If the task is to morph a face to a seriously non-face target, say a baseball, the utter lack of features to correlate to will result in a morph that simply looks like a mangled dissolve. This is because, without a similar matching feature on the B side of the morph to dissolve to, the black pupil of the eye, for example, just dissolves into the white leather region of the baseball. It is high-contrast features dissolving like this that spoil the magic of the morph.

Meanwhile, back in the real world, you will rarely have control over the selection of the A and B side of the morph and must make do with what you are given. It is entirely possible for poorly correlated elements to result in a nice morph, but it will take a great deal more work, production time, and creative imagination than if the elements were well suited to begin with. Following are some suggestions that you may find helpful when executing morphs with poorly correlating elements:

Having to warp an element severely can result in unpleasant moments of grotesque mutilation during the morph. Scale, rotate, and position one or both of the elements in order to start with the best possible alignment of features before applying the warps in order to minimize the amount of deformation required to morph between the two elements.
Occasionally it may be possible to add or subtract features from one or the other of the elements to eliminate a seriously bad feature mis-correlation. For example, in the face-to-car example above, a hood ornament might be added to the car to provide a “nose feature” for the car side of the morph. It probably wouldn’t work to remove the nose from the face, however. Use your judgment.
When uncorrelated, high contrast features look “dissolvey”, such as the pupil cited above dissolving into the white part of the baseball. Try warping the offensive element to a tiny point. It will usually look better shrinking or growing rather than dissolving.
Don’t start and stop all of the warp action simultaneously. Morphs invariably have several separate regions in motion, so start some later than others to phase the action. A large deformation might start soonest so it will not have to change shape so quickly, calling attention to itself in the process.
Don’t do the dissolve between the entire A and B sides all at once. Separate the dissolve into different regions that dissolve at different rates and at different times. The action will be more interesting if it doesn’t all happen at once. Perhaps a region on the B side looks too mangled, so hold the A side a bit longer in this region until the B side has had a chance to un-mangle itself.
The warps and dissolves need not be simply linear in their action. Try an ease-in or an ease-out on the warps and the dissolves. Sometimes the warps or dissolves look better or more interesting if the speed is varied over the length of the shot.

14.4 Point Tracking

Motion tracking is one of the most magical things that computers can do with pictures and is a fundamental technique in compositing visual effects. With relentless accuracy and repeatability the computer can track one object on top of another with remarkable smoothness, or stabilize a bouncy shot to rock-solid steadiness. The reason to stabilize a shot is self-explanatory, but the uses of tracking are infinitely varied. With the process called “match move” an element can be added to a shot that moves convincingly with another element and even follow a camera move, or, conversely, an element in the scene can be removed by match-moving a piece of the background over it.

It is called “point tracking” because the trackers are locked onto a single point in the scene. Point tracking works by placing “trackers” onto several carefully selected targets in the frame then letting the computer track those targets frame-by-frame over the length of the shot. When the tracking phase is done the tracking data is then processed to produce the kind of motion track you are looking for – match move, stabilize, etc. Beyond just matching the translation motion, the tracker can also calculate image transforms like rotate and scale. Monitor insert shots are done by tracking the four corners of the monitor then using the tracking data for a corner-pin operation to lock the image into the monitor.

While we always want multiple trackers for more accurate tracking results, here is a list of the minimum number of trackers required for each type of tracking task:

1 point – translation only

2 points – rotate and/or scale

4 points – corner pin

14.4.1 The Tracking Operation

The first step is the actual tracking of the target object in the frame, which is a data-collection operation only. Trackers (the white rectangles) are planted on key points in the picture that are the tracking targets, like those in Figure 14.58. The computer will then step through all of the frames in the shot, moving the trackers frame-by-frame to keep them locked onto their tracking targets. This is the data-collection phase, so collecting good tracking data is obviously a prerequisite to a good motion track.

Figure 14.58 Trackers on tracking targets

You may have noticed that the trackers in Figure 14.58 have two boxes. The inner box is the match box. It is the pixels within this box that are analyzed for a match. The outer box is the search box, and it defines the area the computer will search looking for a match each frame. If the motion in the frame is large, the search box must also be large because the target will have moved a lot between frames and the target needs to stay within the search box each frame to be found. Some systems allow you to extend the search box just in the direction of motion, such as a horizontal pan, rather than simply making the entire search box larger all the way around. Since the size of the search box affects the processing time this is a time saver.

While the actual algorithms vary, the approach is the same for all trackers. On the first frame of the tracking, the pixels inside the match box are set aside as the match reference. On the second frame, the match box is repeatedly repositioned within the bounds of the search box and the pixels within it compared to the match reference. At each match-box position a correlation routine calculates and saves a correlation number that represents how closely those pixels matched the reference.

After the match box has covered the entire search box area, the accumulated correlation numbers are examined to find which match-box position had the highest correlation number. If the highest correlation number is greater than the minimum required correlation (hopefully selectable by you), we have a match and the computer moves on to the next frame. If no match is found, most systems will halt and complain that they are lost and could you please help them find the target. After you help it find the target, the automatic tracking resumes until it gets lost again, or the shot ends, or you shoot the computer (this can be frustrating work).

14.4.1.1 Selecting Good Tracking Targets

One of the most important issues at the tracking stage is the selection of appropriate tracking targets. Some things do not work well as tracking targets. Most tracking algorithms work by following the edge contrasts in the picture, so nice high contrast edges are preferred. The second thing is that the tracking target needs good edges in both X and Y. In Figure 14.58 trackers #2 and #3 are over targets with good edges in both directions, while tracker #1 is not. The problem for the tracker program becomes trying to tell if the match box has slipped left or right on the roof edge, since there are no good vertical edges to lock onto. Circular objects (light bulbs, door knobs, hot dogs seen on end) make good tracking targets because they have nice edges both horizontally and vertically.

A second issue is to select tracking targets that are as close as possible to the locking point, the point in the picture that you wish the tracked item to lock to. Take the fountain in Figure 14.59, for example. Let’s say that you want to track someone sitting on the fountain wall at the locking point indicated, and the shot has a camera move. There are potential trackers all over the frame, but the camera move will cause the other points in the picture to shift relative to the locking point due to parallax.

Figure 14.59

If the camera’s position is moved with a truck, dolly, or boom there will be noticeable parallax shifts in the shot that cause points in the foreground to move differently than those in the background. However, even if the camera is just panned or tilted there can still be subtle parallax shifts between points in the foreground and background.

Another variable is lens distortion. As the different trackers move through the lens’ distortion field at different times, their relative positions shift in the picture. The problem is that they are often not perceptible to the eye. They are, of course, perfectly obvious to the computer. All of these subtle positional shifts introduce errors into the tracking data that will result in the tracked object drifting or squirming around the locking point instead of being neatly stuck to it. Lens distortion may not be a problem if the tracking is concentrated in the center of the frame, which is the lens’ sweet spot. However, if lens distortion is preventing a good track then the only solution is to remove it with a lens distortion tool.

14.4.1.2 Bad Tracking Targets

Figure 14.60 Examples of bad tracking targets

There are two classes of known bad tracking targets that you should never use to track a shot – specular highlights and crossing elements. Referring to Figure 14.60, the red arrows point to specular highlights, which are attractive because they have high contrast with the background and are often conveniently circular. The problem with specular highlights is that they are not locked to a surface. When the camera moves they will drift across the surface. Very bad. The blue arrows point to crossing elements – the horizontal rod vs. the vertical edge. Again, very attractive as it provides nice contrast with horizontal and vertical edges, which we like. The problem is that as the camera moves the intersection of these two elements will drift due to parallax. Again, very bad. Never use specular highlights or crossing elements as tracking targets.

14.4.1.3 Tracker Enable/Disable

The tracking targets are frequently not conveniently on screen for the full length of the shot. The target may move out of frame or something may pass in front blocking it from view for a few frames. To accommodate this, any tracker worth its weight in pixels will have an enable/disable feature for each tracker. If the target for tracker #3 goes away on frame 100, that tracker must be disabled on frame 100. Should the target be blocked for 10 frames, it is disabled for those 10 frames. Later, during the calculation phase, the computer will ignore the tracker while it was disabled and only use the enabled frames for its calculations.

14.4.1.4 Offset Tracking

Another situation you will encounter in the real world of tracking is occluded (covered up) tracking targets. You are solidly tracking a clean target on a 100-frame clip when suddenly someone walks in front of it for 8 frames. Most programs have a solution for this called “offset tracking”. The basic idea is that while the real target is briefly occluded the tracker is repositioned over to a nearby target that is moving in the same way. Later, when the original target is revealed, the tracker is shifted back. The tracking program is told by the operator when the tracker is offset and when it is back on target. A second case for offset tracking is when a tracking target moves out of frame. Just before it leaves you offset the tracker to a nearby target that is, again, moving in the same way.

Figure 14.61 Offset tracking

Figure 14.61 illustrates the offset-tracking operation on a clip. In the first frame the white tracker is on the target. In the second frame the talent has stepped in front of the target so the tracker has been offset to a nearby point. In the last frame the tracker has been restored back to the target. Since the tracker is told when the tracker is offset it will subtract the offset distance from the tracking data so that the resulting tracking data will be as if the target was never covered.

The secret to successful offset tracking is to make sure that the new target is moving in the same way as the original target. If it is not you will get erroneous tracking data and a bad track. The requirement is that the offset target is the same distance from the camera as the original. If it is closer or further it will move differently due to parallax.

There are some situations where it will make sense to track the shot backwards, like the example illustrated in Figure 14.62. In this shot the mission is to track the left headlight, but it is out of frame at the beginning of the clip. However, if we start tracking the headlight from the end of the clip and track it backwards we can use an offset track as soon as the headlight gets to the edge of frame. This will maintain contiguous tracking data even though it completely leaves the frame, provided that a suitable offset-tracking target is available – which it often isn’t.

Figure 14.62 Tracking backwards

14.4.1.5 Keep Shape and Follow Shape

At the top of this section the roll of the search box and match box was described. When the tracking starts the program takes a “snapshot” of the pixels in the match box on frame 1, called the reference. It then compares this reference to frame 2 to find a match, then to frame 3, then 4, and so on. The point here is that it is using the same reference from frame 1 for the full length of the shot. That works fine if the target does not change size, rotation, or perspective over the length of the shot. But it invariably does.

If it changes only a little then the match error will slowly get larger towards the end of the shot. If it changes a lot then the tracker will error out and stop. It must then be given a new reference to resume. But what if the target is changing shape a lot? We might have to stop every 10 frames to give it a new reference. This is when the tracker is switched to “follow shape” mode. Of course, it will have a different name in your software, but the idea is the same.

Figure 14.63 Match reference

Figure 14.64 Shape change

Figure 14.63 illustrates a fine tracking target, the corner of a square. It has strong vertical and horizontal edges, and the match reference that it creates on the first frame of tracking is shown in the inset. But this particular square rotates. Figure 14.64 shows the same square a few frames later, and the inset shows what the match box sees then. This will correlate very poorly with the reference made from frame 1, so the system declares “no match” and halts.

The solution to this class of problem is to have the system follow the changing shape of the target each frame by using the previous frame’s best match as a new reference. Hopefully the target will not have changed very much between two adjacent frames. So the best match of frame 51 becomes the reference for frame 52, and so on. Creating a new reference each frame to follow a target that is constantly changing shape is the “follow shape” mode, and keeping the same shape as the reference over the entire shot is the “keep shape” mode. Your software will have different names, of course.

You really want your software to allow you to select which mode to use – keep shape or follow shape – as well as switch between them when appropriate. The reason is that while the follow shape mode solves the problem of a tracking target that changes shape, it introduces a new problem of its own. It produces much-poorer tracking data. This is because each frame’s match is based on the previous frame, so small match errors accumulate over the entire length of the shot. With keep shape, each frame is comparing to the same match reference, so while each frame has its own small match error, the errors do not accumulate.

My favorite metaphor for describing the errors between the keep shape and follow shape modes is to offer you two methods of measuring a 90-foot hallway. With the “keep shape” method you get to use a 100-foot measuring tape. While you may introduce a small measuring error because you did not hold the tape just right, you only have the one error. With the “follow shape” method you have to use a six-inch ruler. To measure the 90-foot hallway you must crawl along the floor repositioning it 180 times. Each time you place the ruler down you introduce a small error. By the time you get to the end of the hall you have accumulated 180 errors in measurement, and your results are much less accurate than with the single 100-foot tape measure. So the rule is, use keep shape whenever possible and switch to follow shape on just those frames where keep shape fails.

14.4.1.6 Pre-processing the Clip

There are a few things that you can do to improve the tracking results from a point tracker. One of the main things is managing the grain, which is covered in detail in the next section. The basic idea of pre-processing the clip is to prepare a separate version of the clip that the tracker will like better.

Point trackers start by making a luminance version of each frame to do the actual tracking calculations so we can help the tracker by making our own luminance version that has more detail, more contrast and less noise than the original clip. We might call this the “tracking clip” to differentiate this version from the original clip. This would only be done if you are having trouble with the trackers staying locked onto their tracking targets. There are several reasons why a tracker might break lock, which are covered in Section 14.4.1.11: Reasons for Failure, so be sure to eliminate those causes before going to the trouble of making your own tracking clip.

Again, the objective of the tracking clip is to provide the tracker with more useful detail and less noise than the original clip. So here’s the tip – most of the picture information is in the red and green channels but most of the noise is in the blue channel. So make a tracking clip that is a one-channel image that’s simply the sum of the red and green channels. This sum will make a pretty bright clip so you can gamma down on it, which will increase the contrast and make the tracking targets stand out even more.

Another preprocessing operation is to remove lens distortion. The lens distortion in the clip will cause trackers that approach the edge of frame to drift away from their true paths. These “drifty” trackers will then introduce drift into the final computed tracking transformation, which can give you a puzzling troubleshooting problem.

14.4.1.7 Coping with Grain

Tracking a grainy plate will introduce noise into the tracking data like the example in Figure 14.65. This noise will cause the motion data calculated from it to jitter. There are three solutions to this problem, two of which are problematic and one that is rock solid.

Figure 14.65 Grainy tracking data

The first problematic solution is to run a filter over the noisy data to smooth it out. While this sounds great on paper, the filter is smoothing the noisy curve but it is not necessarily moving the data points to where they really belong. What can happen is that you end up exchanging noisy data for drifty data. As a result, the tracked object will drift or “squirm” around the target location, not lock to it. Not recommended.

The second problematic solution is to degrain the plate before tracking. The reason this is not the best solution is that the degrain operation by its nature moves the pixels around. Again we are introducing drift into the tracking data. This is probably less wrong than filtering the data as it introduces less drift than the filtering method and in some cases may be good enough. You may try this, but be sure to do the stability test described below in Section 14.4.1.10: The Stability Test to confirm your track lock.

But what if you don’t have a good degrain tool? The third and totally correct solution is tracker stacking. The idea is to stack multiple trackers on the same target, but offset by several pixels. The offset is critical. If they are stacked right on top of each other they will simply collect exactly the same noisy data. By being offset each tracker collects its own slightly different data. Later, the trackers are averaged together to produce a single data track of great accuracy that will actually cancel out the grain jitter.

Figure 14.66 Tracker stacking

This is not the same as filtering the data. Filtering just averages each noisy data point with its noisy neighbors, which does not necessarily converge towards the correct data point. Averaging three samples of the same point definitely converges toward the correct data point. From a practical standpoint I suggest you first try the degrain solution as it is the quickest. If it does the job then go for it. If not, then switch to tracker stacking. But try to avoid filtering your tracking data.

One last tip on the grainy tracking problem is to increase the size of the match box. The larger the match box the less the grain alters the tracker’s data. To understand this just imagine the match box was only 10 pixels in size. The grain would now make up a huge fraction of the sampled data. Making the match box larger and larger makes the grain contribution smaller and smaller, and the track smoother and smoother. However, there are two downsides to making the match box large – the larger the match box the slower the tracking and the easier it is for the tracker to hop to a nearby similar feature. As in all things compositing, you will have to find the optimal solution for each shot.

14.4.1.8 Tracking Workflow

The first step in tracking a shot is to form a plan of action. Set the clip in your viewer and set the viewer to “ping-pong” (play forward and backwards) and study the shot. Look for quality tracking targets based on the rules above and make a list. Stare at a tracking target candidate as the clip ping-pongs, making sure it is good for the length of the shot. Decide which ones will be tracked forwards, which backwards. Note any that will need offset tracking because of occlusion or leaving frame. Make a plan.

Now you are ready to track. You can certainly place several trackers all over the frame then start tracking. However, after a few frames one of the trackers will error out and stop the tracking processes. You fix that one and resume, but a few frames later another tracker errors out. When you finish tracking and go back to check your tracking data it becomes a confusing mess as to which trackers are good and where the bad sections are.

The recommended workflow is to do one tracker at a time. Work that tracker for the full length of the shot until you have clean data and a good track. Disable that tracker, set your second tracker, and work it for the entire shot until you have good data. Disable the second tracker and move on to the third. Proceed through your list of tracking targets one at a time until done.

14.4.1.9 Cleaning up Tracking Data

Once you have collected the tracking data for a single tracker you should inspect the tracking data and clean it up before moving on to the next tracker. How do you inspect your data? Two ways. First, look at it.

Figure 14.67 Inspect tracking data for anomalies

Figure 14.67 shows some typical tracking data from Nuke. The tracking data should be smooth and continuous, which this mostly is. The green arrow points to a glitch in the data. For these frames the tracker got lost, then found itself a few frames later producing a short section of anomalous data. You can assume the data should be smooth and continuous, so you can simply drag those few control points to where they probably should be. The error will be brief and small so this should work fine most of the time, especially if you have used multiple trackers (it will be averaged out with the other trackers).

The red arrow points to a discontinuity in the data. Either the tracker hopped over to a nearby target or you set up an offset track starting on that frame, but the offset choice was not good. Either one will produce a discontinuity like this. If it is the offset track problem you can literally select the bad section of points and drag them down to hook up with the rest of the curve and you will probably be good. If not, delete the bad section and re-track just the bad section.

14.4.1.10 The Stability Test

The stability test is the recommended way to confirm that your tracking data is rock solid. The idea is to use the tracking data from a single tracker to stabilize the clip in X and Y. While the stabilized clip is ping-ponging in your viewer you can set the cursor as a reference right over the tracked point to see if it is chattering or slips off for a few frames then returns. If it’s chattering then you have noise problems. If it hops off for a few frames the tracker got lost for a bit, most likely from a temporary occlusion. Some tracking programs have a stability test built in as a feature, but if not, just grab a transform node and use the inverse of the tracker’s XY data to stabilize the clip yourself.

If your raw tracker data looks good you might skip the stability test for each tracker and go right for the transform stability test. Use all of the tracker data to calculate a stabilizing transform then do the stability test described above. While the single tracker test above can only do an XY stabilize, because that is the only data a single tracker provides, the transform can stabilize for translation, rotation, and scale. Again, ping-pong the stabilized clip and place the cursor as a reference point over what is supposed to be the locking point – the point in the shot that you want a tracked element to lock to.

With the stabilizing transform there should be no chatter or hopping off as those are typically individual tracker problems, but there could be drift. If there is drift it means that one or more trackers are not on the same plane as the others so are not moving with the group. A second possible cause for drift is if one or more trackers got too close to the edge of frame where lens distortion became significant. Find, fix, and retest.

14.4.1.11 Reasons for Failure

There is a long list of reasons why a tracker might “break lock” (my term for losing its lock on its tracking target). Running through this list will both give you guidance on how to set up for better tracking results, as well as assist in trouble-shooting when the tracker halts. All trackers will error out and halt, but there will be precious little information as to what is actually wrong. You have to intuit what the problem is yourself and then try to fix it. This is much easier to do if you understand how the trackers think. Here are some common causes of tracking failure:

Figure 14.68 Tracker breaks lock with search box too small

Search box too small – this can cause the tracker to break lock because it can’t find the tracking target in the next frame like the example in Figure 14.68. The target (fireplug) moved a short distance between frames 1 and 2 and the search box was large enough to find it in frame 2. However, on frame 3 the target moved much further and the search box was too small, so it broke lock. Some trackers allow you to also change the shape of the search box in the direction of motion so you could, for example, in this case extend just the left edge to follow a quick pan.

Search box too large – this can cause a tracker to “hop” over to a nearby similar-looking target. Figure 14.69 shows a tracker firmly locked onto the round portal window on the train with a nice big search box. However, as the train rolls by, the tracker was able to “see” the second portal window then hop over to it in Figure 14.70. When tracking targets with similar-looking objects in the vicinity you must carefully gauge the size of the search box.

Figure 14.69 Tracker locked onto first window

Figure 14.70 Tracker hopped to second window

Occlusions – sometimes things will pass in front of the tracking target and the tracker will either break lock totally or give you a few frames of flaky data like the example in Figure 14.67. If it breaks lock you will have to halt the track a frame before the occlusion, then resume after it passes. This will leave you with a short section of missing or bad data to repair later.

Focus change – the focus can change due to either a focus pull at the camera lens or the target moving out of the depth of field. The change of shape due to the defocus can cause the tracker to break lock. Back up a couple frames before the halt, set a new reference, then resume tracking until the next halt and repeat. Note that setting a new reference always introduces a small error in your tracking data so you want to keep it to a minimum.

Motion blur – a moving object will have some amount of motion blur either due to its own motion in frame or from a camera move. Either way, if the amount of motion blur changes, the tracker can break lock. Back up a couple frames before the halt, set a new reference then resume tracking until the next halt.

Shape change – the tracking target might change scale or rotate, which changes its shape to the tracker. Some trackers support a shape-changing target. However, they do this by periodically taking a new reference. But again, every time a new reference is taken an error is introduced into the tracking data.

Lighting change – should there be a big change in the lighting somewhere in the clip many trackers will break lock. You can back up and restart with a new reference, with its inherent additional error, or you might be able to animate a color correction to the plate to cancel out the lighting change.

As we have seen, a lot of things can go wrong that will spoof the tracker and cause it to either break lock or hop over to the wrong tracking target, or run amok altogether and fly across the screen. Hopefully this list will help you to anticipate and avoid problems to have a happy track, or at least help you quickly diagnose what’s wrong. If all else fails, you might switch to a planar tracker, described in Section 14.5: Planar Tracking.

14.4.2 Match Move

Now that you have some lovely clean tracking data we can move on to the second step, applying it to generating some kind of motion – either match move or stabilization. The tracking data can be interpreted in a variety ways, depending on the application. From the raw tracking data the computer can derive the motion for any or all of translate, rotate, and scale (zoom). If the mission is four-corner pinning then each of the four corners have to be individually tracked to provide data for a corner-pinning operation.

14.4.2.1 2D Transforms

The results of the tracking calculations are relative movements, not actual screen locations. The tracking data actually says “from wherever the starting location is, on the second frame it moved this much relative to that location, on the third frame it moved this much, on the fourth frame…. etc., etc.” Because the tracking data is relative movement from the starting position there is no tracking data for the first frame. The tracking data actually starts on the second frame.

To take a concrete example, let’s say that you have tracked a shot for a simple translate. You then place the object to be tracked in its starting location on the screen for frame one, and on frame two the tracking data will move it 1.3 pixels in X and 0.9 pixels in Y, relative to its starting location. This is why it is important that the trackers you place be as close as possible to the actual locking point. Tracking data collected from the top half of the screen but applied to an object that is to lock on to something in the bottom half of the screen will invariably drift.

Interpreted as match-move data, it must be fed to a transform operation that will actually perform the translate, rotation, and scale operations. Depending on the software design, it may be built into the tracker, or the data may be piped from the tracker to a separate transform operator. Either way, a key reminder here is to be sure to use the pivot point that the tracker calculates. For any rotate or scale operation the pivot point is critical, so failing to use it will take your match move badly off course.

14.4.2.2 Corner Pinning

Corner pinning is another type of transform commonly used with 2D tracking. The idea is to track the four corners of an image over a flat surface that is changing perspective due to a camera move. One of the most common applications is a monitor insert shot, but it is also used for other tracking situations such as a billboard or the side of a building. The reason monitor insert shots are so common is that it is not practical to shoot the monitor contents on the set because the lighting is too hard to control. So it is most often done in post as a visual effect. More work for us.

Figure 14.71 Monitor

Figure 14.72 Insert image

Figure 14.73 Corner-pinned composite

The process starts with tracking all four corners of the monitor shown in Figure 14.71. After the tracking data is all cleaned up it is piped to a corner-pin operator so that the insert image (Figure 14.72) is locked to the four corners of the moving monitor and composited in like the example in Figure 14.73. The key issues for a good corner-pin track is that the tracking data should not jitter and that the insert image should not drift. We have already looked at solving jittery track data, but corner pin drift is a new issue.

Figure 14.74 Closeup of monitor

Corner-pin drift comes from not tracking the corners at the right spot. Note the close-up of the corner of the monitor we are tracking shown in Figure 14.74. The aluminum bezel is not on the same plane as the actual screen. This offset will introduce a parallax between the bezel and the screen, which in turn will show up as drift in the tracking data. What to do?

There are two solutions. First, don’t track on the bezel. Note the tracking markers on the screen. Track on the angle marker that is near the corner. Great tracking data, but the insert image will not be exactly in the corner. The answer is in the corner-pin tool. Corner-pin operators have a “from” and “to” for each of the four points (your software may have different names). The “to” control point is where the tracking data goes, but the tracker is not in the exact corner. By adjusting the “from” control point, an offset for that corner is introduced that is used to dial the image’s corner to fit perfectly into the screen corner. Designers of corner-pin operators knew that the tracker would not always be in the exact location needed for the image corner, so this functionality is built in to correct for that.

The second solution is to go ahead and track on the bezel knowing that the tracking data will drift over the length of the shot. However, this drift will be both modest and uniform. The solution is to add a “drift correction” to the “from” control points by keyframing a correction offset over the length of the shot. This is not as hard as it sounds because you will usually only need a keyframe for the first, middle, and last frames. The drift will be that smooth and uniform. Be sure to use the stability test described above for each corner to confirm the wonderfulness of your work.

One last issue with monitor insert shots. Frequently, the camera move will take one or more corners of the monitor out of frame. If they go a little way out of frame you might escape with using offset tracking. This will not work if, for example, half the monitor goes out of frame. If that much goes out of frame you will need to switch to planar tracking, which is covered in Section 14.5: Planar Tracking. If the monitor goes completely out of frame you will have to use 3D camera-tracking which was covered in Chapter 8: 3D Compositing.

WWW Monitor Insert – this folder contains 240 frames of fun for testing your monitor insert skills, like in Figure 14.73.

14.4.3 Stabilizing

Stabilizing is the other grand application of tracking. It simply interprets the same tracking data in a different way to either remove the camera motion from the shot or lock down the motion of a selected object to hold it still for rotoscoping, for example. If the tracking data is smoothed in some way, either by hand or by filters, the bouncy camera move can be replaced by a smooth one that still retains the basic camera move. One thing to keep in mind about stabilizing a shot, however, is the camera motion blur. As the camera bounced during the filming it imparted motion blur to some frames of the film. When the shot is stabilized, the motion blur will remain, and in extreme cases it can be quite objectionable.

Once the tracking data is collected and formatted for stabilizing it is piped to an appropriate transform operator to back out all of the picture movement. The stabilizing is done relative to a reference frame chosen by the operator, often the first frame, but not always. All frames before and after are repositioned relative to the reference frame to hold the shot steady. This section addresses some of the problems this process introduces and suggests motion-smoothing rather than total stabilizing, which is often a more appropriate solution to the problem for a couple of reasons.

14.4.3.1 The Repo Problem

When a shot is stabilized, each frame is repositioned in order to hold the picture steady, which introduces a serious problem. As the frames are repositioned, the edges of the picture move into frame, leaving a black margin where the picture used to be. Let’s say that we want to stabilize the sequence of bouncy camera frames in Figure 14.75 by locking onto the tower and holding it steady relative to frame one. The stabilized version of the same sequence is shown in Figure 14.76, with the resulting severe black reposition margins.

Figure 14.75 Original frames with tracking target

Figure 14.76 Stabilized frames

Frame 1 in Figure 14.76, being the first frame in the stabilized sequence, was not repositioned since it is the reference frame to which all of the others are repositioned. Frames 2 and 3 in Figure 14.76 have been repositioned to line up with Frame 1, and as a result they have severe black margins. The fix for this is to zoom into each frame by scaling it up after it has been repositioned, such that all of the black margins are pushed out of frame. This means that the scale factor must be large enough to clear the black margin out of the worst-case frame.

The stabilized frames in Figure 14.76 have a white-dotted outline that represents the “lowest common denominator” of useable area that will remain after the scale operation. But zooming into pictures both softens and crops them. Therefore, as an unavoidable consequence of the stabilizing process, to some degree the resulting stabilized shot will be both zoomed-in and softened compared to the original. The nice client will, of course, not be aware of these issues so you would be well advised to inform them in advance. If you don’t then you will be blamed for the problem.

14.4.3.2 Motion Smoothing

Motion smoothing helps to minimize the two problems introduced from stabilizing a shot, namely the necessary push-in and the residual motion blur. While the push-in and softening from stabilizing is inherently unavoidable, it can be minimized by reducing the “excursions” which are the distances that each frame has to be moved in order to stabilize it. Reducing the maximum excursion will lower the amount of scaling required, and the maximum excursion can be reduced by lessening the degree of stabilization. In other words, rather than totally stabilizing the shot, just removing some of the motion may suffice. Since some of the original camera motion is retained with motion smoothing, any motion blur due to camera movement will look more natural compared to a totally stabilized shot.

Figure 14.77 Original tracking data

Figure 14.78 Smoothing curve added

Figure 14.77 illustrates one channel of tracking data for a stabilization shot. There are three ways to achieve motion-smoothing on this original data. The first is that your software may have it as a built-in feature and you can just dial down the stabilizing to some degree of smoothing. Done. The second approach is to run a filter over the tracking data before piping it to the stabilizing transform. That will knock down the more severe excursions and generally smooth things out. However, you lose control over the smoothing. My preferred solution is illustrated in Figure 14.78. Use the curve editor to draw your own smoothing line – the yellow line in this case – using the raw tracking data as a guide. This way you can control when and where the smoothing occurs. And control is the name of the game in visual effects. One other tip is that you don’t necessarily have to reduce the excursions equally in X and Y. Perhaps you can crop in less if one of the dimensions gets less correction than the other.

14.4.3.3 Stabilizing For Rotoscoping

Very often a shot that needs rotoscoping also has a camera move in it. Sometimes there is a lot of camera move if it is a hand-held camera. The camera moves makes your roto target gyrate around and makes a difficult roto task even harder. You can stabilize the clip, then do the roto on the stabilized version. Alternately, you could stabilize the clip to the roto target itself. When done with rotoscoping on the stabilized clip, simply invert the

Figure 14.79 Stabilizing for roto

stabilizing data so that it becomes match-move data and apply it to the roto. The roto will now track with the wildly gyrating original clip.

If the clip is stabilized just for the camera move, your roto target may get pushed out of frame. This can be solved by padding the shot prior to stabilizing it then rotoscoping the padded version. Figure 14.79 illustrates the procedure for padding a clip to allow for camera stabilization, then restoring it for the final roto.

WWW Roto Stabilize – here is another one of those annoying hand-held shots that needs to be stabilized before rotoing the character. You up for it?