Format | Raster Size | Frames/Sec | Scan | Approximate Data Rate*
SD Video | 720×486 | 29.97 | Interlace | 270 Mb/s
HD 720p 60 | 1280×720 | 59.94 | Progressive | 1.5 Gb/s
HD 1080i | 1920×1080 | 29.97 | Interlace | 1.5 Gb/s
HD 1080p | 1920×1080 | 59.94 | Progressive | 3 Gb/s
UHD | 3840×2160 | 59.94 | Progressive | 12 Gb/s
* These data rates are common approximations used for comparison and are generally a bit higher than the actual data rate. For example, 1080i video, without any audio or metadata, actually works out to:
1920 × 1080 pixels × 20 bits per pixel (10-bit 4:2:2 sampling) × 29.97 frames per second ≈ 1.243 Gb/s
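As a quick check on this arithmetic, the sketch below (purely illustrative) computes the uncompressed video-only rate for the formats in the table, assuming 10-bit 4:2:2 sampling, or 20 bits per pixel:

```python
# Uncompressed video-only data rate: width x height x bits per pixel x frames per second.
# Assumes 10-bit 4:2:2 sampling (20 bits per pixel on average); audio and metadata are ignored.

def video_data_rate_gbps(width, height, fps, bits_per_pixel=20):
    """Return the approximate uncompressed video data rate in gigabits per second."""
    return width * height * bits_per_pixel * fps / 1e9

formats = {
    "SD 486i":   (720, 486, 29.97),
    "HD 720p60": (1280, 720, 59.94),
    "HD 1080i":  (1920, 1080, 29.97),
    "HD 1080p":  (1920, 1080, 59.94),
    "UHD 2160p": (3840, 2160, 59.94),
}

for name, (w, h, fps) in formats.items():
    print(f"{name}: {video_data_rate_gbps(w, h, fps):.3f} Gb/s")
# HD 1080i works out to about 1.243 Gb/s, a bit under the nominal 1.5 Gb/s SDI rate,
# which also carries blanking, audio and metadata.
```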
Throughout this book, media is described as a continuous flow of information. When produced as digital data, this flow requires an enormous amount of bandwidth and storage space. A high definition signal traveling along an SDI path produces one and a half billion bits of data each second. 4K images in the same form require data paths that allow 12 billion bits to flow each second. These large data flows need to be managed as discrete sections that can be saved and moved in the file-based structure of computers. To keep this flow of information running smoothly and efficiently within computer-based systems, it is important to prepare the files by choosing the best codec, or compression scheme, for the file's use, as well as the most appropriate container to ensure it reaches its destination intact.
Media files combine audio, video and metadata. Just as you need an envelope to use the U.S. Postal Service, media files need a container, or wrapper, to encapsulate data so it can be stored and moved within computer systems. And just as physical envelopes come in different shapes, sizes and materials suited to various tasks, there are a number of media containers that store and move media data (Figure 19.1).
One of the most important things a media container does is arrange and interleave the data so that the audio and video play out in sync. If all the video data were at the start of the file, with the audio at the end, the entire file would need to be loaded into the computer's memory before it could be played. Most computers don't have enough memory for this, nor does the audience have the patience to wait for it. Let's take a look at different file containers.
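To make the idea concrete, here is a simplified sketch (not the layout of any real container format) of how a container might interleave short video and audio chunks by timestamp so both streams can play back as the file arrives:

```python
# Simplified illustration of interleaving: short video and audio chunks are merged
# in timestamp order so both streams can be decoded as the file streams in.
video_chunks = [("video", t) for t in range(10)]   # one chunk per frame time
audio_chunks = [("audio", t) for t in range(10)]   # matching audio chunks

# A naive (non-interleaved) file: all the video first, then all the audio.
non_interleaved = video_chunks + audio_chunks

# An interleaved file: chunks ordered by their timestamps.
interleaved = sorted(video_chunks + audio_chunks, key=lambda chunk: chunk[1])

print(interleaved[:6])
# [('video', 0), ('audio', 0), ('video', 1), ('audio', 1), ('video', 2), ('audio', 2)]
```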
One of the most familiar container formats is QuickTime, which was created by Apple. Files with the extension .MOV are QuickTime files. In addition to audio and video tracks, QuickTime files also support tracks for time code, text and some effects information.
Because QuickTime hosts a wide variety of audio and video codecs, and because Apple offers a free player for several operating systems, it is a very popular choice for web-based distribution. It is also a common format for the creation of files to be exchanged among content creators for review and approval copies.
There are several related containers that are very similar in structure to QuickTime but were designed for special purposes. MP4 (including .mp4, the audio-only version .m4a, and others) is a container developed by the MPEG committee to work specifically with MPEG-4 compression features not supported by QuickTime itself. 3GP is another format based on QuickTime, but in this case optimized for mobile delivery over cellular networks.
NOTE Remember that QuickTime, MP4 and 3GP are different types of containers. They don’t control or determine the type of compression they contain and carry. The compression type is determined by the codec. Codecs are described later in this chapter.
Files ending in .WMV and .ASF are Windows Media files. This is a Microsoft design whose name is used to represent both a compression codec and a container format. Normally, if a file has the .WMV extension, it is both a Windows Media container and compressed with the Windows Media codec. If .ASF is the extension, the container holds a different codec.
Flash format (.flv, .f4v) is another popular container format. This is an Adobe product that was developed to contain animations built using a format called SWF. It was later extended to allow audio and video in the H.264 codec, discussed later in this chapter. This is mostly seen as files playing in a web browser via a plug-in that must be downloaded by the user. Adobe makes the browser plug-in available for free. As a result, Flash is a very popular container for web distribution of video content.
The WebM container is a more recent development for distribution of web files in association with HTML5 initiatives. While it supports a limited set of codecs, it is making headway in some areas, with large services like YouTube making their content available in this format (Figure 19.2). (WebM is discussed in more detail later in this chapter.)
The MXF format is perhaps the most popular in large-scale media production organizations. This format was developed from the beginning to be a SMPTE format that equipment manufacturers could use to exchange content among their products. Computerized systems used in television can require media to be wrapped in different ways. To accommodate this, the standard allows for several variations in the format. These differences are categorized as Operational Patterns, or OPs. While there are currently 10 different specs for OPs, the two most common are OP-Atom and OP-1a. The biggest difference between the two is how the audio and video are stored on disk.
OP-Atom is designed for editing systems that store the audio and video as different files. Early computer editing systems had difficulty getting all the data needed to edit from a single disk, so the audio and video were stored on different drives. While that is no longer the case, some edit systems still use separate files for each element.
OP-1a is a single file in which all the audio and video are stored in the same container. Devices such as video servers use this format.
There are many other containers beyond those discussed. Some are legacy formats that are fading from use. Others are brand new and have yet to build a large user base. Some are special purpose and limited to particular equipment and workflows.
TIP The one hint that seems pretty universal is that the file extension of a media file indicates the container format it is using.
The process of moving media between containers is referred to as re-wrapping. The audio and video stay the same, still in the original codecs. The material is just reorganized in a different container better suited to the target device. For example, this process frequently happens when files are moved from the camera to an editing system.
NOTE If the codec needs to be changed as well, the process becomes one of transcoding. Transcoding can affect the quality of a file, especially when going from a less compressed to a more compressed codec. Because re-wrapping does not change the codec of a file, it is a clean, lossless process.
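As a hedged illustration, the commands below show how the widely used ffmpeg tool distinguishes the two operations; the file names are hypothetical, and re-wrapping with "-c copy" only works when the existing codecs are allowed in the target container:

```python
# Minimal sketch using the ffmpeg command-line tool (assumed to be installed).
import subprocess

# Re-wrapping: "-c copy" keeps the original audio and video codecs and only
# changes the container, so the process is fast and lossless.
subprocess.run(
    ["ffmpeg", "-i", "camera_clip.mov", "-c", "copy", "rewrapped_clip.mxf"],
    check=True)

# Transcoding: the codec itself is changed (here to H.264 video and AAC audio),
# which re-encodes the media and can affect quality.
subprocess.run(
    ["ffmpeg", "-i", "camera_clip.mov", "-c:v", "libx264", "-c:a", "aac",
     "transcoded_clip.mp4"],
    check=True)
```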
As you learned in Chapter 14, data is normally compressed to reduce the amount of information with which our computers have to cope. There are many different algorithms, or codecs, that can be used for this job. Some are best suited for capturing images, while others are meant for editing. Some work well for distribution, with variants for the kind of medium that will carry them. Often codecs developed for one purpose are also used in other areas.
NOTE Remember that the word codec is derived from the words compression and decompression.
Hundreds of codecs have been created, with newer, more efficient ones replacing previous generations. Let's take a look at some of the more common codecs used in the production and post production process, as well as those used to stream media.
Image capture comes in many forms, from the camera phone to cameras designed for digital cinema quality. Low-end cameras call for codecs that allow long captures to be stored in a small space. The goal at the high end is to capture as much detail as possible to allow the greatest flexibility in post production image processing.
At the high end are codecs such as ArriRaw and Redcode. The Arri format, for the cameras they produce, is not compressed at all and is, as the name implies, the raw data from the image sensor. Redcode, from the makers of the Red camera series, is compressed, very gently, in a codec that uses JPEG 2000 wavelet compression. Sony also offers a gently compressed acquisition codec in its SR format. While the files these codecs produce are huge, they contain all the data captured by the sensor. For later post production manipulations like compositing and color grading, all the available detail gives a greater range of creative options.
NOTE These cameras, like those of the other high-end camera makers, can also be set to shoot in several other formats as well.
Adobe created the CinemaDNG format as an open, non-proprietary format for image capture. It is a moving image version of Adobe's Digital Negative format (DNG), which supports gentle lossless compression. Several camera makers, including Blackmagic Design, support this format.
A somewhat more compressed image can be captured using codecs that were originally designed for editing. Avid DNxHD and Apple ProRes are two examples. Both of these formats allow captures at various quality levels by choosing different data rates. Because they were created for editing, these codecs capture each frame as a separate element. Both have top data rates of about 200–250 Megabits per second (Mbps), or about a 7:1 compression ratio. In addition to on-camera recording, these are popular choices for video servers recording studio television shows, often with data rates as low as 100 Megabits per second. An additional motivation to use these formats is that the captured material is edit-ready. No additional processing is necessary to prepare it for the edit room; the clip files may simply be copied to the edit system. Obviously, this is tremendously helpful for workflows with short deadlines.
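As a rough check of that ratio, using the nominal 1.5 Gb/s HD-SDI figure from the table at the start of this chapter and a hypothetical 220 Mbps edit-codec setting:

```python
# Rough compression-ratio check against the nominal HD-SDI rate from the table.
# The 220 Mbps edit-codec rate is a hypothetical value within the 200-250 Mbps
# range mentioned above.
uncompressed_hd_mbps = 1500      # nominal 1.5 Gb/s HD-SDI signal
edit_codec_mbps = 220            # hypothetical DNxHD/ProRes top data rate

ratio = uncompressed_hd_mbps / edit_codec_mbps
print(f"Compression ratio: about {ratio:.1f}:1")   # about 6.8:1, i.e. roughly 7:1
```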
The next step down the quality ladder is cameras that record in more heavily compressed formats such as Sony XDCAM and Panasonic AVCHD. The data rate for these formats is in the 30 to 50 Megabits per second range. XDCAM HD uses long-GOP MPEG-2 compression, while AVCHD is a variation of MPEG-4 Part 10, or H.264. Cameras in this class are often used in newsgathering and other projects that do not require heavy image compositing or color grading. Since much of the detail is compressed out of these images, they are not suited to heavy post production manipulation.
Finally, at the lower end of the image capture spectrum are cameras that record in very long GOP versions of codecs like H.264. Compression for these images is very lossy, with data rates often less than 1 Mbps. While these cameras can produce acceptable images, their files are very difficult to work with in post production and often must be transcoded to a different format for editing. Camera phones and inexpensive digital video cameras sold as consumer devices fall into this group of equipment.
As mentioned earlier, there are codecs that were designed specifically for the unique requirements of editing. As examples, Avid created a group of codecs called DNxHD, while the engineers at Apple have produced several versions of their ProRes codecs (Figure 19.3). Both of these are Intraframe-based codecs, meaning they do not require information from adjacent frames to produce images.
Many codecs achieve high compression ratios by taking advantage of redundant information in adjacent frames. This does make editing a challenge. In order to make a cut at a specific frame, the computer must create that frame, along with the new one that follows, by decoding the picture from neighboring frames. While this is possible, it is somewhat taxing on the system and can produce a less than satisfactory experience for the person making the edits.
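The toy sketch below illustrates why (the GOP pattern shown is hypothetical): to reconstruct an arbitrary frame in a long-GOP stream, the decoder has to go back to the previous I-frame and decode forward, whereas an intraframe codec can decode any frame on its own.

```python
# Toy illustration: how many frames must be decoded to show one frame of a
# long-GOP stream? P and B frames only store differences from their neighbors,
# so decoding starts at the most recent I-frame.
gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "P", "B", "B"]  # hypothetical 12-frame GOP

def frames_to_decode(stream, cut_frame):
    """Return how many frames must be decoded to reconstruct the frame at cut_frame."""
    last_i = max(i for i in range(cut_frame + 1) if stream[i] == "I")
    return cut_frame - last_i + 1

print(frames_to_decode(gop, 7))   # 8 frames decoded just to display frame 7
# With an intraframe codec, every frame is an "I" frame, so the answer is always 1.
```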
While there are codecs designed just for editing, most edit programs will work with many other codecs as well. Often editors choose to work in the codec their material was acquired in, for speed and simplicity of workflow. For example, some larger networks and program producers have settled on XDCAM at 50 Megabits per second as their common format. This offers an acceptable compromise between quality, speed and interoperability.
Frequently there is a need for compression that is intermediate between uncompressed footage and distribution codecs. For example, let's say the cameras at a sports event are capturing footage RAW or uncompressed. When it's time to uplink that footage, or send it over a fiber optic line back to the network, the signal has to be compressed to reduce its size. The encoder that does this is set to the largest compression ratio the channel can handle. That level of compression is referred to as contribution quality. Of course, when the footage gets to the network, it might be further compressed to be recorded on a server, or it might be uncompressed to be switched with other material as a live feed. Other uses of contribution quality compression include storing and playing out program content and commercials, and archiving finished work.
NOTE Examples of this group are codecs such as H.264, HEVC (H.265), MPEG and JPEG 2000. When used at contribution quality levels, the codecs are adjusted to higher bit rates than when used for distribution.
Another name used for this type of compression is mezzanine. Much as a mezzanine is halfway between floors in a building, you can think of mezzanine compression as a middle level of compression. One common use of this type of compression is preparing files for distribution channels such as YouTube. The finished edit of an hour-long show, in the edit codec, can run to many Gigabytes, far too large to easily send to the web host. A compression pass is made to reduce the amount of data to something easier to upload. The web host then processes that file into several variations at different bit rates to make it publicly available. The web site will offer either automatic or manual selection of the best bit rate for each user's connection. Thus, what you have sent to be re-encoded is a middle, or mezzanine, format.
For digital distribution such as Internet streaming, which is discussed in more detail in Chapter 21, certain formats are more effective than others. One codec in particular, MPEG-4, was developed especially for streaming media on computers that have much lower signal throughput, or bandwidth, than digital television or DVDs. It has heavily influenced three specific areas: interactive multimedia (products distributed on disks and via the web), interactive graphics applications (for example, the mobile app "Angry Birds") and digital television (DTV).
MPEG-4 is the most common digital multimedia format used to stream video and audio at much lower data rates and smaller file sizes than the earlier MPEG-1 or MPEG-2 schemes. It can stream everything included in the previous MPEG schemes, but it was expanded greatly to support video and audio objects, 3D content, subtitles and still images, as well as other media types, all with a low bit rate encoding scheme. It supports a great many devices, from cell phones to satellite television distributors such as DISH TV and DIRECTV, to broadcast digital television. MPEG-4 has many subdivisions, called profiles, that address applications ranging from low-resolution, low-quality surveillance cameras to more sophisticated HDTV broadcasting and Blu-ray Discs with much higher image quality and greater resolution. Let's take a closer look at some MPEG-4 profiles.
H.264/AVC (Advanced Video Coding) is formally MPEG-4 Part 10. It is one of the most prevalent video codecs, or video compression formats, and is used for everything from Internet streaming applications to HDTV broadcast, Blu-ray Discs and even Digital Cinema. It delivers roughly a 50% bit rate savings over MPEG-2 while still maintaining excellent video quality. In 2008, H.264 was approved for use in broadcast television in the United States. This was especially helpful to digital satellite TV, which is constrained by bandwidth issues, since H.264 needs less than half the bit rate of its MPEG-2 predecessor.
High Efficiency Video Coding, also known as HEVC or H.265, is the successor to H.264/AVC. Compared to H.264, this newer video compression codec roughly doubles the compression efficiency, squeezing out another 50% bit rate savings while maintaining the same level of video quality. Alternatively, it can be used to provide greatly improved video quality at the same bit rate as H.264.
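Taking those nominal 50% figures at face value, a back-of-the-envelope comparison (not a measured result) looks like this:

```python
# Back-of-the-envelope bit-rate comparison using the nominal 50% savings figures.
mpeg2_mbps = 19.0                 # roughly the ATSC MPEG-2 broadcast rate cited later
h264_mbps = mpeg2_mbps * 0.5      # H.264: about half the MPEG-2 bit rate
hevc_mbps = h264_mbps * 0.5       # HEVC: about half again

print(f"MPEG-2: {mpeg2_mbps:.1f} Mb/s, H.264: {h264_mbps:.1f} Mb/s, HEVC: {hevc_mbps:.2f} Mb/s")
# For comparable quality, HEVC needs roughly a quarter of the MPEG-2 bit rate.
```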
H.265 is the newest video compression standard, completed and published in 2013. Unlike H.264, which is commonly wrapped in Apple's QuickTime or MP4 containers, H.265 is carried in a variety of containers; DivX, for example, added HEVC support to its own container. The standard supports 8K UHD (Ultra High Definition) television and resolutions up to 8192 × 4320. Still in its early stages as of the writing of this book, H.265 promises to deliver support for enhanced video formats, scalable coding and 3D video extensions.
Google's WebM is quite a different story. Unlike proprietary codecs, WebM is an audio-video container format that is not only royalty free but also open source, intended for use with HTML5. The format is being developed through a community-driven process that is fully documented and publicly available for review. WebM does not contain proprietary extensions, and all users are granted a worldwide, non-exclusive, no-charge, royalty-free patent license.
One of the most important aspects of HTML5 video is that it includes various tags that allow browsers to natively play video without requiring a plug-in like Flash or Silverlight, and it does not specify a standard codec to be used.
Just as there is a variety of video codecs, there are numerous audio codecs that have been created for different purposes.
MP3 is a lossy data compression scheme for encoding digital audio. MP3 was the de facto digital audio compression scheme used to transfer and play back music on most digital audio players, such as iPods. It is an audio-specific format designed by the Moving Picture Experts Group (MPEG) to greatly reduce the amount of data required to faithfully reproduce an audio recording as perceived by the human ear. The method used to produce these results is called perceptual encoding.
In perceptual encoding, compression algorithms reduce the bandwidth of the audio data stream by dropping out audio data that cannot be perceived by the human ear. For example, a soft sound immediately following a loud sound would be dropped from the audio signal, saving bandwidth.
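The toy sketch below illustrates only that masking idea; real perceptual coders such as MP3 work in the frequency domain with full psychoacoustic models, so the threshold and sound levels here are purely hypothetical:

```python
# Toy illustration of temporal masking: drop sounds that are much quieter than a
# recent loud sound. Real MP3 encoding uses frequency-domain psychoacoustic models;
# this only sketches the "spend no bits on what the ear won't notice" idea.
levels_db = [60, 90, 52, 55, 70, 40, 62]   # hypothetical loudness of successive sounds

MASKING_DROP_DB = 30    # assume sounds this far below the previous peak are masked

kept = []
previous_peak = 0
for level in levels_db:
    if level < previous_peak - MASKING_DROP_DB:
        continue                  # masked: the encoder would spend no bits here
    kept.append(level)
    previous_peak = level         # this sound becomes the new masker

print(kept)   # [60, 90, 70, 40, 62] -- the 52 and 55 dB sounds after the 90 dB peak are dropped
```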
NOTE MP3 is formally known as MPEG-1 (or MPEG-2) Audio Layer III.
AAC (Advanced Audio Coding) is a lossy compression and encoding scheme for digital audio that is replacing MP3. Using a more sophisticated compression algorithm, AAC delivers markedly better audio quality than MP3 at similar bit rates.
As part of the pervasive MPEG-2 and MPEG-4 families of standards, AAC has become the standard audio format for devices and delivery systems such as the iPhone, iPod, iTunes, YouTube, the Nintendo 3DS and DSi, the Wii, the PlayStation 3 and the DivX Plus Web Player (Figure 19.3). The manufacturers of in-dash car audio systems capable of receiving Sirius XM are also moving away from MP3 and embracing AAC audio.
Windows Media Audio (WMA) is a proprietary lossy audio codec developed by Microsoft to compete with MP3 and the RealAudio codecs. It is used by the Windows Media Player, and variations of it are compatible with several other Windows or Linux players. This audio format and codec can be played on a Mac using the QuickTime framework, but it requires a third-party QuickTime component called Flip4Mac WMA in order to play.
Distribution codecs are those tuned for delivering content to the audience. Often they are more heavily compressed versions of codecs such as MPEG-2, H.264 and HEVC. Here the selection is based on reducing bandwidth as much as possible while providing robust error recovery for less than perfect signal paths. The codec that is selected is a function of the type of delivery channel.
For Over the Air (OTA) television, MPEG-2 at just over 19 Megabits per second was chosen by the ATSC. A 2009 revision of the ATSC standard also allowed the more efficient H.264 codec to be used. As this book was being written, the ATSC was working to standardize the next broadcast format, which will include UHD, or 4K, video. The codec being considered for what will be known as ATSC 3.0 is HEVC, or H.265.
When delivered over cable and satellite to the home, the video is compressed using MPEG codecs. However, compared to OTA, cable and satellite companies often provide more highly compressed content in an effort to deliver more channels to subscribers.
Optical discs use either MPEG-2 or H.264. MPEG-2 is normally seen on DVDs, which are always standard definition. Blu-ray Discs can use several codecs: MPEG-2, H.264 or Windows Media (VC-1).
YouTube compresses content in MPEG-4. Several copies of each submitted video are made at different sizes, and the user can select the quality of the playback supported by their connection and computer. Some online services, such as Netflix, dynamically change between different compression levels based on bandwidth conditions. Other online distribution can use a broad range of codecs including Windows Media, all the MPEG variants, H.264, HEVC, and many others.
The development of compression standards, codecs and containers is an ongoing process. As mathematicians and engineers take advantage of computer and network speed, new and better compression technologies will be developed.
NOTE Due to the ever-changing nature of this technology, consider the codecs described in this chapter as a foundation for understanding how codecs evolve and how they are used in the different stages of video production, post production and distribution.