Chapter 3. Images

In the previous chapter you learned how to create vector graphics: a series of lines and paths (and sometimes text) that have no predefined resolution and can use multiple colorspaces and attributes. In many cases, however, you will need to place a raster image (sometimes called a bitmap image) on your page. This chapter introduces raster images and the XObjects that PDF uses to hold them.

Raster Images

When most people think about raster images, they think about standard raster image formats such as JPEG, PNG, GIF, or TIFF. While those formats do contain raster image data, they also wrap it together with all sorts of other things in a kind of “image package.” PDF can’t use the full package (except in one special case; see JPEG Images), so you need to “unwrap” it to get at the raw form that PDF expects.

This “raw form” is just a series of pixels, or in more technical terms, a two-dimensional array of those pixels (the two dimensions being the height of the image and the width of the image). For example, in Figure 3-1, the height is 40 pixels and the width is 46 pixels.

Figure 3-1. Large pixel (FatBits) image

Each pixel in that image is itself an array of values, one per color component in the image’s color space. If this were a DeviceRGB image, then each pixel would have three components. If it were a DeviceCMYK image, there would be four, and if it were a DeviceGray image, there would be only one. This shouldn’t come as a surprise; it exactly matches the number of operands that the color space operators take! See Basic Color for more on color space operators.

Finally, to understand how the data for the image is going to be arranged, you need to know how many bits of data are needed for each component of each pixel. Most developers only think in terms of 8 bits per component, which is why RGB images are sometimes referred to as 24-bit color images (8*3 = 24). But PDF supports a much richer set of options here, allowing 1, 2, 4, 8, or 16 bits per component (although in practice, only 1, 8, and 16 are used).
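The amount of raw sample data follows directly from those values: width, height, number of components, and bits per component. The Python sketch below is one way to compute it. The function name raw_image_length and the COMPONENTS table are illustrative only and are not part of any PDF library. It assumes one of the three device color spaces and relies on the PDF rule that each row of samples starts on a byte boundary, so rows whose bit count is not a multiple of 8 are padded.

```python
# Number of color components for the three device color spaces.
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def raw_image_length(width, height, color_space, bits_per_component):
    """Return the size in bytes of the unfiltered sample data for an image."""
    if bits_per_component not in (1, 2, 4, 8, 16):
        raise ValueError("BitsPerComponent must be 1, 2, 4, 8, or 16")
    bits_per_row = width * COMPONENTS[color_space] * bits_per_component
    # Each row is padded up to a whole number of bytes.
    bytes_per_row = (bits_per_row + 7) // 8
    return bytes_per_row * height


print(raw_image_length(46, 40, "DeviceRGB", 8))
```

For a 46 by 40 DeviceRGB image at 8 bits per component, this gives 5520 bytes (40*46*3). This value is only the Length of the stream when no compression filter is applied.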

1 0 obj
<<
    /Type                /XObject
    /Subtype             /Image
    /Height              40
    /Width               46
    /ColorSpace          /DeviceRGB
    /BitsPerComponent    8
    /Length              5520    % 40*46*3
>>
stream
% pixel data goes here
endstream
endobj

Adding it to the resource dictionary is no different than the examples from the last chapter with ExtGState resources. So the resource dictionary would look something like Example 3-2.

Example 3-2. Image resource dictionary

% in the page dictionary
/Resources <<
    /XObject <<
        /Im1    1 0 R    % reference to our 1 0 obj in the previous example
    >>
>>

Images in content streams

The simple part of working with images in your content stream is knowing (and using) the Do operator, which takes a single operand: the name of the resource. However, if you only knew that and tried to just use it alone in the stream, like this:

/Im1 Do

you wouldn’t see anything recognizable on the page and you’d wonder what went wrong.

What is wrong is that drawing image XObjects requires special handling of the CTM. If you want to understand the background for this, consult ISO 32000-1:2008, 8.9.4. For now, the important thing to understand is that an image is always painted into a unit square, 1 unit wide and 1 unit high, no matter how many pixels it contains. With the normal identity matrix of 1 0 0 1 0 0, the whole image is squeezed into a single default user space unit (1/72 of an inch) in each direction. To draw it at one unit per pixel, you need a matrix of w 0 0 h 0 0 (where w is the image’s width and h is its height in pixels, as defined in the image dictionary). Thus, for our image to appear in the lower-left corner of the page, without any other transforms, our content stream needs to look like Example 3-3.

Example 3-3. FatBits fish

q            % you don't need the q/Q, but it's a good habit!
46 0 0 40 0 0 cm
/Im1 Do
Q

As with the paths we worked with in the previous chapter, you can apply any combination of transformations to your image—scaling, rotation, etc. Just remember to always start with the image’s size.
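As a rough illustration of "start with the image's size," the following Python sketch generates the content stream operators for placing an image at a given position and scale. The helper names num and place_image are hypothetical, and the sketch handles only scaling and translation, not rotation or skew.

```python
def num(value):
    """Format a number for a content stream without trailing zeros."""
    return f"{value:.4f}".rstrip("0").rstrip(".")


def place_image(name, width, height, x=0, y=0, scale=1.0):
    """Return content stream operators that paint image resource `name`.

    width and height are the pixel dimensions from the image dictionary;
    x and y position the lower-left corner; scale multiplies the size.
    """
    w = width * scale
    h = height * scale
    return (
        "q\n"
        f"{num(w)} 0 0 {num(h)} {num(x)} {num(y)} cm\n"
        f"/{name} Do\n"
        "Q\n"
    )


print(place_image("Im1", 46, 40, x=72, y=72, scale=2))
```

Calling place_image("Im1", 46, 40) with the defaults reproduces the operators in Example 3-3 (without the comment).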

JPEG Images

One of the various filters that can be applied to a stream object is the DCTDecode filter. A DCTDecode stream is equivalent to a JFIF file, also known colloquially as a JPEG (or .jpg/.jpeg) file. JPEG files are the only standard image format that can be placed into a PDF without any modification—you just read the data stream from the file and then write it into the value of the stream object in the PDF, as shown in Example 3-4. Of course, you will either need to know a priori the size, colorspace, etc. of the image or have some way to parse the JPEG to obtain those values.

Example 3-4. PDF image based on JPEG data

1 0 obj
<<
    /Type                /XObject
    /Subtype             /Image
    /Height              246
    /Width               242
    /ColorSpace          /DeviceRGB
    /BitsPerComponent    8
    /Length              16423
    /Filter              /DCTDecode
>>
stream
% image data right from the JPEG goes here
endstream
endobj
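If you do not know the image's size and number of components ahead of time, one way to obtain them is to read the JPEG's frame header. The Python sketch below walks the marker segments until it finds a start-of-frame (SOF) marker. The function name jpeg_frame_info is hypothetical, the mapping from component count to a device color space is a simplifying assumption (it ignores embedded color profiles and other color conventions), and error handling is minimal.

```python
import struct

# Start-of-frame markers (all 0xC0-0xCF except DHT 0xC4, JPG 0xC8, DAC 0xCC).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}

# Assumed mapping from component count to a PDF device color space.
COLOR_SPACES = {1: "DeviceGray", 3: "DeviceRGB", 4: "DeviceCMYK"}


def jpeg_frame_info(data):
    """Return (width, height, components, bits_per_component) for JPEG bytes."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:              # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                    # standalone marker, no length field
            continue
        (segment_length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            bits, height, width, components = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, components, bits
        if marker == 0xDA:              # start of scan: no frame header seen
            break
        pos += 2 + segment_length       # skip marker plus segment
    raise ValueError("no start-of-frame marker found")
```

The returned values correspond to the Width, Height, and BitsPerComponent entries of the image dictionary, and COLOR_SPACES[components] gives a first guess at ColorSpace. The Length entry is simply the size of the JPEG file in bytes.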

Transparency and Images

While the normal behavior of an image in the PDF imaging model is for all of the pixels to be drawn on top of anything below them, there are ways that the image can express that some parts of itself are either completely or partially transparent. The original methods that PDF supports are called masking, as they completely “mask out” (off vs. on) a set of pixels based on the provided criteria. The newer methods use the same transparency model (see Basic Transparency) as that of paths, where you can have levels of transparency. Because we’ve already looked at that model, we’ll cover that first.

Soft Masks

As you learned earlier, images in PDF are in a defined colorspace and as such have a defined number of components. In the case of RGB, there are three components: red, green, and blue. That means that there is no room for transparency information. The normal way to address this—as most common image formats, such as PNG and TIFF do—is to simply define a new color(space) with four components (ARGB or RGBA), where the fourth component of each pixel is its transparency value. However, as you also learned, one of the key goals when transparency was introduced in PDF was to maintain 100% backward compatibility with nontransparency-aware implementations. That prevented the use of a new colorspace.

So instead of the new colorspace, the transparency values for each pixel are stored in a separate image XObject. This “soft mask” image is referenced from the original image XObject via the SMask key in its dictionary. The soft mask image’s dictionary is a standard image dictionary, subject to a few restrictions (see ISO 32000-1:2008, Table 145). Most importantly, the colorspace of the image must be DeviceGray. Allowing only DeviceGray makes perfect sense, since that’s a one-component-per-pixel colorspace, which matches our missing component from RGBA/ARGB. This is also why in most cases the width and height of the parent image and its soft mask will match (although this is not required).

If you were to take the image from Example 3-3 and use a soft mask to make all the white parts transparent, you’d have something like Figure 3-2.

Figure 3-2. FatBits fish with a soft mask

% this is the soft mask
10 0 obj
<<
    /Type                /XObject
    /Subtype             /Image
    /BitsPerComponent    8
    /ColorSpace          /DeviceGray
    /Filter              /FlateDecode
    /Height              40
    /Width               46
    /Length              166    % smaller for compression
>>
stream
% masking data goes here
endstream
endobj

% this is the parent image
11 0 obj
<<
    /Type                /XObject
    /Subtype             /Image
    /BitsPerComponent    8
    /ColorSpace          /DeviceRGB
    /Filter              /FlateDecode
    /Height              40
    /Width               46
    /SMask               10 0 R
    /Length              166
>>
stream
% image data goes here
endstream
endobj
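If your source pixels are in an interleaved RGBA layout, producing the two streams above is a matter of separating the alpha channel from the color channels. The Python sketch below shows one way to do that and to compress each part for use with FlateDecode. The function name split_rgba is hypothetical, and the sketch assumes 8 bits per component with non-premultiplied alpha in R, G, B, A order.

```python
import zlib


def split_rgba(rgba):
    """Split interleaved 8-bit RGBA bytes into (rgb_bytes, alpha_bytes)."""
    if len(rgba) % 4 != 0:
        raise ValueError("RGBA data must be a multiple of 4 bytes")
    rgb = bytearray()
    for i in range(0, len(rgba), 4):
        rgb += rgba[i:i + 3]        # keep R, G, B for the parent image
    alpha = rgba[3::4]              # every fourth byte feeds the soft mask
    return bytes(rgb), bytes(alpha)


# Two pixels: one opaque, one fully transparent.
pixels = bytes([255, 0, 0, 255,   255, 255, 255, 0])
rgb, alpha = split_rgba(pixels)

# zlib's output is the format that the FlateDecode filter expects.
image_stream = zlib.compress(rgb)
mask_stream = zlib.compress(alpha)
```

The compressed rgb data would become the stream of the parent image (DeviceRGB) and the compressed alpha data the stream of the soft mask (DeviceGray), with each Length set to the compressed size. In the mask, 255 means fully opaque and 0 fully transparent.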

Stencil Masks

While soft masks are the most powerful, because they allow for varying levels of transparency, sometimes all you need is to be able to turn off a set of pixels so that they don’t draw on top of whatever is behind them. The equivalent of a soft mask but with simple on/off properties is called a stencil mask. It works almost exactly like the soft mask, except that rather than the mask image being in DeviceGray at (usually) 8 bits per component, it has no colorspace, is always 1 bit per component, and is flagged with an ImageMask key whose value is true. Each bit represents a pixel’s on/off state, with 0 meaning on (mark) and 1 meaning off (leave alone). Should it be necessary, those values can be inverted through the presence of a Decode key in the stencil mask’s image dictionary with a value of [1 0].

If you were to use a stencil mask to mask out the FatBits fish, it might look like Example 3-5.

Example 3-5. FatBits fish with a stencil mask

% this is the stencil mask
10 0 obj
<<
    /Type                /XObject
    /Subtype             /Image
    /ImageMask           true
    /BitsPerComponent    1
    /Height              40
    /Width               46
    /Length              240    % 40 rows * 6 bytes; each 46-bit row is padded to a byte boundary
>>
stream
% masking data goes here
endstream
endobj

% this is the parent image
11 0 obj
<<
    /Type                /XObject
    /Subtype             /Image
    /BitsPerComponent    8
    /ColorSpace          /DeviceRGB
    /Filter              /FlateDecode
    /Height              40
    /Width               46
    /Mask                10 0 R
    /Length              166
>>
stream
% image data goes here
endstream
endobj
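Producing the 1-bit mask data means packing eight pixels into each byte, most significant bit first, and padding every row out to a whole byte. The Python sketch below is one way to do that. The function name pack_stencil_rows is hypothetical, and the input is assumed to be a list of rows of booleans in which True means "paint this pixel"; with the default Decode array that corresponds to a 0 bit.

```python
def pack_stencil_rows(rows):
    """Pack rows of booleans (True = paint) into 1-bit stencil mask data."""
    out = bytearray()
    for row in rows:
        byte = 0
        bit_count = 0
        for painted in row:
            # Default Decode: 0 marks the pixel, 1 leaves it alone.
            byte = (byte << 1) | (0 if painted else 1)
            bit_count += 1
            if bit_count == 8:
                out.append(byte)
                byte = 0
                bit_count = 0
        if bit_count:
            # Pad the final byte of the row with zero bits on the right.
            out.append(byte << (8 - bit_count))
    return bytes(out)


mask = pack_stencil_rows([[True] * 46 for _ in range(40)])
print(len(mask))
```

Because each 46-pixel row occupies 6 bytes after padding, a 46 by 40 mask produces 240 bytes of data.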

Color-Keyed Masks

In some cases, however, such as with our fish, all you really need to do is inform the PDF viewer that it should just ignore (mask) any pixels of a specific color (in our case, white). This simplest form of masking is called color-key (or chroma-key) masking, and it works like a blue screen in the movies. Anything that is in the defined color is not drawn.

To use a color-key mask in PDF, no secondary image is needed; you just need to know what color(s) you wish to have masked out. The Mask entry in the image dictionary will have as its value an array containing 2*n entries (where n is the number of components in the image’s colorspace): each pair specifies the minimum and maximum values for that component to be masked. Therefore, for our RGB image, we need six values. Since we only want to mask out white, the minimum and maximum are both the same—255 (the value of white). Using this approach, the full image dictionary looks like Figure 3-3.

Figure 3-3. FatBits fish with a color mask

11 0 obj
<<
    /Type                /XObject
    /Subtype             /Image
    /BitsPerComponent    8
    /ColorSpace          /DeviceRGB
    /Filter              /FlateDecode
    /Height              40
    /Width               46
    /Mask                [255 255 255 255 255 255]
    /Length              166
>>
stream
% image data goes here
endstream
endobj
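Since the Mask array holds a minimum and maximum for each component, it can also describe a small range around the key color rather than one exact value. The Python sketch below builds such an array. The function names color_key_mask and format_mask and the tolerance parameter are hypothetical conveniences, not anything defined by PDF.

```python
def color_key_mask(color, tolerance=0, bits_per_component=8):
    """Return the [min1 max1 ... minN maxN] list for a color-key mask."""
    max_value = (1 << bits_per_component) - 1
    mask = []
    for value in color:
        if not 0 <= value <= max_value:
            raise ValueError("component value out of range")
        # Clamp the range to what the bit depth can represent.
        mask.append(max(0, value - tolerance))
        mask.append(min(max_value, value + tolerance))
    return mask


def format_mask(mask):
    """Format the list as a /Mask entry for the image dictionary."""
    return "/Mask [" + " ".join(str(v) for v in mask) + "]"


print(format_mask(color_key_mask((255, 255, 255))))
```

For pure white with no tolerance, this produces the same six values used in the dictionary above.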

Vector Images

PDF doesn’t really have the concept of a vector image, such as an EPS or EMF file. Instead, what it has is a way to encapsulate a content stream into a reusable object. As with raster images, this is another type of XObject called a form XObject.

Note

The name “form” here comes from the PostScript usage of the term; it has nothing to do with an interactive form, which is another type of PDF construct that you’ll see in Chapter 7.

Adding the Form XObject

Just as with image XObjects, you need to do three things:

1. Create a form dictionary and add the data.
2. Add a reference to the form dictionary in a resource dictionary.
3. Refer to the resource in a content stream.

The Form Dictionary

A form XObject is a content stream (discussed in Content Streams) with a form dictionary associated with it that provides some extra intelligence and support to that content. Anything you can put in a page’s content stream, you can put into a form XObject. This is very powerful because it gives you a way to reuse an entire PDF page in some other context (for example, imposition or stamping). You’ll see how to do that later in this chapter, but for now, let’s take a look at the form dictionary’s special fields.

Four keys in the form dictionary matter when you create a form XObject:

BBox: The most important one, and the only one that is required, is the BBox key. The BBox is an array representing a bounding box for the content. You can always use a rectangle larger than the actual content size, but a smaller one will cause your content to be clipped to that size.
Matrix: The Matrix is a standard transformation matrix that will be applied to all instances of the XObject whenever it is drawn. It is almost always the identity matrix, since any specific transformation would be applied in the invoking content stream. Since the default value for the key is the identity, there is no reason to include this in the dictionary unless the transform is something else.
Resources: Just like a page, your content stream may need extra resources (ExtGStates, fonts, etc.), and this is where you would reference them.
FormType: This one is not only optional, but serves no practical purpose since there has only ever been a single value for it (1). Don’t bother writing it into your PDFs; it only wastes space.

Example 3-6 shows a simple form XObject.

Example 3-6. Simple form XObject

1 0 obj
<<
    /Type              /XObject
    /Subtype           /Form
    /BBox              [0 0 100 100]
    /FormType          1  % optional, only here for example
    /Matrix            [1 0 0 1 0 0] % optional, only here for example
    /Length            180
>>
stream
     0 0 1 rg          % set the color to blue in RGB
     0 0 100 100 re    % draw a rectangle 100x100 with the bottom left at 0,0
     f                 % fill it (f is preferred; F exists only for compatibility)
endstream
endobj

You would then add this to the page’s resource dictionary and reference it in the content stream exactly as you did for the image (see Example 3-7). However, unlike with images, for form XObjects you use a standard CTM and don’t need to worry about the size (since it’s not described in pixels).

Example 3-7. Referencing the XObject in the page dictionary

% in the page dictionary
/Resources <<
    /XObject <<
        /Im1    1 0 R    % reference to our 1 0 obj in the previous example
    >>
>>

% in the page's content stream
q
    1 0 0 1 0 0 cm    % as with normal content, this means 100% at 0,0
    /Im1 Do
Q
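When you generate form XObjects programmatically, the Length entry is the detail most easily gotten wrong, since it must match the number of bytes in the stream exactly. The Python sketch below serializes a minimal form XObject and computes Length from the content. The function name form_xobject is hypothetical, and the sketch writes only the required BBox key plus Type, Subtype, and Length, with no Resources, Matrix, or filters.

```python
def form_xobject(object_number, bbox, content):
    """Serialize a minimal, uncompressed form XObject as bytes.

    content is the content stream as bytes; bbox is [llx, lly, urx, ury].
    """
    bbox_text = " ".join(str(v) for v in bbox)
    header = (
        f"{object_number} 0 obj\n"
        "<<\n"
        "    /Type /XObject\n"
        "    /Subtype /Form\n"
        f"    /BBox [{bbox_text}]\n"
        f"    /Length {len(content)}\n"
        ">>\n"
        "stream\n"
    ).encode("ascii")
    # The end-of-line before "endstream" is not counted in /Length.
    return header + content + b"\nendstream\nendobj\n"


content = b"0 0 1 rg\n0 0 100 100 re\nf"
print(form_xobject(1, [0, 0, 100, 100], content).decode("ascii"))
```

The content here is the blue square from Example 3-6 with the comments removed, which is why its Length is much smaller than the value shown in that example.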

The Do operator, when used with a form XObject, uses the keys we discussed earlier as part of its rendering or painting. The operations are as follows:

1. Saves the current graphics state, as if by invoking the q operator
2. Concatenates the matrix from the form dictionary’s Matrix entry with the current transformation matrix (CTM)
3. Clips according to the form dictionary’s BBox entry
4. Paints the graphics objects specified in the form’s content stream
5. Restores the saved graphics state, as if by invoking the Q operator
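The concatenation step is an ordinary matrix multiplication in which the form's Matrix is applied first and the CTM second. The Python sketch below shows the arithmetic using PDF's six-number [a b c d e f] form. The function names concat and transform are hypothetical helpers written for this illustration.

```python
def concat(m, ctm):
    """Return m x ctm for two PDF matrices given as [a, b, c, d, e, f]."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def transform(matrix, x, y):
    """Map a point from the matrix's source space to its target space."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


form_matrix = [1, 0, 0, 1, 10, 20]   # form Matrix: translate by (10, 20)
page_ctm = [2, 0, 0, 2, 0, 0]        # CTM in effect at the Do: scale by 2
combined = concat(form_matrix, page_ctm)
print(combined, transform(combined, 0, 0))
```

With these values the form's origin ends up at (20, 40): the translation from the form's Matrix is itself scaled by the CTM. Swapping the two arguments would give a different result, which is why the order of concatenation matters.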

Note

Since the painting of the form XObject automatically invokes a q/Q pair, there is no need to have them as the start and end inside the form XObject’s content stream.

Copying a Page to a Form XObject

If you are building a tool to impose or stamp PDF documents, one common operation is to convert a PDF page into a form XObject, so that it can easily be incorporated into any other page. Assuming you have a PDF library that is able to work with the object model of a PDF, the following instructions should help you do the conversion:

1. If the page has an array of content streams, combine them into a single one.
2. Compute the BBox based on the page’s MediaBox and Rotate keys.
3. Copy the resource dictionary from the page to the form XObject (if you are doing this in the same PDF, use a shallow copy instead of a deep one).

Note

Since it is deprecated by ISO 32000-1:2008, you can remove the ProcSet key, if present.
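The geometry in the second step can be handled in more than one way. The Python sketch below shows one possible approach, which is an assumption rather than the only method: keep the BBox equal to the MediaBox and put the rotation into the form's Matrix, so that the rotated page lands with its lower-left corner at the origin. The function name page_to_form_geometry is hypothetical, and Rotate is treated as a clockwise angle in multiples of 90 degrees.

```python
def page_to_form_geometry(media_box, rotate=0):
    """Return (bbox, matrix) for a form XObject made from a page.

    media_box is [x0, y0, x1, y1]; rotate is the page's Rotate value.
    """
    x0, y0, x1, y1 = media_box
    rotate %= 360
    if rotate == 0:
        matrix = [1, 0, 0, 1, -x0, -y0]
    elif rotate == 90:
        matrix = [0, -1, 1, 0, -y0, x1]
    elif rotate == 180:
        matrix = [-1, 0, 0, -1, x1, y1]
    elif rotate == 270:
        matrix = [0, 1, -1, 0, y1, -x0]
    else:
        raise ValueError("Rotate must be a multiple of 90")
    # The BBox stays in the page's own (unrotated) coordinate system.
    return list(media_box), matrix


bbox, matrix = page_to_form_geometry([0, 0, 612, 792], rotate=90)
print(bbox, matrix)
```

For a Letter-size page rotated by 90 degrees, the form then occupies a region 792 units wide and 612 units high when drawn with an identity CTM. A real implementation would also need to decide how to treat the page's other boxes, which this sketch ignores.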

What’s Next

In this chapter, you learned about how PDF works with images—both raster and vector—via XObjects. Next, we’ll look at drawing text.
