Chapter 7
EPUB Security
Understand the various security methods for EPUB
Understand how security affects EPUB
Learn how to secure embedded fonts
Learn how to know when EPUB contents have changed
New books are released for EPUB reading systems all the time. Publishers want to prevent purchasers of these newly released books from sharing the published work with everyone else on the Internet. EPUB files can be secured so that they cannot easily be shared.
With nonsecured EPUB files, the embedded fonts can be secured to prevent anyone from extracting the fonts. Once a font is extracted, it could be shared on the Internet. Sharing the font may not be allowed if the font is copyrighted.
It can be important for readers to know when the contents of an EPUB file have changed. When an EPUB is signed, readers can even tell whether contents have been added or removed.
Introduction to EPUB Security
Most security is put into place by digital publishers. If you create an e-book and submit it to a publisher to be sold, they may choose to encrypt the file. Because the file is encrypted, it cannot easily be shared among readers on the Internet. When a book is shared freely, the author of the EPUB and the publisher lose revenue because more readers are getting the book for free.
Security that limits what a purchaser can do with an EPUB, such as sharing it, is called digital rights management (DRM).
NOTE
Note that encryption schemes can be cracked by knowledgeable programmers. Removing encryption is beyond the scope of this book and is illegal in many jurisdictions. The issue is hotly debated, however. Purchasers of EPUB books argue that if they purchase the book, they should have the right to decrypt the book and convert it to another format.
Remember from Chapter 1 that neither the ZIP file itself nor the META-INF folder and its contents can be encrypted. With an encrypted EPUB, the META-INF directory can contain three extra files: rights.xml, encryption.xml, and signatures.xml.
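Because an EPUB is a ZIP file, a few lines of code can check which of these three files are present. The following is a minimal sketch in Python using only the standard library. The function names and the tiny in-memory archive are made up for illustration, and the archive is not a complete, valid EPUB.

```python
import io
import zipfile

# The three optional security files named in the text.
SECURITY_FILES = ("rights.xml", "encryption.xml", "signatures.xml")

def find_security_files(epub):
    """Return the META-INF security files present in an EPUB (path or file object)."""
    with zipfile.ZipFile(epub) as archive:
        names = set(archive.namelist())
    return [name for name in SECURITY_FILES if f"META-INF/{name}" in names]

def build_sample_epub():
    """Build a tiny in-memory archive for demonstration only (not a valid EPUB)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("META-INF/encryption.xml", "<encryption/>")
    buffer.seek(0)
    return buffer

print(find_security_files(build_sample_epub()))  # ['encryption.xml']
```

To try it on a real book, pass the path of the EPUB file instead of the in-memory sample.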
You may be asking how it all works. Well, let’s look at it a piece at a time.
Public Key Infrastructure and RSA
Usually, everything starts when an account is set up on a web server. Let’s say you want to download an EPUB from your local library.
Online Libraries
Most local libraries are connected to online libraries. You can ask your local librarian, or go to the website for your local library.
There should be a link somewhere to allow access with your library card. If not, try going to www.search.overdrive.com and clicking Library Search. Enter your ZIP code and then find your library on the list. Once you click your library, you should be taken to a page that shows information about it. On this page should be a link to a digital library associated with your local one.
Books can be downloaded in a variety of formats, including audiobooks. The digital books are usually for Adobe Digital Editions (ADE), which is a free download to view ADE PDFs and EPUBs. These files contain DRM security, but can be viewed for a limited time.
Another format sometimes available is Open EPUB. An Open EPUB file is a standard EPUB file with no security.
To download an e-book, you are required to set up an account. When this account is set up, the server uses your information to generate encryption keys. One key is kept on the server, while the other is for your reading system.
The keys generated are part of a public key infrastructure (PKI), which relies on asymmetric-key cryptography. More specifically, the relevant portion of PKI is the Public-Key Cryptography Standards (PKCS). PKCS covers various methods, but the DRM described here uses PKCS #1, the RSA standard, named for Rivest, Shamir, and Adleman.
The two keys, or certificates, are referred to as a public key, which is kept by the server, and a private key, which is kept by you on your reading system. Only the specific private key can decrypt the data encrypted by the public key. (This is just a basic explanation—this is a detailed topic that could require a whole book to explain fully.) DRM is not managed by individuals but by companies. There are currently no applications to download that allow an individual to apply DRM to their own content.
When you get a book from the server, it encrypts the book using the public key, or digital certificate, it generated. The encryption method is a very in-depth mathematical algorithm that obscures the original content. Once the e-book is downloaded and opened, your reading system will use the private key it received to decode the file for viewing.
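To make the public key and private key idea concrete, here is a toy RSA sketch in Python. It is an illustration only: the primes are tiny textbook values, the function names are our own, and it omits the padding that PKCS #1 requires. It is not how any particular DRM server is implemented.

```python
# Toy RSA with tiny primes, for illustration only. Real keys are 2048 bits
# or more, and real systems add the padding defined in PKCS #1.
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

def encrypt(message, public_key):
    """Encrypt an integer smaller than the modulus with the public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, private_key):
    """Recover the integer with the matching private key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)

public_key = (e, n)            # kept by the server in the text's scenario
private_key = (d, n)           # kept by the reading system

message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(recovered == message)    # True
```

Only the holder of d can undo what was done with e, which is the property the text relies on.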
Within an EPUB, all XHTML, CSS, images, and fonts can be encrypted. Every file can be encrypted, or only a few, but never the mimetype file, the contents of the META-INF folder, or the OPF file.
Once a file is opened, it is decrypted using the private key received from the server. Your application can check with a certificate authority (CA) to validate that your key is still active. If your key has been revoked, then the book will not open. If a key is revoked but you have purchased the EPUB, you need to contact the seller and have the certificate renewed.
Once it has been approved, then other factors are checked, such as the expiration date. Similar to a regular library book, a digital book can be viewed only for a certain time. Once the time expires, the e-book can no longer be viewed. Another factor checked can be on how many devices the e-book is loaded. You are also prohibited from printing or converting the e-book.
There may be more or fewer limitations, depending on the publisher of the book and from where it was downloaded.
Advanced Encryption Standard
Another security method used is Advanced Encryption Standard (AES). The data is put through matrices and encrypted using a complicated algorithm. AES is used to change the data so that no one can easily open and read the EPUB contents. If they could, the EPUB could be shared on the Internet.
The files are encrypted and placed into the ZIP file and renamed to EPUB. The encryption.xml file is generated, as discussed later in this section, and placed into the EPUB as well.
The key generated earlier by PKI is used to alter the files and encrypt them. The private key placed on your reading system is used to decrypt the file so the reading system can display a readable publication.
AES is followed by a number, which refers to the size of the key in bits. The DRM described here uses AES-128, that is, a 128-bit (16-byte) key.
EPUB Security Files
As mentioned in Chapter 1, the three files for DRM are rights.xml, encryption.xml, and signatures.xml. The files are located in the META-INF folder.
There are two main EPUB DRM schemes: Adobe and Apple. We’ll deal with the Adobe DRM scheme to show how the files are set up. Each file has a different use, and we can cover them individually.
rights.xml
The rights file is used to specify the rights a user has to view a specific EPUB. The rights.xml file is stored unencrypted in the META-INF folder.
The best way to learn about the rights.xml file is to view one. Here are the contents of a rights.xml file:
The first line—<?xml version="1.0"?>—denotes that the file is an XML file.
The second line opens the element that contains the rest of the file. It shows that the file is protected by Adobe Adept: <adept:rights xmlns:adept="http://ns.adobe.com/adept">. The address http://ns.adobe.com/adept is the XML namespace that identifies the Adept vocabulary.
After this, we get into the token information at <licenseToken xmlns="http://ns.adobe.com/adept">. Again, we see the same Adept namespace, which here applies to the token information.
The next few lines are about the purchaser and device of the e-book:
Starting with the first line, we have the user ID, resource ID (ID of the e-book), resource type (EPUB), device type (standalone), device ID, voucher ID, URL of license server, URL of operator, fulfillment ID, distributor ID, and finally the encryption key.
Next we get into the permissions allowed for the EPUB:
The permissions allow the display of the EPUB until the specified time. Here the time is until 2013-03-15T21:51:17Z, which is March 15, 2013, at 21:51:17 Zulu time. The reader may also take excerpts and print from the EPUB until the expiration date.
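A reading system has to compare this timestamp with the current time. The sketch below shows one way to do that in Python. The function names are hypothetical, and the only format assumed is the one shown in the text, where the trailing Z means Zulu time (UTC).

```python
from datetime import datetime, timezone

def parse_until(value):
    """Parse a timestamp such as 2013-03-15T21:51:17Z; the Z suffix means UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def is_permitted(until, now=None):
    """Return True while the permission has not yet expired."""
    now = now or datetime.now(timezone.utc)
    return now <= parse_until(until)

UNTIL = "2013-03-15T21:51:17Z"
print(is_permitted(UNTIL, datetime(2013, 3, 1, tzinfo=timezone.utc)))  # True
print(is_permitted(UNTIL, datetime(2013, 4, 1, tzinfo=timezone.utc)))  # False
```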
Finally we come across the following:
The license service info section shows the license server URL (Uniform Resource Locator) and the certificate associated with the server. The certificate ties the server to its public key. This allows verification of the certificate and authentication of the server.
encryption.xml
The encryption.xml file is used to specify the encryption of the EPUB contents. It gives more detailed data about the EPUB than the rights.xml file. The file is stored unencrypted in the META-INF directory.
Like the rights.xml file, it is best to view one in order to better understand it. Here is a portion of the encryption.xml file that goes with the rights.xml file covered in the previous section.
NOTE
Some data is the same in each file, as we will soon see. The repetitions are removed to give you a more concise example.
As usual, the first line specifies that the file is an XML file.
The next line is the <encryption> section, which contains the rest of the elements. The <encryption> line includes the XML namespace information of the file.
After this, there are a number of <EncryptedData> tags, which are used to specify each file within the EPUB that is encrypted. For the listed example, the repetitions of the various HTML files have been removed as well as the images. Let’s look at the first one:
All of the other files are identical in nature except for the CipherReference URI.
The first line shows the namespace for the encrypted data scheme. You can go to www.w3.org/2001/04/xmlenc# to see more information. The next line is the encryption method: AES-128.
The next section is the key information. The information listed here is the resource server and resource ID. The resource ID here matches the resource ID in the rights.xml file.
The next section contains the cipher reference, which names the file being encrypted; the data from the rights.xml file can be used to decrypt the data. Each file that is encrypted in the EPUB must be listed in its own <EncryptedData> section. As you can see from the sample listed, you can encrypt XHTML, images, CSS, NCX, and fonts.
Within an EPUB 3 file, audio and video files can also be encrypted.
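Since every encrypted file has its own <EncryptedData> section, a short script can list what is encrypted and by which method. The following Python sketch parses a stripped-down, hypothetical encryption.xml. The file name in the sample is made up, the <KeyInfo> section is left out, and the aes128-cbc algorithm identifier is assumed as the way AES-128 is typically written in W3C XML Encryption.

```python
import xml.etree.ElementTree as ET

ENC_NS = "http://www.w3.org/2001/04/xmlenc#"

# Hypothetical, stripped-down encryption.xml; the <KeyInfo> section is omitted.
SAMPLE = """<?xml version="1.0"?>
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <EncryptedData xmlns="http://www.w3.org/2001/04/xmlenc#">
    <EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#aes128-cbc"/>
    <CipherData>
      <CipherReference URI="OEBPS/Text/chapter01.xhtml"/>
    </CipherData>
  </EncryptedData>
</encryption>"""

def list_encrypted(xml_text):
    """Return (file URI, algorithm) pairs, one per <EncryptedData> section."""
    root = ET.fromstring(xml_text)
    entries = []
    for data in root.iter(f"{{{ENC_NS}}}EncryptedData"):
        method = data.find(f"{{{ENC_NS}}}EncryptionMethod")
        reference = data.find(f"{{{ENC_NS}}}CipherData/{{{ENC_NS}}}CipherReference")
        entries.append((reference.get("URI"), method.get("Algorithm")))
    return entries

for uri, algorithm in list_encrypted(SAMPLE):
    print(uri, algorithm)
```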
Font Mangling
Font mangling is used to encrypt fonts within the EPUB. The main reason to do this is to protect copyrighted fonts. As we discussed in Chapter 1, an EPUB is nothing more than a ZIP file. Since the ZIP file cannot be encrypted itself, the files are available to anyone who can open a ZIP file and extract the contents. If the fonts can be extracted, then they can be freely used. If the fonts are copyrighted, however, then they must be protected in some manner.
Look at the sample file c07-01.epub (located at www.mhprofessional.com/EPUB). It is Rumpelstiltskin from Chapter 6. This time, the font is encrypted. If you extract the EPUB and remove the TrueType Font file, you cannot open it in a font viewer. If you open the EPUB in a reading system that supports embedded fonts, however, you should be able to view the fonts.
Once the font is “mangled,” as it is called, the encryption.xml file is created in the META-INF directory. The file has the following contents taken from c07-01.epub:
The contents are similar to those we saw in the encryption.xml example. The encryption method is listed as a website address of http://www.idpf.org/2008/embedding. This means the font-mangling algorithm is from the International Digital Publishing Forum (IDPF).
The only file being encrypted in this case is the medi - best Ruritania.ttf file in the OEBPS/Fonts directory. Multiple font files can be encrypted, or none at all—in this case, only one was embedded in the file.
To encrypt fonts, you need to open your EPUB file in Sigil. Once the font is added, right-click the font and select Font Obfuscation. After the menu opens, select Use IDPF’s Method. The font is now encrypted; make sure you save the file.
NOTE
If the EPUB is opened by anyone using Sigil, they can turn off the obfuscation and then extract the font.
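The reason obfuscation can be switched off so easily is that the IDPF method is a simple, reversible operation. The sketch below follows our understanding of the algorithm published for http://www.idpf.org/2008/embedding: the key is the SHA-1 digest of the publication's unique identifier with whitespace removed, and only the first 1,040 bytes of the font are XORed with that key. Treat those details as assumptions to check against the specification. The identifier and the stand-in font data are made up.

```python
import hashlib

def idpf_obfuscation_key(unique_identifier):
    """SHA-1 of the publication's unique identifier with whitespace removed."""
    cleaned = "".join(ch for ch in unique_identifier if ch not in " \t\r\n")
    return hashlib.sha1(cleaned.encode("utf-8")).digest()

def idpf_mangle(font_bytes, key):
    """XOR the first 1040 bytes with the 20-byte key; the rest is untouched."""
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(font_bytes[:1040]))
    return head + font_bytes[1040:]

# Hypothetical identifier; a real one comes from the OPF file.
key = idpf_obfuscation_key("urn:uuid:12345678-1234-1234-1234-123456789abc")
font = bytes(range(256)) * 8           # stand-in for real font data (2048 bytes)

mangled = idpf_mangle(font, key)
restored = idpf_mangle(mangled, key)   # applying the XOR twice undoes it
print(mangled != font, restored == font)
```

Because the same function both mangles and restores the font, this protects against casual extraction only. It is not strong encryption.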
Signing an EPUB
EPUB files can be signed to let a person know when they have been modified. Specific files within the EPUB can be entered in the signatures.xml file and checked for validity. If the files are not valid—that is, they have been modified—then the reader knows the publication has been changed.
Signing the files requires the use of the Digital Signature Algorithm (DSA) standard. The standard uses the Secure Hash Algorithm (SHA-1) to generate a value representing the files.
Secure Hash Algorithm
A hash is used to verify the contents of a file. The hash is a string of characters that is generated by running a file through an algorithm. The resulting value is the digest or hash value. A file can be checked by the reading system by generating another hash value and checking it against the stored hash value for that file. If the values match, it can be concluded that the file is intact and not corrupted or changed.
The digest with SHA-1 is 160 bits, or 20 bytes, or 40 hex characters. The resulting value, or hash, produced by the algorithm on the data is used to verify the current data against the original. Every time the hash is produced, it must always be the same, unless something within the data has changed. In this way, SHA is used to determine that files have not been altered, deleted, or added. Some EPUB security systems may allow for alteration, addition, and deletion of certain files, while others may not.
NOTE
The 20 bytes or 40 hex characters may look strange. One byte is written as 2 hex characters. For example, the letter A is a single ASCII character, or byte, with a hexadecimal value of 41. So A is 1 byte, and in hex, the 41 is 2 characters.
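These sizes are easy to confirm with Python's standard hashlib module. The sample text below is made up, and the sketch only demonstrates the properties described above: the digest length, the hex length, and the fact that any change to the data changes the hash.

```python
import hashlib

def sha1_hex(data):
    """Return the SHA-1 digest of the data as a hex string."""
    return hashlib.sha1(data).hexdigest()

original = b"Once upon a time"
digest = hashlib.sha1(original).digest()
hex_digest = sha1_hex(original)

print(len(digest), len(hex_digest))                   # 20 40
print(sha1_hex(original) == hex_digest)               # True: same data, same hash
print(sha1_hex(b"Once upon a time!") == hex_digest)   # False: data changed
print(b"A".hex())                                     # 41
```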
signatures.xml
First, let’s start with a sample file from the IDPF:
The first line shows the file is an XML file and uses the namespace of xmlns="urn:oasis:names:tc:opendocument:xmlns:container".
The second line is <Signature Id="AsYouLikeItSignature" xmlns="http://www.w3.org/2000/09/xmldsig#"> and shows a signature ID. The signature ID matches the following line from the OPF file: <link rel="xml-signature" href="../META-INF/signatures.xml#AsYouLikeItSignature"/> located in the <metadata> section.
The <SignedInfo> section references the signed data and specifies the algorithm used to sign it.
Following this, we come to the information on the signature schema: <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>. The algorithm specifies the methods and parameters to generate a signature.
An important line is <Reference URI="#AsYouLikeIt">. The ID makes a reference to the upcoming manifest section, which contains all the files being signed.
After we specify the signatures schema, we need to specify the digest method that is being applied. The line is <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/> and shows that SHA-1 is being used.
The following line lists the digest value. For SHA-1, the value represents 20 bytes of data, which is 40 characters when written in hex. For example, the <DigestValue> could be D4BC0B471C65C3013D1247DE19A55B8DB981503C.
The next line is the <SignatureValue>, which is a binary-to-text encoding of the signed information (<SignedInfo>) section data.
The next section is the <KeyInfo> section:
This section contains the information on the public key used to verify the signature. The sections <P>, <Q>, <G>, and <Y> contain the elements of the public key or certificate. It is up to the application to check the validity of the key and the signature file. Once this information is verified, the signature information can be used to check the specified files and verify they have not been modified.
Finally, we get to the <Object> section, which contains the list of the files that are signed. The manifest is listed here and has the ID as referenced by the Reference URI from the <SignedInfo> section. The manifest section begins with <Manifest Id="AsYouLikeIt"> and contains references to each file signed.
The files referenced are listed one at a time, as shown:
The file signed here is the OPF. The file structure shows it is located within the OEBPS folder and called As You Like It.opf.
The transform algorithm is the same value as the <CanonicalizationMethod Algorithm>. Canonicalization puts the XML into a standard form (line endings, attribute order, and similar details) so that equivalent files produce the same digest. Again we come to a digest method algorithm that specifies the use of SHA-1.
We complete the section with the SHA-1 value of the specified file. For the sample file from the IDPF, the value for As You Like It.opf would be F90F4B008E059569C8EB73DAD63D4EDB7580283E.
There will be one <Reference> section for each file that is signed.
Validation
An EPUB, when opened by a reading system, should have the signatures checked to verify that all files match the signature. When they do match, then the files have not been modified or corrupted. The files are intact as they were when they left the distribution server. Keep in mind that not all systems may validate the files against their signatures.
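The digest comparison at the heart of validation can be sketched in a few lines of Python. This is a simplified illustration, not full signature validation: it assumes the expected digests are already available as hex strings, it skips the transform step, and it does not verify the <SignatureValue> against the public key. The function names and sample file are hypothetical.

```python
import hashlib
import io
import zipfile

def verify_digests(epub, expected):
    """Compare SHA-1 hex digests of archive members against expected values.

    Returns the list of paths that are missing or do not match.
    """
    failures = []
    with zipfile.ZipFile(epub) as archive:
        names = set(archive.namelist())
        for path, digest in expected.items():
            if path not in names:
                failures.append(path)
                continue
            actual = hashlib.sha1(archive.read(path)).hexdigest()
            if actual.lower() != digest.lower():
                failures.append(path)
    return failures

def make_archive(files):
    """Build an in-memory ZIP archive for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer

content = b"<package/>"
expected = {"OEBPS/book.opf": hashlib.sha1(content).hexdigest()}
intact = make_archive({"OEBPS/book.opf": content})
tampered = make_archive({"OEBPS/book.opf": b"<package><!-- changed --></package>"})

print(verify_digests(intact, expected))     # []
print(verify_digests(tampered, expected))   # ['OEBPS/book.opf']
```

An empty list means every listed file matches its digest. Any path in the list has been changed or removed.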