7 What Are We Really Trying to Preserve: The Original or the Copy?

There is a complex relationship between preservation and copying. We “preserve” some objects by making “preservation copies.” In other cases we preserve copies because either they are the only versions of items that exist, or else the copies have their own value. There is some evidence to suggest that in the digital realm, our relationship to copies and copying may be evolving. This chapter will consider some of these issues, with the exception of copyright, which is beyond the scope of this book—although copyright reform will certainly be a monumental challenge. Copyright influences how we produce, access, and use information, but here I am interested in considering the essence of copying itself.

Hillel Schwartz writes that “copying makes us who we are.… Cultures cohere in the faithful transmission of rituals and rules of conduct. To copy cell for cell, word for word, image for image, is to make the known world our own.”¹ In a series of essays about twins, doppelgangers, self-portraits, decoys, and other forms of replication, Schwartz makes the case that copies are not inauthentic and are part of our cultural inheritance. He has a point. We learn by copying: infants mimic the sounds of their parents, school children copy sentences from the blackboard, art students copy the paintings and sculptures of well-known artists, forgers study how to copy signatures. Copying can be kinetic, even reflexive. It is an educational tool as well as a means of deceit. We are all shaped by it.

Copying is also the means by which we transmit information and knowledge; it “transforms the One into the Many.”² The earliest transmission of knowledge was oral; people memorized stories that were passed down from generation to generation. Sometimes those who memorized the stories also copied, or mimicked, the ways in which storytellers conveyed them. Once writing was developed, oral culture could be transmitted through writing (though oral transmission continued). Texts were copied from manuscript to manuscript, and later from manuscript to printed editions. (Editions were also copied.) There were professional copyists, as well as other trades and professions whose work of preserving and transmitting was deeply connected to copying: archivists, scribes (also known as scriveners), notaries, tachygraphers, university stationers, secretaries, and court stenographers. Stenography transfers the spoken word into written form; the transfer of the oral to the written continues to the present. Drawings were copied, too; one such copying device, the pantograph, was created in 1603 for draftsmen and is still used in many applications.³

Other technologies have evolved to simplify copying. In the late eighteenth century, the invention of the copy press made it possible to reproduce holographs.⁴ Carbon paper was invented in 1801.⁵ In 1839, Henry Fox Talbot, one of the pioneers of photography, created negatives using paper soaked in silver chloride and fixed the images with salt. Because a negative could yield repeated positive prints, this made it possible for him to copy his photographs. Soon after, microphotographic processes were developed that could be used to copy texts as well; by the mid-twentieth century, newspapers were being microfilmed.⁶ Many kinds of mimeograph and photocopy machines were developed at the end of the nineteenth century and throughout the twentieth. Xerography, developed in the 1930s, was the most pervasive copying technology of the second half of the twentieth century.⁷ Today scanners make it possible to create digital copies from paper or film, and digital texts and images can be copied onto a variety of platforms. Cutting and pasting information from the Web into a Word document, for example, is common practice.

It may be convenient to be able to make copies easily, but how do we ensure the accuracy and veracity of a digital document that has been copied? (See “It Takes a [Virtual] Village,” chapter 8.) The source document itself may be constantly changing. By the time this book is published, figure 7.1 will no longer be accurate: the Wikipedia entry will probably have been revised several times at least. Web content lacks fixity.

Figure 7.1

Cut-and-paste fragment from the Wikipedia entry on “Copying,” in the original Arial 10.5 typeface

Finally, we can now copy onto the cloud. Cloud computing uses a network of remote servers hosted on the Internet to store, manage, and process data, as an alternative to saving information on a PC, local server, or network. The implications of this new approach to copying are not yet known, as the word cloud itself perhaps implies. Clouds float in the sky, but they may obscure what we can see. And they may break up and disappear, as so many of the companies that host cloud content will. What would John Ruskin make of cloud content? Clouds held meaning for him, as I explain in chapter 9. He would want us to save as much of our cultural heritage as we can in its original form (figure 7.2).

Figure 7.2

Cloud computing. Graphic by Vanessa Reyes, used with permission

Copying may be an essential aspect of communication, but there is an inherent tension between fixity and evanescence. Copying often introduces errors: information is transmitted incorrectly or lost. Errors may never be caught, yet the copies survive and can themselves be reproduced ad infinitum. This necessarily raises issues of accuracy, reliability, authenticity, integrity, and security. While the threat of information loss has always existed, there is considerable anxiety about it today, in part because we can lose a great deal, and quickly. (Who hasn’t lost data when moving from one mobile phone to another?)

However, questions of authenticity and reliability have always surrounded copies of documents. Courts of law must often assess the authenticity of documents presented as evidence. In the United States, the Federal Rules of Evidence spell out the requirements that copies must meet to be admissible in court. For example:

When the only concern is with getting the words or other contents before the court with accuracy and precision, then a counterpart serves equally as well as the original, if the counterpart is the product of a method which insures accuracy and genuineness. By definition in Rule 1001(4), supra, a “duplicate” possesses this character.⁸

A copy of a public record is admissible as follows:

The proponent may use a copy to prove the content of an official record—or of a document that was recorded or filed in a public office as authorized by law—if these conditions are met: the record or document is otherwise admissible; and the copy is certified as correct in accordance with Rule 902(4) or is testified to be correct by a witness who has compared it with the original. If no such copy can be obtained by reasonable diligence, then the proponent may use other evidence to prove the content.⁹

There are other venues in which the reliability of documents is important: for example, in determining whether something is a fake or forgery, and in research. The methods of establishing reliability and authority differ according to context. In a court of law, for example, a document may need to be sealed or notarized. To determine whether a work of art is a forgery, one may need to study its provenance. One can also draw evidence from the object itself: paper, ink, and pigments can be tested for age and for consistency with the purported date of creation. This is the domain of experts such as bibliographers, conservators, and scientists.

Digital documents can also be authenticated using techniques such as digital watermarking and other means of content protection such as steganography, which conceals information within other text or content.¹⁰ Important texts, such as legal documents, are more likely to be saved than informal communications are; for such texts, resources will be expended to ensure digital fixity by every possible means: watermarking, migration, emulation, redundancy, and so on.

At the same time that we are anxious about losing information, our tolerance for digital errors seems to be increasing. Consider one example. For a decade, beginning in 2005, Google engaged in a massive digitization project, usually referred to as the Google Books Library Project or Google Books Project, with large libraries in the United States and Europe; some twenty million books were scanned. The Internet Archive has also engaged in large-scale digitization: to date it has scanned 2.4 million books, at a rate of approximately one thousand per day.¹¹ Paul Conway studied the error rate in digital surrogates created by Google and deposited in the HathiTrust Digital Library. At the time of his study (2011), he found errors in about 1.25 million volumes, or roughly 12 percent of the HathiTrust corpus.¹² A 2007 study by Paul Duguid described some of the difficulties inherent in scanning books that, he concluded, contributed to errors.¹³ He examined some early editions and copies of The Life and Opinions of Tristram Shandy, Gentleman, by Laurence Sterne, a book well known for its unconventional typography and layout. The book contains a black page, a marbled page, blank chapters, arrangements of asterisks on pages, and so on. The copies that Duguid examined were missing some of these features, as well as volume numbers and other important information.

One early digital library was Project Gutenberg, started in 1971 by Michael S. Hart; he and his volunteers keyed texts by hand as plain ASCII text. A plain-text version of the bibliographically complex Tristram Shandy could not capture even all of the text, let alone the special typographical elements. For example, the two lines of Greek text on the title page were transcribed as

(two lines in Greek).

Duguid assumed that the more innovative Google Books Library Project versions would be more sophisticated than the Project Gutenberg version. Scanning is faster than keying text by hand and produces an image of the page, which seems more accurate, but the scanned copies simply introduced new problems. Yet, according to Duguid, many people insist that “innovation should supersede inheritance.”¹⁴ Implicit in this statement is the assumption that the new is an improvement over the old. But one might equally hold that the new causes us to lose some of the quality of the old.

Many technologies have clearly given us tools for copying, but they have also given us tools for deception; hence the anxiety about the accuracy and authenticity of copies. Some copying technologies likewise engender anxiety because of their impermanence. Paper and high-quality microfilm have long lives; digital technology, still driven primarily by commercial interests, has impermanence built in. Planned obsolescence keeps the creators of hardware, software, and new devices profitable. And yet we depend on this technological infrastructure for copying, preserving, and transmitting information. Obsolescence is a constant source of anxiety when our responsibility is to maintain long-term access to authentic, authoritative, and reliable documents.

From the beginning of the Google Books Library Project to the present, I have polled the library and information science students in my preservation courses about acceptable error rates in digitization projects. I ask them one question: “If you were managing a large-scale digitization project in your library, what error rate would you be comfortable with?” A decade ago, the answers ranged from 2 to 6 percent (occasionally someone would say that 0 percent was acceptable). The percentage has gradually increased. When I asked the question again in 2016, the average was 10–12 percent, with one student stating that she would be comfortable with a 15 percent error rate. This is not a systematic study, and the number of students I have surveyed is not large (350 students in my preservation classes over eleven years), but the results may indicate changing attitudes toward error among budding librarians and archivists. As one student recently said in class, “What choice do we have?” Another student wondered whether the net gain of having so many items available online might not offset the loss. These students now manage or will soon be managing digitization projects. Has the tolerance for error gone up? If so, what are the long-term implications for future scholarship? And what does this tell us about people’s attitudes toward preservation? If the digitizers of the future are comfortable with a 12–15 percent error rate, how much of our cultural heritage will we lose? I could not identify any studies that have examined how these attitudes are changing, so it is not possible to know how (or whether) such changes will affect future digital projects.

What does this rising tolerance for errors, which I have observed among my own students, tell us about our standards for scholarship? One scholar, needing a particular page in a monograph cited in a text she was working with, sought out that volume in Google Books, only to find that the very page she needed had not been captured in the scanning project. For her, a 12 percent error rate (Conway’s calculation) might just as well have been a 75 percent rate. In medicine, where errors put lives in jeopardy, the acceptable rate would be much closer to zero. Errors in copying create problems that do not exist in the originals, which is a strong argument for keeping originals at hand, even after they have been copied.

One other thing is maddening. Examining the library literature on preservation microfilming from the 1980s to the early 2000s, I have observed that the declared tolerance for filming and scanning errors has remained low. For example, the 2001 specifications for preservation microfilming at one institution stipulated that there be no filming errors at all. A 2006 article by Karen Coyle about scanning pointed out that even if a top-of-the-line scanner were 99.9 percent accurate, it would still average one error per page.¹⁵ Yet Conway and others have shown that actual error rates have been higher than the tolerances recommended in the professional literature. Conway writes that “to preserve the products of large-scale digitization is a decision to preserve imperfection.… For after all, preserving imperfection is an acknowledgement of the deep relationship between the material nature of our print culture and the equally certain physical aspects of our digital world.”¹⁶

Conway’s study demonstrates that whatever the stated tolerance for errors may be, the actual error rate is much higher. My interactions with students suggest that today’s information professionals are comfortable with higher error rates than professionals were a generation ago.

Does that mean that there is a tolerance for imperfection in copies? That is not an easy question to answer. More likely, we simply expect less of copies. Most art museums now allow visitors to photograph objects on display as long as they do not use a flash. The pictures one takes with a smartphone camera can in no way match the high-quality reproductions that a museum can create. But for most people that is just fine: they are trying to capture the moment rather than venerate or formally record the work of art.¹⁷ This suggests that copies are indeed their own genre, as Schwartz proposes. If that is the case, there is no reason to compare a copy to an original, except in a court of law, an auction house, or other specialized contexts.

Schwartz tells us that “practical distinctions between the unique and the multiple have historically been entrusted to theologians, notaries, connoisseurs, and curators.”¹⁸ In the digital realm, copying may have taken on an identity of its own. We may further ask whether copying adds value to an original by signaling that it is worthy of being copied. If so, what is the value of the copy itself?

The value of a copy is also determined by the culture in which an object is created or exists. In some cultures copying may have no legitimacy, while in others the original may not be important. The context of copies, and of their preservation, is essential to the stewardship of cultural heritage.

Anything that is not copied risks disappearing. (Again, in some cultures that is accepted and expected.) Schwartz and Conway remind us that copying is ultimately imperfect, but since time immemorial it has been a consistent approach to preservation. Today copying is particularly useful for low-value materials and for items published or created on poor-quality paper. We must fully understand, however, what we lose by making copies. While many consider creating surrogates to be one aspect of preservation, it will never be an ideal solution.

This is a book about our cultural heritage: its monumentality and the monumental effort we must make to preserve it. Our heritage exists in and is manifested by innumerable “objects” in the analog world and, increasingly, by digital “things” (whether converted to digital form or born that way). These objects and things reveal our cultures in many ways. To preserve what they stand for, and to allow them to continue to reveal and represent our culture, we try to maintain the originals. For some things, such as books and some prints, we may rely on multiple copies. But for others (paintings, manuscripts, sculptures, buildings, bridges, certain textiles) there will be only single, unique exemplars. If anything is threatened or made uncertain for any reason (as I have discussed throughout this book), the cultural information it contains may be saved in the form of a copy. But as I have shown, copies are imperfect; copies may themselves be evanescent; and copies may be unreliable, inaccurate, or inauthentic for several reasons.

There are untold numbers of these “things” in the world, things revelatory of our culture. Deciding what to copy in the name of preservation is tremendously problematic. What do we copy? Who will decide, and what medium will be used? Is that medium reliable, tamperproof, and affordable? Who will pay for it? Do we rely on analog methods of copying, or can we trust digital methods? Are digital copies really preservation copies? Given the profit motives of those driving digital technology (with their constant updating of hardware and software), what kind of reliability and longevity (and long-term access) can we expect from digital copies made today?

These are but a few of the myriad questions we could (and must) ask if we look at copying as a form of preservation. As I said earlier, making surrogates will never be an ideal solution. The task of doing so is beyond monumental. But we must face it nonetheless, or we risk losing key elements of our culture before we realize they are gone.

Notes

7 What Are We Really Trying to Preserve: The Original or the Copy?
