Massive Digital Libraries—or, Where Are We?*
Defining the Massive Digital Library
This chapter examines the intellectual and theoretical assumptions about digital libraries to explain the need for establishing the massive digital library (MDL) as a differentiating concept. In previous publications, the authors introduced this term to describe the digital libraries they analyzed in their work, including Google Books, HathiTrust, Internet Archive, and the Open Content Alliance (Weiss and James 2013a, 2013b). The term has proved useful in distinguishing these subjects from digital libraries in general, but further elucidation is warranted.
This brings us to the question, how does one define a massive digital library? To answer this question in depth, we must first look at the development of the digital library from the twenty-first century onward through multiple perspectives: collection sizes, methods of acquisition and collection development, collection content, collection diversity, content accessibility, metadata, and means of digital preservation.
There are fundamental questions to be explored, as well, such as the following: What is a library? How is an MDL different from the libraries of the twentieth century? Libraries are created to serve particular communities, often limited by geography or intellectual discipline, but MDLs offer the promise of transcending these boundaries. Does size matter? If an MDL has some twenty million books but lacks the specific book a reader wants, of what worth is it? More broadly, if an MDL does not have significant coverage of a topic or subject, is it really better than a collection that focuses more narrowly?
These questions are not easy to answer.
Foundations of the MDL
A Resounding Announcement
In late 2004 Google made its well-documented “resounding announcement,” as Jean-Noël Jeanneney described it in his book Google and the Myth of Universal Knowledge (published originally in French in 2005 and translated into English in 2007), to digitize millions of the world’s books—including works still in copyright—and to place them online. Jeanneney and others took the Google announcement as a wake-up call for European countries to catch up to the US company, whose motives were seen as not entirely trustworthy.
Nearly ten years on, however, as we ride the full wave of Web 2.0 dominated by Google and Facebook, it is hard to imagine that Google’s desire to create an online digital library should have come as such a shock. It is not as if Google altered or reinterpreted the fundamental concepts of the digital library or electronic document delivery. Most of the digital library’s ambitions, as well as the procedures for realizing them, had been either explicitly stated or hinted at in the various “library of the future” ventures that had begun as early as the 1950s and 1960s (Licklider 1965). Yet the shock and awe of Google’s announcement caused significant hand-wringing and soul-searching at the time (Jeanneney 2007; Venkatraman 2009).
It is more likely that Jeanneney and others reacted to being caught flat-footed and falling behind in terms of organization and ambition. The pushback was partly one of conservatism—not in the US political sense of the word, but in the urge to preserve current cultural values—and a distrust of the ways in which US-centric capitalism creates huge shifts in society and leaves many, especially those in other countries, in the lurch. There was also a quite justifiable realization that the social construct of the library itself, and the social contract upon which it has been built, could be endangered by such destabilizing projects.
Indeed, Google’s main stated goal for the project, “to create a comprehensive, searchable, virtual card catalog,” hints at deep shifts in the impact of the digital economy and capitalism itself.1 An interview with Jaron Lanier, the computer scientist who coined the term virtual reality, drives home how revolutionary the Google Books initiative might really be for capitalist-driven societies:
At the turn of the [twenty-first] century it was really Sergey Brin at Google who just had the thought . . . if we give away all the information services, but we make money from advertising, we can make information free and still have capitalism. But the problem with that is it reneges on the social contract where people still participate in the formal economy.2
As the formal economy—a regulated and documented exchange of goods, services, and labor—gravitates toward the informal economy of social media, where a clear exchange of money for the goods and services created is not visible and users often provide the unpaid labor, the gestation and formation of new digital initiatives have the potential to be highly disruptive.
It is hard not to see the current rush to create similar digital library projects in this light. Although it is outside this book’s overall scope to deeply examine these shifts, the informal economy’s foray into digital libraries will surely influence whether bricks-and-mortar libraries remain relevant in the future information society. If people freely provide labor similar to what librarians traditionally have done, even if the results are ultimately inferior in quality, then librarians will eventually become obsolete.
By the time Google made its proclamation to enter the digital library world, ambitiously taking on the task of digitizing millions of books, it was clear that digital library projects would be entering a new, third phase.3 In Hahn’s (2008) eyes, the hand-wringing among the traditional stakeholders in scholarly communication (i.e., libraries, publishers, scholars, cultural ministries, and governments) resulted from the sudden entry of companies that had never before stated an interest in joining the scholarly publication community, namely Google, Yahoo, and Microsoft. Where the previous interest of such companies was mainly in platforms, networks, hardware, and software applications, they now began to shift toward the “informalization” of previously unavailable and restricted content. Ironically, the future of the digital library is very much unclear without the major involvement of these companies, yet it is also compromised with them involved. The confluences of culture, technology, community, services, and content itself are barreling forth into the future, but their outcomes are unresolved.
The issues of information integrity and accessibility raised by the increasing economies of scale evinced in these new businesses entering the digital library game were addressed in part by the “very large digital library” (VLDL) movement, a precursor to the MDL in terms of nomenclature and classification. This movement took up the task of attempting to define and delineate some of the projects described here. From 2007 to 2011 workshops on VLDLs took place, attempting “to provide researchers, vendors, practitioners, application developers and users operating in the field of VLDLs with a forum fostering a constructive confront on the issues they face and the solutions they develop.”4 This theoretical approach includes examining “foundational, organization and systemic challenges” of digital libraries and their issues related to scalability, interoperability and sustainability.5 However, the movement does not seem to have progressed beyond the most recent workshop, which was held in September 2011 in conjunction with the International Conference on Theory and Practice of Digital Libraries.
In some ways, the VLDL concept anticipates the definition in this book of the MDL, yet the term does not seem to have caught on, nor does the movement appear to have had much impact. Some of the problem may be that advocates for the movement admit to not having a “consensus yet on what a VLDL is” beyond a description of digital libraries that hold more than a specific amount of digital information.6 Additionally, they appear to be approaching the problem purely from a computer science and computer engineering perspective, which may be limiting the appeal of the movement.
Giuseppe Amato’s discussion of the social media digital-image site Flickr in the presentation “Dealing with Very Large Visual Document Archives” is an example of this.7 While the work is of high caliber and provides excellent suggestions for utilizing shape recognition for searching images, it does make some missteps in terms of defining Flickr as an archive. It is not. To conflate the very specific LIS definition of an archive with that of an online social media image and content management system is to ignore many of the important tenets of librarianship. When boundaries and distinctions are not as clear as they could be, the adoption of a term by a wider audience becomes much less likely.
Importantly, the VLDL movement does look at concepts such as the volume, velocity, and variety of collections, yet its advocates admit that “there is not yet any threshold or indicator with respect to these features . . . that might be used to clearly discriminate very large digital libraries from digital libraries.”8 Therefore, incorporating some of the traditional aspects of librarianship, in both its digital and bricks-and-mortar variations, will not only expand interest in such MDL systems but also help to include predefined conceptions of archives, libraries, and asset management systems. These characteristics will help to define a new class of digital library.
Looking at Past Models as a Framework for MDLs
In many ways MDLs are the logical progression of the first digital libraries. In 1995 Fox and colleagues made the point that the digital library was defined differently by different constituents. The MDL may be no different. First, many have seen it as a result of new library functions, and Akscyn and colleagues (1995) note well the hyperbole that occurs when new technologies appear: people tend to decide that the previous technologies are doomed to be replaced completely. But this has not been the case with television, radio, CDs, LPs, and other mature technologies. The same will be true of digital libraries, even as the massive digital libraries begin to aggregate more content and replace some of the value that users have come to see in them (Ishizuka 2005).
There is, of course, the danger that massive digital libraries will develop into something that does not serve the user in the same ways that more traditional libraries have. New technologies often bring forth conflicting feelings of excitement and trepidation among users. Professional librarians experience this acutely when they see the potential benefits that MDLs and other digital libraries promise, yet there is an underlying perceived threat to their profession.
In the early days, people viewed digital libraries as merely computerized versions of the physical library. Automation would fulfill this criterion quite easily. People also saw the digital library as offering new information resources. These new types of resources turned out to be a combination of various file formats, including video, audio, and web pages. Along with these definitions, we are also confronted with the digital library as the basis for new approaches to classification and cataloging. The Dublin Core Element Set, with its series of extensions and elaborations—Goddard Core and Darwin Core, for example—fits within this new type of library.
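As a concrete illustration, a Dublin Core record at its simplest is a flat set of descriptive elements, and extension schemas such as Darwin Core add domain-specific terms alongside them. The minimal sketch below shows the shape of such a record; the element names follow the published schemas, but all of the sample values are invented.

```python
# A minimal sketch of a Dublin Core record, extended with two Darwin Core
# terms for a natural-history item. Element names follow the published
# schemas; the sample values are hypothetical.
record = {
    # Dublin Core elements
    "dc:title": "Field Notes on Hawaiian Honeycreepers",
    "dc:creator": "Example, A.",
    "dc:date": "1902",
    "dc:type": "Text",
    "dc:language": "en",
    # Darwin Core extension terms for biodiversity description
    "dwc:scientificName": "Drepanis coccinea",
    "dwc:country": "United States",
}

for element, value in record.items():
    print(f"{element}: {value}")
```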
The digital library also suggests that there will be new modes of interaction with patrons. Instead of purely a face-to-face interaction or a physical locality defining the user (i.e., a community of users defined by proximity to a physical building), users and target audiences are spread out and further diffused. The result is a wider profile of users, which will ultimately affect collection development decisions and policies.
Mass digitization projects have been around for some time. The digital initiatives at the Library of Congress, such as the American Memory project, along with numerous long-term archival digitization projects, provided the blueprints for large-scale digital collection building from the 1990s onward.9 Institutional repositories have begun to create larger collections. According to current statistics from the Registry of Open Access Repositories (http://roar.eprints.org), the largest institutional repositories, or IRs (e.g., the Networked Digital Library of Theses and Dissertations Union Catalog and the Virtual Library of Historical Press), each contain more than a million items. However, as Schmitz (2008, n.p.) writes:
Mass digitization and IRs fall on a single continuum of resources, yet they differ in many ways. Most notably, IRs provide scholars an opportunity to add to the body of recorded knowledge through publishing, while mass digitization makes a large existing corpus of printed literature available to scholars for use in their work.
Though the stream of scholarly output converges in the “digital middle,” this conflation of content and format does not equate to uniformity of approach. The institutional repository and the mass-digitization project exemplified by the MDL do not necessarily share the same approaches or values, despite the fact that both are online digital platforms aimed at sharing content with users. Philosophies diverge even where digital formats converge.
The decisions and policies for acquisition of materials change and require new approaches, such as sharing online content or subscribing to materials. The model is altered for digital materials because of issues regarding copyright, a body of law developed over centuries of printing press culture.10 In a short time the issues of copyright became complicated by the blurring of lines between copying, distributing, and disposing of digital content, which persists as a perfect copy (this issue is discussed in chapter 5). The current rights issues related to streaming truly redefine the digital library, especially for those libraries that include video, audio, and other multimedia formats.
With all these new approaches, the most fundamental service of the library, preservation, needs further solidification. New methods of digital storage and preservation are still being developed and must be tested over longer periods of time to establish long-term accessibility and viability. The cloud storage services and LOCKSS initiatives that currently exist, including Amazon Glacier and MetaArchive, to name two popular choices, are still largely unproven. It remains to be seen whether they will survive longer-term shifts in the information technology landscape.
There is ever more reliance on portable electronics, systems, and networks. The smartphone and tablet have become important and ubiquitous information tools. Their quick adoption has altered how people approach the web and their digital libraries. Indeed, some universities are wholeheartedly adopting the mobile revolution with their own initiatives. California State University, Northridge, for example, implemented its myCSUNtablet initiative, which provides Apple iPad tablets to all students.11 The iPad is provided at regular price, but the university funds new e-textbook development for the iPad that will create a “cost-neutral effect” for students.
Shifts in intellectual, organizational, and economic practices have occurred with digital libraries as well. End users’ search behaviors have changed; people’s reactions to information overload have been analyzed; organizations have become less centralized, and their services can occur in various locations; and new digital economies have shifted whole business models to meet the dominance of Facebook, Google, and other social media and web technologies, especially in the case of printing and publishing.
Looking at things through this lens, we can see that MDLs are more than just “larger” digital libraries of aggregated content. They have addressed and even solved many of the same problems as smaller digital libraries, but they have also developed a set of their own.
Characteristics of MDLs
If an MDL is more than just a supersized digital library, then what is it, exactly? What characteristics do we need to pin down to define one? Furthermore, why do we need to provide different nomenclature to describe it?
Thomas Kuhn (1970), in his groundbreaking work The Structure of Scientific Revolutions, discusses how scientific theories and concepts change over time. He argues that the change is neither gradual nor progressive. In many ways change in scientific models is revolutionary, a complete alteration of everything people saw and experienced in the past. In his parlance this is a “paradigm shift.” Though the term has been appropriated into business jargon and of late applied inaccurately in popular discourse, the paradigm shift remains an important way to conceptualize what is also occurring in information science.
Kuhn (1970, 110) writes:
No paradigm ever solves all the problems it defines and . . . no two paradigms leave all the same problems unsolved. . . . Like the issue of competing standards, that question of values can be answered only in terms of criteria that lie outside of normal science altogether, and it is that recourse to external criteria that most obviously makes paradigm debates revolutionary.
In other words, scientific paradigms are by their very definition working models attempting to parse reality, yet at the same time they remain rule bound and are therefore able to pose and answer only questions that fall within the parameters of those rules.
To think outside of that framework is to court contradiction and confusion in the holder of the paradigm. As a result, paradigms often cannot overlap, and are often at cross-purposes, because they question different things and look for answers in different ways (Kuhn 1970).
In defining MDLs, taking a page from Kuhn’s model, I have developed the following list of criteria and characteristics:
Collection size: Ranging from five hundred thousand to more than one million items
Acquisitions, collection development, and copyright concerns: Numerous partnering members contributing print book content that may or may not have copyright clearance
Content type: Mass-digitized print books or similar volume-centric holders of information (e.g., encyclopedias, atlases)
Collection diversity: Diversity dependent upon self-selected partnering members
Content access: Degrees of open access; single interfaces, such as search engines and portals, representing all the collections
Metadata: Gathered and aggregated from multiple sources, with a reliance on new schemas
Content and digital preservation: Large-scale preservation by consortium members
Scope
When defining something as an MDL, one important consideration is size. Generally, the massive digital library is one that has aggregated or gathered content from numerous sources into one web-defined area or domain. The largest MDLs work on a much grander scale. Google Books, for example, contains nearly thirty million books in its collection. HathiTrust, a consortium of several US-based and European universities, contains close to eleven million digital books, with 30 percent of those in the public domain. The numbers of materials at least partly accessible in MDLs range from the low hundreds of thousands of items (e.g., Internet Archive) to the tens of millions (e.g., HathiTrust, Google Books). Eventually these numbers may reach one hundred million or more. In contrast, subject and institutional repositories are smaller in size. For example, MIT’s DSpace repository, the pioneer of the DSpace software and of IRs in general, contains only sixty-three thousand items, thirty-five thousand of which are electronic theses and dissertations. The University of Kansas repository, another “top 100” institutional repository in the United States, houses only around ten thousand items.
Differentiating Issues
The sheer size of these collections raises issues that do not affect subject and institutional repositories or digital libraries and collections built on a smaller scale. Issues of findability and collection development become much more fraught with peril; subject and institutional repositories, by contrast, remain small in scope and therefore more manageable. But what happens when collections become too large and need new approaches merely to confirm their sizes? What if they are too big to fail, like the Wall Street banks of the 2008 recession? In this case “too big to fail” just means too big to hold accountable. In many ways the MDLs could become enamored of their own size. Other issues regarding size include the speed with which retrieval is possible and the increased amount of technological overhead needed to keep the project online and accessible.
Size is not in itself a good thing, either. The utility of a collection to its patrons matters more. Having access to the metadata records and full-text searching capability of some millions of books does, however, present a compelling benefit. WorldCat has a huge database of items, but it lacks the full-text searchability of Google Books.
WorldCat was developed in an era when it was considered “good enough” for a library catalog to contain a few basic metadata elements, such as title, author, and subject heading. These ideas were based on what was possible to accomplish with a card catalog. MDLs like Google Books illuminate a new way forward in which we can dig into the content of the book, look at things like term frequencies, and rank the results list on the basis of those frequencies.
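To make the contrast concrete, here is a minimal sketch of term-frequency ranking over a toy in-memory corpus. The three sample “books,” like the scoring function itself, are invented for illustration; a production MDL would rely on inverted indexes and far more sophisticated relevance models.

```python
import re

# Toy corpus: title -> full text. In an MDL the full text would come from
# millions of digitized volumes; these stand-ins are purely illustrative.
books = {
    "Book A": "the whale hunted the sea and the whale won",
    "Book B": "gardens and gardens of roses by the shore",
    "Book C": "the sea the sea the endless sea",
}

def term_frequency(text: str, term: str) -> float:
    """Occurrences of `term` per word of text (simple relative frequency)."""
    words = re.findall(r"[a-z]+", text.lower())
    return words.count(term) / len(words) if words else 0.0

def rank(term: str) -> list[tuple[str, float]]:
    """Rank books by how often the search term appears in their full text,
    something a card catalog record alone could never support."""
    scores = {title: term_frequency(text, term) for title, text in books.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank("sea"))  # Book C ranks first: highest relative frequency of "sea"
```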
Acquisitions, Collection Development, and Copyright Law
Scope
The traditional model of library acquisition has been to purchase a book directly from a vendor or to solicit books via donors. This worked quite well in the print world, with copyright law’s “exhaustion” (first-sale) rule allowing libraries to accept, pass along, or dispose of books however they felt was appropriate. Libraries could band together quite easily and trade or lend books to users outside their main user base. Although digital versions of these materials are more easily shared and aggregated, current copyright law does not provide as much leeway for sharing these resources. Aggregation of content via multiple “online” partnering institutions encounters obstacles, especially if the traditional pillars of libraries, including fair use and section 108 of the Copyright Act, become weakened by such things as the Digital Millennium Copyright Act (DMCA) or a future full-scale revision.
In this regard, the MDL approaches content aggregation less like a smaller digital library and more like an online digital aggregator of journal and serial content. The resource type for the MDL is still generally the old-fashioned book, a technology perfected over the centuries after Gutenberg’s press, but in a new digitally transformed format. The expression “new wine in old bottles” is almost inverted here, yet the new containers have the ability at times to transform the original content and add value to it, despite the movement from a three-dimensional type of media to a two-dimensional one. It is no secret that a digitized version of a master’s thesis, for example, will receive new life in the digital milieu.
Differentiating Issues
In many ways the massive digital library is moving in tandem with the development of the so-called fourth paradigm: data-intensive research. The acquisition of the book is becoming less important than the large amounts of data, and the large-scale trends, that can be derived from former print-based cultures. Moving large data sets and large text sets, such as books, from print to digital opens up amazing possibilities for data mining, not only in the sciences but also in the humanities. One can, for example, search the frequency with which terms appear in the Google Books corpus and track them over time through data visualizations. HathiTrust offers a similar search as well. The smaller-scale digital library does not have such a robust capability for data mining.
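The idea behind such a term-over-time search can be sketched in a few lines. The miniature corpus below, keyed by publication year, is entirely invented; Google’s Ngram Viewer applies the same principle across many millions of digitized volumes.

```python
import re
from collections import defaultdict

# Hypothetical miniature corpus: (publication_year, full_text) pairs.
corpus = [
    (1905, "the telegraph office sent the telegraph at noon"),
    (1925, "radio and telegraph lines crossed the country"),
    (1955, "television replaced radio in many american homes"),
]

def yearly_frequency(term: str) -> dict[int, float]:
    """Relative frequency of `term` per publication year: the raw material
    for an ngram-style visualization of usage over time."""
    counts: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for year, text in corpus:
        words = re.findall(r"[a-z]+", text.lower())
        counts[year] += words.count(term)
        totals[year] += len(words)
    return {year: counts[year] / totals[year] for year in sorted(totals)}

print(yearly_frequency("telegraph"))  # usage peaks early, then falls away
```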
Smaller digital libraries have also focused generally on the content that exists within their own physical holdings, though some have created limited consortia to distribute digital content—mostly archival materials, university publications (e.g., yearbooks), and other generally public domain materials. MDLs, by contrast, have established partnerships to increase the totals of their online digital-print collections, but at the risk of stretching the boundaries of ownership and acquisition. While this works for offline content, since copyright law allows it, copyright law does not explicitly support sharing copies online.
The model of aggregation works well enough for journals and article databases, since the publishers themselves have developed the content. However, licensed material comes at a steep cost to libraries, one that increases yearly. It may be that MDLs will eventually have to abandon open access and move to pricing models in order to be sustainable and to avoid legal entanglements.
Collection Content
Scope
Currently, the exemplars of the MDL field focus purely on the mass-digitized book or similar volume-centric collections of print materials. Generally, the books come from various MDL partnerships, including universities and large public libraries across the United States. Google Books, for example, digitizes books from about twenty to twenty-five academic and public library collections across the United States and internationally. A large proportion of the digitized books are in the public domain and help users who are unable to access special collections or archives at those institutions.
Differentiating Issues
In many ways MDLs should be able to expand their content types very quickly. Instead of relying purely on digitized print book collections, some will move into the areas of audio, video, and alternative publications, such as atlases, musical scores, and the like. Some MDLs already focus some of their collections on materials such as web resources or multimedia digital files (e.g., Internet Archive’s Wayback Machine and Grateful Dead bootlegged concert recordings). In the future, updated editions in e-book format or new born-digital e-books may replace, or complicate access to, some of the works currently residing within MDLs.
Collection Diversity
Scope
Similar to the parameter of collection size, the diversity found in MDL collections is significantly broader and deeper than that of the typical digital library collection or institutional repository. This is largely a reflection of the aggregation of multiple partners. Diversity has the potential to be greater than that of any other type of digital library, yet it could still be less than that of a research I–level university academic library’s or large public library’s print collection.
Differentiating Issues
With just a fraction of collections currently digitized and available online and the general primacy of the English language in the main MDLs based in the United States, prioritization of limited resources set aside for digitization projects tends to favor English-language materials. With this emphasis, collection diversity is compromised. Factors such as MDL aggregator partnerships and language-primacy policies of such libraries become paramount considerations. If an MDL partners with foreign universities (as in the case of Google with Keio University or HathiTrust with Biblioteca de la Universidad Complutense de Madrid), the likelihood of broader and deeper diversity increases. But as is discussed in chapter 7, there are still flaws in this approach. Nevertheless, MDLs are notable in comparison to their small digital library counterparts for their ability to increase diversity in subject-matter representation.
Content Access
Scope
Although single search interfaces are nothing new to digital libraries, the implication with a small digital library or a digital asset management system’s series of collections (e.g., a CONTENTdm collection) is often that the items are generally held in the same location, or at least that the materials have some physical or locational ties. Provenance in this sense becomes an important defining characteristic of the library and its “new and improved” digital version.
Digital libraries already test the bounds of physical provenance, with many of them aggregating collections on a minor scale. However, with MDLs the sense of physical location is rendered nearly irrelevant. The point of the MDL becomes not adding value to a physical collection that will nonetheless remain tied together (and continue to be meaningful as a physical collection), but obliterating and recombining collections into something malleable and new. The importance of access trumps the need to completely preserve physical provenance.
Along with erasing the existing boundaries, MDLs also provide at least some amount of open access material, especially with books that have fallen into the public domain. As a result, some issues pertaining to open access to copyrighted and orphan works arise.
Differentiating Issues
MDLs can remove the barriers between systems, but they can also hide the originating sources. Sometimes, then, the contextual meanings derived from a collection of materials can be lost, though this is arguably more a concern with archives, which function on the principle of provenance. At the same time, meaning need not necessarily be derived from context. This frees users to recombine knowledge by surpassing the boundaries of subject fields, classifications, and taxonomies.
Open access becomes a moral imperative when dealing with public domain books. Many of the works digitized by MDLs are openly accessible, but the authors’ studies of Google Books have found errors in its implementation (James and Weiss 2012). Orphan works represent by far the thorniest of all the problems related to digitized book content. Publishers need to revisit some of their policies. As researcher Paul J. Heald shows, there is a huge gap in the number of books available from the period between the 1930s and the 1990s.12 Almost sixty years of publishing are missing in disproportionate amounts. MDLs might help alleviate this issue.
Metadata Development
Scope
The MARC record has long been the defining metadata standard for print matter, especially books. In fact, no other metadata standard is as robust or as finely tuned to the needs of libraries and their printed shelf matter. However, as print formats give way to digital media, the MARC record has not met many of the needs of the digital age. As a result, digital libraries and collections are moving toward metadata schemas more appropriate for digital materials. In particular, the Dublin Core metadata schema has been adopted almost universally for digital collections. Some MDLs are able to aggregate and crosswalk numerous metadata schemas; crosswalking consists of mapping similar elements in different schemas to one another. This is an important defining feature: the ability to crosswalk multiple metadata schemas in order to aggregate immense amounts of data in one system. Many systems are already able to do this, including repositories compliant with the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH).
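As a rough illustration of what crosswalking involves, the sketch below maps a handful of MARC tags to Dublin Core elements. The mapping table is deliberately truncated and the sample record invented; real crosswalks, such as the Library of Congress MARC-to-Dublin Core crosswalk, cover far more fields and subfield nuances.

```python
# Illustrative (truncated) MARC-to-Dublin Core element mapping.
MARC_TO_DC = {
    "245": "dc:title",      # Title statement
    "100": "dc:creator",    # Main entry, personal name
    "260": "dc:publisher",  # Publication information
    "650": "dc:subject",    # Subject added entry, topical term
}

def crosswalk(marc_record: dict[str, str]) -> dict[str, list[str]]:
    """Map MARC tags to Dublin Core elements, dropping unmapped fields.
    A real aggregator would also normalize values and log what was lost."""
    dc_record: dict[str, list[str]] = {}
    for tag, value in marc_record.items():
        if tag in MARC_TO_DC:
            dc_record.setdefault(MARC_TO_DC[tag], []).append(value)
    return dc_record

sample = {"245": "Moby-Dick", "100": "Melville, Herman", "650": "Whaling"}
print(crosswalk(sample))
# {'dc:title': ['Moby-Dick'], 'dc:creator': ['Melville, Herman'],
#  'dc:subject': ['Whaling']}
```

Note that the unmapped, silently dropped fields in a sketch like this are exactly where the quality-control problems discussed below tend to originate.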
Differentiating Issues
One of the biggest problems with digital copies is the possibility of losing the tie with the physical object. Without metadata available to anchor a digital version to the original object, it could, figuratively speaking, be “lost at sea.” MDLs have shown problems incorporating metadata from various siloed institutions. The crosswalking of aggregated data becomes a major issue when mistakes from the source collections appear; the aggregation is only as good as the source material. How, then, does an MDL deal with metadata quality control? How would an MDL begin to approach such important issues related to metadata and interoperability?
Digital Content Preservation
Scope
One of the main reasons given for digitizing content has been the creation of digital copies that can stand in for the original version. Digitization is therefore implemented as a protective measure. Digital preservation as a practice includes refreshing, migration, replication, emulation, encapsulation, the concept of persistent archives, and metadata attachment. Digital libraries have attempted to incorporate these practices into their sustainable frameworks and to follow specific guidelines set up by various organizations. The philosophy of long-term preservation of digital content has been one of the guiding principles for the creation of most online digital collections.
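Of those practices, the fixity check underlying replication and refreshing is the simplest to illustrate: a checksum recorded at ingest is periodically recomputed to confirm that a file has not silently degraded. The sketch below is minimal and self-contained; the file name and manifest format are invented, and a real repository would track fixity in a database across multiple replicated copies.

```python
import hashlib
import tempfile
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks so large scans fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Simulate ingest: record a digest for a stand-in digitized page image.
page = Path(tempfile.mkdtemp()) / "volume_001_page_001.bin"
page.write_bytes(b"\x00" * 4096)      # placeholder for scanned image bytes
manifest = {page: checksum(page)}     # fixity information stored at ingest

# A later fixity check recomputes each digest and compares. Any mismatch
# signals silent corruption; the copy should be restored from a replica.
failures = [p for p, stored in manifest.items() if checksum(p) != stored]
print(failures or "all files intact")
```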
Differentiating Issues
In reality, true digital preservation has proved elusive. Not all digital libraries exist as preservation platforms. Those that have preservation policies are sometimes unable to follow through because of increased costs or insufficient funding. Some larger initiatives, like Microsoft’s Live Search Books and Live Search Academic, have been abandoned.13 In small-scale digital collections, preservation practices have generally been developed and regulated effectively. Several frameworks and audit tools, including DRAMBORA, TRAC, nestor, and PLATTER, provide specific guidelines for maintaining a trusted digital collection.
However, MDLs have also been at the forefront in developing digital preservation methodologies. HathiTrust, in particular, has taken the lead on this. Internet Archive and the Open Content Alliance have embraced preservation as part of their missions as well. Issues of digital preservation will need to address scalability and feasibility, and it may be necessary for those involved with MDLs to create new procedures for handling the preservation of digital copies of print books.
One unintended problem with the practice of digital preservation is that ultimately some meaning is still lost in the transference from one format to another. Scholars and historians can derive as much from the context and physical materials of a book (e.g., marginalia, binding) as from the written content. On a large scale, such loss of material could be devastating. In chapter 13 this issue is central to the problem of digitizing Japanese-language books. Imagine if scholars were unable to look at the ads in newspapers because the aggregator had excised them from its database, leaving only the text of the articles. Much would be lost. For this reason, guidelines on the digital preservation of books should be examined in relation to MDLs.
Conclusion
In summary, these criteria and their attendant issues, though unique to the digital library in general, may require different approaches when dealing with a massive digital library. The issues involved with aggregating millions of decentralized, previously published print materials into one uniform conceptual space become more complex. It is important, therefore, to differentiate MDLs from their smaller counterparts, which are easier to police and analyze, especially with regard to metadata uniformity, copyright compliance, and ownership. The larger the institution or system, the more unwieldy and slow to change it may become. As seen earlier, the issues are not merely a matter of amplification or magnification of a regular-sized digital library.
Furthermore, some institutions deemed “too big to fail” may underperform if users and critics are not vigilant. In the case of corporations and businesses, the market is often determined to be the appropriate judge of an organization’s success. Efficiency in operations may be important for the bottom lines of for-profit organizations, including for-profit universities, corporate-owned massive open online courses (MOOCs), and traditional publishers. However, in dealing with consortia of public and nonprofit educational institutions, market forces should not be the sole factor determining their overall sustainability, especially when the content is of significant cultural and social value. It is in this vein that a series of criteria for evaluating and classifying the MDL becomes important for helping to preserve and protect long-term access and sustainability.
References
Fox, Edward, Robert Akscyn, Richard Furuta, and John Leggett. 1995. “Digital Libraries.” Communications of the ACM 38, no. 4: 22–28.
Hahn, Trudi Bellardo. 2008. “Mass Digitization: Implications for Preserving the Scholarly Record.” Library Resources and Technical Services 52, no. 1: 18–26.
Ishizuka, Kathy. 2005. “Showing Google the Way.” School Library Journal 51, no. 2: 26–27.
James, Ryan, and Andrew Weiss. 2012. “An Assessment of Google Books’ Metadata.” Journal of Library Metadata 12, no. 1: 15–22.
Jeanneney, Jean-Noël. 2007. Google and the Myth of Universal Knowledge: A View from Europe. Chicago: University of Chicago Press. First published in French in 2005.
Kuhn, Thomas S. 1970. The Structure of Scientific Revolutions. 2nd ed. Chicago: University of Chicago Press.
Licklider, J. C. R. 1965. Libraries of the Future. Cambridge, MA: MIT Press.
Schmitz, Dawn. 2008. The Seamless Cyberinfrastructure: The Challenges of Studying Users of Mass Digitization and Institutional Repositories. Washington, DC: Council on Library and Information Resources. www.clir.org/pubs/archives/schmitz.pdf.
Venkatraman, Archana. 2009. “Europe Must Seize Lead on Digital Books, Urges Reding.” Information World Review, no. 260: 1.
Weiss, Andrew, and Ryan James. 2013a. “Assessing the Coverage of Hawaiian and Pacific Books in the Google Books Digitization Project.” OCLC Systems and Services 29, no. 1: 13–21.
Weiss, Andrew, and Ryan James. 2013b. “An Examination of Massive Digital Libraries’ Coverage of Spanish Language Materials: Issues of Multi-Lingual Accessibility in a Decentralized, Mass-Digitized World.” Paper presented at the International Conference on Culture and Computing, Ritsumeikan University, Kyoto, Japan, September 16.
* Ryan James contributed to this chapter.
1. See the website for Google Books Library Project, at www.google.com/googlebooks/library/.
2. Lanier, quoted in Scott Timberg, “Jaron Lanier: The Internet Destroyed the Middle Class,” Salon, May 12, 2013, www.salon.com/2013/05/12/jaron_lanier_the_internet_destroyed_the_middle_class/.
3. John P. Wilkin, University of Michigan, quoted at “Library Partners,” Google Books, http://books.google.com/googlebooks/library/partners.html.
4. “Workshop Objectives,” Fourth Workshop on Very Large Digital Libraries, September 29, 2011, www.delos.info/vldl2011/.
5. Ibid.
6. Ibid.
7. Giuseppe Amato, “Dealing with Very Large Visual Document Archives,” presentation at the Fourth Workshop on Very Large Digital Libraries, September 29, 2011, www.delos.info/vldl2011/program/1.VLDL2011.pdf.
8. “Workshop Objectives,” Fourth Workshop on Very Large Digital Libraries, September 29, 2011, www.delos.info/vldl2011/.
9. See the American Memory project website, at http://memory.loc.gov/ammem/index.html.
10. “Copyright Timeline,” Association of Research Libraries, www.arl.org/focus-areas/copyright-ip/2486-copyright-timeline.
11. “myCSUNtablet initiative,” California State University, Northridge, April 5, 2013, www.csun.edu/it/news/ipad-initiative.
12. Rebecca Rosen, “The Hole in Our Collective Memory: How Copyright Made Mid-Century Books Vanish,” Atlantic, July 30, 2013, www.theatlantic.com/technology/archive/2013/07/the-hole-in-our-collective-memory-how-copyright-made-mid-century-books-vanish/278209/.
13. Michael Arrington, “Microsoft to Shut Live Search Books,” TechCrunch, May 23, 2008, http://techcrunch.com/2008/05/23/microsoft-to-shut-live-search-books/.