Sherri Berger and Catherine Mitchell
The ten libraries of the University of California (UC) system hold a dazzling array of unique materials, including photographs, newspapers, maps, oral histories, correspondence, government publications, historical records, works of art, films, and ephemera. Documents of world events and expressions of human culture, these materials are essential items of study for scholars, students, and the general public—and, in many cases, the UC campus libraries hold the only existing copies in the world. Unfortunately, as vast and valuable to educational inquiry as these resources are, most of them are effectively invisible to researchers because they are not available online. Funding for digitization is undeniably scarce. Yet technical infrastructure provides another, equally intractable barrier—even among those libraries with significant numbers of digital collections (whether digitized or born-digital), many struggle to make them broadly available on the web.
Over the past eighteen years, the California Digital Library (CDL) has maintained two related services aimed at providing dynamic access to the UC Libraries’ unique digital content: the Online Archive of California (OAC) and Calisphere. Initially launched at UC Berkeley and formally established at the CDL in 1998, the OAC places digital resources in the context of descriptive finding aids and the larger universe of unique physical collections. Established by CDL in 2006, Calisphere provides an optimal discovery environment for digital resources exclusively. Both services have been primarily focused on providing end points for access to digital materials, and, while they have garnered attention and recognition for the content they include, they have not solved a fundamental problem: how to aggregate and make available a consistent stream of digital resources from each of the UC Libraries. The libraries that lack technical infrastructure have not been able to contribute digital content routinely to OAC and Calisphere because of insufficient back-end support for the management of that content. The libraries with more advanced infrastructure for managing and providing access to their digital collections have also, at times, opted out of contributing digital resources to these platforms because of concerns about duplicating effort and uncertainty about benefits. All of the libraries struggle to keep up with demands for stewarding new types of digital content, from video games to brain scans, which grow increasingly complex and diverse as technologies evolve.
These obstacles are not unique to UC but rather are emblematic of a common “consortial problem”: how to build services that address the distinct needs and capacities (both technical and organizational) of a heterogeneous group of content providers. This chapter explores an effort at UC to solve this problem by establishing a distinctly modular set of services for the largely unique digital resources of the UC Libraries. Following four years of system-wide administrative planning, a project team based at the CDL embarked, in July 2013, on a two-year initiative to establish a multi-layered technical system for building and providing access to a collaborative UC Libraries Digital Collection (UCLDC).1 The project resulted in a set of three production services, released in September 2015:
• a shared digital asset management system (DAMS) for use by campus libraries that lack such a system or equivalent management solution
• aggregation of digital resources owned and maintained throughout the UC Libraries, regardless of where they are hosted and how they are managed
• a flexible public access solution that includes a user-centered interface for the entire collection, along with support for mega-discovery and highly customized display
This modular strategy challenges the one-size-fits-all approach to digital collection building. CDL’s prior services solved a discrete problem for some libraries (namely, the need to host and provide access to digital content) but in the process disincentivized the wider participation of other libraries that need additional—or simply different—support. The UCLDC services strive to give each of the UC Libraries what it singularly needs to meet its digital collections goals, connecting with campuses’ digital content only at those points where they can fill a gap or provide an opportunity to scale relatively scarce resources. For some libraries, this means end-to-end digital asset management, discovery, and display. For others, it means enabling aggregated access and wider exposure to locally managed materials. The ultimate goal is to ensure that all digital content, regardless of how it is generated and managed by distinct UC Libraries, is made both broadly discoverable and deeply engaging for researchers worldwide.
PART ONE
In the UCLDC service model, access represents the terminus and culmination of a series of processes designed to meet campus library needs—the first of which is the not-insignificant task of digital asset management.
Generally speaking, management refers to the administrative functions required for describing, storing, organizing, and retrieving digital content and metadata, but these functions can vary in type and complexity. In its overview of digital media management systems, JISC Digital Media presents a spectrum of tools and activities that could provide the appropriate level of management for a given digital collection. On one end of the spectrum is the simple storage of files, organized into folders, on a local computer. At the other end is the development or commissioning of a highly customized, multifunctional system for performing a host of specific functions required for readying digital content for preservation and access.2
Throughout the UC Libraries, methods for managing digital content have similarly run the gamut. To date, each campus library has tackled the process independently and, consequently, has established its own workflows and acquired its own tools—in some cases, utilizing different management applications and processes for different collections and content types. Two campus libraries have developed their own digital library platforms that provide extensive management functionality: UCLA has implemented Islandora and UC San Diego has mounted a Hydra stack. UC Berkeley has produced a robust METS generation and storage system called WebGenDB, which does not provide full-fledged digital asset management but has nevertheless allowed the library to create and publish digital collections en masse.
The other seven UC Libraries have relied on various, more discrete tools to aid in the storage and management of their digital files and metadata, including Microsoft Excel, Filemaker Pro, Omeka, CONTENTdm, DSpace, and Canto Cumulus. Although the libraries have managed to leverage these tools to produce and share some digital content, there is consensus that the products do not do enough, individually or collectively, to enable the robust, consistent management of digital files and metadata at scale. Library staff often criticize these systems for their
• lack of sophistication in supporting the complexity or number of resources
• difficulty of use in performing necessary management functions
• high expense, in the form of annual licensing fees or local technical support
At best, these technologies have made the management of existing digital collections burdensome; at worst, they have prevented these libraries from providing access to as many new digital collections as they might have otherwise.
When, in 2009, the libraries began planning the UCLDC project, management was a key part of the discussion. A subsequent system-wide task force recommended that the UC Libraries “implement a coordinated, system-wide DAMS solution immediately,” with the goal of providing “all campuses with the means to create, manage, and make accessible digital assets efficiently and at less cost.”3 A shared DAMS, the task force asserted, would result in cost savings for the campuses—even taking into account set-up costs and ongoing investment—by centralizing the resources necessary to license and/or develop such a system. Notably, the task force also emphasized the urgency of establishing this system-wide DAMS, advising the libraries to find a solution that would enable the campuses to “immediately” create metadata and manage at-risk digital resources.
Another system-wide task force, convened in 2012 to develop the UCLDC technical model, established three major principles for UC’s digital asset management solution: modularity, best-of-breed components with open-source tendencies, and broad adoption with community support. The team identified four technical solutions that met these principles and then evaluated each in relation to a list of determined requirements and the goals of the wider initiative.4 The options under consideration included two platforms originating in the library community (Hydra and Islandora) and two enterprise content management products developed outside of the library community for a range of professional applications (Alfresco and Nuxeo). After much deliberation, the task force recommended Nuxeo (www.nuxeo.com) for immediate implementation as a solution for the pressing needs of the libraries.5
Why choose an enterprise solution? Given the stated urgency of establishing a shared DAMS, speed of implementation proved an important criterion for selection. On this front, the task force ultimately advanced a philosophy that might best be summarized as “borrow before build”: the idea that resources might be saved (and thus the project would advance faster) if staff time were focused on making minor adjustments to an existing project rather than generating new code. Of the four products evaluated, Nuxeo appeared to support natively the most management requirements, especially technically complex ones such as batch metadata editing and automatic generation of derivatives. It also provided many tools and ready-made product extensions that the team hoped would reduce the need for intensive technical development, thereby speeding up initial implementation and reducing the complexity of ongoing support. In short, Nuxeo offered the greatest promise for meeting the system-wide digital asset management requirements on the shortest possible timeline.
Concerns about whether such a system could work for the specific types of resources and the unique needs of libraries were assuaged by the fact that Nuxeo had successfully served as the technical foundation for a similar project in the cultural heritage community: CollectionSpace. Developed by UC Berkeley and partners, CollectionSpace is a Nuxeo-based platform for tracking and managing museum artifact data.6 Not only did this project serve as an exemplary parallel use case for the Nuxeo product, but it additionally provided an opportunity to leverage expertise already within the UC community. Indeed, the UCLDC project team benefited significantly from the involvement of CollectionSpace staff, who helped install Nuxeo and determine a process for migrating existing collections into the platform.
Released for use in July 2014, Nuxeo is licensed annually by CDL on behalf of the UC Libraries. This license covers Nuxeo “Studio” (a web-based configuration tool for customizing the product); regular, automatically deployed fixes to the system; and a high level of responsive technical and user support. CDL hosts the DAMS on its own servers and performs any development work needed by the UC Libraries that is not globally supported by the vendor. Customization of the system and its open-source code base is essentially limitless, impeded only by CDL’s own development capacity.
Although it is still too early in the project to evaluate the overall success of the shared DAMS at the UC Libraries, Nuxeo has so far lived up to its promise as a powerful and extremely extensible product with relatively straightforward installation and customization procedures. (See figure 9.1 for an example from the shared DAMS.) Proof enough is the speed with which the UCLDC project team was able to release a functional product, with a custom metadata model, to the campus libraries: a mere ten months from the start of implementation.7 The DAMS now meets the majority of the UC Libraries’ stated requirements for digital asset management, and many existing collections have been migrated into the system. Uptake among the libraries has begun, with four of the ten campuses actively using the system to create new collections, and others evaluating it for future use.8
Figure 9.1 | Digital object in the UC Libraries shared DAMS
This is a customized instance of the Nuxeo platform. Staff at each campus library have separate spaces to organize objects and create metadata, but they share the same global system and can search across all collections.
At this early stage of the DAMS service, the goals are to develop policies and processes for authorizing campus users; train these users on the DAMS; consult with them as they build a few collections “from scratch” using existing tools; and, most importantly, identify and begin to implement refinements to the system. The DAMS implementation team anticipates that, as the UC Libraries become familiar with the system, they will begin to determine which of their collections should be managed in the DAMS. Criteria for inclusion in the DAMS will be largely dictated by the libraries; CDL intends to support the management of as many types of resources as possible within the system.9
The team’s initial experience with Nuxeo bolsters the case for licensing and customizing an enterprise product. The most direct benefit of this approach has been the vendor’s role as the primary developer of the platform, thereby minimizing the demand on local resources to build the core service. Nuxeo provides extensive technical support by pushing regular fixes and improvements to the DAMS, which has significantly simplified and expedited its central deployment. This company is in the business of content management and, as such, is constantly working to anticipate market needs to an extent that would not be possible by the lean project team at CDL. To that end, Nuxeo bundles new features into annual upgrades, which CDL has easily implemented even while retaining local customizations. Additionally, ready-made add-ons such as Shibboleth and Amazon S3 integrations have greatly reduced the amount of “ground-up” technical development required of the team. Unlike a typical vendor relationship, however, where the client is dependent on the commercial product and hopes that its general development trajectory continues to meet local needs, in this case CDL also has access to the open-source code base and hosts the entire system. This arrangement has afforded CDL the capacity and the freedom to develop new features as requested by the UC Libraries—for example, a bulk file upload client—and ensures continued control over the specific implementation.
This approach is ideal for a consortial setting: Nuxeo and products like it are already optimized for a multi-institution model. The project team easily built out a structure within the platform wherein each campus library has its own space for creating and managing digital objects, while still allowing library staff to search across all collections. User permissions such as “view only” and “manage all”—complex functionality that would be difficult to develop in-house—are already built in, so libraries can quickly designate different roles for student workers, metadata librarians, and administrators. Nuxeo’s open-source code base and REST API further support UC’s collaborative use case by opening the door to co-development opportunities with campus partners. UC Santa Cruz, for example, has created a plug-in that allows users to pull content from Nuxeo into Omeka, a tool for creating digital exhibits.
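To illustrate the kind of integration this API opens up, the sketch below builds a request for one document's JSON record from a Nuxeo server, the first step an external tool like the Omeka plug-in would take. The path-based /api/v1/path endpoint is part of Nuxeo's standard REST API, but the server URL, document path, and credentials shown here are hypothetical.

```python
# A sketch of pulling a document record out of Nuxeo over its REST API,
# as a tool like the UC Santa Cruz Omeka plug-in might. The /api/v1/path
# endpoint is standard Nuxeo; the server URL, document path, and
# credentials below are invented for the example.
import base64

def nuxeo_doc_request(server, doc_path, user, password):
    """Return (url, headers) for a GET of one document's JSON record."""
    url = server.rstrip("/") + "/api/v1/path" + doc_path
    credentials = "%s:%s" % (user, password)
    token = base64.b64encode(credentials.encode()).decode()
    headers = {"Authorization": "Basic " + token}
    return url, headers

url, headers = nuxeo_doc_request(
    "https://nuxeo.example.org/nuxeo",   # hypothetical shared DAMS host
    "/asset-library/UCSC/photo-001",     # hypothetical document path
    "harvest-user", "secret")
# An exhibit tool would GET this URL and reshape the JSON for display.
```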
Early experience suggests that the UCLDC service model hits a technical “sweet spot” with this DAMS implementation. By licensing support for an open-source product, CDL and the libraries are reaping the benefits of a well-resourced, industry-standard content management system, while retaining the ability to build what is uniquely needed on top of its non-proprietary code base. In a field that often develops its own technical solutions, this service represents an alternative approach: buy into a robust enterprise platform and customize it only where it falls short of the specific needs of particular users and their collections.10
PART TWO
While the shared DAMS undoubtedly will stimulate new collection building across the campuses, the UC Libraries have already made a substantial corpus of content accessible online.11 UC Berkeley, UCLA, and UC San Diego (utilizing their respective systems) have most routinely been creating and exposing digital resources at scale. Even campus libraries without such infrastructure have leveraged what they do have to produce and make available important digitized content, if only on a project-by-project basis. That this impressive array of unique digital resources already exists—and, for some campus libraries, pre-dates a robust management solution—is a testament to the commitment that each has made to digitizing and providing online access to its unique holdings.
Though vast and diverse, these collections have also been “siloed,” or dispersed across many different platforms maintained by the libraries themselves, by the CDL, or by third-party providers. The result is that end users are unaware of the full scope of online resources available at UC and, consequently, of materials directly relevant to their research. Even users savvy enough to locate all of the access points to the libraries’ digital collections cannot search across these collections to uncover related resources. This distributed environment also has posed internal challenges, chief among them the missed opportunities for collaborative digital collection development. With ten campus libraries working independently on digital projects and publishing the resulting content on separate platforms, significant organizational overhead has been required to identify shared collection strengths and potential collaborative digital projects.
Aggregation is the clear solution to these challenges and, in theory, could have been achieved in the shared DAMS. Such an approach, however, was not a viable solution for the UC Libraries. Why? First, those campus libraries with their own management solutions in production or development have already invested heavily in systems that are working well for them or promise to do so. It is unreasonable to require them to migrate their content to the shared DAMS or to deposit and maintain copies of their objects in it while managing them elsewhere. Either option would lead to increased workloads, redundant storage costs, migration complexities, and inevitably version control issues. Similarly, there are a number of collections throughout the libraries that are not good candidates for the first phase of the shared DAMS implementation. They may, for example, reside on third-party platforms, present technical challenges, or simply require more resources for migration than are currently available. Some of these collections might eventually benefit from the shared DAMS, but in the meantime, an alternative solution for aggregation is necessary.
In short, to present a truly comprehensive view and “single search” of all unique digital content within the UC Libraries’ purview, the distinct needs of the campus libraries and their respective collections must be accommodated.12 The original model of aggregation through the OAC and Calisphere (hosting digital resources from multiple contributors in a single repository) has succeeded in bringing together over 460 collections in the past eighteen years. But this approach has only partially served collection owners. It requires an elaborate and often onerous workflow for pushing files and metadata into the repository; it does not sufficiently incentivize participation from institutions with alternative solutions for hosting, managing, and surfacing digital content; and it fails to accommodate the assorted edge cases where the central repository is simply not appropriate.
The CDL will therefore achieve this goal of aggregation not within the DAMS but, like several other major digital library consortia, by way of a metadata harvest.13 The harvest should be regarded as another layer of the UCLDC platform, separate from but “on top of” the shared DAMS (see figure 9.2). Within this layer, item-level metadata for designated collections from a variety of sources is gathered, stored in a common index, and remediated and normalized where possible, thereby enhancing the discoverability of the resources once aggregated. Current and known future targets for this metadata retrieval include the following:
• campus-hosted DAMS, notably those in production at UCLA and UC San Diego and those in development at UC Berkeley and UC Santa Barbara
• the existing, now effectively “legacy” OAC/Calisphere METS repository
• miscellaneous other platforms managed by campus libraries (e.g., the Legacy Tobacco Documents Library, a highly specialized project at UC San Francisco)
• the shared Nuxeo DAMS (metadata for collections within this system will be harvested into the index just like metadata from any other source)
Figure 9.2 | UCLDC service model
Dark grey areas represent the core components supported by CDL. (The Merritt preservation repository is also a CDL service, but is considered beyond the scope of the UCLDC technical stack, proper.) The shared DAMS comprises a file system for storage and Nuxeo, an administrative interface for managing digital objects. Aggregation and end-user access are separate layers.
To actually execute the harvest, the UCLDC project team has been working with institutions to determine the most efficient and effective method for obtaining the best possible data about their collections. (The term “harvest” is used loosely to describe the process of gathering metadata, as CDL works with various feeds and outputs including, but not limited to, OAI-PMH.) In addition to gathering metadata, CDL has also requested that institutions include within the metadata, where possible, URLs for access images and other files. This approach allows for the display of actual visual content in the public interface, thereby creating a more seamless experience for end users.
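As an illustration of one such feed, the sketch below builds an OAI-PMH ListRecords request and extracts Dublin Core titles from the XML response. The verbs, arguments, and namespaces are standard OAI-PMH; the repository URL is invented for the example.

```python
# A sketch of one harvest path: requesting records over OAI-PMH and
# pulling dc:title values out of the response. Protocol details are
# standard OAI-PMH; the repository URL is hypothetical.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request; a resumptionToken pages through results."""
    if token:
        # Per the protocol, resumptionToken is an exclusive argument
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

def titles_from_response(xml_text):
    """Extract dc:title values from a ListRecords response body."""
    return [el.text for el in ET.fromstring(xml_text).iter(DC_TITLE)]

url = list_records_url("https://library.example.edu/oai")  # hypothetical endpoint
```

A harvester would fetch each URL, parse the records, and repeat with the resumptionToken until the repository reports no more pages.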
Having obtained the data, the challenge then becomes how to turn a set of heterogeneous feeds and formats into a successful aggregation. On this front, the UCLDC model extends the “borrow before build” philosophy by leveraging work completed by the larger aggregators. Specifically, CDL has mirrored DPLA’s metadata application profile (MAP), which is “the basis for how metadata is structured and validated in DPLA, and guides how metadata is stored, serialized, and made available.”14 For each collection harvested from a given target, the UCLDC team creates a crosswalk to the MAP, thereby reconciling the metadata fields from the various sources. Additionally, some of the metadata is augmented centrally (e.g., by adding a rights statement to all objects in a given collection, if they lack one). The team is currently experimenting with other metadata enhancement strategies such as geographic place name recognition. These kinds of innovations will increase the benefits of the central service by greatly expanding discovery opportunities for end users without requiring more work from campus metadata librarians.
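A simplified sketch of this crosswalk-and-augmentation step might look like the following. The source field names, MAP property names, and default rights text are illustrative assumptions, not CDL's actual schema or implementation.

```python
# A sketch of a per-collection crosswalk: source fields are mapped onto a
# DPLA-MAP-style record, and a collection-level rights statement is
# supplied where the source lacks one. All names here are illustrative.

DEFAULT_RIGHTS = "Copyright status unknown; contact the contributing institution."

def crosswalk(record, field_map, collection_rights=DEFAULT_RIGHTS):
    """Map one harvested record into a MAP-style sourceResource dict."""
    source_resource = {}
    for src_field, map_field in field_map.items():
        value = record.get(src_field)
        if value:
            # MAP properties are repeatable, so normalize values to lists
            values = value if isinstance(value, list) else [value]
            source_resource.setdefault(map_field, []).extend(values)
    # Central augmentation: add a rights statement if the source had none
    source_resource.setdefault("rights", [collection_rights])
    return {"sourceResource": source_resource}

# Example with qualified-Dublin-Core-like source field names
field_map = {"dc_title": "title", "dc_creator": "creator", "dc_date": "date"}
mapped = crosswalk({"dc_title": "Panorama of San Francisco, 1878"}, field_map)
```

In practice, each harvest source would get its own field map, while the enrichment logic runs uniformly over the whole aggregation.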
Critical to the long-term success of the UCLDC suite of services is the ability, in cooperation with the UC Libraries, to manage and control this large data aggregation. To that end, the CDL has developed an application that exists alongside the common index, helping track all of the data coming into the system and thus plan for the continued growth of the collection. This “Collection Registry” (http://registry.cdlib.org/) is a simple website that might be thought of as a human-readable display of collection-level data in the common index. It is also an administrative interface that UC Libraries staff can use to provide key information about the collections. Specifically, the Collection Registry performs the following functions for the UCLDC.
In the Registry, campus library staff can indicate new collections that they plan to add to the system, whether they intend to build them in the shared DAMS or contribute them through the harvest. For DAMS collections especially, it is important for CDL to know characteristics like the size and dominant format of a prospective collection, to ensure there is the requisite storage capacity and functionality in place to support it. For harvested collections, if a new source or type of collection is proposed that has not been previously supported, CDL then works with the institution to determine the best way forward to obtain its metadata.
The Registry plays a major role in the harvest mechanism, from both administrative and technical perspectives. Once the appropriate feed for a given collection is determined, CDL requires a target URL from which to grab the data. Campus library staff record these URLs directly within the Registry interface, edit them when necessary, and indicate when the collections are ready for harvest (or re-harvest, if the data has been updated). CDL staff also use the Registry to trigger the harvest itself, instructing the system to fetch the data.
The Registry additionally serves as a source of data itself for the purposes of consistent, unified display in the public interface. Three types of information in the Registry constitute, essentially, a controlled vocabulary: campus name, department name, and collection name. Standardizing these fields is crucial for ensuring that all digital objects held by a given campus/department and in a given collection appear as such for end users. CDL plans to integrate these vocabularies into the shared DAMS, so campus libraries can make these designations for hosted items. The Registry also enables campus partners to provide collection-level descriptive information and departmental addresses and hours that they wish to display for end users. Eventually, it may be the platform for managing more complex data such as end-user permissions for restricted content.
Finally, the vision for the Registry extends beyond the original scope of UCLDC services, insofar as it might serve as a platform for digital collection development writ large at the UC Libraries. As mentioned earlier, to date both CDL and the UC Libraries have found it difficult to maintain a collective inventory of both digital and physical collections. The Online Archive of California provides one source of information, but it is used predominantly by archives and special collections and thus does not contain information about other unique resources within the purview of the libraries. The de facto method of recording the scope of collections across the campus libraries has been, historically, to send around a series of Excel spreadsheets every few years. Although this approach may work on a project basis, it does not provide a current picture of the system-wide collections landscape, an important requirement for securing funding for collaborative digitization initiatives. CDL plans eventually to extend support of the Collection Registry so campuses can enumerate and describe potential future digital collections—“hidden” collections that would be prime candidates for digitization. Such functionality will help ensure there is a constant stream of new resources and metadata flowing into the aggregation for years to come.
PART THREE
The flip side of the harvest (getting data in) is discovery (getting data out). The UCLDC aggregation layer allows for flexible discovery, meaning the data contained within the common index is shareable with end users through multiple interfaces. This flexibility is central to the design of the platform and—like other consortial harvest projects before it—the guiding philosophy behind the initiative.15 Single point of access to the UC Libraries’ digital collections is a major goal, but it is not the only goal. As with digital asset management, it cannot be assumed that, when it comes to discovery and access, one size fits all. This is the case both for end users, who have a constellation of websites and applications at their disposal, and content holders, who need the ability to customize the appearance of their digital resources for different purposes.
The UCLDC aggregation supports large-scale discovery through the Digital Public Library of America (DPLA). As mentioned previously, CDL is mirroring the DPLA metadata application profile as part of the harvest infrastructure. This choice not only avoids “reinventing the wheel” when it comes to metadata reconciliation, but it also allows easy sharing of the aggregated metadata with DPLA without additional remediation. CDL has launched as a DPLA Content Hub, thus serving as a major source of metadata for the DPLA service and end-user portal. As such, CDL has begun sharing all of the metadata harvested into the UCLDC index to date and will continue to provide this data feed to DPLA moving forward. Participation in DPLA is critical to the UC Libraries’ strategy of providing broad public access to their digital collections. Much has been written about getting digital library resources into users’ workflows and “meeting them where they are.”16 Although the DPLA site is still young, as the only national-level aggregation of unique digital resources, it is likely to attract a large audience. Even if end users do not start at the DPLA site for their research, they will be more likely to find digital resources that appear there when using search engines like Google.
On the opposite end of the spectrum, the UCLDC platform also allows for small-scale, highly customized discovery. As described earlier, one function of the common index is to aggregate metadata harvested from the various sources. But the index (specifically Apache Solr) and its application programming interface (API) additionally and importantly support the search, retrieval, and flexible exposure of that metadata. Using the Solr API, a developer can access all or some of the data stored in the index and essentially “plug it into” any graphical interface or functionality. This means that the UC campus libraries are able to create custom interfaces for subsets of the aggregation. Several libraries, for instance, are participating in celebrations for their respective campuses’ anniversaries and may wish to create special websites showcasing their campus history resources. Or they might decide to collaborate on the creation of topical portals for resources held across institutions, for example on a “California food and wine” collection and corresponding interface.
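For instance, a campus portal might assemble its slice of the aggregation with a filtered query against the index over Solr's HTTP API, along the lines sketched below. The endpoint and the field names (collection_name, campus) are assumptions for illustration, not the live index schema.

```python
# A sketch of how a topical portal might request a subset of the shared
# index over Solr's HTTP select handler. The host and field names are
# hypothetical; the query parameters (q, fq, rows, start, wt) are
# standard Solr.
from urllib.parse import urlencode

SOLR_SELECT = "https://solr.example.org/solr/ucldc/select"  # hypothetical host

def subset_query(collection, campus=None, rows=25, start=0):
    """Build a Solr select URL filtered to one collection (and campus)."""
    params = [
        ("q", "*:*"),
        ("fq", 'collection_name:"%s"' % collection),  # filter queries AND together
        ("rows", str(rows)),
        ("start", str(start)),
        ("wt", "json"),
    ]
    if campus:
        params.append(("fq", 'campus:"%s"' % campus))
    return SOLR_SELECT + "?" + urlencode(params)

# A "California food and wine" portal page would fetch this URL and
# render the JSON results in its own interface.
url = subset_query("California food and wine", campus="UC Davis")
```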
The API also opens the door for more participatory end-user engagement with the data itself, as researchers could use it directly to create “mashups” of the collection.17 One of the most exciting parts of the UCLDC project is the potential it holds for converting end users from mere consumers of the digital collection to active partners in its development.
Beyond supporting mega-discovery and highly customized display solutions, the UCLDC service offers a new public interface that provides researchers with not only seamless discovery of disparate digital items across the entire aggregation but also an exceptional user experience when interacting with these materials. Why expend resources to create an interface for digital content that already appears, in toto, within DPLA and, in focus, within specialized interfaces enabled by the Solr API? Because while “serving up” content is critical, it is not enough: the need remains for a digital collection interface that meets user expectations and facilitates the research process.
As web technologies have improved and sites have grown in complexity and sophistication, user expectations have risen accordingly. Digital collections websites are probably closest in structure to e-commerce sites; both attempt to organize a vast number of unique digital items and connect visitors with the items most relevant to them, thereby driving engagement through purchase or simply usage. Users increasingly expect that these sites will provide:
• high performance, including quick load times for large quantities of objects
• a great user experience, with a premium placed on saving time and reducing the number of clicks required to connect users with desired information
• rich context for digital objects through curation, item-level description, and clear associations between similar content
• personalized suggestions for discovery, tuned to past behaviors and/or preferences
• a multitude of search, facet, and browse options that satisfy a range of information-seeking practices and meet users at various phases in the research process
• “click of a button” features and integration with third-party applications that enable users to save and share content using the tools with which they are familiar
• a trusted brand association
• clean, modern, and attractive design
Just as these user expectations push companies to invest in user experience design in order to facilitate sales, so should they push libraries and related institutions to do the same in order to facilitate the creation of research citations, museum exhibitions, documentaries, family histories, lesson plans, and the many other products of research. Academic libraries are far from the only game in town when it comes to serving up digital content, so they must make a compelling case for the value and relevance of the resources they provide to institutional patrons and to the larger global community.
As a regional content aggregator, the CDL is best positioned to meet these growing user expectations and to realize the ultimate payoff of UCLDC services: ensuring that researchers can discover the materials in the collection through a variety of channels and are aided in their use of those materials by a well-designed website. Why? From a resource perspective, it makes good sense to amass user experience (UX) research and design expertise at the consortial level and apply it across a breadth of resources. Additionally, the UCLDC aggregation is large and diverse enough to sustain high-value features (e.g., facets) but small enough to make these features meaningful to researchers (through, to continue the previous example, a coordinated taxonomy). Here again is a “sweet spot” in the UCLDC service.
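As an illustration of the high-value facet features mentioned above, the sketch below requests facet counts from a Solr-style index and reshapes Solr's flat response for display. The field names (type, decade) are illustrative only, not the actual Calisphere schema.

```python
# A hedged sketch of faceting against a Solr-style index: request value
# counts on selected fields, then reshape Solr's flat facet list for
# display. Field names (type, decade) are illustrative, not the actual
# Calisphere schema.


def build_facet_query(q="*:*", facet_fields=("type", "decade"), min_count=1):
    """Solr parameters that request facet counts alongside a search."""
    return {
        "q": q,
        "rows": "0",                       # counts only; no documents needed
        "facet": "true",
        "facet.field": list(facet_fields),
        "facet.mincount": str(min_count),
        "wt": "json",
    }


def facet_counts(solr_response, field):
    """Solr returns facets as a flat [value, count, value, count, ...]
    list; convert it to a {value: count} dict for easy templating."""
    flat = (solr_response.get("facet_counts", {})
                         .get("facet_fields", {})
                         .get(field, []))
    return dict(zip(flat[0::2], flat[1::2]))
```

A coordinated taxonomy across contributors is what makes such counts meaningful: a "decade" facet is only useful if all ten libraries populate the field consistently.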
In order to achieve these user-focused goals for the interface to the UCLDC aggregation, CDL has completed a visual and technical redesign of Calisphere. Briefly described earlier, Calisphere (calisphere.cdlib.org), initially released in 2006, is a CDL service that showcases the unique digital resources of the UC Libraries as well as other cultural heritage and educational institutions within the university’s ten campuses and throughout California. Prior to the redesign, the site made available approximately 260,000 images, texts, and A/V resources documenting events and subjects as far-flung, topically and geographically, as the People’s Temple movement, the Melanesian Islands, mid-century food photography, and Japanese-American relocation. The new Calisphere website, released in beta mode in September 2015, continues to provide a point of access to these resources, in addition to all of the content in the new UCLDC aggregation, for a total of more than 400,000 objects.18
The previous Calisphere site proved an ideal starting point for building the UCLDC public interface. It had a user-centered design ethos that led to its adoption by a diverse community of users.19 It carried strong brand attributes as a known and trusted source of digital historical content. And finally, it contained an even larger content base from California’s amazing breadth of educational and cultural institutions, which served to augment and enrich the UCLDC aggregation. Yet, eight years into its lifecycle, Calisphere was in need of a substantial visual and technical refresh. The UCLDC project afforded CDL the opportunity to refactor Calisphere’s design while leveraging the site’s long-standing role as a valuable resource within the UC community, the state of California, and well beyond.
The new iteration of Calisphere has largely adopted the information architecture of the previous site but includes key changes to the interface that continue the CDL’s tradition of data-driven UX design. In 2012, CDL conducted an in-depth analysis of Calisphere usage data (tracked by Google Analytics) to gain a better picture of how users were arriving at and navigating through the site. Three findings stood out to the team:
• Almost 70 percent of visits to the site began at a digital object (as opposed to “navigational” pages like Calisphere’s homepage or its more than 100 handcrafted topical collections), mostly referred from search engines and other websites.
• The “bounce rate” for those digital object entry points was very high (over 60 percent), meaning many users left without clicking on any subsequent pages.
• Visitors rarely used the site search.
On the one hand, these statistics indicated that Calisphere was effective at surfacing unique digital content through search engines and other external discovery mechanisms, since a vast majority of users arrived at specific objects. But, the results also pointed to a challenging reality: many of those users left without exploring any additional content. Data collected since the assessment has confirmed that most researchers simply do not start at the homepage and use navigational and search tools to drill down to individual objects. The object is their homepage.
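The two headline metrics from that analysis can be recomputed over simplified session records, as in the back-of-the-envelope sketch below. The session format (ordered lists of page paths, with object pages under "/item/") is invented for illustration; the actual analysis relied on Google Analytics.

```python
# A back-of-the-envelope illustration of the two usage findings above,
# computed over simplified session records. The session format (ordered
# lists of page paths, object pages under "/item/") is invented for
# illustration; the real analysis used Google Analytics.


def entry_and_bounce(sessions):
    """Return (share of sessions that entered at an object page,
    bounce rate among those object-entry sessions)."""
    object_entries = [s for s in sessions if s and s[0].startswith("/item/")]
    entry_share = len(object_entries) / len(sessions)
    bounces = [s for s in object_entries if len(s) == 1]
    bounce_rate = len(bounces) / len(object_entries) if object_entries else 0.0
    return entry_share, bounce_rate

# Example: two of three visits start at an object page, and one of
# those two leaves without viewing a second page.
#   entry_and_bounce([["/item/a/"],
#                     ["/item/b/", "/search/"],
#                     ["/", "/item/c/"]])
```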
For the new site, accordingly, CDL re-conceptualized digital object pages as points of entry to Calisphere. The new object pages not only showcase the content but also provide contextualizing information and navigational prompts for users, encouraging them to dig deeper into more relevant content within this rich collection. In designing these pages, CDL explicitly laid out the following four objectives:
• Help the user quickly grasp what the site is: an open and accessible digital collection with a breadth of unique resources, presented by the University of California.
• Promote understanding and usage of the object through a well-organized, intuitive display.
• Direct the user to other relevant content, thereby encouraging discovery of additional resources.
• Connect the user with the owning institution, should they need additional information about the digital object.
Figures 9.3 and 9.4 show the “before” and “after,” respectively, of the design of a digital object on Calisphere. Figure 9.4 demonstrates how the new design makes the object not only a destination but also a launch pad for subsequent discovery and interaction. Specific features include clear directives for contacting the owning institution and performing new searches, information in the header that helps contextualize the object within the site, a clear and easy way to browse all of the metadata for a given resource and interact with the file (zoom for images and media player for A/V materials), and clear browse options like a “similar objects” carousel and links to related collections.
Figure 9.3 | Object in the old Calisphere site.
The design of a simple digital image object on the previous Calisphere site. In 2011, CDL added “more like this” links to related objects. This feature has since driven usage throughout the site and was expanded upon in the new iteration of the design.
Figure 9.4 | Object in the new Calisphere site.
Digital object pages on the new site have been treated like the homepage, emphasizing contextual clues and navigational tools that encourage users to explore the wider collection on Calisphere.
Design, of course, is only half the equation when it comes to developing a usable website; technology plays a major role in providing the features and functionality that users need. The new Calisphere site is built on a modern technical stack that enables CDL and the UC Libraries to meet user expectations now and in the future.
“Decoupling the interface from the data” is one of the major technical principles that guided the implementation of the new site and supports the user-focused design paradigm described above. The previous Calisphere interface was tightly coupled to an underlying repository and, therefore, to the digital object data itself. This structure made it difficult to be nimble in rolling out new features, as a change to the data could “break” the interface and vice versa.
The UCLDC technical architecture was specifically designed to avoid this problem. The project team built the new Calisphere site using the Solr API, meaning digital resources and data from the common index are pulled and displayed in a custom interface. (In this sense, Calisphere is really just an ambitious version of the highly customized access solution supported by the API, described earlier.) The interface is thus decoupled from the underlying data, which will make it considerably easier to modify the site over time.
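The decoupling pattern can be sketched as a single mapping layer: the interface renders a small, stable “display model” built from whatever document the index returns, so changes in the underlying data are absorbed in one function rather than breaking page templates. The field names and defaults below are hypothetical, not the actual UCLDC index schema.

```python
# A hedged sketch of the "decoupled interface" pattern: templates render
# a small, stable display model rather than raw repository data, so a
# schema change is absorbed in one mapping function. Field names and
# defaults are hypothetical.


def to_display_model(index_doc):
    """Map a raw index document to the fields the templates rely on,
    supplying safe defaults so a missing field cannot break a page."""
    return {
        "title": index_doc.get("title") or "[untitled]",
        "thumbnail": index_doc.get("thumbnail_url", "/static/placeholder.png"),
        "institution": index_doc.get("repository_name", "Unknown institution"),
        "item_url": "/item/{}/".format(index_doc.get("id", "")),
    }
```

Because only this mapping function touches the raw index document, the site can add or rename index fields without coordinated changes across every page template.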
In creating the new site, CDL has also taken advantage of new technologies and standards that were not available when Calisphere was originally launched almost a decade ago. For example, the project team built the site in a responsive framework that allows users to easily access and interact with it on a range of devices: desktop, tablet, and mobile. Furthermore, a host of new products have been used—including CSS compilers, deployment tools, and off-the-shelf styles like Bootstrap—which inherently promote development and workflow efficiencies. These products will allow CDL to deftly implement changes and save time on site maintenance, thus freeing up resources for more elaborate, user-centered development projects as needed.
The UC Libraries Digital Collection project tackled a common consortial problem: how to design and develop services that are compelling and valuable to a heterogeneous group of stakeholders. The ten UC Libraries hold a great many important collections and have dramatically different levels of technical infrastructure and capacity to manage and expose these collections. By focusing on some but not all of these needs, the CDL’s past services made important inroads but also struggled with uneven participation from the libraries and an incomplete aggregation of their collections. For every digital collection that “made it” to Calisphere, several more languished in inaccessible databases and hard drives; still others lived happily on the web, but lacked the connections and increased exposure provided by aggregation.
The UCLDC service model asserts that the key to the success of collaborative digital collection building is to give stakeholders what they each need, whether that is a fully realized DAMS solution, an easy mechanism for contributing metadata for digital resources that are managed elsewhere, or a little bit of both. Yes, this modular approach requires a great deal of time, coordination, and resources. But the rewards are great: a flexible platform that supports broad participation, the opportunity to share collections in exciting new ways and meet researchers in new settings, and, ultimately, the exposure of unique materials that have been obscured for far too long. And by building this modular system with “borrowed” best-of-breed components whenever possible, CDL and the UC Libraries ensure that they have the resources required to create a sustainable future for the service.
Of course, there is an inherent risk in releasing a service tuned to the specific needs and requirements of a heterogeneous group of stakeholders: individualizing the service to the degree that it no longer inspires collective buy-in. Though not addressed in this chapter, the work of communicating the overall shape and significance of the UCLDC services across the consortium has been substantial. At the most fundamental level, it is crucial for all stakeholders to recognize the inherent value of the initiative, even if they have limited investment in some of its components (e.g., the shared DAMS solution). CDL has encouraged stakeholders—through in-person meetings, wiki updates, release reports, demos, and other communications—to keep their eyes on the prize: a collaborative UC Libraries collection made discoverable and accessible through a process that feeds statewide and national aggregations, supports custom access for distinct collections, and facilitates research of all kinds. In the end, the success of the initiative will be measured not by the technologies licensed or developed but, instead, by the ability of users worldwide to gain meaningful access to the riches of the UC Libraries.
Notes
1. Planning for the project officially kicked off with the establishment of a UC Libraries Digital Libraries Services Task Force in 2009. The final report is available at http://libraries.universityofcalifornia.edu/groups/files/dlstf/docs/DLSTF_Final_Report.pdf. A series of system-wide task forces and teams subsequently convened over the next few years to refine the scope of the project and hammer out the details. References to key teams and reports are provided throughout the chapter.
Concurrent with the start of planning for the project proper, the UC Libraries’ Collection Development Committee put forth a vision for a collaborative collection that set the tone for the project and informed the design of the resulting service. See “The University of California Library Collection: Content for the 21st Century and Beyond,” http://libraries.universityofcalifornia.edu/groups/files/cdc/docs/uc_collection_concept_paper_endorsed_ULs_2009.08.13.pdf.
2. JISC Digital Media, “Systems for Managing Digital Media Collections,” www.jiscdigitalmedia.ac.uk/guide/managing-a-digital-media-collection.
3. “New Modes for Organizing and Providing Access to Special Collections, Archive, and Digital Formats,” final report of the UC Libraries Next Generation Technical Services (NGTS) New Modes Task Force (September 2010), http://libraries.universityofcalifornia.edu/groups/files/ngts/docs/NGTS2_New_Modes_FinalReport.pdf.
4. “Digital Asset Management System (DAMS) Requirements,” final report of the UC Libraries NGTS Power of Three Group 1 (POT1) Lightning Team 1A, http://libraries.universityofcalifornia.edu/groups/files/ngts/docs/pots/pot1_lt1a_finalreport_july2012.pdf.
5. Notably, the task force additionally recommended that this “short-term tactic of adopting a non-library-industry standard technology . . . be combined with a long-term strategy of participating in the library-specific Project Hydra and Islandora communities utilizing the Fedora (or Fedora Futures) repository framework.” Although this recommendation was not ultimately endorsed, a system-wide discussion continues about how best to participate in larger community development initiatives in this space. Both the technical model and selection process and criteria for the DAMS are documented in “Proposed Model for Systemwide Digital Asset Management System (DAMS) with Discovery and Display,” final report of the UC Libraries NGTS POT1 Lightning Team 1C (February 2013), http://libraries.universityofcalifornia.edu/groups/files/POT1_LT1C_finalreport_08Feb2012.pdf. The process was modeled on a similar undertaking by the Council of State University Librarians in Florida; see “DISC Investigation: Common Digital Library System,” http://csul.net/sites/csul.fcla.edu/uploads/disc-findings-09-01-11.pdf.
7. UCLDC Metadata Scheme, https://registry.cdlib.org/documentation/docs/dams/metadata-model/. Staff time over the ten months totaled approximately 3.5 FTE, including not only the technical set-up and installation of the DAMS, but also activities related to creating and vetting an object model and metadata scheme, working through UC’s approval process for Shibboleth authentication, customizing DAMS interface elements, developing a process for migrating existing collections into the system, and overall project management and other work across the project (including but beyond the DAMS component).
8. At the time of writing, libraries at UC Irvine, UC Merced, UC Riverside, and UC San Francisco are using the DAMS.
9. Although space is not currently a restricting factor (since a move to cloud-based Amazon Web Services now provides extensive storage relatively cheaply), it will nevertheless be important to centrally track the extent of digital resources as they come in.
10. At the same time, the CDL has de facto considered it part of its charge to keep an eye on new technology solutions coming out of the broader library community. Given the express statement in the UCLDC planning process of the need for long-term planning around a DAMS solution, growing system-wide interest in community development initiatives, and the rapidly changing nature of technology and the digital library landscape, CDL has been following the development of Fedora/Hydra even as it has moved forward on the implementation of the Nuxeo solution as charged by the libraries. The UC Library system will continue to assess Nuxeo as a DAMS solution in light of these maturing technologies.
11. A 2012 inventory revealed approximately 680 completed or in-process digital collections across the UC Libraries. See “Identification/Analysis of Existing UC Digital Collections for Inclusion in the UC Libraries Digital Collection (UCLDC),” final report of the UC Libraries NGTS POT1 Lightning Team 3A, http://libraries.universityofcalifornia.edu/groups/files/ngts/docs/pots/pot1_lt3a_finalreport.pdf.
12. Leah Prescott and Ricky Erway, “Single Search: The Quest for the Holy Grail” (Dublin, OH: OCLC Research, 2011), www.oclc.org/research/publications/library/2011/2011-17.pdf.
13. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a prime example of a harvesting protocol that gained traction in the digital library community and formed the basis of early projects such as OAIster. Other digital library programs that aggregate content from multiple institutions, which also use a metadata harvesting model, include the Mountain West Digital Library, Digital Commonwealth (a consortium in Massachusetts), and Digital Kentucky; Europeana and the Digital Public Library of America (DPLA) take harvesting to the next level by aggregating aggregations of data.
14. Digital Public Library of America, “Metadata Application Profile,” http://dp.la/info/developers/map/.
15. Cesare Concordia, Stefan Gradmann, and Sjoerd Siebinga, “Not just another portal, not just another digital library: A portrait of Europeana as an application program interface,” IFLA Journal 36, no. 1 (March 2010), 61–69, doi:10.1177/0340035209360764.
16. See, for example Ricky Erway and Jennifer Schaffner, “Shifting Gears: Gearing Up to Get Into the Flow” (Dublin, OH: OCLC Programs and Research, 2007), www.oclc.org/programs/publications/reports/2007-02.pdf; and Lorcan Dempsey, “Reconfiguring the Library Systems Environment,” guest editorial to portal: Libraries and the Academy 8, no. 2 (April 2008), www.oclc.org/research/publications/archive/2008/dempsey-portal.pdf.
17. Holley Long, “End user development of digital collection mash-ups,” OCLC Systems and Services: International Digital Library Perspectives 28, no. 4 (2012), 199–207, doi:10.1108/10650751211279139.
18. Technically speaking, the existing digital content in the Calisphere repository actually became part of the UCLDC aggregation when harvested into the common index. CDL is testing whether the harvest infrastructure can scale to institutions beyond the UC Libraries; if it can, the harvest would also become an ingest mechanism for non-UC institutions moving forward.
19. Calisphere has been focused, since its launch in 2006, on providing a compelling user experience that encourages meaningful access to digitized resources. The site originated from a mission-driven and design-centered challenge: to “transform collections intended for university-level research and teaching into accessible resources for multiple audiences.” Initially focused on helping K-12 teachers leverage digital library collections for classroom use, Calisphere offered innovative approaches to the display and organization of unique digital resources. For more information about the development of Calisphere in response to teacher needs, see Isaac Mankita, Ellen Meltzer, and James Harris, “A Handful of Things: Calisphere’s Themed Collections from the California Digital Library,” D-Lib Magazine 12, no. 5 (May 2006), www.dlib.org/dlib/may06/mankita/05mankita.html. Despite its initial orientation toward a teacher audience, the Calisphere interface has proved compelling to a broad spectrum of researchers at all academic levels and from all walks of life. For an analysis of user data and demographics collected on the site, see Sherri Berger, “OAC and Calisphere Assessment 2011–2012: Executive Summary and Report,” (June 2012), www.cdlib.org/services/access_publishing/dsc/calisphere/docs/oac_calisphere_assessment_summary_report_2012.pdf. Subsequent data collected over the past two years has reaffirmed the finding that Calisphere users are wide-ranging in their experience and research objectives.