The fact that books in research libraries are usually shelved by subject, rather than by accession numbers (i.e., the order in which the books are received) or by the height of the volumes, gives researchers a major advantage in gaining subject access to their contents—one that, in many cases, cannot be matched by any computer searches, even if the same texts are available in electronic formats. This advantage, however, is nowadays in jeopardy from many library administrators who often assume that the shelving of books by subject is no longer necessary “in a digital age.” Much of this view stems from the library profession’s failure to distinguish between the desirability of providing more content in digital forms, on the one hand, and, on the other, the increasing need to provide multiple avenues of access to that content—avenues beyond the mere typing of guessed-at keywords into a blank search box. Contrary to the many assertions by “single search box” advocates, real researchers have persistent problems in attempting to specify in advance all of the relevant search terms that will produce the retrievals they desire. In the real world, people need not just prior-specification search mechanisms but also avenues of access that enable them to recognize what they don’t know how to ask for.
It is indeed possible to shelve books in arrangements other than by the traditional subject categorizations of the Library of Congress or Dewey Decimal Classification systems. A library could, for example, simply arrange its volumes in the order of their acquisition. In this case, catalogers would then have only to assign sequential whole numbers to the books (1, 2, 3, …). Such a system would be capable of storing an infinite number of items, and, as long as the number that appears on the computer record corresponds to the number on the book, readers who find the catalog entry would then be able to locate its corresponding volume on the shelves. The library would save thousands of dollars every year if this scheme were used, since it would require professional catalogers only for describing the books and devising subject headings for the catalog records and not for also creating systematic call numbers with intricate subject relationships to each other. It would especially save money, too, in preventing the need for the redistribution of already-shelved books caused by unanticipated bulges of growth in particular subject classes. In a whole number or “dummy number” system, the only area that needs room for growth is the very end of the sequence. No empty spaces need be left on any shelf but the last one, because no new books would ever be interfiled with those already in place—incoming books would be shelved only at the end of the sequence, simply in the order in which they happen to arrive.
Another possibility is that the library could shelve books strictly according to their height—all 6-inch-tall books together, all 10-inch books together, and so on. If this were done, then the vertical distance between bookshelves could be adjusted precisely so that there would be no wasted space above the volumes caused by height differentials. Given that there are miles of books in any large library, this expedient would enable shelving to be much more space efficient, which would save money and create room for larger collections. Such systems are used routinely by many academic libraries in their remote storage facilities—i.e., the warehouses (usually off-campus) built to hold the overflow books for which there is no longer any space available in the main library buildings themselves. The books in these warehouses are usually placed in boxes, with all books in any one box being of exactly the same height; and the boxes themselves are stored on shelves dozens of yards higher than any that could be reached by unaided human researchers. As long as the number on the computer catalog record for the book is linked both to the barcode on the book itself and to the barcode of the box in which it rests, the actual retrieval from the box can be accomplished by cherry picker mechanisms, either human or robotic.
Either of these shelving schemes would be much less expensive to hard-pressed library budgets than the traditional practice of maintaining books in a subject-classed arrangement. So why is the latter still desirable when cheaper alternatives are available? Why shouldn’t compact storage techniques be used not just in offsite warehouses but within the main library buildings themselves? Further, why couldn’t massive digitization projects such as Internet Public Library (ipl.org), HathiTrust.org, or Google Books replace rather than merely supplement onsite book collections shelved in classified order? What difference does the method of shelving make to the researchers who have to use the books?
A very real problem with books shelved in these configurations is that any (and all) access to them requires skillful prior use of the online public access catalog (OPAC), and the OPAC searches only superficial records representing and “pointing to” the books (which are shelved elsewhere); it does not search the actual book texts themselves. No catalog record of a book can contain all of the information in the book—the catalog record enables you to see only the subject(s) of the book as a whole (at scope-match level), not its individual pages that contain much more extensive and specific information. Moreover, you must specify the right search terms to begin with—and at that “whole-book” level designated by only a few subject headings—when sometimes you may really need in-depth access to the individual pages of the book(s). (In some cases—not nearly all—you may also be able to search the books’ tables of contents transcribed on the catalog records.) Even if the OPAC allows you to search its catalog records in class-number order, that is not at all the same thing as searching the actual books arranged in subject groupings on shelves.
When the storage of the books, by height of volumes or by sequence of acquisition, makes no attempt to shelve books on the same subjects next to each other, then there is no possibility of quickly browsing the full texts of related works in close physical proximity. In a height system, if one book on anthropology is 6 inches tall and another is 10 inches, they may be shelved on entirely different floors; in a sequential system, if one book came into the library a year after the other, they may be separated by hundreds of feet of cookbooks, car repair manuals, and Gothic novels.
One of the major advantages of a classified arrangement of actual books in a research library is that classified shelving enables you to simply recognize relevant works whose titles, keywords, or subject headings you could not think of in advance in using the OPAC. Such shelving allows for—indeed, positively encourages and enables—discovery by serendipity or recognition, and at a full-text (not scope-match or “snippet”) level. (I am assuming here that you are fortunate enough to be working in a research library that has open stacks; not all of them do.) The value of such discovery may be incalculable for any given search. One historian of prison labor, for example, found through stacks browsing the only known image of prisoners on a treadmill in the United States. “Neither the book title, nor the call number, nor the author led me to this report,” he commented. “Only a hands-on shelf check did it.”1 In a similar manner, I once found an illustration of a slave coffle—a line of shackled slaves being marched under guard—very different from one that is widely reproduced, in a book I noticed next to another volume that I was actually looking for. This illustration proved to be very welcome to a historian writing a book on slavery. The presence of this particular illustration is not indicated by the book’s catalog record in the computer, nor does the word “coffle” appear as a searchable keyword in the illustration’s caption.
With access to books shelved by subject, you can do focused examination of contiguous, subject-related full texts—that is, you can do deep searches of not just tables of contents and back-of-the book indexes, but maps, charts, tables, illustrations, diagrams, running heads, highlighted sidebars, binding conditions, typographical or color variations for emphasis, bulleted or numbered lists, prefaces, acknowledgments, forewords, footnotes, and bibliographies. You can search individual pages and paragraphs, and even spot the particular words you want within them, all within readily recognizable conceptual contexts.
For example, I once had to answer a letter from a historian seeking information on traveling libraries that circulated among lighthouse keepers at the turn of the twentieth century. These were wooden bookcases, each with a different selection of books, that were rotated among the tenders in order to relieve the boredom and monotony of their isolated lives. I first tried searching the computer catalog of the books at the Library of Congress—with no luck. Even after searching several commercial databases covering journals and dissertations (including the largest commercial index to American history journals) I still found nothing—only, occasionally, the right words ([lighthouse* OR light-house*] AND [book* OR librar*]) in the wrong contexts.
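The logic of that truncated Boolean query can be sketched in code. This is only a hypothetical illustration of how the two truncation groups combine—the actual database engines implement this internally, and the sample texts below are invented:

```python
import re

# Sketch of the Boolean query ([lighthouse* OR light-house*] AND
# [book* OR librar*]): a record matches only if it contains at least
# one term from EACH truncation group. Note that the query sees only
# co-occurrence of character strings, not context.
def matches(text: str) -> bool:
    has_lighthouse = re.search(r"\blight-?house\w*", text, re.I)
    has_book_or_library = re.search(r"\b(book|librar)\w*", text, re.I)
    return bool(has_lighthouse and has_book_or_library)

# The right words in the wrong context still match:
print(matches("A lighthouse keeper who never borrowed a book"))   # True
# A relevant record phrased differently is missed entirely:
print(matches("Traveling libraries for keepers of light stations"))  # False
```

The second example shows exactly the failure mode described above: a perfectly relevant source that happens to use other wording never matches the keywords specified in advance.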
So I decided to look directly at the books on lighthouses in the library’s bookstacks. The major grouping for this topic is at VK1000-1025 (“Lighthouse service”); this area had, by a quick count, 438 volumes on 12 shelves. I rapidly scanned all of this material—literally paging through, quickly, all of the volumes.
I found 15 books that had directly relevant sections—a paragraph here, a half page there, a column elsewhere—containing descriptions of the book collections, reminiscences about them, official reports, anecdotes, and so on. I also found another 7 sources of tangential interest—on reading or studying done in lighthouses, but without mentioning the traveling libraries—and photocopied these, too, for the letter writer. The primary 15 contained a total of about 2,100 words on the traveling libraries, including a partial list of their titles.
Particularly noteworthy is the fact that, of the 15 prime sources, not one mentioned the libraries in its table of contents, and 9 of them (60 percent) did not mention the libraries in their index, either—or did not even have an index to begin with. Equally noteworthy is the fact that 13 of the sources were twentieth-century publications—9 of them published after 1970—and thus still under copyright protection.
Now it is entirely true that a Web search on lighthouse libraries will indeed retrieve much relevant material quickly—but, for the most part, it is not the same material. The bookstacks hold the copyrighted sources whose full texts are not searchable on the Internet.
This information on lighthouse libraries could not have been found even if the books’ tables of contents and indexes had been entered into an online catalog. This level of research depth in book collections can be achieved only by focused browsing: inspecting the actual full texts in a systematic fashion, not just looking at any surrogate catalog records in an OPAC, no matter how detailed.
With the classification scheme’s arrangement of books, however, the needed information could indeed be found both systematically and easily. Retrieving, via call slips or online requests, 438 books scattered in storage by random accession numbers nowhere near each other would be so time consuming and difficult as to be effectively not possible in the real world in which actual researchers must work.2 And determining in advance which 15 had the right information from the catalog would also be impossible—the catalog records just do not contain that depth of information. And so a reader who had to guess in advance which 15 of the 438 volumes had the right information would necessarily miss most of what the library actually had to offer—information that is readily retrievable as long as we remember that focused browsing is itself a method of full-text searching, an alternative to using digitized full-text databases.
Note in this case that even searching by class numbers in the catalog is not the equivalent of searching by class area in the actual bookstacks. Even if the OPAC allowed for the construction of a list of the same 438 books, it would still not be possible to determine from that list of superficial catalog records which 15 books contained the needed information buried down at page and paragraph level. The use of subject-classified arrangements of full texts—printed books arranged in subject groupings on bookshelves in libraries with walls—thus provides a depth of subject access that cannot be matched by any computer searches of mere surrogate catalog records. “Depth” refers to the parts of the books that are searchable on the shelves but not in the OPAC: not just tables of contents and indexes, but maps, charts, tables, individual paragraphs, etc.
Note further that there was also a trade-off: in browsing only the VK1000-1025 area I was missing other classes on other aspects of lighthouses, e.g., TC375-379 (lighthouse technology), KF26-27 (multiple congressional oversight hearings), and Z6839 (subject bibliographies). The range of these aspects, however, was discoverable through use of the LCSH heading Lighthouses in the online catalog. While the classified shelving arrangement provides full-text depth of access to books within particular classes, it lacks the uniform headings, cross-references, and browse menus of the OPAC that show the range of all of the relevant classes.
While focused browsing of full texts in classified order on bookshelves can thus be contrasted to OPAC searching of the books’ catalog records—i.e., depth vs. range—it can also be contrasted to keyword searching in those many databases or websites (e.g., Google Books, Hathi Trust, many commercial databases) that do indeed contain full texts. There is another major trade-off here, too: recognition access (in bookstacks) vs. prior-specification access (in full-text sources online).
For example, I once received a rush request from a librarian at the Supreme Court library: one of the justices needed to confirm the statement that “the United States occupation zone in Germany after World War II encompassed 5,700 square miles and a population of over 18 million people.” I first tried the subscription databases America: History and Life and Historical Abstracts (two of the best sources covering history journals) in hopes that someone had written something concise on the occupation zone, but the results were much more diffuse than I wanted. So I tried our online book catalog. Just as an initial stab I did a keyword search of “occupation” and “zone” and “Germany,” with a limit on the search that I wanted only records published between 1945 and 1947. Within the 145 records that came up, I spotted one pretty quickly that had a formally established corporate name on it: Germany (Territory under Allied occupation, 1945–1955: U.S. Zone). Office of Military Government. When I searched on this standardized term I found a very focused pool of records. There were 18 hits; one of them had the word “population” in its title.
Since this was a rush request, I immediately went back to the bookstacks to look at this one pamphlet. This initial item did indeed have population figures for the American zone in 1947, but no square mileage figure. Right next to it, however, was another report that had a 1946 population figure—17,174,367—close to the “18 million” in the original inquiry, but obviously being a very different keyword character string. And it also had an area figure for the American zone—but in square kilometers, not in square miles. That was no problem, as the figure could easily be converted. The significant point, here, is that the chart providing the area figure did not say “square kilometers” written out—it said simply “sq. km.”
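The conversion itself is trivial arithmetic; a minimal sketch follows. The area value used here is purely illustrative (the pamphlet’s actual figure is not reproduced above)—only the conversion factor is a known constant:

```python
# Converting a chart figure given in "sq. km." to square miles.
# 1 square kilometer = 0.386102 square miles.
SQ_KM_TO_SQ_MI = 0.386102

def sq_km_to_sq_mi(area_km2: float) -> float:
    return area_km2 * SQ_KM_TO_SQ_MI

# Illustrative value only (NOT the pamphlet's actual figure): an area
# of about 14,763 sq. km. converts to roughly the 5,700 square miles
# in the justice's statement.
print(round(sq_km_to_sq_mi(14763)))  # 5700
```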
The equally significant point is that this particular pamphlet is indeed digitized in Google Books, but, even so, I could not find it there. Searching Google Books for the three words with which I started my own search in LC’s online catalog, namely “occupation” and “zone” and “Germany,” limited to publications between 1945 and 1947, produced (at the time) 653 hits; the exact pamphlet I found in the stacks showed up as the 307th item in the Google “relevance-ranked” display. (I could find it in the list only because, at that point, I already knew the precise source I was looking for, and I simply scrolled through the whole list until I spotted it. The number and the ordinal position are those as of March 2008; Google displays, however, change not only from one day to the next but also, frequently, from one minute to the next. A search in Google Books for the same three keywords in July of 2013 produced 889,000 hits.)
I cannot emphasize the following point enough, because it is so strongly counterintuitive to theorists who do not actually have to do such searches or find such information themselves, and yet it is nonetheless true: you cannot “progressively refine” such a set of 653 items (let alone 889,000) down to the right pamphlet by simply typing in extra keywords. Why not? Because the terms “18 million” or “square miles”—the keywords contained in the justice’s question—are not the words that actually appear in the table between paragraphs 2 and 3, on page 6 of that pamphlet; nor do they appear anywhere else. In order to do “progressive refinement” you have to know in advance which exact words will produce the refinement you seek, and it is precisely that knowledge that we lack when we are moving around in unfamiliar subject areas. In fact, I could not get the relevant table to show up at all, even in snippet form, even after I had discovered the right keywords (via stacks browsing), in spite of the fact that I could view other snippets from the same pamphlet. The Google software is such that it won’t show you every snippet containing the words you type in; and the company is playing it safe, legally, in not providing full-text views of post-1922 works, such as this occupation zone pamphlet.
The point is this: even if the Google keyword search software would display every instance of every word asked for, I still would never have known in advance the precise keywords (like “sq. km.”) that I needed to type in—I would have typed the phrase “square miles” written out, because it would not have occurred to me to think in terms of kilometers, let alone in terms of abbreviations. (The same point applies not just to Google Books but to any full-text database or website.)
The fact that the pamphlet is digitized therefore does not mean that it is easily accessible online—quite the contrary: it is not findable because Google’s keyword search mechanism does not provide adequate access to it.
By using the classified bookstacks, however, I employed a different search technique for full-text searching—a technique that enabled me to recognize what I could not specify in advance in a blank search box. I could find the source I needed because it was physically right next to the one that I started off looking for—and the one I was looking for was itself one of only 18 records, not one of 653. And I could skim both that initial full text—down to the level of its individual tables—and the one right next to it quickly, precisely because they were physically shelved right next to each other, within a limited class—I had only a very small contextual range of materials (less than one shelf) to inspect.
Classified bookstacks thus allow researchers to find through recognition within full texts what they don’t know how to ask for: we can look not just at tables of contents (which can sometimes be included in OPAC records), but also maps, charts, tables, illustrations, etc., most of which cannot be digitized at all (for copyright reasons). Moreover, we can examine all these features within limited physical shelf areas—we won’t have hundreds of thousands of electronic records to wade through, most of which have relevant words in irrelevant contexts. Such quick and focused browsing provides deep access via recognition in ways that digitized libraries of the very same texts do not.
Another example: an editor compiling a handbook of miscellaneous facts on library history needed to identify which book was the first one ever to be printed in French. I tried numerous full-text subscription databases and Internet sites; in the French sources I searched for “premier livre” (first book) combined with “langue française” (French language). After finding only conflicting and incorrect answers, I finally did some focused browsing in the library’s bookstacks in the Z103 (Bibliography) shelves. I quickly found a 1927 Manuel du Bibliophile Français (1470–1920), which identified the volume Recueil des Histoires de Troyes (1466)—and this Manuel cleared up what had been a major point of confusion: although this was the first book printed in the French language, it was not printed within France itself; it was produced in Cologne, Germany. My point here, however, is that the relevant passage in this bibliography refers to it as “le premier livre dans notre langue” (the first book in our language). In searching full-text databases it never occurred to me to type in “notre langue” rather than “langue française.” (The Manuel itself, as of this writing, is not digitized in Google Books.)
It is especially noteworthy that any proposed use of Google Books, Digital Public Library of America, or Hathi Trust to replace (rather than supplement) classified bookstacks would entirely segregate foreign language materials into multiple electronic “zones” that could not be searched simultaneously by the specification of English keywords. With classified bookstacks, on the other hand, books in all languages are grouped together by subject in the same physical locales; often an English-speaker (such as me) can simply notice relevant foreign books on a topic simply because they are shelved in the same classification areas as the English works. (I would not have thought of the title word Bibliophile, rather than Bibliographie, if I’d had to specify that, either, in a blank search box; but I could immediately recognize its relevance when I saw it on the shelf.) Some enthusiasts of Web e-books would thus unwittingly re-create in reality the disastrous consequences mythologized in the Tower of Babel story—scholars relying on such sources alone could not retrieve together the relevant works in multiple languages on their subjects.
The distinctions discussed above should be of particular concern to Faculty Library Committees, as they may be the only ones in a position to stop library administrators from sending too many books offsite “because they are in Google Books.”
Both “depth” access and “recognition” access come into play simultaneously in the need to maintain browsable book stacks. Focused browsing of subject-grouped books enables researchers to find very specific information only accessible at full-text levels that are not present in OPAC records; it enables recognition access to terms or other elements (e.g., illustrations) within those full-text books that cannot be specified in advance in Google-type searches; and it enables those full-text/recognition searches to be done within manageably limited ranges of books, rather than within excessively granular sets of thousands of texts retrieved by online keyword searches. It also enables full-text searching to be done (down to individual page levels) even within books—especially those still under copyright protection—that have not been digitized at all.
It is in regard to these concerns that library administrators frequently have “blind spots”: since they often don’t use libraries for research projects of their own, they don’t grasp the need for recognition access of terms at the full-text level provided by browsing. They naïvely assume that if all of the words in a book are searchable in full-text databases, then all of the search terms needed to find those books are also specifiable in advance. They are not. In Google-type keyword searches, highly relevant works are often missed because the best search terms cannot be guessed at, and they are also missed because even specifying the right terms can produce thousands of unwanted “noise” retrievals.
The result is that too many books needed for scholarship get sent off to remote storage warehouses, thereby precluding not only all depth access to nondigitized texts, but also all recognition access to any texts—both those with digital counterparts and those that are not digitized at all.
Although historians, anthropologists, biographers, linguists, and others have frequently experienced the advantages of direct access to classified bookshelves, almost no one bothers to write down the specific examples of the successes it generates. It just takes too much time and too many words (as above!) to explain why it is crucially important when all other methods of searching fail. Most academics themselves cannot articulate the difference between recognition vs. prior specification access, and yet it is the facile dismissal by library administrators of the importance of recognition mechanisms, especially at deep (page, paragraph, sentence) levels, that is especially galling to the researchers who most need it. This dismissal is usually done with the patronizing air that prior-specification keyword retrieval in full-text databases can now “replace” classified shelving, that most physical books (even the copyrighted and nondigitized) can therefore be sent to offsite storage, and that advocacy for retaining a noncomputerized access mechanism to full texts—focused browsing of physical books—must be “sentimental” rather than rational. Until recently, scholars could simply assume that no research library administrator would even think of undermining the practice of shelving books by subject. Unfortunately, that assumption is no longer a safe one—the abandonment is being actively promoted by bean counters who overlook the very real trade-offs among the different search techniques for finding the beans.
The larger principles of browsing for information have applications beyond the use of library bookstacks. Another situation, for example, is that of using primary sources in archives or manuscript collections. Primary records are those generated by a particular event, by those who participated in it, or by those who directly witnessed it; and they are usually unpublished. Thus, a researcher interested in World War II propaganda would be interested in such primary sources as copies of leaflets dropped from airplanes, typescript accounts of the flights written by those who planned or flew them, and firsthand accounts of civilians on the ground who found the leaflets. Secondary sources are the later analyses and reports written by nonparticipants, usually in published literature—although a published source can itself be primary if it is written by a participant or a witness or if it directly quotes one. Many collections of primary manuscripts exist on an incredible array of subjects and can be identified through the sources identified in Chapters 13 and 14. However, such collections are more often than not poorly indexed or not indexed at all and are not arranged by subject. In such cases, researchers must simply browse through the material to see what’s there.
Similarly, “focused browsing” might be the term applicable to direct inspection of particular sites or limited physical areas. For example, genealogists may wish to know where, exactly, a certain ancestor is buried in a cemetery, and existing maps of the area may not be detailed or complete enough to show this level of information. Similarly, private investigators looking into an automobile (or other) accident case often need to find eyewitnesses; to do so, they physically go to the site of the accident, preferably at the same time of day as the incident, and knock on doors or talk to those habitually in the area. In such cases, direct examination of the specific site is necessary. The principle is the same, however: if you don’t know exactly where the needed information exists, put yourself in a situation or a physical area where it is likely to exist and then look around so that you can recognize valuable clues or indicators when you see them.
Some of the major themes of this book are that a variety of search techniques can be used to find information, that each of them always has both advantages and disadvantages, and that no one of them can be counted on to do the entire job of in-depth research. What is required is usually a mixture of approaches so that the various trade-offs can balance and compensate for each other. My observation, however, is that in this age of proliferating Internet resources, the research techniques of general and (especially) focused browsing of printed books on library shelves, for both depth penetration and recognition access, tend to be overlooked by researchers who do not understand the limitations of the Web—in both content and search mechanism capabilities—or of computer databases in general. While it is certainly true that browsing bookstacks is very seldom the first search technique to be employed in attacking any research problem, the fact remains that the vast bulk of humanity’s memory contained in post-1922 books does not exist digitally in readable forms (beyond “snippet” levels) and probably never will without many decades of delay (for copyright reasons). Further, Internet search mechanisms, which require prior specification of keywords and then merely rank rather than conceptually categorize the results, cannot provide access to much material relevant to a topic in a way that allows for recognition, at full-text levels, of what cannot be specified in advance. Researchers who neglect the direct inspection of the full texts of books arranged in subject groupings on library bookshelves are missing a vast store of information that cannot be retrieved in any other way.