Chapter 5

Keyword Searches

The various databases discussed so far are of the controlled vocabulary type. The advantage of databases using standardized search terms (subject headings or descriptors) is that they solve the problems of synonym variations (e.g., “death penalty,” “capital punishment”), of foreign-language terms (“peine de mort,” “pena capitale”), and of relationship connections to other topics (through BT, RT, and NT cross-references, alphabetical menus of search terms, online browse displays of subdivisions, subject tracing fields within full record displays, and linkages of headings to classification numbers). The conceptual grouping function of controlled vocabularies saves researchers the considerable trouble of having to search using a wide variety of terms for material on one subject, and the associative function (referring to cross-references and browse menus) maps out formal paths not only to preferable search terms but also to multiple related topics and to unanticipated aspects “off to the side” of the same topic.

Disadvantages of Controlled Vocabulary Searches

There are, however, corresponding disadvantages to any controlled vocabulary system. First, the grouping function is sometimes achieved at the expense of blurring fine distinctions between subjects. Several years ago, for example, a reader was interested in the idea of “patients actively participating in the therapeutic process”; at the time, this was a new field of interest in the medical profession. The only subject heading available was Patient compliance, which is not the same thing as active participation; nonetheless, this term was used to include the latter idea until enough of a literature grew up that a new heading, Patient participation, was created to deal with it. Similarly, I once helped a reader who wanted books on “subfractional horsepower electric motors.” I showed her the LCSH heading Electric motors, Fractional horsepower, but she insisted that she wanted only subfractional and not fractional. When we looked at the entries under the Fractional heading, however, we saw that it included works on subfractional motors. Evidently the cataloger, seeing no separate heading in the list for “Subfractional,” simply chose the closest heading that did exist. This often happens—especially if the cataloger is not in a position to notice new works being written on the narrower topic.1 Distinctions that are important to subject specialists may not be perceived as important by library catalogers, especially at their first appearance, so if you wish to retrieve library materials within the subject heading system you must use the terms chosen by the catalogers. LCSH headings thus often include subject areas that are not precisely indicated by the terminology of the heading. (If there is any doubt about what a heading means or includes, simply call up the list of records cataloged under it; the retrieved titles and note fields will clarify its scope of coverage.)

Second, a controlled vocabulary system cannot get too specific within one subject without losing its categorization function. This is particularly true of a book catalog, which seeks to summarize the contents of works as a whole (rather than indexing individual parts, sections, chapters, or paragraphs). Thus the researcher who was looking for material on the dental identification of Hitler’s deputy Martin Bormann could find a general heading on Bormann, Martin, 1900–1945 but not a precise one on “Bormann, Martin, 1900–1945—Dental identification.” Similarly a researcher looking for “effects of wing design on reducing heat stress at supersonic speeds in military aircraft” will not find a controlled vocabulary term that is nearly as precise as he would wish.

Note, however, that many headings sometimes referred to as “orphans”—those that are applied to only one book—are in fact parts of larger categories, in spite of apparently retrieving only one item. For example, the headings Church work with cowgirls and Church work with employed women (as of this writing) both point to only one book apiece in the catalog of the Library of Congress. And yet these headings themselves are parts of a larger group, created by the appearance of both headings within an OPAC browse display of alphabetically adjacent related terms:

Church work with abused women
Church work with alcoholics
Church work with cowgirls
Church work with disaster victims
Church work with divorced people
Church work with employed women
Church work with families
Church work with immigrants
Church work with people with disabilities
Church work with tourists
Church work with women
Church work with youth

The full menu of related terms extends to hundreds of such phrases, many with further subdivisions. LCSH terms that are orphans in terms of their conceptual grouping function are seldom orphans in terms of their associative or relationship-mapping function.

A third difficulty with controlled vocabularies is that, by nature, they are relatively slow to change. The reason is that a cataloger cannot simply insert a new term into the LCSH list without integrating it with the existing terms through a web of cross-references that must extend from the new word to others, and to the new word from the others. These cross-references have to include broader (BT), related (RT), and narrower (NT) relationships defined in both directions. In other words, these links cannot be created only “out” from the new term; the other existing terms have to be modified too, to show their own links back to their new neighbor. Any new term may also have to be linked to particular classification numbers. This intellectual work (again, it’s sort of like creating, or extending, a huge crossword puzzle that has been expanding for more than a century) takes considerable time and effort, so catalogers are cautious about acting too quickly. (This is one major difference between subject cataloging and tagging: tag terms can be applied by anyone at all with no thought for either their standardization or their relationships or formal linkages to other terms.) Catalogers find it is often advisable to wait until a new subject has achieved a recognizable critical mass in determining its own vocabulary. If your topic is in a new field, then, or is of recent development, or if it was a kind of “flash-in-the-pan” academic fad, its terms may not appear in the system. For example, you will find an established heading for Behaviorism (Psychology) but not for “neobehaviorism.”
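The two-way linking described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the data structure, the function names, and the particular links shown are hypothetical, and nothing here reflects how LCSH is actually stored or which cross-references it actually contains.

```python
from collections import defaultdict

# Each relationship has an inverse: a BT link out implies an NT link back.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}

def new_thesaurus():
    """Create an empty thesaurus; every term gets BT, NT, and RT link sets."""
    return defaultdict(lambda: {"BT": set(), "NT": set(), "RT": set()})

def add_link(thesaurus, term, relation, other):
    """Record a link from term to other AND the reciprocal link back."""
    thesaurus[term][relation].add(other)
    thesaurus[other][INVERSE[relation]].add(term)

thesaurus = new_thesaurus()
# Illustrative links only; they are assumptions, not real LCSH references.
add_link(thesaurus, "Patient participation", "RT", "Patient compliance")
add_link(thesaurus, "Patient participation", "BT", "Hypothetical broader heading")

# The existing headings were modified too, not just the new one.
print(thesaurus["Patient compliance"]["RT"])
print(thesaurus["Hypothetical broader heading"]["NT"])
```

Even in this toy version, adding one new term changes the records of every term it touches, which is why the real work cannot be done casually.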

Fourth, the formal cross-references of a controlled vocabulary system may not, by themselves, be adequate to get you from the terms you know to the heading that is acceptable. In Psychological Abstracts, for instance, articles on the psychological problems of hostages were indexed for many years under the term Crime victims. A cross-reference from “Hostages” to this heading was introduced only in 1982, and Hostages did not become a descriptor in its own right until 1988. (Note, however, that in the LCSH system cross-references are only one of five mechanisms that enable you to get from whatever term you think up to the controlled term[s]; see Chapter 2. Many of those who disparage controlled vocabularies seem to be unaware that the networks relating terms to each other include many more linkages than just cross-references.)

In spite of such disadvantages, however, researchers must also keep clearly in mind the fundamental advantages of controlled vocabulary sources, specifically those of the Library of Congress Subject Headings system: uniform heading, scope-match specificity, and specific entry (see Chapter 2). The combination of these principles produces the characteristic of plenitude in a controlled catalog: an ability to show users, in a systematic manner, many more relevant options for researching their topics than they are capable of articulating beforehand. Conceptual categorization of materials allows researchers to simply recognize, within the conceptual groupings, works whose individual titles (or other keywords) they could never think up beforehand. Further, linkages, subdivisions, and adjacency mappings extend the fields of recognition possibilities in a systematic manner that minimizes guesswork while at the same time eliminating excessively granular retrievals of thousands of irrelevant records.

“Researchers Accustomed to Google Don’t Use Subject Headings”

It is sometimes maintained today that subject headings (or descriptors) are no longer necessary because most researchers are accustomed to Google keyword searching and don’t use subject headings. I fully agree with the latter part of the statement, that most researchers don’t use them, but it is incorrect to conclude that therefore they are no longer necessary. The fact is—and I see this on a daily basis—they continue to solve too many very real, and very serious, research problems for those who know of their existence. Students who are shown the right subject headings for their topics—and how to find them—are immensely grateful. They do indeed appreciate the advantages of searching by controlled terms once those advantages are pointed out to them—and once they are shown what keyword searching is not doing for them. But no one learns what controlled vocabularies can do without some prior (or point-of-use) instruction on what they are and how they work. Instruction classes that focus entirely on how to do critical thinking usually don’t notice, themselves, the crucial distinctions between controlled vocabulary vs. keyword searches. The philosophy of the late Apple chairman Steve Jobs, that consumers don’t know what they want until it’s put in front of them, is very relevant here. (The library field, unfortunately, has a habit of simply following Google rather than focusing on alternatives to it that work much better in the niche areas libraries must fill.)

Problems with Keyword Searches

While it is indeed true that most researchers don’t know how to use subject headings or descriptors, it is equally true that most researchers don’t know how to do effective keyword searches either. This may sound strange or counterintuitive, but it is nonetheless quite true, for several reasons.

First, most researchers do not realize that the keywords they type in are not categories. I’ve already provided multiple examples in Chapter 2, but let me add another here: a senior researcher who was working on “technological innovation in the low countries” showed me the printouts he had made on his own. It was evident that he had typed in the keyword term “low countries” in the assumption that doing so would also include everything relevant on Belgium, the Netherlands, and Luxembourg without any further specification. He didn’t realize that the databases he used were simply looking for the character string “l-o-w [space] c-o-u-n-t-r-i-e-s.” They were not picking up the names of the particular countries since those are formed by different character strings. Nor did he realize that he was getting only English-language sources, because the English character string misses the French and German character strings “Pays-Bas” and “Niederlande”—as well as “Belgique” and “Belgien” (among other language forms for the individual countries). In effect, he assumed that his keyword search term was an inclusive conceptual category, rather than a precise letter string—because, in his own mind, it was an inclusive conceptual category. This is a very common mistaken assumption among all researchers who do only keyword searches. They think their search terms include much more than they actually do.
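The difference between a character string and a conceptual category can be shown in a few lines. This is a minimal sketch, not a description of any real database: the four titles are invented for illustration, and the function names are hypothetical.

```python
# Invented titles for illustration; none comes from a real database.
RECORDS = [
    "Technological innovation in the Low Countries",
    "Innovation policy in Belgium since 1945",
    "Technologische Innovation in den Niederlanden",
    "Innovation technologique aux Pays-Bas",
]

def keyword_search(records, phrase):
    """Return records containing the phrase as a literal, case-insensitive string."""
    needle = phrase.lower()
    return [r for r in records if needle in r.lower()]

def any_keyword_search(records, phrases):
    """Return records containing at least one of the phrases (a Boolean OR)."""
    needles = [p.lower() for p in phrases]
    return [r for r in records if any(n in r.lower() for n in needles)]

# What the researcher typed: one English character string.
literal_hits = keyword_search(RECORDS, "low countries")

# What he would have had to type to cover the variants himself.
expanded_hits = any_keyword_search(
    RECORDS, ["low countries", "Belgium", "Niederlande", "Pays-Bas"]
)

print(len(literal_hits), "of", len(RECORDS), "records found by the literal phrase")
print(len(expanded_hits), "of", len(RECORDS), "records found by the expanded list")
```

The literal phrase finds only the one title that happens to contain those exact letters. The searcher has to supply every variant by hand, which is precisely the work a controlled heading does in advance.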

What is particularly harmful to good research here, beyond the basic confusion of categories vs. character strings, is that the “categories” people ask for (or think they are asking for) are usually much too broad to begin with. Researchers routinely ask for what they think they can get rather than for what they really want, and so they (mistakenly) phrase their keyword inquiries in very general terms. Actually, however, specific searches work better than general searches. The result of such general phrasing is that the sets of records they retrieve, right from the start, do not contain the “on-target” records they really want, and progressively limiting such bad initial sets (via additional keywords, “facets,” or any other mechanism) will do nothing to steer them to other sets that would be preferable; i.e., you can’t narrow down to the best sources if you’re looking within the wrong sets to begin with. You need cross-references to better search terms that will produce better initial sets rather than limiting mechanisms applied to the wrong sets.

A further problem is that most people who are accustomed to Google keyword searches do not know about use of quotation marks, word truncation, proximity searching, use of parentheses, or differences between Boolean AND, OR, and NOT operators, all of which can greatly modify the results of keyword inquiries (see Chapter 10).

Most researchers frame their questions within what might be called a “horizon of expectations”—i.e., a set of assumptions of what, or how much, information they are likely to find. If, then, they subsequently find any information at all that falls within this circumscribed set of expectations, they will go away satisfied—even though they may well have searched the wrong databases (or not enough of the right ones), typed in the wrong search terms, and assumed that “relevance ranked” results show all of the best material that is available. This is what I mean in saying that most students are in the situation of the Six Blind Men of India in researching the elephant. (This is also my major criticism of automated systems that steer students toward a single search box, with the snake oil come-on that “under-the-hood programming” will automatically provide them with the best results, no matter what terms they type in.)

While it is true, then, that most people don’t understand subject headings, it is equally true that these very same people also don’t understand keyword searching, and that they are also not given any instruction on the major differences between the two search techniques. I’ve emphasized the point so much because, in my decades of working with tens of thousands of researchers, I’ve found this distinction to be absolutely crucial to doing effective research, and yet it is seldom taught to anyone other than librarians. And too many librarians themselves, these days, seem to be losing sight of it, particularly in teaching classes on how to do research. I don’t mean to minimize the importance of the focus of many such classes on critical thinking, but I do mean to emphasize that it doesn’t much matter how well you can do critical thinking in evaluating websites (or other sites) if you’ve typed in the wrong search terms to begin with. Right from the start you will be radically skewing the range of resources that you perceive, as did each of the Six Blind Men. (To anticipate upcoming chapters, there are still other “blind spots” overlooked by both keyword and controlled vocabulary searches; they show up when compared to the results achieved by citation searches and related record searches, use of published bibliographies, talking to people, and so on. All of the latter, alternative search techniques will be explained in due course.)

When all is said and done, then, keyword searching necessarily entails the problem of the unpredictability of the many variant ways the same subject can be expressed, both within a single language (Huron Indians, Wyandot Indians) and across multiple languages (Venice, Venezia, Venedig). And no software algorithm will solve this problem when it is confined to dealing with only the words retrievable from the given documents (or citations or abstracts) themselves, which contain only the various authors’ own wordings.

A second problem with keyword searching, no matter how skillfully it is done, is its likelihood to retrieve the right words in the wrong conceptual contexts. This point needs no further elaboration—anyone who has done Internet searches runs into this problem. And it cannot be solved by computer algorithms; indeed, it is their operation that is causing the problem.

A third problem with keyword searching, especially in full-text databases or websites, is that of excessively granular retrievals—i.e., such searches often, if not usually, produce results numbering in the tens of thousands of hits, including pages that simply mention the desired keywords in references that are tangential or superficial. Keyword searching in full-text databases misses the scope-match level of specificity (providing whole books on a topic)—it frequently “overshoots the mark” in a way that buries many useful hits within way too much unwanted chaff.

Major Advantages of Keyword Searching

Nonetheless, just as with controlled headings, there are major advantages to keyword searching as well as drawbacks. The advantages show up precisely in the area where subject headings don’t work well. The fact is, there will always be many topics that simply fall between the cracks of any subject heading or descriptor system. No thesaurus has a term for “managing sociotechnical change,” for example; but a keyword search of the exact word “sociotechnical” combined with “manage*” or “plan*” (the asterisk being a truncation or word-stemming symbol) will turn up multiple hits, in several databases, that are directly relevant. Similarly, there’s no adequate subject heading for “Elfreth’s Alley” in Philadelphia, or for the “Edenton Tea Party” in colonial North Carolina, or for “nadaismo” (a literary and political movement in Colombia), or for “water meadows” in Great Britain, or for “memsahibs” (wives of British colonial officials in India)—and yet in each case the research problem could be solved by keyword searching for these exact terms. The nature of these subjects is such that there aren’t multiple different vocabulary terms for them.
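The “sociotechnical” search just described, an exact word combined with either of two truncated stems, can be sketched as follows. This is an illustration under assumptions rather than any vendor’s actual matching logic: the titles are invented, the function names are hypothetical, and the asterisk is read here as simple right-hand truncation (the stem followed by any further letters).

```python
import re

def truncation_to_regex(term):
    """Turn a term such as 'plan*' into a whole-word, case-insensitive regex."""
    if term.endswith("*"):
        # Stem at the start of a word, followed by any further word characters.
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)

def matches(text, required, any_of):
    """True if text contains the required term AND at least one any_of term."""
    if not truncation_to_regex(required).search(text):
        return False
    return any(truncation_to_regex(t).search(text) for t in any_of)

# Invented titles for illustration only.
TITLES = [
    "Management of sociotechnical change",
    "Planning for sociotechnical systems",
    "Sociotechnical theory: a history",
    "Urban planning and management",
]

# Equivalent to: sociotechnical AND (manage* OR plan*)
hits = [t for t in TITLES if matches(t, "sociotechnical", ["manage*", "plan*"])]
for title in hits:
    print(title)
```

Under this literal reading of truncation, “manage*” would not match “managing,” because the stem itself contains the final “e”; a shorter stem such as “manag*” would catch both forms. Systems that apply true word stemming may behave differently.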

The basic trade-off between keyword vs. controlled vocabulary searching is that of precision, on the one hand, and predictability, conceptual categorization, relationship mapping, and plenitude, on the other. The latter considerations come into play especially when the subject cannot be specified cleanly in precise keywords.

Although the databases discussed in Chapters 2 and 4 are controlled vocabulary types that employ subject headings or descriptors, all of them can also be searched by keywords appearing anywhere within their titles or abstracts (or, in some cases, full texts). And most can also be searched by combinations of both controlled terms and uncontrolled keywords in Boolean combinations (see Chapter 10).

The databases discussed below, unlike those in Chapter 4, are primarily searchable by keywords alone. This point applies especially to the many full-text databases. When searching such resources, you are essentially combing them for the words used by the authors of the individual works, who were writing their texts with little or no regard as to whether their peculiar terminologies or turns of phrase would be similar to those of any other writers on the same topics.

Major Keyword Databases

The following databases are available through a variety of different vendors. Some search title keywords within citations; others search at the level of abstracts; still others provide full-text depth. The list is by no means exhaustive, but these are some of the most important keyword-searchable resources that most scholars need to know about.

Web of Science, a subscription service from Thomson Reuters, is a combination of several separate indexes: Science Citation Index Expanded (1900– ), Social Sciences Citation Index (1900– ), Arts & Humanities Citation Index (1975– ), Book Citation Index (2005– ), Conference Proceedings Citation Index (1990– ), and Data Citation Index. They will be described more fully in Chapter 6. Libraries that subscribe can limit their access to any individual component; they don’t have to take all of them. They may further choose the years of coverage they want to pay for (say, for journal coverage, only back to 1990 or 1980). The roughly 13,500 journals selected for indexing are chosen in large part (although not exclusively) on the basis of how frequently they are cited by articles appearing in other journals. What this means is that Web of Science covers the cream of the crop of academic journals in all fields remarkably well. The indexing extends to citations and abstracts, not full texts. A particularly useful feature is that the titles of journal articles written in foreign languages are translated into English for keyword searching (although the original language of the article will always be clearly specified). In other words, you can do your searches only with English-language terms, although your results may then include citations to articles in languages other than English.

Note an important limitation: Web of Science focuses only on journals that have footnotes to begin with—it does not “see” important news or commentary magazines and journals whose articles lack this scholarly apparatus. Many of the more popular and influential newsstand-type publications are of this nonfootnote type, and so are not indexed here. Web of Science is a particularly good source, however, for cross-disciplinary searching since it covers so many subject areas simultaneously.

Scopus, from Elsevier, is a similar but even larger database, covering about 22,000 journals in all academic subject areas (see Chapter 6).

Periodicals Index Online, from ProQuest, indexes citations to articles in more than 6,000 journals in 60 languages going back to 1665, up to 2005. You can search for keywords appearing in the titles of articles (as well as in their authors’ names and journal titles), but not abstracts or full texts. Coverage is international, in English, French, German, Spanish, Italian, and most other Western languages. Unlike the Web of Science, this database does not translate foreign-language article titles; as a practical matter, this means that you have to think up keyword synonyms and phrase variations in all of the languages in which you want retrieval. Boolean combinations, proximity searching, use of parentheses, and word truncation are allowed. Searches can be limited by language, by years of publication, and by any of 37 broad subject categories:

Agriculture, Ancient Civilizations, Anthropology/Ethnology, Applied Arts, Archaeology, Architecture, Area Studies (Africa), Area Studies (Asia), Area Studies (Australasia), Area Studies (Europe), Area Studies (Middle East), Black Studies, Business/Management, Economics, Education, Fine Arts, Folklore, Geography, History (General), History (The Americas), Humanities (General), Jewish Studies, Law, Library/Information Science, Linguistics/Philology, Literature, Music, Performing Arts, Philosophy, Political Science, Psychology, Public Administration, Religion/Theology, Social Affairs, Social Sciences (General), Sociology, and Women’s Studies.

Although coverage is quite strong in social sciences and humanities, it is comparatively weak in the sciences. It is, however, another source that is particularly good for cross-disciplinary searching, through all of the 37 areas listed. One historian of slavery, for example, found here an article with specific data on the cost of transporting a slave from Baltimore to New Orleans; she hadn’t found such information elsewhere because the article appeared in an obscure economics journal rather than in a history journal. This database covers both fields simultaneously. Most indexes cover only recent decades of publication; the fact that Periodicals Index Online covers so many journals internationally over a span of more than three centuries makes it an extremely valuable resource.

Periodicals Archive Online is a companion database to Periodicals Index Online; the difference is that this one provides keyword-searchable full texts of articles back to 1802, but only from about 600 of the journals, of which about 140 are in languages other than English. (Approximately 50 new journals are added every year.)

19th Century Masterfile, from Paratext, includes a complete computerized version of a venerable printed source, Poole’s Index to Periodical Literature (1802–1906), which covers nearly 500 American and English periodicals. In addition to the Index itself, Paratext has augmented Poole’s with dates for all citations (not in the originals) and corrected all title abbreviations. The database, however, goes way beyond this, with some coverage back to the 1200s and forward to about 1930. The goal of the company is to include all relevant indexes to material published in English before 1930, with links to any full text of the source documents, wherever available. So far, there are links to more than 13 million full texts within other library subscriptions (e.g., JSTOR, American Periodicals Series, Hein Online, Accessible Archives, Google Books) or in freely available websites. (You won’t see the full-text links to subscription services if your local library doesn’t pay for them.) The database, up to now, has digitized and edited more than 70 indexes to nineteenth-century magazines, newspapers, books, U.S. patents, and government publications (both American and British). Among these are the following:

A.L.A. Index to General Literature (an index to book contents, 1893–1910)
A.L.A. Portrait Index (listing citations to 40,000 portraits of individuals before 1906)
Accessible Archives index of nineteenth-century American newspapers (1728–1922)
American Association of Law Libraries, Index to Legal Periodicals (1908–1935)
American State Papers (1789–1838)
Ames, Comprehensive Index to the Publications of U.S. Government 1881–1893
Annals of Congress (1789–1824), Register of Debates (1824–1837), and Congressional Globe (1833–1873)
ARTstor Digital Library (This is a subscription database containing more than a million digital art images from museums and photo archives. 19th Century MasterFile indexes the images and provides links to them if your library subscribes to ARTstor.)
Burlington Free Press (1848–1870)
Catalogue of the Public Documents of the 53rd to the 76th Congress (1893–1940)
Checklist of the United States Public Documents (1789–1909)
Cobbett, Parliamentary History of England, 1066–1803
Cotgreave, Contents-Subject Index to General and Periodical Literature (1850–1899)
Cumulative Subject Index to the Monthly Catalog of United States Government Publications (1895–1976)
Cumulative Title Index to United States Public Documents, 1789–1900
ERIC documents [education]
Farmer’s Bulletin index, 1889–1939
Galloupe, General Index to Engineering Periodicals (1883–1893)
Granite Monthly (1877–1930)
Greeley, Public Documents of the First Fourteen Congresses (1789–1817)
Hansard’s British Parliamentary Debates: House of Commons, First and Second Series 1803–1830
Hansard’s British Parliamentary Debates: House of Lords, First and Second Series 1803–1830 [further coverage of Hansard’s is planned]
Harper’s Magazine Index (1850–1892)
Harvard University Library Catalog (pre-1931)
Hickcox, Monthly Catalog of U.S. Government Publications (1885–1894)
Index of Patents Issued from the U.S. Patent Office (1790–1873)
Index to the Journals of the Continental Congress 1774–1789
Index to the Oregon Spectator (1846–1854)
Johnson, Descriptive Index to Engineering Literature (1884–1891) continued by Engineering Index to 1900
Jones and Chipman, Index to Legal Periodical Literature (1786–1922)
Library Journal Index (1876–1897)
Maclay, Sketches of Debate in the First Senate of the United States 1789–1791
Making of America journals (Cornell/Michigan; 36 titles indexed)
Messages and Papers of the Presidents (1789–1897)
New York Daily Tribune Index (1875–1906)
New York Times Index (1863–1905)
Palmer’s Index to the Times (London) (1880–1890)
Poore, Descriptive Catalogue of the Government Publications of the United States 1774–1881
Psychological Index (1894–1905)
Records of U. S. Congressional Serial Set (1818–1930)
Richardson, Index to Periodical Articles on Religion (1890–1899)
Royal Society, Catalogue of Scientific Papers (1800–1900) and Subject Indexes
St. Nicholas (1873–1928)
Smithsonian Institution, Annual Reports (1849–1961)
Southern Historical Society Papers (1876–1910)
Swem, Virginia Historical Index (1619–1930)
Wright, American Fiction (1851–1875)

The publisher of the database keeps looking to add other sources, in all subject areas, for the pre-1930 time period, so coverage will be increasing.

PERSI (Periodicals Source Index), from HeritageQuest, is an index to more than 6,500 U.S. and Canadian local history and genealogy periodicals going back to the early 1800s. Although most are in English, some French Canadian periodicals are covered. This is a particularly good source for really obscure topics in American history, and it serves as an excellent supplement to the major index in the field, America: History & Life, which does not notice most of these smaller-circulation journals. The researcher, for example, who was interested in the “Edenton Tea Party” in North Carolina in 1774 found nothing on it in AH&L but turned up 15 articles in PERSI. Biographical articles on individuals who participated in the Revolutionary wars often show up here, as do articles on the history of particular local buildings (e.g., taverns and inns), obscure sites (“Elfreth’s Alley” in Pennsylvania), or whole towns. My experience is that anyone who uses AH&L should also search PERSI (and vice versa).

Several databases are particularly good in providing full texts of old periodicals and newspapers:

American Periodicals (ProQuest) is a collection of full texts of more than 1,800 American periodicals published between 1740 and the early twentieth century. It includes what used to be American Periodicals from the Center for Research Libraries, which provides more than 300 full texts with full-color scans documenting the emergence of color printing.

American Antiquarian Society Historical Periodicals Collection (EBSCO) is a full-text database of more than 7,600 American periodicals published between 1691 and 1877. Series 1 contains 500 titles between 1691 and 1820; Series 2 has over 1,000 titles from 1821 to 1837; Series 3 has over 1,800 titles from 1838 to 1852; Series 4 has over 1,200 titles from 1853 to 1865; and Series 5 has over 1,500 titles from 1866 to 1877. (Libraries can subscribe to the individual components without getting the full set.)

ProQuest Historical Newspapers (ProQuest) provides text-level searching of dozens of American newspapers, digitized all the way back to their first issues. Major titles such as the New York Times, Washington Post, Wall Street Journal, and Christian Science Monitor are found here, along with other papers from Boston, Chicago, Atlanta, Los Angeles, St. Louis, Detroit, Cincinnati, and Baltimore. The Times of India (1838–2002) and the Irish Times (1851–2010) are also covered (if libraries choose), along with four historical Jewish newspapers, and a Historical Black American newspapers component is also available.

America’s Historical Newspapers (Readex) provides full texts of more than a thousand newspapers from all 50 states and the District of Columbia, from 1690 to 1922. A companion database, African American Newspapers 1827–1998, provides 270 additional titles from 35 states. Readex also offers several full-text databases of foreign papers: African Newspapers, 1800–1922; Latin American Newspapers, 1805–1922; and South Asian Newspapers, 1864–1922. Libraries can also subscribe to a cross-searching combination database, World Newspaper Archive.

Accessible Archives (from Accessible Archives) provides full texts of dozens of American newspapers and periodicals, primarily from the period of the Revolution through the Civil War; it includes several major African American titles, as well as scores of county histories, written mainly between 1870 and 1900, for more than 30 states. Full texts of dozens of military unit histories are also searchable.

Chronicling America at http://chroniclingamerica.loc.gov/ is a free website created by the Library of Congress; as of this writing it provides full texts of more than 1,000 newspapers from about two dozen states, from 1836 to 1922. Many of the titles are represented by only one or two issues, however. This is an ongoing project. Bibliographical information about hundreds of other U.S. newspapers back to 1690 is also searchable. Other free websites having full texts of some nineteenth-century American periodicals are the two Making of America sites, from Cornell (http://digital.library.cornell.edu/m/moa/; offering only about three dozen periodicals), and from the University of Michigan (http://quod.lib.umich.edu/m/moagrp/; 13 periodicals). The latter sites include digitized books as well, although not nearly on the scale of Google Books. Google has yet another free site with texts of old newspapers, Google News Archive at http://news.google.com/archivesearch; and although the site has “millions” of articles it is not possible, as of this writing, to get an overview listing of which newspapers and which dates of coverage are included.

17th–18th Century Burney Collection Newspapers (Gale Cengage) includes full texts of over 1,270 titles, primarily from London, but also from English provinces, Ireland, Scotland, and the American colonies from 1603 to the early 1800s. It also covers broadsides, pamphlets, proclamations, Acts of Parliament, and even some books. 19th Century British Newspapers provides about 50 major titles from England, Scotland, Ireland, and Wales. The two databases are available in a merged file called British Newspapers 1600–1900.

Nineteenth Century US Newspapers Digital Archive (Gale Cengage) includes about 250 full-text titles from the 1800s.

19th Century UK Periodicals (Gale Cengage), a work in progress, will eventually offer full runs of 600 titles from 1800 to 1900.

British Periodicals (ProQuest), parts I and II, together provide full texts of nearly 500 titles from the seventeenth through the early twentieth centuries.

The Times Digital Archive 1785–1985 (Gale Cengage) provides every issue of this major London paper from 1785 to 1985. The same publisher also offers separate databases for other London papers: the Illustrated London News Historical Archive 1842–2003, the Financial Times Historical Archive 1886–2006, the Times Literary Supplement Historical Archive, 1902–2005, and The Sunday Times Digital Archive 1822–2006.

The several titles above from Gale Cengage, from the Burney Collection through the Sunday Times Digital Archive, can all be cross-searched simultaneously through the company’s Gale NewsVault search interface. The latter searches all of their more than 2,000 titles and 10 million digitized pages.

British Newspaper Archive (British Library) is an ongoing project to digitize 40 million pages of its newspapers, chosen from 52,000 titles covering 350 years. Searching is free, but downloading requires payment, at www.britishnewspaperarchive.co.uk.

NewspaperArchive (NewspaperArchive) is a huge full-text database of more than 5,800 newspapers from eleven countries: the United States, United Kingdom, Canada, China, Denmark, France, Germany, Ireland, Jamaica, Japan, and South Africa. Some coverage extends back to the 1600s.

Eureka.CC (CEDROM SNI) is another large database of about 3,000 newspapers and periodicals internationally, going back to 1980. It is particularly good for Canadian and French language news coverage. It has content from every state in the United States and every province in Canada.

Early American Imprints, Series I: Evans, 1639–1800 (Readex) provides full texts of almost every nonserial publication in the American colonies within this span of years, including advertisements, ballads, broadsides, captivity narratives, cookbooks, devotional literature and sermons, diaries, emblem books, grammars, hymns, maps, memoirs, nonfiction books, novels, plays and playbills, textbooks, trade catalogs, and travel literature. Coverage is extended by Series II: Shaw-Shoemaker, 1801–1819; the two can be searched simultaneously in libraries that subscribe to them. More than 36,000 items are keyword searchable.

Early English Books Online (EEBO) (Chadwyck-Healey) provides virtually every book—over 100,000 titles—published from 1473 to 1700 in England, Scotland, Ireland, Wales, and British North America, as well as books in English published anywhere else in the world. (EEBO includes some American texts not in the Early American Imprints databases.)

Eighteenth Century Collections Online (ECCO) (Gale Cengage) offers more than 180,000 full texts of books, pamphlets, and broadsides published in the United Kingdom from 1701 to 1800 (both English and foreign language) as well as some North American imprints.

Early European Books (ProQuest) is a full-text database, with annual additions, seeking to include all works printed in Europe prior to 1701 in all languages, as well as pre-1701 works in European languages printed outside Europe. It currently includes about 18,000 titles. The texts are presented in full color, complete with images of bindings, edges, endpapers, blank pages, and loose inserts. It is essentially a supplement to the EEBO database (above) in that it concentrates on digitizing the contents of predominately non-Anglophone libraries, but it also includes material duplicated in EEBO if the English works form integral parts of those libraries.

C19: The Nineteenth Century Index (ProQuest/Chadwyck-Healey) enables you to cross-search more than a dozen different databases covering books, newspapers, periodicals, and other formats from the nineteenth century. It includes the Nineteenth Century Short Title Catalogue, which is a list of virtually every book in the English language published anywhere in that century, keyed to a microfiche set of the books. (Your own library may or may not own this separate microfiche collection; as of this writing the texts of these microfiche books have not been digitized, but C19 provides links to many of the same titles in Google Books.) The database, as of this writing, searches these files:

American Periodicals from the Center for Research Libraries
American Periodicals Series
Archive Finder
British Periodicals
Dictionary of Nineteenth Century Journalism
House of Commons Parliamentary Papers
Niles’ Register, 1811–1849
The Nineteenth Century
Nineteenth Century Short Title Catalogue
Palmer’s Index to the Times
Periodicals Index Online/Periodicals Archive Online
Poole’s Index to Periodical Literature
Proceedings of the Old Bailey
The Wellesley Index to Victorian Periodicals
U.S. Congressional Serial Set

Other sources will be added in the future.

Dissertations and Theses: Full Text (ProQuest) provides the best access to doctoral dissertations. Although coverage is worldwide, the emphasis is on U.S. and Canadian sources. More than 2.7 million titles are searchable from 1861 to the present. (More than 2.1 million can be ordered from ProQuest in paper or microfiche formats; more than 1.2 million can be downloaded as PDFs.) From July 1980 forward, abstracts can be searched; prior to that, only titles are searchable for subject content. Master’s theses are abstracted from 1988 forward. From 1997 forward, full texts are online for most dissertations and master’s theses.

Dissertations are a true gold mine of scholarship in all subject areas, but (contrary to a widespread assumption) the vast majority of them do not wind up as published books. As with Web of Science and Periodicals Index Online, give this database a try no matter what topic you are researching. It has controlled headings at very broad levels (e.g., Law; History; United States; Education, Higher; Sociology, General; Literature, Modern; Literature, American; Women’s Studies; and so on), but these are useful only in combination with uncontrolled keywords. Remember that you must play around with synonyms in this database, without the help of any cross-references (e.g., “cattle industry” in addition to “beef industry”). In helping a priest interested in “the theology of humor” I found this database to be very useful—but the keyword “theology” had to be combined not just with “humor” but also with terms such as “laughter,” “comic,” and “comedy”—each of which produced a different set of results.
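The synonym juggling described above can be made systematic by generating every pairing of terms in advance. The following is a minimal sketch only: the function names are hypothetical, and the quoted-phrase and AND/OR syntax is an assumption for illustration, not the documented query syntax of ProQuest or any other vendor. Check each database's help screens for its actual operators.

```python
from itertools import product


def build_queries(concept_groups):
    """Return one AND-query per combination, taking one term from each group.

    Each group is a list of synonyms for a single concept. Running the
    queries separately mirrors the point that each synonym may retrieve
    a different set of results.
    """
    queries = []
    for combo in product(*concept_groups):
        queries.append(" AND ".join(f'"{term}"' for term in combo))
    return queries


def build_combined_query(concept_groups):
    """Return a single query that ORs the synonyms inside each concept."""
    parts = []
    for group in concept_groups:
        parts.append("(" + " OR ".join(f'"{term}"' for term in group) + ")")
    return " AND ".join(parts)


# Hypothetical input modeled on the "theology of humor" example.
groups = [["theology"], ["humor", "laughter", "comic", "comedy"]]

for query in build_queries(groups):
    print(query)
print(build_combined_query(groups))
```

Running the separate queries shows which synonym is pulling in which records; the combined form is quicker once you trust the term list.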

The Library of Congress, alone of all libraries in the world, owns a virtually complete set of American doctoral dissertations (but not master’s theses) on microfilm and microfiche. Doctoral dissertations from countries other than the U.S. and Canada are systematically collected and made available via interlibrary loan by the Center for Research Libraries in Chicago at www.crl.edu/collections/topics/dissertations. Many Canadian theses are available full text online at www.collectionscanada.ca/thesescanada/. Dissertations from MIT (usually not in ProQuest) are available at http://dspace.mit.edu/handle/1721.1/7582, or type “MIT theses” in Google, Bing, or Yahoo!. General websites for dissertations and theses are provided by OpenThesis at www.openthesis.org/ and the Networked Digital Library of Theses and Dissertations at www.ndltd.org/.

Here’s a tip for academics whose own dissertations were written prior to 1997: if you order an unbound photocopy (the cheapest option) of your dissertation from ProQuest, at that point the company will also digitize the work and make it full-text readable (and downloadable) through its Dissertations and Theses: Full Text subscription service. Pre-1997 texts are being added to the database individually only on an “as ordered” basis. Making your dissertation widely available in full text format in this database can have the spin-off benefit of inducing more people to look at it who would otherwise not bother to order an individual copy. One qualification: these pre-’97 texts can be found only through author, title, or name-of-institution searches. Their abstracts are not present, and their full texts, even though digitized, are not keyword searchable (unlike the texts from 1997 forward).

JSTOR (pronounced Jay-Store) is an online collection of full backfiles of about 2,000 academic journals; hundreds more titles will be added in the future. In each case, the vendor goes back to Volume 1, Number 1 of each title and digitizes the entire run of the journal, up to nearly the present. In other words, the virtue of this database is its retrospective coverage: some of the full-text journals here go back hundreds of years (in one case, Philosophical Transactions of the Royal Society, to 1665). But there are no subject headings or descriptors here—only keywords are searchable. Libraries can subscribe to a number of separate component collections, including Arts & Sciences, Life Sciences, Biological Sciences, Business, Ecology & Botany, Health & General Sciences, Language & Literature, Mathematics & Statistics, and Music. Check with your local reference librarians to see what is included in your own library’s subscription, because JSTOR access may be very different in coverage from one library to another. The company has recently expanded its service to provide full texts of more than 15,000 books from several major university presses; this Books at JSTOR component will continue to grow along with the expansion of serials coverage. A very tiny portion—only 6 percent—of the JSTOR articles (pre-1923 U.S. and pre-1870 non-U.S.)2 has now been made freely searchable at www.jstor.org; but the vast bulk of the database’s content is still available only via subscription.

Project Muse is a full-text database of about 600 current high-quality academic journals, from more than a hundred publishers, generally from the mid- or late 1990s or early 2000s forward. (More titles will be added.) Over 12,300 academic books are also included. Articles here are searchable via controlled subject headings as well as by keywords, but it’s easiest to remember this database in conjunction with JSTOR as it provides the more recent years of some (not all) of the journals found there.

LexisNexis is a huge conglomeration of over 45,000 sources worldwide; as with many other online services, subscribing libraries can pick and choose which component parts they wish to pay for. Corporate or professional library subscriptions entail selections from an extensive menu of hundreds of individual component databases and are likely to be quite different from academic library offerings such as LexisNexis Academic or LexisNexis Library Express (below), which are standardized in their coverage. One major difference is that corporate subscriptions may include real estate records, vehicle registration records, personal data on over 450 million individuals (telephone and cell phone numbers, drivers licenses, professional licenses, voter registrations, death records, selected marriage and divorce records, criminal histories, employment locators), and asset records (real estate and county assessor records, foreclosures, bankruptcies, tax liens). Academic library subscriptions will not include this public record data.

Several separate databases3 within the conglomeration are available, among them the following:

LexisNexis Library Express is the version most likely to be found in public or state libraries. It provides full texts of more than 9,500 sources, including 2,500 newspapers worldwide, 1,000 magazines and journals, more than 1,000 newsletters, broadcast transcripts (NBC, ABC, CBS, BBC, CNN, NPR, Fox, etc., including foreign sources worldwide), business and trade publications, market research reports, 300+ legal periodicals, 500 law reviews, U.S. court decisions, country and state profiles, patents (from 1971 forward), SEC filings, medical and biographical sources, and a Company Dossier service with detailed data on 53 million U.S. and foreign firms.
LexisNexis Academic is geared to college and university libraries; it is very similar in content to Library Express, but has a different pricing structure for this clientele. It also includes nearly 300 college and university newspapers and the Shepard’s Citations service, which are not available in Library Express.
LexisNexis Scholastic provides very similar content to Library Express, with federal and state legislative and bill-tracking information. It is geared toward high school libraries.

Factiva (from Factiva) is another very large full-text database, covering about 36,000 publications worldwide in 28 languages. It provides global coverage of more than 4,500 newspapers. Hundreds of newswire services are also included, and 25,000 websites and blogs are indexed. Coverage extends primarily from the 1990s forward, with some titles back to the late 1970s. Factiva is particularly strong in its business coverage: tens of thousands of SEC company reports and investment analyses are available in full text, with directory information for 17 million public companies worldwide. The database also tracks business and general news websites.

HeinOnline (Hein) is a huge compilation of full-text material in the field of law; it contains dozens of component indexes and full-text libraries,4 among them the following:

Law Journal Library (more than 1,800 U.S. and international titles)
American Indian Law Collection (more than 900 titles)
Code of Federal Regulations (both current and retrospective to 1938)
English Reports, Full Reprint (1220–1867)—more than 100,000 cases
Federal Register Library (back to 1936; also United States Government Manual back to 1935, and Weekly Compilation of Presidential Documents to 1965)
Foreign & International Law Resources Database (including International Yearbooks and Serials; U.S. Law Digests; International Tribunals/Judicial Decisions; and more)
Foreign Relations of the United States (the full set of 500+ volumes providing primary source documents in U.S. diplomatic history)
History of Bankruptcy (books, legislative histories, documents, treatises)
Kluwer Law International Journal Library (20+ major European law journals)
Legal Classics (more than 3,000 works)
Session Laws Library (all 50 states, Washington, DC, and U.S. territories)
Subject Compilations of State Laws (1960 to date; thousands of sources for comparing laws on hundreds of subjects)
Taxation & Economic Reform in America, 1781– (federal level tax regulations, laws, and hundreds of legislative histories)
Treaties and Agreements Library (all U.S. treaties and agreements: in force, expired, or not yet officially published)
United Nations Law Collection (a huge compilation of all relevant U.N. publications, including [among much else] texts of all treaties registered with the U.N. since 1946, and with the League of Nations from 1920 to 1946)
United States Code (all versions from 1925–1926 to current edition)
U.S. Attorney General Opinions
U.S. Congressional Documents (including complete Congressional Record and its predecessors Annals of Congress, Register of Debates, and Congressional Globe; also Cannon’s, Hinds’, and Deschler’s Precedents)
U.S. Federal Agency Documents, Decisions, and Appeals (complete case/decision law opinions of federal regulatory agencies)
U.S. Federal Legislative History Library (hundreds of legislative histories for major U.S. legislation since 1789, with full texts of 250+)
U.S. Presidential Library (including Messages and Papers of the Presidents, Public Papers of the Presidents, CFR Title 3 [proclamations, executive orders], Economic Report of the President, Weekly Compilation of Presidential Documents, Daily Compilation of Presidential Documents)
U.S. Supreme Court Library (including official U.S. Reports [1754-], Preliminary Reports, and Slip Opinions; also dozens of books on the Court, and major Periodicals: Supreme Court Economic Review, Supreme Court Review, and United States Supreme Court Bulletin)
U.S. Statutes at Large (1789– )
World Constitutions Illustrated (an ongoing project attempting to provide the complete constitutional history of every country, with complete editions, translations, and commentaries)
World Trials Library (trial transcripts, court documents, and monographs on famous and not-so-famous trials worldwide; includes full Nuremberg Trial transcripts)

A fuller listing of sources covered in this subscription database can be found at www.heinonline.org/.

Columbia International Affairs Online (CIAO) (Columbia University Press) is another very good keyword index to full texts of public policy studies since 1991.

OpinionArchives (OpinionArchives) is a full-text database of major “commentary” magazines, digitizing the full run of each. Among them are American Spectator, Commentary, Commonweal, Dissent, Harper’s, The Nation, National Review, The New Leader, The New Republic, The New York Review of Books, The New Yorker, The Progressive, Washington Monthly, and The Weekly Standard.

The Gale Directory of Databases (31st ed., 2000), already mentioned in Chapter 4, is the best index to all of the 21,000+ subscription databases available through libraries.

Printed Sources for Keyword Access to Older Journals

Three printed keyword indexes for which there are no online equivalents are occasionally useful for searching older journals; all were published by the now-defunct Carrollton Press, Inc. They are:

Combined Retrospective Index to Journals in History 1838–1974 (11 vols.)
Combined Retrospective Index to Journals in Political Science 1886–1974 (8 vols.)
Combined Retrospective Index to Journals in Sociology 1895–1974 (6 vols.)

Three good sources for identifying printed indexes to older journals, for which there are no database equivalents, are these:

Bonnie R. Nelson’s A Guide to Published Library Catalogs (Scarecrow Press, 1982). Many research libraries over the years have specialized in collecting resources in particular subject areas; often these libraries published printed catalogs of their holdings, and sometimes these old catalogs included entries not just for books, but for individual journal articles within the subject areas of the catalog. For example, before the appearance of the H. W. Wilson Company’s Art Index (covering articles from 1929 forward), the Metropolitan Museum of Art in New York did extensive indexing of art periodicals, which can be found in its published Library Catalog of the Metropolitan Museum of Art (48 vols.; G. K. Hall, 1980, 2nd ed.). Nelson’s book identifies dozens of such printed catalogs in all subject areas.
Robert Balay’s Early Periodical Indexes: Bibliographies and Indexes of Literature Published in Periodicals before 1900 (Scarecrow, 2000). This is an annotated list of about 400 indexes to old journal articles, categorized by broad subject areas with a more detailed index by specific topics.
Norma Oland Ireland’s An Index to Indexes (F. W. Faxon Company, 1942; reprinted by Gregg Press, 1972) is a subject bibliography of over 1,000 printed indexes in over 280 subject areas. (The Paratext database 19th Century Masterfile is attempting to eventually cover all such publications online; but that is a very long-term goal.)

These guides to old indexes are sometimes useful when sources like Readers’ Guide Retrospective, Periodicals Index Online, Web of Science, JSTOR, or 19th Century Masterfile don’t turn up the older information you need.

Keyword Searching on the Internet

Although the focus of this book is on sources that are not accessible on the open Internet, it remains true that an amazing and wonderful variety of material is indeed accessible there. The crucial point, however, is that the mere presence of good material on the Web does not automatically assure efficient access to it—the latter is a function of the search techniques that the Web allows (or eliminates) for finding its content.

Virtually all Web searching is done via uncontrolled keywords, because the various search engines do not have software that enables controlled vocabularies to be exploited—i.e., they lack mechanisms for displaying cross-references or browse menus; indeed, most websites are not indexed with controlled subject headings or descriptors to begin with. This is eminently understandable because the Web includes billions of sites, and the cataloging elements added to records in OPACs (standardized subject headings and displays of their networks of linkages to each other) must be created by human beings rather than by computer algorithms. Cataloging must therefore be confined to a comparatively small niche area, the subset of information resources deemed important for library collections; it cannot provide access to the entire Internet.

When billions of sites require indexing, automated means (bypassing direct human inspection) must be employed; there is literally no alternative. Algorithms, however, must work directly—and in most cases only—with the unstandardized keywords available to them within the various websites. They can rank the display of requested keywords in marvelously ingenious and useful ways, according to various weighting criteria: for instance, a site A, to which other sites (B, C, D) link, will rank higher in importance than those sites without such links; if the linking sites (B, C, D) are themselves extensively connected to still other sites (E, F, G, etc.), then these additional attachments will provide a cumulative additional weight to the relevance ranking of A.
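The link-weighting idea just described can be illustrated with a small sketch. It is a simplified, PageRank-style iteration written for this example, not the actual algorithm of Google or any other search engine; the function name, the damping value of 0.85, and the tiny site graph are all assumptions chosen to mirror the A through G example above.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighted by the score of the linkers.

    `links` maps each page to the list of pages it links to. All scores
    start equal; on every pass a page hands a share of its current score
    to each page it links to, so links from well-linked pages count more.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: B, C, D link to A; E and F link to B; G links to C.
links = {
    "A": [],
    "B": ["A"], "C": ["A"], "D": ["A"],
    "E": ["B"], "F": ["B"], "G": ["C"],
}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph A ranks first, and B outranks D even though both link to A, because B is itself linked to by E and F. Note that nothing in the calculation looks at what the pages are about: the ranking works entirely from link structure, which is why it cannot substitute for the conceptual grouping that controlled vocabularies provide.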

Nevertheless, as anyone who has done any Web searching already knows, these relevance ranking mechanisms do not solve the problems discussed above (e.g., getting the right words in the wrong contexts, excessively granular retrievals). They especially do not solve—and in fact greatly exacerbate—the problem of getting an overall perspective on relevant literature. (The growing proliferation of echo chambers and filter bubbles—i.e., producing search results weighted and skewed by individuals’ own idiosyncratic past search histories—further diminishes the Web’s capacity to provide inclusive overview perspectives.) In general, scholarly research—as opposed to quick information seeking (see Appendix B)—seeks to gain exactly that overall view. Scholars wish to be reasonably assured that they are not overlooking especially important sources and that they are also not wasting time re-inventing the wheel in duplicating research that has already been done. Searching the open Internet (i.e., the sites freely available), as valuable and as necessary as it is, cannot solve these problems.

Full-Text Book and Journal Sites on the Open Internet

A number of free sites on the Internet now provide access to texts of many books and journal articles. The most prominent are Google Books, Google Scholar, Hathi Trust www.hathitrust.org, and the Digital Public Library of America (DPLA) http://dp.la.

Google Books is an ongoing attempt to digitize every book in the world; it currently includes tens of millions of volumes from dozens of major libraries internationally. Google Scholar is a comparable attempt to bring all open-source journal articles into a single database. Hathi Trust and DPLA are other sites for digital books and other resources from many contributing libraries.

The good intentions of all such sites are greatly hamstrung by the unavoidable reality of copyright restrictions. If copyright laws worldwide were all repealed, then such websites could indeed cover “everything” within their domains, but copyright is a reality that will simply not go away. The millions of people who create information and whose livelihoods depend on their creations do not agree that “information wants to be free.” Unlike some academic authors who may be satisfied by payments in the form of enhanced résumés that lead to increased chances for tenure and promotion, most other writers require more direct monetary compensation for their labors. It is highly likely that a system of trade-offs among what, who, and where restrictions on access to copyrighted information will continue to obtain. The alternative of, in effect, government-enforced socialism in the publishing world would entail greatly increased tax burdens on all citizens to pay for their immediate online access to “all” publications without restrictive passwords or site restrictions. It would also require outright coercion of any individuals who wish to opt out of any such government-controlled system and who wish to charge prices for their work determined by supply and demand considerations rather than by imposed price controls. Further, the prospect that every nonacademic author will voluntarily and selflessly contribute his or her own work product freely to the open Internet, for the good of all people everywhere, without government intervention—that prospect entails nothing short of a radical transformation of human nature itself. (Even academics wish to receive royalty payments for their books, if not for their journal articles.) Changes in technology do not produce changes in human nature or in the need to make a living.

The alternative of government-regulated control of information has been shown by history, notably by the failure of the communist system, to be unworkable in the long run. The market system of supply and demand, which requires restrictions entailing who can pay for access to what resources, seems more likely to prevail for most of the world’s information economy. Nor are copyright restrictions the only legal barriers to the provision of free digital access to “everything”—the realities of antitrust laws against monopolistic suppression of competition also play a role in the world of online information. (My own crystal ball is too cloudy to foresee what specific court decisions will be made in these areas, other than to predict that neither copyright nor anti-monopoly laws will simply vanish; nor, I suspect, will the U.S. Supreme Court [or any other tribunal elsewhere] give unqualified assent to the notion that the world’s information “wants to be free.”)

That said, these free sites are still capable of providing amazing results within the niche fields that remain legally open to them. Even if it continues to provide only snippet-level access to most books published after 1922, Google Books is still wonderfully useful on questions that call for very specific information and for simply distributing so many out-of-print books so widely. For example:

I once helped a researcher whose family was interested in a particular ancestor, a John DeHart. DeHart was a member of the Continental Congress, but the family knew he had resigned from that body and wanted to know why. Google Books turned up pages from two relevant books very quickly, because the words “John DeHart” and “Continental Congress” and “resigned” were, in combination, very distinctive and did not provide a mountain of irrelevant retrievals. It turns out that DeHart did not want to support the now-famous resolution of Richard Henry Lee that the states should be free from Britain. (When I provided printouts of the relevant pages, the young woman who asked the question told me “this is probably not what my grandmother wants to hear,” but that, since the information did solve the family mystery, she was still glad to get it.)
A historian interested in the office of the Judge Advocate General in the Civil War period wanted to know if there is a published list of the staffers in that office at the time. (He was aware that unpublished records of the office might exist at the National Archives, but he wanted to save himself the trip over there.) Various bibliographies of old government manuals identified the Official Register of the United States, which does list federal personnel from the period, although sometimes only at the “director” level. I went back into the stacks and brought out all of our volumes from 1859 to 1867. The interesting thing was that the 1867 volume does list the individual staff members in the office, not just the Judge Advocate himself, whereas the volumes from 1859 to 1865 gave only the Judge Advocate’s name, with no staffers. Unfortunately, the Library of Congress’s set is missing the 1866 volume, which would have been important if it did list all of the individual staffers, as it was so close to the Civil War. I could find the missing 1866 volume, however, in Google Books. It turns out that this volume lists only the Judge Advocate’s name (like the preceding volumes, unlike the succeeding 1867 volume)—but establishing that fact, through Google Books, saved the historian a great deal of time in having to track down the 1866 volume in another library.
Another researcher wanted to find a particular article with the title “Philip Lee Phillips, Cartobibliographer” written by Walt Ristow—but she did not have any information on when or where the article was published. Google Books provided enough snippet information to establish that the paper first appeared in a German journal, Kartensammlung und Kartendokumentation, in 1971, as well as in a German festschrift, Karten in Bibliotheken (also in 1971), and that it was reprinted in the American journal Surveying and Mapping in 1972. Neither Google Books nor Google Scholar (for open source journals) provided the text of the article itself, but with the adequate citations provided by the snippet references, paper copies could be quickly located.
Another historian wanted to find a copy of a National Security Action Memorandum, and the only reference to it he had was its report number, NSAM 182. This very distinctive number showed up in the snippet display of a footnote from a book in Google Books; the citation there indicated that the text of the memo had been reprinted in the “Gravel” edition of the Pentagon Papers, vol. 2, page 69. This Gravel edition of the Pentagon Papers had not itself been fully digitized, but with the citation information provided by the Google snippet, a paper copy of the volume could be quickly retrieved.

In each case, very specific—not “overview”—information was needed, and Google is a godsend in such situations. When scholars are trying to achieve overviews of extensive bodies of literature on their topics, the Internet search engines are not nearly as good as the several other mechanisms discussed in this book. These are the kinds of inquiries that they cannot handle well:

“What is available on film versions of Othello?”
“I’m doing a dissertation on Japonisme—what has already been written on this?”
“What has been done on ‘Terrorism in India’?”
“I have to write a paper on the information-seeking habits of tourists.”
“How do people behave when they visit zoos?”
“What are the justifications for the wars in Iraq and Afghanistan?”
“Why do people who have been told repeatedly about the dangers of HIV/AIDS continue to act irresponsibly?”
“What do you have on racial discrimination in France in the 1920s?”
“I have to write something on German perceptions of American Indians.”
“I want to compare the treatment of POWs and MIAs in the Vietnam and Yom Kippur wars.”
“I have to write a paper on sexuality in late Roman times.”

It isn’t difficult to find some information on many of these topics on the open Internet, or via Wikipedia, but if you are doing any kind of scholarly research on these topics you will certainly need sources in addition to those on the Net. Google, Google Scholar, Hathi Trust, and DPLA are wonderful sources for many inquiries; but they do not cover nearly “everything,” nor does their keyword search software enable you to find all of the relevant sources that they do include. The whole family of resources on the open Internet can best be described as “necessary but not sufficient” for scholarly research. It is the other sources that are also necessary—resources available only through real libraries—that are the focus of the present book.

Other Approaches to the Internet

The term “invisible Web” is slippery; I use it to refer to those portions of the open, freely available Internet that are not indexed by the spiders and crawlers of the major search engines. I do not include within it the hundreds—indeed thousands—of subscription databases available through libraries; these, as I use the term, are not on the “free” or “open” Internet to begin with. There is room for legitimate disagreement here; some would say that any password-protected commercial database that is both electronic and remotely accessible is part of the “invisible” Web. I would restrict the “invisible” designation to sources that are electronic, remotely accessible, and also freely accessible from anywhere, at any time, by anybody—but at the same time lying beyond the reach of the conventional spiders. For example, as of the present writing, the online catalog of the Library of Congress is freely available at catalog.loc.gov (see Chapter 2), but the individual book (and other) records within it are not crawled by the various search engines. You can use the Internet to get to the catalog, and once you’re there you can then search within it, but a Google or Bing or Yahoo! search will not “see” or retrieve the individual OPAC records directly. (This limitation will soon be overcome. Direct Internet access to the catalog records themselves, however, will not provide any access to cross-references or browse displays; for that reason use of the Library’s separate OPAC [catalog.loc.gov] will continue to provide superior search capabilities.)

There are whole books written on how to access the “invisible” or “the deep” Web—i.e., the free and yet hidden sites within it. Although, again, the free Internet—whether spidered/crawled or invisible—is outside the scope of this book (which is detailed enough already), there are nevertheless some avenues of Internet access beyond what is provided by the search engines that reference librarians find to be particularly noteworthy:

Internet Public Library at www.ipl.org is a link to authoritative sources, categorized by subject. It provides, through its “Resources by Subject” groupings, a conceptual categorization of highly recommended Internet sources that might otherwise be buried by the relevance-ranked keyword access of the search engines.
Refdesk.com also provides useful categorizations. Its home page, while overwhelmingly detailed, repays some study. Particularly useful for academic purposes is its list of “Refdesk subject categories” near the bottom of the screen.
Libraryspot.com is a somewhat similar index to multiple online reference sources and is also worth checking.
Directory of Open Access Journals at www.doaj.org allows article-level searching of more than 10,000 free journals on the Internet.

While looking at these sources, however, don’t overlook the reference librarians themselves in your local library.

To sum up, keyword searching is the means of subject searching used by most students, most of the time. It is a powerful search tool especially in those situations when you can clearly specify what you wish to see, in precise terms. It is best understood, however, in relation to controlled vocabulary searching: each of the two approaches has strengths and weaknesses that complement the other. The main point to remember is that keyword searching is not conceptual category searching: it will give you exactly the words you specify, and if there are other ways to express the same subject you will not be retrieving those other expressions. While very powerful in turning up precise information, keyword searching is notoriously unreliable in providing overview perspectives that give you reasonable confidence you have not overlooked something important. It is a necessary component of the scholar’s search toolkit; but it is only one of several tools.
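
For readers who find it easier to see this contrast in miniature, the following is a minimal sketch in Python. It is only an illustration under stated assumptions: the four catalog records and their titles are invented, the function names keyword_search and subject_search are hypothetical, and no real catalog or search engine works this simply. It reuses the “death penalty” and “capital punishment” example from the start of this chapter.

```python
# Hypothetical catalog records: the titles are invented for illustration.
# Each record has a title (what a keyword search sees) and subject
# headings assigned by a cataloger (the controlled vocabulary).
RECORDS = [
    {"title": "The death penalty in America", "subjects": ["Capital punishment"]},
    {"title": "Capital punishment and the courts", "subjects": ["Capital punishment"]},
    {"title": "La peine de mort en France", "subjects": ["Capital punishment"]},
    {"title": "Prison reform since 1900", "subjects": ["Prisons"]},
]


def keyword_search(records, phrase):
    """Return titles that contain the exact phrase, ignoring case."""
    needle = phrase.lower()
    return [r["title"] for r in records if needle in r["title"].lower()]


def subject_search(records, heading):
    """Return titles of all records assigned the given subject heading."""
    return [r["title"] for r in records if heading in r["subjects"]]


# The keyword search retrieves only the title that uses the exact words typed.
print(keyword_search(RECORDS, "death penalty"))

# The subject heading groups the synonym and the French-language title as well.
print(subject_search(RECORDS, "Capital punishment"))
```

In this toy catalog the keyword search finds one title and misses two relevant ones, while the heading retrieves all three. The keyword search has no means of signaling that anything was overlooked, which is why it cannot by itself give confidence in an overview.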