Chapter 10

Truncations, Combinations, and Limitations

Finding the best search terms for your topic, whether keywords or controlled vocabulary headings, is obviously very important to the success of your research. Equally important, however, are considerations of what to do with those terms once you’ve decided what they should be—that is, how you type them can make a radical difference in determining what records they retrieve. By “how you type them” I am referring to the many combinations, truncations, word proximity specifications, and field limitations that are possible. For example, the combination of the search terms

Motion pictures AND women

will produce results that are very different from typing in the same string in quotation marks: “motion pictures and women”—which in turn will produce very different results from the strings “motion pictures for women” or “motion pictures about women.” (Eliminating the quotation marks would, in most databases, leave the conjunction AND operative, but would make the prepositions “for” and “about” invisible to the search software.)

Similarly, within the LCSH system Philosophy AND Research will produce results much different from either Research—Philosophy or Philosophy—Research. (In most databases the connector could be typed in either upper or lowercase—“AND” or “and”—without making a difference.)

A researcher trying to find articles on the “information-seeking behavior of tourists” could type in that string of terms—but (in databases providing abstracts of articles) she would get better results with

touris* AND “information se*”

This phrasing would include any of the variant forms tourist [singular], tourists [plural] and even tourism, as well as information seeking, information seeker [singular], information seekers [plural], information search, information searching, information searcher [singular], or information searchers [plural]. The same inquirer would get better results by omitting the word “behavior” entirely—it isn’t necessary for the concept, and the specification that that particular keyword must be present would unwittingly eliminate any relevant records that use “activities” or “practices” or other such terms rather than “behavior.”

Similarly, to return to an example mentioned in Chapter 2, a search for portrayals of the Arab–Israeli conflicts in fiction, the phrasing

Israelis AND Arabs OR Palestinians AND literature OR fiction

will produce a huge jumble of results that bear no resemblance to those produced by a more carefully crafted string using parenthetical groupings, such as

Israel* AND (Arab* OR Palestin*) AND (literature OR fiction)

Moreover, while the above string would work in most commercial databases, the same terms would have to be typed in differently in many library OPACs that use the question mark rather than the asterisk as the truncation symbol:

Israel? AND (Arab? OR Palestin?) AND (literature OR fiction)

Again, it’s not just a matter of simply finding the right words to begin with—how you type them in changes what they will find.
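The difference between the ungrouped and the parenthesized strings can be sketched with ordinary set operations. This is a minimal illustration in Python, not a model of any particular database: the three records are invented, and the reading of the ungrouped string assumes that AND binds more tightly than OR, a convention that varies from system to system.

```python
# Invented toy records: each label maps to the words found in that record.
records = {
    "conflict_novel": {"israelis", "arabs", "fiction"},
    "cookbook": {"palestinians", "cooking"},
    "detective_story": {"fiction", "detectives"},
}


def has(word):
    """Return the labels of all records containing the given word."""
    return {label for label, words in records.items() if word in words}


# Ungrouped string, read with AND binding more tightly than OR (an assumption):
# (Israelis AND Arabs) OR (Palestinians AND literature) OR fiction
ungrouped = (
    (has("israelis") & has("arabs"))
    | (has("palestinians") & has("literature"))
    | has("fiction")
)

# Grouped string:
# Israelis AND (Arabs OR Palestinians) AND (literature OR fiction)
grouped = (
    has("israelis")
    & (has("arabs") | has("palestinians"))
    & (has("literature") | has("fiction"))
)

print(sorted(ungrouped))  # the unrelated detective story slips in
print(sorted(grouped))    # only the record on the conflict in fiction
```

In this toy set the ungrouped reading lets in anything that merely contains the word fiction, which is the "huge jumble" effect on a small scale.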

The major considerations governing “how you type them in” are those of word truncation, wildcards, Boolean combinations of terms, word proximity specifications, and limitations of searches by various criteria (language, date, geographic area code, document type, etc.).

Word Truncation and Wildcard Symbols

Most subscription databases use the asterisk (*) for word truncation, also known as word stemming. This feature saves you the trouble of having to key in, individually, multiple words having the same stem. Thus, typing child* will find not just the singular form of the word, but also child’s, children, or children’s—any words having the same stem. Similarly, Athen* will retrieve not just Athens but also Athenian or Athenians, and comput* will bring up computer, computers, computerization, or computing.

You have to be careful, however: child* will also retrieve childbearing, childbed, childbirth, childhood, childish, or childlike if those terms are present in the database you are searching, and their presence can greatly increase the number of “noise” records that only get in the way of what you want. Similarly, Athen* will also bring up Athena, Athenia, Athenaeum, Athenagoras, Athènes, Atheniensium, and athenischen. And comput* can retrieve computable, computative, computation, or computational.
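The trade-off between coverage and noise can be seen in a small sketch. The Python below treats truncation as simple prefix matching against an invented list of index terms; real databases may stem differently, so this is only an approximation of the idea.

```python
def truncate(stem, index_terms):
    """Return every indexed term that begins with the stem (case-insensitive)."""
    stem = stem.lower()
    return [term for term in index_terms if term.lower().startswith(stem)]


# A small invented index; a real database would hold many more terms.
index_terms = ["child", "child's", "children", "childbirth", "childhood",
               "childish", "chile"]

# The forms the searcher actually had in mind.
wanted = {"child", "child's", "children"}

hits = truncate("child", index_terms)
noise = [term for term in hits if term not in wanted]

print(hits)   # everything sharing the stem, wanted or not
print(noise)  # the extra terms the stem drags in
```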

As in the example of Arab–Israeli fiction, some OPACs such as the online catalog of the Library of Congress (which uses the Voyager system from Ex Libris—also used by about 1,300 other libraries) use the question mark (?) rather than the asterisk for word truncation. Still other databases use the exclamation point (!).

The term “wildcard” usually refers to retrieval of variant characters within individual words, rather than to the word stemming that takes place at the end of the terms. For example, in the Historical Abstracts database, there are two wildcard symbols: the question mark (?) can be used inside a word to replace any one character; thus ne?t will retrieve neat, next, or nest. The pound sign (#), however, when used internally, stands in for a character that may or may not be present; thus colo#r will retrieve either color or the British spelling colour.
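One way to picture the two wildcard symbols is to translate them into regular expressions. The sketch below is an assumption about their behavior drawn only from the ne?t and colo#r examples: the question mark is treated as exactly one character, and the pound sign as zero or one character. The function names are hypothetical, and a given database may define the symbols differently.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? (exactly one character) and # (zero or one character)."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")       # assumed: exactly one word character
        elif ch == "#":
            parts.append(r"\w?")      # assumed: an optional extra character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, words):
    """Return the words that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [word for word in words if regex.fullmatch(word)]


print(matches("ne?t", ["neat", "next", "nest", "net", "newest"]))
print(matches("colo#r", ["color", "colour", "colander"]))
```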

All of this, of course, can be very confusing and difficult to remember since different databases have different conventions. The one very important point to remember is that most of them will have a Help icon that you can click on, which will spell out the information you need regarding which symbol(s) to use within whichever database you’re searching. Look for that icon and skim whatever information it brings up. You really need to know how to type in the words you want.

Boolean Combinations

“Boolean” combinations derive their name from the nineteenth-century British mathematician and logician George Boole; the term refers to the capacities of most databases to combine multiple search elements within one inquiry. For example, a researcher interested in the topic “computer-assisted instructional techniques in the field of geography” was initially referred to the ERIC database, a large index to journal articles and research reports in the field of education. This index has a list of controlled descriptor terms. The Thesaurus of ERIC Descriptors listed several different relevant terms for each of the two elements he wished to combine:

Programmed Instruction            Geographic Concepts
Learning Laboratories             Geography
Programmed Instruction Materials  Geography Instruction
Computer Assisted Instruction     Human Geography
                                  Physical Geography
                                  World Geography

The ERIC database can search all of the first-column terms at once, and all of the second column, then cross the two sets against each other to present only those citations that retrieve at least one descriptor from each column simultaneously. Had he so desired, the searcher could have introduced a third set of terms, specifying the output to only those citations having any of the additional descriptors Secondary Schools, Secondary Education, Secondary School Curriculum, and so on. A further specification could have limited the results to only articles or reports published within the last five years.

The computer accomplishes this operation of combining and screening terms via Boolean combinations, which are illustrated in Figure 10.1. If Circle A represents the set of citations retrieved by expressing one subject (either controlled descriptors or keywords or both), and Circle B represents another subject, then the area of overlap in Figure 10.1a represents those citations that deal with both subjects simultaneously. Other circles or limiting factors can be introduced for further specification. And other types of combinations are possible, as shown in Figures 10.1b and 10.1c.

In the above example, the way that the search terms are entered can also be simplified. If the terms are first specified as having come from descriptor (rather than keyword) fields, an expression such as the following would work:

(Programmed OR Learning Laboratories OR Computer Assisted Instruction) AND Geograph*

The word Programmed by itself, being common to two of the descriptors, need not be typed in twice. The asterisk (*) after Geograph* is the truncation symbol that tells the computer to retrieve any terms having the same stem, equivalent here to (Geography OR Geographic).

The use of parentheses, as in the above example, enables you to use multiple Boolean combinations of different kinds within the same search specification. Another of the examples given above could be further extended as in Chapter 2, thus:

Figure 10.1 Venn diagrams (a, b, c) showing Boolean AND, OR, and NOT intersecting circles, respectively.

Israel? AND (Arab? OR Palestin?) AND (literature OR fiction) NOT (juvenile OR children’s)

This OPAC search would turn up books on the portrayal of the various Arab–Israeli conflicts in fiction, but would eliminate the many propagandistic story books written for children.

Similarly, the reader who vaguely remembered a book about someone who bicycled through France, but couldn’t recall the exact title or author, found what he needed in Book Review Digest Plus by doing a combined search of “bicycl*” (to include “bicycler” or “bicycling” as well) and “France” as subject words, but NOT-ing out the phrase “tour de France” (in quotation marks), which would provide multiple unwanted hits. (In this database, the terms are entered in separate search boxes, with the AND, OR, and NOT operators given in drop-down menus for each box.)

Be particularly wary of how you use the NOT operator. For example, suppose you want articles on “dog food AND cat food”; in this case, if Circle A represents “dog food” and Circle B represents “cat food,” then the AND combination of the two will give you the shaded area of Figure 10.1a. Now suppose you want either “dog food OR cat food.” The OR operator between the terms will give you the shaded area in Figure 10.1b. But now suppose you specify “dog food NOT cat food.” The NOT operator here will give you only the shaded area of Circle A—not the entire Circle A—represented in Figure 10.1c. The area where the circles overlap contains citations that talk about both dog food AND cat food (as in Figure 10.1a)—but by saying you wish to eliminate the articles that include cat food (all of Circle B), you have unwittingly eliminated some entries in Circle A that also talk about dog food. Be careful about using NOT as a connecting term, in other words—you may be eliminating more than you wish.
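The three circles can be mimicked with Python sets. The article labels are invented for illustration; the point is only to show which records each operator keeps.

```python
# Invented article labels: article 2 discusses both subjects.
dog_food = {"article 1", "article 2"}   # Circle A
cat_food = {"article 2", "article 3"}   # Circle B

both = dog_food & cat_food          # dog food AND cat food
either = dog_food | cat_food        # dog food OR cat food
dog_not_cat = dog_food - cat_food   # dog food NOT cat food

# What NOT removed from Circle A, even though it does discuss dog food.
lost = dog_food - dog_not_cat

print(sorted(both))
print(sorted(either))
print(sorted(dog_not_cat))
print(sorted(lost))
```

Here the NOT search silently drops article 2, which is exactly the kind of loss to watch for.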

Note that some databases require the capitalization of connecting terms (AND, OR, NOT, or sometimes OR NOT), whereas others allow either capital or lowercase entry. Again, look for the Help icon within whichever database you are using to determine its conventions. Don’t just start typing keywords into the first blank search box you see—take a (very brief) moment to familiarize yourself with how to type in the terms you want. This is important.

Combinations Using Component Words within Controlled Subject Strings

The ability to search and combine component words that are shared by multiple controlled vocabulary strings is very useful in many situations. For example, in the Library of Congress Subject Headings list (see Chapter 2), there are more than 17 pages of headings that start with the terms African American(s); these include such terms as the following:

African American actors
African American architecture
African American baseball umpires
African American cooking
African American diplomats
African American families
African American History Month
African American leadership
African American painting
African American parents
African American preaching
African American quilts
African American radio stations
African American students
African American veterans
African American wit and humor
African American women
African American women composers
African American women surgeons
African Americans—Biography
African Americans—Civil rights
African Americans—Folklore
African Americans—History
African Americans—Legal status, laws, etc.
African Americans—Relations with Korean Americans
African Americans—Religion
African Americans and mass media
African Americans in art
African Americans in the motion picture industry
African Americans with disabilities

Similarly, there are 10 pages of headings that start with the word Television; among them are the following:

Television
  NT Animals on television
     Businessmen on television
     Detective teams on television
     Medical personnel on television
     Sex on television
     Violence on television
Television—Art direction
Television—Censorship
Television—Stage-setting and scenery
Television—Vocational guidance
Television actors and actresses
Television addiction
Television and literature
Television broadcasting of news
Television crime shows
Television in adult education
Television in politics
Television news anchors
Television personalities
Television plays, Hindi
Television programs
  NT Action and adventure television programs
     Live television programs
     Lost television programs
     Medical television programs
     Television comedies
     Television reruns
     Women’s television programs
Television talk shows
Television weathercasting

While there are thus hundreds of separate headings with either African American(s) or Television in them, there are also precoordinated headings that combine the two concepts, such as:

African American television journalists
African American television viewers
African American women on television
African Americans on television

The point here is that the component word search capability of online catalogs can easily cross all of the many African American(s) headings against all of the Television headings without your having to type them all in individually:

You will need to be connected to the Internet to follow this: from the catalog.loc.gov search page select “Advanced Search” and then change the drop-down menus for each of the first two search boxes from “Keyword Anywhere (GKEY)” to “Subject ALL (KSUB).” Type “African American?” in the first box and “television?” in the second box. This method of searching takes you directly to the catalog records having the subject terms you’ve entered—unlike the Browse search screen, it entirely bypasses the intervening display of browse menus that would alert you to other headings or cross-references (see Figure 10.2).

This combination of LCSH elements produces more than 400 records. Particularly interesting is that among the retrieved hits are Geraldine Woods’s The Oprah Winfrey Story (1991) and Paul Mooney’s Black is the New White: A Memoir (2010). The subject headings (or tracings) attached to the former book show that it has the separate headings:

Figure 10.2 LC catalog Advanced Search screenshot with two drop-down menus changed to “Subject ALL: KSUB” for the two boxes: African American? AND television.

Television personalities
African Americans—Biography

The tracings on the latter also show separate headings:

Television comedy writers—United States—Biography
African American comedians—Biography

What is significant here is that a researcher found works having the desired words (among the 400+ hits) not only within precoordinated strings such as African American television personalities but also, as in these instances, across entirely different strings attached to the same record. In a sense, then, the capacity to do Boolean combinations of individual words within different subject heading strings is a kind of sixth way to find the right LCSH headings for your topic, in addition to the five discussed in Chapter 2. Note, however, that this is not a conventional keyword search of words transcribed from titles of the books themselves. Rather, it is a keyword search of terms within the artificially created phrases—LC subject headings—that were added to the catalog records by librarians. If these controlled vocabulary terms had not been added, these records would not have been found at all because their own titles do not contain keywords specifically designating either the African American or Television subject content.
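The logic of matching component words across different heading strings on the same record can be sketched as follows. This is an illustration only, not the catalog's actual software: the two books and their headings are those cited above (with the subdivision dash written as a double hyphen), the third record is invented, and simple substring matching stands in for the truncated search.

```python
# Subject headings attached to each record; the third record is invented.
records = {
    "The Oprah Winfrey Story": [
        "Television personalities",
        "African Americans--Biography",
    ],
    "Black is the New White": [
        "Television comedy writers--United States--Biography",
        "African American comedians--Biography",
    ],
    "Hypothetical gardening book": [
        "Gardening--United States",
    ],
}


def any_heading_contains(headings, phrase):
    """True if the phrase appears within any one of the record's headings."""
    phrase = phrase.lower()
    return any(phrase in heading.lower() for heading in headings)


def subject_all(first, second):
    """Titles whose headings, taken together, contain both phrases."""
    return [
        title
        for title, headings in records.items()
        if any_heading_contains(headings, first)
        and any_heading_contains(headings, second)
    ]


hits = subject_all("african american", "television")
print(hits)
```

Neither title contains the search words; both records are found only because the words occur somewhere in their assigned headings.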

Searching for individual component words that appear within many different subject headings is very useful whenever there are large clusters of related headings. (Art, Business, Civil War, History, Indians, United States, and Women are other component words that each appear within a large multitude of different LCSH phrases.)

Proximity Searches

Beyond the word-combining capabilities of the standard Boolean operators AND, OR, and NOT, many databases allow more nuanced retrieval through word proximity searching. In these instances you can tell the computer not just to find two terms anywhere at all, but within specified distances of nearness, or in a specified order.

For example, in the Periodicals Index Online database (indexing more than 6,000 journals in 60 languages back to 1665) you can use either NEAR or FBY (Followed By) operators. Though you can always type in the term “life insurance” in quotation marks, which would assure retrieval of those two words immediately next to each other, in that order, you could also type:

life NEAR.5 insurance [This would give you both words, in either order, with up to five intervening words]

life FBY.3 insurance [This would give you both words, in the order specified, with up to three intervening words, as in the phrases “life casualty insurance,” “life and health insurance,” or “life and employers’ liability insurance”]

Other databases have similar functions but use different operators, such as

Life N5 insurance [Either order, up to 5 words apart]
Life W3 insurance [Same order, up to 3 words apart]
Life PRE/3 insurance [Same order, up to 3 words apart]

Again: take the very brief moment required to click on the Help icon in whatever database you’re searching to see which conventions it is using. The use of these proximity operators can greatly cut down the number of junk hits that have the right words in contexts you don’t want.
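What the two kinds of proximity operator test for can be sketched in a few lines of Python. The functions near and fby are hypothetical stand-ins, and the counting rule (number of intervening words) is an assumption based on the bracketed explanations above; individual databases may count differently.

```python
def positions(word, tokens):
    """Return every index at which the word occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == word]


def near(first, second, max_between, text):
    """Both words present, in either order, with at most max_between words between."""
    tokens = text.lower().split()
    return any(
        0 < abs(i - j) <= max_between + 1
        for i in positions(first, tokens)
        for j in positions(second, tokens)
    )


def fby(first, second, max_between, text):
    """First word followed by second, with at most max_between words between."""
    tokens = text.lower().split()
    return any(
        0 < j - i <= max_between + 1
        for i in positions(first, tokens)
        for j in positions(second, tokens)
    )


print(fby("life", "insurance", 3, "life and employers' liability insurance"))
print(fby("life", "insurance", 3, "insurance for life"))
print(near("life", "insurance", 5, "insurance for life"))
```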

Proximity searching is also very useful when you are trying to pin down a phrase or quotation when you have only incomplete information about it. For instance, the researcher trying to identify which Supreme Court justice said that he couldn’t define pornography, but that “I know it when I see it” used a proximity search within Academic Search Complete:

Justice N15 pornography N15 “when I see it”

This quickly turned up references to the remark made by Justice Potter Stewart.

Limitations of Sets

Many databases allow limitations of search results by language, by date of publication, or by other features. The Help screens can provide this information; in many cases, the limit options will show up in columns along the margins of search screens or be presented below the search boxes in Advanced Search screens. Although the default setting for search screens, in most databases, will be for a “simple” single box, always look for and change your setting to the Advanced search mode. This will bring to your attention many possibilities for limiting your searches in ways that will greatly reduce the clutter of unwanted hits having the right terms in the wrong contexts. Many of these limit features are extremely important and should be actively sought out.

Limiting by Time Periods

While many databases allow you to limit your results to dates of publication of articles, it is rare to find files that enable you to limit to the dates of subject coverage you want. Thus, if you are looking for articles on the situation of the Copts in Egypt between 1900 and 1930, it would not be adequate to find only articles published between those dates, because there could be (and are) articles written about the Copts in that period that were themselves published much later. In this connection, two of the best databases for coverage of history journals, America: History & Life (AH&L) and Historical Abstracts (HA), both from EBSCO, have exactly the desired capability. Not only can you fill in search boxes asking for month/year ranges of “Published Date,” you can also search by “Historical Period”—the latter enabling you to specify which years of subject coverage of the articles you wish to find. These databases were described in Chapter 4; to repeat that information briefly here, AH&L is the largest single database covering U.S. and Canadian history (from prehistoric times to the present); HA is the largest covering all other areas (from ca. 1450 forward, with some minimal coverage of earlier eras). Both files interpret “history” in a very broad sense, covering the history of art, education, literature, philosophy, religion, and so on—not just politics and rulers and international relations.

This limiting-by-dates-of-subject-coverage feature is extremely useful in historical inquiries, but my experience is that very few users of these databases—even among professional historians—are aware of it or notice this option on the search screens. You have to change the screen display from the default Basic Search to the Advanced Search for this search option to appear; further, you have to scroll down to the bottom half of the search page to see it.

An example of the utility of this feature is provided by the researcher who wanted information on “education in the Philippines between 1898 and 1916.” Within Historical Abstracts, use of the Historical Period limit boxes (coupled with education and Philippine* as subject-descriptor terms) produced 36 hits right on the button—as opposed to 76 that appear when the Historical Period limitation is not used. Similar precision is obtainable with questions on “racial discrimination in France in the 1920s” or “student movements in Uruguay in the 1960s and ’70s” or “Russian foreign policy in the 1700s” or (in America: History & Life) “Jewish identity in Atlanta in the 1930s.”
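The distinction between limiting by publication date and limiting by historical period can be made concrete with a small sketch. The three articles and all of their dates are invented, and the overlap test is an assumption about how such a limit might reasonably work, not a description of the EBSCO software.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    published: int     # year the article appeared
    covers_from: int   # first year of the period it discusses
    covers_to: int     # last year of the period it discusses


# Invented records for illustration only.
articles = [
    Article("Hypothetical study A", 1925, 1880, 1899),
    Article("Hypothetical study B", 1998, 1900, 1930),
    Article("Hypothetical study C", 2005, 1919, 1922),
]


def published_between(items, start, end):
    """Limit by date of publication."""
    return [a.title for a in items if start <= a.published <= end]


def covers_period(items, start, end):
    """Limit by subject coverage: keep articles whose period overlaps the range."""
    return [a.title for a in items if a.covers_from <= end and a.covers_to >= start]


print(published_between(articles, 1900, 1930))
print(covers_period(articles, 1900, 1930))
```

The publication-date limit returns only the one article that happens to have been printed in those years, although it is about an earlier period; the coverage limit returns the two later articles that actually discuss 1900 to 1930.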

The Brepolis Medieval Bibliographies database has a somewhat similar feature, although not as sophisticated—its Thematic Search option (on the Advanced search screen) allows you to limit your subject searches to whatever centuries you are interested in.

In the ProQuest Statistical Insight database, if you first do a subject search you will then see a clickable limit option “Date Covered” that allows you to select whatever range of years of subject coverage you wish the statistics to reflect. (This is different from the distinct “Date Published” option.)

Limiting by Geographic Area Codes

A particularly useful, but generally neglected, limit feature within online book catalogs is the capacity to specify geographic area codes. This capability is present with OPACs using the Voyager search software but, tragically, has been eliminated entirely by several other catalog systems (including WorldCat)—that is, even though the data are present on the catalog records themselves, some current OPAC systems have been dumbed down to the point that they cannot “see” or make use of them. (The unfortunate assumption in the library world is that OPACs should be “more like Google”—and the search software of Google Books cannot make use of area codes.)

The utility of limiting by area codes in online book catalogs is best illustrated by examples. One researcher, for instance, wanted to retrieve a set of any books on the folklore of Indians of North America. This inquiry is complicated by several factors; one is that the notion of “folk” cultural practices is divided among many different LCSH terms, among them:

Folk art
Folk artists
Folk dance
Folk dancing
Folk drama
Folk festivals
Folk literature
Folk music
Folk poetry
Folk singers
Folk songs
Folklore

Another complication is that there are many different terms within LCSH for North American Indian groups. While there is a general heading for Indians of North America, there are also numerous narrower terms linked to it, such as:

Algonquian Indians
Athapascan Indians
Caddoan Indians
Off-reservation Indians
Ojibwa Indians
Piegan Indians
Reservation Indians
Sewee Indians
Shoshoni Indians
Tinne Indians

Yet another complication is that several of these narrower terms themselves lead to further narrower headings; thus, under Algonquian Indians one finds a list of over 50 additional groups, such as:

Abenaki Indians
Cheyenne Indians
Fox Indians
Narragansett Indians
Ojibwa Indians
Potawatomi Indians
Wampanoag Indians

The important point here, as Chapter 2 explained, is that the many narrower terms are not included by the more general terms such as Indians of North America or even Algonquian Indians, and so must be searched separately. Obviously it would be very difficult to round up all of the many specific cross-referenced terms to begin with, and combining all of them with a lengthy series of Boolean OR operators would inevitably overload the search system.

This is a situation in which component word searching can combine with geographic area codes to solve a difficult problem very efficiently. Again, I’ll use the OPAC of the Library of Congress (catalog.loc.gov) as the exemplar here, both because it is freely available on the Internet and because many universities’ local catalogs have jettisoned the necessary capability in their own search software. Let me first note a useful trick mentioned above, but not elaborated. Searchers in this catalog need first to select the “Keyword Search” option on the initial page, and then select from its drop-down menu the option “EXPERT (use index codes and operators).”

Within this EXPERT search box you can then specify in which field, on the catalog records, you wish your desired terms to be found.

The major field delimiters are these:

KSUB (or lowercase ksub) for the subject headings field

KTIL (or ktil) for the title field

KPNC (or kpnc) for the author names field.

Other codes are listed in the drop-down menus of the Advanced Search mode.

In the present example, one could type this string into the EXPERT search box:

KSUB folk? AND KSUB Indians AND K043 n

This search tells the OPAC to look for any appearance of the words folk or folklore as parts of any subject headings, combined with the appearance of the term Indians within any subject headings, combined further with the geographic area code (whose field is specified by K043) for any records whose subject content concerns North America (designated by the n). (Note that a different field called KPUB—irrelevant here—exists for designating the geographic place of publication of a work; what we want in most cases is the area code indicating the geographic subject of the work.)

The use of the geographic area code for North America neatly solves the problem of including all of the North American tribes while, at the same time, excluding all of the records having to do with Indians in South American locales or those from India itself (designated East Indian[s] in LCSH). You could not achieve this comprehensiveness, and this precision, with a keyword search; a good library OPAC search mechanism, however, enables you to see “the whole elephant” clearly, in ways that Internet mechanisms cannot match.

A similar example is provided by the reader who wanted books on “child trafficking in Asia.” Child trafficking is an LCSH heading, so the search could be phrased as:

KSUB “child trafficking” and K043 a

The code “a” takes in all of Asia, and so this search turns up works not only on Asia in general but also those specifically on Afghanistan, Bangladesh, Burma, Cambodia, China, India, Indonesia, Nepal, Pakistan, Philippines, Sri Lanka, and Thailand.

A very important distinction to note is that, in the area code system (unlike the LCSH subject heading system), the broader codes do include the narrower geographic codes within them. Thus the code n will retrieve not just works about North America as a whole but also its further subdivisions such as n-cn (Canada), n-us (the United States as a whole), n-us-ut (Utah), n-us-wy (Wyoming), and n-mx (Mexico). And note that these narrower codes are automatically included simply by specifying “n”—that is, you do not have to make use of truncation or word-stemming symbols (e.g., n?) to retrieve all of the narrower areas together.
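The way a broader area code takes in its narrower codes can be sketched by comparing the hyphen-separated parts of each code. This is a minimal illustration under the assumption that the hierarchy follows those parts; the function name is hypothetical, and the catalog's own indexing may work differently.

```python
def within_area(record_code, query_code):
    """True if the record's code equals the query code or falls beneath it."""
    record_parts = record_code.split("-")
    query_parts = query_code.split("-")
    return record_parts[:len(query_parts)] == query_parts


# Codes as they might appear on a handful of catalog records.
record_codes = ["n", "n-cn", "n-us", "n-us-ut", "n-us-wy", "n-mx", "e-fr", "a-ja"]

north_america = [code for code in record_codes if within_area(code, "n")]
united_states = [code for code in record_codes if within_area(code, "n-us")]

print(north_america)
print(united_states)
```

In this sketch a regional code such as n-usp would fall under n but not under n-us; whether a particular catalog treats regional codes that way is something to check in its Help pages.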

The major geographic area codes are these:

n = North America
s = South America
cl = Central (or Latin) America
e = Europe
a = Asia
f = Africa
u-at = Australia
po = Pacific Oceania
b = Commonwealth countries
d = Developing countries
xd = Western hemisphere

The full list of codes is available online at www.loc.gov/marc/geoareas/gacs_code.html; it is also findable by typing “MARC Code list for Geographic Areas” in Google, Bing, or Yahoo!. Within the large continental categories, individual countries or regions can be further specified, for example:

a-af = Afghanistan
a-cc = China
a-cc-hk = Hong Kong
a-iq = Iraq
a-ja = Japan
e-fr = France
e-gx = Germany
e-ge = the former East Germany
e-gw = the former West Germany
e-it = Italy
e-ru = Russia (Federation)
e-ur = the former Soviet Union
n-cn = Canada
n-cn-ab = Alberta
n-us-al = Alabama
n-us-il = Illinois
n-usc = Middle West
n-usn = New England
n-usp = western States
n-usu = southern States

You can also bring up this information in a way similar to finding the LCSH subject tracings for a particular book: find any relevant record through a keyword, author, or title search, and then display the record in its Full Record format. For example, in Figure 10.3, W. R. Smyser’s book Germany and America: New Identities, Fateful Rift? (Westview, 1993) shows two codes, n-us and e-gx. Remember, then, that displays of records in the full format will bring to your attention not just subject headings but also geographic area codes. (The geographic area code is also shown in Figure 2.4.) The Full Record format is the default display in catalog.loc.gov, but it may not be in other libraries’ OPACs.

Figure 10.3 Full display of catalog record for Smyser’s Germany and America with arrow pointing to geographic area codes.

Unfortunately, since many individual libraries have chosen OPAC systems that are incapable of using these codes, their catalogers no longer add them to the records they create locally, and these deficient records are then picked up for use by all the other libraries in the system. The OPAC of the Library of Congress, at least (as of this writing), is still the best source for doing searches with geographic area codes. (Even here, however, the system is not perfect because LC itself accepts copy catalog records from other libraries to speed up its own operations.)

Limiting by Document Types

Many commercial databases allow for limitations by other considerations, such as “by language” or “by year(s) of publication.” A particularly useful—but, unfortunately, generally neglected—option is limitation by document type. The best researchers can often achieve amazingly on-target results through exploitation of this search feature. Within the ProQuest ERIC database for resources in the field of education, for example, you can limit your retrieval to any of more than fifty very specific types of material, of which the following is only a sample:

Collected Works: Proceedings
Creative Works
Dissertations/theses
Guides: Classroom
Book/Product Reviews
Journal Articles
Non-Print Media
Reference Materials: Bibliographies
Reports: Research
Tests/Questionnaires
Multilingual/Bilingual Materials

Similarly, within EBSCO’s PsycINFO database (see Chapter 4) you can limit by several document types, among them:

Bibliography
Chapter
Column/Opinion
Comment/Reply
Dissertation
Editorial
Encyclopedia Entry
Journal Article
Obituary
Review-Book

In Chapter 8 there is a similar list of the document types that can be “limited” to in the Web of Science database (e.g., Article, Bibliography, Book Review, Editorial Material, Letter, Software Review, and Review).

With options such as these, teachers searching ERIC can zero in on curriculum guides (“Guides: Classroom”) or tests; psychology grad students using PsycINFO can look immediately to see if doctoral dissertations have already been done on their topics; and general researchers in Web of Science can quickly locate literature review articles in any academic field. (In the above example of the researcher who wanted information on “education in the Philippines between 1898 and 1916,” a limitation to literature review articles in the Web database produced just such an article, with 94 footnotes, on “textbook wars” in public education in the Philippines from 1904 to 1907.)

Without the document type limiting features, the best material could easily be lost within large jumbles of mostly irrelevant hits—especially since the format of any document will usually not be revealed by its title or abstract keywords. Moreover, as I’ve mentioned before, it is very rare for any researchers (other than librarians) to specify format designations in their searches; most people simply type in subject keywords without any thought for the different document types in which they might appear. But if you limit, right from the start, the field in which your search terms apply (obituaries, literature reviews, curriculum guides, etc.), then you will immediately be zeroing in on only the most relevant literature while simultaneously excluding vast ranges of material that would otherwise bury the best sources within mountains of unwanted chaff. Crossing that line—i.e., adding format designations to your subject searches—goes a long way toward moving you from the amateur level to that of professional researcher.
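The effect of a document-type limit can be pictured with a toy example. The following is only a minimal sketch in Python, not a picture of how any vendor's software works: the four records are invented, and the field names `title` and `doc_type` and the function `search` are hypothetical. Only the document-type labels are taken from the ERIC list above.

```python
# Toy records: the titles are invented for illustration only.
RECORDS = [
    {"title": "Fractions in the middle grades", "doc_type": "Journal Articles"},
    {"title": "Teaching fractions: a unit plan", "doc_type": "Guides: Classroom"},
    {"title": "Fractions readiness inventory", "doc_type": "Tests/Questionnaires"},
    {"title": "Reading in the middle grades", "doc_type": "Guides: Classroom"},
]


def search(records, keyword, doc_type=None):
    """Return titles containing the keyword, optionally limited to one document type."""
    hits = []
    for record in records:
        if keyword.lower() not in record["title"].lower():
            continue  # the subject keyword is missing
        if doc_type is not None and record["doc_type"] != doc_type:
            continue  # wrong format, so the limit excludes it
        hits.append(record["title"])
    return hits


# A keyword alone retrieves every format; adding the limit isolates the guide.
unlimited = search(RECORDS, "fractions")
limited = search(RECORDS, "fractions", doc_type="Guides: Classroom")
```

Note that nothing in the title of the classroom guide would have let a keyword search single it out; the format designation does that work.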

In a sense, then, where you type the words (within which format fields) is just as important as how you type them. The same point applies to other “nonformat” parts of the records: sometimes, especially in keyword databases not having controlled subject descriptors, you will get the best results by restricting some of your search terms to appearances specifically within titles of articles in combination with other terms appearing in broader sections of the same records, in either abstracts or full texts. In the ProQuest Dissertations and Theses Full Text database, for example, you will usually get a better set of on-target results by limiting your searches to the title and abstract fields rather than searching full texts right off the bat. The title/abstract combination is indicated by the drop-down menu option “Anywhere except full text—ALL.”

In most subscription databases you cannot rely on relevance-ranking computer algorithms to make these distinctions among searchable fields for you.

Combining Keywords with Citation or Related Record Searches

A couple of peculiar but very useful options in doing Boolean combinations show up in the Web of Science database (described in Chapter 6). These are the capabilities of combining keyword results with citation search results, and of combining the results of related record searches with further keyword specifications. Let me give some examples.

A search for articles on the topic “changing paradigms in the concept of property” can be done in very interesting ways in the Web database. One approach, of course, is simply to look for a simple combination of the keywords “property” and “paradigm*”; this does produce relevant hits such as articles entitled “Protecting Intellectual Property—New Technologies, New Paradigms,” “Information, Incentives, and Property Rights—The Emergence of an Alternative Paradigm,” and “Symposium—Toward a 3rd Intellectual Property Paradigm.”

There are other ways to come at this topic, however. Anyone who writes about paradigms in a scholarly journal probably has a footnote citing the book that put this term into prominence: Thomas Kuhn’s The Structure of Scientific Revolutions. Similarly, a scholarly discussion of private property is likely to cite the classic work on the subject, John Locke’s Second Treatise on Civil Government. When footnotes are introduced as search elements, a researcher then has a large variety of relevant elements that can be brought into a Boolean combination:

#1 the word “property” itself appearing in the title of an article
#2 a footnote referring to John Locke’s work on property
#3 the word “paradigm*” appearing in the title of an article
#4 a footnote referring to Kuhn’s book on paradigms

In the Web’s Advanced Search box one can then combine the results of several of these separate searches:

(#1 OR #2) AND (#3 OR #4)

The results will include a number of relevant articles that do not have both keywords in their titles. The article entitled “The Concept of Private Property in Constitutional Law—The Ideology of the Scientific Turn in Legal Analysis,” for instance, has the word “Property” in its title, but not the word “Paradigm.” The latter concept is included, however, because this article cites Kuhn’s work in a footnote.

Similarly, the article “Paradigms as Ideologies—Liberal vs. Marxian Economics” does not have the word “Property” in its title, but it does have a footnote citing Locke’s Second Treatise. And the articles “A Consent Theory of Contract” and “The Constitution and Nature of Law” have neither relevant keyword in their titles, but each article cites both Kuhn and Locke in its footnotes. The ability to search footnote citations, and to combine them with either keywords or other footnote references, is an option that few researchers think of, but it can provide extraordinary results.
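The logic of this combination can be shown with ordinary set operations. What follows is a minimal sketch in Python under stated assumptions: the article identifiers (A through F) and the membership of each set are invented for illustration, and the variable names are mine rather than anything in the Web of Science interface. Each set stands for the results of one of the four numbered searches above.

```python
# Toy result sets: article IDs and memberships are invented for illustration.
title_has_property = {"A", "B", "F"}   # search 1: "property" in the title
cites_locke = {"C", "D", "E"}          # search 2: a footnote citing Locke
title_has_paradigm = {"B", "C"}        # search 3: "paradigm*" in the title
cites_kuhn = {"A", "D", "E"}           # search 4: a footnote citing Kuhn

# (#1 OR #2) AND (#3 OR #4): OR is set union, AND is set intersection.
combined = (title_has_property | cites_locke) & (title_has_paradigm | cites_kuhn)

# The narrower keyword-only search, #1 AND #3, for comparison.
keywords_only = title_has_property & title_has_paradigm
```

In this toy data the keyword-only search finds a single article (B), while the combined search also picks up A (keyword plus a Kuhn citation), C (keyword plus a Locke citation), and D and E (both citations, neither keyword). Article F, which mentions property but has no connection to paradigms, is correctly left out.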

Another researcher interested in assessing models for comparing the operations of the U.S. and German Supreme Courts found useful results by first assembling a set of articles citing important authors who had already done relevant work on the two courts and then combining those sets with the keyword “model*”—thereby finding some hits that were quite good, even though they did not contain the keywords “Supreme Court” or “Bundesverfassungsgericht” (the German federal constitutional court) in their titles or abstracts.

If you want to be an expert searcher you should watch for opportunities to employ this search technique, especially if there are standard works (or authors) in your field of interest that are likely to be cited frequently.

Another wrinkle on combining search elements comes from the capability of the Web of Science database to cross related record search results (see Chapter 7) with keywords. For example, I once helped a young woman who wanted information on the linguistic remnants of African slaves’ speech in Venezuela and its influence or survival in the local Spanish. This involved three steps:

1. An initial keyword search: Venezuela* and (Africa* or slave*) and (language* or linguistic* or lexic* or speech). This immediately turned up one good article on “Some Lexical Linkages between Africa and Venezuela.”
2. A related record search starting from this one article led to a listing of 399 other articles in the database that have footnotes in common with it. The important consideration here is that none of these necessarily has any keywords in common with the starting-point article—only articles having footnotes in common with it define the set. Thus, right at the top of the list was an article with the title “Studies in Afro-Hispanoamerican Linguistics”—a literature review article with 146 footnotes—which is in the ballpark, but without using any of the (Venezuela* and [Africa* or slave*]) keywords I had specified.
3. Since articles within the list of 399 could have any words at all, I then used the “Refine Results” box to specify that I wanted only those (within the set of 399) that did have the exact keyword “Venezuela*” somewhere in their titles or abstracts—but not necessarily any of the other terms I had originally specified. This refinement produced an article entitled “Black Rural Speech in Venezuela.” I had initially missed this highly relevant source because I hadn’t thought to use the keyword “Black” in my search. But the related record search plus the keyword refinement of it brought it to my attention even when I couldn’t specify all of the relevant terms.
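The second and third steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the record identifiers, titles, and cited references below are invented, the functions `related_records` and `refine` are hypothetical names, and ranking by the number of shared references is a simplification adopted for the sketch rather than a description of the database's internals.

```python
# Toy records: IDs, titles, and cited references are invented for illustration.
RECORDS = {
    "start": {"title": "Lexical links between Africa and Venezuela",
              "refs": {"r1", "r2", "r3"}},
    "a": {"title": "Afro-Hispanic linguistics: a survey",
          "refs": {"r1", "r2", "r9"}},
    "b": {"title": "Black rural speech in Venezuela",
          "refs": {"r2", "r7"}},
    "c": {"title": "Venezuelan oil policy",
          "refs": {"r8"}},
}


def related_records(records, start_id):
    """List records sharing at least one cited reference with the starting record."""
    start_refs = records[start_id]["refs"]
    scored = []
    for record_id, record in records.items():
        if record_id == start_id:
            continue
        shared = len(start_refs & record["refs"])
        if shared > 0:
            scored.append((shared, record_id))
    # Most shared references first; ties broken alphabetically by ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored]


def refine(records, record_ids, stem):
    """Keep only the listed records whose titles contain the word stem."""
    return [rid for rid in record_ids if stem.lower() in records[rid]["title"].lower()]


related = related_records(RECORDS, "start")       # defined by shared footnotes only
refined = refine(RECORDS, related, "venezuela")   # keyword applied within that set
```

Record "a" enters the related set without matching the keyword, and record "c" matches the keyword but never enters the set because it shares no footnotes with the starting article; only "b" survives both steps.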

Refresher: Combinations without Computers

Combining terms, or sets of terms, via computer searches is an extremely useful capability, especially if you employ refinements such as word truncation, proximity searching, field specification, and set limitation. But it is also important to remember that you have additional, and very powerful, mechanisms that enable you to effectively combine two or more search elements in ways that lie “outside the box” of postcoordinating computer systems. These other mechanisms have already been touched on separately but should be brought together for emphasis as they are so often overlooked:

Precoordinated subject heading strings in Library of Congress Subject Headings, particularly in online catalog browse displays (Chapter 2)
Index pages or tables of contents in published bibliographies (Chapter 9)
Subject-classified bookstacks enabling you to do focused browsing (Chapter 3)

Precoordinated LCSH terms, discussed in Chapter 2, effectively combine two or more search elements into a single subject heading, such as the following:

Women in aeronautics
Sports for children
Theater in propaganda
Minorities in medicine—United States—Statistics
Education and heredity
Doping in sports
Architecture and energy conservation—Canada
Erotic proverbs, Yiddish
Church work with criminals
Hallucinogenic drugs and religious experience in art—Mexico
Odors in the Bible
Smallpox in animals
Miniature pigs as laboratory animals
Television and children—South Africa—Longitudinal studies

Many of the strings within the LCSH system, again, are created by the use of standardized subdivisions, and these linkages often show up in online browse displays without being recorded in the LCSH list of subject headings:

United States—History—Civil War, 1861-1865—Regimental histories—Illinois infantry—51st—Company E
World War, 1939-1945—Underground movements—France—Chronology
Juvenile delinquency—Great Britain—History—Sources
Corporations—Charitable contributions—Japan—Directories
Hospitals—Job descriptions
Potatoes—Social aspects—Ireland—History
Mexican American agricultural laborers—Bibliography
Cancer—Psychological aspects—Case studies
Bird droppings—Pictorial works
Toilet training—Germany—Folklore
Flatulence—Dictionaries—French

There are hundreds of thousands of actual and potential precoordinated headings in any OPAC that uses the LCSH system. Reference librarians are trained literally to think in these terms. To the extent that you anticipate the probabilities that there may be precoordinated headings for your subjects, you can exploit whole arrays of them via cross-reference links and, especially, browse displays in online catalogs (at least in the OPACs that are structured well enough to display them). The ability to exploit browse displays will frequently enable you to surpass any results obtainable from combining separate terms in a blank search box, because browse menus enable you to recognize combinations that you could never think of in advance.
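The difference between recognizing a heading in a browse display and guessing it in a blank search box can be pictured with a small Python sketch. It rests on assumptions: the function `browse` is a hypothetical stand-in for a catalog's browse screen, and of the headings below only “Potatoes—Social aspects—Ireland—History” and “Hospitals—Job descriptions” come from the lists above; the other subdivided strings are included purely for illustration.

```python
# A tiny alphabetical heading list. Only two strings come from the examples
# in the text; the rest are illustrative assumptions.
HEADINGS = sorted([
    "Hospitals—Job descriptions",
    "Potatoes",
    "Potatoes—Bibliography",
    "Potatoes—Social aspects—Ireland—History",
    "Potatoes—Storage",
])


def browse(headings, lead_term):
    """Return every heading that begins with the lead term, in list order."""
    return [heading for heading in headings if heading.startswith(lead_term)]


# Typing only the lead term displays all of its precoordinated extensions.
potato_headings = browse(HEADINGS, "Potatoes—")
```

The searcher supplies one word and the display supplies the rest, including the “Social aspects—Ireland—History” string that few people would think to type in advance.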

Another mechanism for combining two subjects without using a computer is that of published bibliographies, as discussed in Chapter 9. The trick here is simply to find a bibliography on the first topic of interest and then to look for the second topic within its index (or table of contents). This search technique is especially useful when looking for a particular topic in connection with a literary or historical figure, as there are thousands of excellent book-length bibliographies available on such individuals. For example, a scholar looking for material comparing the philosophy of Sartre with that of Christianity could turn to François Lapointe’s Jean-Paul Sartre and His Critics: An Annotated Bibliography (1938–1980), 2nd ed. (Philosophy Documentation Center, 1981). He could then simply turn to its index to see which of the studies are listed under “Christianity.” (There are 11.) Similarly, a researcher looking for material discussing both Samuel Beckett and Alberto Giacometti turned to Cathleen Andonian’s 754-page Samuel Beckett: A Reference Guide (G. K. Hall, 1989) and simply looked under “Giacometti” in its Subject Index to find four articles (including two that do not show up in an online search of the MLA International Bibliography).

Yet another mechanism exists, “outside the box” of computerized retrievals, for effectively combining two subjects: focused shelf-browsing, as discussed in Chapter 3. There, the example of “traveling libraries in lighthouses” is relevant. In that case, after first exhausting all of the computer databases I could think of, I went directly to the bookstacks having the group of texts on “Lighthouse Service” (VK1000-VK1025) and quickly flipped through all the volumes on those several shelves, looking for “libraries” as either an index entry or a text word within the books. I found 15 directly relevant sources. The trick here, then, is similar to that with published bibliographies: start by finding the classification area for the first subject, then look for the second subject within the books shelved in that limited class area. This focused browsing technique enables you to cross subjects that cannot be brought together by Boolean searches in computers.

The moral of the last several paragraphs can be summarized briefly: do not rely exclusively on Internet searches for in-depth or scholarly research, as they are too limited both in the content covered and in the search techniques they provide for access to the contents. The same can be said even for the many thousands of commercial databases that are not freely accessible on the open Internet. Don’t allow yourself to be boxed in exclusively by computerized resources; even within them, don’t think that you always have to rely solely on postcoordinate combinations (see Chapter 2) of only the terms you can guess at on your own. If you want to be a good researcher, you need to be aware of all of the options available.