INDEXING

In a dedication preceding his Natural History (70s CE), Pliny the Elder speculates modestly that the emperor Titus might have more important things to do than to read the vast work from start to finish:

As it was my duty in the public interest to have consideration for the claims upon your time, I have appended to this letter a table of contents of the several books [quid singulis contineretur libris: what is contained in the individual books], and have taken very careful precautions to prevent your having to read them. You by these means will secure for others that they will not need to read right through them either, but only look for the particular point that each of them wants, and will know where to find it.

The listing that follows breaks down the Natural History effectively into chapters, giving topics in the order they appear, each epitomized in a phrase, for example, “The world—is it finite? is it one? its shape; its motion; reason for its name.” This is not the first table of contents in antiquity; indeed Pliny notes that he took the idea from Valerius Soranus’s Epoptides (now lost; meaning is debated, perhaps: Initiated Women). Nevertheless, Pliny’s description of it expresses nicely both the point of an index—readers’ time is finite, and in much of our reading we need only part of a book, not the whole thing—and the basic principles of indexing: abstraction and arrangement. The table represents the work in miniature, with its captions naming the parts of the text that they stand in for. At the same time, because it mirrors the structure of the main work, the table indicates where to go to see any of its entries expanded. An item near the start of the table can be traced to a location near the start of the work, and so on.

This brings us up against the difference between what, in modern terminology, we would speak of as a table of contents and an index proper. Both, we might say, are the products of indexing—they both work by abstraction and arrangement—but in the former, the arrangement comes ready-made: it is one of similarity with its referent; in the latter, however, the terms are reorganized, most commonly into alphabetical order, so that the ordering, in relation to the source, is arbitrary. By severing the relationship between content and arrangement, remapping the material instead onto a sequence that any literate person will have learned during his or her schooling, alphabetical order offers a universal, text-independent navigation system for verbal information. What may feel intuitive to us has certainly not always seemed so, and we can find instructions for use prefaced in alphabetically arranged works as late as Robert Cawdrey’s English dictionary, A Table Alphabeticall, of 1604: “Nowe if the word, which thou art desirous to finde, begin with (a) then looke in the beginning of this Table, but if with (v) looke towards the end.”
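The distinction can be made concrete with a small sketch. The entries and page numbers below are invented for illustration and are not taken from Pliny's table; the point is only that the same captions can be arranged either to mirror the source or by spelling alone.

```python
# Hypothetical (caption, page) pairs; the captions and numbers are
# illustrative assumptions, not data from any real book.
entries = [
    ("world, shape of", 1),
    ("stars", 4),
    ("eclipses", 9),
    ("air", 12),
]


def table_of_contents(entries):
    """Arrange by location: position in the table mirrors position in the work."""
    return sorted(entries, key=lambda entry: entry[1])


def alphabetical_index(entries):
    """Arrange by caption: position depends only on spelling, not on the source."""
    return sorted(entries, key=lambda entry: entry[0].lower())


print(table_of_contents(entries)[0])   # ('world, shape of', 1)
print(alphabetical_index(entries)[0])  # ('air', 12)
```

In the first arrangement a reader must already know roughly where a topic falls in the work; in the second, knowing the alphabet is enough.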

Clay tablets discovered in northern Syria have shown that the sequencing of the letters in the Ugaritic alphabet—which carries through to the Hebrew and Greek, and ultimately Roman alphabets—was established by the middle of the second millennium BCE. In Hebrew, we can see this order structuring some of the acrostic sections of the Hebrew Bible—the book of Lamentations, for example, and several of the psalms. In Greek, abecedaria—that is, simply, rows of the letters of the alphabet in order—have been discovered dating from the eighth century BCE onward. However, the use of alphabetical order in Greece—other than as an aid for the acquisition of basic *literacy—seems not to have come until rather later. The rise of the alphabetically ordered list, spreading out from Alexandria during the third century BCE, has led to the suggestion that it was developed in response to the vastness of the newly founded library there, a way of bringing the collection under control. Our first significant instance of indexing, then, comes in the form of not a book index, but a library catalog.

Compiled around the middle of the third century BCE, the Pinakes of Callimachus (ca. 303–240 BCE) was a huge bibliography that is said to have taken up 120 rolls itself. Although only short fragments of the original work survive, we can derive a sense of its arrangement from the couple of dozen fragmentary references to it in other works. From these we can deduce that it was organized firstly into *genre—rhetoric, law, epic, tragedy, and so on—and within these classes, authors were arranged alphabetically by name. Under each name, Callimachus listed biographical information relating to that author (e.g., father’s name, birthplace, nickname, profession), followed by a list of their works, distinguished into genuine and spurious, along with their first few words and their length in lines. Rolls in the library would have been stored in racks of pigeonholes, and it has been suggested that these shelves would have been labeled with tablets (Greek pinakes) indicating genre and author. If this is the case, Callimachus’s title nicely expresses the spatial relationship between an index and its referent: what you see here can be found there.

Thus, in antiquity, we find both alphabetical order and the table of contents. The tools of indexing are in place, and yet it will be a long time before the arrival of the book index proper. Although certain works, notably glossaries, were organized alphabetically during the first millennium of the Common Era, ordering beyond the initial letter (so that, for example, ant comes necessarily after aardvark) seems to have been lost in western Europe. When Papias the Lombard boasts, in the mid-eleventh century, that his dictionary, the Elementarium doctrinae rudimentum (Basic rudiments of learning), is alphabetical to the third letter, he is able to claim this as an innovation. Papias’s precision will be important, a century and a half later, when the arrival of two institutions—the universities and the mendicant orders—will necessitate a new way of reading, more like the one envisaged by Pliny of his time-poor emperor.
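What it means to alphabetize "to the third letter" can be shown with a brief sketch. The word list is invented, and the function name `alphabetize` and its `depth` parameter are assumptions made for illustration. The sketch relies on the fact that Python's `sorted` is stable, so words that agree up to the chosen depth simply keep the order in which the compiler happened to write them down.

```python
def alphabetize(words, depth=None):
    """Sort words by their first `depth` letters only.

    depth=1 imitates first-letter ordering, depth=3 imitates ordering
    "to the third letter", and depth=None compares whole words.
    """
    # Slicing with [:None] returns the whole string, so one key covers all cases.
    return sorted(words, key=lambda word: word.lower()[:depth])


# Hypothetical word list, deliberately out of order.
words = ["antler", "apple", "aardvark", "ant", "anchor"]

print(alphabetize(words, depth=1))
# ['antler', 'apple', 'aardvark', 'ant', 'anchor']
print(alphabetize(words, depth=3))
# ['aardvark', 'anchor', 'antler', 'ant', 'apple']
print(alphabetize(words))
# ['aardvark', 'anchor', 'ant', 'antler', 'apple']
```

With first-letter ordering the list is left untouched, since every word begins with a. At depth three, aardvark and anchor find their places, but antler still precedes ant because the two agree in their first three letters.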

As the requirements of preaching and teaching demanded a more efficient type of reading than the meditative monastic mode, the decades around the turn of the thirteenth century saw an extraordinary series of innovations on the page. Mary and Richard Rouse reel off a few of these—“running heads, chapter heads in red, initials in alternating red and blue, different sized initials, marked paragraphs, references, names of cited authors”—but the same moment also sees the division of the books of the Bible into chapters, accomplished by the English cleric Stephen Langton in around 1200. With this chaptering in place as a suitable locator, the stage was set for the first great indexing milestone of the Middle Ages, the Bible *concordance.

The first concordance to the Bible was compiled by the friars of the Dominican priory of St. Jacques in Paris. The work was begun under the direction of Hugh of St-Cher, dating it to between 1230 and 1235, and completed no later than 1247. Using Langton’s Bible chapters and then applying a further subdivision into sevenths, labeled a to g, the concordance gives roughly 129,000 locators for around ten thousand keywords. Some leaves of preparatory materials, discovered as binding waste, show how the labor was divided up, with different parts of the alphabet compiled in different hands. These notes were collated, their ordering tightened up, and then retranscribed in full, so that the finished work runs from A, a, a (an exclamation usually translated as Alas) through to Zorobabel.
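The locator scheme described here, a chapter number plus one of seven letters, can be sketched as follows. This is an illustration under stated assumptions rather than a reconstruction of the friars' working method: it assumes position is measured by counting words, whereas a medieval compiler would presumably have judged the sevenths by eye, and the chapter length used in the example is invented.

```python
def seventh_letter(word_offset, chapter_length):
    """Return the letter a-g for the seventh of the chapter containing a word.

    word_offset is the zero-based position of the word in the chapter;
    chapter_length is the total number of words (an assumed measure).
    """
    if not 0 <= word_offset < chapter_length:
        raise ValueError("word_offset must fall inside the chapter")
    # Integer arithmetic maps offsets 0..length-1 onto the indices 0..6.
    return "abcdefg"[word_offset * 7 // chapter_length]


def locator(book, chapter, word_offset, chapter_length):
    """Combine book, chapter, and seventh into a single locator string."""
    return f"{book} {chapter}.{seventh_letter(word_offset, chapter_length)}"


# A hypothetical word 40 percent of the way through a 100-word chapter
# falls in the third seventh, labeled c.
print(locator("Gen.", 10, 40, 100))  # Gen. 10.c
```

The appeal of the scheme is that it needs nothing written on the page itself: any copy divided into Langton's chapters can be used, whatever its layout.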

The St. Jacques Concordance uses a large amount of abbreviation so that the entire work can be crammed into a single volume. Nevertheless, it suffers from a significant drawback (one that still dogs bad indexes today). Entries are presented as single, undifferentiated lists of locators, often running to the hundreds. A reader wishing to track down a particular instance of the word Deus (God) or Peccatum (Sin) will find the St. Jacques Concordance almost useless as a time-saver. And so, a few decades later, a second concordance was produced, again at St. Jacques, this time by the Englishman Richard of Stavensby and his accomplices, hence its name, the Concordantiae Anglicanae or English Concordance. The innovation of the second concordance was to include, for each reference, a few words of context: the phrase in which the given term appeared. So, for example, the first entry for Regnum (Kingdom) appears as “Gen. x.c. fuit autem principium .R. eius Babilon et arach,” telling us that it appears just before the mid-point of Genesis 10, in a sentence that runs (to use the King James translation), “And the beginning of his kingdom was Babel, and Erech.” Essentially, this is what we would now call a keyword-in-context or KWIC index: a snippet view of the surrounding text.
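A keyword-in-context listing of this kind is simple to sketch. The function name `kwic` and its `width` parameter are assumptions for illustration, and the sample text is just the King James phrase quoted above; the sketch shows the general technique, not any particular concordance's format.

```python
import re


def kwic(text, keyword, width=2):
    """Return each occurrence of keyword with `width` words of context per side."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            # Capitalize the keyword so it stands out within the snippet.
            hits.append(" ".join(left + [word.upper()] + right))
    return hits


text = "And the beginning of his kingdom was Babel, and Erech"
print(kwic(text, "kingdom"))  # ['of his KINGDOM was Babel']
```

The `width` setting governs the trade-off between usefulness and bulk: more context makes each entry easier to recognize, but every extra word is multiplied across all the entries in the work.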

The drawback here is that each reference, formerly a simple abbreviated locator, has ballooned into a whole line of information. With over a hundred thousand of these, the English Concordance was, of necessity, a large, multivolume work, far from portable, and too cumbersome to be convenient. A third version, then, whose contextual passages were limited to four or five words, was compiled and completed by 1286. Neither too small nor too big, it was the Third Concordance that would become the model for Bible concordances for centuries to come.

In 1230, just as the friars of St. Jacques were beginning the first concordance, across the channel in Oxford, the scientist and theologian Robert Grosseteste was applying the principles of abstraction and arrangement in a rather different way, compiling not an index to a single book, but one that would map the great swathe of his *encyclopedic reading. Unlike the word index of the Dominicans, Grosseteste’s was a true subject index, one that identified several hundred topics, such as fate, or tithes, or the unity of God. For each of these, Grosseteste devised a small but distinctive glyph (the symbol for imagination, for example, is a six-petaled flower) that he could then jot in the margin whenever he encountered a particular topic. Scanning these margins, Grosseteste was then able to compile a master record of all the appearances of each icon. Now in the Bibliothèque municipale de Lyon, what survives of Grosseteste’s Tabula (MS 414 ff. 17–32) shows the entries for the first few dozen of these topics. The arrangement is not alphabetical, but rather topics are grouped into nine major classes: God, creatures, the holy scriptures, and so on. Under each topic, a list of scriptural and patristic references appears first, with classical and Arabic texts in a separate section to the side.

Thus, by the middle of the thirteenth century we have both alphabetical and topical indexing. Over the next century, the two will come to be commonly applied together, with readers writing indexes into their own books, and using foliation or preexisting divisions of the text as locators. Papal records from the 1320s show payments being made for the compilation of tables—indexing had become a profession—and when print arrives, it is not uncommon for incunables essentially to reissue the work of earlier indexers. The index to Caxton’s Polychronicon (A chronicle of many ages) of 1482, for example, is drawn from a Latin manuscript of Higden’s text and simply translated into English (without being fully realphabetized, so that some entries end up under the wrong letter). The first printed index appears in Fust and Schöffer’s edition of Augustine’s De arte praedicandi (On the art of preaching) (Mainz, no later than 1465), but of greater moment perhaps is another work printed five years later. The Sermo ad populum predicabilis (Sermon ready to be preached to the people) (Cologne, 1470) is a short printed sermon in which a printed numeral—a folio number, forerunner to the page number—appears in the right-hand margin of every recto page. As Margaret Smith has shown, printed foliation or pagination was not widely adopted until the early sixteenth century; nevertheless, the innovation was a crucial one for indexing. Different manuscript copies of the same work rarely have the same pagination, so manuscript indexes rely for their locators either on shared textual divisions (book; chapter) that lack granularity, or on foliation that is copy specific. Printed pagination, stable across the whole print run of an edition, meant that any text came already provided with a uniform, highly granular system of locators. As long as they were using the same edition, page 16 would be the same for any reader, whether they were in Venice, Paris, or Oxford.

The story of the index in the print era has been one of deepening complexity and growing pervasiveness. From the late sixteenth century, indexes begin their migration from the front to the back of our books, while standardization affects the syntax of keywords such that, by the late nineteenth century, Lewis Carroll can spoof it in his index to Sylvie and Bruno (1889), for example, “Scenery, enjoyment of, by little men.” Following a period of proliferation (for example, the multiple indexes of Alexander Pope’s Iliad, with separate tables for “Persons and Things,” “Fables,” “Characters,” “Speeches,” “Descriptions,” and “Similes”) index form contracts into the single consolidated table we generally find today. Alongside these developments, there is also an emerging anxiety that the convenience that Pliny foresaw—that we would not need to read right through our books—is in some way dangerous, undermining deeper modes of reading or learning. We find it in Pope’s jibe that “index-learning turns no student pale,” as well as in Galileo’s dig at “that herd who, … in order to acquire a knowledge of natural effects, do not betake themselves to ships or crossbows or cannons, but retire into their studies and glance through an index or a table of contents to see whether Aristotle has said anything about them.” The convenience of the index, and its efficiency as a mode of information retrieval, is seen as a threat to other, older modes of reading.

Nevertheless, the march of the index has been irrepressible, and efficient retrieval as important an object for Chinese lexicographers of the twentieth century and for the architects of the *digital revolution as it was for the Dominicans of the late Middle Ages. In China, the index movement spearheaded by Wan Guoding in the 1920s addressed itself to the problem of ordering in a character system that is nonalphabetic. Multiple approaches have subsequently been adopted based on, for example, the simplified description of the shapes found in the corners of a given character (the four corner method). In the *big data age, meanwhile, we find Google summarizing the two basic operations of its flagship search product as “Crawling and Indexing.” The latter is explained with the aid of a familiar model: “It’s like the index in the back of a book—with an entry for every word seen on every web page we index.” The friars of St. Jacques would surely recognize the approach; so too would those veteran detractors, the Galileos and the Alexander Popes. In the era of the digital index, out of sight, but underpinning every search we perform online or on our laptops, this anxiety—that attentive, sustained “deep” reading is under threat—is being felt with a new keenness.
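The book-index analogy in that description corresponds to what is usually called an inverted index: a mapping from each word to the places where it occurs. The following is a minimal sketch of that general idea, not an account of how any real search engine is built; the page names and texts are invented, and the function names `build_index` and `lookup` are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of page identifiers on which it appears."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_id)
    return index


def lookup(index, word):
    """Return the sorted page identifiers for a word, or an empty list."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical pages, invented for the example.
pages = {
    "page1": "The friars compiled a concordance",
    "page2": "A concordance lists every word",
    "page3": "Indexes save the reader time",
}

index = build_index(pages)
print(lookup(index, "concordance"))  # ['page1', 'page2']
print(lookup(index, "Galileo"))      # []
```

As with the first concordance, each entry here is a bare list of locators with no context, which is why a usable search tool has to add ranking and snippets on top of it.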

Dennis Duncan

See also books; data; files; learning; libraries and catalogs; lists; readers; storage and search; teaching

FURTHER READING

  • Rudolf Blum, Kallimachos: The Alexandrian Library and the Origins of Bibliography, translated by Hans H. Wellisch, 1991; Lloyd W. Daly, Contributions to a History of Alphabetization in Antiquity and the Middle Ages, 1967; Joseph A. Howley, Aulus Gellius and Roman Reading Culture: Text, Presence, and Imperial Knowledge in the “Noctes Atticae,” 2018; Mary A. Rouse and Richard H. Rouse, “La Naissance des index,” in Histoire de l’édition française, edited by Henri-Jean Martin and Roger Chartier, 4 vols. (1983), 1:77–85; Margaret M. Smith, “Printed Foliation: Forerunner to Printed Page-Numbers?,” Gutenberg-Jahrbuch 63 (1988): 54–70; Hans H. Wellisch, “Incunabula Indexes,” Indexer 19, no. 1 (1994): 3–12; Francis J. Witty, “The Beginnings of Indexing and Abstracting: Some Notes toward a History of Indexing and Abstracting in Antiquity and the Middle Ages,” Indexer 8, no. 4 (1973): 193–98.