DATA

Data is everywhere today, not only in the domains of science and technology, but in business, politics, sports, and the arts. And while claims about the ubiquity of data in our environment may be more or less accurate, even as claims, they represent something powerful: the idea of data has become central to contemporary culture, to our understanding of our world, our times, and ourselves.

Of course, neither the idea of data nor the technical practices that support it are entirely new. In one way or another, we have inhabited data cultures since the first tax rolls were inscribed and populations counted. Yet the need to call data “data,” to distinguish it from other sister concepts such as records, *facts, information, or evidence, is much newer. If we use the invention of the modern term “data” as an indicator, we may consider the seventeenth century as the beginning of our era.

There are a number of reasons why thinking about “data” across four centuries is useful today. In the first place, it gives a larger view than that offered in discussions of data limited to the electronic age. The seventeenth and eighteenth centuries belong to what historians call *“early modernity.” This is our world, the world of globalization, technology, and capital at its first coalescence, when many of the institutions and ideas we now take for granted were still new and often shocking. While we may say that computer technology drove data to the center of culture, we may equally say that a preexisting early modern culture of data generated the demand for computer technology in the first place. The seventeenth-century world was steeped in many kinds of data immediately recognizable as such today, ranging from John Graunt’s mortality tables to the gold-clasped account book that Louis XIV kept in his pocket to the “weather clock” designed by the great architect Christopher Wren to automatically record measurements of temperature and barometric pressure.

Yet in the seventeenth century, the word “data”—our own favored term for this kind of information—was still emergent and in usages unfamiliar today. Indeed, a glance at the most accessible historical data on “data” for this period, the Google Books ngram for the word, reinforces this hunch. By counting and plotting how many times a word appears in its data set in a given year relative to the total number of words occurring in that year, the Google Ngram Viewer gives us a clue about when terms have been more or less important in a language.
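The relative-frequency measure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Ngram Viewer's actual method: the corpus is a hypothetical toy list of (year, text) pairs, and tokenization is a naive lowercase whitespace split rather than Google's own processing.

```python
from collections import Counter


def relative_frequency(corpus, term):
    """Return {year: occurrences of term / total words in that year}.

    corpus: iterable of (year, text) pairs, a hypothetical stand-in for a
    digitized book collection. Tokens are produced by a naive lowercase
    whitespace split, which is an assumption made for illustration only.
    """
    term_counts = Counter()   # occurrences of the term, per year
    total_counts = Counter()  # all tokens, per year
    for year, text in corpus:
        tokens = text.lower().split()
        total_counts[year] += len(tokens)
        term_counts[year] += tokens.count(term.lower())
    # Divide by the year's total so that years with more books are comparable.
    return {
        year: term_counts[year] / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year] > 0
    }


# Hypothetical toy corpus, invented for this sketch.
corpus = [
    (1775, "From these data his mathematical head will easily calculate"),
    (1775, "the time and expense"),
    (1950, "data data processing"),
]
freqs = relative_frequency(corpus, "data")
print(freqs)
```

A naive string count like this one cannot tell English “data” from Latin “data,” or from an optical character recognition misreading of “date,” so every such confusion in the underlying corpus flows straight into the plotted curve.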

The results of the Google ngram for “data” are both striking and intuitive (figure 1). In the years before 1800, the curve hugs zero. Around 1800, we see a very gradual rise until the twentieth century, when the curve steepens, and then at midcentury and late in the century it approaches vertical. It is easy to look at the graph and infer a relationship with the modern history of computing technology, with inflection points roughly aligned with, say, the paper punch card, the electronic computer, and the *internet.

Interestingly, however, the term “data” turns out to be a good case for how difficult it can be to interpret results from a data tool such as the Google Books Ngram Viewer. Here, the tip-off—and irony abounds—comes not from comparative quantitative data but from an older kind of qualitative data, an entry in the Oxford English Dictionary (OED). It is worth saying that the methods of the OED are not as dusty as one might assume. A monument of pen-and-paper scholarship, the OED was actually crowd-sourced using the networking methods of its day, press and post. Quotations contributed by ordinary readers were mailed to the OED, sorted, and then filed in a purpose-built data collection center known as the *scriptorium.

Figure 1. Results of the Google ngram for “data.”

From the OED, we learn that the term “data” emerged in English usage not in the 1940s but in the 1640s. We learn too that the usage of the term changed and flourished during the century and a half that followed. We also learn that these early usages both resembled and differed from what we expect “data” to mean in telling ways. The very first use cited by the OED, from the Anglican controversialist Henry Hammond, turns out not to refer to quantitative information, but rather, to theological precepts. What are we to make of this? And how are we to fit it into our modern cultural and linguistic picture?

There are a number of reasons for Google’s relative blindness to the early history of the word “data.” In general, Google Books presents challenges for researchers, not least because when it has digitized library books, it has not also digitized bibliographic *metadata from donor libraries. Many errors, including errors in assigning publication dates, have resulted. Often, books that Google claims were published in the eighteenth century and before turn out to be twentieth-century books that Google’s *machine learning algorithm has coded incorrectly. Typeface in old books often boggles Google’s *optical character recognition. And, for quantitative research, both problems are exacerbated by the relatively small sample that Google Books works with. Prior to the nineteenth century, the panacea of so-called *big data is absent. There are other reasons, too, more specific to the history of the term “data.” One of the most notable things about the four-character string “d-a-t-a” is that it is very common in Latin, the language from which the modern term “data” is derived. Yet the uses of “data” in Latin are different from and more various than those in modern languages. Today’s buggy optical character recognition technologies still have serious trouble distinguishing “data” from “date” and other visually similar character strings in pre-nineteenth-century typefaces. Moreover, the early modern usage history of the term “data” in European *vernaculars throws a screwball at any simple counting operation: at the beginning of the period in which we are interested, there was a lot of Latin being published, even in books that were otherwise written in modern languages.

In the general culture of the seventeenth and eighteenth centuries, “data” still evoked specialized kinds of argumentation and the special situation of argument. As the etymology of the word indicates—“data” is the neuter past participle of the Latin verb dare (to give)—“data” in the early modern period were “givens.” What “data” meant depended on what kind of argument one was making, what kind of facts, principles, or values might be “given” in a particular argument. In that first OED citation, “data” were theological propositions, and this was no anomaly. “Data” during the seventeenth century was frequently used to mean something like the opposite of recorded facts. This was to change around 1750, when something resembling our current expectations emerged. The story of how this shift took place is at once the story of modern *epistemology and the prehistory of our information culture.

If Henry Hammond’s usage in 1641 strikes us as odd, the same is not true for the usage we find in an October 1775 letter from Benjamin Franklin to the English scientist and theologian Joseph Priestley, which employs the term “data,” with some irony, to describe a kind of political calculus not at all foreign today. There, Franklin suggests that Britain reconsider its opposition to American independence. He writes, “Britain, at the expence of three millions, has killed 150 Yankies this campaign, which is 20,000£ a head; and at Bunker’s Hill she gained a mile of ground, half of which she lost again by our taking post on Ploughed Hill. During the same time 60,000 children have been born in America. From these data his mathematical head will easily calculate the time and expense necessary to kill us all, and conquer our whole territory.”

In Franklin’s writing, “data” refers to quantitative facts gathered through observation and collection and subject to mathematical analysis. That Franklin might use the term “data” so casually strongly suggests that he took his usage to be transparent, if perhaps still neologistic. In the mid- and late eighteenth century, Hammond’s usage was waning, and the sense employed by Franklin was catching on.

The term “data” appears in a wide variety of contexts in eighteenth-century English writing. But what were these early usages? What was their importance in the language and culture of the eighteenth century? And what was their connection to the usages familiar today? It is a bit surprising that we don’t know more of the answers to these questions already. During the past decade, we have seen excellent books published on related terms including “fact,” “evidence,” and “truth.” Lorraine Daston has referred to this kind of scholarship as historical epistemology, the study of the conditions of knowledge in different periods.

In this field, “data” has a complex and dynamic role to play. A “datum” in English is something given in an argument. This is in contrast to a “fact,” which derives from the Latin verb meaning “to make” or “to do,” so that a “fact” is that which was done, occurred, or exists. The etymology of “data” also contrasts with that of “evidence,” from the Latin verb “to see.” There are important distinctions here: facts are ontological, evidence is epistemological, data—something given in argument—is rhetorical.

This distinction was essential to the early modern usage, and it remains so today. For early modern theologians, statements given in scripture were “data” in the most fundamental sense. Because they were known to be true, they were not to be put into question. In early modern usage the term “data” was also widely used to refer to values given in mathematical problems. These data were to be taken for granted because they were given arbitrarily; behind them, there was no underlying truth to discover. “Data” in these two realms was given in the same sense but for precisely the opposite reason. In both cases, the label “data” served to distinguish the propositions or values in question from facts that could be profitably interrogated.

In early modern usage, a datum could also be a fact, just as a fact could be evidence. But, from its first formulation, the term “data” was useful—and distinct from these other terms—because it set to the side the question of referential truth. In this respect, the contrast with “facts” is particularly revealing. For facts to be facts, they must be true. Data, on the other hand, may be—and very often is—erroneous or confected. None of this affects its status as data. Facts proven false cease to be facts. Data proven false is false data.

From the beginning, then, “data” had this peculiar and powerful character. Its ontology was forward looking. Yes, “data” of the sort that Franklin discusses could be more or less accurate. Yes, “data” that claims to represent things in the world can and should be interrogated in these terms. Yet what makes “data” data is that one may operate on it without posing that question. As we have seen in Hammond, in early modernity, “data” and “facts” were as often as not conceptualized as contraries. In the age of Cambridge Analytica, this fact feels arrestingly relevant.

In the mid-eighteenth century, the terminological water became muddier. Over the following two centuries, “data” mattered more and more, but mostly in fairly restrictive scientific and bureaucratic settings. During this period, “data” and “facts” seemed less opposed to one another, and finally, in many situations, they could be substituted for one another freely.

To a great extent, this situation endured all the way until the mid-twentieth century. Here the Google ngram is right on the money. Throughout the nineteenth century and into the twentieth, the term “data” was used more frequently in more contexts. Its history in this period roughly parallels that of “facts” and for good reason: in contrast to the seventeenth- and early eighteenth-century usage, from the *Enlightenment forward, scientific givens were increasingly expected to be factual.

And then something changed again. With the emergence of information theory and electronic computing, a more general terminological need arose. Just as in the seventeenth century, in the second half of the twentieth, through to the age of derivatives, it became important to distinguish between facts and givens. This second time around, the need arose because some term had to describe the values on which computation is effectuated independent of any question of representational truth.

Here, then, is something that is hard to perceive from the quantitative data on language that one may extract from a resource such as the Google Books Ngram Viewer, which seems to make “data” a twentieth-century development. One part of the story the Google data tells quite well: as an idea in our general culture, “data” matters more in the nineteenth century than in the eighteenth, more in the twentieth than the nineteenth, and, at the end of the twentieth century, more than ever before. Another part of the story, the Google data misses entirely: as a cultural tool and an intellectual razor, “data” matters now in a way that more resembles its importance in the seventeenth century than at any time since. From this perspective, this is a story of profound epistemological circularity.

Surprisingly, then, in the case of the history of “data,” the OED is right, and Google is wrong. Or at least, Google on its own is not very helpful on the question. There are definite trends in both the currency and the usage of the term “data” prior to the late modern period. But Google, with its massive resources, does almost nothing to make them visible.

For the moment, it is a win for nineteenth-century scholarly practices, but it is not a victory that is likely to stand for long. Even the venerable OED is moving to embrace a data-driven approach, which must be as good a signal as any that we should be ready to engage with quantitative approaches in the *humanities in a strong, critical fashion.

In the end, what does the history of the term “data” have to tell us about data today? There are a number of possible answers to this question, but one is worth particular attention: from the beginning, “data” was a rhetorical concept. “Data” means—and has meant since the beginning—that which is given. As a consequence, “data” serves as a kind of historical and epistemological mirror, showing us what, in any period, we take for granted. Without changing meaning, over the course of time, “data” has repeatedly changed referent. It went from being reflexively associated with things outside of any possible process of discovery to the very paradigm of what one seeks through experiment and observation.

Because data matters so much in our world, it is tempting to want to discover its essence, to define exactly what kind of fact it is. But this misses the most important reason why the word is useful. The data concept was innovated as a way of setting aside the question of facts. It reemerged at the center of our general culture as it came more and more to produce facts of its own.

Daniel Rosenberg

See also computers; cybernetics/feedback; databases; digitization; quantification; storage and search

FURTHER READING

  • Elena Aronova, Christine von Oertzen, and David Sepkoski, eds., “Data Histories,” special issue of Osiris 32, no. 1 (2017); Soraya de Chadarevian and Theodore M. Porter, eds., “Histories of Data and the Database,” special issue of Historical Studies in the Natural Sciences 48, no. 5 (2018); James K. Chandler, Arnold Ira Davidson, and Harry D. Harootunian, eds., Questions of Evidence: Proof, Practice, and Persuasion across the Disciplines, 1994; Lorraine Daston, ed., Science in the Archives: Pasts, Presents, Futures, 2017; Lorraine Daston and Peter Galison, Objectivity, 2010; Lisa Gitelman, ed., “Raw Data” Is an Oxymoron, 2013; Mary Poovey, A History of the Modern Fact: Problems of Knowledge in the Sciences of Wealth and Society, 1998; Daniel Rosenberg, “Data as Word,” Historical Studies in the Natural Sciences 48, no. 5 (2018), 557–67; Steven Shapin, A Social History of Truth: Civility and Science in Seventeenth-Century England, 1994.