Paper Trails

Note on Methods

Paper Trails is a work of digital history, a field that revolves around the use of computational methods to understand the past. This approach produced new findings and interpretations about the nineteenth-century American state and the western United States, while also allowing me to communicate those findings through maps, charts, and other data visualizations. In the interests of readability, I have tried to avoid detailed technical discussions about the data and methods of this approach within the main text of the book.

Many of the maps in this book rely on a particular dataset: post office records transcribed by the philatelist and postal historian Richard Helbock. The Post Office Department recorded the names of all post offices in the country and the dates on which they were established, discontinued, or reopened, along with any changes of name. The vast majority of this information is held by the National Archives as microfilmed “Records of Appointment of Postmasters.” Helbock spent years poring over these records and consulting other local sources in order to build a dataset of every post office that existed in the United States, including their names, the states and counties in which they were located, and their dates of operation. Helbock passed away in 2011, two years before I discovered his work and purchased a CD-ROM of the dataset from his wife, Catherine Clark. I am indebted to Helbock, without whom this book would have been impossible.¹

The Helbock dataset is a remarkable source of historical information, but it required several additional steps to turn it into a spatial dataset. This involved a process known as geocoding, or assigning geographical coordinates to each post office so that it could be placed on a map. Instead of trying to locate all 166,000 post offices by hand, I wrote a computer program to do this automatically. Generally speaking, the program took the name, county, and state of each post office and attempted to locate a corresponding record in the Geographic Names Information System (GNIS), a database of several million geographical features compiled by the US Board on Geographic Names. This process allowed me to locate roughly two-thirds of the post offices in Helbock’s dataset, which I then used to generate the maps in this book.²
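For readers curious about what such a matching step might look like, the following is a minimal sketch in Python rather than the program actually used for this book. It assumes a simple exact match on normalized name, county, and state, and it treats ambiguous matches as unlocated. The Feature fields, the helper names, and the two gazetteer entries (with made-up coordinates) are all hypothetical and stand in for real GNIS records.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Feature(NamedTuple):
    """One gazetteer entry (hypothetical fields, loosely modeled on a GNIS record)."""
    name: str
    county: str
    state: str
    lat: float
    lon: float


Key = Tuple[str, str, str]


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so spelling variants align."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def build_index(features: List[Feature]) -> Dict[Key, List[Feature]]:
    """Group gazetteer features by their normalized (name, county, state) key."""
    index: Dict[Key, List[Feature]] = {}
    for feature in features:
        key = (normalize(feature.name), normalize(feature.county), feature.state.upper())
        index.setdefault(key, []).append(feature)
    return index


def geocode(name: str, county: str, state: str,
            index: Dict[Key, List[Feature]]) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) only when exactly one feature matches; otherwise None."""
    matches = index.get((normalize(name), normalize(county), state.upper()), [])
    if len(matches) == 1:
        return (matches[0].lat, matches[0].lon)
    return None


# Invented entries for illustration only; these are not real GNIS records.
GAZETTEER = [
    Feature("Cedar Gap", "Example", "CO", 39.5, -105.5),
    Feature("St. Mary's", "Sample", "NV", 40.0, -116.0),
]
INDEX = build_index(GAZETTEER)

located = geocode("St Marys", "Sample", "NV", INDEX)    # punctuation variant still matches
unlocated = geocode("Lost Camp", "Example", "CO", INDEX)  # no gazetteer entry, so None
```

A match this strict will miss offices whose names were spelled differently or whose counties later changed, which is one plausible reason a sizable share of records would remain unlocated under any approach of this kind.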

There are limitations to this approach. Most obviously, a lack of geographical coordinates for such a large portion of the data hinders precise quantitative analysis, such as measuring the exact geographical distribution of post offices in different parts of the country. I have largely avoided this kind of analysis and instead used the dataset to visualize spatial patterns more generally. Even so, a map that is missing thousands of post offices can paint an equally misleading picture. To mitigate this, I assigned semi-random coordinates to each post office with a missing location, but confined these coordinates to an area within the surrounding county’s boundaries. These semi-random post office locations only appear on national maps of the contiguous United States and regional maps of the American West, and are indicated by a lighter, semi-transparent shade of gray in order to convey their uncertain status. The accuracy or inaccuracy of any individual post office is largely indiscernible to the human eye at this scale and with this symbology. In aggregate, the inclusion of these semi-random post office locations provides a more accurate impression of the network than if I had left them out entirely.
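One common way to place a point at random inside a county boundary is rejection sampling: draw a candidate from the polygon's bounding box and keep it only if it falls inside the polygon. The sketch below illustrates that idea in Python. It is an assumption about how "semi-random" placement could work, not a description of the procedure used for this book, and it treats coordinates as flat x and y values, which is a simplification for longitude and latitude. The L-shaped "county" and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), for example (longitude, latitude)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle 'inside' each time a rightward ray from pt crosses an edge."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_point_in_polygon(polygon: List[Point], rng: random.Random,
                            max_tries: int = 10_000) -> Point:
    """Draw uniform candidates from the bounding box until one lands inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    for _ in range(max_tries):
        candidate = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(candidate, polygon):
            return candidate
    raise RuntimeError("no interior point found; is the polygon degenerate?")


# A made-up L-shaped county; the square where x > 2 and y > 2 lies outside it.
COUNTY = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0), (2.0, 4.0), (0.0, 4.0)]

rng = random.Random(0)  # fixed seed so the placeholder locations are reproducible
placeholders = [random_point_in_polygon(COUNTY, rng) for _ in range(200)]
```

Fixing the random seed means that a placeholder office lands in the same spot each time a map is regenerated, which keeps successive drafts of a map visually consistent.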

Several other features of this dataset should be kept in mind. First, individual post offices changed names with surprising frequency in the nineteenth century. Helbock recorded these events in his dataset as if the existing post office with the old name had closed and a new post office with a new name had opened. Because of this, maps of “established” or “discontinued” post offices in this book include some post offices that had simply changed names. Second, in addition to changing names, an individual post office might repeatedly close and then reopen within the span of a few years or even months. Rather than recording every one of these closures, Helbock only recorded “permanent” closures, which he defined as an office remaining out of operation for at least ten years. More fleeting post office closures and reopenings are not represented on the maps. If anything, then, the maps understate the western postal system’s instability—one of the book’s core arguments.

I relied on several additional datasets to generate the maps and charts in this book. Claudio Saunt generously provided spatial data that tracks the changing boundaries of unceded Native land and government reservations over the nineteenth century. I then made some minor corrections to the shapefiles in this dataset. State and territory boundaries come from Lincoln Mullen and Jordan Bratt’s invaluable “USAboundaries” R library, based on shapefiles produced by the National Historical Geographic Information System. Any other sources of data used in a specific map or chart have been noted in its caption. Finally, unless otherwise stated, all the maps and charts in this book were generated using the R language for statistical computing, with a particular reliance on the “tidyverse” and “sf” libraries.³