Todo puede ser, respondió Sancho, mas yo sé que en lo de mi cuento no hay mas que decir: que allí se acaba do comienza el yerro de la cuenta del pasaje de las cabras.
—Sancho Panza to Don Quixote
After reducing Ulysses to so many sums, ratios, totals, and percentages, it’s only fair to end with some reflection on the other inescapable truth that literary numbers bring: the miscounts. Poor Sancho, cited in the epigraph above, is not alone in his plight and “mistakes in counting” (el yerro de la cuenta) are not restricted only to stories that you tell about “goats” (las cabras). The term miscount, as I’ll be using it, suggests that something went wrong along the way, either during the collection stage, when the data was first identified and assembled, or the calculation stage, when the numbers were run. But here, I’m also playing around with the concept of missed counts, the ones connected to a critical tradition that has chosen to ignore the presence of both precise and approximate numbers. Since the publication of Ulysses a century ago, a few of the numerical puzzles have been picked up, but they are never treated as anything more than the expression of light-hearted authorial pranking: seven letters in the title, fifteen lines for a letter by the fifteen-year-old Milly Bloom, twenty-two words in the opening line, 366 pages marking the middle of the book and end of daylight, and so on.
To this day, Joyce remains in control of the abacus for readers who may or may not have the capacity (or desire) to keep count as they read. But “Ulysses” by Numbers is one attempt to refocus our attention on all the numbers, including the ones to which we barely give a thought. They are there, but it takes some work to figure out not only how to operationalize them but also how to use them for more substantive interpretation of the novel. Think again of the quasi-paradox identified by William James: “something comes by the counting that was not there before. And yet that something was ALWAYS TRUE.”1 James is referring here to the discovery of planets that preexist computation, but the same is true for the words, paragraphs, and characters in Joyce’s novel. They will always be there, but the act of counting these words, paragraphs, and characters makes it possible to discover something about the structure, design, and pacing of Ulysses we didn’t already know even if it was ALWAYS TRUE.
Miscounts are built into the computational process, and anyone working with numbers knows that they can appear without warning and derail any quantitatively inclined investigation. Working at a much larger scale, Ted Underwood, for instance, admits that there are “thousands of errors” in the data he used to examine genre, literary prestige, and gender, with the added caveat that the error can be measured but not eliminated.2 Not only would some of the data used in his investigations be inaccurate, but Underwood also expects that his “book will be wrong on some topics.”3 Still, for Underwood or anyone else working with numbers, miscounts are not always as disruptive or undesirable as they might seem. There are the more blatant errors, which Stephen Dedalus, in a moment of defensive optimism, calls “portals of discovery,” but others are harmless enough to warrant a mere shrug of the shoulders (U, 156). Does it really matter, for instance, that Joyce counted 138 characters in his list when there were 140 or that the episodes he first delivered to the Little Review weighed in at about 6,000 words, give or take a few hundred, with no agreement from his critics about the final tally? It certainly wouldn’t seem so, and it’s hard to imagine why anyone would need such exact calculations down to the last character or word. But this tension between accuracy and approximation is on display in all of the examples I focus on. At some point in the analysis, the questions about accuracy arise: Just how many characters make an appearance in Ulysses? When, exactly, was Ulysses written? Each answer has a number with a qualification. There are 592 characters in Ulysses, but only if you count those present (i.e., in the plot on June 16, 1904). It took Joyce seven years to write Ulysses, but only if you ignore the fact that he was generating new content before 1914 and after 1921.
Miscounts, then, can be another way of describing the nature of literary quantities: they can be in the ballpark but still imprecise. And this tension between approximation and precision, accuracy and inaccuracy, ballpark and left field, can complicate how we imagine what it is we’re looking for in the first place. Approximation is often all we have. It’s not inaccurate, but it’s not precise either. The in-betweenness is what makes the approximation more like a miscount. It’s not wrong, as the general definition of the term implies, but it could be refined further, made more precise. Whenever the measuring begins, questions about scale will emerge, and they will involve defining the limits of knowledge, deciding when enough is enough based on the available evidence at a given time.
So let me return to the line with which I began this book and provide a qualification: numbers are everywhere in Ulysses, but they are not always precise, and the motivation for discovering them is by no means self-evident. Take the addresses of those subscribers linked to the one thousand printed copies as an example. Though a widely known fact in the novel’s publication history, the numbered copies are not what they seem. Not only were there more copies printed than advertised, but the task of pinning them down to a person or location (then and now) also proves incredibly complicated. Different numbered copies and addresses could be found with some archival digging, but the more important takeaway involves the pointed questions that these breakdowns raise about the reception history of the novel. We can identify, more or less, in which cities, countries, and continents a majority of the copies ended up, but then what does the incomplete distribution across the various locations around the world tell us? For starters, the maps generated from ArcGIS, a platform that visualizes spatial data, make us see what the audience looked like at a specific moment in time and within the representational parameters of a geographic information system. They are two-dimensional diagrams documenting the gradual dispersal of a reproduced object that would, over time, generate a truly global audience but never as quickly or seamlessly as we’ve been led to believe.
The data points connected to the circulation of the individual copies, however, are there to reaffirm the disjuncture that exists between the reality of this moment in literary history and its rapid transformation into myth. Ulysses, so the stories go, was either seriously limited in its distribution around 1922 (when still banned in the United States and the United Kingdom) or it was everywhere all at once. And if there’s a miscount here, it is not because the number is wrong. Rather, the dataset is incomplete so conclusions based on it will be provisional. We have collected as many addresses as we can, but that still does not include all of the recipients, particularly those who got their copies by proxy or through bookstores or the many that still remain off the radar.
For some critics, approximation might be enough to make them want to steer clear of computational reading. That’s unfortunate because it misunderstands what function these approximate numbers can serve. They are not the answer to the question, as I’ve said many times already. Instead, they can be put at the service of a close reading that privileges context. Approximation in an interpretive exercise of this sort is an opportunity to consider why it’s one particular number at a specific moment in time and in a specific place and not another, a way of understanding why certain quantities or figures or percentages or ratios appear when they do.
Context needs approximation. That’s one of the lessons I’ve learned. Every one of these chapters involves sets of data that can be calculated. In some cases, my efforts were hampered by the lack of access to information, but even when I had everything I needed, it was often the novel that disrupted the calculation, in large part by refusing to make a clear designation of what counts or by what measure: What is a character or a word or a paragraph or a reader/subscriber? These categorical distinctions are not just confined to basic linguistic units or typographical, syntactical choices; they involve more expansive questions about style and authorship, audience and writing time. Thinking with the fuzziness of numbers is, in its own way, to engage with the work as a critic who is in a position to weigh, evaluate, and determine what can and cannot be measured and to what end.
This fuzziness, then, is a condition of knowledge, one that tells us how we as readers and critics engage with certain kinds of literary facts. I use these numbers to read Ulysses closely from a variety of angles, but there may be others in the future keen to put them at the service of a distant approach in the hope of perhaps discovering laws about the novel, modernism, or the twentieth century. I’m not convinced that would be the best use of the numbers, and that’s because an expanded sample loses the grain of the critical and creative context. Five hundred and ninety-two characters or six-thousand-word episodes are meaningful to the way we read not because they are like or unlike so many other novels. These numbers indicate their singularity: together, they are evidence of an organic process by which the novel came into the world. Reading with and against them gives us the chance to encounter that burst of imagination coming to terms with the urgency of the fictional plot as it was being written, discoveries of experimental techniques for representation of character, time, and space, and even the inevitable aging of the writer, an experience with time that shapes how any work of art comes into being before it is let go.
Coming up against miscounts has taught me something else. Far from being an unwelcome element in the process, the miscounts are the expression of an imprecision, vagueness, and inaccuracy that belongs both to the literary object and to literary history. In saying that, I probably don’t need to remind you that literary criticism is not a hard science with the empirical as its goal, and critics do not need to measure the value of their arguments against the catalogue of facts that they can, or cannot, collect. The theories generated from approximations and small samples have been used to generalize about all kinds of ideas, processes, and things, including genre, form, reader reception, historical, social, and political contexts, etc. If all of these theories had to be based on the largest-known samples or corroborated by empirical facts, then many of them would simply not exist. The size of the sample and the desire to work approximately have made literary criticism possible as a field. That is what initially jump-started the computational turn with critics beginning to wonder if the availability of digital archives, which provided access to more and more of the “great unread” (Margaret Cohen’s phrase), would corroborate or disprove so much of what we think we know about literature.4
I’m not making any new claim here, but at a time when computational approaches to literature generate so much enthusiasm and skepticism, we need to be honest about the nature of the measurements we’re taking in the first place and what we expect to do with them. Any honest evaluation will include thinking about the tools and methods implemented along with the visualizations that have become so central to the interpretation of literary data. In the process, we need to address an equally basic question regarding literary measurements: What makes them valuable for critical analysis? If I read Ulysses by numbers all the while embracing this ambiguity, imprecision, and approximation, then what is the payoff? Wouldn’t it be easier to return to the good old precomputational days when all the numbers were more or less kept at bay?
In the course of writing this book I’ve thought deeply about both questions and have come to the conclusion that we can miss the counts but only at our peril. They may not upend a century’s worth of interpretations about Ulysses (and, really, how could they?), but they contribute significantly to the ongoing critical investigations about what this novel is made of and, on a related note, how it happened. As things stand now, we still rely heavily on the biographical, historical, and genetic contexts to provide answers—some more convincing than others—and all of them focused on the life and work, with an intense appreciation for those moments when the two intersect. To know Ulysses, for better and for worse, means to try to know what Joyce was doing, thinking, reading at specific moments in time, and to this day, there is still a great deal of critical admiration for uncovering the original sources behind the novel.
The so-called genetic approach has been firmly in place since the 1970s when a massive amount of archival material was made available in bound, photocopied volumes, but it has gained momentum in recent years not only because of the discovery of new materials but also due to the widespread proliferation of digital documents. A relatively small group of critics continue to sift through every scribbled sheet searching for answers to questions about the novel’s development, and though the focus can vary, they all have one goal in mind: to demystify the process by which words, sentences, paragraphs, pages, episodes, and characters were constructed over a relatively short period of time. What the genetic critics want most of all is concrete evidence to weigh decisions and check conclusions.
Given the wider availability of these resources, along with other digital platforms for the novel, readers are increasingly in a position to read genetically. And this is really an opportunity to read computationally. That’s what Gabler and his team did as they learned to navigate the different numerical systems holding the novel together. The major difference now involves addressing not only where those numbers are but also how we might find and process them. Working with numbers today may still involve asking the old intentionality question (Did Joyce or didn’t Joyce put them there?), but that’s a dead end. He mobilized some to organize his ideas, but there are just too many to keep track of. What’s more generative is an approach that lets us consider all the ones he was completely oblivious to. I used the term numerical unconscious before to describe a compositional process that was out of Joyce’s control. That’s not to say there was no intention guiding them. In this case, I use the term to identify forces, processes, and impulses that get channeled into the structure of a work without any conscious deliberation. And that gets us to think about the creative process not in terms of what he meant to do or not do with numbers. Rather, it is another way of thinking about what he did without having any idea it was even possible.
Gertrude Stein points out after Seneca that counting may be the “only difference between man and animals,” but it is not a meaningful activity in and of itself.5 In fact, it’s too often the case that instead of revealing meaning, the measurements mask its absence, bringing to mind T. S. Eliot’s stern warning that the critic’s tools should be “handled with care, and not employed in an inquiry into the number of times giraffes are mentioned in the English novel.”6 Just because you can count the giraffes does not make them worth counting, and that was the problem Kenner had with the “empirical variousness” of Richard Kain’s early book on Ulysses: “Doubtless he couldn’t have told you why it mattered that a huge novel had been grounded in Thom’s Directory, or why a mimeographed Word-Index was a useful portal of access (That’s not how we gain access to David Copperfield).”7 Take the measurements of Ulysses all you want was Kenner’s advice, but at least try and explain why it’s worth the effort.
Which brings me back to the miscounts. Try as readers might with the punch cards, calculators, or computers, the numbers will not always add up. This was as true with the 150 characters Kain first counted (a perfectly round number that he refused to document for later recounts) as it was with the unsubstantiated claim embraced by Joyce and his critics that Ulysses “grew by one third in proofs,” or the number of copies logged by Darantière when sending the book crates to Beach, or the number of responses from the U.S. librarians collected by Morris Ernst’s legal team.8 In the process of identifying so many miscounts, I was just as guilty. Counting can be an incredibly time-consuming process that carries on for several years as the data gets cleaned and reorganized. And even when all the data points seemed to be accounted for, the tallying did not always go as planned—and this was as true with the 106 characters of episode 10 (or was it 100 characters?) as it was with the 63 (or was it 62?) paragraphs of episode 14.
Like it or not, counts can be accurate even when they are close enough, almost there, just about, and have the ring of rightness about them that Kenner relished. Alexandre Koyré, a historian of science, called them the à-peu-près, or “more or less,” of scientific measurement before the precision of modern science arrived.9 Not only were individuals such as Galileo and Newton helping to transform the laws of the universe into a catalog of mathematically supported facts, but this persistent desire for precise measurements in the centuries that followed also effectively downplayed the relevance of things that were either left unmeasured or remained stubbornly approximate.
When tracing the passion for precise numbers, Koyré focuses on the moment when Galileo experimented with the laws of motion using an inclined wooden plank, a bronze ball, and a pendulum. Galileo, he points out, was on the right track with his theory about motion, but the numbers he compiled to document the acceleration (s = ½at²) were all wrong because there was no accurate way to measure time.10 Still, for Koyré, the experience of the scientist with imprecise instruments and inaccurate numbers serves an important lesson: if it’s only numbers you want, then you can miss out on the rest of what’s happening in the experiment. The world may get measured by people searching for laws, but it is rare that the measurements alone, even the most accurate ones, can lead to a theory of anything.
The history of science, he reminds us, is filled with hunches, flashes of intuition, and general theories before there were any accurate numbers to support them, raising the question: What comes first, the theory or the measurement? Thomas Kuhn believes that Galileo was the kind of genius who could “leap ahead of the facts,” but when considering the broader history of measurement, he argues that the numbers get generated either to discover something that the scientist intuits or to confirm something the scientist already knows.11 Without a theory or law or even “some knowledge” leading the way, the numbers, he argues, remain “just numbers.”
Considering the nature of the arguments I work through in this book, Kuhn’s conclusion, when I first read it, gave me pause. How much knowledge is enough? And when are literary numbers just numbers? I never set out to discover any general laws about Ulysses or the novel as a genre, and I was never guided by the desire to try and confirm (or deny) the validity of a single theory or critical approach. All of the measurements in the preceding pages rely on the data of a single novel, which means, of course, that the conclusions, when there are any, are not generally applicable, thereby making the title—“Ulysses” by Numbers—all the more ironic. If Ulysses was, at some level, written by numbers, reading by them with historical sensitivity and methodological self-awareness and with the hope of understanding what that act of writing meant in its own time and in ours actually requires a lot of context. The numbers on their own are never enough when faced with this kind of challenge. What’s more, nothing particularly meaningful will come of readers armed only with punch cards, a calculator, or a computer, counting paragraphs, for instance, without ever having understood what’s inside them.
In computational literary criticism the big data is part of the allure: Who needs one novel when you can run the numbers on a corpus of 100,000 or more? Computational literary analysis has been making promises on the big data, with studies involving “millions of digitized books,” “2,958” nineteenth-century British novels, and essays by thirteen thousand scholars. None of the people running the numbers actually read all the words, and that seemed to be the beauty of it: there was no need to. The data, once processed and visualized, would provide an opportunity for macrointerpretation. The critical move from the particular to the general using empirical data signaled a break from the close reading that traditionally involved smaller samples, and none of them were organized by number: no seventeen novels for the seventeenth century or 290 French poems for the future of poesy. But to stay on topic here, there’s no way that the exact numbers of these much larger samples really matter. If there are 1.5 million books or 2,500 nineteenth-century novels, nothing will change in the analysis or the conclusions. The point here is that these sums render the specificity obsolete: what matters is that the number is very large, which means, by extension, that the conclusions derived from that very large number will be widely applicable.
Or maybe not. There are many still hoping that the laws of the literary universe are there waiting to be discovered in big data sets. Until then, the small data is there like William Blake’s grain of sand waiting for a universe to be projected. But in identifying what the benefits of small literary data might be, it’s worth returning once more to that distinction between literary numerology and humanities computing. The 1960s saw the brief coexistence of the two, the numerologists excavating texts for hidden patterns while the computer humanists (many of them from fields other than literature) were busy compiling concordances, indexes, and experimenting with translation. In these early days, the numerologists were the ones reading by numbers, the computer humanists focused more on the tools that would keep readers with their noses in the codex. Together they reveal different attitudes about where the numbers might be found and related questions involving where they originated.
The origin of literary numbers distinguishes the numerologist and computer humanist in some provocative ways. If the former wants to see the numbers as part of a preexisting interpretive system, the latter wants to show that the system itself is not made by God or man. Even in the early days when the concordances were a primary consideration, there was already the idea in place that these machines can reveal a structure no one knew existed, and it was connected, of course, to the idea that meaning in the literary work is not guided by authorial intention alone. The numerologists lost the battle over literary numbers in the end, but the efficiency of the computer is not the only reason. As different kinds of programs and platforms were created, the computer was increasingly able to perform complicated tasks on literary works. Instead of only saving time with their magnificent indexing capacities, they could be adapted to generate editions involving multiple levels of revision, and with digitization would soon be running programs on works that had been transformed into strings of numerical code.
If numerology was once tasked with locating numbers supposedly hidden within and behind the pseudosacred words, paragraphs, and pages of literary works (always the most canonical, of course), the computer humanists were literally transforming these same literary works and words into numbers. Consider, for instance, an early description of this process from a paper in 1967 with the blandest of titles, “The Computer and the Humanist.” “The computer holds data in a form which can be thought of as numbers, and sentences of text or lists of musical data can, if desired, be added up arithmetically at any moment.” But there’s more: “Since we can move data around in a computer and compare, say, each letter with ‘blank’ to see whether we are at the end of a word, we need never think of our data as numbers, even though the machine treats the data as such.”12
We need never think of our data as numbers… That was bad advice back in the 1960s and remains so today. Behind this early coupling of man and machine was a desire to pretend that the humanists were operating in a realm beyond the numbers, with all the computational grunt work left for the computer alone. Much has changed, in part because so many of the programs that we have at our disposal make the numbers a primary and not a secondary consideration. Network analysis, geographic information systems (GIS), the word and paragraph counts, and timelines are all part of a computational approach that needs to see the literary data as data (to see the words, characters, subscribers, paragraphs, and years of composition as numerical sequences) precisely so that we can then work our way into various qualitative considerations involving the who, what, where, when, and why of Ulysses.
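The 1967 description quoted earlier can be made concrete with a minimal sketch, written here in Python purely for illustration. It assumes that a “blank” is a single space character and that the numbers the machine holds are character code points. The function names `count_words` and `as_numbers` are hypothetical, and nothing here reproduces the program the paper’s authors had in mind.

```python
def count_words(text: str) -> int:
    """Count words by comparing each character with a blank (a space)."""
    count = 0
    in_word = False
    for ch in text:
        if ch == " ":
            # A blank means we are at the end of a word (or between words).
            in_word = False
        elif not in_word:
            # First non-blank character after a blank: a new word begins.
            in_word = True
            count += 1
    return count


def as_numbers(text: str) -> list[int]:
    """Return the text as the machine holds it: one integer per character."""
    return [ord(ch) for ch in text]


phrase = "portals of discovery"
word_total = count_words(phrase)    # words found by the blank comparison
codes = as_numbers(phrase)          # the same phrase as a list of integers
arithmetic_sum = sum(codes)         # text "added up arithmetically"
```

The blank comparison never asks the reader to look at an integer, which is exactly the convenience the 1967 paper advertised; `as_numbers` shows what that convenience leaves out of view.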
In one of the most sophisticated, and scathing, critiques of data-minded humanists, Nan Z. Da concludes that “CLS [Computational Literary Studies] has no ability to capture literature’s complexity.”13 Though I’m certainly sympathetic to much of what she says about the overblown claims, misleading evidence and misuse of statistical tools at the big-data level, I’m less inclined to agree with the wholesale dismissal of an entire critical approach, one that I have insisted on associating with a longer history of a computational literary criticism with philological roots. No, I don’t think a revamped numerological criticism is a viable alternative, but there is a need to continue weighing the benefits and limits of quantitative literary measurement more generally and at a much smaller scale. Computational analysis is not opposed to literary complexity. It is a way into that complexity, in part, because the object itself resists facticity and statistical modeling, but what’s more it has the potential to benefit from a long humanist tradition that imagines not just the object but the practice as something with genuine interpretive power. The numbers are not incidental, and they do not serve as a deviation from qualitative analysis. Instead, they are part of what makes literature stubbornly abstract and occasionally rebarbative.
And in saying this, I’m acutely aware that the idea of counting at this or any other moment means nothing if not guided by an awareness of what concepts such as computation and literature mean in specific times and places. Like it or not, the literary object has changed irrevocably. The book will continue to stick around, but digital procedures and practices have transformed how we as readers can continue to interact with it. In the meantime, a literary mode of computation has moved far beyond the concordance and the punch card, but realizing its potential requires a lot of fine-tuning and experimentation. In my extended computational close reading of one of literary history’s most canonical works, I demonstrate that there is much to be gained from new and old numbers alike, many of them belonging to quantities we miscounted and missed counting. But as you probably noticed, those numbers, once found and corrected, are only the beginning. They require reading against the grain of Ulysses, largely because a particular kind of critical distance is required, and it is one that could never have been achieved through the navigation of words alone.
But that distance has always been there. Try as readers might to bridge the gap, Ulysses will remain a work in progress, a novel left behind for other generations to finish. Reading by numbers is one way to recover some of the mystery behind the creative process. Every Man a Joyce, indeed! Far from being a coherent method that gets applied mechanically from without, it is part of a reading experience that happens from within, allowing us to pause and consider those moments when the outline of the structure itself becomes visible before disappearing again. In the end, that is no small feat, since reading is a practice encouraging all of us to try and imagine what it might have been like to build something so monstrous, beautiful, and complex using only a pen and paper. Writing in the 1930s, it was Pound who observed that the “careful historian of the 1910s is not yet busy in numbers.”14 Reading Ulysses by numbers a century later, it makes you wonder what took so long.