The Linguistics Wars

Chomsky Agonistes

With the growth of a “Chomskyan era,” linguistics has definitely become a discipline worth breaking heads over.

—Anonymous (in Chomsky 2000d: 39)

Rust never sleeps.

—Neil Young (1979)

Chomsky, as the 800-pound gorilla in linguistics, is “the one whose work every student has to deal with, and whose dominance in the field is often compared to the relative dominance of people like Einstein and Freud” (Hughes 2006 [2001]:83). This might be the best one can do with a cliché trying to express his significance to linguistics, but there is more to it than that, which requires some baroque genealogical imagery. Chomsky is the 800-pound gorilla who has defined and then populated his own jungle, begetting troops upon troops, generations upon generations, competing and coöperating with each other, coöperating and competing; all of them, all of the time, keeping an eye on every move of their enormous progenitor, who moves a lot.

Chomsky’s influence over the course of linguistics in the six decades he has been influencing it recalls an exchange in Hamlet:

Hamlet: Do you see yonder cloud that’s almost in shape of a camel?

Polonius: By the mass, and ’tis like a camel, indeed.

Hamlet: Methinks it is like a weasel.

Polonius: It is backed like a weasel.

Hamlet: Or like a whale?

Polonius: Very like a whale. (Shakespeare, Hamlet III.2)

“Do you see yonder cloud that’s almost in shape of a camel?” Chomsky asked linguists in the 1950s. “By the mass, and ’tis like a camel, indeed,” they said. They got to work, charting its humps and its underlying structure, theorizing about its ideal abstract nature, setting aside any consideration of the wind. Then, about ten years later, most of them were talking as if the underlying structure was the cloud-camel, a suggestion Chomsky then rejected as counter to the progress of generative meteorology, because the surface of the cloud is more important than its depth. And, wait, the whole shape seems to have changed, too. “Methinks it is like a weasel,” Chomsky tells them. Rinse, repeat. A decade later: “Or like a whale?” Rinse, repeat.

There is, of course, at least one important way that Shakespeare’s dialogue doesn’t map to the history of Chomskyan linguistics as closely as it might seem on the first pass. We have the same moody Hamlet throughout, but scores of conflicted and conflicting Poloniuses (Polonii?). Each time Chomsky goes through one of his mini-paradigm shifts, he litters his wake with what Jackendoff calls “disillusioned Kuhnian debris.”¹ The field is more rancorous, and the paradigmatic models more ephemeral than most. But it may also be richer than most, for exactly the same reasons. Discord and shifting grounds keep the practitioners alert at the post; whatever else can be said for linguistics, it is not a sleepy field. Language, it would seem, is far too complex to give up its secrets in one fell swoop by one fell linguist, however hawk-eyed.

Chomsky is always changing. There are landmarks when a “new theory” arises, camel becoming weasel, sometimes even acknowledged as such by Chomsky. But in every paper, every lecture, it seems he is trying something new, losing faith in something old, resurrecting something once abandoned. Take just his most totemic instrument for five decades or so, the engine at the heart of the Generative-Interpretive Semantics dispute, the transformation. It’s gone now. Conceptually, that would seem pretty dramatic. Eliminating transformations is what set off the cascade of Alphabet Grammars in the 1980s. But at Chomsky Inc., its removal twenty years later was just another day at the office. Quite literally, the last remaining transformation, Move, was given a new name one day, assimilating it methodologically into the new star in the program, Merge. We start getting locutions like “Internal Merge—often called Move . . .” in the literature, and then Move simply drops from the vocabulary (e.g., Chomsky 2005a:12, 2006a:21, 2007b:16). One is tempted to wonder at this point if the MP here becomes the EMP, the Extended Minimalist Program.

David Golumbia puts the number of Chomsky’s revolutions at five (2015:25). It might be more accurate to put the number at however many technical papers Chomsky has written (I don’t have the stamina to count), since even the major books don’t stabilize the churn of proposals and modifications and speculations, sometimes revising proposals in one chapter that have been offered in earlier ones. These equations, “Revolutions = 5” and “Revolutions ≥ 100,” may be matters of granularity more than anything, since every program goes through camel-weasel-whale shifting of some sort, especially in the crucible of debate, something endemic to Chomsky’s program. But, if we take the long view, and ratchet the word revolution up in science to Copernican levels, to mean the shift to a new, profound, field-defining, epistemological and ontological understanding of the object of inquiry—for us, Language—the only equations that matter are “Revolutions = 1” and “Revolutions = 0.”

We will return to those equations in the end, but Merge? What’s that? Where is Chomsky’s program now?

Minimalism, Biolinguistics, Recursion

There is surely no reason today for taking seriously a position that attributes a complex human achievement entirely to months (or at most years) of experience, rather than to millions of years of evolution or to principles of neural organization that may be even more deeply grounded in physical law.

—Noam Chomsky (1965 [1964]:59)

We saw how Chomsky’s GB (Government-Binding theory) eclipsed the other AGs (Alphabet Grammars) that arose in the 1970s and 1980s as a specific theory of syntactic representation, with the more generalized Principles-and-Parameters framework (P&P) sweeping in to sharpen the focus on Universal Grammar and chart language diversity. Zeal was running high. The vision, as Martin Haspelmath summed it up, was stunning: each and every single descriptive grammar of a language—all those great volumes, hundreds of pages most of them, all the way back to Pāṇini—might be replaced under Chomsky’s P&P program with “a simple two-column table with the parameters in the first column and the positive or negative settings in the second column.” Reducing these works “to one or two pages [each] would truly be a spectacular success.” Many of those grammars are staggering accomplishments of erudition and endurance and scientific method. The all-time record-setting grammar, apparently, is a tenth-century Arabic tome that clocks in at “1514 dense pages” (Owens 2000:287). Haspelmath concedes, however, that “we may still be a few decades (or more) from this ultimate goal” (Haspelmath 2008:80–81).

Talk even caught on of some parametric equivalent to the periodic table in chemistry arising from the P&P framework (Baker 2001 made the original suggestion; see, e.g., Boeckx 2006:3, Ludlow 2011:32, Moro 2015:93), and Neil Smith characterized Principles and Parameters as “the first really novel approach to language of the last two and a half thousand years” (2000:xi). The level of starry-eyed enthusiasm rivalled even the 1960s fervor for Transformational Generative Grammar. Chomsky’s position at the top was once again secure. Time to sit back, at the culmination of a stormy, astonishingly influential career, fold his hands across his tummy, feet up, and foster the programs he has founded, maybe even turn largely to political concerns and pass the linguistic helm to new generations.

But calm waters, apparently, are not calming waters for Chomsky, who sees himself perennially hunkered behind the gunwales, taking fire from all sides in a burning sea; nor is terminological stability his strong suit. Hence, perhaps, the Minimalist Program.

Or perhaps, as Chomsky recurrently suggests, the MP is just where you end up, after fifty years of thinking, if you start with Morphophonemics of Hebrew, his undergraduate thesis. Probably, given the data, it is both—the latest point of a continuous trajectory, but also a reactive leap, a response to pressures building up in the theory and in the theorist. Another saltation from an habitually saltationist thinker, but teleological all the same. Chomsky’s stated motivation is clear: to do what he does best, which is to turn linguistics on its head. The first four decades of generative grammar, as Chomsky now characterizes them (for instance, 2009:25), went at the problem of Universal Grammar from the “top down,” specifying as rich an account of our genetic endowment as language acquisition seems to require. The minimalist strategy is the reverse, coming from the “bottom up,” specifying as limited an account—as minimal an account—as required to determine language acquisition (and, latterly, also fit an evolutionary scenario).

Whatever the psychobiographical/intellectual-progress scenario—Colapinto (2007) just says “Chomsky’s system of rules [seems to have] reached a state of complexity that even Chomsky found too baroque”—the MP is Chomsky’s latest (final?) major reorientation of theory, stripping his approach down to its Generative-Semantics-reminiscent nubs.

The Minimalist Program appeared, to various measures of resigned sighing (“here he goes again”), incredulity by those who remembered the Best Theory dispute, and enthusiastic welcome, in a 1992 “occasional paper,” followed shortly by formal publication, and then expansion into a book (Chomsky 1992a, 1993a, 1995; also 2015a). It is less of a new direction than a step back, a retreat from the intertwined specificities of preceding work, especially from the rather tangled state of the P&P switchbox, to the ultimate big-picture Chomskyan view of language. That it resembles the Generative Semantics model is surely a matter of “accidental” convergence. No one supposes, not even Lakoff or Postal, that Chomsky was staying up late, under cover of night, cribbing from Postal’s “Best Theory” Homogeneous I diagram (Figure 3.3, page 103) for the architecture of MP (Figure 9.2, page 339). Virtually all models of language from high enough up look the same, certainly all mediational models; both Postal (1970) and Chomsky (1992a) are characterizing the same phenomenon, language, by way of its two endpoints, form and meaning. The move certainly raises questions, as so many of Chomsky’s moves do; in this case, why his hackles bristled over Best Theory argumentation in 1969, why he attacked it, why he turned away from simplicity for a time to a complexity appeal, and why go back now? But it does not suggest theft.

One answer to why we see the Minimalist Program in the 1990s—the clearest answer in retrospect—may be the shift in disciplinary focus toward evolutionary accounts of language, which Chomsky, playing catch-up for a change, has taken up over the last decade, with the Minimalist Program prominent in his claims and arguments. He has reduced the circumference of his gaze even further, down to the notion of recursion, and assumed a new label, Biolinguistics.

Biolinguistics is, in sync with many of Chomsky’s favored terms, systematically ambiguous, allowing him to change the scale of his arguments effortlessly. On the obvious scale, Biolinguistics merely names the study of “human language, a particular object of the biological world” (Berwick & Chomsky 2016:53). The label in this sense is virtually synonymous with linguistics. Aside from programs explicitly opting out of such a conception, as with the Katzian Platonists, it is difficult to see whose theories might be excluded from this definition. Despite this very broad definition, however, the word biolinguistics turns out in application not to be for everyone. In practice it is trademarked, co-extensive with Chomsky’s program.²

Chomsky backdates Biolinguistics to “a few graduate students at Harvard” in the 1950s (Chomsky 2005:1). He leaves them unnamed, but a bespectacled young man toiling on Logical Structure of Linguistic Theory in the Harvard stacks was chief among them, along with his friends Morris Halle and Eric Lenneberg. In this usage, Biolinguistics is the name of a field almost wholly circumscribed by their work. “The biolinguistic approach to language,” Boeckx says, is “often referred to as the ‘generative enterprise’ ” (2006:5; though see Boeckx 2015).³

The biolinguistic displacement of generative in recent Chomskyan theory might best be explained by the story of FOXP2. The tale begins with a Canadian linguist, Myrna Gopnik, and her brief 1990 letter to Nature with the inauspicious tag-line “Feature-blind Grammar and Dysphasia.” The letter sketches congenital language problems afflicting three generations of a British family. Half of them (16 of 30) have the same set of language impairments (dysphasias); a textbook Mendelian pattern. They develop language late, remaining largely unintelligible until about age seven, with difficulties in pronunciation persisting into adolescence. Hearing is normal, intelligence is normal, there are no relevant environmental factors, and by adulthood their articulation is fine. Their language use passes for normal. They tell stories and jokes, they tease and complain, make requests and responses, like anyone else. Gopnik’s testing showed that they follow complex commands (“Drop the yellow crayon on the floor, give me the blue one, and pick up the red one”); they have no problem with reflexives, negatives, and passives; they are fine with possessives. In all of these respects, and many others, they are just like those of us who did not acquire language late, nor produce garbled phonology for a decade or more.

But Gopnik kept looking. In one revealing probe, she put some objects before them, including a single book and a small stack of three books. Then she asked them (and a control group, in randomized tests) either to “touch the book” or to “touch the books.” The controls all touched the single book for the first request, the stack of books for the second. You and I would likely do the same. The KEs—as the family came to be known in the literature (they have now been much studied, much debated)—responded rather differently. They might point to a single book in the stack for the first request, or hesitate and ask “which one?” They did not automatically assume the one book on its own was the most natural referent for “the book.” For the second request, they always touched all four books, the single one on its own as well as the ones in the stack. They got the plural-identification tasks right, in other words, but in a roundabout and atypical way.

“Their normality is only apparent,” Gopnik notes. “They may have learned strategies for coping with language but their underlying grammar is still severely impaired” (1990:715).

She asked them to sort good sentences from bad sentences when some of the verbs were awry. The dysphasic KEs were worse than chance. Subject-verb agreement is a problem for them. She gave them something called the Wug Test, a brilliant little nonsense-word check for morphological regularity that toddlers pass with ease (Berko 1958). It involves completing the second member of sentence pairs like 1 and 2 reproducing the nonsense word and adding the right affix (plural for 1b, past tense for 2b):

The dysphasic KEs couldn’t do it.⁴

In short: we see grammatical effects in a small population following a clear genetic pattern. Debate ensued. Some researchers, including Gopnik, saw the KEs’ genetic glitch as triggering a grammar-specific disorder of morphology and syntax. Others saw a broad pattern of intellectual and motor impairments that implicates morphology and syntax almost accidentally.⁵ But the positions hardly mattered. Languages and genes had never been in such close scientific alignment.

Meanwhile, Chomsky was rekindling the public imagination again. Because of his tireless activism, he had become the patron saint of several major rock bands—Bad Religion, Pearl Jam, Rage Against the Machine, U2, R.E.M.—who quoted him on the stage and on the page, included his imagery in their CD packaging, his samples and full talks on their CDs. R.E.M. asked him to tour with them and give a talk before every show (he declined). Bono famously called him “rebel without a pause.”⁶

For his linguistics, he was also now celebrated widely as the grand theorist of the genetic basis of language, largely on the strength of Steven Pinker’s (1994) bestselling popularization of his theories, The Language Instinct. The book sold very well and its genial, telegenic author was interviewed hither and yon (in a cosmic synchronicity with the real rock stars who promoted Chomsky in the 1990s, Pinker is almost never mentioned in profiles without reference to his long curly hair, firm-set jaw, Cuban heels, and other features contributing to his overall “rock-star looks,” Douglas 1999). The Language Instinct tied the ribbon of genetics around the Chomskyan package. One does not learn an instinct. One inherits an instinct. The book prominently features Gopnik’s research, noting that “if there is a language instinct, it has to be embodied somewhere in the brain, and those brain circuits must have been prepared for their role by the genes that built them” (Pinker 1994:299).

When a mutation of the gene FOXP2 was found to be responsible for the disorders the KEs exhibited, a little more than a decade after Gopnik’s letter, Pinker had the bow he needed for his genetics ribbon; or, to use his own film noir metaphor, the “smoking gun” for Chomsky’s program (Pinker 2001:465). News of FOXP2 swept through the field, then quickly beyond. “Language gene found,” proclaimed a Science Update headline in Nature (Whitfield 2001). “Language Gene is Traced to Emergence of Humans” said the New York Times (Wade 2002). “First Language Gene Found,” quoth Wired (Kenneally 2001). Whither went mention of FOXP2, thither went mention of Chomsky. “At stake” in all of this language-gene talk, Wired said, is the theory “originated by Noam Chomsky, about language and the brain . . . that [claims] because all children are born with an innate knowledge about language, grammatical structure must be biologically determined . . . [and therefore] can be tracked back to specific genes.” A lead editorial in Nature Neuroscience puts all the pieces together like this:

Ever since Chomsky suggested that humans have a “language instinct,” people have been debating the possible existence of genes that underlie our linguistic abilities. Now, in the first big triumph for the new field of “cognitive genetics,” such a gene has been identified [FOXP2]. The data seem clear-cut, and the discovery has been greeted with justifiable excitement; the [KE] deficit seems specific to language, and unlike the weak associations that are common in behavioral genetic studies, this gene shows a strong Mendelian pattern of inheritance. (Jennings et al. 2001)

The next year, Nature published a letter arguing on the basis of comparative DNA sequencing among several primate species (human, chimp, gorilla, orangutan, and rhesus macaque), as well as with our more distant relative, the mouse, that FOXP2 was recently “targeted” by natural selection, “concomitant with or subsequent to the emergence of anatomically modern humans” (Enard et al. 2002:871). (A few years later, Neanderthals were brought into the picture; no difference was found between their FOXP2s and ours—Krause et al. 2007.) As a transcription factor, FOXP2 is part of the regulatory genetics of building some part of our language machinery, it was shaped by natural selection, and it corresponds with our appearance as a species.

Into this context, at the turn of the century, came Chomskyan Biolinguistics. It faced an uphill battle, but that is Chomsky’s favorite kind. While the popularizers and the media hailed Chomsky as the genius behind the grammar gene, the growing number of scientists (linguists, neuroscientists, anthropologists, primatologists, . . .) theorizing about language evolution through the 1990s had a much different view of his role. Most were mystified why Chomsky had said so little about evolution over the years. Some were positively hostile to his occasional oracular pronouncements about “principles of neural organization . . . grounded in physical law” (1965 [1964]:59), and his recurrent dismissals of natural selection.

Chomsky’s early comments on the origins of language were not promising. Transformational Generative Grammar, he suggested, was so inevitable, so definitive, that there were no conceivable alternatives. An organism with “certain . . . physical conditions characteristic of humans” would just have a Universal Grammar—so far so good—but in such a case, he said, “talk about evolution of the language capacity is beside the point” (1968:83). The clear and rather baffling implication is that humans have language not for the reasons we have opposable thumbs, bipedal motion, descended larynxes, and all our other distinctive traits; namely, because genetic variations were favored under shifting environmental pressures and opportunities over a vast evolutionary past. Rather, we have language for the reasons that apples fall, bodies displace water, light has a constant speed; namely, physical laws of the cosmos. When he bothered to speculate about such laws, he threw up a Hail Mary. “We have no idea, at present,” he said in 1980, “how physical laws apply when 10¹⁰ neurons are placed in an object the size of a basketball, under the special conditions that arose during human evolution” (1980d: 321). “No idea” is what he said, but “maybe something good” is what he suggested, maybe Language. Maybe there is some neuron-packing threshold at which—kapow!—language appears.

His first intimations in this direction, in books like Aspects and Language and Mind (1965 [1964]:59; 1968:83), came when his model was very widely admired for its elegance. But the elegance receded. Complexity displaced simplicity as a justifying appeal. As the elegance receded and complexity took precedence, a fault line seemed to arise in Chomsky’s program. Universal Grammar, the basic calling card of his framework, called rather naturally for an evolutionary explanation; inversely, its amenability to an evolutionary account was a large part of the framework’s attraction, increasingly so as it entered the Principles and Parameters phase.

But the specific mechanisms of individual language grammars—plotted in terms of the subtheories of Government and Binding—do not call as naturally to genes and evolution. Their arcane inventory of gears and levers, and their intricate interworkings—coupled with their appearance nowhere else in the biological world—make them look more like the contraptions of a brilliantly deranged engineer than the developments of a biological process. As results like the FOXP2 findings arose, and evolutionary accounts of language became more fashionable, critiques of Chomsky’s gears and levers began to arise, most featuring the byword implausible. His framework, Jackendoff noted, calls for “parameters of such niggling specificity that they are hardly plausible as universal possibilities” (2002:190). Philip Lieberman complained that “the cumbersome and inadequate algorithmic descriptions and innate Universal Grammar specifying the ‘rules’ of syntax proposed by Chomsky and his disciples are not plausible” (Lieberman 2001:48).

Chomsky, you might be surprised to know—though at this point in our chronicle Chomsky and surprise may be mutually exclusive terms for you—agrees. In fact, he goes considerably further (surprised?) on the implausibility front than Jackendoff or Lieberman. He trumpets the dysfunctionality of his Universal Grammar, and therefore its sheer impenetrability by natural selection. “Point of pride” would be too mild for how he regards the vast implausibility for natural selection to select his Universal Grammar. It’s hard to disagree.

Darwin personifies natural selection, the great explanatory engine of his theory, as an omnipresent gardener or breeder, spending every moment looking for and approving those traits that promote survival and propagation, throwing out anything that hinders survival or propagation:

Natural Selection is daily and hourly scrutinizing, throughout the world, the slightest variations, rejecting those that are bad, preserving and adding up all that are good; silently and insensibly working . . . at the improvement of each organic being in relation to its organic and inorganic conditions of life. (Darwin 1932 [1859]:66)

What possible survival or propagation advantage might such a Godlike Horticulturist, no matter how closely She scrutinized, find in a parameter that “drops” pronouns, set one way, but “keeps” them when set another, or in the head-directionality parameter, or in the Tensed-S Condition? As Newmeyer puts it, surely no one “would argue that being forced to delete a tensed subject might impact one’s chances for reproductive success!” (1998:309). For Chomsky, it’s even worse than that: “Every language [as described in his theories] permits many different categories of expressions that cannot be used or understood readily (or at all), though they are perfectly well-formed” (Chomsky 1992:16). Languages are full of defects, malformations, and glitches in the criteria important to the Great Horticulturist. Not only would She fail to preserve them, She would stomp them out the moment they appeared in Chomsky’s story. But none of this matters to Chomsky, because natural selection is irrelevant to Universal Grammar. What is left, if we remove Darwin’s engine from an account of Universal Grammar? Principles of neural organization that somehow—kapow!—result from unknown physical laws.

Chomsky’s kapow! observations (they do not amount to an argument) rest on a combination of ignorance and infinitesimally bad gambling odds. We don’t know what happens, other than squishing, when ten billion (actually, the figure is closer to a hundred billion) neurons are crammed into an organic basketball, but the chances that, ex machina, language is just conjured up are . . . well . . . beside the point really. Even his most faithful expositor could not follow him that far, pointing out a fatal flaw in his initial premise, and revealing the upside-down logic of Chomsky’s careless speculations:

Why would evolution ever have selected for sheer bigness of brain, that bulbous, metabolically greedy organ? A large-brained creature is sentenced to a life that combines all the disadvantages of balancing a watermelon on a broomstick, running in place in a down jacket, and, for women, passing a large kidney stone every few years. Any selection on brain size itself would surely have favored the pinhead. Selection for more powerful computational abilities (language, perception, reasoning, and so on) must have given us a big brain as a by-product, not the other way around! (Pinker 1994:363).

Pinker had given this some thought. He was instrumental in setting off the surge of interest in evolutionary theory among linguists, in a lengthy, methodical, inspiring paper with Paul Bloom. Chomsky shows up in it primarily as a foil. Centrally, the essay takes Gopnik’s KE research as “direct evidence against [kapow!-like] speculations that language is a necessary physical consequence of how human brains can grow” (Pinker & Bloom 1990:721), and when James Hurford credits Pinker and Bloom with “clearing away . . . spurious intellectual obstacles that had begun to block the path of a research program to integrate linguistics and evolutionary biology,” he seems to have Chomsky in mind (1990:736).

Sooner or later, the giant was going to wake.

Chomsky entered the evolutionary sweepstakes in a serious way in the early 2000s, with a paper in Science, “The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?”—and, for all the rhetorical positioning of some fifty-year lineage for a generative Biolinguistics, it is this paper that marks the beginning of a central and sustained Chomskyan engagement with biological issues.⁷ Co-authored with a psychologist of animal communication, Marc Hauser, and the stylishly monikered evolutionary biologist, William Tecumseh Sherman Fitch III, the paper leaves basketballs and neural density behind. But it sticks to a version of the kapow! story.

Their aim is overtly conciliatory, to get past the “many acrimonious debates” that have plagued origins-of-language research, and thereby to foster a community of biologists, anthropologists, psychologists, neuroscientists, and linguists working in a “collaborative, empirically focused and comparative research program” (Hauser, Chomsky, & Fitch 2002:1569, 78). But it is also clearly promotional. Like the many marketing articles in the 1950s and 1960s by Chomsky and the Chomskyans in psychology, computer science, composition, literary studies, and foreign-language learning journals, the principal goal of Hauser, Chomsky, & Fitch (2002) is to advance a particular view of language and linguistics to other disciplines, to promote the “biolinguistic perspective on language and its evolution” (Hauser, Chomsky, & Fitch 2002:1570). They sum up that perspective in a graphic (Figure 10.1).

Figure 10.1 A schematic representation of organism-external and -internal factors related to the faculty of language. FLB (The Faculty of Language, Broad) includes sensory-motor systems, conceptual-intentional systems and other possible systems (which the authors leave open); FLN (the Faculty of Language, Narrow) includes the core grammatical computations, limited to recursion. (From Hauser, Chomsky, & Fitch 2002:1570. The text is theirs, slightly modified for clarity.)

The centerpiece of the Hauser-Chomsky-Fitch program is the Faculty of Language realized in all humans, separating us from all other organisms. This Faculty—or, let’s go with FL, since by this point you are utterly immune to alphabetization anxieties—is the generalized phenotype of language (that is, the evolutionary “target,” what it is that evolution explains). We all speak, understand, and (rather crucially for Chomsky) think because our genes have expressed themselves as our FLs. But our FL is critically understood with two distinct scopes.

You may have to grit your teeth for this one: remember how competence/performance begot I-Language/E-Language, with E-Language being left at the curb? Well, FL is pretty much another term for I-Language, and the two different scopes are a way for Hauser-Chomsky-Fitch to beget another dissociation, this time between a Broad FL and a Narrow FL (FLB and FLN), and another trip to the curb, this time to drop off the FLB.

The Broad scope of the Faculty of Language encompasses all the resources with which we speak and write, hear and read, neurocognitively spliced fully into our systems of knowledge and belief; all, except for recursion and some narrow band of machinery directly wired in alongside recursion. If this FLB doesn’t look familiar, you have forgotten your Minimalist Architecture/Basic Property. The Conceptual-Intentional System is where brain-internal linguistic entities (call them representations) interface with beliefs, thoughts, and plans (a word prominent from the early cognitive days of Transformational Grammar). The Sensory-Motor System is where the representations interface with speech and hearing, as well as the visual and somatic systems necessary for reading and writing, and for manual languages. Whatever else might be conceivably relevant to language—emotions, for instance, perceptions, actions, endocrinal drives for food and procreation and nurturing offspring—is in the wholly unspecified Other segment, while a few specific neural functions and states float around the FLB.⁸

The Narrow scope of the Faculty of Language defines the centerpiece of the centerpiece of the program. It is very, very Narrow, restricted to “the computational mechanisms for recursion” that, in a familiar refrain of Chomskyan linguistics back to the early 1960s, provide the “capacity to generate an infinite range of expressions from a finite set of elements” (Hauser, Chomsky, & Fitch 2002:1569). Creativity! We have an asemantic arrangement-of-constituents Faculty here—or, in the language familiar from the Wars, autonomous syntax—which is why the FLN region of the diagram has puzzle-pieces with the Chomskyan calling-card sentence in mid-assembly. The words seem to be whirling around in some kind of computational vortex, with two of them clicked together so we get an idea of the constraints that will ensure the sentence is perfect.

The puzzle pieces illustrate assembly conditions of a sort. They can click together in certain ways, but not in others. They exhibit light/dark color-coding to indicate the approved sequencing. Two of them (colorless and furiously) have terminal sides because they can (respectively) begin and end the appropriate sentence, but not show up in the middle. When the FLN has done its work, these pieces are arranged in a syntactically impeccable, but semantically nonsensical, expression. The FLN does not care about meaning.

The Science paper does not spell this next part out, but Chomsky subsequently emphasizes it repeatedly: the Minimalist Program provides the phenotype for FLN, for the engine of natural language, the trait that philosophers and linguists have long claimed best defines the uniqueness of our species against the rest of the biosphere. Two sets of implications attend this claim whenever it shows up in Chomsky’s writings. Firstly, no other theories about the evolution of language bother with a phenotype; and secondly, the specification of this phenotype has been the principal goal of Chomskyan linguistics back to its very beginnings, among those few Harvard graduate students who pioneered Biolinguistics.

Hauser, Chomsky, and Fitch frame the origins-of-language problem by way of three dichotomies:

1.	(a) slow accretionary growth	versus	(b) abrupt appearance
2.	(a) shared mechanisms with primates and other animals	versus	(b) a species-specific, uniquely human capacity, utterly unlike animal communication
3.	(a) continuous functional development of the faculty	versus	(b) a sharp button hook from some other function (exaptation)

These paired positions look like perfectly balanced alternatives, but our authors have their thumbs on one side of the scale, the (b) side, by way of another dichotomy, something of a stealth dichotomy: FLB versus FLN. The diagram pictures those two concepts in a simple subset relation, one containing the other, integrated seamlessly with thought and action, a part of the human organism. But the FLN is a wholly encapsulated bastion of syntactic autonomy.

The (a) options are the most obvious and congruent with evolutionary theory, understood primarily through Darwin’s masterful contribution. Natural selection—the “universal solvent,” Dennett calls it, “capable of cutting right to the heart of everything in sight” (Dennett 1995:521)—works very slowly, taking shared mechanisms on divergent paths, continually shaping specific organic functions to maximize survival and propagation, exactly what Chomsky has long rejected for the “special properties” of language.

Accordingly, our authors strongly associate the standard Darwinian Scrutinizer story with broader, apparently more mundane aspects of language, the FLB. Language-ready traits like our descended-larynx vocal tract, phonologically precise auditory discrimination, and principles of concept formation are all part of an ancient evolutionary history. In complete contrast, the narrow Chomskyan component of language, the relatively recent FLN, treated more as a fact than a hypothesis, suddenly appears (1b), is a uniquely human mechanism evolved for one function, thought (2b), but exapted to another (3b), communication.

What looks diagrammatically like a subset relation, in other words, is a radical qualitative difference; in evolutionary terms, “the core computational mechanisms of recursion” appear in an evolutionary flash, and we have our FLN (Hauser, Chomsky, & Fitch 2002:1573). The kapow! in this theory is more muffled, smaller and more localized, than earlier suggestions implied. We don’t have language (FLB) showing up en bloc with a crack of thunder when some unknown physical law kicks in at a mysterious neuron-compression threshold, just the recursive procedure that licenses the “creative” syntactic aspects of language showing up with a small, chance mutation. In this view, it is an increase of private intellectual ability (planning, simulating, analyzing) that gets the Great Horticulturist’s approval, not, as in virtually all other theories of language evolution, an increase of social coöperation (sharing information, intentions, beliefs). In a sociological evolutionary theory, language can evolve in a population incrementally. In an exapting, psychological kapow! theory, one individual, then others, then others, would need to plan and plot in ways that helped them survive longer and propagate more until they discovered they could talk to each other.

The FLB is a typical evolutionary pastiche of jerry-rigged strategies, overlapping functions, and kludges, but the FLN exhibits perfection. “Recent work on FLN,” they say (citing a few of Chomsky’s MP publications, as well as Jackendoff’s rather different Foundations),

suggests the possibility that at least the narrow-syntactic component satisfies conditions of highly efficient computation to an extent previously unsuspected. Thus, FLN may approximate a kind of “optimal solution” to the problem of linking the sensory-motor and conceptual-intentional systems. In other words, the generative processes of the language system may provide a near-optimal solution that satisfies the interface conditions to FLB. (Hauser, Chomsky, & Fitch 2002:1574)

It’s unclear why a mutation that improves thought, a mutation for which communication is only a by-product, has any direct connection to a sensory-motor system, let alone represents an optimal solution for it. But never mind: this sudden-onset, perfect-solution, recursion-über-alles story is the essence of Chomsky’s Biolinguistic program. There are lots of bits and pieces that make up the human capacity for language, integrated with cognition broadly, but the computational mechanism arrived in a burst between 50,000 and 100,000 years ago, making us us.⁹

The miraculous mutation is Merge.

Merge Is All You Need

All you need is Merge.

—Robert C. Berwick (2011)

That autonomous syntactic organ, the FLN, consists of (1) a big box of atomic word-like conceptual objects; and (2) an operation that, two-by-two, combines those objects with each other and also with products of its own operation iteratively (hence, recursion). That’s it: one set of objects, still called, if only for convenience, the lexicon; one procedure, Merge.

Chomskyan syntax is Combinatoric Central, and Merge is the ultimate combinatoric rule. All it does is combine two objects into one new object, two objects into one, two objects into one, operating recursively, with the output of one Merge operation serving as the input for another Merge operation. Until it doesn’t, and we have a sentence. Merge does this to conceptual atoms, to morphemes, to words, to phrases, even to independent sentences. Remember Generalized Transformations like T_conj and T_so? They spliced sentences together. Developments leading to the Katz-Postal Principle put them out of business because by combining sentences they combined meanings in unpredictable ways. But that stopped mattering long ago. So, they get reincarnated in the operations of Merge. Remember Phrase Structure Rules, those devices that put words and phrases together? Merge again. Affix-hopping? Merge. Passive? Raising? Sluicing? Q-Magic? Irving? Euphemistic Genital Deletion? Merge. Merge. Merge.

Merge does it all. Here’s how Berwick and Chomsky describe it, with a touch of history to stress the inexorable development of generative grammar:

Over the years, research has found ways to reduce the complexities of [grammatical] systems, and finally to eliminate them entirely in favor of the simplest possible mode of recursive generation: an operation that takes two objects already constructed, call them X and Y, and forms from them a new object that consists of the two unchanged, hence simply the set with X and Y as members. We call this optimal operation Merge. Provided with conceptual atoms of the lexicon, the operation Merge, iterated without bound, yields an infinity of digital, hierarchically structured expressions. (Berwick & Chomsky 2016:70)
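
The operation described in that passage is small enough to write down. What follows is a minimal sketch, not anything Berwick and Chomsky provide: the function name `merge` and the use of plain strings as stand-ins for lexical atoms are assumptions made for illustration. It shows only the two properties the quotation names, that the merged objects are left unchanged and that the result is simply the set containing them.

```python
# Illustrative sketch only: Merge as bare set formation.
# Strings stand in for the "conceptual atoms" of the lexicon (an assumption).

def merge(x, y):
    """Form a new object, the set {x, y}, leaving x and y themselves unchanged."""
    return frozenset([x, y])

first = merge("break", "-ing")   # two atoms combine into one object
second = merge(first, "what")    # the output of one Merge feeds the next

# The earlier object sits intact inside the new one.
print(first in second)
# A set records grouping but no left-to-right order.
print(merge("a", "b") == merge("b", "a"))
```

Both lines print True. Because the result is a set, it carries hierarchy without word order, which on this view is settled only at externalization.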

Here’s how it works, with familiar liberties. Take a sentence like 3.

3	What is Floyd breaking?

Merge assembles the word breaking from its constituent morphemes, and assembles a sentence by successively joining it with other words, as in Figure 10.2, going through five iterations to derive the structure for Sentence 3.

Figure 10.2 A Merge-mediated derivation of the structure underlying Sentence 3.

It turns out there isn’t just one Merge, after all. There are two (I’ve added subscripts in Figure 10.2 to distinguish them). The first is External Merge, which gives us all of the intermediate structures, in steps 1–4. The second is Internal Merge—so called, because it doesn’t have to go outside the structure for its constituent—for step 5. In Figure 10.2, what is already on a limb, it gets cloned somehow, and the new version is Merged with the old structure to derive a new one. Movement is gone, the last remaining transformational vestige, but Internal Merge has some sort of copying or matching operation to make up for its disappearance. Remembering our history, we might note the irony that recursion started off for Chomsky as a property of transformations, but then migrated into the phrase structure. The Phrase Structure Rules then evolved into the single recursive operation, Merge, which turned around and ate the final holdout transformation, Move.
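
The two flavors of Merge can be sketched in a few lines. This is a hedged illustration under stated assumptions, not a transcription of Figure 10.2: the order of the five steps is a guess at the derivation, the function names `external_merge` and `internal_merge` are invented here, and ordered pairs are used in place of sets purely so the result can be read left to right.

```python
# Illustrative sketch only: a guess at the five Merge steps, ending in the
# structure with two occurrences of "what".

def external_merge(x, y):
    """Combine two separate objects into a new one, leaving both unchanged."""
    return (x, y)

def contains(tree, item):
    """True if item is the tree itself or occurs somewhere inside it."""
    if tree == item:
        return True
    return isinstance(tree, tuple) and any(contains(part, item) for part in tree)

def internal_merge(tree, item):
    """Merge something already inside the tree with the tree itself.

    Nothing moves: the lower occurrence stays where it was.
    """
    if not contains(tree, item):
        raise ValueError("internal Merge needs a constituent of the existing structure")
    return (item, tree)

step1 = external_merge("break", "-ing")   # breaking
step2 = external_merge(step1, "what")     # breaking what
step3 = external_merge("Floyd", step2)    # Floyd breaking what
step4 = external_merge("is", step3)       # is Floyd breaking what
step5 = internal_merge(step4, "what")     # what is Floyd breaking what

print(step5)
```

The only difference between the two functions is where the second ingredient comes from: External Merge reaches outside the structure, while Internal Merge insists that its ingredient already be in there.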

Is that all there is, my friend? Is the nut finally cracked? Are we done? Merge and the atoms make your sentences and all is well? No, no, no, and no. The structure we end up with after the last iteration of Merge in Figure 10.2 looks a bit like a post-Aspects Chomskyan/Jackendovian Surface Structure, in the sense that it provides a way to determine that the what at the beginning of the sentence is at some deeper, underlying level, the object of breaking. It is not yet the “real” sentence, however, since there is that extra what; and, in any case, we still have word-like atomic elements, which need some further processing, and some principle has to ensure the lowest-level what is phonologically null. (Speaking of the word-like atomic elements, I don’t see any reason we couldn’t have Merged up something on the order of what Floyd cause what to become not whole—possibly even like I ask you what Floyd cause what to become not whole, though I’m not sure about what the mechanisms might be to erase I ask you.)

The final structure in Figure 10.2 is a dreadfully impoverished sliver of linguistic information, as it is meant to be, of course. It is minimalist. It is narrow syntax. But the fact remains that facts remain unaccounted for. Maybe, just maybe, Merge is all one needs for an evolutionary account, popping into the head of some lucky primal ancestor, but it is not all one needs to describe the core knowledge of language for which Aspects, Generative Semantics, Interpretive Semantics, and all the Alphabet Grammars throughout generative history have sought to be responsible, not to mention Cognitive Linguistics.

Minimally, for instance, the derived structure in 10.2 (i.e., the structure after step 5) needs labels of the sort familiar from the days of Phrase Structure Rules. There is no indication what sort of groupings we have in 10.2, or what the word categories are: what the noun, what the verb, what the what. There is a procedure in the Minimalist Program that generates and distributes labels, projected somehow from the lexico-semantic atoms (Berwick & Chomsky 2016:10), so that the structure after the first application of Merge gets labeled a V on the basis of break, the structure after the second application is a VP because breaking is the head, and so on, up to S.

Also, there is an extra what. No one says “What is Floyd breaking what?” A movement rule would have extracted the second what (leaving a trace after Chomsky 1973a [1971]), but Merge leaves it in place. “Merge does not change the merged elements,” Berwick and Chomsky say: “it is optimal” (2016:99). Leaving it unmolested is a virtue for the Conceptual-Intentional interface because, as Semantic Interpretation Rules and Generative Semantics transformations alike will tell you, the meaning of 3 implicates a representation like 4:

So, the extra what is required for sussing out the meaning of 3, on the abstract conceptual side. But it’s a problem on the concrete, empirical, sensorimotor side.

Sentence 3 exhibits a familiar oddity in language that transformations were drafted to help deal with, all the way back to Zellig Harris: displacement. When we speak, phrases and words show up in spots different from their canonical, grammatical, “logical” locations. Displacement is a weird thing for languages to do, seemingly a hindrance to communicative efficiency. “For many years,” Chomsky says, “it was assumed—by me, too—that displacement is a kind of ‘imperfection of language,’ a strange property that has to be explained away by some more complex devices and assumptions about UG. But that turns out to be incorrect” (2016a:18). It turns out that, “on the contrary, it is an automatic property of a very elementary computational process” (Berwick & Chomsky 2016:99).

Automatic or not, Merge can copy the original what but cannot delete it, which puts the framework in a jam. That low-down what has to go. Structures like that final Merged one in Figure 10.2 are “the wrong structures for the sensorimotor system: universally in language, only the structurally most prominent copy is pronounced, as in this case: the lower copy is deleted” (Chomsky 2016a:18–19). As always in Chomsky’s prodigious and intricate vision, there is another principle or operation or natural consequence to come around the corner at the last moment, like the Cat in the Hat, to clean up the messes. In Chomsky’s latest program, for sentences like 3, a principle called Minimal Computation gets us out of the extra-what jam. Amongst other things, Minimal Computation ensures excess objects get zapped:

Deletion of copies follows from another uncontroversial application of Minimal Computation: compute and articulate as little as possible. The result is that the articulated sentences have gaps. (Chomsky 2016a:18–19)¹⁰
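
A rough sketch of what “only the structurally most prominent copy is pronounced” might amount to follows. It rests on several assumptions: nested pairs stand in for the Merged structure, depth of embedding stands in for structural prominence, and the caller has to say which items are copies (the hypothetical `copied` argument), since nothing in the toy structure marks them.

```python
# Illustrative sketch only: pronounce the highest copy, skip the lower ones.

def leaves_with_depth(tree, depth=0):
    """Yield (word, depth) for every leaf, left to right."""
    if isinstance(tree, tuple):
        for part in tree:
            yield from leaves_with_depth(part, depth + 1)
    else:
        yield tree, depth

def spell_out(tree, copied):
    """Return the pronounced words, dropping all but the shallowest copy."""
    items = list(leaves_with_depth(tree))
    # Shallowest depth at which each copied item occurs.
    highest = {w: min(d for x, d in items if x == w) for w in copied}
    spoken = []
    for word, depth in items:
        if word in copied and depth > highest[word]:
            continue  # a lower copy: computed, but not articulated
        spoken.append(word)
    return spoken

structure = ("what", ("is", ("Floyd", ("breaking", "what"))))
print(spell_out(structure, copied={"what"}))
# ['what', 'is', 'Floyd', 'breaking']
```

The gap after breaking is exactly where the unpronounced copy sits, which is what the interpretive side needs and the articulated sentence omits.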

Where is this (uncontroversial!) principle of Minimal Computation, you ask? It comes as part of a cosmic Chinese box arrangement. For instance, Minimal Computation includes (entails? determines?) a principle of Minimal Distance. Minimal Distance ensures that whenever there is competition between possible associations of words, the competition resolves in favor of the association between words that are minimally distant from each other. As, for instance, with the adverb in Sentence 5.

What’s really interesting about Minimal Distance—the sort of cool little data-appeal that characterizes so many of Chomsky’s proposals (like Affix-hopping and the trace convention)—is that it is computed in terms of structure, not linearity or temporality. That makes invariably closer to argues than it is to wears [cable-knit sweaters] (even though it looks closer to wears on the page, and sounds closer to wears when spoken), as in Tree-1.¹¹

There are two candidate verbs for the association, but the adverb invariably goes with the closest one in structural terms (trace out the path in Tree-1), not in linear or temporal terms. All well and good. What is invariable is that the linguist argues, not that the linguist wears cable-knit sweaters. Exactly what we want, and we get the association we need from some overarching principle, a principle of Universal Grammar. But, again, where is that principle? Where does it go in Figure 9.3, the MP architecture? Here’s Chomsky:
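
The contrast between structural and linear closeness can be made concrete with a toy calculation. Everything specific in it is an assumption: Sentence 5 and Tree-1 are not reproduced here, so the bracketing below is a stand-in built around the words the discussion mentions, and counting branches on the path between two words is only one simple way to measure structural distance.

```python
# Illustrative sketch only: structural versus linear distance in a toy tree.
# Assumes every word in the tree is distinct.

def leaf_paths(tree, path=()):
    """Yield (word, path), where path records the branch taken at each node."""
    if isinstance(tree, tuple):
        for i, part in enumerate(tree):
            yield from leaf_paths(part, path + (i,))
    else:
        yield tree, path

def structural_distance(p, q):
    """Count the branches on the route between two leaves."""
    shared = 0
    for a, b in zip(p, q):
        if a != b:
            break
        shared += 1
    return len(p) + len(q) - 2 * shared

def closest(tree, word, candidates, structural=True):
    """Pick the candidate nearest to word, structurally or linearly."""
    items = list(leaf_paths(tree))
    paths = dict(items)
    order = {w: i for i, (w, _) in enumerate(items)}
    if structural:
        return min(candidates, key=lambda c: structural_distance(paths[word], paths[c]))
    return min(candidates, key=lambda c: abs(order[word] - order[c]))

# Hypothetical bracketing, not the book's Tree-1.
tree = ("invariably",
        (("the", ("linguist", ("who", ("wears", ("cable-knit", "sweaters"))))),
         "argues"))

print(closest(tree, "invariably", ["wears", "argues"]))                    # argues
print(closest(tree, "invariably", ["wears", "argues"], structural=False))  # wears
```

Counting words on the page favors wears; counting branches in the tree favors argues, because wears is buried inside the relative clause.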

The principle of minimal distance is extensively employed in language design, presumably one case of a more general principle, call it Minimal Computation, which is in turn presumably an instance of a far more general property of the organic world or even beyond. (Chomsky 2016a:11)

In some sense, then, Minimal Computation and its various subprocesses are “in” FLN, since that is where the computation happens. But Chomsky’s framework provides them for “free.” They are automatic consequences of biology, so no special account of them is needed by linguists; possibly they are automatic consequences of the universe more generally, so that even biology doesn’t need to account for them. They are just there. Chomsky’s Platonism is showing.

In short, Merge is not all you need. They provide no tally of the mechanisms, but Hauser, Chomsky, and Fitch say that FLN “comprises only the core computational mechanisms of recursion as they appear in narrow syntax and the mappings to the interfaces” (2002:1573). (Okay, so we now add interface mappings in FLN as well.) They left this phrase rather open at the time, but a few years later they elaborated:

Since the “mapping to the interfaces” is explicitly included in this description, it follows that [FLN] includes phonology, formal semantics, the structure of the lexicon (morphology, words), etc. (Chomsky, Hauser, & Fitch 2005:2)

Now, these interface mappings are essentially “computational,” so again we are in territory that may not “cost” the theory anything. But, whatever is going on under the FLN hood, there are a lot more moving parts than Merge.

Whatever is under the hood, it is hard to deny that Chomsky’s grand vision is yet again a thing of beauty. Aspects was beautiful as a model of deep linguistic uniformity. Principles and Parameters was beautiful as a theory of diversity explicable and unified by a switchbox. The Minimalist/Biolinguistic package shifts our focus from a “picture” of language to a “story” about language, but it is a story with an elegant evolutionary plot. It tells of a minor mutation with major implications, endowing us with an incredibly simple but powerful computational procedure, governed by natural laws that specify optimal linkages between the sensory-motor (form) and conceptual-intentional (meaning) systems, the old dream realized in a new way.

The surface individuality and deep uniformity of all languages are explained by natural laws—like a snowflake, in a currently favored Chomskyan image (e.g., Berwick & Chomsky 2016:71). Every language is utterly unique, intricate, and pretty, but step away and they all look the same. Both their particularities and their commonalities are simply what happens when specific natural conditions are met: a certain species undergoes a small genetic mutation, getting Merge, in a universe governed by laws of computation.

The story—recursion showed up and made us us—is undeniably compelling, but Biolinguistics must all come down to genes. So Gopnik’s KE findings are an important calling card for the program, largely because they establish the necessary gene-language link so solidly, but also because the Biolinguistics story has a highly plausible chapter on FOXP2 implicating the concentric-circle setting of Hauser, Chomsky, and Fitch: FOXP2 is assigned to the genetics of the ring around the inner circle in Figure 10.1.

A recurrent image of the Biolinguistic core for Chomsky is a central processing unit, the place where the real work gets done in a computer. Aspects of language not directly traceable to that workspace are figured as input devices, output devices, monitors, speakers, keyboards, touch pads, and so on—the peripherals—which is where we find the Biolinguistic argument about FOXP2. In Berwick and Chomsky’s account, FOXP2 is not implicated in any way with the central linguistic machinery. It regulates the genetics of production, “related to externalization” and aiding in the “development of serial fine motor control, orofacial or otherwise: the ability to literally put one ‘sound’ or ‘gesture’ down in place, one point after another in time,” a claim supported by the role of FOXP2 in songbirds and mice (Berwick & Chomsky 2011:33).

On this story, “FOXP2 is more akin to the blueprint that aids in the construction of a properly functioning input-output system for a computer, like its printer, rather than the construction of the computer’s central processor itself” (Berwick & Chomsky 2011:33). Their account is certainly plausible. It doesn’t address the KE’s defect for agreement and other morphological phenomena (which, on the surface, might seem to involve the FLN/CPU core, more than anything, because Merge puts morphemes together), but the KE’s FOXP2 glitch is overwhelmingly productive, and deeply so, with years of interrupted linguistic development that could easily have ripple effects on matters like agreement. The printers of dysphasics in the KE family come online late and sputter for years, the story goes, with a wonky communications protocol set up over the years in some compensatory way.

But here’s the problem: we are still left with the desperately underspecified everything else. Aside from recursion, everything else is pushed off on the interfaces and various avowed conditions. Deletion, once transformational, is now handled by the Sensorimotor interface. Labelling, once phrase-structural, is now handled by “some algorithm,” presumably associated with the Conceptual-Intentional interface. A few mechanisms seem to have disappeared (though one cannot be sure), including both Government theory and Binding theory. Greed is gone. (Where it went is anyone’s guess. It is simply not talked about any longer. But—I’m not kidding—a version was briefly called Suicidal Greed before it disappeared; see Chomsky 2000b [1998]:127.) But most of the mechanisms of GB and P&P have been shuffled in some way to the periphery: “All conditions are interface conditions,” Chomsky states (1995:194) as a desideratum of the Minimalist Program. But how the complex network of principles and parameters is imagined as interface conditions is left unspecified. Five years later, Chomsky is in fact quite urgent, uncharacteristically so, about giving them some clarity and specificity:

Interface conditions . . . can no longer simply be taken for granted in some inexplicit way, as in most empirical work on language. Their precise nature becomes a primary object of investigation in linguistics, in the brain sciences, in fact from every point of view. (Chomsky 2000c [1999]:26)

But nothing came of the appeal, and Chomsky’s own work failed to reflect this urgency. Another decade and a half along, we find one of his most ardent expositors lamenting, “Unfortunately, the intervening fifteen years seem to have produced virtually nothing about the precise nature of these conditions” (Freidin 2016:685).

Is Chomsky right? The other guy, Lakoff? We know who won the war on the short view. Who won it on the long view? Which program is right, Biolinguistics or Cognitive Linguistics? I have no dog in the fight, but here in the twenty-first century, surely that’s the wrong question. The right question would be how much can each of them contribute to the study of language that is both fruitful and durable; eventually, even, compatible. Cognition is ultimately biological, after all, and biology has to go through cognition to get to language.

Neither side can see much of anything of value over the fence. The Cognitive Linguistics framework still often defines itself in overviews against one or more of the 800-pound gorilla’s instantiations, usually focusing on autonomous syntax, but once the comparison is over, it barely notices the gorilla or his progeny or their data, in its daily work. The Minimalist Biolinguisticians are even worse, apparently never having heard of Cognitive Linguistics at all. If we take one defining concept from each camp, metaphor from Cognitive Linguistics and recursion from the Minimalist Program, both of which seem absolute locks as properties common to human language, we look in vain to find even a mention of them in the theories of the other side. In one rather startling example, the title of Cedric Boeckx’s (2009) Language in Cognition promises a comprehensive overview of the intersection of those two topics, but the book is implacably in the Chomskyan mode. Metaphor is not only an indisputable fact of language, it has also been a preoccupation of cognitive science for over fifty years, but makes no appearance whatsoever in Boeckx’s book, not of the standard issue kind (“Anna is a peach”), not of the sense-extension kind central to word formation processes (“James broke out of his information bubble”), not of the metaphorism kind that has been a crucial explanatory element of Cognitive Linguistics for twenty years (“Evan spent a week on that endpoint”), nada. Meanwhile, recursion has been a focus of linguistics for generations, and was absolutely criterial to the other two disciplines foundational to cognitive science as well (psychology and AI). Recursion has virtually no presence in Cognitive Linguistics literature.¹²

There is room for bridging work. We just need more Jackendoffs, linguists who seek explanations with the greatest compatibility, not in ways that definitionally exclude other positions. His syntax is recursive, and his semantics includes metaphor (e.g., Jackendoff 2002:358–362; see also Jackendoff & Aaron 1991).

Despite Chomsky’s personal skepticism, his mini-kapow! FLN hypothesis combines easily enough in principle with evolutionary accounts more at ease with natural selection. Most evolutionary linguists hold that there was a protolanguage of some kind (something like Chomsky’s atoms, linked to words and gestures, along with behavioral repertoires). Biolinguistics doesn’t require, so far as I can tell, that the lexical atoms and Merge appear with the same evolutionary thunderclap. In fact, wouldn’t Merge need something to start merging when it shows up? If these lexical atoms entered into hominid (primate? mammal? . . .) cognition at some point, why could they not be deployed, through noises, gestures, facial expressions, and the like (that is, the sensorimotor system) for communication? Chomsky doesn’t believe it, but that would provide a proto-grammar for Merge to super-charge.¹³

Recursion might show up in such a cognitive context, in other words, and boost hominid abilities to the point that protolanguage could drop the proto-, making us us.

Similarly, the Basic Property of the Minimalist Program (the BP of the MP) is highly plausible, with some explanatory reach. And, like the little-mutation-that-could story, the Basic Property doesn’t have to be segregated from other approaches to language, even ones engaging in mutual antagonism. Construction Grammar, for instance, doesn’t much like Chomsky, and Chomsky rejects the notion of specific constructions altogether. But Construction Grammar needs mechanisms that “make” constructions too. One might easily see the MP’s Syntactic Objects, assembled via Merge, as fitting certain cognitive criteria that make them candidates for constructionhood (though perhaps only the formal half of the form/content pairs that are constructions).¹⁴

There is at least room for optimism, however, that the two “sides” might help each other out, if the will is there. Again we might invoke Jackendoff and his Parallel Architecture. Jackendoff’s framework has a generative component with a rule very similar to Merge, UNIFY PIECES (similar both in its operation and in that it is the sole generative rule of syntax), but it is highly compatible with Construction Grammar (see, for instance, Jackendoff 2013; Goldberg and Jackendoff 2005).¹⁵ It also concerns itself with interface rules that should be of interest to the Minimalist Program, and has a generative semantics component, which should endear it to Cognitive Linguistics. But one needn’t be a Jackendoff to find common ground. All it takes is goodwill and openness.

Once more into the Great unNoam

When I began working for Professor Chomsky almost 24 years ago, I was 39 years old, and he was close to 65. Retirement seemed to be on his radar then, but I quickly realized that even if he stopped officially teaching his MIT class, he would always teach and lecture. He retired from MIT (meaning he wouldn’t be teaching an ongoing class) about ten years ago, but as I predicted, to this day [as I am retiring] he continues to work as tirelessly as ever.

—Bev Stohl (2017)

Chomsky is still Chomsky after all these years. The architecture and the details of his linguistic program have mutated and modified and morphed, decade after decade, but not him, or not much. He eventually traded his typewriter for a computer and took to email like a swan to trumpeting, but little else has changed. Controversies have come and controversies have gone, in linguistics, in politics, in philosophy, in computer science, in psychology, genetics, and biology, even in popular culture. Fame has flowed and surged, rarely ebbed, and since his gargantuan presence is multivalent, a large proportion of the fame falls in that particular subspecies we call notoriety.

He’s famous, very famous, and he’s private, but he’s not a recluse:

Over the years, the parade passing through his office portal has included

the amazing, the unexpected, the scary: students, activists, authors, at least one Sufi, political prisoners, movie directors, comedians, political hopefuls, musicians, overwhelmed fans, world-champion boxers, international leaders, Cirque du Soleil clowns, brilliant thinkers, lost souls. (Stohl 2015b)

A shelf of documentaries has been made about him. Perhaps the best known of these, the full-length movie Manufacturing Consent (Achbar & Wintonick 1992), has garnered bunches of awards, been widely translated, been featured at scores of film festivals, and played in theaters around the world, and it can still be seen at the occasional art house or university campus despite its easy availability on the internet and despite the fact that the specific political topics it highlights are almost thirty years old. Chomsky is all over YouTube. He speaks, speaks, speaks, and speaks some more; writes, writes, writes, and writes some more; engages every issue, answers every email. Rebel without a pause, indeed.

Chomsky is so famously active that the American satirical magazine, The Onion, put out one of its mock news releases, maybe its most absurd, entitled “Exhausted Noam Chomsky Just Going to Try and Enjoy the Day for Once.” It quotes him as saying that injustice will still be there for him tomorrow; “Today, I’m just going to kick back and enjoy some much-needed Noam Time.” But it ends with a final quote (maybe not so absurd after all) that “I’m going back home, writing one—just one—reasoned, scathing essay, and getting it out of my system. But then I’m definitely going back to the park to walk around and just enjoy the nice weather. I’m serious.” (“Exhausted”).

Noam Chomsky, with so much water under the bridge, over the bridge, around the bridge, in his nineties as we go to press, is a visibly—and astonishingly, only visibly—older version of the brilliant, generous, mulish, prolific, indefatigable, brutal, benevolent, chronically argumentative, humble, arrogant, autocratic agitator whom we have seen all along, the man who cut his scholarly teeth denouncing and supplanting the Bloomfieldians and his political teeth denouncing the Vietnam War, and who has kept them sharp politically and academically at every opportunity along the way, the man whom the New York Times, in a passage that has become obligatory whenever he is profiled, summed up thusly:

Judged in terms of the power, range, novelty and influence of his thought, Noam Chomsky is arguably the most important intellectual alive. (Robinson 1979:3)

The amount he has accomplished, the pace at which he works, the capaciousness of his intellect, the sheer will of the man . . . it’s staggering. So are the devotion and the revulsion he attracts. A character in the historical novel, The Architect’s Apprentice, says of Michelangelo, “There are those who worship him and those who loathe him. Even God doesn’t know which side outnumbers which” (Şafak 2014:163). Perhaps extreme emotions are inevitable with people whose imprint on their world is so large. One wonders if the same people both worshipped and loathed Michelangelo, serially, or if that response is more peculiar to Chomsky. He has those who love him, those who hate him—even God can’t count them—but there are also a great many who have done both. Angel? Devil?

Let’s try another account of the man, exemplifying a very familiar characteristic not unconnected to all that love and all that loathing: his brute refusal to see the facts as others see them:

“There’s no such thing as jet lag,” Noam said to me less than twenty-four hours after returning from Australia, where he was honored with the Sydney Peace Prize. Peeking at me through slitted eyes, he poked an index finger into his temple and added, “It’s all in the mind,” and I watched him weave toward his office like a half-asleep drunken sailor. (Stohl 2013)

Also, when ill, he regularly insists he is not ill. He sneaks treats to dogs. He can’t figure out coffee makers. He apparently has no relationship with food beyond a vague awareness it should be periodically consumed.

He is, in many ways, a version of the head-in-the-clouds caricature professor who has trouble with the daily hubbub, perplexed by machinery like computers and printers, oblivious to mundanities, a classic naïf. After describing his famously cramped and austere office, piled high with books, with two blocky desks and a few hard wooden chairs, a reporter commented on how fitting it is for “Chomsky to have created an office in which there is nowhere comfortable to sit and no proper space to work. He seems genuinely indifferent to material things. Before his wife took over, he often gave away the copyrights to his books because he didn’t read contracts. He would sign anything that was put in front of him. He will wear the same outfit every day for a week” (MacFarquhar 2003). Newmeyer tells a story of Chomsky at a major conference in his honor, entitled “The Chomskyan Turn.” It took place over several days in 1988 at the Van Leer Jerusalem Institute, during the Intifada, with multiple acts of violence, insurgency, and oppression boiling up in Israel. “Noam was followed around the entire four days by a hulking young man with an enormous backpack,” Newmeyer recalls.

Everybody knew that he was Noam’s bodyguard. Everybody but Noam, that is. At one point we were at a restaurant table around which eight or so people were seated. Noam asked the young man cheerily: “What are you working on?” Startled, he answered “Um, linguistics.” Noam persisted, asking him what topic in linguistics. The answer, barely audible, was “Oh, linguistics linguistics.”

The jig still wasn’t up, but things were now seeming a bit odd even to Chomsky, so the conference organizer finally revealed to him that his constant companion was actually his bodyguard and that, among other tools of his trade, the backpack contained an assault rifle.¹⁶

That conference was occasioned by Chomsky’s sixtieth year on our planet. For his seventieth year, 1998, a tribute website was set up for him, a kind of festschrift (the “editors,” Jay Keyser and Janet Dean Fodor, knew that Chomsky was a mega-star, way too big for an ordinary bookbound festschrift). The web was a different place then, with fewer browsers and more search engines, but Google was already on the scene and most academics were active; most linguists, in particular, were early adopters—old hands by the latter 1990s. Not Noam. There was lots of buzz about the site around the field, with nearly two hundred moderated essays and over two thousand birthday greetings. But it was not hard to keep the site secret from Chomsky. Its existence was revealed to him

on the day of his birthday, [when] an elegantly printed list of all the contributors was placed on his desk in a modest red folder decorated discreetly with a gold ribbon. An unsigned birthday card lay on top, its message: Happy Birthday From All Of US.

He still didn’t know what was going on. One assumes he had at least heard of the web by this point but “the truth of it is that my wife [Carol] had to explain to me what was going on,” he told the New Yorker. He had the contributions printed and read them on paper. “When I was a kid,” he added for the reporter, “my mother used to arrange a surprise party every year for me, and I never figured it out . . . I was always completely surprised, and it was the same in this case” (Mead 1998:34).

And here’s a factlet that reveals something very personal about Chomsky that has, so far as I am aware, never been published before. Among the countless invitations he gets, Chomsky was invited onto a podcast to talk about Super Mario Brothers. He responded cordially. He always responds. He declined, too many commitments, but his answer tells us something that may surprise even you, jaded reader that you must be by now. Here is his response in full:

I wish I could manage. The rest of ‘20 is so intensively scheduled that I can’t add anything more, and it will be a few months before I can think about scheduling beyond.

My grandchildren always insisted I play that video game for hours. I was Luigi. Always ended up in a ditch or a lake. (@showshowpod 2020)

Chomsky played Super Mario Brothers, for hours, as Luigi.

Chomsky, one finally comes to understand, is just a guy. Whatever else he is—my thesaurus is running on fumes—Chomsky is also just a guy. Bev Stohl was his administrative assistant at MIT for over twenty years and wrote a blog for eight of those years (quoted a few pages back and the source of this section’s epigram) that recorded his idiosyncratic eating habits, his mischievous quirks, his affection for Irish cable-knit sweaters and rolled-up jeans, his uneasy, semi-Luddite attitude toward digital technologies, his corny jokes (“Isn’t Lent when you have to return all of your overdue books to the library?” [Stohl 2014]), and his generational obliquities (“Tell him I don’t know what he’s talking about,” he annotates a paper copy of an email in response to the automatically generated tag-line “Do you Yahoo?” [Stohl 2015a]), alongside the back-breaking speaking schedule, the inundation of correspondence, rife with political horrors and personal devastation, and the warm, deep, everyday humanity of Noam Chomsky. There is much to admire, much to lament, in Chomsky’s career (and much, we have seen, to question in Chomsky’s own account of that career). The same can be said of the rest of us, but our scale is so much smaller.

Chomskyan Truthiness

Werner Heisenberg, Kurt Gödel, and Noam Chomsky walk into a bar. Heisenberg turns to the other two and says, “Clearly this is a joke, but how can we figure out if it’s funny or not?” Gödel replies, “We can’t know that because we’re inside the joke.” Chomsky says, “Of course it’s funny. You’re just telling it wrong.”

—Katy Waldman and Will Oremus (2013)

If there is a single quality from which one might derive all of Chomsky’s rhetorical practices, its kernel, my vote would be for his unshakeable, basal certainty.¹⁷ The Heisenberg-Gödel-Chomsky-walk-into-a-bar joke tells us this (it’s an epigram for this section, if you missed it). Heisenberg, who developed the quantum uncertainty principle, recognizes the three of them are in a particular genre, the walks-into-a-bar joke, but he is uncertain if it’s funny or not. Gödel, who developed the incompleteness theorem, sees that the joke is incomplete (more technically, that they can only know it’s complete when they get out of it, at which point they will be in another incomplete system; more technically, . . . I’ll leave that to the mathematicians). Chomsky is certain that the joke is funny, knows that the joke is complete (in its pure, underlying form), knows more than the other two combined. Actually, this is how the source of that joke, someone known only as Saboot in the article I pulled it from, explains the Chomskyan punch line:

Why it’s funny: Because Heisenberg is uncertain, Godel sees that the joke is logically incomplete, and Chomsky is an asshole. I mean, because Chomsky distinguishes between the joke itself and the linguistic performance. (Waldman & Oremus 2013)

There is a somewhat oblique relationship between the notion of performance and the “real” punchline that some people might get. But anyone familiar with Chomsky will more immediately get a version of Saboot’s faux punchline first. The Deep Structure of the faux punchline, stripping it of the insult, gives us: Chomsky is always right, others necessarily wrong.

His admirer and most successful popularizer, Steven Pinker, knows him pretty well, and would have no trouble getting the joke. He says Chomsky “is always a hundred percent certain that he has been, is, and will be right about everything” (Mead 1998:34). Another informed opinion? A major European linguist who has been an important force in the development and expansion of Chomsky’s program through its Principles and Parameters phase, Jan Koster, says Chomsky “has a notorious inability to imagine that there are rational persons with opinions other than his own” (Koster 2013). Another? A very sympathetic journalist says he “is in the prophetic tradition and you can no more truly argue with him than you could have with Isaiah or Ezekiel. . . . His inner certainty seems complete” (Woollacott 1989). Another? A source very, very close to him, his first wife, says “one never wins an argument with Noam” (Hughes 2006:93).¹⁸ These are his allies, supporters, a loved one. He is right; others must be wrong. He even has an opinion (or, rather, access to the indisputable truth) on the origins of email, which are apparently controversial. As Chomsky tells it, “email was invented in 1978 by a fourteen-year-old working in Newark, New Jersey,” adding his telltale “the facts are indisputable” (Garling 2012). Don’t look at me. I’m not going to argue the point with him.

If we have learned nothing else about Noam Chomsky so far, we have learned that he exhibits what we can call, after Wayne Booth (1974:77), The One Rhetorical Purpose Dogma. That purpose? To win. The only acceptable outcome of debate for Chomsky is to vanquish his opposition. Righteous argumentation can have only one outcome, and all argumentation is righteous. Perhaps this would be less of an issue if he wasn’t just so damn good at arguing. I gave a quintessential example of Chomsky’s debating abilities back in Chapter 2, with a brief description of his demolition of William F. Buckley, but such clips are legion, and getting legioner; as I write this Chomsky can still be found, daily, hourly even, tying such people in knots, at a time in life when many of us will be lucky to be able to tie our shoelaces. And there’s this other factor, even more personal: however much Chomsky claims that it is all just science, only the issues matter, he would change his mind if the evidence warranted it, facts are facts, and so on, it is blisteringly obvious he has a very thin skin.

These traits, the dogma, the brilliance, and the dermal fragility, have a cost. Keyser (2011:13–14) talks about “a kind of intellectual parricide that haunts fields like linguistics,” something Keyser’s generation would know more about than most. Chomsky not only promoted academic parricide in his own rise, he has perennially provoked it himself as the father of various movements, usually overcoming the threat by fanning the related urge of fratricide. As Bresnan—his student, acolyte, and opponent (a not unusual trifecta, we have seen)—says, Chomsky “revolutionized linguistics but did it in a divisive way. . . . He’s a polarizer. He’s created warring schools” (Flint 1995:25). But there is a converse, as well, since he can only inspire movements to attack and defend through collaboration and cohort building. While many linguists would find the title of a graphic novel about Chomsky incredibly appropriate, many others would find it bitterly ironic: The Instinct for Cooperation (Wilson 2019).

New sources of bitterness and division arise recurrently. A dispute arose in the first decade of this century, for instance, about the implications of an Amazonian language for Chomsky’s beloved UG—perhaps the most rancorous episode of the last decade, though the rancor seems largely to have been on one side. The language, Pirahã, is exceedingly cool, and also exceedingly significant culturally, as the last vestige of the otherwise obliterated Mura group, spoken only by a couple of hundred people. (Among its coolest features is that it can be “spoken” in whistles, for ease of chatting when you find yourself in dense jungle foliage. It can also be hummed, perhaps to make it harder for the neighbors to eavesdrop.) But it may be important for theoretical reasons as well. The principal authority on the linguistic features of Pirahã is Daniel Everett, and he claims it does not exhibit the lynchpin of Chomsky’s Universal Grammar, recursion; that there are no embedded clauses.¹⁹ “With respect to Chomsky’s proposal,” he says, “the conclusion is severe”: no recursion, no Universal Grammar (Everett 2005:622). The reaction, too, was severe, and swift. Most notoriously, just before a talk Everett was to give at MIT in 2006, a new genre of email appeared, the likes of which the well-traveled Pullum had “never seen before: an attack ad against a linguist” (2006), a vicious character assassination of Everett ending in a billow of sarcasm: “You, too, can enjoy the spotlight of mass media and closet exoticists! Just find a remote tribe and exploit them for your own fame by making claims nobody will bother to check!”

The dispute grew in ugliness from there and lingers still. “Some linguists close to Chomsky were furious at Everett’s direct challenge to Chomsky’s views,” Pullum noted, “and began an intensive campaign to discredit Everett. Three of them published a lengthy paper in Language [Nevins, Pesetsky, & Rodrigues (2009)]” that argued Everett’s “claim was a lie.” Of the three, one was responsible for the deplorable attack-ad email and another was instrumental in politically blocking Everett’s further contact with the Pirahã, among whom he had lived for almost ten years. Aspersions of racism and data-fakery and grandstanding propelled the attacks on Everett. Chomsky called him an “utter charlatan”; and, in any case, said that the implications of his work for UG are “zero” (“Ele virou”²⁰). By 2009, when Chomsky made this remark, the implications did indeed seem to have zeroed out, since:

Chomsky and various associates had meanwhile begun to claim that [they] had been misunderstood, and that languages without recursive phrasal or clausal structure are compatible with [UG]. . . . So now nothing is at issue. Continuing work by Everett with Edward Gibson of MIT is still attempting to determine whether “recursion” of any kind is found in Pirahã . . . But Chomsky and his associates now believe it just doesn’t matter. (Pullum 2012)

The Chomskyan position is now that while recursion is fundamental to UG and to the Biolinguistics evolutionary story, not every language need be recursive; rather, because of their innate grammatical endowments, every human can acquire a recursive language, but need not. “The Pirahã may not deploy recursion when speaking Pirahã,” Norbert Hornstein says, “but Pirahã children have no trouble learning Brazilian Portuguese (an undisputedly recursive language) and so there is no evidence that their UGs are any different from anyone else’s” (Hornstein 2012). Chomsky’s analogies on this front are less than convincing. “The speakers of Pirahã possess the same genetic components we do,” he says,

so children try to build a normal language. Suppose Pirahã does not allow for that. It would be like finding a community that crawls but doesn’t walk, in such a way that children raised there would only crawl. The implications of this for human genetics would be nil. (“Ele virou”)

Another one: “If some tribe were found in which everyone wears a black patch over one eye, it would have no bearing on the study of binocular vision in the human visual system” (Tanenhaus 2016). The existence of Creole languages makes these analogies even more absurd than they appear on the surface. Creoles are created by children in a community that has only a fragmentary language (a pidgin) in common (usually for reasons of slavery or indentured labor, when people from different language groups are thrown together), creating morphology, modal verbs, articles, and so on, in addition to building a much richer vocabulary. Creoles make it abundantly clear that when children are given a broken language they will fix it. Plus, we all know that children will get up and run around, or pull off their eye patches, as soon as the cultural arbiters turn their backs for a moment; nor is there any evidence Pirahã children are punished or coerced out of recursion.

The proclaimed irrelevance of Pirahã data appeared to make no difference to the anti-Everett camp (perhaps because Chomsky coupled this explanation with the “utter charlatan” remark and the observation that “all serious linguists working with Brazilian languages ignore him”—“Ele virou”). Articles in the New York Times, the Guardian, the New Yorker, and the Chronicle of Higher Education, among others, quote Chomskyans impugning Everett, and these articles, as reports of disputes involving Chomsky so often do, remark that “linguists seem uncommonly hostile” in expressing their differences. “The word ‘brutal’ comes up again and again, as do ‘spiteful,’ ‘ridiculous,’ and ‘childish’ ” (Bartlett 2012). Even the dispassionate Hornstein passage I quoted a moment ago was accompanied by a gratuitous denunciation of the accuracy of Everett’s work. Hornstein, like Chomsky, could not resist insulting Everett even while claiming that it doesn’t matter if he is right.

Chomsky often characterizes the Linguistics Wars by saying the Generative Semanticists threw a war and no one showed up; in the Pirahã episode, it might be fair to say that Chomskyans threw a war and no one showed up. Everett did not return the insults and aspersions. He was long a Chomskyan himself, even inhabiting a neighboring office in Building 20 for a year in the 1980s, before eventually coming to see confounding implications for recursion and UG in Pirahã syntax. He professes tremendous respect for Chomsky: “the smartest person I have ever met,” he told one journalist (Schleusser 2012); “I’m not denigrating his intelligence or his honesty,” he told another, “but I do think he is wrong about this and he is unprepared to accept that he is wrong” (Barkam 2008). The most vicious thing Everett has said about him is that the criticism Chomsky has faced is “not entirely undeserved” (Everett 2017).

It has dragged on. “The tussle has now been going on almost as long as World War II,” Pullum noted in 2012, in a report from the field. The shelling has not stopped, getting a particular boost from Tom Wolfe’s book in 2016. Pullum laments that charges raised in the press of “brutality, spite, and puerility” against the discipline of linguistics “may not be entirely groundless.” The guilt for this rhetorical climate is not wholly Chomsky’s, of course, but it would be ludicrous, from what we have seen, to judge him entirely blameless.

The implications of Chomsky’s win-at-all-costs dogma are not just in the price paid by the sociology of the field. They also, and very clearly so, compromise Chomsky’s own integrity. “We encounter an embarrassment here,” one of his targets has said, epitomizing a view that virtually all of his targets share: Chomsky “cannot be relied on to tell the truth” (Boden 2008:1958). We encounter here the great paradox of Noam Chomsky (he says, as if there is only one): how can someone who is so utterly sincere also be so utterly reckless with facts, and especially with the reputations of others, that it becomes indistinguishable from malice? And, let’s not forget: he’s not a stupid man. Does he not see what he is doing?

Unfortunately, it is virtually impossible to raise the embarrassment of this paradox without immediately contributing to the polarization. If one mentions Chomsky’s honesty one is immediately slotted into the anti-Chomsky camp. There is no place in the topography of Chomskyan commentary to put his failings alongside his virtues, on the one side, or his virtues alongside his failings, on the other.²¹

In part, this inevitable slotting is because his work in linguistics is so visionary and expansive and inspirational that people whom it has galvanized become blind to any flaws. But for many it is also because of his political activism. It is just so hard to imagine that a man who wrote a scathingly influential manifesto entitled “The Responsibility of Intellectuals” and who has worked so tirelessly, at personal sacrifice, to expose both the malice of others and the systems of thought control obscuring and misrepresenting that malice, could himself be capable of misrepresentation. But we know that many accusations of vicious professional conduct have been leveled against him, most of them implicating dishonesty, as in Lakoff’s “Chomsky . . . fights dirty” (Shenker 1972). The data is quite overwhelming. Postal is far from alone as someone who finds Chomsky’s rhetoric offensively dishonest (I have said it myself: R. A. Harris 1998:14, et passim), though we can always count on Postal to say it as nakedly as it can be said:

After many years, I came to the conclusion that everything he says is false. He will lie just for the fun of it. Every one of his arguments was tinged and coded with falseness and pretense. It was like playing chess with extra pieces. It was all fake. (Quoted in MacFarquhar 2003)

Margaret Boden’s statement is among the mildest of these rebukes. I am also now convinced it is the most accurate (and that my own 1998 view was too shallow): it is more a matter of unreliability than of dishonesty. The distinction may seem slight, but I now come down more on the side of reckless negligence than on the side of calculated deceit. Unreliability with the truth certainly does not give Chomsky a halo, and I’m not saying there aren’t gray areas, but neither does it fit him for horns.

Whatever it is that Chomsky does with the truth when it fails to suit him, it is not innocent. But neither is it uniformly malicious. About his own life, maybe it is tricks of memory, the distortions of ego; about others’ work, tricks of perception, born of bad reading, lack of goodwill, blinding conviction. But Chomsky’s falsehoods, of which there are many, easily and widely documented, alongside easily and widely documented truths, insights, and profundities, are, like the man himself, not simple.

One factor is unquestionably arrogance. While he is remarkably humble in some respects (human beings are not cartoons or allegories; we all have our contradictions), Chomsky is also blindingly arrogant in other respects, particularly noetic respects. He is supremely confident in his knowledge, and in his ability to extract knowledge and understanding in a way virtually everyone else, certainly anyone who sees otherwise, is lacking. Fair enough. He has a lot to be arrogant about. Most of the rest of us would be dreadfully lacking if measured against his capacities; most of us, in fact, would not even register on the same scale. But arrogance is not the best route to clarity, let alone tranquility.

Chomsky perpetually “implies that people who disagree with him are stupid and ignorant” (Pinker, quoted in Flint 1995), but it goes beyond implication, beyond tactics of bullying and dismissal. I see no reason to doubt his sincerity. Every indication is that Chomsky genuinely believes such people—that is, people with other beliefs, opinions, reasoning than his—are stupid and ignorant, with the same utter certainty that he holds his catalogue of other convictions. Minimally, we know that he speaks and writes as if these other people are stupid and ignorant, which can’t help but imply he is correspondingly so much more intelligent that he knows better than they do, even about what they are actually saying. Conversely, he also acts as if his own semantic intentions are all that is necessary for his claims to go through, even if others are just too dim to recognize his meaning.

Chomsky has, in other words, (1) a hermeneutic disorder and (2) an expressive disorder. On the first count, he apparently cannot read (or listen) openly; on the second, he is apparently unaware (or just does not care) that his own sometimes idiosyncratic meanings are not shared by all. Both problems would seem to follow from that blinding arrogance.

I know, I know: psychobiography is a perilous activity. But Chomsky’s rhetorical practices are so persistent and so manifest and have been so consequential for so very long that some explanation is required, and I am more comfortable with the view that arrogance gives him an unreliability with the truth than I am with the view that he lies for sport. If I’m wrong, I’m wrong. But I would rather err by misjudging his personality than his morality.

Chomsky’s hermeneutic syndrome includes, as one defining disorder, the affliction that I. A. Richards (1936:24) dubbed the combative impulse, a disposition which puts “us in mental blinkers and makes us take another man’s words in the ways in which we can down him with the least trouble.” These blinkers can equally make us overlook another person’s words, miss those words that might salvage their argument, or contextualize it in a way that reduces the force of our objection. It is this second aspect especially that explains why Chomsky is such an inept reader.

To call Chomsky inept sounds crazy, I realize, especially as a reader. By all accounts, Chomsky reads copiously, and at astonishing speed, and with remarkable precision of recall about many specific points. But he cannot (or will not) read with goodwill for authors he sees as opponents. Maybe the copiousness and the speed are parts of the problem. Maybe he scans for, or possibly is just magnetically drawn toward, facts and opinions that trigger his dispositions—his conviction that Smith is wrong, his belief that Jones is right, his feeling that P belongs with Q—whereupon his vaunted intellect is off and buzzing. He may not be all that good a listener, either. Certainly, I think it is fair to say, everyone whose work he has attacked feels misrepresented. Everyone.

Chomsky’s targets barely recognize themselves in the funhouse mirror he holds up, squashing their claims here, stretching their data there; omitting facts in one place, ascribing unaccountable beliefs in another. Back to Boden. She wrote a very well respected two-volume history of the cognitive sciences in relation to artificial intelligence (Boden 2006)—a history in which, as one would expect, Chomsky plays a significant and overwhelmingly positive role. But it’s a history. Facts count. Not all the facts are favorable to Chomsky. Currents of critical opposition surface in Boden’s chronicle, some levels of enmity get an airing, and some of Chomsky’s self-representations are challenged. Chomsky read it. It made him unhappy. Chomsky—as always, he reports, reluctantly; as always, only at the insistence of others; as always, merely in the service of the truth—attacked. Here is part of Boden’s response to Chomsky, doing double duty since it nicely sums up some of his criticisms as well:

According to [Chomsky], I said that cognitive scientists, himself included, had overlooked Karl Lashley’s paper on serial order in behavior. Moreover, he complains that I didn’t realise that the early ethologists had had much of relevance to say. He’s mistaken on each count. I’d already discussed Lashley at length. I’d even pointed out that [Lashley’s] serial-order talk was converted from “a mini-sensation” at the Hixon Symposium in 1948 into “a genuine sensation” in about 1960 by Miller and—guess who!—Chomsky. Similarly, I’d outlined the work of the pioneers of ethology, including four of the names listed by Chomsky, explaining how their views differed fundamentally from the behaviorist orthodoxy. (Boden 2008:1958)²²

Chomsky, it is clear, has not read the book that he is attacking very well. (You needn’t take Boden’s word for it. Read the book. Read the attack. The mismatches are glaring.) That is not shocking in itself, of course. He would not be the first reader to skim a book, nor the first polemicist to assemble bits and pieces, pulled out of context, into a straw adversary. But his reading is notable for two reasons (well, for three reasons, but one of them is on a special shelf of its own, always in plain sight: because it’s Chomsky). First, the distortions are remarkably blatant. Second, the pattern is endemic to Chomsky, almost to the point of pathology, a word I do not use lightly. All one needs do most of the time is to look at the text he is characterizing: the sentence, or the half-sentence, that he insists is saying one thing, is actually saying something very different, sometimes the very opposite. Really, all it takes is simple literacy to see the misrepresentations. They can be so staggeringly obvious that it is remarkable he thinks he can get away with them. At this point the explanation has to move from something like sloppiness (haste, distraction, whatever) toward something a bit deeper, like a hermeneutic incapacity (or dishonesty).

The Venerable Quine’s response to Chomsky’s characterizations of his classic Word and Object chimes with Boden’s:

Chomsky’s remarks leave me with feelings at once of reassurance and frustration. What I find reassuring is that he nowhere clearly disagrees with my position. What I find frustrating is that he expresses much disagreement with what he thinks to be my position. (Quine 1969:302)

Worse than the distortion for Quine—whom Chomsky first approached at Harvard when Quine was a professor and Chomsky a Junior Fellow; who became a mentor and a friend—was the latter’s utter lack of charity:

The more absurd the doctrine attributed to someone, caeteris paribus, the less the likelihood that we have well construed his words. In Word and Object I urged this precept. . . . I wish Chomsky had considered this precept before attributing to me the absurd belief that . . . . (304)

Quine’s precept is the exact inverse of Richards’s hermeneutic disorder, incompatible with someone who perpetually believes (or is simply content to imply) his opponents are stupid and ignorant. Chomsky, if he conceives someone as an adversary, appears to feel no responsibility for what their actual claims or intentions might be—very possibly does not even notice what their actual claims or intentions might be—only for his own interpretation of their claims or intentions. Boden remarks on his practice

of continually quoting contemptuous terms picked out of my text as though I had used them myself. I had not. Rather, I had quoted them—from critiques of Chomsky written by professional linguists and psycholinguists. His inability to see the difference leads to misinterpretation over and over again. It follows from the elementary distinction between mention and use that quotation doesn’t necessarily imply agreement. (Boden 2008:1958)

In fact, the terms needn’t even be contemptuous, just unwelcome to Chomsky, for reasons often known only to him. My own experience is exactly like Boden’s, except in private.

Chomsky’s public responses to the first edition of this book are highly disparaging, using terms that don’t fit my image of the book or of my scholarship at all well; or, in fact, the evidence. My account of the Linguistics Wars he calls, in one of his favored curse words, Foucauldian; I am a postmodernist, he says, believing all is power, nothing is grounded in fact (see Grewendorf 1994; Barsky 1998:56–57). You’ve read this far now. Have you caught a whiff of Foucault? Go check the bibliography. I’ll be here when you get back.

Find Foucault? Any of the familiar French suspects? Derrida? Lyotard? Baudrillard? Turn the whole text upside down in a Where’s-Postmodernist-Waldo game if you like. There are no claims that science is a mere collection of power plays, no denigration of facts. But Chomsky’s insults are vague and allusive and amount to little more than pique; they can be set aside. Chomsky was kind enough (read no sarcasm into the adjective—it was an incredible kindness) to correspond with me at length about the book, however, so I have extensive and detailed letters from him with far more specific criticisms of the book, and other correspondence I’ve had with him follows the exact same pattern.

My experience of his reading (and quoting and paraphrasing) practices is precisely the one Boden describes and Quine laments and scores of others have complained about, maybe hundreds of others—Chomsky picking words and phrases out of a context that compromises or contradicts the uses to which he chooses to put those words or phrases. But my experience is an even more perplexing one, because it is in private letters and emails, in multiple iterations. I am the only person to see his characterizations, and they are about words that I wrote, and yet Chomsky just has his way with them, repeatedly, in our correspondence.²³

In perhaps the most breathtaking example of Chomsky’s misreading, Chris Knight advanced the position that the high level of abstractness of Chomsky’s theories is a deliberate strategy to make those theories utterly unusable for such things as natural language interfaces, to prevent the military (which funded a great deal of Chomsky’s early research) from incorporating them into its killing machines.²⁴ Knight’s whole argument depends on the premise that Chomsky “was at all times refusing to collude” with the military (Knight 2018b). Somewhat astonishingly, though, Chomsky seems to think that Knight slanderously accuses him of complicity with the U.S. military—that his active resistance to the war in Vietnam refutes Knight’s position rather than, as it actually does, supports Knight’s position. Raising the absurdity stakes of his misreading to greater and greater levels, Chomsky goes on to point out that “Knight sidesteps” the matter of his anti-Vietnam activities (Chomsky 2017). Apparently, passages like this one evaded Chomsky’s gaze:

Already celebrated for his linguistics, he soon began commanding a mass audience, helping to organize draft card burning and other direct action against all aspects of the war. In October 1967, with many thousands of others, he attempted to form a human chain around the Pentagon. (Knight 2016:4)

We see Chomsky’s combative attitude to other people’s words playing out in the Linguistics Wars during a rather notorious episode for the Generative Semanticists, an exchange Chomsky and McCawley pursued over the respectively argument. Lakoff had heard Chomsky’s treatment of McCawley’s argument in lectures at MIT, and wrote about it to McCawley, who then wrote to Chomsky trying to sort things out. Chomsky responded as only Chomsky responds: a twelve-page, single-spaced letter—with minor editing and a few transitions, it could be an article—telling McCawley that he misunderstands himself. I encourage you to read the saga in Huck & Goldsmith (1995:60–70), who chart the claims and counterclaims closely, pulling out the strains of insurrection in McCawley’s language, the strains of displeasure and distrust in Chomsky’s. But in terms of Chomsky’s rhetorical practices what stands out most sharply is his complete unwillingness to credit McCawley with any understanding of his own argument, his own words.

McCawley writes repeatedly that the position Chomsky ascribes to him is one that he rejects in the original article, and one that he wants, in private correspondence, for Chomsky to realize he rejects. Specifically, he explains to Chomsky that he is only proposing one respectively-transformation. The other provisional rules in the article (that is, the postscript to McCawley 1968a) are there only to be rejected, eliminating possible contenders to buttress the final version, a rather standard move in scientific argumentation. Chomsky won’t budge a nanometer. “I simply don’t see the ‘eventual analysis’ that you mention for ‘respectively,’ ” he says. “What I see are just the ‘bits and pieces’ that I tried to pull together, but no rejection of them” (Chomsky to McCawley, February 12, 1969). It doesn’t matter to him what his former student, just two years out from his doctorate, says about his own intentions and claims. It doesn’t matter that the relevant argument, while admittedly somewhat muddled, has very clear signposts of McCawley’s intentions (“There are compelling reasons why this proposal must be rejected. . . . Thus, in order to explain 141–149, it will be necessary to change the formulation of the respectively transformation. . . . the correct formulation of the respectively transformation must thus involve a set index”—McCawley 1976b [1968]:163–64). Nothing matters to Chomsky here but his own unutterable conviction that he has the truth, McCawley not.

As we know, Chomsky went ahead and published an attack on the argument in which he not only ignores McCawley’s stated intentions but ascribes the opposite intentions to him with such feigned clairvoyance as “Presumably, then, McCawley intends” and “McCawley seems to have in mind the following organization,” and then complains that the bits and pieces don’t go together. He follows this with the accusation that McCawley is fallaciously equivocating by using one name (“respectively-transformation”) for the different operations performed by those bits and pieces (1972 [1968]:76–80).

Time has not made him more perceptive to meanings on the page, nor more charitable in construing the intentions of others, nor any tenderer in his wording; it might be the reverse on all scores. The most outrageous recent example of these tendencies—we are back, now, in the public sphere—is in his response to an article by John Searle in the pages of the New York Review of Books, the scene of Searle’s famous, celebratory 1972 essay, “Chomsky’s Revolution.” Three decades on, after many shifts in Chomsky’s program, after the Linguistics Wars, after the flowering of alternate generative models, after the fading and then disappearance of Deep Structure, after the rise of Cognitive Linguistics, after the quicksand of shifting terminology (transformations replaced by computational operation, competence by I-Language, sort of, performance out, with E-Language as a new kind of foil), and especially after yet one more book by Chomsky in his perpetual-revolutionary mode, on the goals and prospects and achievements of his framework—this one entitled New Horizons in the Study of Language and Mind (Chomsky 2000)—Searle had second thoughts about the revolution.

Under the guise of reviewing New Horizons, Searle offered his reappraisal, signaled by the title, “End of the Revolution.” Searle’s official verdict is, “Judged by the objectives stated in the original manifestoes, the revolution has not succeeded.” It is hard to disagree if we recall, for instance, Lees’s promotional optimism that Chomsky was ushering in “a comprehensive theory of language which may be understood in the same sense that a chemical [or] biological theory is ordinarily understood by experts in those fields” (Lees 1957:377). That has not proven to be the case.

Searle gives Chomsky his due. The review is respectful, thoughtful, and far from an attack. Searle says,

Something else may have succeeded, or may eventually succeed, but the goals of the original revolution have been altered and in a sense abandoned. I think Chomsky would say that this shows not a failure of the original project but a redefinition of its goals in ways dictated by new discoveries, and that such redefinitions are typical of ongoing scientific research projects. (Searle 2002a)

Searle is nostalgic. The Aspects theory had warmed the linguistic cockles of his heart, as it had warmed cockles throughout much of the academic world at the time, and he misses it. He sketches out the theory wistfully, and says,

It was a beautiful theory. But the effort to obtain sets of . . . rules that could generate all and only the sentences of a natural language failed. Why? I don’t know, . . . but seen from outside a striking feature of the failure is that in Chomsky’s later work even the apparently most well-substantiated rules, such as the rule for forming passive sentences from active sentences, have been quietly given up. (Searle 2002a)

He supports this lament with this quotation from New Horizons indicating that Chomsky’s program had indeed

rejected the concept of rule and grammatical construction entirely: there are no rules for forming relative clauses in Hindi, verb phrases in Swahili, passives in Japanese, and so on. (Chomsky 2000:8; see Searle 2002b)

We know Chomsky. We can’t expect him to respond simply that, yes indeed, he now believes the Aspects conception of rules and constructions has to go. “All the properties which were explained in terms of deep and surface structure were really mistakenly described,” he might have said, as he in fact had said to a New York Times reporter a few years earlier. “And they ought to be explained, and maybe can be explained better, without postulating these systems” (Fox 1998). He might have clarified that the short-term goals that those systems attempted to satisfy now look to be mistaken, and that longer-term, higher-order goals now take precedence. The hierarchy of values has changed—if not the “redefinition of goals” as Searle suggests, perhaps a realignment of goals.

However, it’s Chomsky. So, nothing doing. He is contemptuous of the “redefinition” olive branch Searle holds out, and simply denies ever having been involved in the beautiful theory Searle outlined:²⁵

Searle’s project was never entertained, in fact never even mentioned in the past fifty years except to stress—explicitly, forcefully, and unambiguously—that [generative grammar] adopted a conception of the nature, use, and acquisition of language in which his notions play no role at all.

. . .

Consider Searle’s most “striking feature of the failure” of [my] “revolution”: that “even the apparently most well-substantiated rules . . . have been quietly given up,” specifically, the rule converting “John loves Mary” into “Mary is loved by John.” It is true, as he says, that “nobody thinks that anymore,” because no one ever did. The rule could not have been “given up” because nothing remotely like it was even formulable in the GG framework. (Chomsky 2002b)

There was, it seems, no Passive Rule in generative grammar, ever; never proposed, never considered, never, ever, thought of, even remotely, and Searle is an utter bonehead—the response bristles with typical Chomskyan ad personams about intellectual feebleness (“[Searle] completely misunderstands . . . persistently misconstrues . . . misses entirely . . . fails to comprehend”)—for ever believing there was a project in any way associated with Chomsky that might have included a rule for such phenomena as passive. In fact, such a conception was explicitly, forcefully, and unambiguously rejected for the entirety of Chomsky’s career.

The beautiful theory never happened.

Searle surely knew to expect contention and invective. He had been around the block with Chomsky more than once, but the tack Chomsky takes stuns him all the same. “Does Chomsky seriously deny that he held this conception?” he asks in astonishment:

I went to MIT in the 1960s to do research with Chomsky precisely because I was writing a book on these and related issues. There definitely was a passivization rule. I heard Chomsky explain it in detail in lectures at MIT and read it in his books. (Searle 2002c)

Readers with any knowledge of Chomsky’s linguistics at all—and NYRB readers are pretty well informed—would have been just as baffled as Searle. Chomsky was known as the champion of a new type of linguistic rule, the transformation, and that rule type was exemplified by such things as the relations between active and passive sentences. Indeed, anyone who remembered Searle’s original article, in which “Chomsky’s transformational rules” are hailed because they “show the similarity of the passive to the active” (Searle 1972), might wonder why it took Chomsky thirty years to object.

Chomsky is not lying. He is selecting and deflecting very ungenerously but he is right, in a niggardly, hair-splitting way, hoisting Searle on the petard of a ridiculous technicality. Sentences were not converted into other sentences in the Aspects model. True. Deep Structures were converted into Surface Structures, a different theoretical conception altogether. Chomsky is right. He is just absurdly pedantic in his claim to rightness.

But one might wonder, also, about this grotesque misrepresentation of Transformational Grammar from another misbegotten soul: the allegation that there is a transformation which “converts ‘the boy is tall’ into ‘the tall boy.’ ” Or the same confused author, years later, saying, “There is a transformation which converts ‘that he left was unfortunate’ . . . into ‘it was unfortunate that he left.’ ” This author, there can be no question, completely misunderstands Transformational Generative Grammar, persistently misconstrues it, misses entirely the way it functions, fails to comprehend it utterly.

We’ve been keeping company in this book long enough, you and I, that you surely know the punchline: Yes, these quotations are from Chomsky (1957a: 72, and 1964f:13, respectively). Chomsky can say whatever he likes, in service of informality and explication for a nontechnical audience. But woe betide Searle if he engages in a similar kind of short-hand. Chomsky is permitted to talk loosely, because he really knows the truth, but Searle gets no such license. Looseness for him can only prove incontrovertibly that Searle cannot possibly understand what he is talking about.

If Chomsky is not lying for sport here, as Postal has it, playing chess with extra pieces, he is certainly playing some unwholesome game, some highly skewed scholastic amusement, some excruciatingly petty variant of dialectics, with pieces camouflaged by private definitions, serving one goal and one goal only, to crush Searle.

And there’s something else: Chomsky routinely courts outrage, saying things radically against the grain. Championing rationalism was outrageous in the context of late-1950s philosophy of mind. Advocating mentalism in the context of late-1950s linguistic positivism was outrageous. Appealing to complexity in the simplicity-suffused rhetorical ecology of mid-1960s Transformational Grammar was outrageous; then turning to simplicity/minimalism in the early-1990s Principles and Parameters ecology, suffused as it was with intricacy and complexity; and proposing the 10¹⁰ neuron-packing account of language evolution; and the repeated claim that “there is only one human language, apart from the lexicon” (Chomsky 1995:131), just a few words separating Hittite from Mandarin from Kwak’wala; and, stretching for further shock-value, that there is really only one organism on earth “though with many apparent superficial variations” (Berwick and Chomsky 2016:61), like amoebae, elephants, and praying mantises; . . . one could keep this list of outrageous statements going for a page or two, it is such a standard move of Chomsky’s argumentation, Chomsky’s thought, Chomsky’s rhetoric.

Chomsky has even suggested it is perfectly reasonable that “humans [could] come to have an innate stock of notions including carburetor and bureaucrat” in the course of evolution; nature would only have to “anticipate” the relevant “future physical and cultural . . . contingencies” (2000a:65–66). And, perhaps the most outrageous claim of all, casually advancing the most discredited and ridiculed philosophy of mind in the history of philosophy of mind: that we have “some kind of internal homunculus ‘understander’ and agent,” a central scrutinizer of some kind “who’s using the entire sound and entire meaning.” When we speak, he says, we have a lot of factors beyond Merge and our computational atoms in play, factors that are

Proclaiming to Searle there never were any (sentence-to-sentence) rules in Transformational Grammar, especially no such Passive Rule, is only mildly eccentric in Chomskyan terms.

There is, then, more to Chomsky’s unreliability-with-truth problem than just impaired reading, because his claims, no matter how outrageous, have a foundation of some kind, often a very sturdy one, as with rationalism and mentalism, sometimes a very flimsy one, as with the never-were-any-rules maneuver.

He has carried out some similar sleight-of-hand with the once theory-defining notion of grammaticality. The Generative Semanticists eroded that notion, as we recall, then publicly rejected it as an artifact with little use to linguistics distinct from the context of utterance. Chomsky would have none of it: the first case he prosecutes in “Some Empirical Issues,” the only paper he admits to be a “response” to Generative Semantics, is against Lakoff’s suggestion that there is no encapsulated notion of grammaticality distinct from context and intention. “Consider the matter of grammaticalness,” Chomsky says, thumbs in his vest, “certainly a fundamental issue,” before dismissing Lakoff’s argument as a terminological confusion, “which is typical of . . . Lakoff” (1972b [1968]:120, 122). But things change in Chomsky’s program, and Chomsky later came to abandon grammaticality as well—in a different and altogether more imperious way than the Generative Semanticists. Chomsky doesn’t abandon grammaticality by arguing against it, as they had; nor does he so much as hint there had ever been a controversy about it, any point on which an argument might be raised. He abandons the notion of grammaticality by simply denying it even needed abandoning, denying that it had any importance in his work at all, or in any work related to his program, ever. The notion of grammaticality “has no significance,” he said in the 1990s. Not now, not ever:

The concepts “well-formed” and “grammatical” remain without characterization or known empirical justification; they played virtually no role in early work on generative grammar except in informal exposition, or since. See Chomsky 1955[a], 1965; and on various misunderstandings, Chomsky 1980b, 1986b (Chomsky 1995 [1993]:213n7)²⁷

If you’re doubting your memory about whether grammaticality did play a role, an absolutely definitional role, in early generative grammar, feel free to go back to Chapters 2 and 3 and dig up some of the many quotations on the matter, from Chomsky and others, or see if you can figure out what all those asterisks are marking if not a lack of grammaticality, or if you can sort out what the point of that famous Chomsky sentence, “Colorless green ideas sleep furiously” was if not to identify grammaticality as separate from semantics. Who would bother to separate grammaticality from meaning if it was a notion without significance?

There is a doth-protest-too-much quality about the abandonment, but the question presents itself, “To whom is he protesting?” No one is arguing with Chomsky that yes, in fact, grammaticality/well-formedness was important in early work (where early ranges from 1955 to 1993). Yet Chomsky insists that grammaticality has always been a negligible concept in generative grammar, without form or function, and always will be. If that is the case, one of course wonders, why would it need to be said?²⁸

The claim is so outrageous that Pullum calls it a “direct falsehood,” and, just to be sure we don’t miss his point, “certainly an untruth.” It is hard to disagree with him. Chomsky’s fingers always seem to be crossed behind his back on claims like this one. But with enough casuistic stretching the claim can be contextualized into some level of Humpty-Dumptyan veridicality (“When I use a word . . . it means just what I choose it to mean—neither more nor less,” L. Carroll 1999 [1872]: 213). One just needs to read Chomsky very, very charitably, an exercise we will now mount, pausing only to note that a similar level of charity, or any level at all, is notably lacking in Chomsky’s reception of other people’s arguments.

Clues can be found in two books Chomsky says are concerned with “various misunderstandings” about well-formedness, Rules and Representations (1980b [1978]) and Knowledge of Language (1986). Both of them are books in which Chomsky pursues or instigates many philosophical debates, so sifting through them for the treatment of the specific misunderstandings he has in mind here is a bit like searching for a needle in a haystack; or, rather, for a secret decoder ring—the one etched “Well-formedness/grammaticality claims. Save for 1993”—in a secret-decoder-ring stack. But it’s there. In particular, with his introduction of the terms I-Language and E-Language in Knowledge of Language, Chomsky does some lumping and splitting with respect to well-formedness that foreshadows his later dismissal of its importance, seeming to associate it with E-Language only. If my semantic archaeology is correct, well-formedness/grammaticality belongs with E-Language. E-Language played virtually no role in generative grammar. Therefore, well-formedness played virtually no role in generative grammar. Oh, and E-Language is without characterization or known empirical justification, so well-formedness is without characterization or known empirical justification.

Chomsky’s unreliability with truth here comes not with his positions on well-formedness, E-Language, and so on, all of which are articulated fairly carefully and in consonance with the Minimalist Program, which he was initiating in 1993 and which is distinctly uninterested in issues of formal representation or grammaticality. The unreliability comes with Chomsky’s retrofitting of this position to a history that, so far from supporting it, only provides evidence that it is a direct falsehood, certainly an untruth. But Chomsky treats his own history, theoretical and academic, as a blob of clay that he can shape any way he likes and never has to fire.

In some imagined universe, it may be that Chomsky always had a trajectory to the Minimalist Program in mind from his first recorded efforts, his BA thesis, and that dallying with well-formedness/grammaticality in the Logical Structure to Lectures on Government and Binding period was just a necessary but informal sandboxing of ideas along the way. It may be that when the Generative Semanticists repudiated grammaticality it was premature in this trajectory to share that rejection publicly; in fact, that rejection had to be exposed as a terminological ploy, without substance. But the concept of grammaticality was always informal, without real character or a genuine role, and once the conditions were right this could be made explicit. Grammaticality could be revealed as a trivial notion all along.

In such a universe, with such special knowledge of the long-term plan, and therefore of the meaning of such words as well-formed and informal exposition, Chomsky’s claim becomes truthful. And, just maybe, that universe is the one Chomsky inhabits. None of this elaborate rationalizing refutes Pullum’s accusation of a “direct falsehood,” of course, but it may be that the rest of us, including Pullum, saw one history. Chomsky lived another.

The same perhaps holds for Chomsky’s personal history. Certainly, he lived it intimately, and the rest of us only have data to go on. From the inside, as Chomsky saw his youth,

He always felt completely out of tune with almost everything around him . . . he was always either alone or part of a tiny minority [in his political beliefs]. . . .

He was always on the side of the losers. . . .

[After the bombing of Hiroshima] he just walked off by himself into the woods and stayed alone for a couple of hours. He felt completely isolated. (Otero 1988:22)

The perceptions of his professional life are in close sync with these feelings. In the late 1970s, for instance, after two decades of being lionized as the Einstein of linguistics for utterly changing the face of the discipline, after having spearheaded at least three recognizably distinct approaches, with their distinct groups of devoted followers, he could say

As I look back over my own relation to the field [of linguistics], at every point it has been completely isolated, or almost completely isolated (Chomsky 1982a [1979–80]:42)

He feels this stance so deeply and so aggressively that he takes offense at anyone who suggests otherwise. “Uh-oh. I think I’ve insulted Noam Chomsky,” begins a 1990 article in Scientific American. The author had. His mistake was identifying Chomsky as the principal authority in linguistics.

“No I’m not,” [Chomsky] snaps. His voice—which ordinarily is almost hypnotically calm, even when he is eviscerating someone—suddenly has an edge. “My position in linguistics is a minority position, and it always has been.” (1990:40)

From the outside, this view appears bizarre in the extreme—one that it is virtually inconceivable anyone else would hold about Chomsky’s relation to the field of linguistics—but there may be some justification for this view in Chomsky’s frequent feeling about the way others construe his ideas.

There seem regularly to be substantial misalignments between the intentions behind his claims or observations (to the extent they can be reconstructed) and the interpretations of those claims and observations by others. Certainly he often decries the meanings people ascribe to his statements. His biographer, Robert Barsky, commenting on ideology and indoctrination, says that those factors cause many people to be “impervious to what Chomsky considers obvious truths” (Barsky 1998:17). That is certainly accurate in some cases, possibly most, but another lesson here is that the things Chomsky considers to be obvious truths can be opaque to many others; maybe—dare we say it?—not true at all. The fault might be in ideology (the other person’s). It might be intellectual deficiency (obviously, the other person’s). But once in a while, something neither Barsky nor Chomsky appears prepared to admit, it might just be Chomsky.

It might be Chomsky, and not only because he is unclear or imprecise, which he undoubtedly can be. It might be that he has these unshakeable Humpty-Dumpty convictions about the meaning of words that others cannot access. Despite using the words well-formed and grammatical in ways that look absolutely definitional of generative grammar for almost four decades, and using them in ways that look pretty rigorous at times, Chomsky can one day remark casually that they have only been used loosely because the concepts have always been peripheral to his program. It’s what he always thought in this isolated Chomskyan universe, one supposes, but didn’t think relevant to tell the rest of us.

In much the way that his polemical targets feel misrepresented when he characterizes their views or even their explicit claims, Chomsky apparently feels misunderstood when he is confronted with these misalignments between his apparently private meanings and the public meanings of words and arguments. It’s true that he almost always ascribes unwelcome construals to the low cerebral wattage of the construer, in tones that do not encourage harmony. But there is no reason to believe his feelings are not genuine. He has said regularly he doesn’t want followers, in any domain, and he consistently denies any sort of mentorship role to the great scholars under whom he studied (Harris, Jakobson, Quine, Goodman); rather, in his perceptions, they misunderstood what little they bothered to read or listen to.

Chomsky courts isolation, in his meanings perhaps as much as in his disciplinary role. Maybe that explains the Linguistics Wars and related episodes. Maybe he wants to alienate overly enthusiastic acolytes, to purge his program regularly.

Is this why he would take the unpublished thesis of a junior lecturer to be the chief document of the “Transformationalist position” on nominalizations, why he would take George Lakoff as its chief representative, not Zellig Harris, who invented the position, not Robert Lees, whose Grammar of English Nominalizations was “the fullest transformationally-oriented grammar” available between “the publication in 1957 of Noam Chomsky’s apocalyptic little book, Syntactic Structures” and Aspects (Schachter 1962:134)? Indeed, why target anyone? Why not offer up his new treatment as an act of explicit renovation of his own direct proposals, since the transformational treatment of nominalizations was a set piece of Syntactic Structures (1957a:72) and promoted (in opposition to a ‘lexicalist’ account!) in Aspects (1965 [1964]:184)?²⁹

Maybe all of this talk of Humpty-Dumpty and self-image is just a polite and convolutedly psychobiographical way of saying that Chomsky is delusional. Certainly, his claims about the history of his framework, and about the history of his own professional rise, and about the history of his involvement in the Linguistics Wars, are significantly incompatible with the documentary record. His claims are also incompatible with the recollections of many observers, some of whom worked quite closely with him. His claims are so incompatible with documentary and testimonial evidence that we can be sure Chomsky would not hesitate to call them delusional if they came from someone else. I leave the final judgment to you. What else can I do?

Returning to the rhetorical climate of the field, Chomsky’s attitudes have unquestionably bred an unwholesome contentiousness in linguistics. Contentiousness is far from universal, is perhaps now at its lowest level in decades, and Chomsky’s own work seems now more relaxed in this respect than it has ever been. Even in the Pirahã affair he said very little beyond the muttered aside that Everett is a “charlatão puro,” whose work is unfit for the attention of serious linguists. Chomsky seems more willing to shift ground in agreeable ways these days than to shift it in discordant ways. He did not retrench and double down on recursion in the face of the Pirahã data, his familiar pattern. Rather, he said, “It doesn’t matter.” Language is recursive, we would not have language without the mutation that delivered the recursive Merge to us, recursion might be all there is to Universal Grammar, and it is the fulcrum of the Basic Property, but apparently not all languages need to include recursion.

The contentiousness, too, pre-dates Chomsky’s ascension. He didn’t invent bad behavior in linguistics. He entered the field when scorn for traditional language teaching, and occasionally for each other, was common among linguists. But, like many aspects of late Bloomfieldianism (formalism, meaning containment, the interest in syntax, psychology, and mathematical modeling), the level of rancor increased exponentially under Chomsky’s influence. And, for reasons we have seen repeatedly, the eristic climate is rightly associated more closely with Chomsky than with any other figure in the modern history of the field.

The Legacy?

When the intellectual history of this age is written Chomsky is the only linguist whom anybody will remember.

—Geoffrey Nunberg (Goldstein 2008)

Whether Chomsky is right about the nature of human language and cognition is an easy question: he isn’t. More interesting, to my mind, is the question how it could have come about that someone acquired such a towering reputation on such a flimsy basis. . . . Chomsky has been adequately refuted before, for readers willing to entertain the possibility of his being wrong. The time for that may be past. What we do with clowns is simply laugh at them.

—Geoffrey Sampson (2016:597, 601)

In 2011 at the big 150th birthday party that MIT threw for itself, answering a question as part of a panel on the institution’s impact on cognitive science, Chomsky casually and sweepingly dismissed a whole field of inquiry, as he is wont to do. This time it was statistical modeling of language behavior. He said that what researchers in that area call success is novel in the history of science and epistemology, novel not in a good way—not, Chomsky insisted, in a “sense that science has ever been interested in.” Ouch.

Gary Marcus notes that “a more polite person would have put that sentiment more gently . . . [while] a less influential person would simply have been ignored.” Neither of these characteristics is to be found in Chomsky. He is not polite in his opinions, and he is a big, big gorilla. So, what happened?

One of the world’s busiest and most influential software engineers—Peter Norvig, the director of research at Google—wrote an eight thousand four hundred word blog post, extensively footnoted, critiquing Chomsky’s remark. It was two hundred or so paragraphs in response to [less than three minutes’ extemporaneous commentary]. In practically any other context, it would be unseemly for a leading researcher at one of the world’s largest companies to spend so much effort picking on an off-the-cuff remark made by a man in his eighties.

“But I see it in a different way,” Marcus adds. “Two titans facing off, with Chomsky, as ever, defining the contest” (Marcus 2012).³⁰

Chomsky is titanic, no question, but his influence in linguistics may be as weak now as it has been in any of the seven decades that he has been ploughing the field. As we have seen repeatedly, Chomsky’s perception of his relation to the rest of linguistics, if not the rest of the world, is aggressively marginal. But, as we go to press, Chomsky might finally be getting his wish. They overstate it for their own promotional purposes, but Ibbotson and Tomasello are not far off the mark when they observe that “cognitive scientists and linguists have abandoned Chomsky’s ‘universal grammar’ theory in droves” (2016).

It is impossible not to notice, among the many other factors in Chomsky’s relative wane, the relative waxing of George Lakoff. Since shaping and settling into the Cognitive Linguistics framework, he has been the more stable theorizer of the two, and Cognitive Linguistics is very successful. No one would have disputed Nunberg’s observation about Chomsky’s towering role in the intellectual history of this age a decade or two ago, but Lakoff may be giving him a run for his money these days.

We have these superficial languages that we speak with, the hero (actually named Hiro) of a near-future science fiction novel says, written when Chomsky’s fame was at one of its highest peaks; but underneath those languages, there’s another one, “based in the deep structures of the brain, that everyone shares.” Hiro goes on: “These structures consist of basic neural circuits that have to exist in order to allow our brains to acquire higher languages.” Further, “ ‘deep structure’ and ‘infrastructure’ mean the same thing.” Hiro’s not just giving a lecture. As the climax looms, he’s explaining the plot of the novel, in which nefarious religio-fascistic forces are taking over the world by controlling the minds of computer programmers through special codes that

can tie into the deep structures, bypassing the higher language functions. . . . Once a neurolinguistic hacker plugs into the deep structures of our brain, we can’t get him out—because we can’t even control our own brain at such a basic level. (Stephenson 2003 [1992]:395)

The novel, Snow Crash, testifies not only to the popular reach of Chomsky’s ideas—the author is waxing poetical, but he has clearly read a fair measure of Chomsky’s linguistics, either secondhand or through books like Language and Mind or Reflections on Language—but to the enduring power of Deep Structure, not to mention the Universal Base hypothesis.

Snow Crash came out in the 1990s. A more recent science fiction novel, Embassytown, goes all in on Lakoff. Set on another planet, impossibly distant from earth and impossibly far in the future, it invokes George Lakoff as one of the great ancient thinkers on metaphor (Miéville 2012:141). Perhaps the author was basing his projection on sales numbers, which favor Lakoff over Chomsky by a very wide margin.

Metaphors We Live By has a sales record and a citational footprint surpassing Syntactic Structures, Aspects, Lectures on Government and Binding, and The Minimalist Program combined. And, as always with numbers, it’s not just the numbers. Like Chomsky’s most renowned works, Metaphors has wide spillage beyond the confines of linguistics. Metaphor as a research interest had been all but legislated out of linguistics in its structuralist phases, certainly within Chomskyan structuralism. But metaphor was a very hot property in a number of neighboring areas throughout the twentieth century, so the initial market for Lakoff and Johnson’s work came from other disciplines. English studies folk came to the book very quickly, psychologists and philosophers showed early interest, anthropologists and even computer scientists kicked the tires; then linguists slowly started to notice, and bolstered by the kind of work Lakoff highlighted in Women, Fire, and Dangerous Things, it really took off.

It was assuredly true in 1980 when Newmeyer made the assessment that, in terms of influence, “no viable alternative exist[ed]” to the Chomskyan paradigm (Newmeyer 1980:249), and for decades afterwards. But there is now. If we confine ourselves just to Chomsky’s more local program (the MP/Biolinguistics_NC amalgam—NC for Narrow Construal or for Noam Chomsky, take your pick), Cognitive Linguistics has now certainly surpassed it for disciplinary and cross-disciplinary influence. But even if we take the broader Chomskyan paradigm (including such approaches as Head-Driven Phrase Structure Grammar, Lexical Functional Grammar, Dependency Grammar, and Categorial Grammar), Cognitive Linguistics certainly rivals it. Chomsky’s titanic imprint is receding.

Chomsky’s legacy certainly seems to be on Chomsky’s mind lately, and on the minds of those around him, judging by the tone, the content, and the bodies of work published since the turn of the century, with a consistent and ongoing rhetorical maneuvering to make Chomsky the solely defining voice in linguistics; turning “to beam at the past,” as Nabokov puts it, “while massaging the lenses of the present” (1996 [1953]:303). A familiar refrain out of Chomsky’s camp, for instance, often word-for-word, is that it “is no exaggeration to say that more has been learned about languages in the past twenty-five years than in the earlier millennia of serious inquiry into language” (Chomsky 2008b:7; Berwick & Chomsky 2011:29, 2016:69; Polychroniou 2016).³¹ Undoubtedly, there is little exaggeration here, except in its tacit implication that all this learning traces to Chomsky and his insights. It is fair to wonder, (1) is there a field of knowledge about which the same claim might not be made, after the computationally aided, massively state- and privately funded epistemological explosion over those same years? and (2) is the increase solely, or even largely, a function (as is always suggested) of Chomskyan programs? Would we not want to count corpus linguistics, cognitive linguistics, sociolinguistics, psycholinguistics, neurolinguistics, pragmatics, and computational linguistics, to name a few of the fields and approaches that are largely either indifferent or antipathetic to Chomsky’s work, as contributing to, if not dominating, that expansion of knowledge? Would we not also pause to note, as Blevins does, specifically of the intensive study directly under Chomsky’s watch that it has “produced less of the stable, incremental progress that had been anticipated by early commentators” (2008:723), that it has lurched from model to model, approach to approach, without a clear inventory of which analyses or which results belong only to the period in which they surfaced, and which are claimed to transcend the particular models and approaches?

Similarly, one recurrently reads in the Chomskyan literature that his Merge-pivoting proposals are the focus of widespread evolutionary attention. In the course of dismissing a wealth of proposals in evolutionary linguistics, for instance, Berwick and Chomsky write that for other theorists

Merge has ridden in on the back of, well, almost anything else but what we’ve discussed in these pages: hierarchical motor planning; gestures; music; pre-Google era complex navigation or its rehearsal; complex food caching; a compositional language of thought; a qualitative difference in human plans; knot tying; or even—we’re not joking—baked potatoes. (The story being that we, but not other animals, gained more gene copies to build the enzymes handling more easily digested cooked starch, and this fueled brain expansion after the invention of fire . . .) We’re not convinced. (Berwick & Chomsky 2016:158)

Bless their hearts, all these other researchers, this clutch of country bumpkins with their comically earnest just-so stories, are bustling about, doing their damnedest to explain the remarkable evolutionary arrival of Merge, and failing to see it right before their eyes in the perfectly obvious proposals of Chomsky’s kapow! theory. One might look in vain, however, among the gesture theorists or the music-first theorists or the baked-potato theorists for Merge riding in at all, or for any transformations/computational procedures, or even for (kindly) mentions of Chomsky’s program. The baked-potato theorists (Hardy et al. 2015) do not, for instance, give any indication that Merge is the entrée, and the most prominent gesture-first theorists penned these bitter lines in Scientific American:

Evidence has overtaken Chomsky’s theory, which has been inching toward a slow death for years. It is dying so slowly because, as physicist Max Planck once noted, older scholars tend to hang on to the old ways: “Science progresses one funeral at a time.” (Ibbotson & Tomasello 2016; the Planck quotation, incidentally, is spurious, and is not infrequently used in attacks on Chomsky)

Another article by Tomasello is even more direct, but has the good taste to keep its funereal suggestions to the entirely theoretical domain: “Universal Grammar is Dead” (2009). Not quite. There are busy hands exploring the implications of Merge, though it is almost exclusively by linguists flying the Biolinguistics_NC flag, not by evolutionary theorists of language more generally. Many of them are interested in the evolution of symbols, for instance, about which Merge says nothing.

But it is in Chomsky’s publication profile since the turn of the century that one most sees a growing preoccupation with cementing, unifying, and branding Chomsky’s role in linguistics. First came New Horizons in the Study of Language and Mind (2000), which opens with a whiggishly historical essay delivering up the Minimalist Program. More recently we have What Kind of Creatures Are We? (2016a), which addresses perhaps the most fundamental question of humanity by sifting through the historical results of Chomskyan research, and Why Only Us? (Berwick & Chomsky 2016), which makes the strongest general-audience claims for his Biolinguistic program. That year also saw the political book, titularly sporting yet another rhetorical question, Who Rules the World? (Chomsky 2016b). (Three books in one year, at 87 years old! “How do you account for your amazing stamina and energy level?” an interviewer for the New York Times asked him. “The bicycle theory,” he said. “As long as you keep riding, you don’t fall” [Tanenhaus 2016].)

In between New Horizons and the latest spate, among many other publications, have come new editions on the linguistic front of Syntactic Structures (2002), The Generative Enterprise (2004), Aspects of the Theory of Syntax (2014; “the fiftieth anniversary edition”) and The Minimalist Program (2015; “the twentieth anniversary edition”), as well as, on the philosophical and cognitive science front, new editions of Rules and Representations (2005), Cartesian Linguistics (2002c, 2009), and Language and Mind (2006), as well as a retrospective collection of many technical papers, including several scattered contributions to the Minimalist Program, Chomsky’s Linguistics (Chomsky 2012).³² All of these volumes come with introductions or prefaces or forewords—in a few cases, new chapters—that tell the same story. They foreground Biolinguistics, a term that otherwise does not occur in the books, and the Minimalist Program, a phrase that does not appear in any of them prior to the titular monograph, as the endpoints of the natural, inevitable, uninterrupted march of progress. “MP is a seamless continuation of pursuits,” Chomsky says in the new preface to The Minimalist Program, “that trace back to the origins of generative grammar, even before the general biolinguistics program, as it is now often called, began to take shape in the 1950s” (Chomsky 2015b:vii).

The irony of the new Aspects preface laying out a minimalist architecture is particularly exquisite. “The most elementary fact about each individual’s language,” the new preface begins,

is that it generates a pair of interpretations (sensorimotor (SM), conceptual-intentional (C-I)) for each of infinitely many hierarchically structured expressions, where SM is the link to organs of externalization (typically articulatory-auditory) and C-I is the link to the systems of thought and action. We can refer to this virtual truism as the Basic [Property] of human language, instantiated in one or another form in the brain of each language user. (Chomsky 2015a:ix)³³

Aspects does indeed suggest such a mediational architecture, linking form (here, SM; in Aspects proper, phonological representation) and meaning (C-I; semantic interpretation), though we have seen in considerable depth what happened after Aspects, when Chomsky embraced complexity and found highly distasteful proposals from the Lakoffs, Ross, McCawley, and Postal (as well as related ones from Bach, Fillmore, and others) that took the Basic Property to be, well, Basic, with one result being that whatever progress has been made since the Aspects watershed has been very far from seamless. What would linguistics look like, one wonders, if Chomsky, instead of conjuring up “Remarks on Nominalization” in that period, had leapt from his bath with a cry of “Eureka!,” conjured up “The Minimalist Program,” and called Postal on the phone with “Hey, Paul! Let’s really get after this virtual truism we’ve been building toward. A priori, it’s the Best Theory possible!”

But, of course, that would not, could not, have been Chomsky, the constant revolutionary, the rebel without a pause button. In the methinks-’tis-a-camel;-nay,-a-weasel;-nay,-a-whale department, it has perhaps been easy to miss the menagerie for the beasts in this story, so let me remind you that the major reconfigurations have not just been changes of direction, but abrupt reversals over the operation of his defining mechanism, the transformation. In Syntactic Structures through Aspects, transformations had structural descriptions. If the descriptions were met, the transformations fired; if not, the transformations passed by obliviously (but were presumed, in some abstract way, to “look” for their structural descriptions). In the post-Aspects period, reaching a crescendo with Move α of Lectures, transformations (though perhaps the plural no longer makes sense at this point) applied with increasing promiscuity—whenever and wherever they could unless they encountered specific preclusions in specific circumstances. Constraints prevented movements. First transformations were prescribed, in other words, then they were proscribed. With the Minimalist Program, transformations once again “looked” for ways and places to apply, operating inevitably, like water in an economy of gravity. They applied on a “least effort” principle—not a regime of preclusions, like GB, with constituents buzzing like insects, everywhere and anywhere, sometimes encountering a hole in the screen, a regime of blocks and constraints; rather, a methodological regime of natural and easy movement, a regime of attraction; not pushing away, but pulling toward,—and then Merge emerged, now operating under universal laws of computation.

What comes of all these involutions? What is Chomsky’s linguistic legacy? In a very real sense, everything.

This is far from the whole truth. Chomsky is a creature of Harris, not to mention of Goodman and Quine. He is a conduit for Saussure and Sapir and Bloomfield, not to mention (in his construals of them) Descartes and Humboldt. He is confederate with Bar-Hillel and Halle and Katz and Postal and Lasnik and Berwick. He is a wellspring for Lees and Fillmore and Jackendoff and Bresnan and Emonds, not to mention Ross and the Lakoffs and McCawley. He is neither without the profound influence of precursors nor the vast assistance of colleagues and disciples. Still, . . .

In the early 1950s, a shatteringly precocious young Chomsky did begin a research program anchored to the insight that “grammar will in general contain a recursive specification of a denumerable set of sentences” (Chomsky 1979 [1951]:67n2), an insight that has driven the field of linguistics pretty much ever since. It’s been a bumpy ride. Several variations on that theme grew mutually antagonistic, most dramatically with the Generative Semantics Heresy, and all of them were foundationally antagonistic to various other approaches. But Chomsky, the presence, has towered over the field for the better part of a century since he first wielded his recursion insight, and—look under every blade of grass, turn over every stone, sift every teaspoon of soil—there is not an inch of the field where one cannot find the presence of an idea he has conceived or provoked or fostered or reconceived, or which did not develop elsewhere in response to one of his ideas, or otherwise develop in some way under his influence. Even non-developments can be pinned on him. Corpus linguistics, as the most obvious example, should by all accounts have exploded in the late 1950s. The Bloomfieldians revered corpora, there were these new machines called computers that could process great quantities of texts rapidly, machine translation was a going concern, and Transformational Grammar was drawing impressive resources into linguistics, human and financial. But corpus linguistics didn’t really take off until the latter 1970s, and those lost decades are routinely traced to the contempt with which Chomsky dismissed the very notion of corpora in linguistics, and the loathing he holds for statistics in linguistics; attitudes he maintains. But Chomsky’s influence over corpus linguistics is literally definitive. Corpus linguists define their mission in strongly via negativa terms with respect to Chomsky.³⁴

Where might one go to find a corner of linguistics uninfluenced, or unruffled, or ungoaded, by Chomsky? Beats me. Nothing in linguistics would be the way it is without Chomsky—setting aside the tentacles of his work that have extended into English studies, philosophy, psychology, computer science, and ignoring his impact on political science, media studies, propaganda theories, and more.

That’s the short game, our contemporary noetic landscape, shaped as much by Chomsky’s divisiveness as by his brilliance, by his personality as by his theories. His contempt, his misrepresentations of history and of other people’s work and words, his truculent, embattled ethos—these will not be his long-game legacy. Nor will his astonishing generosity with his time, his tireless devotion to intellectual life, his encouragement and support of students and co-workers, his infusion of ideas into their work and critiques of their arguments, perhaps not even his prodigious and courageous career as a dissident; all of this may go the way of his affection for cable-knit sweaters and nerd-jeans. No one except aficionados of the history of science knows that Newton viciously attacked Leibniz over the invention of the calculus, or that he was a lifelong celibate, or that he was legendarily as generous to strangers as he was to family and friends. But everyone knows he “discovered gravity.” The question of Chomsky’s legacy resolves to this: does he have a gravity?

Newton, in fact, has become one of Chomsky’s own preferred points of reference, along with Galileo and the evergreen Descartes. Chomsky regularly appeals to them as scientific exemplars, extracts methodologies and attitudes from his discussions of them, then argues that his own work reflects those methods and reflects those attitudes (and of course, that other people’s methods and attitudes do not). Five hundred years in the future, will Chomsky look anything like Newton?

And let’s ask another question: Will Lakoff be in the human sciences pantheon instead of Chomsky, or right next to Chomsky, as the Messiah of Metaphor, or the Father of Framing, or the “discoverer” of general-purpose cognition in linguistics, or not at all? In neither case, Chomsky or Lakoff, is Newton an appropriate point of reference. Each of them may have their “gravity,” but neither has anything approaching a Principia Mathematica.

Copernicus might be a more appropriate point of reference for Chomsky, though, if the optimism of his most ardent supporters is realized; if not, perhaps Franz Joseph Gall. Copernicus and Gall have both been hailed as scientific revolutionaries—Copernicus to the extent that the scientific revolution and the Copernican revolution are virtually synonymous; Gall to a far lesser, largely forgotten and mostly embarrassing extent. Neither has a Principia, but both have lasting legacies.

Nicolaus Copernicus compellingly mathematized the ancient theory that the sun was the center of the universe in 1543, overthrowing the long-dominant, Church-sanctioned, Ptolemaic theory that the earth was at the center. We all learn a version of this in grade school, heliocentrism triumphing over geocentrism, and it has become something of a parable for evidentiary reasoning over faith-based reasoning, secular science over religious dogmatism. The cultural ramifications were as profound as the more narrowly scientific ramifications, because scripture seemed to back the geocentric picture (but so, it is often forgotten, did some very reliable mathematics).

What is often left out of the parable, however, is that the Copernican Revolution took a long time, over a hundred years, and not just years in which Copernicus’s theory was simply repeated more and more clearly; years, rather, in which it was redefined, opposed by other scientists, defended by yet others, calibrated, and strengthened, both by observation and theoretical adjustment, gaining persuasion as it grew in scope and elegance. Tycho Brahe, for instance, partially adopted, partially rejected, heliocentrism, and developed his own model of the cosmos, adapting some of Copernicus’s math, but he also made some astronomical observations that compromised the Ptolemaic system. Johannes Kepler changed the geometry of planetary orbits. Copernicus had inherited the geocentric reverence for spherical motion, which required some tricky pushing of the numbers. Kepler showed that if one made the orbits elliptical the math got simpler. Galileo made all sorts of observations with a telescope that either compromised geocentrism or supported heliocentrism, or both. And Newton ultimately sealed the deal in a unifying theory (with Principia) that used Kepler’s work to derive his law of universal gravitation.³⁵ The moral in Copernicus’s tale might be something like “Big Ideas take a while. Don’t be in such a hurry!”

Franz Joseph Gall’s story has a rather different moral. He was a physician and physiologist who developed a modular theory of mind prominently featuring mental organs. Different regions of the brain for Gall were responsible for different talents and dispositions: math, language, and music, but also goodness, vanity, amorousness and religious devotion—twenty-seven organs in all, localized to different areas of the brain. Organology, he called his corresponding science. Since we all have those organs but different people have the traits and abilities associated with those organs in different measures, ran the theory, it stands to reason our organs will be differentially developed. Since they are differentially developed, they will be different sizes and shapes. Since they are different sizes and shapes, they will affect the contours of the skull. Since they affect the contours of the skull, one could tell a good person from a bad person, a math whiz from a dyscalculic, a criminal from a priest, a hot tamale from a cold fish, by the bumps on their heads.

Gall was especially celebrated for the exquisitely illustrated (and hugely expensive) five-volume Anatomy and Physiology of the Nervous System in General and of the Brain in Particular with Observations on the Possibility of Understanding the Many Moral and Intellectual Dispositions of Man and Animals by the Configuration of Their Heads. He worked closely with Johann Gaspar Spurzheim for a period, co-authoring a few volumes of Anatomy and Physiology with him. But they split bitterly.³⁶ Gall termed his bump-reading enterprise cranioscopy. When Spurzheim broke with Gall, he plied this trade in England and France, naming the practice phrenology. He also increased the inventory of mental organs, and their corresponding bumps.

Gall’s theory was as influential and popular a theory of psychology in the nineteenth century as Freud’s was in the twentieth. Paul Broca, among others, hailed Gall as the author of a scientific revolution, and phrenology swept through Europe. Popularizations—books, pamphlets, lectures, public demonstrations—were commonplace. Itinerant traveling protrusion-readers sprang up. Contour maps of the head, carefully marked off for organ bumps, were all the rage—no doubt you’ve seen them yourself in antiquarian sepia. People read intimate understandings of others, and their children’s futures, and the propensities of entire classes of people, by palpating skulls. Doctors practiced phrenology in their offices. Journals like the Phrenological Journal and Magazine of Moral Science and the Phrenological Journal and Science of Health populated the newsstands and the libraries.

Phrenology is remembered today, if it is remembered at all, as the height of pseudo-scientific quackery, but Gall was a serious scientist, with a medical degree, who conducted extensive studies of human and animal brains, publishing highly detailed analyses of the nervous system. He also brought new methods and standards of precision to dissection. “Gall’s anatomy was exceptional,” one historian of neuroscience says; “his work forced people to entertain the possibility that the cerebral cortex may be composed of myriad distinct functional organs” (Finger 2000:135). The phrenology journals were full of articles on neuroanatomy. But, after a few heated decades, the movement cooled off quickly. Various cliques of phrenologists and organologists, squabbling among themselves, produced different charts and organ assignments, the count of mental organs going up as high as forty in some quarters. Different readings of the same skulls were not uncommon, and counterevidence began to mount. Phrenology had always had critics, some of them virulent, but by the second half of the century it had come to be exemplary of junk science on a par with reading palms. But here’s the thing: however screwy the specific instantiations of Gall’s theory, and however amusing his story is in popular science venues, in the specific history of neuroscience he is enshrined as one of the foremost pioneers of brain localization.³⁷ We hear loud echoes of organology in Fodor’s influential Modularity of Mind, for instance, who sees himself as a Kepler to Gall’s Copernicus (1983:14-23), and it is surely no coincidence that one of Chomsky’s favored terms is language organ (e.g., 2000a:4).

Copernicus had one big idea, sharply articulated, precisely wrought, and cogently argued, but incomplete and underdetermined and out of sync with the astronomy (and culture) of his time. He set generations of scientists in motion until his big idea was firmly embraced, scientifically and culturally, in later centuries. Gall had one big idea, which swept quickly through science and culture, but was subsequently overshadowed by more successful research programs, and laughed off the stage historically.

There are many differences between Chomsky and these two, especially the belief-beggaring intellectual fecundity of the man. But if we try to hold Chomsky to one big idea, what would it be? Not Universal Grammar alone. Some attempts to defend that notion against Tomasellian declarations of its death take that stance, reducing it to utter vacuity. Since “UG is . . . the theory of the initial state of the language faculty,” philosopher José-Luis Mendívil (2020) says, “UG exists by definition,” leaving us in an untenable bind: “We can only deny the existence of the initial state of the language faculty if we deny that the language faculty exists.” By “language faculty” Mendívil means only the fact that humans produce such phenomena as Italian and English, while other species do not. If we can read Mendívil’s English, therefore, we cannot dispute his claim. Our knowledge of English came from somewhere. We did not always have it. At some point, we had an initial state.

What Mendívil does not mean, as Chomsky did for a few decades at least, is that the initial state is outfitted with a pro-drop parameter that, set one way, gives us Italian, set another, English. He does not mean that the UG is outfitted with anything very specific at all; rather, he suggests that it can in fact be fully compatible with the general-purpose cognitive dispositions raised by Generative Semantics and characteristic of so many current approaches. But whatever Chomsky’s current view is of some particular parameter, like pro-drop (it is unwise to make presumptions about such details), we know there is something specific about his Universal Grammar, something about the Language Organ, that is not part of a general cognitive repertoire—perhaps Merge, perhaps just recursion, to the extent these notions can be separated, and/or the FLN/FLB interfaces.

So, let’s take his one big idea to be some amalgam of Universal Grammar and something more specific—call it “the computational workspace for combinatoric creativity”—and try to guess who Chomsky will ultimately prove to be, Copernicus or Gall?

The one-revolution view, the view Chomsky usually favors, gives Chomskyan linguistics a decidedly Copernican cast: still under way, based on a key claim and a steady trajectory. The no-revolutions-at-all position, favored by many opponents, including George Lakoff, makes Chomskyan linguistics look like phrenology, Chomsky like Gall: one big mistaken, and ultimately ridiculous, idea that mushroomed into a cultural phenomenon, with no direct and lasting effect on scientific theory or practice.

Is Chomsky Copernicus? Or is he Gall?

Will Chomsky’s work be buried in animosity and ridicule? It’s been tried before, at least as early as The New Grammarian’s Funeral (Robinson 1975), and the rise of Generative Semantics included widespread intimations that Chomsky’s creative pulse had stopped, his role in the field over. It’s being tried now, in articles like Ewa Dabrowska’s (2015) missing-persons report, “What Exactly is Universal Grammar, and Has Anyone Seen It?” and Michael Tomasello’s death certificate, “Universal Grammar is Dead” (2009). Attempts in the past have always failed. But attempts in the past have always had to contend with Chomsky, in his direct polemics and in the indirect polemics of the linguists he has inspired, as well as in his deft changes of the entire rhetorico-scientific landscape. His pulse has never stopped. The bicycle is still upright. That cannot always be the case.

If his work is buried in animosity and ridicule, rather than refined and developed into some future account of language with the power and elegance of classical physics, he will prove, minimally, at least to be a Gall. He has been too astonishingly productive for his ideas not to resurface again, independently or in clusters, combined with other theoretical insights and correlated with new classes of evidence—in linguistics, in neuroscience, in genetics, in evolutionary biology, in Artificial Intelligence, somewhere; very likely multiple somewheres.

Is Chomsky Copernicus or Gall? Gall or Copernicus? Who knows? The future is not here yet. But I wouldn’t bet against Copernicus. I wouldn’t bet against Chomsky.