SKEPTICISM ABOUT CONVENTIONS
In order to say very much about David Lewis’s Convention ([1969] 2002), it’s necessary to first say something about the philosophical problem that he was trying to solve there, and its relationship to the one that I am investigating here. Providing that explanation requires a brief description of a small part of the philosophical background against which the book was written.
Lewis was explicitly responding to W. V. Quine’s “Truth by Convention” ([1936] 2004a) and the other things Quine had written on the so-called conventions of our language and their relationship to analytic truth. In turn, in “Truth by Convention,” Quine himself was merely extending an argument first made by Bertrand Russell ([1921] 2010), so it seems reasonable to start the discussion by briefly looking at Russell’s ideas.
In general, Russell seems to have been a fairly orthodox adherent of Democritus’s theory of the origins and subsequent development of human languages. He acknowledged the tradition of regarding the meanings of words as in some sense conventional, but he felt that was true only “with great limitations.” Anticipating Quine, he suggested that this idea is basically just a myth, like the social contract. Certainly, Russell ([1921] 2010) admitted, scientific and technical terms can be adopted by deliberate convention, but
the basis of language is not conventional, either from the point of view of the individual, or that of the community. A child learning to speak is learning habits and associations which are just as much determined by the environment as the habit of expecting dogs to bark and cocks to crow. The community that speaks a language has learnt it, and modified it by processes almost all of which are not deliberate, but the result of causes operating according to more or less ascertainable laws. (138)
Why did he think that the kinds of explicit discussions people have in the sciences were a bad model? He insisted that understanding a word does not and cannot consist of being able to “say what it means.” In fact, that sort of explanation has nothing to do with the real meanings of words. Instead, a person understands the meaning of a word when “(a) suitable circumstances make him use it, (b) the hearing of it causes suitable behavior in him” (Russell [1921] 2010:143). According to Russell, “There is no more reason why a person who uses a word correctly should be able to tell what it means than there is why a planet which is moving correctly should know Kepler’s laws” (144). (But if some planets could tell us why they go around the sun in elliptical orbits, wouldn’t we be entitled to seek an explanation of this capability?)
In this story, we seem to learn the correct behavioral responses to particular words through the same sort of mental habit that leads us to assume that similar causes will have similar effects. We all are constantly imitating imitations of imitations of imitations of imitations, and that’s all there is to human language. There are no public conventions about the meanings of words. There are no conventional definitions. There are no correct or incorrect explanations. There is just this individual mental habit, and the behavior it produces.
Where the meanings of words came from in the first place was a problem that Russell ([1921] 2010) pushed off into the unknowable past: “The association of words with their meanings must have grown up by some natural process, though at present the nature of the process is unknown” (138). At some point long ago, we did something, but it isn’t what people are doing in the sciences now, nor is it something ordinary people are still authorized to do for themselves.
Quine’s contribution was taking this behaviorist theory of meaning to its logical conclusion. In “Truth by Convention” ([1936] 2004a) and “Two Dogmas of Empiricism” ([1951] 2004b), he extended Russell’s doubts about the existence of linguistic conventions to analytic truths, claims that are supposed to be “true by definition.” If there are no real conventional definitions behind our use of language, if that isn’t the right way to explain our linguistic behavior, then, Quine reasoned, nothing is ever true by definition, except in a few artificial and extraordinary situations. In this theory of language, it isn’t necessarily true that all bachelors are unmarried, because our individual linguistic behavior can easily differ in ways that make it false as far as some of us are concerned. In fact, there are no necessary or analytic truths, only beliefs that we’re more or less willing to abandon.
The quotation from Quine’s foreword to Convention that serves as an epigraph to this book preserves some of the flavor of the argument of “Truth by Convention.” That argument revolves around the idea that we tend to think of analytic truths as resulting directly from our own ancient conventions about the meanings of words. But, Quine argued, the meanings behind the analytic truths of logic can’t be purely conventional in any straightforward sense, because the “conventions” of logic are required even to state them. In fact, we can’t truly understand them unless we’ve already accepted them. What’s the use of pretending we ever had a chance to make a deliberate and explicit convention among ourselves about which language we’d speak and what its words would mean that somehow made them true? Why not just say that we seem to accept them as true and seem very reluctant to abandon them, even when faced with what might appear to be contrary evidence, and leave it at that?
He considered the possibility that our own attempts to define words might help establish their public meanings, but Quine ([1951] 2004b) rejected the idea because it seemed to him to reflect confusion about the purpose of the lexicographer’s activity: “Clearly this would be to put the cart before the horse. The lexicographer is an empirical scientist, whose business is the recording of antecedent facts; and if he glosses ‘bachelor’ as unmarried man, it is because of his belief that there is a relation of synonymy between those forms, implicit in general or preferred usage prior to his own work” (35). Does it matter what the lexicographer believes he’s doing? Quine seems to have felt it was important, but Darwin’s theory of domestication might make us doubt that it is.
The idea that we share conventions about the meanings of words doesn’t survive in Quine’s account of language, but the notion of “truth” had better survive, since the acceptance of some sentences as true in certain circumstances is all that gives them whatever meaning they may still have. Without conventional definitions, however, the notion that some sentences are “true by definition”—true because of the very meanings of the words they contain—can’t survive either. Somehow the things we say generally must be unambiguously true or false, even though nothing we say is ever necessarily true, true by definition. “Snow is white” must be unambiguously true, even in the absence of any shared, public conventions about what “snow” is or what counts as “white.”
Quine did try to give us a sort of substitute for necessary or analytic truths. In “Two Dogmas of Empiricism,” he observes that some of our beliefs are more “central” and some are more “peripheral,” more and less subject to revision in the face of new data. In the complete absence of any public conventions about this, however, nothing seems to prevent individuals from assigning to their beliefs idiosyncratic degrees of centrality or peripherality, and nothing seems to force us to revise our beliefs similarly on the basis of similar evidence. One of us may regard the claim that bachelors are unmarried as central and the claim that John is a bachelor as peripheral, while another regards the claim that John is a bachelor as central and the claim that bachelors are unmarried as peripheral and easily abandoned. Finding out that John is married will produce different conclusions about his bachelorhood for each of us.
Without public conventions about the meanings of words, we might worry that each individual is free to understand every sentence he hears or speaks in his own idiosyncratic way and to reinterpret or twist the words in it in any way he pleases. It seems that each person can have his or her own strange idiolect, and these idiolects can drift arbitrarily far apart. One, it would appear, is as good as another. In the absence of public conventions, it also isn’t clear how our language can be repeatedly transmitted from generation to generation without our individual ways of using words being completely randomized over time by noisy transmission. But then why should we be confident that we can identify “the” truth conditions of particular sentences or say which synonymies do and don’t “exist”? If we can’t, then it seems they don’t mean anything in particular.
These consequences, or something like them, must have seemed so unpalatable to Lewis and his philosophical generation that he, at least, found himself forced to question Russell’s original rejection of the whole idea of public linguistic conventions and, with it, the validity of Quine’s paradox of the syndics. Is it really true that there’s no way a group of people can ever tacitly arrive at a convention about something? In the real, material world, couldn’t one of the syndics just raise an eyebrow and say “Table … ?” in a way that obviously proposed this as a name for the table they were all seated at, and couldn’t the other syndics just glance at one another and nod in unison? Wouldn’t that establish a convention? If all that is possible, then maybe the rather extravagant philosophical consequences that Quine drew from his modernized version of Plato’s paradox of the first nomothetes, his doubts about the very existence of necessary or analytic truths, the meanings of individual words, and so forth were, in the end, not warranted. The prospect of escaping from such odd conclusions by such an apparently simple move must have seemed very attractive.
For the attempted escape from Quine’s paradox of the syndics to succeed, what the syndics actually did in thus tacitly agreeing to adopt the convention that the word table would stand for tables, and not chairs, would have to be thoroughly analyzed. The stroke of genius behind Lewis’s Convention was his realization that Thomas Schelling’s (1966) new ideas about “focal points” in “coordination games” could serve as the basis for a theory of how that might work.
Lewis’s strategy was to try to understand conventions in general, and linguistic conventions in particular, as basically being conserved equilibriums in coordination games, built around established precedents. This allows the conventions to be tacit, while making our choice to follow them still a fully rational one. We often choose to follow the practices we see others following because it makes practical sense for us to do so, but we don’t need a written account of the whole system of rules or any overarching rationale to make that rational choice. We just have to be able to figure out what’s expected of us in particular situations well enough to be able to produce the right behavior most of the time. Those other things are still useful and may be acquired at a later point, but it’s the mere ability and willingness to conform to existing conventions that is the minimum requirement for successful play. In trying to conform, we may accidentally introduce variations, so our frequent inability to give a general and precise description of the regularities that our behavior must display in order to conform to the reigning conventions might even be, from an evolutionary point of view, a creative force in their elaboration and sequential modification.
COORDINATION GAMES AND PRECEDENTS
What is a “coordination game”? It’s sometimes useful to divide games into three general classes. Pure noncooperative games are games like poker and “chicken,” in which the players have no significant common interests. These games are relatively easy to analyze because they often have a unique equilibrium: everyone simply does his or her worst, knowing that all the others will do the same.
What’s the opposite of a noncooperative game? It seems obvious that it should be a “cooperative game,” but in fact that’s only partly right. There are at least two very different alternatives to noncooperative games, because there are two different things missing from a noncooperative game: there are no common interests, and there are no contracts. Although common interests are required to motivate contracts, contracts aren’t a necessary condition for the existence of common interests. We may end up coordinating our actions, in a mutually beneficial interaction, either with or without the security of an enforceable contract, but these are two very different kinds of interaction.
Two people playing a zero-sum game have no reason to enter into a contract with each other, because there’s no situation in which one wins without the other losing. Anything that one player is willing to agree to, the other should refuse. This situation can be altered in two different ways: either by giving the players common interests and the ability to commit to binding contracts, or by giving them only common interests. Games of the first kind are true “cooperative” games, games in which players can commit to binding contracts regarding their actions in the game before the start of play. Since there usually are lots of contracts they might adopt, corresponding to all the ways in which they could agree to divide the winnings, our questions about the likely outcome tend to become questions about the process by which the contracts are negotiated and coalitions are formed.
Games in which the players do have common interests and genuine choices to make about how to collectively pursue them, but can’t commit to binding contracts, perhaps because they lack any mechanism for enforcing them, are what Lewis ([1969] 2002), following Schelling (1966), called “coordination games.”
The existence of multiple possible equilibrium outcomes (multiple pure strategy Nash equilibriums, to be precise) is also part of our definition of a coordination game, because otherwise the problem of coordination becomes trivial: the players make the only possible choice. If we both need to get across a river, and the only available boat requires both of us to row it, then presumably, if we’re rational, we both get in that boat and row. There’s no coordination problem to be solved. Whether or not we’ve entered into a contract to use the same boat is moot. A coordination problem arises only when there’s more than one boat, and we must collectively figure out which boat to head for.
In a game of pure coordination, the players have all their interests in common. In a game in which the players have more mixed motives, an “impure” coordination game, they have some common interests and some divergent ones. For example, they may be able to procure a prize only if they coordinate their actions, but there may be various ways of dividing it up, or they may be jockeying for position in traffic. If each of us is the same distance from each boat, then picking a boat to converge on is a pure coordination problem. If some boats are more conveniently located for one of us and others are more convenient for the other, then it’s an impure coordination problem.
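Since the difference between pure and impure coordination, and the multiplicity of equilibriums, will matter repeatedly in what follows, it may help to spell the boat example out in code. The sketch below is my own illustration, not anything in Lewis or Schelling, and the payoff numbers are invented; the point is simply that both versions of the game have more than one pure-strategy Nash equilibrium, so the payoffs by themselves can’t tell the players which boat to head for.

```python
from itertools import product

choices = ["red boat", "blue boat"]

# payoffs[(a, b)] = (payoff to me, payoff to you) when I head for a and you head for b.
# Pure coordination: all that matters to either of us is meeting at the same boat.
pure = {(a, b): (1, 1) if a == b else (0, 0) for a, b in product(choices, repeat=2)}

# Impure coordination: we still need to meet, but the red boat is closer to me
# and the blue boats are closer to you, so we rank the two meeting points differently.
impure = {
    ("red boat", "red boat"): (2, 1),
    ("blue boat", "blue boat"): (1, 2),
    ("red boat", "blue boat"): (0, 0),
    ("blue boat", "red boat"): (0, 0),
}

def pure_nash_equilibria(payoffs):
    """Profiles from which neither player gains by unilaterally switching boats."""
    equilibria = []
    for (a, b), (u1, u2) in payoffs.items():
        mine_is_best = all(payoffs[(alt, b)][0] <= u1 for alt in choices)
        yours_is_best = all(payoffs[(a, alt)][1] <= u2 for alt in choices)
        if mine_is_best and yours_is_best:
            equilibria.append((a, b))
    return equilibria

print(pure_nash_equilibria(pure))    # meeting at either boat is an equilibrium
print(pure_nash_equilibria(impure))  # still two equilibria, now ranked differently
```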
It’s here, in dealing with coordination problems, that we encounter a very puzzling and very revealing phenomenon, a phenomenon with much to tell us about human nature at its most distinctive. That phenomenon is the obvious and its invisible but tyrannical power over human affairs.
What gives obviousness its strange power? Let’s say we can’t see which boat the other party is heading for, but we have to meet at the same boat, and one of the seven available boats is bright red, while the other six are all blue. Then the red boat is the most obvious choice, so the easiest way to solve the coordination problem is for both of us to head for it, assuming that the other party will also head for it, because she expects us to head for it, because she expects us to expect her to head for it, because she expects us to expect her to expect us to head for it. This uniquely distinguishing information is so useful that we’re both likely to head for the red boat even if it’s somewhat farther away from one of us than one of the blue boats and even if that choice is somewhat inconvenient for one party. Clearly, if the problem recurred in this form over and over, it would be easy to establish a tradition of always heading for the red boat.
One of Schelling’s simplest examples of a coordination game is “divide the dollar,” a game in which two players can share a dollar if they can agree on how to divide it. This isn’t a “pure” coordination game, because the players have both common and conflicting interests. Like the problem of converging on the same boat, this is a game of mixed motives, what I’m calling an “impure” coordination game. To get anything at all, the players must coordinate their actions, but within that constraint, they have conflicting interests. Theoretically, one player could hold out for ninety-nine cents, reasoning that the other player would still regard a penny as better than nothing. Schelling observed that human players seldom actually divide the dollar in that way, however. An even division seems to be much easier to agree on, even though it’s only one of many possible equilibriums. People arrive at this outcome even if they’re not allowed to negotiate or communicate before putting in their bids, presumably through some process of tacit bargaining in which each individual imagines the other individual choosing a percentage, on the basis of the way he imagines that the first individual will choose, on the basis of his expectations about the way the first individual will expect him to choose.
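The same multiplicity can be counted directly in “divide the dollar.” The rules in the sketch below are the ones Schelling describes, restricted to whole cents; the brute-force enumeration is just my own way of listing the equilibriums.

```python
# "Divide the dollar" in whole cents: each player demands an amount, and the demands
# are honored only if they sum to at most 100 cents; otherwise both players get nothing.

def payoffs(d1, d2):
    return (d1, d2) if d1 + d2 <= 100 else (0, 0)

demands = range(101)
equilibria = []
for d1 in demands:
    for d2 in demands:
        u1, u2 = payoffs(d1, d2)
        if all(payoffs(a, d2)[0] <= u1 for a in demands) and \
           all(payoffs(d1, b)[1] <= u2 for b in demands):
            equilibria.append((d1, d2))

# Every exact split of the dollar is an equilibrium, plus the degenerate stand-off
# in which both players demand everything and both get nothing.
print(len(equilibria))          # 102
print((50, 50) in equilibria)   # True: the even split is only one equilibrium among many
print((99, 1) in equilibria)    # True: so is the ninety-nine/one split
```

Nothing in the payoffs singles out the even split; what singles it out is the fact discussed next, its obviousness.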
Somehow its very obvious obviousness, its unmistakable salience, the fact that it’s the very first solution anyone would think of anyone thinking of, gives an even division an attractive power that’s difficult to overcome. Since any division is better for both players than nothing, the most important problem they face is for both of them to arrive at the same division, and the lack of a sufficient reason to deviate from the most obvious one makes an even split the choice they both tend to default to. This tiny, apparently ineffectual, inconsequential power, which seems like almost nothing—the power to be the very first idea that happens to occur to people and, perhaps more important, the very first idea anyone would expect anyone to expect to occur to anyone—in fact completely determines the outcome of the game, just as a very similar form of obviousness completely determined which boat we both would head for.
Schelling argued that this attractive power of the obvious would persist even if the players were allowed to bargain explicitly. Each player would expect the other player to expect him to expect her to be unimpressed by arguments that were supposed to show that he should get more than half, and they would still need to converge on one of the many possible outcomes without an inordinate amount of wrangling and confusion, so that’s still where they would end up. Human players, even human children, can often converge on this sort of coordination equilibrium without having to have all this explained to them or needing to discuss why it’s the logical choice or, if asked, even being able to explain why it is.
Another example of the power of perceived obviousness is the solution that two people might arrive at who are given the assignment of meeting in New York the next day but are forbidden to contact each other in advance to discuss where or when. Many of us would go to Grand Central Terminal at noon and, once there, would look for a conspicuous landmark—the information booth, perhaps—to stand next to. We might well succeed in tacitly coordinating our locations just by picking the most obviously obvious time and place, the place we might reasonably expect the other to expect us to expect him to expect us to be.
To accomplish this, we must do the most obvious thing we can think of. By doing a few simple experiments, Schelling discovered that humans are quite good at this sort of convergence on the obvious coordination equilibrium. They’re quite good at anticipating which of a range of possible choices will stick out in a conspicuous way to everyone involved and tend to expect one another to take that one, all other things being nearly equal, because it’s the most obvious choice, and coordination on some choice is often practically important.
Once our friend knows we’ll be looking for him, his own complex and unknown preferences aren’t what matters. He might prefer to be sitting down somewhere, eating lunch, but we can reliably expect him to expect us to expect him to stand next to the information booth at noon, even if that’s slightly inconvenient for him. If he didn’t know we were looking for him, or he didn’t know that we knew he knew, he wouldn’t expect us to expect him to expect us to expect anything, so finding him would become much harder.
In solving these sorts of problems, people depend on the existence of common knowledge. Even though my friend might have expected me to want to go to the Metropolitan Museum, he can’t be sure I will have expected him to expect that; he may not be sure that I know that he knows that that’s what I’d prefer, or would expect him to respect my preferences rather than his own. But we both know that it’s a generally admitted and widely acknowledged fact, known by all to be known by all, and so certainly known to both of us, no matter who we are or what else is going through our minds, that Grand Central Terminal is an extremely obvious and central place, so that’s where we each expect the other to expect us to go. If we both want to choose the same point to converge on, without any consultation, it’s easiest to rely on such things, on items of knowledge that already have an independent public existence, that already are common knowledge and generally known to be common knowledge, instead of trying to make up new, shared higher-order expectations on the fly.
In this example, one way of solving the problem seems like the most obvious choice, but sometimes none of the boats are red, or one is red and one is yellow. When it isn’t completely clear where any rational person would expect any rational person to expect any rational person to expect to meet, or what choice anyone would think of as most obvious, something like a conserved precedent is necessary to resolve the tie. The power of a referee or mediator, Schelling suggested, must often come from the same source. By making one way of settling a dispute the most obvious one, a third party can make it unavoidable by making agreement on anything else impossible.
As the number of players involved in a coordination problem increases, coordination can easily become more difficult, since each player has fewer resources to devote to anticipating the actions of each other player, and that other player’s expectations about the other players’ expectations about what the other players will do, and more chances to be wrong. The number of possible states of a game like this, a game involving Lewisian higher-order expectations, undergoes a combinatorial explosion as the number of players rises, since every permutation of all the possible expectations, and expectations about expectations, of all the many players might need to be considered. The potential for coordination problems to become complex is one reason that people often need to find some uniquely obvious and unmistakable focal point to converge on. This is one reason it matters what’s obvious.
In any m-player game, even if there’s only one action that the players can take, there are (m − 1)ⁿ beliefs that each player might adopt regarding the nth-order expectations of each other player about who will or will not take that action. In an eleven-player game requiring the attribution of second-order expectations, I must choose from among a hundred possible second-order expectations that I could attribute to each of the other ten players. I may believe that any one of the other players will expect any one of the players aside from herself to expect any one of the players aside from themselves to do something. I might suppose that Joe expects Frank to expect me to do x or that Sally expects Sam to expect Sally to do it. A hundred such sequences are available to me for each player, a hundred for Joe, a hundred for Sam, a hundred for Sally, and so on. Each can be permuted with any of a hundred such sequences for each of the other players. Consequently, my total set of options for simultaneously making such assignments to all ten other players includes 10²⁰ distinct possibilities, even without attributing multiple expectations, or multiple possible actions, to single players. Since the universe is less than 10¹⁸ seconds old, in actual interactions among the members of moderately large human groups, there obviously is no time to consider every possibility.
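The arithmetic can be spelled out directly. In the sketch below, the formula and the player counts come from the paragraph above; the figure for the age of the universe is the usual rough estimate.

```python
m = 11   # players
n = 2    # order of the expectations being attributed (second-order)

# Chains like "Joe expects Frank to expect me to act": at each of the n links
# there are (m - 1) admissible players, and I attribute one such chain to each
# of the other (m - 1) players.
chains_per_player = (m - 1) ** n                    # 100
total_assignments = chains_per_player ** (m - 1)    # one chain per other player

age_of_universe_in_seconds = 4.35e17                # roughly 13.8 billion years

print(chains_per_player)                               # 100
print(f"{total_assignments:.0e}")                      # 1e+20
print(total_assignments > age_of_universe_in_seconds)  # True: more options than elapsed seconds
```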
Some way of drastically simplifying the problem of anticipating what the others will do, and expect us to do, and expect one another to expect one another to do is clearly needed if we’re to succeed in coordinating our actions. The power of obviousness (the power of everyone’s expectations about what any person whatsoever would expect him to expect anyone else to expect) in determining which possible equilibriums are realized should only grow as the number of individuals trying to solve a coordination problem increases. Without it, the odds of converging on a solution that’s optimal in any sense should only decline.
Why must men wear neckties in formal situations? Partly because we all want to do what everyone else is doing, and there are just too many of us to make negotiating a change in the conventions about men’s clothing practical at any given moment in time. A conspicuous or authoritative individual—Beau Brummell or the king—might succeed in changing the fashion anyway, however, if he can find a clever way to manipulate what seems obvious to everyone.
Precedent alone is often sufficient to make one of several possible ways of doing things seem more obvious than its rivals, especially when large numbers of people must coordinate on a single choice. To borrow an example from Lewis, once people start driving on the right-hand side of the road, each time you have to decide which side of the road to drive on, your goal being the avoidance of head-on collisions with other cars, the most obvious assumption is that all the other people will assume that all the other people will assume that the safest bet is to assume that everyone else will continue driving on the right, and will themselves continue driving on the right, so you had probably better drive on the right as well. Sometimes when one side of the road is very crowded and the other side is almost empty, it might seem that it would be more efficient to bend this rule a little, but in fact the outcome of such improvisation, if everyone were free to engage in it, could very easily be gridlock or chaos.
Schelling called this sort of arbitrary but obvious clue to successful coordination a “focal point.” Conventions, as Lewis ([1969] 2002) conceived of them, involve focal points deriving specifically from historical precedent, which must meet the following five additional conditions:
A regularity R in the behavior of members of a population P when they are agents in a recurrent situation S is a convention if and only if it is true that, and it is common knowledge in P that, in almost any instance of S among members of P,
1. Almost everyone conforms to R;
2. Almost everyone expects almost everyone else to conform to R;
3. Almost everyone has approximately the same preferences regarding all possible combinations of actions;
4. Almost everyone prefers that anyone more conform to R, on condition that almost everyone conform to R;
5. Almost everyone would prefer that anyone more conform to R', on the condition that almost everyone conform to R', where R' is some possible regularity in the behavior of members of P in S, such that almost no one in almost any instance of S among members of P could conform to both R' and R. (78)
For example, (1) almost everyone in North America drives on the right-hand side of the road; (2) almost everyone expects almost everyone else to drive on the right-hand side of the road; (3) almost everyone cares very little about whether people drive on the left-hand side of the road or the right-hand side as long as almost everyone does the same thing; (4) almost everyone prefers that each other driver adheres to the same rule of driving on the right-hand side of the road as long as everyone else is still doing so; and yet (5) if we all drove on the left-hand side, almost all of us would prefer that each other driver drove on that side too. And all of this is common knowledge, so we can safely assume that most of the other drivers on the road know it, and know that we know it, and know that we know that they know that we know it.
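To see how the five clauses fit together in this case, here is a crude toy rendering in code. It is my own gloss rather than Lewis’s formalism: “almost everyone” becomes a 90 percent threshold, and each driver’s preferences are reduced to the single desire to avoid a head-on collision with whoever she meets next.

```python
THRESHOLD = 0.9   # one (crude) reading of "almost everyone"

def almost_all(facts):
    facts = list(facts)
    return sum(facts) >= THRESHOLD * len(facts)

def utility(agent, own_side, oncoming_side):
    # Nobody in this toy population cares which side it is, only about not colliding.
    return 1 if own_side == oncoming_side else 0

R, R_alt = "right", "left"

# Ninety-nine conformists and one eccentric, all expecting conformity to R.
population = [{"drives": R, "expects": R} for _ in range(99)] + [{"drives": R_alt, "expects": R}]

clause1 = almost_all(a["drives"] == R for a in population)
clause2 = almost_all(a["expects"] == R for a in population)
clause3 = almost_all(   # approximately the same preferences over all combinations
    all(utility(a, s, t) == utility(population[0], s, t) for s in (R, R_alt) for t in (R, R_alt))
    for a in population
)
# Given general conformity to R, each prefers one more oncoming driver on the right ...
clause4 = almost_all(utility(a, R, R) > utility(a, R, R_alt) for a in population)
# ... and had everyone settled on R' instead, each would prefer one more on the left.
clause5 = almost_all(utility(a, R_alt, R_alt) > utility(a, R_alt, R) for a in population)

print(all([clause1, clause2, clause3, clause4, clause5]))   # True
```

With those simplifications, all five clauses come out true, and they would come out just as true if the whole toy population drove on the left instead.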
A convention like this is a tool for coordination, a device to allow the awkwardly vast number of possible ways of coordinating our behavior and expectations to be filtered down to a few conventional possibilities. People adopt conventions because they’re useful, because without them they would have to think about too many things, because without the convention of driving on the right we would instantly have a traffic jam. This doesn’t mean that once accepted, they always stay useful forever. This is the way human social life often seems to be organized: it’s a set of incredibly complex coordination problems that we deal with partly by conserving and incrementally improving a set of obvious, though sometimes arbitrary, solutions.
Convention, in this sense of the word, is a pretty good model of the way that a simple signaling system might work in a human population. Suppose there are two possible states of affairs (the English are coming by land or by sea) and two possible signals (hanging one or two lanterns in the belfry) and two possible actions by the receiver (blocking the land route or the sea route). Suppose also that the sender would prefer the receiver to take the appropriate action, and that the receiver would prefer the same thing, so they have common interests. Finally, suppose (contrary to history) that this is a recurrent coordination problem that many different pairs of senders and receivers in a population encounter over and over again.
If there’s a famous precedent of hanging one lantern in the belfry if the English are coming by land and two if by sea, this can become a convention if almost everyone goes on doing things that way because the precedent makes it the obvious choice, if almost everyone expects that almost everyone else will go on doing things in that way, if almost nobody cares whether it’s one lantern or two that indicates that the English are coming by land, if almost everyone prefers that each other person use one lantern to indicate a land route and two to indicate a sea route as long as most people are still doing things in that way, if almost everyone would prefer that each other person used two lanterns to indicate a land route and one to indicate a sea route if, counter-factually, most people did things in that way, and if all this is common knowledge, so that everyone can safely assume that most other people know it, and know that they know it, and know that they know that most other people know it.
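The structure of such a system is simple enough to write down. In the sketch below, the state, signal, and action labels follow the example, but the bookkeeping is my own: a sender strategy maps states to signals, a receiver strategy maps signals to actions, and a pair of strategies counts as a signaling system when the appropriate action gets taken in every state.

```python
states = ["by land", "by sea"]
right_response = {"by land": "block the land route", "by sea": "block the sea route"}

# The famous precedent, and its equally workable mirror image.
sender_precedent = {"by land": "one lantern", "by sea": "two lanterns"}
receiver_precedent = {"one lantern": "block the land route",
                      "two lanterns": "block the sea route"}

sender_flipped = {"by land": "two lanterns", "by sea": "one lantern"}
receiver_flipped = {"two lanterns": "block the land route",
                    "one lantern": "block the sea route"}

def is_signaling_system(sender, receiver):
    """True if, whatever the state, the receiver ends up doing the right thing."""
    return all(receiver[sender[s]] == right_response[s] for s in states)

print(is_signaling_system(sender_precedent, receiver_precedent))  # True
print(is_signaling_system(sender_flipped, receiver_flipped))      # True: the reverse code works just as well
print(is_signaling_system(sender_precedent, receiver_flipped))    # False: mismatched strategies miscoordinate
```

The precedent and its mirror image work equally well, which is exactly why the choice between them is settled by precedent rather than by the payoffs.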
The system can persist through tacit coordination because everyone is trying to anticipate what everyone else will do, and expect them to do, and expect them to expect others to do, and will conform without needing to have the rules explicitly described to them if they can manage to guess, by observing the behavior of others, what’s expected of them in particular situations.
Perhaps some people have only a limited vocabulary. If the British are invading by sea and two lanterns are lit, they’ll go out to meet them, but if only one lantern is lit, they’ll stay home and do nothing, thinking that’s normal. We would not consider such people to be fully competent speakers of the language. For a fully competent speaker to be a true member of the population in which some convention holds, he must, in some sense, know that it holds. A member of that population, the population defined by common knowledge of the convention, “knows” the convention in the sense of being party to it, knowing how to follow it, and expecting others to know how to follow it, and expecting them to expect him to know how to follow it.
Since the purpose of this particular signal is thwarting the British in a way that may be quite risky for those who do go out to meet them, and may be made more risky by the failure of those who don’t understand the signal to show up, other members of their community may end up being quite annoyed at the people who do nothing whenever they come by land, and may reproach them. The convention itself could always be otherwise, but compliance with the existing arbitrary convention may still be obligatory.
THE DYNAMICS OF CONVENTIONS
If conventions evolve over time, they must come and go. The existence of alternatives—driving on the left or driving on the right—makes this seem somewhat plausible. But what does a Lewisian convention look like as it’s coming or going?
Lewis ([1969] 2002:78–80) tells us that there are basically six ways a regularity, R, can cease to be conventional, or not yet quite be a convention, in a group. That we ought to do R—that when a phone call is cut off, the original caller ought to be the one who calls back—might not yet be commonly known by the members of the group of people who are supposed to be party to the convention. If it isn’t Lewisian “common knowledge” among them, if it isn’t true that almost everyone knows that almost everyone knows that’s what you’re supposed to do, obviously it can’t be one of their conventions. But there are five more ways a practice could fail to be fully conventional. Everyone might know that everyone knew that R was supposed to be the custom and yet
1. Many people might not choose to conform to R anyway—they might just try to call back no matter what because they’re impatient.
2. Many people still might not expect almost everyone else to choose to conform to R—they might not know if the other party would really call back when she was supposed to.
3. Many people might not have approximately the same preferences regarding all possible combinations of actions; many people might greatly prefer to conform to R'. For some reason, a large part of the population could greatly prefer that the person who was originally called be the one who was supposed to call back, perhaps because it showed willingness to continue the conversation.
4. Many people, while preferring to conform to R themselves, might also prefer that other individuals not conform to R (not go to Coney Island on Sunday), even if almost everyone else conformed to R, or might be indifferent.
5. Many people might not prefer that other individuals conform to R' (go to Fire Island) even if almost everyone now at Coney Island were to conform to R' (to go to Fire Island instead).
For example, suppose some new notation is introduced for logicians. If nobody knows about it, or if those who know about it don’t know that others know about it, or don’t know that others know others know about it, if the new notation isn’t yet Lewisian “common knowledge,” its use can’t yet be fully conventional. If everyone knows they’re supposed to use it, but nobody thinks it’s worth the effort involved so nobody bothers, its use isn’t yet conventional. If nobody expects anybody else to use it, for this reason or some other, its use isn’t yet conventional. If many people would rather not start using it, its use isn’t yet conventional. If most people who use it would prefer that others not be able to use it or understand it, its use isn’t yet conventional for the community around the clique of users. If most people prefer that whatever notation they used, this or any other, remained the exclusive property of a clique, then no notation, and in particular not this notation, can yet be fully conventional.
All these obstacles to full conventionality might later be removed. Lewisian “common knowledge” of the new notation might spread; people might begin to use it themselves; they might start to expect others to use it; they might begin to prefer to use it rather than the alternatives; they might begin to prefer that others use it; and they might become willing to share with the whole community whatever notation or idiom they used. Or a conventional notation might cease to be conventional by losing one or several of these properties, as the logical notation of the Principia Mathematica has done in the century that’s passed since its introduction.
Individuals can become party to a new convention by meeting the conditions implied by the six criteria, or fall out of the community of full parties to a convention by ceasing to meet one or more of them. They may still continue to follow the precedent out of a sense of moral obligation, or without conscious thought, or in some other way, but once their participation in any of the six ways described—knowing the precedent and knowing that others know it and know that they know it, voluntarily choosing to conform to it, expecting others to choose to conform, preferring to conform if others do, preferring that each other individual conform if everyone else is conforming, and preferring that they conform to some other precedent if that was the convention—ceases to be self-interested and deliberate, they are no longer adhering to the convention as a convention.
Each way of becoming, or ceasing to be, a party to a convention requires deliberate human choices or preferences or knowledge, or more or less rational expectations about the choices, preferences, and knowledge of others, even—in the case of logical notation—the choice to become familiar with it, which in this case involves a little labor and, for some people, may require the participation of a teacher. A dynamic version of Lewis’s model, in which conventions come and go and therefore evolve, would presumably involve these six processes, either moving forward or going in reverse.
Since all of them involve human knowledge or preferences or rational human choices or expectations, it’s hard to see how any evolutionary model other than domestication could be consistent with this theoretical framework. For example, both Russell’s ([1921] 2010) behaviorism and Daniel Dennett’s (2009a) “synanthropic” model of human language seem to be clearly incompatible with it. We can’t both choose to conform to some convention and conform to it inadvertently, having learned it through an unconscious process of association. We also can’t adhere to it voluntarily if we have no choice about whether or not to be infested by it. It’s we who adopt conventions and we who abandon them, just as it’s humans who adopt or abandon dogs.
OUR KNOWLEDGE OF CONVENTIONS
Nevertheless, we shouldn’t overestimate the amount or kind of knowledge necessary to be a party to a convention. Even people who are full parties to the conventions of a particular language or dialect, who are able to use and respond correctly to all the community’s conventional signals, may still have many different degrees and kinds of knowledge. To be party to a convention, we must “know” it in the purely practical sense of being able to conform to it if we so choose, but Lewis ([1969] 2002) warns us that this implies that our knowledge of conventions may be “quite a poor sort of knowledge” (60–68).
We can “know” a convention, he argues, if we’re merely in a position to believe that it holds, should the question ever come up, but haven’t yet formed any such belief. We can know conventions in irremediably nonverbal ways or on the basis of evidential justifications that we could never describe or report; and we may know how to follow a convention on the basis of our knowledge of the many particular things or situations that fall under it, without being able to assemble that knowledge into the sort of general claims about categories or kinds of thing that would allow us to describe the convention we’re following in general terms. Borrowing terms from Abelard, Lewis tells us that we may know the convention in sensu diviso rather than in sensu composito.
We may know in some inchoate way that all the adults in our small town habitually drive on the right side of the road that goes by our home, without knowing that all drivers on all roads in our country must drive on the right. Knowledge of the features or behaviors of particular drivers, however many, doesn’t directly support inductive inferences about which features drivers in general necessarily must have. There may be a way to get from the first kind of knowledge to the second, but some additional hypothesis about the existence of generalizable classifications such as “driver” and “right side of the road” is required in addition to a list of the proper names of particular persons and locations with their many individual features, and this is precisely what the knower of a general convention in sensu diviso lacks. He’s capable of recognizing all the cases to which the convention applies, or all the ones he’s likely to encounter, and he knows what everyone knows one ought to do in each case, but he can’t explain which common features make the cases the same. He doesn’t quite know which criteria of identity (in the sense of the term explained in Slote [1966], see Lewis [(1969) 2002:24n.4]) are the practically important ones.
The kind of sensu composito knowledge called for in this example is so trivial that only a child could lack it, but we should remember Justice Potter Stewart’s definition of pornography: I can’t intelligibly describe it, but “I know it when I see it.” Many of us would say more or less the same thing about the ethical way for a member of our community to behave, or what we should accept as fine art, so not all our sensu diviso knowledge concerns trivialities, or things whose correct description is a mystery only to children. “What, then, is time?” Augustine asks in the Confessions. “If no one asks me, I know; if I want to explain it to someone, I don’t know.” Yet we all manage to use the word, so we must know its meaning in sensu diviso.
It might seem that this is stretching the meaning of the word knowledge. We’re used to thinking that anything we know, we can declare—but do we really want to take the position that a person doesn’t know what the word time or because means, even if he can conform perfectly to the conventions governing its use, unless he can also state them explicitly? Do we really want to say that nobody knows what the word know means unless she can give a full and explicit account of its meaning? The claim itself seems to be self-undermining: wouldn’t we have to know what know means in order to conclude that we don’t “know” what it means?
In that case, almost nobody knows those words, so it’s odd that we all can use them and spot mistakes in others’ efforts to use them correctly. (“You can’t say you’re late ‘because’ it rained, you started late so you would have been late anyway.” “You don’t know that, you haven’t even looked yet.”)
It seemed better to Lewis to say that we know their meanings in a certain inchoate way, know them but only in sensu diviso, since this is the only way past Quine’s supposition that a convention can be adopted only if those choosing to follow it can state it explicitly and all the unpalatable consequences that seem to flow from that assumption. It seems better simply to accept that rational choice doesn’t necessarily presume declarative knowledge. In fact, things seem to be the other way around, at least for language. We know the meanings of words. We can choose between the word know and words like believe or hope or suspect in a more or less rational manner. First, there must be knowledge and a word for it, and words for its alternatives, and some inarticulate sense of the nature of this thing, “knowledge.” Producing a final, explicit, accurate full description of the meaning of the word knowledge is a rather late step in the process of coming to know what that word means, one we haven’t actually reached yet in this particular case.
It isn’t clear why this should surprise us. First there had to be gold, and a word for it, and words for brass and bronze, and some knowledge about this thing, “gold.” Our current explicit definition of gold as “the element with atomic number 79” could have come along only after many centuries of successfully using the word to refer to gold. Still, Pharaoh knew what his language’s word for “gold” meant, as anyone who tried to sell him brass while calling it gold would soon have discovered.
THE LANGUAGE OF A HUMAN POPULATION
However we know it, a set of conventions about hanging lanterns in windows to indicate which route the British are taking is a signaling system. A verbal signaling system could also be constructed along the same lines. According to Lewis ([1969] 2002), such a system is a language with the signals as its sentences, but it’s a rudimentary language, much simpler than the language of any actual human population. A fixed, finite set of signals can be sent to identify a fixed, finite set of situations. Each situation should be responded to in one particular way, or at least with one particular discretionary contingency plan, and both parties want that equally. There are no questions, there’s no conversation, and nothing new can ever be said. There’s no syntax.
At this point, what the verbal signaling system already can have is an interpretation. For each signal of the system, this consists of a pair <μ, τ> of a mood and a set of truth conditions.
As examples, Lewis picks two moods, the declarative and the imperative. In this rudimentary language, signals activate contingency plans on the part of the hearer, and result from contingency plans themselves: If you see an s, say “σ” or if you see an s, say either “σ” or “ν,” or something like that. The difference between the declarative mood and the imperative, he argues, depends on the relative amount of latitude in the sender’s and receiver’s contingency plans. On observing the British coming by land, if the sender has no option but to put one lantern in the belfry, but the Minutemen, once warned by this signal, have a variety of responses they might take, the signal is a declarative one. But if the sender has several choices about the number of lanterns that he might hang, depending on what his overview of the situation suggests is best, and the Minutemen are restricted to responding to each of the possible signals in a particular way, then the signal is imperative.
The truth conditions of a declarative sentence are just the set of ways the world could be that would make it true, the set of “possible worlds” in which it’s true. To determine whether it’s true of the actual world, we construe the words of the sentence as referring to things, or sets of things, in the world around us. We have to fix their reference by correctly relating kind-names to instances of that kind of thing in the world, interpreting indexical terms like that, here, and my as referring to particular individuals, things, or places and so on. The speaker’s declaration “My gold earrings are in the upper-right-hand drawer” must be related to a particular set of small metal objects boxed up inside some particular planes of wood. Even a sentence like “If Nixon had never been president, he never would have been impeached,” which is about events in a merely possible world, still requires us to look through the history of our own world for the referent of the name Nixon and then to imagine that world changing around him. Once we’ve interpreted the sentence as a claim about particular objects, events, and so on and determined whether it’s true in the real world, we can start to figure out what other sorts of worlds it could be true in, keeping the earrings and the drawer, or Richard Nixon, fixed in our mind and allowing the world to change around them. So as I am using the word here and in subsequent chapters, the verb or action, interpretation, is the process of arriving at an interpretation, the thing, <μ, τ>. It is the process of finding the right things in the right worlds, and figuring out what other worlds the sentence would still be true in.
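As a minimal illustration of this machinery, a signal’s interpretation can be represented as a mood paired with a test for membership in the set of worlds where the signal is true. The sketch below is my own, with toy “possible worlds” represented as simple dictionaries; nothing in it is Lewis’s notation beyond the <μ, τ> pair itself.

```python
from dataclasses import dataclass
from typing import Callable

World = dict   # a toy possible world: a mapping from facts to their values

@dataclass(frozen=True)
class Interpretation:
    mood: str                                   # "declarative" or "imperative"
    truth_conditions: Callable[[World], bool]   # the set of worlds where the signal is true,
                                                # represented here as a membership test

# "One lantern," read declaratively: true in exactly those worlds in which
# the British are coming by land.
one_lantern = Interpretation(
    mood="declarative",
    truth_conditions=lambda w: w.get("british_route") == "land",
)

actual_world = {"british_route": "land"}
counterfactual_world = {"british_route": "sea"}

print(one_lantern.truth_conditions(actual_world))          # True: the sender may send the signal
print(one_lantern.truth_conditions(counterfactual_world))  # False: sending it here would be a falsehood
```

On this rendering, the difference between the declarative and the imperative lies not in the pair itself but in who is responsible for making the actual world one of the worlds that pass the test, as the next two paragraphs explain.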
The consequence of a true declarative sentence being uttered, heard, and believed is that our shared conception of the world we live in (and/or what other possible worlds are like) is altered. We come to believe that the specific pieces of yellow metal being referred to, the ones we saw yesterday, are indeed boxed up inside the glued-together pieces of wood. What to do about that is up to us. It’s the sender’s responsibility to make sure a declarative sentence is true, to send it only when the actual world, or the possible world under discussion, is one of those in which it would be true.
For Lewis, the truth conditions of an imperative sentence are also the set of worlds in which it’s true. Now, however, it’s the receiver’s responsibility to make sure the sentence is true, to make the actual world into one of those worlds. When I say “Scalpel!” in the imperative mood, I expect to soon inhabit a world that contains a scalpel in close proximity to myself, and will be disappointed if I don’t eventually see one. I’m not modifying our shared picture of the world by uttering this instruction; instead, I’m asking you to modify the world itself.
What gives the human speaker an incentive to make declarations only when they’re true of the actual world, and what gives the human receiver an incentive to make the actual world into the one described by the speaker in his instructions, is the existence of common interests, which give the speaker the authority to direct or to declare, and the receiver an incentive to obey or to believe. In the absence of a common interest in the instruction being obeyed, Lewis feels that orders, since they presumably must be enforced with threats, are better interpreted as promises, a type of declaration. (“Get me a brick, or I promise you’ll be sorry.”)
Since the concept of “common interests” lies at the roots of our ability to communicate in this very constitutive way, it would seem to be fundamental to our whole way of thinking about human uniqueness. A sense of having common interests in the performance of shared tasks is precisely what experts on their behavior like Michael Tomasello (2008:38–55, 172–85) tell us that chimpanzees most strikingly lack. Lewis’s way of conceiving of the common interests involved in adhering to a convention, however, is more complicated than it might appear.
How similar must our interests be? It may be good for you if the order is obeyed or if we both adhere to the convention, and even better, much better, for me. Even though you may be threatening me with retaliation if I disobey, both of us might prefer that you didn’t have to carry out your threat, so even credible promises of harm can create common interests. Most people may follow a convention because they want to, while a few do so only because they fear punishment for unilaterally deviating. Do all these people have a common interest in following the convention? To what extent must interests be identical, or similar, for a signaling system of this Lewisian kind, or any other set of Lewisian social conventions, to exist? If some conventions are protected by the threat of punishment, are they still merely conventional? For example, isn’t the speaker’s obligation to say things that are true in his language part of the social contract, rather than part of the conventions of the language? Don’t we sometimes punish fraud and perjury? In doing so, are we punishing the mere violation of an arbitrary convention? Does a liar merely violate an arbitrary convention?
Lewis had an answer for these questions. He explained that the interests of the parties to a convention do not need to be exactly identical, as they would have to be in a pure coordination game, but only “nearly identical,” in the sense that no one individual could do better for herself by not participating at all, by simply not coordinating her behavior with that of the others, by just refusing to play by the rules ([1969] 2002:13–15, 88–97). Speaking Italian must at least be better for each person who wants to participate in the conversation, given that everyone else is speaking it, than not participating at all, in order for it to be a convention among them that the conversation will be conducted in Italian.
Conceivably there are subgroups of more than one individual—up to and including the whole group—that would be better off if none of them adhered to the convention. If all the workers in an unsafe mine stayed home, they all might be better off, being in a better position to negotiate for safer conditions, but nobody wants to stay home all by himself and end up being fired while the others work. And as long as we, and most of the others, are still going to work, we want each other worker to be there as well, in order to make the job less dangerous. So some situations that are very bad for some or all of us can still be situations in which we’d all individually choose to stick to the governing convention. As Lewis ([1969] 2002:92) says, sometimes we’re all trapped by a convention. Once created, conventions may be very hard to escape and very hard to uproot.
It’s in these sorts of cases that a convention can be inconsistent with the social contract. The social contract itself, according to Lewis, differs from a convention in that we all would prefer that everyone adhere to it, that everyone refrain from murder, whereas a convention requires us only to prefer compliance to unilateral nonconformity by ourselves, or by any single other individual, given that everyone else is conforming. We prefer going to work in the mine to being the odd man out, and if we all are working, we won’t want other individuals to be idle, but it would be much better for everyone if nobody went to work, if we all went on strike.
This makes the regularity of continuing to work in the mine very different from the regularity of telling the truth in a human language, since it would be difficult to make the case that we all would be better off if we all lied all the time, or all told the truth in our own distinctive brand of Italian, in which the meanings of the words were gerrymandered in whatever way happened to be most beneficial to ourselves. In fact, almost all of us would prefer that everyone told the truth in a fairly standard version of the language of our community, a preference that seems to give conventional human languages a special relationship to the social contract. The social contract doesn’t mandate that we speak Italian rather than Welsh, but it does have a special relationship to the general understanding that when we represent ourselves as speaking Italian, we should speak the same version of that language as everyone else does, and tell the truth in that version, instead of lying.
As the example of driving on a public road suggests, many conventions are associated with clubs, which use penalties or exclusions to prevent nonconformists from participating in the activity governed by the convention, or to force them to conform if they do participate. But the convention of driving on the right still isn’t the social contract. If someone started a political movement that advocated changing to driving on the left, it wouldn’t be treasonous, because there would be no attack on peaceable and orderly coordination among the members of society in general. If the movement succeeded and we made the change, some people would be inconvenienced, but life would go on. In contrast, unilateral noncompliance, driving down the wrong side of the road at high speed, is likely to result in negligent homicide, so it is inconsistent with the social contract.
The real difference between a set of conventions and a social contract, Lewis argues, is that the set of conventions has this sort of alternative—driving on the left rather than the right, or speaking Welsh rather than Italian—in which players are still coordinating, an alternative that is, in his sense, “almost equally good” for almost everyone. There is an alternative that is still better than unilateral violation, still better for almost everyone involved than being the odd man out, even though for some members of the community, it may in fact be considerably better or worse in an absolute sense than the convention we’re actually following. In contrast, the social contract has no alternative except a Hobbesian hell. We prefer that people drive on the right because it makes coordination possible in a complex world, so the preference is conditional on everyone else doing the same thing. But we care about the prohibition against murder for its own sake, no matter how many people happen to be doing it.
The rules of the club of drivers derive their authority from the social contract, not because they’re among its actual clauses, but because there must be some set of rules if the club is to function in a way consistent with the maintenance of the social contract. (If the functioning of the club is inconsistent with the maintenance of the social contract, if it’s a gang of criminals, that contract might actually forbid us to follow its rules.) We accepted our driver’s license with the understanding, between ourselves and the other drivers, that we would obey the rules of the road, whatever they might be. The convention of driving on the right is just a convention, but it is nevertheless a binding one. There are always conventional annexes to the basic social contract.
Since most of us do want to continue using whatever language our community uses, and since the use of some language or other is indispensable to maintaining the social contract, we must remember that we’ve been admitted to the club of informative speakers of that language with the contractual understanding between ourselves and the other speakers that we would adhere to its conventions, not to other made-up ones, and are not supposed to surreptitiously or flagrantly violate them by, say, using words in very nonstandard and self-serving ways or quibbling invidiously after the fact about the meaning of some word we used, or by uttering falsehoods or culpably misinterpreting what’s said to us, without warning anyone that this is what we’re doing (Lewis [1969] 2002:97–100, 177–95): “A convention of truthfulness in L is a social contract as well as a convention. Not only does each prefer truthfulness in L by all to truthfulness in L by all but himself. Still more does each prefer uniform truthfulness in L to Babel, the state of nature” (182). Lying in English is a violation of an obligation we have undertaken voluntarily by choosing to speak English. Willful or negligent misinterpretation is a violation of obligations we’ve voluntarily undertaken by presenting ourselves as competent members of an English-speaking audience. Telling the truth and faithfully interpreting what’s said to us are part of the rules of the road.
This relationship to the social contract, and the mechanisms of enforcement associated with many conventions, suggest that thinking of the Lewisian conventions of a human language as simply being equilibriums in a coordination game isn’t entirely correct. Cooperative games are precisely those games in which players can make binding contracts before the start of play. The social contract is a binding one. So this relationship with the social contract turns Lewis’s “coordination games,” at least insofar as they’re supposed to be models of the conventions of language, into cooperative games, a rather different thing. Not only does each person prefer each other person to use the word gold only for gold, provided that everyone else is doing so, but almost everyone prefers that everyone use the word to mean the same thing. The conventions aren’t just convenient precedents established in the interactions of particular pairs of people; they’re also norms backed up by the approval of the community as a whole.
We will see the alternative—precedents established in the interactions of particular pairs of individuals, without any role for the community as a whole, signaling “conventions” not associated with a social contract—when we look at the gestures that chimpanzees use to communicate (Call and Tomasello 2007).
In the picture of human behavior that Lewis gives us in Convention, most people follow the reigning conventions of the community’s language out of self-interest most of the time, without being forced to do so by the threat of punishment. Those people still do use some version of the sort of reasoning Schelling suggested that we use to solve the informational problem of coordination, to discover what coordinating would consist of. In some sense, then, they’re playing a coordination game inside a cooperative game. But those who blunder or choose to experiment too freely will find that the Lewisian conventions of a human language are actually binding ones. (Putnam made the same general point very forcefully in “The Meaning of ‘Meaning’” [1975b:248–49].) When it comes to things like calling gilded bars of lead “gold,” the enforcement of these binding conventions may become quite draconian.
The conventions that organize a human society often are binding for a good reason. In the decades after Lewis wrote Convention, Michihiro Kandori, George Mailath, and Rafael Rob (1993) and H. Peyton Young (1993) demonstrated that simply letting nature take its course in a coordination game like this, a game involving temptations to misbehave, tends to make you end up at a “risk-dominant” equilibrium that can be far from optimal and may not involve any coordination at all.
Rousseau’s famous story, in the Discourse on the Origins of Inequality (1754), about a stag hunt is an example that’s often used to illustrate this problem. (Lewis ([1969] 2002:7) uses it as an example of a coordination problem in his introduction of the subject.) A group of hunters can capture a stag if they all work together. This is the best outcome for everyone. But sometimes while they’re hunting, a rabbit runs past. If one hunter deserts his post to chase the rabbit, he’s quite likely to catch it, which will give him some meat to eat, even if not as much as he would have gotten from the stag. It’s best for everyone if he ignores the rabbit, since that will lead to the highest possible payoff, provided that everyone else also does their part. But what if one of the other hunters decides to chase it? Then anyone who stays at his post will get nothing.
If none of the hunters ever makes the mistake of chasing the rabbit, if everyone is always completely certain about the relative payoffs of the possible outcomes to everyone involved and the complete rationality of all the other hunters, then the coordination equilibrium of hunting the stag will persist. But suppose that one hunter tries a foolish experiment, or mistakenly believes that he can get more by chasing the rabbit, or irrationally decides that the other hunters are unreliable and goes after it. Then the other hunters’ confidence will be damaged. It will no longer be rational for them to be certain that everyone else will do his part, which means that it’s now risky for them to do their part themselves, risky to expect the others to do theirs, and risky to expect the others to expect them to do theirs. Recursive mind reading of the kind that Lewis describes can help them solve the informational problem of how to succeed in coordinating if everyone is actually trying to coordinate, but it can’t predict blunders or unwise experiments, so it isn’t clear how it could solve this problem.
In this example, both parties chasing the rabbit is what John Harsanyi and Reinhard Selten (1988) dubbed the “risk-dominant” equilibrium, even though both parties hunting the stag is the “payoff-dominant” equilibrium, given some reasonable interpretation of the problem in terms of utilities. Kandori, Mailath, and Rob (1993) showed, well after Lewis wrote Convention, that evolutionary versions of this sort of game allowing occasional mistakes or experiments would, if the likelihood of further experiments eventually declined to zero, almost certainly end up at the risk-dominant equilibrium of noncoordination, and not the optimal payoff-dominant equilibrium of everyone hunting the stag. (If the players continued endlessly experimenting, they wouldn’t “end up” anywhere, which is almost as bad from the point of view of sustaining their interest in trying to participate in the stag hunt.) In “The Evolution of Conventions,” published in the same year, Young demonstrated a similar result.
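To make the contrast between the two kinds of equilibrium concrete, it may help to write out one conventional set of payoffs for a two-hunter version of the game. The numbers below are my own illustration, not Rousseau’s or Lewis’s, and are chosen only for clarity.

```latex
% Illustrative payoffs for a two-hunter stag hunt (row player's payoff listed
% first in each cell; the numbers are assumptions chosen only for clarity):
\[
\begin{array}{c|cc}
 & \text{Stag} & \text{Rabbit} \\ \hline
\text{Stag}   & 4,\,4 & 0,\,3 \\
\text{Rabbit} & 3,\,0 & 3,\,3
\end{array}
\]
% (Stag, Stag) is payoff-dominant: 4 beats 3 for both hunters.  But hunting
% the stag is a best response only if the probability p that the other hunter
% does so as well satisfies
\[
4p \;\geq\; 3p + 3(1 - p), \qquad \text{that is,} \qquad p \;\geq\; \tfrac{3}{4} .
\]
% Against a partner judged equally likely to do either, the rabbit is worth 3
% and the stag only 2, so (Rabbit, Rabbit) is the risk-dominant equilibrium.
```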
For Lewisian conventions, the solution to this problem seems to be contained in clauses 4 and 5 of his definition. Not only do we prefer that each other individual also follow the convention, provided everyone else is; we often take actions, extreme or mild, to ensure this. In the real world, people who can’t demonstrate that they know which side of the road to drive on, or simply aren’t willing to drive on the same side as the other drivers, are kept off the highway. They aren’t allowed to join the club of drivers. Experimentation (if you’re in a hurry, driving on the wrong side of the road to see if that would improve your speed) is strongly discouraged. Even innocent mistakes (driving on the wrong side of the road by accident while drunk) are quite likely to be punished. So this particular convention also has contractual force, though it’s seldom necessary to actually exclude anyone from the club of licensed drivers for those particular reasons. Kandori, Mailath, Rob, and Young assume that there are no traffic police actively discouraging experimentation with noncooperative behavior and careless mistakes, which makes it somewhat unsurprising that their models predict gridlock. For their stability and continued existence, the Lewisian conventions of an actual human language depend on the existence of a social contract and the mechanisms for enforcing that contract that come along with it.
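The role that such a filter plays can be seen even in a toy simulation. The sketch below is my own illustration, not Kandori, Mailath, and Rob’s or Young’s actual model: a small population repeatedly best-responds to last period’s mix of behavior in the stag hunt given above, except that each agent occasionally experiments at random, and a “fine” stands in for whatever sanction the traffic police or disapproving peers apply to defectors.

```python
import random

# A toy best-response-with-experimentation dynamic (my own illustration).
# Payoffs echo the stag hunt above: hunting stag pays 4 when the partner does
# too and nothing otherwise; chasing the rabbit pays 3 regardless, minus
# whatever fine is imposed on defectors.
STAG, RABBIT = "stag", "rabbit"

def simulate(fine=0.0, agents=10, periods=2000, epsilon=0.1, seed=0):
    """Each period every agent best-responds to last period's population mix,
    except that with probability epsilon it experiments at random instead."""
    rng = random.Random(seed)
    actions = [STAG] * agents                 # start at the payoff-dominant state
    for _ in range(periods):
        stag_share = actions.count(STAG) / agents
        value_stag = 4 * stag_share           # expected payoff of hunting stag
        value_rabbit = 3 - fine               # rabbit's payoff is unconditional
        best = STAG if value_stag >= value_rabbit else RABBIT
        actions = [rng.choice([STAG, RABBIT]) if rng.random() < epsilon else best
                   for _ in range(agents)]
    return actions.count(STAG) / agents

if __name__ == "__main__":
    print("no policing :", simulate(fine=0.0))   # typically ends near all-rabbit
    print("with a fine :", simulate(fine=1.5))   # typically stays near all-stag
```

With these illustrative numbers, the unpoliced population is typically tipped into the all-rabbit state by a short run of simultaneous experiments and essentially never finds its way back, while the fined population goes on hunting the stag more or less indefinitely.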
Most people voluntarily follow the convention of driving on the right as a convention rather than as an obligation. It’s just that there’s also a filter in place to keep out those few who won’t or can’t, to prevent selfish behavior, ignorance, myopic rationality, and other kinds of noise from disrupting our ability to voluntarily coordinate around the precedent. We might naively hope that if the traffic police permanently vanished tomorrow, people would continue indefinitely to obey the rules of the road. In fact, however, probably a few people would start playing fast and loose almost immediately, for what seemed to them to be perfectly sufficient reasons. Given the large number of people on the road and the vast number of permutations of higher-order expectations that would then have to be considered in real time to avoid accidents, the confusion they generated would eventually undermine the workability of the whole existing system of conventions.
A convention of this kind, which has a filter around it to prevent scofflaws from ruining the coordination that it permits for those who would like to comply with it, must be clearly distinguished from a convention to which almost everyone conforms despite being completely free not to, because nobody ever has a significant incentive to deviate, and nobody ever makes a mistake or tries an experiment. Are there any such nonbinding conventions? Young gives the example of going to lunch at noon, which certainly is nonbinding. But many people don’t go to lunch at noon—if they’re busy, they may not go until 1:30, or at all. Sparse and inconsistent observance of this kind may be characteristic of nonbinding conventions in general, since people are free to ignore them if they wish.
In a static model of the conventions of a human language, the distinction between binding and nonbinding conventions (telling the truth in standard English or driving on the right, versus going to lunch at noon) might not seem to matter enormously, since we may be interested mostly in the typical experience of the typical individual, the individual who conforms voluntarily. In an evolutionary model, however, it seems likely to be very important, since it’s precisely at these filters that what does and doesn’t count as adequate adherence to the convention is settled, and it’s precisely through their action that the distinction might be maintained. By preventing dangerous forms of experimentation and discouraging casual blunders, they can allow the population to stay at the sort of non-risk-dominant optimal coordination equilibrium that Young showed might be stable if, and only if, most such things are absent.
From now on, to make this difference clear, I’ll continue to speak of conventions of the first kind, conventions that come with highway patrol officers or disapproving peers attached, to filter out scofflaws or those who might be tempted to call gilded lead “gold,” as binding conventions. This will allow me to distinguish them from the nonbinding conventions that Young wrote about, which leave us free to experiment or slip up, without any special sanction being applied to those who do.
A Youngian nonbinding convention differs from a Lewisian binding convention in that it doesn’t seem to have any associated preference corresponding to clauses 4 and 5 of Lewis’s definition. Each agent simply seeks what’s best for himself, given whatever expectations about the behavior of others his experience has allowed him to form. If he does have preferences about the other players’ behavior, he’s in no position to do anything about them. A binding convention is simply a convention for which Lewis’s clauses 4 and 5 have real teeth. Of course, when the temptation to misbehave is very weak or the consequences of noncompliance are trivial, the sanctions needed may be trivial as well. They may consist only of displeased or amused facial expressions, sarcastic remarks, or awkward silences.
The convention that we tell the truth in the specific version of our language that we share with most other speakers is a binding convention, and the pathological liar or habitual misinterpreter may soon find that the people he lies to or culpably misinterprets may be inclined to exclude him, if they can, from the club of informative speakers or competent audience members, or to apply some other sanction. It’s this quasi-contractual obligation to tell the truth in some fairly standard version of our language that makes its use a cooperative game and not a true coordination game, even though most players (those who seldom lie and are seldom lied to in consequential ways) may have experiences that make it look like a mere coordination game most of the time.
So is any particular language, L, ever actually the language of any human population? According to Lewis, no. In fact, in his opinion we never quite manage to converge on a single language, even though we’re always trying:
I think we should conclude that a convention of truthfulness in a single possible language is a limiting case—never reached—of something else: a convention of truthfulness in whichever language we choose of a tight cluster of very similar possible languages. The languages of the cluster have exactly the same sentences and give them corresponding sets of interpretations; but sometimes there are slight differences in corresponding truth conditions. These differences rarely affect worlds close enough to the actual world to be compatible with most of our ordinary beliefs. But as we go to more and more bizarre possible worlds, more and more of our sentences come out true in some languages of our cluster and false in others.
… by not committing ourselves to a single language, we avoid the risk of committing ourselves to a single language that will turn out to be inconvenient in the light of new discoveries and theories; we allow ourselves some flexibility without change of convention. ([1969] 2002:202)
Everyone speaks a slightly different idiolect, even though we are constantly trying to harmonize them all. The result is that our language has the internal resources to change in the face of new circumstances. This, of course, is very Darwinian. Over time, this naturally occurring variation presumably would allow the conventions of a human language to undergo a long series of small, sequential, perhaps almost unnoticeable changes as a result of human experiences and preferences, with one incremental variant repeatedly replacing another slightly different one in a subtler, less explicit version of the same sort of process that Lewis described for logical notation. The endless displacement of existing versions by incrementally improved ones is the essence of Darwinian evolution, so it seems to me that an implicit theory of the evolution of the conventions of a human language is already concealed in the static account of their character that Lewis has given us.
Of course, if we use a word in some slightly nonstandard sense without warning anyone or being willing to admit it, we’re likely to run up against the binding nature of the language’s conventions. Often, however, we’re able to explain what we mean, or to indicate it obliquely in a way that gives fair warning that we’re using a particular variant of its meaning. Still, some parts of this evolutionary process must often involve acrimony, accusations of bad faith, and attempts to pin down precisely what people are saying. The point is just that we don’t always end up pinning them down to the exact same place that we ourselves were in before the conversation started. This sort of friction is the inevitable result of the persistence of slightly different idiolects in a system of what are supposed to be binding conventions. It’s one of those strange, counterintuitive Darwinian processes that require errors to be continually introduced for perfection to be achieved (Cloud 2011).
There are further complexities to a real human language. An unrealistic aspect of the story as I’ve told it so far is its restriction to simple signaling languages like “one if by land, two if by sea.” These are still much simpler than any actual natural language. They can map only a certain finite set of situations to a certain finite set of contingency plans. Actual human languages, in contrast, have a potentially infinite number of sentences. Each grammatical sentence that isn’t nonsensical has a mood and a set of truth conditions, and if the sentence is ambiguous, it may have more than one of each.
Generating an effectively infinite number of sentences is no problem; the right kind of computer can do it. We know exactly what sort of computer would be required to produce the sort of unbounded collection of finite strings of symbols into which the actual sentences of human languages fall. There’s no great puzzle about how it’s physiologically possible to produce enough distinct sentences, though it is a bit puzzling that no other living thing seems to string sounds together with the same degree of complexity. If it’s physiologically possible, why isn’t it more common? That, however, is a question for subsequent chapters. Right now, what probably ought to be puzzling us is how we manage to assign each of a potentially unlimited number of novel sentences its own peculiar set of truth conditions.
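Since the point here is purely about generative capacity, a minimal sketch may make it vivid. Everything in it, the vocabulary and the single recursive embedding rule alike, is invented for illustration; the only claim it is meant to support is that a finite stock of rules can yield a collection of distinct sentences with no longest member.

```python
import random

# A finite grammar with one recursive rule:
#   S -> "the British are coming by sea"
#   S -> NAME + " believes that " + S
# Each extra level of embedding produces a new, longer sentence, so the set of
# generable sentences is unbounded even though the rule set is finite.
NAMES = ["Paul Revere", "the sexton", "the colonel"]

def sentence(depth=0, max_depth=5):
    """Randomly expand the start symbol S, stopping at max_depth."""
    if depth >= max_depth or random.random() < 0.4:
        return "the British are coming by sea"
    return random.choice(NAMES) + " believes that " + sentence(depth + 1, max_depth)

if __name__ == "__main__":
    for _ in range(5):
        print(sentence())
```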
CONSPICUOUS ANALOGIES AND NONNATURAL MEANING
What sort of thing is the phrase “the British are coming by sea” supposed to stand for? We never face exactly the same situation twice, Lewis ([1969] 2002) tells us, so all the situations in which the signal of two lanterns is the appropriate one to send aren’t exactly the same, and the coordination problem we face isn’t exactly the same coordination problem each time. We really should think of the signal as one we should send in any of a large number of analogous situations, situations in which an analogous coordination problem arises and can be solved in an analogous way. The skill of interpretation requires mastering these analogies and learning how to apply them to particular situations. Lewis argued that a language is a system (a weaving together, in Plato’s very descriptive metaphor) of many different analogies, and a grammar for stringing them into collages of analogies, sentences that are true or false in possible worlds or sets of possible worlds, sometimes including the actual one, and that we interpret as referring to particular objects in this world or those other worlds.
Of course, any actual situation is analogous to many other situations in myriad different ways. Fortunately for us, Lewis ([1969] 2002:37–38) explains, most of the analogies strike us as artificial, and only a few leap out as “natural.” We ignore the artificial-seeming ones, and we expect others to ignore them, and to expect us to ignore them, and to expect us to expect them to ignore them. Because everyone expects everyone to expect everyone to ignore all but the most natural-seeming, obvious, and readily apparent analogies, we all can converge on the ones that anyone would expect anyone to expect anyone to notice.
The fact that already, several times, Redcoats have come in ships to attack our town isn’t just a noticeable similarity between different events; it’s a similarity we’d expect anyone living in the town to expect anyone living in the town to have noticed. Even if it also was cloudy on every occasion, the unusual but similar occurrence of soldiers arriving by sea each time two lanterns are lit has an obvious salience that trumps that similarity, so the signal is likely to be interpreted as referring to the soldiers, not the clouds, even by those not in on the secret. Once the British commander has been thwarted in this same way several times, he is unlikely to conclude that two lanterns are the local signal for clouds, and one for a clear sky.
This preeminent salience of certain analogies might be of two kinds. Some features of an environment—for example, whether or not any part of it is on fire, in a way analogous to what was happening when we once were burned, or contains food, as it does whenever we get to eat something—would stand out as especially salient to any creature whatsoever. Some things have salience simply as natural incentives. But we humans also might pay more attention to certain features of our environment because we expect others to do the same, and we may expect them to do the same because we expect them to expect us to do the same. Of all the analogies between all the various bits of paper we deal with, the analogies between certain pieces that make them all postage stamps or money are particularly noticeable to us, partly because we expect others to notice them and to expect us to expect them to notice them. Those analogies are salient because their existence and importance are Lewisian common knowledge, known by all to be known by all. We might say that these analogies are made salient by their commonly known “entrenchment” (Goodman [1955] 1983) in the minds of the population of speakers, rather than by being associated with direct natural incentives. In fact, we’re required to recognize these particular analogies and not others, to speak only of actual gold as gold and not to use the term, for example, in promises, to refer to our own, very different homemade version.
This is another area where developments in game theory that came after Lewis wrote Convention might give us reason to worry that he may have been too optimistic about our ability to spontaneously converge on the optimal form of coordination in the absence of any policing. One problem is that successfully attending to the same analogies and the same environmental cues as everyone else involves work, which may have a cost. Ken Binmore and Larry Samuelson (2006) demonstrated that if nature is simply allowed to take its course—if paying attention to environmental cues is costly, if how much attention the players of a coordination game pay to which sorts of environmental cues is left up to the individual players, and if no sanctions are applied to players who don’t pay enough attention or who attend to the wrong thing—then the players won’t pay the optimal amount of attention to the right cues. Instead of arriving at the optimal, “payoff-dominant” level of attentiveness to possible cues for coordination, players will gravitate toward paying a smaller, “risk-dominant” amount of attention, even though they may often miss cues that would allow them to coordinate successfully in particular situations.
Basically, the reason is that when a player who is paying close attention to lots of cues or distinguishing marks is paired with a player who is paying slightly less attention, it’s the attentiveness of the inattentive player that determines the probability that coordination will be successfully achieved. The inattentive player at least is spared the costs associated with attending carefully, but the attentive player gains the exact same benefit from their interaction, while paying a higher cost for monitoring the environment.
Since slightly less attentive players do better against slightly more attentive ones than the more attentive ones do against them, they can afford to do slightly worse in their interactions with their own kind than more attentive players do in their interactions with their own kind. So a population of more attentive players can be invaded by slightly less attentive ones. Through a long series of small steps of this kind, the population will inevitably move away from the optimal, “payoff-dominant” equilibrium of paying close attention to a large set of cues or differentia (which might allow all of us to see a fairly complex set of analogies as conspicuously salient) toward a more inattentive, “risk-dominant” equilibrium in which everyone is worse off, because fewer analogies can be treated as conspicuously salient.
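The logic of this invasion argument can be put in schematic form. The toy model below is my own gloss on it, not Binmore and Samuelson’s: each player chooses an attention level a, coordination succeeds with probability equal to the lower of the two players’ levels, a successful coordination is worth b to each, and attention costs c per unit, with b greater than c.

```latex
% Payoff to a player attending at level a_i against a partner at level a_j
% (a toy model, my own gloss on the argument in the text):
\[
\pi(a_i, a_j) \;=\; b\,\min(a_i, a_j) \;-\; c\,a_i , \qquad b > c > 0 .
\]
% When a high-attention player (a_H) meets a low-attention one (a_L < a_H),
% both receive the same coordination benefit, but only the attentive one pays
% the higher monitoring cost:
\[
\pi(a_L, a_H) \;=\; (b - c)\,a_L \;>\; b\,a_L - c\,a_H \;=\; \pi(a_H, a_L) .
\]
% If a fraction x of the population attends at the high level, high attention
% is the better reply only when
\[
x\,b\,(a_H - a_L) \;>\; c\,(a_H - a_L), \qquad \text{that is,} \qquad x > c/b ,
\]
% so whenever the cost-benefit ratio c/b exceeds one half, the low-attention
% equilibrium is the risk-dominant one in Harsanyi and Selten's sense.
```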
Here again, by analogy with the public highway, the problem apparently would be solved by the existence of some sort of “attention police” who punished people for being inattentive or failing to attend to the right cues, weeding out slackers so that those who wanted to coordinate efficiently would be in a position to do so without interference from inattentive invaders.
Human parents, human teachers, and other humans with whom we interact do sometimes sanction us if we fail to pay attention to the sorts of things we’re supposed to be paying attention to. Not paying adequate attention is often grounds for reproach. Certain kinds of conversation—for example, almost all the conversations that take place at universities—might be less efficient if people who were not very interested in their subjects, and the fine distinctions they require, were completely free to participate in them, without even being frowned at. It seems plausible that the sorts of mild (or severe) sanctions or exclusions that humans sometimes apply to other humans when they fail to pay enough attention to the right things may help us share a richer sense of which analogies are the conspicuously salient ones in particular situations. This may be one reason that teachers give tests or glare at students who make comments in class when they haven’t done the reading.
An obligation to pay attention to certain things can be part of the social contract in the same way that the obligation to use the word gold only for gold is. Using the word gold only for gold is actually rather difficult, since it requires us to pay attention to certain abstruse distinguishing marks, which allow us to tell the difference between gold and fool’s gold, so the two obligations are intimately linked.
Some analogies are naturally conspicuous, while others stand out because others expect us to expect them to expect us to treat them as conspicuous, and may be irritated if we show no signs of any such expectation, or don’t seem to care what they expect us to do. In a similar way, some signals are naturally informative, and others are informative because others expect us to expect them to expect us to treat them as informative. For Lewis, even signals in a mere signaling language like “one if by land, two if by sea” have the second kind of meaning, what H. P. Grice (1957) called “nonnatural meaning” (or, for short, “meaningNN”). The intended contrast—which has absolutely nothing to do with not being part of nature—is with what Grice called “natural meaning.”
Smoke naturally means a fire, and spots naturally mean measles, but to mean something by smoke is to use a smoke signal. Glancing at my watch may be a sign that I’m out of time, but only if I expect you to recognize that by glancing at my watch I mean to inform you that I am almost out of time do I meanNN, by glancing at my watch, that I am almost out of time. A sigh may mean boredom, but meaningNN boredom by means of a sigh involves an intention that the theatrical sigh be recognized by the audience as intended to convey boredom, and the recognition of the intention must be what causes the belief. A nonnatural signal doesn’t merely cause a belief about the world; it causes a belief about the utterer’s intent to cause a certain belief about the world. An accurate fourth-order picture of mental states on the part of the receiver (he intends me to believe that he intends me to believe) is required in order for Gricean nonnatural meaning to be successfully conveyed.
If I show Mr. X a photograph of Mr. Y showing undue familiarity to Mrs. X, he may conclude that something improper may have occurred, whether or not he attributes to me any intention to communicate that. But if I draw him a picture of the same thing, he will begin to suspect Mr. Y only if he realizes that I’m deliberately trying to send him a message, since the drawing itself is hardly evidence of impropriety. I can stop adding details to the drawing once he realizes what I’m trying to do. The certificate of intent to draw a picture can be much simpler than the whole picture would be: he may get the idea right away and punch me in the nose before the insinuating picture becomes very detailed.
If a policeman stops a car by standing in its path, the driver may slow down just to avoid an accident, but if a policeman tries to stop a car by waving it down, the driver must recognize his intention to communicate his wish that the driver should stop to be able to follow the instruction. Otherwise, it means nothing. Perhaps he’s saying hello? Only if we recognize that the policeman intends us to recognize that he wants us to stop, expects us to expect him to expect us to stop, will we stop.
We’d better recognize that, though. The recursive mind reading he’s asking us to engage in isn’t optional. The policeman will be upset with us if we’re distracted by some other feature of the environment and pay no attention to his waving. Failure to grasp his intention that we should grasp his intention may even be a crime, and the next thing we hear may be a siren. Which we really had better understand as intended to convey to us an intention to inform us that we must stop, or even worse consequences—a car chase, our arrest—will probably ensue.
Paying attention to the things we’re expected to pay attention to is often mandatory. Conventions about which analogies should be regarded as conspicuously salient often are binding. Faithful interpretation is part of the social contract, just as not lying is. We can’t just pretend we didn’t understand the siren or understood it in our own unique way. Saying that we didn’t notice it because our favorite song was on the radio or because we were busy reading a text message won’t excuse us. By driving on a public road, we’ve accepted the obligation of listening for sirens and interpreting them in a certain way, so to silently renege on that obligation later is to act in bad faith.
The sender of a Lewisian signal intends that the audience’s recognition of his intention to communicate something to that audience should be effective in communicating it. Once the intention itself is recognized, nothing else has to be communicated, so the signal can be quite stylized. It can be ringing a bell, or hanging two lanterns in a window, or drawing a very crude and simple picture, or waving a hand in a certain special way, or briefly turning on a siren. But the sender doesn’t regard it as a foregone conclusion that even if the intention isn’t recognized, the message will still be understood. Only if the receiver understands that by ringing the bell or hanging the lantern or drawing the picture or waving the hand or switching on the siren, the sender is trying to tell her, the receiver, something will the receiver be able to figure out what the sender is trying to say.
Since the attribution of particular intentions to particular other people is an all-things-considered judgment, interpreting a signal of this kind brings into play the receiver’s whole picture of the world and of the nearby possible worlds, of the sender’s picture of the actual world and the possible worlds closely associated with it, and of the sender’s picture of the receiver’s picture of the world. Higher-order beliefs are intimately involved in Gricean “nonnatural” meanings. Believing that a speaker intends that we believe that he intends that we believe a particular thing is a fairly complex act of higher-order mental representation. Assigning the correct interpretation to a sentence uttered by a particular person on a particular occasion can be a dauntingly complex enterprise, as we can see from the fact that people don’t always understand jokes. But the skills needed for interpretation aren’t optional; every member of the community must acquire an adequate version of these skills in order to be considered a competent adult.
If all Lewisian languages have nonnatural meaning, is all nonnatural meaning associated with languages or with conventions more generally? No. Lewis asks us to imagine that someone warns people of the presence of quicksand by making a stick figure out of branches, and then putting it in the quicksand so that it sinks part way down. Obviously, to any passing person, this half-buried stick figure will meanNN “Look out, quicksand!” But it means this not by convention, not because there is some conserved and culturally transmitted precedent that a figure half buried in quicksand means “Quicksand!,” not because this is a conventional arbitrary sign. Rather, it means this by sheer force of obviousness, because being shown a mock human slipping down into quicksand makes it pretty obvious that someone is trying to inform us of the danger. This is obvious to you, though, only if you understand that the maker intends you to understand his intent to make it obvious. Otherwise, you’ll see only some oddly arranged sticks or, at best, a strangely positioned doll. Then we might say that you didn’t get it, which, when said of a human adult, is not a compliment.
We might wonder which form of nonnatural meaning—conventional, like the meanings of words, or occasional or “conversational,” like the meaning of Grice’s insinuating drawing and Lewis’s stick figure—came first in the history of the evolution of human communication. The answer isn’t as obvious as it might seem. Even a clout on the head or deliberate eye contact can convey a nonnatural meaning of the occasional kind. It can convey an intent to let the person we’re hitting, or meeting the eyes of, know that we want him to know he’d better stop what he’s doing, or had better not stop what he’s doing, or something like that.
(Chimpanzees do hit each other, of course, but probably not to convey this sort of complex multilayered message. They may simply be trying to inflict pain or harm or to take revenge. Direct eye contact is much less common than it is among humans, though bonobos do look directly into each other’s eyes during sex. This may be an example of the same cognitive machinery or its evolutionary precursors being used for a somewhat more restricted social purpose. Sex is a rather obvious opportunity for shared attention to a common project, for attending to the fact that the other party involved is attending to what it is we’re attending to.)
Consequently, the question is whether we humans started meaningNN things by our words before or after we started meaningNN things by whacking each other on the head or meeting each other’s eyes or helpfully pointing at things. We might try to answer this question by looking at what chimpanzees do and seeing if we can find any precursors to either thing. People like Daniel Sperber (2000) and Tomasello (2008) have made arguments, on the basis of what we now know about the differences between humans and chimpanzees, that seem to imply that some comprehension of “occasional” Gricean nonnatural meanings probably predates, and was necessary for, the evolution of our modern human kind of linguistic convention: that something like indicative pointing probably came first. They seem to be saying that pointing probably arose as a type of occasional meaning, perhaps in the context of collaborative tool use or of some other shared food-gathering or food-processing task, a scenario that would be impossible if the two kinds of meaning were identical.
You can understand indicative pointing only by understanding that the pointer intends that you understand that he wants you to look at the clown or give him the hammer or look under the bucket. When chimpanzees are around humans a lot, they learn to point at things that they want because the gesture may have the magical effect of somehow inducing the humans to give the thing to them. But even when they’ve learned how to produce it, they are much less likely to be able to understand the gesture when it’s made by others. They sometimes may be able to understand a human pointing at or reaching for something the human wants, but a disinterested human helpfully pointing out the location of something that the chimpanzee might want appears to be beyond their comprehension.
Chimpanzees can follow a human’s gaze, but the human motive of being helpful, of unselfishly managing the attention of others, is apparently so foreign to them that they don’t make what seems to us to be the obvious inference that there might be something good in the place that the person is pointing out to them (Tomasello 2008:38–41). Because helpfully pointing out something like that isn’t something they would do, it seems difficult for them to grasp that we’re doing that. They certainly don’t seem to have any sense that they’re under any obligation to figure out what we’re pointing at. The whole idea—which seems to play such an important role in allowing us to refer to particular things in the world around us, confident that others will pay attention, and make the effort required to get the reference—is apparently inconceivable. Chimpanzees really do appear to live in the kind of every-person-for-himself world described by the game theorists. And as we’ll see, their nonbinding communicative “conventions,” though they do seem to have some, are just as sparsely observed as Young’s convention of going to lunch at noon.
Unlike the sclera of a chimpanzee’s eye, the sclera of a human eye is white, probably because this makes it much easier to follow our gaze and find the object we’re pointing to or attending to. Since this is a modification of the person attending, not the person trying to figure out what she’s attending to, it suggests that it’s been in our reproductive interests for a rather long time that others know what we’re attending to, which seems to make sense only if there have been lots of situations in which we were better off because the very fact that we were attending to something could convey to others the suggestion that they, too, should attend to it. Apparently, being helpful in this way, helping others attend to the right things, intervening in the management of attention by others, has been quite helpful to us. This Gricean kind of attention management—letting others see that we’d like to help them see something—is such an important part of human nature that it’s actually reflected in our physical appearance.
Dolphins, unlike chimpanzees, seem to understand human indicative pointing (Herman et al. 2000), possibly because they employ the tightly focused beam of sound they use for sonar to illuminate objects for other dolphins. This strikes me as even better than having white sclera. Dolphins also appear to use personal names for one another (King and Janik 2013), another very Gricean behavior, since it requires the user to intend that the party so addressed understand that the user intends that he should attend to her. After many thousands of years of domestication by humans, dogs also seem to be able to obtain information from indicative pointing in some way (Hare and Tomasello 2005; Tomasello 2008:42–43) and can respond to their own names—but they still can’t use names to call others, as dolphins can.

With the mention of evolution, we’ve gone beyond anything David Lewis himself said about human language. We do, however, now have a theory of the evolution of a rather different, more Youngian, nonbinding kind of “linguistic convention,” the one articulated by Brian Skyrms in Signals. Since researchers like Michael Tomasello and Daniel Sperber seem to have concluded that the kind of Gricean nonnatural meaning that Lewisian conventions have is quite important, while Skyrmsian signaling conventions involve no such thing, we seem to be moving forward from Lewis’s achievements in two rather different directions. To sort out this apparent contradiction, I must first make sure that we all know what Skyrms has actually said, so in the next chapter, I will introduce his story about the evolution of his rather different kind of signaling convention.