Small gestures can have a big impact.
Julianna Margulies
JUST AS TACIT CULTURAL KNOWLEDGE shapes grammars, it is also important in each of a grammar’s components – words, gestures, phonology, syntax, discourse and conversation. Yet many linguists and anthropologists omit gestures from discussion, judging them too quickly to be secondary accoutrements of speech, a separate, independent facet of human behaviour. Researchers from various theoretical perspectives have shown, to the contrary, that an intimate set of connections exists between hand movements, linguistic structure and cognition, held together by tacit cultural knowledge. An analysis of the symbiosis between the hands, the mouth and the brain, and of how that symbiosis evolved, has to round out any theory of language evolution.1
Additionally, some have argued that this research shows something more, namely that the highly language-specific components of gestures are innate. This line of work, pioneered by Susan Goldin-Meadow, examines the ‘spontaneous emergence’ of hand movements in children who otherwise have no access to linguistic input, such as the deaf children of hearing, non-signing parents. She calls these gestures ‘homesigns’, and the gestural systems she studies might indeed be crucial to our quest to tease apart native vs cultural, or a priori vs a posteriori, perspectives on the origins of (some) dark matter.
To understand the role of gestures in language, it is important to grasp how they work together with intonation, grammar and meaning. One can get some idea of how these different abilities combine by focusing on gestures and intonation as ‘highlighters’, helping hearers pick out new or important information against the old information that the speaker assumes to be shared with the hearer. Another way is to trace the evolving research into gestures and human language, from the ancients to the contemporary scientists doing important work today. Without understanding gestures there is no understanding of grammar, the evolution of language, or the use of language: gestures are vital for a fuller understanding of language, its origins and its broader role in human culture, communication and cognition.
Language is holistic and multimodal. Whatever a language’s grammar is like, language engages the whole person – intellect, emotions, hands, mouth, tongue, brain. And language likewise requires access to cultural information and unspoken knowledge, as we produce sounds, gestures, pitch patterns, facial expressions, body movements and postures all together as different aspects of language. I want to begin here with an overview of the functions and forms of gestures in the world’s languages, including, most likely, the language(s) of early Homo species. Gestures can be complex or simple. Either way, they are learned.
The gestures that accompany all human speech reveal an intersection of culture, individual experience, intentionality and other components of ‘dark matter’, or tacit knowledge. There are two kinds of knowledge of human grammars, as there are of most things: the static and the dynamic. These are quite possibly related to declarative and procedural memories, but they do seem a bit different. Static knowledge is a list of the things we know. Rules for telling stories are static knowledge. Dynamic knowledge, however, is understanding that things change and knowing how to adapt to changes in real time. If static knowledge is knowing the rules for telling a story, dynamic knowledge is telling the story. Gestures are crucial components of our multimodal languages. They are themselves intricate in structure, meaning and use. Contemporary research makes it clear that gesticulations are as analytically challenging and as intricate in design and function as any other part of language. But, to reiterate, these are not simply add-ons to language. There cannot be a language without gestures. Most of these are used unconsciously and employ tacit knowledge. They are shaped by the needs of the language they enhance and the cultures from which they emerge.
Kenneth Pike saw gestures as evidence for the idea that language should be studied in relation to a unified theory of human behaviour:
In a certain party game people start by singing a stanza which begins Under the spreading chestnut tree … Then they repeat the stanza to the same tune but replace the word spreading with a quick gesture in which the arms are extended rapidly outward, leaving a vocal silence during the length of time which would otherwise have been filled by singing. On the next repetition the word spreading gives place to the gesture, as before, and in addition the syllable chest is omitted and the gap is filled with a gesture of thumping the chest. On the succeeding repetition the head is slapped instead of the syllable nut being uttered … Finally, after further repetitions and replacements, there may be left only a few connecting words like the, and a sequence of gestures performed in unison to the original timing of the song.2
Pike concludes from this example that gestures can replace speech. Later researchers have shown, however, that the gestures he refers to are a very limited type among the several kinds possible. Language is a form of behaviour, as are gestures. Still, Pike’s basic point is valid – language and its components are human behaviour, guided by individual psychology and culture, the dark matter of the mind.
All human behaviour, including language, is the working out of intentions, what our minds are directed towards. Language is the best tool for communicating those intentions. Communication is a cooperative behaviour. It follows cultural principles of interaction.
Pike raised another question: why don’t people mix gestures or other noises with speech sounds in their grammars? Why is it that only sounds made by the mouth can be used in syllables and speech more generally? Why couldn’t a word like ‘slap’ be [sla#], where [#] represents the sound of someone slapping their chest? Such a word might seem easy enough to produce, yet it is not a possible word or syllable in anyone’s language. As a beginning linguistics student, I found this question interesting but did not adequately appreciate the degree to which it bears on the understanding of language.
Gestures aim towards what linguists and philosophers call ‘perlocutionary effects’, the effects that a speaker intends her language to have on a hearer. Speakers use highlighters in order to help the listener use or react to the information in the way the speaker hopes they will.
To more fully illustrate the need for a single theory of culture and language – indeed, of all human behaviour – consider a scene like the following. Two men are watching other men move some heavy furniture down the stairs of their apartment building. One man passing on the stairway landing is huffing and puffing, concentrating solely on his heavy load. His wallet is hanging loosely from his back pocket, about to fall out. He clearly wouldn’t notice if someone relieved him of this burden. The first observer glances at the wallet, then looks at the second observer with raised eyebrows. The second sees him and simply shakes his head to indicate ‘No’. What happened here? Is this language? It is a form of communication that is parallel to language. Certainly shared culture and conventions are necessary for this kind of exchange. Just about anything two members of a culture wish to exploit can be used to communicate.
There is broad popular interest in gestures, though people often fail to recognise how fundamental they are to language. Gestures formed the basis of a 2013 article in the New York Times by Rachel Donadio titled ‘When Italians Chat, Hands and Fingers Do the Talking’.3 Italians do indeed stand out gesturally, but so do we all. Even in the seventeenth century, northern European Protestants disapproved of Italians’ ‘flamboyant’ hand movements. But the first person to study Italian gestures – or those of any other culture – from a modern scientific perspective was David Efron, a student of Franz Boas, the twentieth-century pioneer in anthropology and linguistics. Efron wrote the earliest modern anthropological linguistic study of cultural differences in gestures more than seventy years ago. He focused on the gestures of recent Italian and Jewish immigrants and later compared these with the gestures of second- and third-generation immigrants.
Efron’s study, Gesture, Race and Culture, was simultaneously a reaction against Nazi views of the racial bases of cognitive processes, the development of a model for recording and discussing gestures and an exploration of the effects of culture on gesture. The core of Efron’s contribution is his description of the gestures of unassimilated southern Italians and east European Jews (‘traditional’ Italians and Jews) who had recently immigrated to the United States and were living mainly in New York City (though some of his subjects also came from the Adirondacks, Saratoga and the Catskills). According to Efron, Italians use gestures to signal and support content – for example, a ‘deep’ valley, a ‘tall’ man, ‘no way’. The Jewish immigrants of Efron’s study, on the other hand, used gestures as logical connectives, that is, to indicate changes of scene, logical divisions of a story and so on. These uses of gesture underscore the fact that language is triple-patterned (symbols, structure and highlighters, such as gestures and pitch) and shaped by culture.
Efron wanted to know two things about gestures. First, are there standardised group differences in gesture between Italian and Jewish immigrants? Second, how do gestures change as an immigrant is socially assimilated? He discovered a strong cultural effect: an ‘Americanisation’ of gestures occurred in each group over time. The initially strong differences between Jewish and Italian immigrants grew less pronounced until the gestures of both groups were indistinguishable from those of other citizens of the United States.
Since his was a pre-video-camera era, Efron contracted an artist, Stuyvesant Van Veen, to help him. Efron was the first to come up with an effective methodology for studying and recording gestures, as well as a language for describing them. Although later parts of the book digress to attack Nazi science, the work was a breakthrough. Yet Efron’s work, though pioneering, emerged from a long tradition.
Aristotle discouraged the overuse of gestures in speech as manipulative and unbecoming, while Cicero argued that gestures were important in oratory and encouraged training in their use. In the first century, Marcus Fabius Quintilianus actually received a government grant for a book-length study of gesture. For Quintilian and most of the other classical writers, however, gesture was not limited to the hands but included the general orientation of the body and facial expressions – so-called ‘body language’. In this they were correct. These early explorers of gesture in human languages discovered that communication is holistic and multimodal.
The Renaissance rediscovered the work of Cicero and other classical scholars, sparking European interest in the relationship between gesture and rhetoric. The first book in English on gesture was John Bulwer’s Chirologia: or the Naturall Language of the Hand in 1644.
By the eighteenth century researchers on gesture began to wonder whether gestures might have been the original source of language. This idea is echoed by several modern researchers, but it should be resisted. Gestures that can serve in place of speech, such as sign languages or mimes, in fact repel speech, as University of Chicago psychologist David McNeill has shown. They replace it. They are not replaced by speech, which would have to be the evolutionary progression if gestures came first.
Interest in understanding the significance and role of gestures in human language and psychology diminished tremendously, however, in the late nineteenth and early to mid-twentieth centuries. There were several reasons for this decline. First, psychology was more interested in the unconscious than in conscious thinking during this period and it was thought, erroneously, that gestures were completely under conscious control. Gesture studies also dwindled because linguists became more interested in grammar, narrowly defined by some so as to exclude gestures. The interest in the messy multimodality of language had waned. Another factor leading to the decline of gesture studies was that linguistic methods of the day were still not up to the task of studying gestures scientifically. Efron’s work was extremely hard to do and wasn’t amenable to widespread duplication, at least as many perceived it at the time. Not everyone can afford an artist.
Linguist Edward Sapir was different. He saw language and culture as two sides of the same coin, and his view of gestures was accordingly similar to that of current researchers. As Sapir said, ‘the unwritten code of gestured messages and responses is the anonymous work of an elaborate social structure’. By ‘anonymous’ Sapir meant tacit knowledge – dark matter.
This raises the fundamental and obvious question: what are gestures? Are sign languages gestures? Is mime a gesture? Are signals such as the ‘OK’ sign, made with the thumb and forefinger, or ‘the bird’, the upraised middle finger, gestures? Yes to all of the above. Some researchers, such as David McNeill and Adam Kendon, classify all these different forms along a ‘gesture continuum’ that considers gestures in terms of their dimensions and their relationship to grammar and language (Figure 32).
Gesticulation, the most basic element of the continuum, is the core of the theory of gestures. It involves gestures that intersect grammatical structures at the place where gesture, pitch and speech coincide. Gesticulation is, in fact, what most theories of gesture are about. Such gestures are not conventional – they may vary widely and have no societally fixed form (though they are culturally influenced). Gestures that can replace a word, as in Pike’s ‘chestnut tree’ language game, are ‘language-slotted’. You can see these at work if you tell someone, ‘He (foot moved in a kicking motion) the ball,’ where the gesture replaces the verb ‘kicked’, or, ‘She (open hand swept across your face) me,’ for ‘She slapped me’. These gestures occupy the positions in sentences usually taken by words. They are special gestures, improvised to produce particular effects according to the type of story being told. Fascinatingly, these language-slotted gestures are a window into speakers’ knowledge of their grammars: one cannot use them unless one knows how words, grammar, pitch and the rest fit together.
Figure 32: The gesture continuum
Our hand movements can also simulate an object or an action without speech. When they do this we are using mime, which follows only limited social conventions. Such forms vary widely. Just play a game of charades with a group of friends to see that. Conventionalised gestures can also function as ‘signs’ in their own right. As I mentioned, two common emblems in American culture are the forefinger and the thumb rounding and touching at their tips to form the ‘OK’ sign and ‘the bird’, the upraised, solitary middle finger.
These are all distinct from sign languages, which are full-blown languages. Sign languages have all the features of a spoken language, such as words, sentences, stories and even their own gestural and intonational highlighters, expressed by different kinds of body and hand movements and facial expressions. In our discussion of language evolution, it is very important to keep in mind the most salient feature of gesture-based languages: sign languages neither enhance nor interact with spoken language. In fact, sign languages repel speech, to use one of McNeill’s phrases. This is why many researchers believe that spoken languages did not and could not have begun as sign languages.
Now let’s move to the crux of gesture’s relevance for language evolution. The bedrock concept here, developed in McNeill’s research, is called the ‘growth point’. The growth point is the moment in an utterance where gesture and speech coincide. It is where four things happen. First, speech and gesture synchronise, each communicating different yet related information simultaneously.

Second, at the growth point gesture and speech are redundant, each saying something similar in different ways, as in Figure 32. Third, the gesture highlights a newsworthy item against the background of the rest of the conversation, again as in Figure 32. (Intonation, it should be mentioned, is also active at the growth point and at other places in what is being said.) Fourth, at the growth point gesture and speech communicate a psychologically unified idea. In Figure 33, the gesture for ‘up’ occurs simultaneously with the word for ‘up’.
In short, gesture studies leave us with no alternative but to see language not as a memorised set of grammar rules but as a process of communication. Language is not static, merely following rigid grammatical specifications of form and meaning; it is dynamic, bringing together pitch, gestures, speech and grammar on the fly for effective communication. Language is manufactured by speakers in real time, following their tacit knowledge of themselves and their culture. Gestures are actions and processes par excellence. The boundaries between gestures are clear, being the intervals between successive movements of the limbs, according to McNeill. Like all symbols, gestures can be decomposed into parts. I won’t go into these here, except to say that all of this means that gestures, intonation and speech form a multimodal, holistic system, requiring a Homo brain to orchestrate their cooperative action.
Another crucial component of the dynamic theory of language and gestures that McNeill develops is the catchment. This is a bit technical, but it is essential to understanding how gesture facilitates communication and thus the potential role of gesture at the beginning of language. A catchment indicates that two temporally discontinuous portions of a discourse go together – repeating the same gesture indicates that the points with such gestures form a unit. In essence a catchment is a way to mark continuity in the discourse through gestures. McNeill says:
Figure 33: The growth point
[A] catchment is recognized when one or more gesture features occur in at least two (not necessarily consecutive) gestures. The logic is that recurrent images suggest a common discourse theme and a discourse theme will produce gestures with recurring features … A catchment is a kind of thread of visuospatial imagery that runs through a discourse to reveal the larger discourse units that encompass the otherwise separate parts.4
Assume that while speaking you use an open hand, turned upward with the fingers also pointed upward, whenever you return to the theme of a friend wanting something from you. The gesture then becomes associated with that theme, thereby highlighting it and helping your hearer follow the organisation of your remarks more easily.
In other words, through the catchment, gestures enable speakers to arrange sentences and their parts for use in stories and conversations. Without gestures there could be no language.
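For the computationally minded, a toy sketch may make the catchment concrete. The utterances and feature labels below are invented purely for illustration (loosely modelled on the open-hand example above); the point is only that a recurring gesture feature is enough to link non-consecutive utterances into one discourse unit.

```python
from collections import defaultdict

# Each utterance is paired with the gesture features observed with it
# (hypothetical data, loosely modelled on the open-hand example above).
discourse = [
    ("my friend wants my car", {"open_hand_up"}),
    ("I told him no",          {"head_shake"}),
    ("he still wants it",      {"open_hand_up"}),
]

# A catchment: one or more gesture features recurring in at least two
# (not necessarily consecutive) gestures marks those utterances as a unit.
catchments = defaultdict(list)
for index, (_text, features) in enumerate(discourse):
    for feature in features:
        catchments[feature].append(index)

for feature, indices in catchments.items():
    if len(indices) > 1:
        linked = [discourse[i][0] for i in indices]
        print(feature, "links:", linked)
# -> open_hand_up links: ['my friend wants my car', 'he still wants it']
```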
Various experiments have been developed that illustrate an ‘unbreakable bond’ between speech and gestures. One of the more famous experiments is called delayed auditory feedback. For this test the subject wears headphones and hears parts of their speech on roughly a 0.2 second delay, close to the length of a standard syllable in English. This produces an auditory stuttering effect. The speaker tries to adjust by slowing down. The reduced rate of speech offers no help, however, because the feedback is also slowed down. The speaker then simplifies their grammar. On top of this, the gestures produced by the speaker become more robust, more frequent, in effect trying to take more of the communication task upon themselves. But what is truly remarkable is that the gestures stay synchronised with the speech no matter what. Or, as McNeill puts it, the gestures ‘do not lose synchrony with speech’. This means that gestures are tied to speech not by some internal counting process, but by the intent and meaning of the speaker. The speaker adjusts the gestures and speech harmoniously in order to highlight the content being expressed.
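For concreteness, here is a minimal sketch of how the delayed feedback itself can be constructed. The sampling rate is an assumption and the ‘speech’ is a random stand-in; only the 0.2-second delay comes from the experiment described above.

```python
import numpy as np

RATE = 16_000        # samples per second (an assumed, typical rate)
DELAY_S = 0.2        # the delay used in the classic experiments,
                     # roughly the length of an English syllable
delay_samples = int(RATE * DELAY_S)

def delayed_feedback(mic_signal: np.ndarray) -> np.ndarray:
    """Return what the subject hears: their own speech, 0.2 s late."""
    silence = np.zeros(delay_samples, dtype=mic_signal.dtype)
    return np.concatenate([silence, mic_signal])

# One second of (fake) speech input yields 1.2 s of feedback audio.
speech = np.random.randn(RATE).astype(np.float32)  # stand-in for mic input
print(delayed_feedback(speech).shape[0] / RATE)    # -> 1.2
```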
Other experiments also clearly illustrate the tight connection between speech and gestures in normal talk. One involves a subject referred to as ‘IW’. At age nineteen, IW suddenly lost all sense of touch and proprioception below the neck due to an infection. Experiments show that IW is unable to control his hand movements unless he can see his hands (if he cannot see them, as when they are below the table at which he is seated, he cannot control them). What is fascinating is that IW, when speaking, uses gestures that are well coordinated, unplanned and closely connected to his speech, as though he had no disability at all. The case of IW provides evidence that speech gestures are different from other uses of the hands, even other gesturing uses of the hands. Some suggest that this connection is innate. But we know too little about the connection of gestures and speech in the brain, or about the physiological history of IW, to conclude this. However this coordination comes about, gestures in speech are very unlike the use of our hands in any other task.
One final observation to underscore the special relationship between gestures and speech: even the blind use gestures.* This shows that gestures are a vital constituent of normal speech. The blind’s use of gestures holds yet another lesson for us. Since the blind cannot have observed gestures in their speech community, their gestures will not match up exactly with those of the local sighted culture. That they gesture anyway shows that gestures are part of communication and that language is holistic. We use as much of our bodies as we are able when we are communicatively engaged. We ‘feel’ what we are saying in our limbs, faces and so on.
The connection between gestures and speech is also culturally malleable. Field researchers have demonstrated that the Arrernte people of Australia regularly perform gestures after the speech they accompany. I believe that the reason for this is simple: the Arrernte prefer gestures to follow speech. The lack of synchrony between gestures and speech is a cultural choice, a cultural value. Gestures for the Arrernte could then be interpreted much as they are among the Turkana people of Kenya, whose gestures function to echo and reinforce speech.
Were gestures also important for Homo erectus? I believe so, based, once again, on the work of David McNeill. He introduces the term ‘equiprimordiality’, by which he means that gestures and speech were equally and simultaneously present in the evolution of language. There never would have been nor could have been language without gestures.
If this is correct, claims McNeill, then ‘speech and gesture had to evolve together’; ‘there could not have been gesture-first or speech-first’. This follows from my concept of the triality of patterning: you cannot have language without grammar, meaning and highlighters. By the same token, there could never have been intonation without language or language without intonation.
Once this initial hurdle of how gestures become meaningful for humans is overcome, the evolutionary story of the connection between gesture and speech may be addressed. McNeill’s theory hypothesises that early speech, by the first speakers as by human infants, was ‘holophrastic’ – in these early utterances there are no ‘parts’, only a whole. To return to an earlier example, say that the first utterance by an erectus was ‘Shamalamadingdong!’ as he saw a sabre-toothed cat run past only a hundred yards away. He was in all likelihood gesticulating, screaming and engaging his entire body to communicate what he had seen, unless he was frozen with fear. His body and head would have been directed towards the cat. Later, perhaps, he recreated this scene, using slightly different gestures and intonation (he is calm now). The first time, perhaps, he uttered SHAMALAmadingDONG, with hand movements on shama and dong. The next time his intonation perhaps fell on shamalamaDINGdong, and his gestures either remained on ‘shama’ and ‘dong’ or, more likely, shifted to track the change in his intonation. In so doing, erectus would have inadvertently taken a holophrastic – single-unit – utterance and transformed it into a construction with individual parts. And this is how McNeill proposes that grammar began to emerge.
As gestures and speech become synchronised, gestures can then show one of two characteristics. They either represent the viewpoint of the observer – the viewpoint of the speaker – or they represent the viewpoint of the person being talked about. And with these different viewpoints, different ways of highlighting content and attributing ownership of content, we lay the groundwork for distinctions among utterances such as questions, statements, quotes and other kinds of speech acts.
McNeill gives an example of a person retelling what they saw in a cartoon of Sylvester the cat and Tweety Bird. When their hand movements are meant to duplicate or stand for Sylvester’s movements, their perspective is Sylvester’s. But when their hand movements trace the scene as they themselves saw it, the perspective is their own, that of the observer.†
Intentionality – being directed at something – is also a prerequisite to having a language. And intentionality is shown not only in speech but also in gestures and other actions. We see it in anxiety, in tail-pointing in canines and in focused attention across species. One reason gestures are used is that intentional actions engage the entire body. The orientation of our eyes, body, hands and so on varies according to where we are focusing our attention. This holistic expression of intentions seems to be a very low-level biological fact that is exploited by communication. The fact is, ‘animals use as much of their bodily resources as necessary to get their message across’. If we are on the right track, though, gestures could not have been the initial form of language; they would have occurred simultaneously with intonation and vocalisation. This is not to say that prelinguistic creatures cannot express intentionality by pointing or gesturing in some way. It does mean that real linguistic communication must always have included both gestures and speech. There are a few additional reasons for this judgement.
First, speech did not replace gesture. Gestures and speech form an integrated system. The gesture-first origin of language predicts a mismatch between gesture and speech, since they would be separate systems. But in reality they are synchronous (they match in time) and parts of a single whole (a gesture plus intonation plus speech coordinated in a single utterance). Further, people regularly switch between gestures and speech. Why, if speech evolved from gestures, would the two still have this give-and-take relationship? Finally, if the gesture-first hypothesis is correct, then why, aside from languages of the deaf, is gesture never the primary ‘channel’ or mode of communication for any language in the world?
Intonation was alluded to earlier in discussing ‘Yesterday, what did John give to Mary in the library?’ Whenever we speak, we also produce a ‘melody’ over our words. If an example of the importance of intonation is desired, one need only think of how artificial a car’s GPS sounds when giving directions. Although computer scientists long ago learned that speech requires intonation, they still have not produced a computer that can use or interpret intonation well. Intonation, gestures and speech are built upon a stable grammar. The only gestures that provide such stability are the conventionalised and grammaticised gestures of sign languages. In this case again, however, gestures are used instead of speech, supplanting it rather than accompanying it.
What is crucial is that gestures co-evolved with speech. If sign language, language-slotted gestures or mimes had preceded speech, then there would have been no functional need for speech to develop. The gesture-first idea thus stakes out an untenable position: that we had a well-functioning gestural communication system but replaced it wholesale with speech. And some gestures, such as mimes, are actually incompatible with speech.
This might seem to be contradicted by the earlier example from Kenneth Pike, which apparently shows that gestures can substitute for speech. But the gestures Pike discusses are language-slotted gestures, a distinct kind of gesture parasitic on speech, not the type of gesture that functions in place of speech. On the other hand, Pike’s example suggests another question, namely whether there could be ‘gesture-slotted speech’ corresponding to language-slotted gestures. This would be a case in which speech substitutes for what would usually be expressed by gestures. If speech evolved from gestures, after all, this is how it would have come about. And gesture-slotted speech is not hard to imagine. For example, consider someone bilingual in American Sign Language and English substituting a spoken word for each sign, one by one, in front of an audience. Yet such an event would not really exemplify gesture-slotted language, since it would be a translation between two independent languages, not speech replacing gestures within a single language.

This is important for our point for a couple of reasons. The obviously utilitarian nature of hand signs offers us a clear route to understanding their origin and spread. And the fact that everyone seems to use gestures, in all languages and cultures of the world, supports the Aristotelian view of knowledge as learned over the Platonic conception of knowledge as always present. It does so because it shows that the usefulness of gestures is the key to their universality. When a behaviour is an obvious solution to a problem, there is no need to assume that it is innate: the problem alone guarantees that the behaviour will arise if the mind is intelligent enough. This principle of usefulness explains most supposedly universal characteristics of language that are often proposed to be innate. In other words, their utility explains their ubiquity.
As they stabilise through conventionalisation, gestures become sign languages. But sign languages are formed when gestures take over all speech functions. The idea that speech develops from gestures thus makes little sense either functionally or logically. The ‘gesture-first’ theory gets the direction of evolution backwards.
However, in spite of my overall positive view of McNeill’s reasoning about the absence of gesture-first languages, there seems to be something missing. If he were correct in his further speculation that two now-extinct species of hominin had used either a gesture-first or a gesture-only language, and that this was the first stage in the development of modern language, then why would it be surprising to think that Homo sapiens had also begun with gesture? I see no reason to believe that the path to language would have been different for any hominin species. In fact, I seriously doubt that pre-sapiens species of Homo would have followed a different path, since there are significant advantages to vocal over gestural communication.
There are still other types of gesture important in human communication, including iconic gestures, metaphoric gestures and beats. Each reveals a distinct facet of the gesture–speech relationship and of its connection to cognition and culture. There is no need to discuss these here, other than to mention them as further evidence of the complexity of the relationship between gestures and speech, and to note that each contributes to our progress along the semiotic progression.
Nevertheless, we have yet to see anything in grammar, gestures, or other aspects of language that would lead us to believe that anything needs to be attributed to the genome of Homo sapiens that is specific to language. Cultural learning, statistical learning and individual apperceptional learning complemented with human episodic memory seem up to the task. The literature is rife with claims to the contrary, namely that there are phenomena that can be explained only if language is acquired at least partially based on language-specific biases in the newborn learner.
It is claimed that language features emerge spontaneously in communities said to otherwise lack language, as with Nicaraguan Sign Language and Al-Sayyid Bedouin Sign Language. These languages are purported to come into existence suddenly, as a population needs a language but otherwise lacks one. The problem with this claim is that all of these languages begin with very simple structures and then become more complex over time with more social interaction. Often it takes at least three generations to develop a complexity roughly equal to that of better-established languages. But this is just what we would expect if they were not derived from innate knowledge but invented and improved over time as they were learned by subsequent generations. For this reason, even if such examples provided some evidence for an innate predisposition to language, the knowledge involved would be very limited.‡
Susan Goldin-Meadow’s work argues that homesigners develop symbols for objects, principles for ordering these and constituents made up of distinct gestures. She also suggests that these newly minted gestures can fill slots in larger sentence-like structures, structures represented by the kind of tree diagrams we saw earlier, and she discusses a number of other characteristics of homesigns. Her conclusion is that all of this knowledge must be innate – how else could it appear so quickly among a group of speakers?
But none of these characteristics is specific to language. Indexes and icons, in all probability early forms of gestures, are used in one way or another by several species. There is no reason to believe that homesigns cannot be learned easily by humans. In fact, on one interpretation, that is all that Goldin-Meadow’s results on symbols show us, namely that children readily learn and adopt symbols. A symbol for an object is a form with a meaning. As the child learns the object and desires to communicate – and this urge to communicate, whether due to an interactional instinct or an emotional drive, is perhaps the most striking characteristic of our species – the child will represent the object, and the meaning of the object in its particular culture comes along for the ride. Children participate in their parents’ lives, even without language, and try to communicate, as Helen Keller’s remarkable odyssey shows us. With an ability to see or hear or feel, the child can receive input from the environment and from its caregivers, and in fact will do so with most caregivers and in most environments. Given that children learn the use of objects and their salience to parents and environment, it is unsurprising that they communicate about objects, as most other species (at least mammal species) do. Whole objects, perceivable in a particular space and time, are most salient and are learned relatively easily by dogs, humans and other creatures. Humans, unlike other animals, try to represent their objects, because humans strive to communicate.
The fact that some features of objects stand out to children is likewise unsurprising, though the particular reason that shape and size win out over many other features, if Goldin-Meadow is correct, is unclear. She ascribes it to the child’s native endowment. But I would suggest looking first at the way that objects are used, presented, structured and valued by the child’s caregivers. The shapes and sizes of furniture, dishes, houses, tools and so on are far more prevalent and salient in the environment provided by US caregivers than most other features. At least this could be tested, and there is no suggestion that any such tests were contemplated.
With regard to the claim that homesigners’ speech is organised hierarchically, there are two caveats. First, hierarchical structure and the simple juxtaposition of words like beads on a string are very difficult to distinguish in practice. Are three objects related as in diagram (a) or (b) of Figure 34? Either might be the case, and the reasons for choosing one over the other are highly theoretical. For example, in Pirahã we find utterances like ‘The man is here. He is tall.’ Or, ‘I spoke. You are coming.’ These could be interpreted as ‘The man who is tall is here,’ or ‘I said that you are coming.’ But the analysis is quite possibly much simpler, with the syntax lacking hierarchical structure.
In none of Goldin-Meadow’s examples purporting to show hierarchical structure in homesigners’ utterances is there convincing evidence for structures like (b). The second caveat is that some configurations provide a natural solution to presenting information, independent of language, and thus if one finds them in some languages, this is not evidence that there is an innate linguistic bias. Again, if a structure provides a useful solution to communicating information then nothing else needs to be said about why it is found in languages around the world. As information demands grow due to increasing societal complexity, hierarchy is the most efficient solution to information organisation, across many domains. Computers, atoms, universes and many other complex objects of nature are organised this way. It is a naturally occurring and observable solution. In fact, for any action that involves ordering, such that ‘you must do x before you do y’, there is structure. Such solutions are used in automobiles, canine behaviours and computer filing systems. There is absolutely nothing special about them when they also appear in language.
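To see how hard the two analyses are to tell apart, consider a minimal sketch. The node labels and mini-grammar below are hypothetical, not drawn from Goldin-Meadow’s data; the point is that the flat analysis (a) and the hierarchical analysis (b) yield exactly the same surface string, so word order alone cannot decide between them.

```python
# (a) Flat juxtaposition: words strung together like beads.
flat = ["the", "big", "boy"]

# (b) Hierarchical structure: 'big boy' forms a constituent inside the phrase.
# A node is either a word (a string) or a (label, children) pair.
tree = ("Phrase", ["the", ("Subphrase", ["big", "boy"])])

def surface(node):
    """Read the surface word string off either representation."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    words = []
    for child in children:
        words.extend(surface(child))
    return words

# The two analyses are indistinguishable at the surface:
assert surface(tree) == flat == surface(("Phrase", flat))
print(" ".join(flat))  # -> 'the big boy'
```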
The ordering that homesigners are claimed to impose on their structures is mundane. First, they have no alternative but to put their symbols in some order. And since the main ingredients of any utterance are the thing being reported about and what happened to it, sentences tend to be organised in terms of topic and comment. The topic of a sentence (as opposed to the topic of a larger story) is the old information that is either being discussed or the information that the speaker assumes that the hearer will know. The comment is the new information about the topic. Very often, but not always, the topic is the same as the subject and the comment is the same as the predicate or verb phrase:
Figure 34: ‘The big boy’
‘John is a nice guy.’
Here the old information is ‘John’. The speaker offers nothing more than a name and assumes that the hearer knows who is being mentioned. The new information is ‘is a nice guy’. In other words, the speaker is saying something that he or she thinks might be new information for the hearer about ‘John’. A paraphrase might be something like, ‘I know you know John, but you might not know that he is a nice guy.’
The topic in most languages of the world precedes the comment. In other words, languages prefer to begin their sentences with shared or old information before giving new information. This order may aid our short-term memory. Within the comment, where the new information is placed, a large number of languages prefer to place the object before the verb. So if John is eating a piece of fruit, this can be described either as John fruit eats (the majority of languages) or John eats fruit (English and many other languages). In the Subject Object Verb group are languages like German, Japanese and Pirahã. Languages like French and English, on the other hand, use the order Subject Verb Object. In fact, English used to belong to the former group (it is closely related to German), but after the Norman invasion of 1066 English switched to the French order of Subject Verb Object.
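The contrast between these orders can be put schematically. The toy ‘linearizer’ below is purely illustrative – real languages are not built this way – but it shows how the same message surfaces differently under each conventional order.

```python
# A toy linearizer: the same message, surfaced under different word orders.
def linearize(subject: str, verb: str, obj: str, order: str = "SOV") -> str:
    slots = {"S": subject, "V": verb, "O": obj}
    return " ".join(slots[letter] for letter in order)

print(linearize("John", "eats", "fruit", "SVO"))  # English-style: 'John eats fruit'
print(linearize("John", "eats", "fruit", "SOV"))  # German/Japanese-style: 'John fruit eats'
print(linearize("John", "eats", "fruit", "OVS"))  # a rarer order: 'fruit eats John'
```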
It is occasionally claimed that word orders may be heavily influenced by different strategies for dealing with problems in the communicative process. One problem is noise corrupting the signal. As people talk they can be distracted by background noise, children tugging on them, animals in the distance and new people arriving during the conversation. Linguistic strategies must therefore be able to overcome the effect of ‘noise’ on the hearer’s ability to perceive the speaker’s words. These researchers claim that this is why Subject Object Verb order is most common – it helps the hearer avoid confusing the subject with the object, and the topic with the comment, since the phrase immediately before the verb belongs to the comment. On the other hand, since there are thousands of languages that use other orders – Subject Verb Object, Object Verb Subject, Object Subject Verb, Verb Subject Object and Verb Object Subject – no one of these orders is far superior to the others. The order that a language adopts results from cultural pressures that affect the history of a particular society. This is why English changed in its history from Subject Object Verb to Subject Verb Object, as a result of the influence of the French spoken by the Norman conquerors.
Homesigning follows largely the same principles. The basic problem that interlocutors must solve is keeping novel information distinct from already known information. Thus the fact that homesigners follow a common order is simply not a big deal. It is just the way that communication works most efficiently.
Nor should it be surprising that once a basic sequence of words becomes conventional, it is easier for speakers to remain consistent with that choice than to use different strategies in different parts of the language. Therefore, if your language chooses Subject + Object + Verb, it will, ceteris paribus, also choose to put possessors before nouns (as in ‘John’s book’ vs ‘book John’s’). This is because, just as the verb is the semantic core or nucleus of the sentence, the possessed noun is the nucleus of its own noun phrase. The general rule in such a language would be ‘put the nucleus last’. Yet because this decision is based only on efficiency, and because languages often do things inefficiently, this type of rule is frequently violated all over the world, and we observe a good deal of variation in word order principles across languages for this reason. Nor is there anything specific in the genetic endowment of Homo sapiens that would be expected to be linked to information structure. Topic-comment is a natural communicative arrangement. But many who talk about the implications of homesigns, supposedly revealing innate linguistic capacity, neglect to discuss the way that information should be ordered for most effective communication. Failing to consider this misses the most natural explanation for the facts about homesigns and appeals instead to the highly implausible idea that language is innate.
Homesigning clearly illustrates the desire of all members of sapiens to communicate. And it shows that solutions to this communication problem can be straightforward and easy to ‘invent’ and understand. Unfortunately for the claims of homesign researchers that language is innate, there is not even a convincing analysis of the facts of the grammar of these languages. Rather, the evidence suggests that gestures are sufficiently motivated by communicational needs that sign languages and gestures of all kinds simply emerge because they are so useful. Utility explains ubiquity.
* To say that this also means that we use gestures without learning them, as McNeill seems to suggest, is unwarranted.
† As stated, many researchers have speculated that gestures might have preceded speech in the evolution of human language. It is possible that gestures came first in some way, preceding language proper, though I would expect yells to have accompanied gestures from the first. Even McNeill does not disagree entirely with this position.
‡ More importantly, what marks the work of Goldin-Meadow and many others is what I consider to be an over-charitable interpretation of the linguistic aspects of the signs and a less charitable view of the cultural input the child receives as well as the nature of the task the child is facing. Absent a serious consideration of either the task or the input, such claims are severely weakened.