As is common in the more lively phases of scientific disciplines, the key concept of this book and of this chapter – embodied cognition – is not very well defined. Wilson (2002) has identified no fewer than six different meanings the concept has acquired in different writings, and this is arguably an underestimation. And yet, there certainly is a good deal of overlap across approaches and authors, especially with regard to the shortcomings in cognitive theorizing that the concept is meant to overcome. As Wilson (2002, p. 625) summarizes,
There is a growing commitment to the idea that the mind must be understood in the context of its relationship to a physical body that interacts with the world.… Hence human cognition, rather than being centralized, abstract, and sharply distinct from peripheral input and output modules, may instead have deep roots in sensorimotor processing.
What is much less clear, however, is what that might mean and to which timescale it refers. In other words, does that really require that any cognition must always be accompanied by sensorimotor activity (however that might be defined), that the ontogenetic emergence of cognition relies on sensorimotor activity, or that the phylogenetic roots of the architecture of human cognition reflect the human ability to perceive and act?
Commitments to one or another of these possibilities are more frequent than are straightforward justifications and solid empirical support, which is likely to create islands of research that are dogmatically shielded from the mainstream of cognitive research. If and to the degree that the embodied-cognition movement has really put a finger on something important, the mainstream would strongly benefit from taking the main message of the movement on board, and mutual ignorance would be unfortunate for both sides. As I will argue, it is neither necessary nor helpful to accept or even embrace the strategy of embodied-cognition approaches to consider embodied cognition and cognitivistic models as mutually exclusive alternatives. More useful seems to be a Marxian strategy to treat embodied cognition as a challenging, interesting antithesis that would need to be synthesized with (and not just added to) more traditional cognitive approaches to reach a new level of scientific insight. To make that more concrete, I will address key claims of embodied-cognition approaches and show how they can be integrated with a cognitivistic approach, if only some ideological overhead is dropped. Obviously, such an integration requires mutual agreement on the basic assumption that sensorimotor processing is important for human cognition – that cognition is embodied. This rules out models and accounts based on the traditional artificial intelligence (AI) assumption that cognitive units are necessarily symbolic in nature. Fortunately, however, traditional AI has had hardly any impact on most fields in cognitive psychology and the cognitive neurosciences (a fact that is commonly overlooked by anticognitivistic theorists). In fact, cognitive theories assuming that cognition is embodied were developed long before the embodied-cognition movement appeared on the scene.
In this chapter, I will focus on arguably the most comprehensive of those theories: the Theory of Event Coding (TEC; Hommel et al., 2001a), which is rooted in the highly cognitivistic ideomotor approaches of Lotze (1852), Harless (1861), and James (1890) and yet embraces the idea that human cognition emerges from sensorimotor processing. In the following, I will first discuss the basic assumptions underlying TEC with an emphasis on the embodiment of cognition. Then, I will go through all six of the major claims that Wilson (2002) has identified as cornerstones of the embodied-cognition movement and discuss whether and to what degree these claims are met by TEC.
Almost all major approaches in the cognitive sciences consider actions the consequence of stimuli, both in their analysis of human cognition (which starts with the presentation or the processing of a stimulus and ends with some higher-level cognitive process, decision-making, or action) and in their attribution of the ultimate cause of the resulting mental or overt action. This applies to the most frugal versions of behaviorism (e.g., Watson, 1913) just as well as to the most complex cognitivistic information-processing models (e.g., Neisser, 1967). A major exception to this rule is ideomotor theory. This approach has long-standing roots in philosophy (Stock & Stock, 2004) and was particularly popular in the beginnings of academic psychology – before American behaviorism took over and major figures of the movement made efforts to ridicule the approach (Thorndike, 1913). As the first versions of ideomotor theory were based on introspective insights rather than behavioral analysis (James, 1890), the theoretical approach has a strong first-person flavor to it. Accordingly, the agent under analysis is not considered a stimulus-driven being but a person carrying out actions in order to reach particular goals (which is more consistent with our inner, phenomenal view of our actions). Hence, the scientific analysis does not start with stimuli but with the current goal, which is assumed to trigger the execution of movements suited to reach it.
Given that we have no conscious access to our motor system (an assumption that ideomotor theorists share), how is it possible that goals activate just the right motor patterns? To account for the human ability to translate ideas (about wanted action effects) into motor acts, ideomotor theory assumes an automatic action-effect association/integration mechanism that picks up all the perceived consequences of our movements (the representations of action effects) and binds them to the currently active motor pattern – action-effect acquisition (Elsner & Hommel, 2001; Hommel & Elsner, 2009). The idea is that the resulting association between motor patterns and action-effect representations is bidirectional, so that activating one component of this pair tends to activate the other, a kind of spreading of activation. This provides the basis for voluntary action: The agent then only needs to “think of” (i.e., endogenously activate) the representation of a wanted action effect to activate the motor pattern needed to produce that effect. Numerous behavioral, developmental, and neuroscientific studies (for overviews, see Hommel, 2009; Shin, Proctor & Capaldi, 2010) have provided solid evidence that ideomotor mechanisms exist from the first year of life onward (Verschoor, Spapé, Biro & Hommel, 2013), that people do pick up action effects automatically (Elsner & Hommel, 2001), that they associate representations thereof with the corresponding motor patterns in a bidirectional fashion (Melcher et al., 2008), and that they endogenously activate action-effect representations before acting (Kühn et al., 2011).
The ideomotor mechanism has been built into the Theory of Event Coding (Hommel et al., 2001a; Hommel, 2009), which combines it with assumptions about how perception and action interact and how perceptual and action events are represented. Most essential for present purposes is the claim that perception and action are two concepts that refer to the same process. According to ideomotor logic, an action can be described as the goal-directed production of perceptual input (the action effect[s]) through motor activity. As has been pointed out by Dewey (1896), the same description applies to perception. There is in fact hardly any interesting input that an active agent is picking up that has not been actively produced by that agent. This is particularly obvious for touch: Bringing one’s sensors in contact with some surface does not produce any information regarding the texture of the surface, its rigidity, and other relevant features – apart from its mere presence. Rather, it is the systematic, goal-directed movement of one’s fingers across a surface, and the pressure exerted on it, that produce the sought-for information. The same holds for vision, as it is the agent who determines by means of body, head, and eye movements which light waves are hitting her retina. And similar scenarios can be developed for the other senses as well. Hence, what we call perception is the goal-directed production of perceptual input, even if these goals can sometimes be as vague as curiosity, wanting to find out what is going on. If so, perceiving and acting are basically the same kind of process.
To summarize, TEC claims that the units of cognition are sensorimotor in nature, as they link the codes of features of perceptual events to motor patterns that have generated (changes in) such features in the past. The set of potential goals is not assumed to come with the agent’s hardware but is assumed to emerge through the continuous pickup of self-produced events and of the means to produce them. By moving in this world, we learn how we can change it. And by distinguishing between what happens through us or without us we learn who we effectively are, which means that TEC provides the mechanisms creating our minimal self (Hommel, 2013; Hommel, Colzato & van den Wildenberg, 2009). Taken together, TEC can thus be considered a cognitivistic approach that not only assumes that, but also explains how and in which way, human cognition is embodied. In the following, I will discuss how this approach relates to both the cognitivistic approach that has been criticized and challenged by embodied-cognition approaches and the embodied-cognition approaches that have been presented as alternatives to the cognitivistic approach. To anticipate, these comparisons will reveal that some embodied-cognition approaches are not serious competitors when it comes to explaining about 90 percent of human cognition, while others insist on an ill-justified anticognitivistic attitude that stands in the way of theoretical developments. I will conclude that cognitivistic and embodied-cognition approaches are not necessarily incompatible and that indeed their integration would be most fruitful. Finally, I will argue that TEC provides an excellent basis for this integration.
A major motivation for developing the embodied-cognition idea was the failure to build truly flexible, intelligent robots (e.g., Brooks, 1999; Clark, 1997). The culprit responsible for this failure was considered to be the dominant artificial intelligence approach in cognitive robotics, which was based on the conviction that cognition consists in manipulating abstract, disembodied symbols for the purpose of creating models of the world. Embodying these symbols, or even getting rid of them to leave more room for online sensorimotor interactions, so the idea goes, could make robots smarter, faster, and much more flexible. A related implication would be that, if our body and the way it constrains our sensorimotor interactions with the world really affect our cognition, it would be unreasonable to believe that robots can show signs of human intelligence if they do not look like humans. In other words, only humanoid robots should be able to demonstrate human intelligence.
Whether this is true and whether less (cognition) can be more (of intelligence) in cognitive robotics seems to be an empirical issue. While some research groups still favor traditional artificial intelligence (AI) approaches, others have begun to rely more on online sensorimotor processing (Pfeifer & Bongard, 2006). The psychological community, if sufficiently interested, could simply wait and see who is producing the smarter robots, which would reveal the better approach. Moreover, the symbol-heavy AI preferred by many robotics researchers has had very little impact on most areas of cognitive psychology and the cognitive neurosciences, perhaps with the exception of reasoning and language studies. And yet, there is the theoretical concern that what holds for robots might also apply to humans. Even though it is difficult to see why that should be the case (as, if the embodied-cognition approach really holds, generalizing from machines to biological organisms with very different, continuously changing bodies should be unreasonable), various authors have used the popularity of embodied-cognition approaches to be skeptical about the usefulness of assuming cognitive codes and mechanisms (e.g., Brooks, 1991; Wilson & Golonka, 2013). This skepticism has a longer tradition dating back to behaviorism (Watson, 1913), ecological psychology (Gibson, 1979), and evolutionary psychology (Tooby & Cosmides, 2005), and indeed many of the ecological and evolutionary arguments and favorite findings have resurfaced in embodied-cognition approaches (an excellent example is Wilson & Golonka, 2013). According to proponents of the various strands of all these skeptical movements, it is a particularly pressing problem of cognitivistic approaches that they (a) assume the existence of mental representations that (b) are then taken to explain behavior.
Given that TEC assumes the existence of internal representations and that it claims that such representations are involved in producing actions, it is thus worthwhile to consider whether and how this is a problem that might undermine TEC’s contribution to understanding human cognition and its embodiment.
Figure 4.1
James’s (1890) neural model of acquiring ideomotor control.
Source: Taken from James (1890, p. 582).
Figure 4.1 shows how William James thought of the ideomotor mechanism. The idea is that the acquisition of voluntary action is preceded by motor babbling, as it is sometimes called, which would consist in the random firing of motor neurons, here referred to as M. Given the hardware of the biological agent, the activation of some motor neurons would activate particular muscles, which again would activate receptors that are sensitive to the changes in the body, the environment, and the body-environment relationship that the muscle movement would bring about. In the figure, one of the neurons picking up these action effects is K, which stands for a neuron sensitive to kinesthetic information, but there will also be other sensory neurons picking up other (e.g., visual) aspects of action effects, such as S. This allows the agent to register the sensory consequences of her own actions. The main assumption that ideomotor theory makes is that the overlap of firing of motor neurons and sensory neurons (e.g., of M and K in the example) creates a bidirectional association between these neurons – an example of Hebbian learning. If so, the motor neuron can be activated by activating its sensory counterparts. As sensory neurons can be activated endogenously (e.g., by actively imagining an event coded by these neurons), this provides the agent with the possibility of carrying out voluntary actions by just “thinking of” the wanted consequences. Indeed, asking a person to carry out an action that is producing particular consequences has been shown to induce the activation of the neural codes of these consequences quite some time before execution begins (Kühn et al., 2011).
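The acquisition mechanism just described can be illustrated with a deliberately minimal sketch. It is not James's model or an implementation of TEC; it merely assumes a toy agent with a handful of motor units, a fixed and hypothetical mapping from each motor unit to the sensory unit its movement excites, and a simple Hebbian rule that strengthens the link between co-active units. All names (`effect_of`, `learn_action_effects`, `select_action`, `anticipate_effect`) are invented for illustration.

```python
import random


def learn_action_effects(effect_of, n_trials=200, rate=0.1, seed=0):
    """Motor babbling with Hebbian strengthening of co-active units.

    effect_of[m] is the (assumed, fixed) sensory unit excited when
    motor unit m fires; it stands in for body and environment.
    """
    rng = random.Random(seed)
    n_motor = len(effect_of)
    n_sensory = max(effect_of) + 1
    # weights[m][s] is a single association that can be read in
    # either direction, which makes it bidirectional.
    weights = [[0.0] * n_sensory for _ in range(n_motor)]
    for _ in range(n_trials):
        m = rng.randrange(n_motor)   # random firing of a motor unit
        s = effect_of[m]             # sensory unit driven by the effect
        weights[m][s] += rate        # co-activation strengthens the link
    return weights


def select_action(weights, wanted_effect):
    """'Think of' an effect: return the most strongly linked motor unit."""
    column = [row[wanted_effect] for row in weights]
    return max(range(len(column)), key=column.__getitem__)


def anticipate_effect(weights, motor_unit):
    """Read the same association the other way: motor unit to effect."""
    row = weights[motor_unit]
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical toy world: motor unit 0 produces effect 2, and so on.
effect_of = [2, 0, 1]
weights = learn_action_effects(effect_of)
```

After babbling, activating the code of a wanted effect is enough to retrieve the motor unit that produced it: in this toy world, `select_action(weights, 2)` returns motor unit 0, and `anticipate_effect(weights, 0)` returns effect 2. Storing one weight per pair and reading it in both directions is the simplest way to capture the bidirectionality assumption; nothing here is meant to suggest how such associations are actually implemented neurally.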
On the one hand, it is clear that this approach assumes that external events are represented in the human brain/mind, in the sense that there are internal units that become active whenever the agent is facing the respective external event. There is too much neuroscientific evidence to doubt that such units exist for many sensory features: visual feature maps code for color, shape, orientation, and motion, up to faces and houses; auditory feature maps code for pitch and intensity; and so forth (e.g., Knierim & Van Essen, 1992). Under suitable conditions (including sufficient attention, stimulus intensity, etc.) the presence of the external event will unavoidably reactivate the respective neural unit, and this will predict the phenomenal experience and the reaction of the agent. Moreover, activating the unit by other means, such as willful imagination or transcranial magnetic stimulation (e.g., Cattaneo et al., 2009), will have consequences very similar to those of actually perceiving the event. It is difficult to see why it would be wrong or incorrect to call such a unit a representation, much like a thermometer represents the temperature it is exposed to. So, on the other hand, there does not seem to be any reason why assuming such a rather simple mechanism should be too opaque an idea to account for aspects of human cognition – especially if it has received considerable empirical support.
The ideomotor approach is also guilty with respect to the second objection from anticognitivists: It explains behavior by referring to the activation of representations. For instance, once an agent has acquired a bidirectional association between, say, pressing the “q” key on the keyboard on the one hand and having a kinesthetic experience of the keypress and sensing the letter “q” on a nearby screen on the other, she is assumed to activate the former by activating the sensory representation of the latter. Hence, in the terminology of James, the intentional activation of M will be preceded and causally produced by the activation of K (or V in the visual case). Obviously, one may want to extend the causal chain to even earlier relay stations and know, for example, why and for which purpose the agent decided to press this particular key on this particular location. As neither TEC nor classical ideomotor theory addresses motivational issues, an answer would fall outside their scope. And yet, the part of the causal chain that the ideomotor approach does address rests on the assumption that internal activations that have been correlated with the presence of external events can be used to activate motor patterns. As pointed out above, there is ample evidence that agents do anticipate the outcomes of their actions before executing them, which is consistent with the ideomotor assumption, even though one still would like to see more evidence in favor of the assumed causality between anticipation and action. But, apart from this empirical issue, it is difficult to see why it would be wrong and misguided to assume that the activation of some internal codes can lead to the activation of other internal codes and that this eventually leads to overt behavior.
One of the six claims that Wilson (2002) considers the cornerstones of the embodied-cognition approach is that cognitive activity is always situated, that is, always takes place in a particular context. This claim relates to a broader philosophical/pedagogical approach that assumes that knowing is inseparable from doing and therefore recommends learning-by-doing rather than the passive accumulation of knowledge (e.g., Greeno, 1998). Such an approach seems perfectly compatible with cognitivistic approaches that assume the existence of internal representations and an important role of action in their emergence. All it does is emphasize that the acquisition of these internal representations presupposes active agency and the experience of interactions with one’s environment – exactly as proposed by ideomotor theory and TEC.
In the context of cognitive robotics, the concept of situated cognition has assumed a different meaning, however (e.g., Clancey, 1997). It is often used to refer to the possibility that the situation an agent is facing provides quite a bit of information that the agent is therefore not required to store and retrieve; the agent can simply pick it up from her environment. Obviously, this blend borrows from Gibsonian ecological psychology and the assumption that environments provide affordances for the active perceiver, which can be used for perception and action control (Gibson, 1979; Michaels & Carello, 1981). And there is indeed strong evidence supporting that assumption. For instance, Milner and Goodale (1995) have collected behavioral, neurological, and neuroscientific evidence for the existence of two different visual processing streams in humans and other primates. Even though some aspects of these authors’ conclusions have been criticized (e.g., Glover, 2004) and led to a reformulation (Milner & Goodale, 2006), most researchers agree that there is a ventral processing stream devoted to object identification, planning, and other sorts of off-line processing as well as a dorsal processing stream (or even two dorsal streams; Binkofski & Buxbaum, 2013) supporting online sensorimotor activities.
Even though the two processing streams are likely to interact to some degree, there is a consensus that the dorsal stream does not rely on memory and other sorts of long-term internal representations but rather keeps feeding fresh and continuously updated environmental information into the system to support and steer overt action. It is easy to see that this kind of online system meets all the criteria that situated-cognition and ecological-psychology proponents have formulated for the control of overt behavior (Michaels, 2000). It is also clear that cognitivistic approaches such as TEC do not have anything to contribute to our insight into the operation of this online system (as acknowledged by Hommel et al., 2001a, 2001b).
At the same time, however, it is also clear that the dorsal pathway alone is insufficient to generate goal-directed, planned behavior as we know it from human agents. In fact, most of our everyday actions rely on previously acquired knowledge about how to use the available tools to reach our goals – just think of using a computer, a coffee machine, a car, a mobile phone, or engaging in verbal communication and socially appropriate behavior. These kinds of actions often require planning ahead, which requires cognitive activities in the absence of situational cues. For the control of such activities, humans rely on their ventral processing stream and a cognitive system that is able to store, retrieve, and flexibly use off-line information. While ecological and situated approaches do not and cannot account for such activities by definition, these activities are the very target of cognitive approaches like ideomotor theory and TEC. As explained elsewhere in more detail (Hommel, 2010, 2013), ideomotor action control is likely to define relatively abstract (but not necessarily symbolic) intended action effects, which then retrieve action schemata that are sufficiently specific to guarantee that the intended effects will be obtained but in need of online information before execution.
The emerging picture is thus that cognitivistic approaches tend to emphasize knowledge-dependent off-line processes while ecological and situated approaches emphasize environmentally driven online processes, which both need to be integrated to allow action to be goal-directed and context-sensitive at the same time. Accordingly, it makes little sense to put these approaches into opposition as dropping one at the expense of the other would not allow for the comprehensive understanding of human cognition and intelligent behavior.
The second of Wilson’s six cornerstones refers to the assumed fact that cognition is often under time pressure. The idea is that engaging in cognitive activities and thorough reasoning is particularly time costly and therefore unlikely to be the basis of everyday action. This idea has been used in the context of cognitive robotics to suggest dropping cognitive overhead to allow robots to meet real-time constraints (e.g., Pfeifer & Scheier, 1999). But it can also be found in the literature on reasoning, where Gigerenzer and colleagues (1999) have used it to support their claim that people often employ cognitive heuristics and shortcuts rather than full cognitive analyses of a problem. Along similar lines, Damasio (1994) has suggested that people often tag their actions with markers of their affective consequences (emotional action effects, so to speak), which allow them to take the action that “feels best” rather than the one that is the most appropriate if time pressure is high.
One can question whether time pressure is a real problem for humans, for two reasons. First, there are very few situations in which we face inescapable time pressure that would not allow us to wait or ask for more time to allow for a fuller cognitive analysis. Even if we go back to the point in time when phylogeny might have established time-saving mechanisms, it is difficult to imagine that unavoidable time pressure was a frequent experience. Second, even for the few situations where time might have been an issue, such as when facing a predator or enemy, nature has equipped us with reflexes that allow us to engage in fight or flight long before we cognitively grasp the situational demands. Hence, slow cognition and fast reflexes seem to be sufficient for humans to survive.
Moreover, it is interesting to see that even the researchers who agree that time pressure may sometimes be a problem strongly disagree with respect to the solution nature might have equipped us with. Whereas Gigerenzer and colleagues propose cognitive shortcuts, which are “fast and frugal” but nevertheless cognitive in nature, and Damasio assumes affective representations play a role, cognitive roboticists take the same problem to argue against any contributions from cognitive mechanisms. What is more, the anticognitivistic attitude underlying this argument overlooks that one of the major advantages of having developed cognitive architectures of the sort that cognitivists assume is to allow for anticipatory preparation. In contrast to the implications of the situated-cognition approach, every voluntary action is preceded by a multitude of preparatory activities that partly rely on the situation and partly on memory, including increasing the general level of alertness (Kornhuber & Deecke, 1990), the focusing on spatial goal locations (Schneider & Deubel, 2002), the preparation for the processing of expected action effects (Kühn et al., 2011), and the preactivation of required effectors (Leuthold, Sommer & Ulrich, 2004). It is this preparation that ideomotor theory and TEC are trying to understand. Once an action has been sufficiently prepared, there does not seem to be any relevant cognitive activity involved in online control, as for instance visible in rapid online adjustments to unconscious goal changes (Prablanc & Pélisson, 1990). The more complex and extended actions become, the more cognitive processes engage in preparing the system to such a degree that environmental information is sufficient to drive the action to completion – a kind of prepared reflex (Hommel, 2000).
All this means that cognition is not for online control but for off-line preparation. If so, using the assumption that actions need to be fast to downplay the role of cognition is simply off target; in fact, it is the presence, not the absence, of off-line cognition that allows online action to be fast.
The third claim discussed by Wilson refers to the assumption that the environment can serve as its own memory (e.g., Brooks, 1991). The idea is that the availability of environmental information may often make internal world models superfluous, which implies that the assumption that such world models exist may not be necessary. As pointed out in Wilson’s (2002) review, evidence supporting the claim that the world serves as its own model comes exclusively from spatial tasks, and it indeed makes sense that spatial decisions consider the available spatial information.
And yet, there are two reasons why the offloading argument is not particularly strong. For one, hardly any available cognitive action-control approach assumes that people create complete models of their environment; most approaches do not even assume the existence of any model. In particular, neither historical ideomotor theorizing nor TEC claims the existence of anything like a situational or world model, which raises the possibility that the offload argument is aimed at some undefined strawman. For another, the offload argument suffers from the same limitations as the Gibsonian affordance approach: It is easy to see how a chair can afford sitting and thus provide information for controlling a sitting action, and the same holds for the affordance of grasping and similar actions. And yet, it is difficult to see how our environment can provide sufficiently constraining information for the control of the remaining 90 percent or so of everyday actions. Hence, there is little doubt that offloading cognitive work is a smart strategy, but it would be far-fetched to believe that it is sufficiently efficient to get rid of cognitive processes and cognitive representations.
The fourth claim considered by Wilson is that human cognition is not restricted to the mind and brain of an individual but that it involves the environment as well (e.g., Wilson & Golonka, 2013). Again, this is another renaissance of an actually much older theme, which, for instance, has also motivated interactionist approaches in personality psychology (e.g., Mischel, 1968). As pointed out by Wilson (2002), the claim actually consists of two parts: (1) that including the environment in analyses of human cognition provides more information than excluding it and (2) that excluding the environment does not allow for any interesting insight into human cognition in principle. It is easy to agree with the first part, as including more factors logically must increase the probability of finding more information, especially if the respective factor is defined as vaguely as in distributed-cognition approaches: Is it the immediate, perceivable environment; the social environment; the past environment; the envisioned environment; a virtual environment; and/or all environmental information obtained so far? The second part is more difficult to deal with, and I can imagine at least two different kinds of reply.
The first is metatheoretical in nature. The success of science relies on its ability to isolate phenomena and analyze phenomena in isolation, which always has something arbitrary and artificial to it; any attempt to analytically cut nature into pieces runs into the danger of overlooking important connections. For instance, many phenomena analyzed by sociologists (e.g., revolutions) include individual minds and brains but nevertheless cannot be comprehensively understood by looking into these minds and brains individually. Other phenomena (e.g., racial bias) also have sociological aspects but might be much easier to understand based on individual minds and brains. This is why researchers investigate the same phenomena from different angles, by using different methodologies, and by using different foci. Even though this will be likely to lead to different observations, it does not do justice to the history of science to find some of these observations more meaningful (or meaningless) than others by definition, as proponents of distributed cognition do (e.g., Wilson & Golonka, 2013). The scientific community determines the success of a theoretical approach by considering the number of predictions it generates and the number of times these predictions have been successfully tested. From this perspective, it is worrying that there are hardly any straightforward predictions that the distributed-cognition approach seems to offer and almost no empirical evidence supporting them. In fact, almost all of the few empirical observations that proponents of the distributed-cognition view have discussed so far are taken from studies that were conducted for other reasons than testing predictions motivated by the distributed-cognition hypothesis (e.g., Wilson & Golonka, 2013). This shows that even the few observations that distributed-cognition proponents find relevant did not require the distributed-cognition approach to make them.
The second possible response to the distributed-cognition challenge is more empirical in nature. The example of Milner and Goodale’s (1995) account of perception and action control, and of several models that followed (e.g., Glover, 2004), show that systematic neurocognitive research has provided evidence that humans and other primates integrate the results of off-line cognitive processing and online processing of environmental information (and how integration works), to generate visual experiences and control manual actions. Ideomotor theory and TEC focus more on the former than the latter (Hommel et al., 2001b), but the off-line architecture described by these approaches can be easily combined with the available knowledge about how environmental online information is fed into action control (Hommel, 2010). Hence, there are numerous, cognitivistic/neurocognitive approaches that provide strong empirical evidence that brain, body, and environment interact (and how they interact) to create intelligent behavior. Whether one may want to consider this evidence demonstrating distributed cognition may be a matter of semantic taste (as the term strangely implies that “the environment” would have the ability to perceive and recognize), but in any case the ignorance of environmental information on the cognitivistic side is much less pronounced than distributed-cognition proponents seem to suspect.
The fifth claim considered by Wilson is that human cognition evolved to support action. In a phylogenetic sense, this claim must be true, at least according to Darwin’s theory of evolution. It may very well be that interesting insights and a large memory store have decreased the probability of lethal encounters faced by our ancestors (and they are likely to be still useful for that purpose), but the ultimate selection pressure was on overt behavior – on action, that is. Accordingly, the development of a (rather heavy and energy-hungry) cognitive apparatus must have been ultimately driven by the possible improvements in the efficiency, accuracy, and speed of action. Like the other claims discussed in this chapter, this insight is not unique to the embodied-cognition approach. A dominant role of action in interacting with and even perceiving the world has been assumed by pragmatists (e.g., Dewey, 1896); behaviorists (e.g., Skinner, 1938); activity theorists (e.g., Vygotsky, 1962); motor theorists of speech perception (e.g., Liberman et al., 1967); attention theorists (e.g., Allport, 1987); and scientists interested in mirror neurons (e.g., Rizzolatti & Craighero, 2004), and this assumption represents the very core of TEC as well (Hommel et al., 2001a).
The considerable amount of research generated by the different versions of this claim demonstrates its strong heuristic force, but the question is what it implies for our understanding of human cognition. Phylogenetic arguments are likely to be relevant for analyses of both the neural and the functional architecture of cognition, which among other things has led to the dissociation of dorsal and ventral information-processing streams (as they differ in phylogenetic age). They also have motivated novel, empirically successful hypotheses about privileged connections between perception and action and about the impact of action on perception and attention (Hommel, 2010; Schütz-Bosbach & Prinz, 2007). However, phylogenetic arguments do not rule out the possibility that, once particular mechanisms were acquired, their owners have used them for other purposes as well. The same is true for an ontogenetic perspective: even if infants and children can be assumed to acquire cognitive skill and content through sensorimotor experience (an assumption that TEC decidedly shares), this does not mean that every single use of cognitive skill or content is accompanied by sensorimotor activity.
But this is what at least some proponents of embodied cognition seem to assume. For instance, Barsalou (2008) claims that the perception of events requires the reactivation of sensory experience (perceptual simulation), while Gallese and Goldman (1998) claim that understanding observed actions relies on motor simulation of those actions. On the one hand, the idea that perceptual and action-related experience is stored in terms of sensorimotor codes rather than as abstract symbols certainly explains why evidence for such kinds of simulation could be demonstrated in several studies (for an overview, see Barsalou, 2008). On the other hand, however, even the most sensorimotor-devoted interpretation of embodied cognition does not require the existence of such simulations. Let me discuss two examples to justify this conclusion.
One relates to the Stroop effect, that is, the observation that people have a hard time naming the color of words if their meaning refers to a different color (e.g., if the word “red” is written in green ink; Stroop, 1935). This observation is commonly taken to imply that word processing is automatic and occurs even if we actually do not want to read. But, if that were the case, we would be unable to face any word in our environment without at least implicitly naming it – which should hardly allow us to produce coherent speech, say, in a library or near a newsstand. The fact that we do not suffer from problems of that sort suggests that it is the intention to name colors, rather than some fully ballistic word-reading reflex, that makes our cognitive system vulnerable to color words and thereby produces the Stroop effect. Indeed, there is evidence that adopting a particular intention primes those feature dimensions that are likely to be task relevant, such as spatial location if one intends to point or shape if one intends to grasp an object – a preparatory process that TEC refers to as intentional weighting (see Memelink & Hommel, 2013). This means that people do not always activate all the knowledge they have about a particular event, and they do not (fully) process all the features of that event that the current environment provides. Indeed, neuroimaging findings on expertise have shown that increasing expertise on a particular topic is associated with a decrease in neural activity (and in the size of the activated area) when using this expertise for judgments (e.g., Petrini et al., 2011). Hence, it is unlikely that people activate all their sensory representations related to a particular event in every task; they rather seem to tailor the amount of activation to the task demands. If so, it may very well be that some perceptual events can be successfully processed without any evidence for sensory or action-related simulation.
The other example refers to the acquisition of stimulus categories. In nonsymbolic approaches, this acquisition is often modeled as the detection of systematic relationships between features or feature configurations. In parallel distributed processing models (Rumelhart, McClelland, & the PDP Research Group, 1986) and other network models (Bishop, 1995), this comes down to the discovery of which feature values and (if hidden units are permitted) which relationships between these values produce behavior (e.g., categorization decisions) that receives reward. Technically, this leads to a distinction between units representing the sensory features proper (which stand for feature codes as found in the primate visual cortex); units representing relationships between feature values (hidden units, roughly standing for configurational codes as found in the inferior temporal lobe); and the resulting “category” units (as found in the primate frontal cortex). All these units can be referred to as representations, as they stand for external events but can also be endogenously activated (e.g., through imagination). And yet, all these units are rigorously grounded in environmental information extracted through sensorimotor experience. Once they are sufficiently tuned by this experience, however, there is no reason why they should be acting in unison. That is, activating a category unit (i.e., those neurons that represent feature relations specific for a particular perceptual category) might prime the associated sensory units (the simulation process Barsalou, 2008, has in mind) under some conditions but not under other conditions. Hence, referring to and using sensorimotorically derived stimulus categories does not necessarily require sensory simulation.
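The layered architecture just described can be made more concrete with a minimal Python sketch. It is not a model taken from the PDP literature or from TEC: the three feature names, the pairwise hidden units, the delta-rule update, the toy category, and the `top_down_gain` parameter are all illustrative assumptions. The sketch is meant to show only the logical point of the argument, namely that bottom-up categorization can succeed whether or not the category unit feeds activation back to the sensory units.

```python
from itertools import combinations, product

# Hypothetical sensory feature units (illustrative names, not from the chapter).
FEATURES = ["red", "round", "small"]
PAIRS = list(combinations(range(len(FEATURES)), 2))  # one hidden unit per feature pair


def hidden_activations(features):
    """Configurational (hidden) units: each is active only if both of its features are."""
    return [features[i] * features[j] for i, j in PAIRS]


def categorize(features, weights, threshold=0.5):
    """Bottom-up pass: feature units -> hidden units -> category unit (1 = member)."""
    net = sum(w * h for w, h in zip(weights, hidden_activations(features)))
    return 1 if net > threshold else 0


def train(examples, epochs=10, rate=0.5):
    """Delta-rule learning: weights change only when a decision is not rewarded."""
    weights = [0.0] * len(PAIRS)
    for _ in range(epochs):
        for features, label in examples:
            error = label - categorize(features, weights)
            weights = [w + rate * error * h
                       for w, h in zip(weights, hidden_activations(features))]
    return weights


def prime_features(weights, top_down_gain):
    """Optional top-down pass: an active category unit primes its feature units."""
    primed = [0.0] * len(FEATURES)
    for w, (i, j) in zip(weights, PAIRS):
        primed[i] += top_down_gain * w
        primed[j] += top_down_gain * w
    return primed


# Assumed toy category: an object is a member if it is both red and round.
examples = [(f, int(f[0] == 1 and f[1] == 1)) for f in product([0, 1], repeat=3)]
weights = train(examples)

# Categorization succeeds on all examples without any top-down priming ...
print(all(categorize(f, weights) == label for f, label in examples))
# ... and sensory "simulation" occurs only when the gain is above zero.
print(prime_features(weights, top_down_gain=0.0))
print(prime_features(weights, top_down_gain=1.0))
```

In this sketch the gain is a free parameter, which is one way of expressing that priming of sensory units may occur under some conditions but not under others; how such a gain would be set by task demands is left open here.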
TEC does not explicitly deny the existence of symbolic information, but it does not require such information either. Instead, it assumes that humans are able to register features of (self- or other-produced) events and to integrate the codes of these features into event files (Hommel, 2004). These event files are representations in the sense that they stand for events that do not need to be present to activate their files – an assumption that is necessary for any approach that is willing to deal with the human ability to imagine past and future events and to plan actions long before they can be executed. All representations TEC is claiming to exist are assumed to be acquired through sensorimotor experience and, thus, fully grounded. But TEC differs from other models of grounding or embodied cognition by denying that the grounding activity has to be repeated every time the representation is accessed and used for information processing. Likewise, it differs from mirror-neuron-inspired approaches (e.g., Gallese & Goldman, 1998) by denying that the understanding of other people’s actions necessarily requires the activation of one’s own motor system. Obviously, the integration of sensory feature codes and motor codes into event files makes it possible, and perhaps often likely, that observing another person’s action does spread activation from sensory to motor codes, but there is no reason to believe that this is necessary for understanding what the action might mean. In other words, TEC considers grounding activities relevant for the acquisition of new information but not for the later use of it. 
Moreover, TEC assumes that representations can be more complex than codes representing single features (which makes these representations abstract without making them symbolic), and it claims that the components making up these more complex codes are weighted according to task relevance and intentions (Memelink & Hommel, 2013) – so that not all components need to be involved in representing a particular event all the time.
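The weighting idea can be illustrated with a short Python sketch. It is a toy illustration under stated assumptions rather than an implementation of TEC: the dimension names, the two intentions, the numerical weights, and the function `weighted_activation` are all hypothetical and chosen only to show that the same event can be represented with different components emphasized.

```python
# Hypothetical dimension weights per intention; the numbers are arbitrary.
INTENTION_WEIGHTS = {
    "point": {"location": 1.0, "shape": 0.2, "color": 0.2},
    "grasp": {"location": 0.6, "shape": 1.0, "color": 0.2},
}


def weighted_activation(feature_input, intention):
    """Scale the input to each feature dimension by its task-dependent weight."""
    weights = INTENTION_WEIGHTS[intention]
    return {dim: strength * weights.get(dim, 0.0)
            for dim, strength in feature_input.items()}


# The same event (equal bottom-up input on every dimension) under two intentions.
event = {"location": 1.0, "shape": 1.0, "color": 1.0}
print(weighted_activation(event, "point"))
print(weighted_activation(event, "grasp"))
```

Under the assumed weights, shape codes contribute more when the intention is to grasp than when it is to point, while color codes contribute little in either case.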
The sixth claim Wilson considers refers to the idea that cognitive structures or skills that emerged through and for sensorimotor interactions with the environment could be used off-line – in the absence of overt behavior – to support cognitive activities (e.g., Glenberg, 1997). As with the other claims, this claim also has a rather long history. In particular, Vygotsky (1962), Luria (1962), and Piaget (1977) have conceived of cognition as interiorized action – an idea according to which cognitive skills and procedures are simulations of what formerly had been the overt behavior of oneself or of others. A well-investigated application of this idea refers to self-regulation, which according to Vygotsky and Luria emerges from verbal self-instruction, which again develops by internalizing previous instructions from one’s social environment. Even though there is still no systematic theory that explains how such interiorization processes might work, Wilson’s (2002) review shows that there is very substantial empirical evidence that many cognitive tasks involve or at least benefit from mental action simulations. Additional evidence comes from research on the spatial allocation of attention, which seems to be controlled by simulating eye movements (i.e., programming eye movements without necessarily carrying them out; Schneider & Deubel, 2002), and from studies on task switching, which show that subvocal self-instruction speeds the implementation of a new mental set (Emerson & Miyake, 2003).
Even though TEC also fails to provide a systematic scenario of how interiorization works in detail, it does provide the necessary cognitive infrastructure. As pointed out above, the ideomotor principle is assumed to bind sensory codes representing external events with the motoric means to produce these events. Accordingly, the agent simply needs to specify the features of the wanted event, which activates the necessary motor codes. The creation of sensorimotor event files permits the construction of representations of event sequences, that is, of the kind of syntactic structure that organizes event files in time (for a computational model of this process, see Kachergis et al., 2014). The representation of a transition from one event to another can thus be considered to show how one can move from one situation to another. Under a situated condition, such representations can control overt action (as in coffee making, the example modelled by Kachergis et al., 2014), but nothing bars using these representations for mental action as well. In other words, event files, and relations between them, can guide both overt and covert problem-solving strategies.
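A minimal Python sketch may help to make this infrastructure concrete. It is not the computational model of Kachergis et al. and not a specification of TEC; the class `EventFile`, the functions `select_action` and `run_plan`, the overlap-based selection rule, and the coffee-making feature names are hypothetical choices made for illustration. The sketch shows an agent specifying the features of a wanted event, retrieving the motor code bound to those features, and stepping through the same sequence of event files either overtly or covertly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventFile:
    """Binding of a motor code with the feature codes of the event it produces."""
    motor_code: str
    effect_features: frozenset


def select_action(event_files, wanted_features):
    """Ideomotor selection: specify the wanted event, retrieve the bound motor code."""
    wanted = frozenset(wanted_features)
    best = max(event_files, key=lambda f: len(f.effect_features & wanted))
    return best if best.effect_features & wanted else None


def run_plan(event_files, plan, overt):
    """Step through a sequence of wanted events, overtly or as mental simulation."""
    mode = "execute" if overt else "simulate"
    trace = []
    for wanted in plan:
        event_file = select_action(event_files, wanted)
        if event_file is not None:
            trace.append((mode, event_file.motor_code))
    return trace


# Hypothetical event files for a coffee-making routine (illustrative only).
files = [
    EventFile("grasp_kettle", frozenset({"kettle_in_hand"})),
    EventFile("pour", frozenset({"water_in_cup", "cup_full"})),
    EventFile("stir", frozenset({"coffee_mixed"})),
]
plan = [{"kettle_in_hand"}, {"water_in_cup"}, {"coffee_mixed"}]

print(run_plan(files, plan, overt=True))
print(run_plan(files, plan, overt=False))
```

The only difference between the two runs is the mode label: the same event files and the same ordering serve overt execution and covert simulation alike, which is the sense in which the representational infrastructure is shared.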
As I have tried to show, the possible points of contact between the embodied-cognition movement and post-AI cognitivistic/neurocognitive approaches to human cognition are more frequent and richer than anticognitivist proponents of embodied cognition suggest. None of Wilson’s (2002) six claims of the embodied-cognition movement is theoretically incommensurable with cognitive and neurocognitive approaches, especially if some of the ideological and empirically unfounded overhead is dropped. As I have demonstrated for TEC, there is no reason to believe that cognitive models are unable to address embodied cognition in principle; rather, there are good reasons to believe that they may often do so in a more systematic, more mechanistic, and empirically better-supported way than quite a number of the available embodied-cognition projects.
More integration between cognitive and embodied-cognition approaches has various advantages. The advantage for the embodied-cognition movement is that cognitive models can provide more concrete mechanisms and theoretical scenarios than present embodied-cognition theorizing does to explain how cognition can be embodied and how this embodiment affects human cognition. The advantage for the cognitive side is that the embodied-cognition movement has helped to clarify the limitations of symbolic computation and has re-emphasized (after Gibsonian ecological psychology) the contribution of environmental information to perception and action control. I have tried to show that cognitivism and embodied-cognition arguments need to be integrated to create truly comprehensive models of human cognition. I do not think we are there yet, but we are on the right track.
The preparation of this work was supported by the European Commission (EU Cognitive Systems project ROBOHOW.COG; FP7-ICT-2011).
1 This is an extended version of the opinion paper “The theory of event coding (TEC) as embodied-cognition framework”, published in Frontiers in Cognition, 6:1318.
Allport, A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer & A. F. Sanders (Eds.), Perspectives on perception and action (pp. 395–419). Hillsdale, NJ: Lawrence Erlbaum Associates.
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.
Binkofski, F., & Buxbaum, L. J. (2013). Two action systems in the human brain. Brain & Language, 127, 222–229.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47, 139–159.
Brooks, R. A. (1999). Cambrian intelligence: The early history of the new AI. Cambridge, MA: MIT Press.
Cattaneo, Z., Vecchi, T., Pascual-Leone, A., & Silvanto, J. (2009). Contrasting early visual cortical activation states causally involved in visual imagery and short-term memory. European Journal of Neuroscience, 30, 1393–1400.
Clancey, W. J. (1997). Situated cognition: On human knowledge and computer representation. New York: Cambridge University Press.
Clark, A. (1997). Being there: Putting brain, body and world together again. Cambridge, MA: MIT Press.
Damasio, A. R. (1994). Descartes’ error: Emotion, reason and the human brain. New York: Putnam.
Dewey, J. (1896). The reflex arc concept in psychology. Psychological Review, 3, 357–370.
Elsner, B., & Hommel, B. (2001). Effect anticipation and action control. Journal of Experimental Psychology: Human Perception and Performance, 27, 229–240.
Emerson, M. J., & Miyake, A. (2003). The role of inner speech in task switching: A dual-task investigation. Journal of Memory and Language, 48, 148–168.
Gallese, V., & Goldman, A. I. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2, 493–501.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Gigerenzer, G., Todd, P. M., & the ABC Research Group (1999). Simple heuristics that make us smart. New York: Oxford University Press.
Glenberg, A. M. (1997). What memory is for. Behavioral and Brain Sciences, 20, 1–55.
Glover, S. (2004). Separate visual representations in the planning and control of action. Behavioral and Brain Sciences, 27, 3–24.
Greeno, J. G. (1998). The situativity of knowing, learning, and research. American Psychologist, 53, 5–26.
Harless, E. (1861). Der Apparat des Willens. Zeitschrift fuer Philosophie und philosophische Kritik, 38, 50–73.
Hommel, B. (2000). The prepared reflex: Automaticity and control in stimulus-response translation. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: Attention and performance (Vol. XVIII, pp. 247–273). Cambridge, MA: MIT Press.
Hommel, B. (2004). Event files: Feature binding in and across perception and action. Trends in Cognitive Sciences, 8, 494–500.
Hommel, B. (2009). Action control according to TEC (theory of event coding). Psychological Research, 73, 512–526.
Hommel, B. (2010). Grounding attention in action control: The intentional control of selection. In B. J. Bruya (Ed.), Effortless attention: A new perspective in the cognitive science of attention and action (pp. 121–140). Cambridge, MA: MIT Press.
Hommel, B. (2013). Ideomotor action control: On the perceptual grounding of voluntary actions and agents. In W. Prinz, M. Beisert, & A. Herwig (Eds.), Action science: Foundations of an emerging discipline (pp. 113–136). Cambridge, MA: MIT Press.
Hommel, B., Colzato, L. S., & van den Wildenberg, W.P.M. (2009). How social are task representations? Psychological Science, 20, 794–798.
Hommel, B., & Elsner, B. (2009). Acquisition, representation, and control of action. In E. Morsella, J. A. Bargh, & P. M. Gollwitzer (Eds.), Oxford handbook of human action (pp. 371–398). New York: Oxford University Press.
Hommel, B., Müsseler, J., Aschersleben, G., & Prinz, W. (2001a). The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences, 24, 849–937.
Hommel, B., Müsseler, J., Aschersleben, G., & Prinz, W. (2001b). Codes and their vicissitudes. Behavioral and Brain Sciences, 24, 910–937.
James, W. (1890). The principles of psychology. Vol. 2. New York: Dover Publications.
Kachergis, G., Wyatte, D., O’Reilly, R. C., de Kleijn, R., & Hommel, B. (2014). A continuous time neural model for sequential action. Philosophical Transactions of the Royal Society B, 369, 20130623.
Knierim, J. J., & Van Essen, D. C. (1992). Visual cortex: Cartography, connections, and concurrent processing. Current Opinion in Neurobiology, 2, 150–155.
Kornhuber, H. H., & Deecke, L. (1990). Readiness for movement: The Bereitschaftspotential-Story. Current Contents Life Sciences, 33, 14.
Kühn, S., Keizer, A., Rombouts, S.A.R.B., & Hommel, B. (2011). The functional and neural mechanism of action preparation: Roles of EBA and FFA in voluntary action control. Journal of Cognitive Neuroscience, 23, 214–220.
Leuthold, H., Sommer, W., & Ulrich, R. (2004). Preparing for action: Inferences from CNV and LRP. Journal of Psychophysiology, 18, 77–88.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461.
Lotze, R. H. (1852). Medicinische Psychologie oder die Physiologie der Seele. Leipzig: Weidmann’sche Buchhandlung.
Luria, A. R. (1962). Higher cortical functions in man. Moscow: Moscow University Press.
Melcher, T., Weidema, M., Eenshuistra, R. M., Hommel, B., & Gruber, O. (2008). The neural substrate of the ideomotor principle: An event-related fMRI analysis. NeuroImage, 39, 1274–1288.
Memelink, J., & Hommel, B. (2013). Intentional weighting: A basic principle in cognitive control. Psychological Research, 77, 249–259.
Michaels, C. F. (2000). Information, perception, and action: What should ecological psychologists learn from Milner and Goodale (1995)? Ecological Psychology, 12, 241–258.
Michaels, C. F., & Carello, C. (1981). Direct perception. Englewood Cliffs: Prentice-Hall.
Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford: Oxford University Press.
Milner, A. D., & Goodale, M. A. (2006). The visual brain in action (2nd ed.). Oxford: Oxford University Press.
Mischel, W. (1968). Personality and assessment. London: Wiley.
Neisser, U. (1967). Cognitive psychology. Englewood Cliffs: Prentice-Hall.
Petrini, K., Pollick, F. E., Dahl, S., McAleer, P., McKay, L., Rocchesso, D., Waadeland, C. H., Love, S., Avanzini, F., & Puce, A. (2011). Action expertise reduces brain activity for audiovisual matching actions: An fMRI study with expert drummers. NeuroImage, 56, 1480–1492.
Pfeifer, R., & Bongard, J. (2006). How the body shapes the way we think: A new view of intelligence. Cambridge, MA: MIT Press.
Pfeifer, R., & Scheier, C. (1999). Understanding intelligence. Cambridge, MA: MIT Press.
Piaget, J. (1977). The essential Piaget. H. E. Gruber & J. J. Vonèche (Eds.). New York: Basic Books.
Prablanc, C., & Pélisson, D. (1990). Gaze saccade orienting and hand pointing are locked to their goal by quick internal loops. In M. Jeannerod (Ed.), Attention and performance (Vol. XIII, pp. 653–676). Hillsdale: Erlbaum.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the microstructure of cognition. Volume 1: Foundations. Cambridge, MA: MIT Press.
Schneider, W. X., & Deubel, H. (2002). Selection-for-perception and selection-for-spatial-motor-action are coupled by visual attention: A review of recent findings and new evidence from stimulus-driven saccade control. In W. Prinz & B. Hommel (Eds.), Attention and Performance XIX: Common mechanisms in perception and action (pp. 609–627). Oxford: Oxford University Press.
Schütz-Bosbach, S., & Prinz, W. (2007). Perceptual resonance: Action-induced modulation of perception. Trends in Cognitive Sciences, 11, 349–355.
Shin, Y. K., Proctor, R. W., & Capaldi, E. J. (2010). A review of contemporary ideomotor theory. Psychological Bulletin, 136, 943–974.
Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Cambridge, MA: B. F. Skinner Foundation.
Stock, A., & Stock, C. (2004). A short history of ideo-motor action. Psychological Research, 68, 176–188.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662.
Thorndike, E. L. (1913). Ideo-motor action. Psychological Review, 20, 91–106.
Tooby, J., & Cosmides, L. (2005). Conceptual foundations of evolutionary psychology. In D. M. Buss (Ed.), The handbook of evolutionary psychology (pp. 5–67). Hoboken, NJ: Wiley.
Verschoor, S. A., Spapé, M., Biro, S., & Hommel, B. (2013). From outcome prediction to action selection: Developmental change in the role of action-effect bindings. Developmental Science, 16, 801–814.
Vygotsky, L. S. (1962). Thought and language. Cambridge, MA: MIT Press.
Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20, 158–178.
Wilson, A. D., & Golonka, S. (2013). Embodied cognition is not what you think it is. Frontiers in Psychology, 4, 58, doi:10.3389/fpsyg.2013.00058
Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9, 625–636.