8 Embodied cognition and the sciences of the mind

Andy Clark

What is embodied cognition?

Cognitive science is the interdisciplinary study of the nature and mechanisms of mind. It seeks to explain how thinking, reasoning, and behaviour are possible in material systems. Core disciplines include psychology, neuroscience, philosophy, linguistics, and artificial intelligence. The last twenty years have seen an increasing interest, within cognitive science, in 'embodied cognition': in issues concerning the physical body, the local environment, and the complex interplay between neural systems and the wider world in which they function.

Work in embodied cognition provides a useful antidote to the increasingly 'neurocentric' (one might even say 'brain-obsessed') vision made popular in contemporary media. We often read about the discovery of some neural region active when navigating space, or falling in love, or riding a bicycle, as if it were itself solely responsible for the ability in question. But what we are really seeing is just one part of a complex web that may include crucial contributions from other brain areas, from bodily form and actions, and from the wider environment in which we learn, live, and act.

In this chapter, we adopt this broader perspective, and examine various ways in which cognitive functions and capacities are grounded in distributed webs of structure spanning brain, body, and world. Such distributions of labour reflect the basic fact that brains like ours were not evolved as 'platform-neutral' control systems: they are not evolved, that is to say, as organs for the control of just any old bodily form. Instead, the brain of a biological organism is geared to controlling that organism's distinctive bodily form and to supporting the distinctive actions that its lifestyle requires. Think of the huge differences in form and lifestyle separating you, a spider, and an octopus! The brain of each of these organisms co-evolved with their distinctive bodily forms, and did so in their own distinctive environments. The best way to get the flavour of these new approaches is by considering some examples, starting with the most basic (arguably non-cognitive) cases and working slowly 'upwards'.

Some examples

The bluefin tuna

Consider first the swimming know-how of the bluefin tuna. The bluefin tuna is a swimming prodigy, but its aquatic capacities - its ability to turn sharply, to move off quickly, and to reach such high speeds - have puzzled biologists. Physically speaking, so it seemed, the fish should be too weak (by about a factor of 7) to achieve these feats. The explanation is not magic, but the canny use of embodied, environmentally embedded action. Fluid dynamicists at MIT (Triantafyllou and Triantafyllou 1995) suggest that the fish use bodily action to manipulate and exploit the local environment (water) so as to swim faster, blast off more quickly, and so on. These fish find and exploit naturally occurring currents so as to gain speed, and use tail flaps to create additional vortices and pressure gradients, which are then used for quick take-off and rapid turns. The physical system, whose functioning explains the prodigious swimming capacities, is thus the fish-as-embedded-in, and as actively exploiting, its local environment.

Hopping robots

Next in line is a hopping robot. Raibert and Hodgins (1993) designed and built robots that balance and move by hopping on a single leg - basically, a pneumatic cylinder with a kind of foot. To get the hopper to locomote - to move, balance, and turn - involves solving a control problem that is radically impacted by the mechanical details, such as the elastic rebound when the leg hits the floor. The crucial control parameters include items such as leg spring, rest length, and degree of sideways tilt. To understand how the robot's 'brain' controls the robot's motions involves a shift towards an embodied perspective. The controller must learn to exploit the rich intrinsic dynamics of the system.
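To get a feel for how much work the mechanical details can do, consider a minimal sketch, in Python, of a one-dimensional spring-mass hopper (the physics and numbers are illustrative assumptions, not Raibert and Hodgins' controller or parameters). The only 'decision' the controller makes is how much thrust to add while the leg spring is loaded; spring, damping, and gravity shape the rest of the hop:

```python
# A minimal one-dimensional hopper sketch (illustrative physics, not
# Raibert and Hodgins' controller). The only control choice is how much
# thrust to add while the leg spring is loaded; the passive dynamics of
# spring, damping, and gravity do the rest.

M, G = 10.0, 9.81           # body mass (kg), gravity (m/s^2)
K, REST_LEN = 4000.0, 0.5   # leg-spring stiffness (N/m), rest length (m)
C = 60.0                    # stance damping (N*s/m): loss the thrust must replace
DT = 0.0005                 # integration time step (s)

def hop(thrust, steps=40000):
    """Integrate the hopper and return the apex heights of its hops."""
    y, vy = 1.0, 0.0        # body height (m), vertical velocity (m/s)
    apexes, prev_vy = [], 0.0
    for _ in range(steps):
        if y < REST_LEN:    # stance: spring loaded, damping active, thrust on
            force = K * (REST_LEN - y) - C * vy + thrust
        else:               # flight: purely ballistic
            force = 0.0
        vy += (force / M - G) * DT
        y += vy * DT
        if prev_vy > 0.0 >= vy and y > REST_LEN:   # apex of a flight phase
            apexes.append(round(y, 3))
        prev_vy = vy
    return apexes

# One scalar nudges the passive dynamics into higher or lower hopping.
for thrust in (40.0, 80.0, 120.0):
    print(f"thrust {thrust:5.1f} N -> recent apexes {hop(thrust)[-3:]}")
```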

Walking robots

Next, consider the thorny problem of two-legged locomotion. Honda's Asimo has been billed, perhaps rightly, as the world's most advanced humanoid robot. Boasting a daunting 26 degrees of freedom (2 on the neck, 6 on each arm, and 6 on each leg), Asimo is able to navigate the real world, reach, grip, walk reasonably smoothly, climb stairs, and recognize faces and voices. The name Asimo stands (a little clumsily perhaps) for 'Advanced Step in Innovative Mobility'. And certainly, Asimo (of which there are now several incarnations) is an incredible feat of engineering: still relatively short on brainpower but high on mobility and manoeuvrability.

As a walking robot, however, Asimo is far from energy-efficient. For a walking agent, one way to measure energy efficiency is by the so-called 'specific cost of transport': the amount of energy required to carry a unit weight a unit distance, so the lower the number, the more efficient the walker. Asimo rumbles in with a specific cost of transport of about 3.2, whereas we humans display a specific metabolic cost of transport of about 0.2. What accounts for this massive difference in energetic expenditure?
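Taking the quoted figures at face value, the comparison is easy to make concrete. Specific cost of transport is the dimensionless quantity c = E/(mgd): energy spent per unit weight per unit distance. A quick back-of-envelope sketch (the masses are illustrative assumptions, not figures from the text):

```python
# Back-of-envelope use of the cost-of-transport figures quoted above.
# c = E / (m * g * d); the masses below are illustrative assumptions.
G = 9.81  # m/s^2

def energy_kj(c, mass_kg, distance_m):
    """Energy (kJ) to move mass_kg over distance_m at specific cost c."""
    return c * mass_kg * G * distance_m / 1000.0

print(f"Asimo-style walker (c=3.2, 50 kg): {energy_kj(3.2, 50, 1000):.0f} kJ/km")
print(f"Human walker       (c=0.2, 70 kg): {energy_kj(0.2, 70, 1000):.0f} kJ/km")
# Per unit weight, the ratio is simply 3.2 / 0.2 = 16: an order of magnitude.
```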

Where robots like Asimo walk by means of very precise, and energy-intensive, joint-angle control systems, biological walking agents make maximal use of the mass properties and biomechanical couplings present in the overall musculoskeletal system and walking apparatus itself. Wild walkers thus make canny use of so-called passive dynamics, the kinematics and organization inhering in the physical device alone. Pure passive-dynamic walkers are simple devices that boast no power source apart from gravity, and no control system apart from some simple mechanical linkages such as a mechanical knee and the pairing of inner and outer legs to prevent the device from keeling over sideways. Yet despite (or perhaps because of) this simplicity, such devices are capable, if set on a slight slope, of walking smoothly and with a very realistic gait. The ancestors of these devices are not sophisticated robots but children's toys, some dating back to the late nineteenth century: toys that stroll, walk, or waddle down ramps or when pulled by string. Such toys have minimal actuation and no control system. Their walking is a consequence not of complex joint-movement planning and actuating, but of basic morphology (the shape of the body, the distribution of linkages and weights of components, etc.). Behind the passive-dynamic approach thus lies the compelling thought that:

Locomotion is mostly a natural motion of legged mechanisms, just as swinging is a natural motion of pendulums. Stiff-legged walking toys naturally generate their comical walking motions. This suggests that human-like motions might come naturally to human-like mechanisms.

(Collins et al. 2001, p. 608)

Collins et al. (2001) built the first such device to mimic human-like walking by adding curved feet, a compliant heel, and mechanically linked arms to the basic design pioneered by MIT roboticist Ted McGeer some ten years earlier. In action the device exhibits good, steady motion and is described by its creators as 'pleasing to watch' (ibid., p. 613). By contrast, robots that make extensive use of powered operations and joint-angle control tend to suffer from 'a kind of rigor mortis [because] joints encumbered by motors and high-reduction gear trains ... make joint movement inefficient when the actuators are on and nearly impossible when they are off' (ibid., p. 607).

What, then, of powered locomotion? Once the body itself is equipped with the right kind of passive dynamics, powered walking can be brought about in a remarkably elegant and energy-efficient way. In essence, the tasks of actuation and control have now been massively reconfigured so that powered, directed locomotion can come about by systematically pushing, damping, and tweaking a system in which passive-dynamic effects still play a major role. The control design is delicately geared to utilize all the natural dynamics of the passive baseline, and the actuation is consequently efficient and fluid.

More advanced control systems are able to actively learn strategies that make the most of passive-dynamic opportunities. An example is Robotoddler, a walking robot that learns (using so-called 'actor-critic reinforcement learning') a control policy that exploits the passive dynamics of the body. Robotoddler, who features among the pack of passive-dynamics-based robots described in Collins et al. (2005), can learn to change speeds, to go forward and backward, and can adapt on the go to different terrains, including bricks, wooden tiles, carpet, and even a variable speed treadmill. And as you'd expect, the use of passive dynamics cuts power consumption to about 1/10th that of a standard robot like Asimo. Passive-dynamics-based robots have achieved a specific cost of transport of around 0.20, an order of magnitude lower than Asimo and quite comparable to the human case. The discrepancy here is thought not to be significantly reducible by further technological advance using Asimo-style control strategies (i.e. ones that do not exploit passive-dynamic effects). An apt comparison, the developers suggest, is with the energy consumption of a helicopter versus an aeroplane or glider. The helicopter, however well designed it may be, will still consume vastly more energy per unit distance travelled.
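For readers curious about the learning scheme itself, the following is a minimal, purely illustrative actor-critic sketch on a toy chain-world (nothing like Robotoddler's real sensorimotor problem): a critic learns state values by temporal-difference updates, while an actor shifts its action preferences in whichever direction the critic's error signal recommends:

```python
# Minimal tabular actor-critic sketch (illustrative only; Robotoddler's
# learner operated on real sensorimotor variables, not this toy chain).
import math, random

N_STATES, GOAL = 6, 5
ALPHA_V, ALPHA_P, GAMMA = 0.1, 0.1, 0.95
V = [0.0] * N_STATES                            # critic: learned state values
prefs = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: preferences (left, right)

def sample_action(s):
    """Softmax over the actor's two action preferences."""
    exps = [math.exp(p) for p in prefs[s]]
    return 0 if random.random() * sum(exps) < exps[0] else 1

for episode in range(500):
    s = 0
    while s != GOAL:
        a = sample_action(s)
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0
        target = r + (0.0 if s2 == GOAL else GAMMA * V[s2])
        td_error = target - V[s]          # the critic's evaluation signal
        V[s] += ALPHA_V * td_error        # critic: move value toward target
        exps = [math.exp(p) for p in prefs[s]]
        z = sum(exps)
        for act in (0, 1):                # actor: policy-gradient step
            grad = (1.0 if act == a else 0.0) - exps[act] / z
            prefs[s][act] += ALPHA_P * td_error * grad
        s = s2

print("rightward preference per state:",
      [round(p[1] - p[0], 2) for p in prefs[:GOAL]])
```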

Passive walkers and their elegant powered counterparts conform to what Pfeifer and Bongard (2007, p. 123) describe as a principle of ecological balance. This principle states:

first... that given a certain task environment there has to be a match between the complexities of the agent's sensory, motor, and neural systems ... second ... that there is a certain balance or task-distribution between morphology, materials, control and environment.

One of the big lessons of contemporary robotics is thus that the co-evolution of bodily form, physical mechanics, and control yields a truly golden opportunity to spread the problem-solving load between brain, body and world.

Learning about objects

Embodied agents are also able to act on their worlds in ways that actively generate cognitively and computationally potent time-locked patterns of sensory stimulation. In this vein, Fitzpatrick et al. (2003) show how active object manipulation (pushing and touching objects in view) can help generate information about object boundaries. The robot learns about boundaries by poking and shoving: it uses motion detection to see its own hand/arm moving, but when the hand encounters (and pushes) an object there is a sudden spread of motion activity. This cheap signature picks out the object from the rest of the environment. In human infants, grasping, poking, pulling, sucking and shoving create a rich flow of time-locked multi-modal sensory stimulation. Such multi-modal input streams have been shown (Lungarella and Sporns 2005) to aid category learning and concept formation. The key to such capabilities is the robot's or infant's capacity to maintain coordinated sensorimotor engagement with its environment. Self-generated motor activity, such work suggests, acts as a 'complement to neural information-processing' (Lungarella and Sporns 2005, p. 25) in that:

The agent's control architecture (e.g. nervous system) attends to and processes streams of sensory stimulation, and ultimately generates sequences of motor actions, which in turn guide the further production and selection of sensory information. [In this way] 'information structuring' by motor activity and 'information processing' by the neural system are continuously linked to each other through sensorimotor loops.
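The 'cheap signature' that the poking strategy exploits can be caricatured with a toy frame-differencing sketch on synthetic images (the general idea only, not Fitzpatrick et al.'s actual pipeline): while only the arm moves, the motion mask stays small; at the moment of contact its area suddenly jumps, and the newly moving pixels trace the object's extent:

```python
# Toy illustration of the 'sudden spread of motion' cue, on synthetic
# frames (not Fitzpatrick et al.'s pipeline).
import numpy as np

def motion_mask(prev, curr, thresh=10):
    """Pixels whose brightness changed between frames: a cheap motion cue."""
    return np.abs(curr.astype(int) - prev.astype(int)) > thresh

def make_frame(arm_x, obj_x):
    frame = np.zeros((40, 60), dtype=np.uint8)
    frame[18:22, arm_x:arm_x + 6] = 200   # the arm: a small bright bar
    frame[14:26, obj_x:obj_x + 10] = 120  # the object: a larger block
    return frame

frames, obj_x = [], 40
for arm_x in range(10, 40, 2):           # the arm sweeps rightward
    if arm_x + 6 >= obj_x:               # contact: the push moves the object
        obj_x += 2
    frames.append(make_frame(arm_x, obj_x))

prev_area = None
for t in range(1, len(frames)):
    area = int(motion_mask(frames[t - 1], frames[t]).sum())
    if prev_area is not None and area > 2 * prev_area:
        print(f"t={t}: motion area jumps {prev_area} -> {area};"
              " the extra moving pixels mark the object's extent")
    prev_area = area
```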

Vision

Or consider vision. There is now a growing body of work devoted to animate vision. The key insight here is that the task of vision is not to build rich inner models of a surrounding 3D reality, but rather to use visual information efficiently, and cheaply, in the service of real-world, real-time action.

Such approaches reject what Churchland et al. (1994) dub the paradigm of 'pure vision' - the idea (associated with work in classical AI and in the use of vision for planning) that vision is largely a means of creating a world model rich enough to let us 'throw the world away', targeting reason and thought upon the inner model instead. Real-world action, in these 'pure vision' paradigms, functions merely as a means of implementing solutions arrived at by pure cognition. The animate vision paradigm, by contrast, gives action a starring role. Here, computational economy and temporal efficiency are purchased by a variety of tricks and ploys that exploit bodily action and the local environment, such as the use of cheap, easy-to-detect (possibly idiosyncratic) environmental cues. (Searching for a fast-food joint? Look out for a certain familiar combination of red and yellow signage!)

The key idea in this work, however, is that perception is not a passive phenomenon in which motor activity is only initiated at the end point of a complex process in which the animal creates a detailed representation of the perceived scene. Instead, perception and action engage in a kind of incremental game of tag in which motor assembly begins long before sensory signals reach the top level. Thus, early perceptual processing may yield a kind of proto-analysis of the scene, enabling the creature to select actions (such as head and eye movements) whose role is to provide a slightly upgraded sensory signal. That signal may, in turn, yield a new proto-analysis indicating further visuomotor action and so on. Even whole-body motions may be deployed as part of this process of improving perceptual pick-up. Foveating an object can, for example, involve motion of the eyes, head, neck and torso. Churchland et al. (1994, p. 44) put it well: 'watching Michael Jordan play basketball or a group of ravens steal a caribou corpse from a wolf tends to underscore the integrated, whole-body character of visuomotor coordination'. This integrated character is consistent with neurophysiological and neuroanatomical data that show the influence of motor signals in visual processing.

Vision, this body of work suggests, is a highly active and intelligent process. It is not the passive creation of a rich inner model, so much as the active retrieval (typically by moving the high-resolution fovea in a saccade) of useful information as it is needed ('just-in-time') from the constantly present real-world scene.
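That 'just-in-time' flavour can be caricatured in a few lines (a toy sketch, not a model of any published animate-vision system): a cheap, low-resolution pass over the scene picks a region worth inspecting, a 'saccade' aims the high-resolution fovea at it, and only that patch is analysed in detail:

```python
# Toy animate-vision loop on a synthetic scene: coarse proto-analysis,
# then a 'saccade' to fetch high-resolution detail only where needed.
import numpy as np

rng = np.random.default_rng(0)
world = rng.integers(0, 30, size=(64, 64))   # dull, low-contrast background
world[40:48, 10:18] = 220                    # one behaviourally salient patch

def periphery(img, factor=8):
    """Cheap proto-analysis: a coarse, downsampled view of the scene."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

coarse = periphery(world)
by, bx = np.unravel_index(np.argmax(coarse), coarse.shape)
fy, fx = by * 8, bx * 8                      # 'saccade': aim the fovea here
fovea = world[fy:fy + 8, fx:fx + 8]          # high-res detail, fetched on demand
print(f"saccade to ({fy}, {fx}); foveal mean {fovea.mean():.0f}, "
      f"scene mean {world.mean():.0f}")
```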

Towards higher cognition

A natural reaction, at this point, would be to suggest that these kinds of reliance upon bodily mechanics and the use of environment-engaging action play a major role in many forms of 'lower' cognition (such as the control and orchestration of walking, hopping, seeing and learning about objects) but that everything changes when we turn to higher matters: to reasoning, planning and reflection. Here, the jury is still out, but a case can be made that higher cognition is not quite as different as it may at first appear.

One very promising move is to suggest that embodied cognitive science might treat offline reason as something like simulated sensing and acting. A nice early example of this kind of strategy can be found in work by Maja Mataric and Lynn Stein, using the robot Toto. Toto used ultrasonic range sensors to detect walls, corridors and so on, and was able to use its physical explorations to build an inner map of its environment, which it could then use to revisit previously encountered locations on command. Toto's internal 'map' was, however, rather special in that it encoded geographic information by combining information about the robot's movement with correlated perceptual input. The inner mechanisms thus record navigation landmarks as a combination of robotic motion and sonar readings, so that a corridor might be encoded as a combination of forward motion and a sequence of short, lateral sonar distance readings. The stored 'map' was thus perfectly formatted to act as a direct controller of embodied action: using the map to find a route and generating a plan of actual robot movements turns out to be a single computational task. Toto could thus return, on command, to a previously encountered location. Toto could not, however, be prompted to track or 'think about' any location that it had not previously visited. It was locked in the present, and could not reason about the future, or about what might be.

Metatoto (Stein 1994) built on the original Toto architecture to create a system capable of finding its way, on command, to locations that it had never previously encountered. It did so by using the Toto architecture offline so as to support the exploration, in 'imagination', of a totally virtual 'environment'. When Metatoto was 'imagining', it deployed exactly the same machinery that (in Toto, and in Metatoto online) normally supports physical interactions with the real world. The difference lay solely at the lowest-level interface: where Toto used sonar to act and navigate in a real world, Metatoto used simulated sonar to explore a virtual world (including a virtual robot body). Metatoto included a program that can take, for example, a floor plan or map and use it to stimulate the robot's sensors in the way they would be stimulated were the robot locomoting along a given route on the map. The map can thus induce sequences of 'experiences', which are qualitatively similar to those that would be generated by real sensing and acting. This allowed Metatoto to profit from 'virtual experiences'. As a result, Metatoto could immediately find its way to a target location it had not actually (but merely virtually) visited.
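The architectural trick can be sketched in a few lines of hypothetical code (not Stein's implementation): the landmark-classifying layer consumes sonar readings and is indifferent to whether they come from hardware or from a floor plan driving a simulated sensor:

```python
# Hypothetical sketch of Metatoto's key move: the same landmark machinery
# runs whether the sonar stream is real or simulated from a map.
FLOOR_PLAN = [
    "##########",
    "#........#",
    "#.######.#",
    "#........#",
    "##########",
]

def simulated_sonar(plan, x, y):
    """Stand-in for real sonar: lateral distances (up, down) read off a map."""
    up = next(d for d in range(1, len(plan)) if plan[y - d][x] == "#")
    down = next(d for d in range(1, len(plan)) if plan[y + d][x] == "#")
    return up, down

def classify_landmark(readings):
    """Toto-style landmark code: forward motion plus a run of short,
    steady lateral readings reads as 'corridor'."""
    if all(u <= 1 and d <= 1 for u, d in readings):
        return "corridor"
    return "open space"

# An 'imagined' eastward run along row 1: the virtual robot never moves,
# but the navigation layer receives experience-like sonar sequences.
readings = [simulated_sonar(FLOOR_PLAN, x, 1) for x in range(2, 8)]
print(classify_landmark(readings))   # -> corridor
```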

Work on perceptual symbol systems (Barsalou 2009) likewise offers an attractive means of combining a dynamical emphasis upon sensorimotor engagement with a story about higher-level cognitive capacities. The idea is that conceptual thought is accomplished using what Barsalou calls simulators - in essence, distributed neural resources that encode information about the various sensory features of typical events, items, or scenarios. Thus, the simulator for 'beer', Barsalou suggests, will include a host of coordinated multi-modal information about how different beers look, taste and smell, the typical contexts in which they are found, what effects they have on the consumer, etc. Details of the recent past and the current context recruit and nuance the activity of multiple simulators so as to yield 'situated conceptualizations'.

Simulations and situated conceptualizations enable a system to repurpose the kinds of information that characterize bouts of online sensorimotor engagement, rendering that information suitable for the control of various forms of 'offline reason' (e.g. imagining what will happen if we turn on the car's ignition, enter the neighbour's house from the back door, request the amber ale, and so on). The construction and use of perceptuo-motor based simulators constitutes, Barsalou suggests, a form of 'conceptual processing' that may be quite widespread among animal species. In humans, however, the very same apparatus can be selectively activated using words to select specific sets of sensory memories - as when you tell me that you saw a giant caterpillar in the forest today. Language may thus act as a kind of cognitive technology that enables us to do even more, using the same simulator-based resources.

In thinking about 'higher' cognition and advanced human reason, it may also prove fruitful (see Clark 1997, 2008) to consider the role of physical and symbolic artefacts. Such artefacts range from pen and paper to smartphones and Google Glass, and they now form a large part of the environment in which our neural and bodily systems operate, providing a variety of opportunities for higher cognition to make the most of embodied actions that engage and exploit these robust, reliably available resources. Thus we may argue that just as basic forms of real-world success turn on the interplay between neural, bodily and environmental factors, so advanced cognition turns - in crucial respects - upon the complex interplay between individual reason, artefact and culture.

The simplest illustration of this idea is probably the use of pen and paper to support or 'scaffold' human performance. Most of us, armed with pen and paper, can, for example, solve multiplication problems that would baffle our unaided brains. In so doing we create external symbols (numerical inscriptions) and use external storage and manipulation so as to reduce the complex problem to a sequence of simpler, pattern-completing steps that we already command. On this model, then, it is the combination of our biological computational profile with the fundamentally different properties of a concrete, structured, symbolic, external resource that is a key source of our peculiar brand of cognitive success.
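To see how the reduction to simple, pattern-completing steps works, here is a sketch of schoolroom long multiplication (illustrative code, not a claim about psychological mechanism): every inner operation is a single-digit product or sum of the sort we can do from rote memory, while a list standing in for the paper holds all the intermediate state:

```python
# Long multiplication decomposed into rote single-digit steps, with the
# 'paper' (a list of columns) carrying all intermediate state.

def long_multiply(a: str, b: str) -> str:
    """Multiply two numerals the schoolroom way: each inner operation is a
    one-digit product or sum; the `paper` list is the external store."""
    paper = [0] * (len(a) + len(b))          # the written column workspace
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            # one rote step: a single-digit product plus what's on the paper
            total = paper[i + j] + int(da) * int(db) + carry
            paper[i + j] = total % 10        # write the digit in its column
            carry = total // 10              # the carry moves one column left
        paper[i + len(b)] += carry
    digits = "".join(map(str, reversed(paper))).lstrip("0")
    return digits or "0"

print(long_multiply("347", "29"), 347 * 29)  # both print 10063
```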

Hutchins (1995) gives a wonderful and detailed account of the way multiple biological brains, tools (such as sextants and alidades) and media (such as maps and charts) combine to make possible the act of ship navigation. In Hutchins' words, such tools and media:

permit the users to do the tasks that need to be done while doing the kinds of things people are good at: recognizing patterns, modeling simple dynamics of the world, and manipulating objects in the environment.

(Hutchins 1995, p. 155)

In short, the world of artefacts, texts, media (and even cultural practices and institutions) may be for us what the actively created whorls and vortices are for the bluefin tuna. Human brains, raised in this sea of cultural tools, will develop strategies for advanced problem-solving that 'factor in' these external resources as profoundly and deeply as the bodily motions of the tuna factor in and maximally exploit the reliable properties of the surrounding water.

Recognizing the complex ways in which human thought and reason exploit the presence of external symbols and problem-solving resources, and unravelling the ways biological brains couple themselves with these very special kinds of ecological objects, is surely one of the most exciting tasks confronting the sciences of embodied cognition.

Simple vs radical embodiment: implications for the sciences of the mind

In addition to asking how far the embodied approach can go, we should also ask to what extent it is a truly radical alternative to more traditional views. To focus this question, it helps to distinguish two different ways to appeal to facts about embodiment and environmental embedding. The first, which I'll call 'simple embodiment', treats such facts as, primarily, constraints upon a theory of inner organization and processing. The second, which I'll call 'radical embodiment' (see Chemero 2009) goes much further and treats such facts as profoundly altering the subject matter and theoretical framework of cognitive science. Examples of simple embodiment abound in the literature. A good deal of work in interactive vision trades heavily in internal representations, computational transformations and abstract data structures. Attention to the roles of body, world and action, in such cases, is basically a methodological tool aimed at getting the internal data structures and operations right.

The source of much recent excitement, however, lies in striking claims involving what I have dubbed 'radical embodiment'. Visions of radical embodiment (see Chemero 2009) all involve one or more of the following claims:

  1. That understanding the complex interplay of brain, body and world requires new analytic tools and methods.
  2. That traditional notions such as internal representation and computation are inadequate and unnecessary.

And:

  3. That the typical decomposition of the cognitive system into a variety of inner neural or functional subsystems is often misleading, and blinds us to the possibility of alternative, more explanatory decompositions that cut across the traditional brain-body-world divisions.

This is not the time or place to enter into this complex and occasionally baroque discussion (for my own attempts, see Clark 1997, 2014). My own guess, however, is that a great deal of higher cognition, even when augmented and helped along by the use of bodily actions and worldly props, involves the development and use of various forms of internal representation and inner model. These will, however, be forms of inner model and representation delicately keyed to the reliable presence of environmental structures and to the rich possibilities afforded by real-world action.

Conclusions

Embodied, environmentally embedded approaches have a lot to offer to the sciences of the mind. It is increasingly clear that, in a wide spectrum of cases, the individual brain should not be the sole locus of cognitive scientific interest. Cognition is not a phenomenon that can successfully be studied while marginalizing the roles of body, world and action.

But many questions remain unanswered. The embodied approach itself seems to come in two distinct varieties. The first (simple embodiment) stresses the role of body, world and action in informing and constraining stories that still focus on inner computation, representation and problem-solving. The second (radical embodiment) sees such a profound interplay of brain, body, and world as to fundamentally transform both the subject matter and the theoretical framework of cognitive science itself.

Chapter summary

  1. Cognitive science increasingly treats cognition as grounded in distributed webs of structure spanning brain, body, and world, rather than in neural activity alone.
  2. Cases such as the bluefin tuna, hopping and passive-dynamic walking robots, object learning through manipulation, and animate vision show how bodily mechanics and environment-engaging action can carry much of the problem-solving load.
  3. Higher cognition may exploit the same resources, whether through offline simulation of sensing and acting (Metatoto, Barsalou's simulators) or through external props such as pen, paper, language, and other cognitive technologies.
  4. Embodied approaches come in two strengths: simple embodiment, which uses facts about body and world to constrain accounts of inner computation and representation, and radical embodiment, which takes those facts to transform the subject matter and methods of cognitive science.

Study questions

  1. In what ways can properties of body, world and action alter the tasks facing the biological brain?
  2. Is cognition seamless, displaying a gentle, incremental trajectory linking fully embodied responsiveness to abstract thought and offline reason?
  3. Could cognition be a patchwork quilt, with jumps and discontinuities and with very different kinds of processing and representation serving different needs?
  4. Insofar as we depend heavily on cultural artefacts (pen, paper, smartphones) to augment and enhance biological cognition, what should we say about their origins? Didn't we have to be extra-smart just to invent all those things in the first place?
  5. Is there room here for some kind of 'bootstrapping' effect in which a few cultural innovations enable augmented agents to create more and more such innovations? What might that process involve?
  6. What role does language play in all this? Do words merely reflect what we already know and understand, or does the capacity to create and share linguistic items radically alter the way the brain processes information? In other words, to borrow a line from Daniel Dennett, do words do things with us, or do we merely do things with words?
  7. Does taking embodiment seriously require us to re-think the tasks and tools of the sciences of mind?

Introductory further reading

Barsalou, L. (2009) 'Simulation, situated conceptualization, and prediction', Philosophical Transactions of the Royal Society of London: Biological Sciences 364: 1281-9.

Chemero, A. (2009) Radical Embodied Cognitive Science, Cambridge, MA: MIT Press. (A defense of the claim that taking embodiment seriously deeply reconfigures the sciences of the mind.)

Clark, A. (1997) Being There: Putting Brain, Body and World Together Again, Cambridge, MA: MIT Press. (Another useful introductory text, reviewing work on embodied cognition from a slightly more philosophical perspective.)

Clark, A. (2014) Mindware: An Introduction to the Philosophy of Cognitive Science, 2nd edn, Oxford: Oxford University Press. (General introduction to the field of 'philosophy of cognitive science'.)

Collins, S. H., Ruina, A. L., Tedrake, R. and Wisse, M. (2005) 'Efficient bipedal robots based on passive-dynamic walkers', Science 307: 1082-5.

Hutchins, E. (1995) Cognition in the Wild, Cambridge, MA: MIT Press. (A wonderful exploration of the many ways human agents, practices, and artefacts come together to solve real-world problems.)

Pfeifer, R. and Bongard, J. (2007) How the Body Shapes the Way We Think, Cambridge, MA: MIT Press. (Excellent introductory text written from the standpoint of work in real-world robotics.)

Triantafyllou, M. and Triantafyllou, G. (1995) 'An efficient swimming machine', Scientific American 272: 64-71.

Advanced further reading

Churchland, P., Ramachandran, V. and Sejnowski, T. (1994) 'A critique of pure vision', in C. Koch and J. Davis (eds) Large-Scale Neuronal Theories of the Brain, Cambridge, MA: MIT Press.

Clark, A. (2008) Supersizing the Mind: Embodiment, Action, and Cognitive Extension, Oxford: Oxford University Press. (A defense of the claim that bioexternal resources can form part of the human mind.)

Collins, S. H., Wisse, M. and Ruina, A. (2001) 'A three-dimensional passive-dynamic walking robot with two legs and knees', International Journal of Robotics Research 20 (7): 607-15.

Fitzpatrick, P., Metta, G., Natale, L., Rao, S. and Sandini, G. (2003) 'Learning about objects through action: initial steps towards artificial cognition', in Proceedings of the 2003 IEEE International Conference on Robotics and Automation (ICRA), 12-17 May, Taipei, Taiwan.

Lungarella, M. and Sporns, O. (2005) 'Information self-structuring: key principle for learning and development', Proceedings of the 2005 IEEE International Conference on Development and Learning, 19-21 July, Osaka, Japan, 25-30.

Raibert, M. H. and Hodgins, J. K. (1993) 'Legged robots', in R. D. Beer, R. E. Ritzmann and T. McKenna (eds) Biological Neural Networks in Invertebrate Neuroethology and Robotics, San Diego: Academic Press, 319-54.

Robbins, P. and Aydede, M. (eds) (2009) The Cambridge Handbook of Situated Cognition, Cambridge: Cambridge University Press. (A useful resource including authoritative treatments of a variety of issues concerning embodiment, action, and the role of the environment.)

Stein, L. (1994) 'Imagination and situated cognition', Journal of Experimental and Theoretical Artificial Intelligence 6: 393-407.

Internet resources

A general tutorial:

Hoffmann, M., Assaf, D. and Pfeifer, R. (eds) (n.d.) Tutorial on Embodiment, European Network for the Advancement of Artificial Cognitive Systems, Interaction and Robotics [website], www.eucognition.org/index.php?page=tutorial-on-embodiment

A video about passive-dynamic walkers:

Amber-Lab (2010) 'McGeer and Passive Dynamic Bipedal Walking', YouTube, 18 October [video-streaming site] www.youtube.com/watch?v=WOPED7I5Lac

The current crop of Cornell robots:

Ruina, A. (n.d.) 'Cornell robots', Biorobotics and Locomotion Lab, Cornell University [website], http://ruina.tam.cornell.edu/research/topics/robots/index.php

The MIT Leg Lab site:

MIT Leg Laboratory (n.d.) 'Robots in Progress', MIT Computer Science and Artificial Intelligence Laboratory [website], www.ai.mit.edu/projects/leglab/robots/robots-main.html

More videos of real-world robots in action:

Department of Informatics (n.d.) 'Robots', Artificial Intelligence Lab, Department of Informatics, University of Zurich [website], www.ifi.unizh.ch/ailab/robots.html