Afterword: The Cambrian Explosion

Before the Cambrian explosion, the evolutionary event during which most major animal phyla appeared, life-forms were simple. Then, over the span of a few million years, the rate of diversification accelerated and life-forms began to resemble the multicellular organisms that make up the animal kingdom we are familiar with today. Some of these organisms, like the spiny, slug-like Wiwaxia and the five-eyed Opabinia, look like nothing we would recognize, and would make a good template for an alien horror movie. But most of the fossils from that era are recognizable ancestors of the rich and diversified phylogeny of life-forms roaming the earth today.

In his 1989 book Wonderful Life, Stephen Jay Gould popularized the notion of the Cambrian explosion and explored answers to the puzzle of its origin. Why did nearly all modern animal phyla appear in such a relatively brief period of time? The sudden diversification seemed to contradict the gradual, continuous improvement suggested by Darwinian evolution. Darwin himself, in The Origin of Species, acknowledged that the sudden appearance of species with no apparent antecedents posed an objection “undoubtedly of the gravest nature” to his burgeoning theory of natural selection.

The Cambrian puzzle led Gould and contemporaries like Niles Eldredge, informed by Harry Whittington’s reexamination of the Burgess Shale fossils, to propose a revised view of evolutionary theory: the theory of punctuated equilibria. In this view, evolution is composed of “long intervals of near-stasis punctuated by short periods of rapid change.”1

Evolutionary robotics

Indeed, when we ran our own experiments in robot evolution, progress typically was not steady. The branch of robotics known as evolutionary robotics involves simulating Darwinian evolutionary processes in a computer by applying generations of variation and selection to populations of robots. In our experiments, we let the computer randomly put together robot components—mechanical joints, rigid links, motors, wires, and neurons—to make virtual machines. We then programmed the computer to select the fastest of these robots, mutate and recombine them into “offspring,” place those offspring back into the population, and repeat. We then stood back and watched what happened.
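The core loop we ran is the classic generate-test-select cycle of an evolutionary algorithm. The sketch below is a minimal illustration of that loop, not our actual simulator: the list-of-numbers genome, the mutate operator, and the simulate_speed fitness function are placeholders standing in for the physical robot encoding and physics simulation described above.

```python
import random

POP_SIZE, GENERATIONS = 100, 1000

def random_robot():
    # A robot genome: a random arrangement of joints, links,
    # motors, wires, and neurons (placeholder encoding).
    return [random.random() for _ in range(20)]

def simulate_speed(robot):
    # Stand-in for the physics simulation that measures how far
    # the robot travels; in our experiments this was the fitness.
    return sum(robot)  # placeholder fitness

def mutate(robot):
    # Randomly perturb one component of the genome.
    child = robot[:]
    child[random.randrange(len(child))] = random.random()
    return child

population = [random_robot() for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    # Select the fastest half, discard the rest...
    population.sort(key=simulate_speed, reverse=True)
    survivors = population[:POP_SIZE // 2]
    # ...and refill the population with mutated offspring.
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(POP_SIZE - len(survivors))]
```

With a toy fitness function like this one, progress is smooth; with a real physics simulator, fitness tends to plateau until a chance mutation “discovers” something like vibration, producing the staircase pattern of figure 13.1.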


Figure 13.1 Progress of robots over hundreds of generations.

In the first generation, all robots were nothing but static piles of junk, not moving at all. For at least a hundred generations thereafter, the robots remained static, and it seemed as if nothing interesting would ever come out of this experiment. At around the 120th generation, the wires, motors, joints, and links happened to be arranged in a way that caused a robot to vibrate and move a tiny amount. While that motion was minuscule, it was infinitely better than that of the other piles of junk, which did not move at all. And so, vibrating piles of junk took over our virtual world. That “discovery of vibration,” as we can call it in retrospect, caused an abrupt, “punctuated” improvement in the performance of our entire simulated robot world. A few hundred generations later, some other discovery was made, and performance once again visibly improved in a large step. The process continued. After hundreds of generations we began to see beautifully working robots, some totally alien, but others with somewhat recognizable forms.

The queen of sensors

While synthetic evolution of robots in the lab leaves a perfect digital record, biological evolution leaves few traces. What gave rise to the Cambrian explosion remains a mystery. Some attribute this sudden prehistoric diversification to the discovery of multicellularity itself. Others attribute it to an abundance of resources like oxygen, to the improvement of habitable conditions on earth, or to adaptive radiations following mass extinctions. Some ascribe part of it to the “discovery” of certain “enabling technologies” that unleashed a wealth of new possibilities that were not feasible before.

One of the key “technologies” that appeared during the Cambrian explosion is eyesight. There is no evidence of eyes anywhere in the fossil record before the Cambrian era, but a wide diversity of eye organs followed. In the Burgess Shale of the Middle Cambrian, and even more in later shales, there are different types of eyes adapted to the varied conditions of the organisms that bore them—eyes with different acuity, different sensitivity to light levels, sensitivity to different wavelengths, and different abilities to detect motion and color.

As we noted in chapter 1, according to Andrew Parker’s “light switch” theory, the appearance of eyesight “technology” changed the nature of predator-prey encounters2 as well as the chances of mating. Before eyesight, hunting and evading relied on close-range senses like smell, taste, vibration, and touch. But when predators could sense their prey from a distance using sight, new defensive and offensive strategies were needed, leading to a coevolutionary arms race. As predators came to see better, prey needed to learn how to hide, how to run faster, how to camouflage themselves, or how to grow defensive spines, with each of these “new technologies” leading to further morphological diversification.

Whether or not eyesight played a key role in the Cambrian diversification, we will never know for sure, but we can suggest one more hypothesis here: it was more than the eye itself that led to diversification; it was the development of the cognitive capacity that followed it.

Unlike touch, taste, and smell, visual information is “high bandwidth,” in spatial resolution as well as in temporal flow, leading to data rates substantially higher than those of other senses. Because eyesight is also a long-distance sense, it covers a broad swath of the world outside the organism, requiring new cognitive apparatus for scene segmentation, spatial modeling, and an expanded understanding of the world. Perhaps the firehose of data that suddenly flooded early brains with eyes gave an advantage to those individuals with a slightly better cognitive capacity. With that extended cognitive capacity came numerous new opportunities: new predator-prey tactics, new mate-finding strategies, and access to new resources.

The machinery that makes sense of visual information dominates our brain. Each eye contains 150 million light sensors, whereas an average ear contains just 30,000 sound-sensitive neurons. Neurons devoted to processing visual information account for 30 percent of the cortex, compared to 8 percent for touch and 3 percent for hearing.3

Eyesight no doubt emerged through the gradual coevolution of the optical eye and the visual cortex that interprets its signals. The neural apparatus involved in interpreting the visual scene probably soon found new “applications,” leading to a cascade of biological innovations. First, eyesight may have contributed to cooperative relationships between organisms, such as the symbiosis between bees and flowering plants. It may also have helped individuals find mates more easily from a distance. Initially, this advanced mate-finding sensor may have been used just to identify individuals of the same species. But as visual acuity improved, there was an advantage to distinguishing between more desirable and less desirable mates, probably leading eventually to behaviors involving sexual selection and other forms of social communication.

The analogy between the Cambrian explosion of biological life and the imminent explosion of robotic life is hard to miss. As Gill Pratt, former program manager of the DARPA Robotics Challenge, who was appointed head of the Toyota Research Institute in 2015, writes:

Today, technological developments on several fronts are fomenting a similar explosion in the diversification and applicability of robotics. Many of the base hardware technologies on which robots depend—particularly computing, data storage, and communications—have been improving at exponential growth rates. Two newly blossoming technologies—“Cloud Robotics” and “Deep Learning”—could leverage these base technologies in a virtuous cycle of explosive growth.4

Indeed, a number of base technologies that are key to robotics are improving rapidly, and together they are enabling an imminent diversification of autonomous robot forms.

  1. Exponential improvement in power storage and efficiency Autonomous robots need to be autonomous in power. Over the last few decades, battery technology has improved substantially, from the lead-acid batteries of the 1950s to the lithium polymer batteries of today, roughly a threefold improvement in capacity.5 Battery capacity aside, even greater gains come from the improved power efficiency of robot technologies, such as the exponential improvement in computing cycles per watt and the steady improvement in motor efficiency. The combination of improving energy storage and improving energy efficiency has accelerated the total energy performance of autonomous systems. Robots with better power performance can spend more time doing and learning, and less time charging or looking for power.
  2. Exponential improvement in computational power As Moore’s law predicted, the amount of computing power available per dollar continues to double every eighteen months or so (a compounding rate made concrete in the short calculation after this list). While transistor miniaturization has slowed in recent years as a result of physical limitations, computing power per dollar continues to grow through other means, such as parallelization across multiple cores. Computing power is essential to an autonomous system that must process streaming data locally and make decisions in real time. Faster processing allows robots to roam around in less structured environments, and to learn faster from their experiences.
  3. Exponential improvement in sensor technology Across a range of technologies from lidar to sonar, sensors are becoming more precise, higher in bandwidth, and cheaper. One of the most rapidly improving sensors along all these dimensions is the video camera. Driven by the mobile-device market, camera technology has improved in performance and price at an exponential rate. Improvements in the cost, size, power consumption, and performance of optics and sensors now allow multiple cameras to be mounted on a single robot. Multiple data streams allow for better cognitive performance, because more reliable scene understanding can be obtained from multiple viewpoints (e.g., depth and velocity perception from super-stereo vision), as well as greater physical robustness to damage or temporary sensory blinding.
  4. Exponential improvement in data storage The ability to store data is improving at an exponential rate. This improvement affects not just how many bytes can be stored per dollar, but also the speed and reliability of data storage and retrieval, the energy per transaction, and the physical weight of memory (bytes per kilogram). When robots can efficiently store lots of data locally, they can recall and reuse prior experiences, extract new knowledge from past stored experiences, and store and use maps and other high-definition information about the world.
  5. Exponential improvement in communication bandwidth Both short-range and long-range bandwidth have been improving exponentially over the last few decades. Just a few decades ago, sending information was slow, difficult, expensive, and unreliable. Today we send terabytes across the planet and don’t think twice about whether the information will arrive intact. The ability to communicate reliably over long distances allows robots to share data and the results of their local analyses with other robots, leading to a combined, shared intelligence known as cloud robotics.
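The doubling rate cited in item 2 compounds quickly. Here is a back-of-the-envelope check; the eighteen-month doubling period is the figure quoted above, and the rest is arithmetic:

```python
# Growth factor after `years` of doubling every `doubling_period` years.
def growth_factor(years, doubling_period=1.5):
    return 2 ** (years / doubling_period)

print(f"{growth_factor(10):.0f}x per decade")        # ~102x
print(f"{growth_factor(40):.2e} over four decades")  # ~1.07e+08
```

Roughly two orders of magnitude per decade: this is what makes each of the five trends above an “exponential” rather than a merely steady improvement.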

The king of exponentials—algorithms

It is tempting to point to these hardware improvements in computation, communication, and sensing as the root of the exponential takeoff of robotics technology, but all too often we forget the improvements that come from the discovery and invention of new algorithmic techniques themselves.

Among computer scientists and electrical engineers there is a saying that whatever improvements hardware engineers come up with, software engineers will immediately squander (the actual wording is a bit spicier, but not fit for print). We all know that no matter how fast computers become, their operating system software always seems to run too slowly. But the truth is quite the contrary.

Unlike processing speed per year or megapixels per dollar, algorithmic improvement is difficult to quantify over long periods of time, because algorithms are so diverse in the tasks they perform and their goals are a rapidly moving target. But to take just one example, consider algorithms for solving differential equations, a key component of any robot that needs to predict and control motion over time. Between 1945 and 1985, the algorithms for performing this fundamental task improved by a factor of 30,000, an average of 29 percent per year.6 That rate is on a par with the improvement of the underlying hardware over the same period.
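As a sanity check on those numbers: a 30,000-fold improvement spread over the forty years from 1945 to 1985 corresponds to an annual improvement factor of

\[ 30{,}000^{1/40} \approx 1.29, \]

that is, roughly 29 percent compounded per year.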

Another example is the improvement of algorithms for data analysis. The classical fast Fourier transform (FFT) algorithm, used ubiquitously in any signal-processing system, has delivered orders of magnitude of speedup compared to the naive algorithm it replaced. By exactly how much did the algorithm improve speed? It turns out that the factor of improvement depends largely on the size of the dataset being analyzed. For small datasets the improvement is modest, but for large datasets the improvement is so substantial that it would take decades of hardware improvements to catch up.7 Because of algorithmic improvements we can now analyze data in ways that would not have been feasible even if Moore’s law kept on going for a hundred years.
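To see how that dataset-dependent speedup scales, note that the naive discrete Fourier transform takes on the order of $n^2$ operations for $n$ samples, while the FFT takes on the order of $n \log_2 n$, for a speedup of roughly

\[ \frac{n^2}{n \log_2 n} = \frac{n}{\log_2 n} \approx \frac{10^9}{30} \approx 3 \times 10^7 \]

on a billion-sample dataset. At a Moore’s-law pace of one doubling every eighteen months, matching that factor through hardware alone would take roughly $1.5 \log_2(3 \times 10^7) \approx 37$ years.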

Unlike hardware improvements that seem to be relatively smooth exponentials, improvements in algorithms are more similar to the “punctuated equilibria” observed or conjectured for evolutionary systems. Algorithms do not improve smoothly; they improve by fits and starts. Like biological ecosystems, an algorithmic improvement needs to take hold in a market of competing algorithms. Some algorithms get invented and then die in academic obscurity. Other algorithms make it big only to become extinct shortly after, when a better algorithm eats their lunch, or when a problem they solve is no longer relevant. An algorithm may need to be reinvented multiple times by different people, until it finally makes its way, by fortuitous circumstances, to global recognition.

Artificial-intelligence algorithms have been no exception to this trend. In fits and starts over more than half a century, AI algorithms have improved and regressed, come into fashion and gone out of vogue. But regardless of which camp of artificial intelligence is currently in the lead, AI algorithms have improved dramatically over the decades. We know that no amount of processor speed, data storage, or camera resolution would allow Rosenblatt’s original Perceptron to reliably differentiate between a cat and a dog. We now know that no amount of computation would allow a standard 1990s two-layer neural network to succeed reliably at that task either, nor a support vector machine from 2010. It took an ecology of competing algorithms in the ImageNet Large Scale Visual Recognition Challenge for one particular algorithm to rise to the top. Like mammalian critters hiding among the rocks, convolutional neural networks eventually rose to outperform the traditional AI dinosaurs.

The jolt felt in the AI community as deep-learning algorithms demonstrated their prowess was a visceral instance of a punctuated evolutionary process making one of its step transitions. Perhaps the automotive industry felt a similar jolt as its center of gravity shifted from hardware to software.

The cascade of improving algorithms

We will never know what role the eye played in catalyzing the development of the brain, but we do know that intelligence spread far beyond the visual cortex—from recognizing predators, prey, and potential mates, to the full-blown communication and self-awareness you are using to read this book. Similarly, we do know that the deep-learning algorithms that were initially developed for visual perception are now finding their way into many other areas of AI, from speech recognition to language generation and even artistic creativity. And we can only assume that this trend will continue.

How far this trend can continue, and what its endpoint is, is a topic for both science fiction writers and philosophers to speculate on. If we use raw hardware power as a baseline to predict AI progress, then predictions seem to converge on the 2020s as the decade when computing power meets the calculating power of the brain.8 To us, however, that prediction has always been somewhat unsatisfying. We really want to know when computers will be as smart as humans in their behavior, not in their raw computing power. The problem is that this kind of prediction is much harder to make.

The ability of computers to develop something akin to self-awareness, or consciousness, is not just a matter of better hardware—it requires a different kind of algorithm. And while we don’t know exactly what self-awareness is, we do know that it is more amorphous than the ability to play chess or drive a car, and therefore it is unlikely to be directly programmed by a genius software developer, as most sci-fi movies like to portray.

Instead, self-aware machines will develop slowly and gradually. What is self-awareness? Let’s take the practical definition: that it is “merely” the ability to “simulate oneself”—to predict the future consequences of current actions, without having to perform those actions in physical reality. Can you imagine yourself walking on the beach tomorrow? Can you smell the ocean and feel the sand? Is that feeling vivid enough for you to consider acting on it? If so, you are self-aware. One can argue that even sentient emotions, like fear and joy, are mere projections of future consequences onto our current state, based on learned past experiences. For example, while “pain” may signify current damage, “fear” may signify a high likelihood of grave imminent damage, whereas “worry” may reflect a less severe and more distant negative consequence predicted by the internal self-model.

If a robot is able to predict what it will sense in the future based on actions it takes now, and can then use that predictive model to plan its future actions, then to some degree it, too, is self-aware. In 2006, we demonstrated a robot that was able to construct an image of itself, a sort of primitive stick figure that was not very accurate but was good enough to let it learn how to walk with no physical trials or external programming. But our robot’s self-image hit the limits of the perception and prediction algorithms available at the time.
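In algorithmic terms, the self-simulation described above is a forward model used for planning. The sketch below is a hypothetical minimal version of that idea, not the 2006 system: self_model stands in for a learned predictor of the robot’s next state, score for the task objective, and the planner evaluates candidate action sequences purely in imagination before committing to one in the physical world.

```python
import random

def self_model(state, action):
    # Learned forward model: predicts the next state that would result
    # from taking `action` in `state`. A placeholder here; in practice
    # this would be a model learned from the robot's own sensor data.
    return [s + a for s, a in zip(state, action)]

def score(state):
    # Placeholder task objective, e.g., distance walked.
    return sum(state)

def plan(state, horizon=5, num_candidates=100):
    """Plan by mental simulation alone: roll the self-model forward
    for each candidate action sequence, keep the best imagined outcome."""
    best_seq, best_score = None, float("-inf")
    for _ in range(num_candidates):
        seq = [[random.uniform(-1.0, 1.0) for _ in state]
               for _ in range(horizon)]
        imagined = state
        for action in seq:            # simulate, don't act
            imagined = self_model(imagined, action)
        outcome = score(imagined)
        if outcome > best_score:
            best_seq, best_score = seq, outcome
    return best_seq  # only now act in the physical world
```

The better the self-model, the less the robot must learn by physical trial and error, which is why improvements in prediction algorithms matter so much here.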

Perhaps, as deep-learning algorithms worm their way into all AI applications, we will begin to see new generations of robots that form increasingly accurate models of themselves and their surroundings, gradually approaching self-awareness.

A self-aware car will not meet you in the driveway and joke about the road conditions. Nor will it be genuinely interested in your feelings. But a self-aware car will have an increasingly accurate model of how it drives, and how you would like it to drive—what it can and can’t do, and what the risks and benefits of each of its possible actions are. And just as our own self-awareness extends beyond ourselves to ascribe feelings and intentions to others, a future driverless car may be able to predict what other cars on the road are likely to be planning.

Let’s estimate that visual perception appeared about 50 million years after multicellularity, and that Homo sapiens appeared 500 million years after that, all using the same “hardware” infrastructure. We can attempt to draw an analogy: if it took fifty years to get from the early blind robots of the 1950s to machine perception, perhaps human-level self-aware AI will take another 500 years. Hardware evolution will accelerate this trend, but algorithm evolution has to go through its punctuated fits and starts.

Whether 2020 or 2500, that’s just a blip in human evolution.

* * *

The human race has long been enamored of the quest to make life out of matter. Early alchemists tried and tested numerous ways to breathe life into clay. Mythical potions came and went, and over the years alchemists were replaced by their modern descendants, roboticists. Today, we roboticists have better tools, deeper understanding, and a little more funding. But ultimately, we are still trying to breathe life into inanimate machines.

Notes