The desire to create inanimate objects that behave in an intelligent way arguably stretches back to the beginnings of human evolution.1 Artificial intelligence is a modern manifestation of this innate human desire to create intelligently behaving artifacts, this time by means of the computer. Artificial intelligence’s goal is therefore to make computers solve problems we generally associate with human cognition and perception: for example, diagnosing a disease, navigating through unknown terrain, understanding human language, or recognizing a face in a crowd. Broadly speaking, there have been two schools of thought on how to emulate human intelligence in a computer, depending on how each school viewed the way the brain acquires knowledge about the world.
For the “symbolic” school of AI, knowledge is the result of logic and is therefore something that emerges by combining a description of the world (the “what,” or “declarative knowledge”) and a description of how to make inferences about the world (the “how,” or “prescriptive knowledge”). Perhaps the most successful systems to come out of this school are “expert systems.” A typical expert system for medical diagnosis would have a knowledge base containing descriptions of symptoms and rules that encapsulate how doctors reason on the basis of symptoms and other information to arrive at a diagnosis. Symbolic approaches to AI—the dominant school of AI until the early 1990s—hit a philosophical dead end called Polanyi’s paradox. Named after the philosopher Michael Polanyi, the paradox observes that much of human knowledge is tacit, that is, it escapes our consciousness and is therefore impossible to explicitly articulate, record, and code into a machine. We know things without really knowing how we know them. This paradox deals a serious blow to coding prescriptive knowledge in any symbolic form: hard coding a description of the incredibly complex real world is simply impossible. For example, a driverless car programmed using symbolic AI would soon run into Polanyi’s paradox, given the seemingly infinite possibilities and unpredictable situations one may encounter on the road. This realization undermined confidence in the symbolic approach and was the reason why AI went through its so-called winters, with funding, as well as interest, drying up. Symbolic approaches are still valuable in specific use cases but have largely relinquished the limelight to the second school of thought, which proposes an alternative “nonsymbolic,” or “connectionist,” approach.
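To make the symbolic approach concrete, here is a minimal sketch of the kind of rule-based reasoning an expert system performs; the symptoms, rules, and diagnoses are invented for illustration and are not drawn from any real medical system.

```python
# Minimal sketch of a symbolic "expert system": declarative knowledge (facts)
# plus prescriptive knowledge (if-then rules) combined by a simple inference step.
# The symptoms, rules, and diagnoses below are hypothetical illustrations.

# Declarative knowledge: the patient's observed symptoms.
observed_symptoms = {"fever", "cough", "fatigue"}

# Prescriptive knowledge: rules linking symptom patterns to candidate diagnoses.
rules = [
    ({"fever", "cough"}, "possible respiratory infection"),
    ({"fever", "rash"}, "possible measles"),
    ({"fatigue", "pale skin"}, "possible anaemia"),
]

def diagnose(symptoms, rules):
    """Fire every rule whose conditions are all present among the symptoms."""
    return [conclusion for conditions, conclusion in rules
            if conditions <= symptoms]

print(diagnose(observed_symptoms, rules))
# -> ['possible respiratory infection']
```

Everything such a system “knows” has to be written down explicitly, which is exactly where Polanyi’s paradox bites.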
Connectionism follows a biological approach to knowledge and tries to emulate the way the human brain functions at the level of neurons. This approach assumes that knowledge is something that has to be acquired by the machine itself rather than handcrafted, or hard coded, by a human programmer. Therefore, intelligent machines should learn by imitating the functioning of the brain. In 1957 Frank Rosenblatt put forward the “perceptron,” an artificial neuron that he later built in hardware using motors, dials, and light detectors. He successfully trained this electromechanical neuron to tell the difference between basic shapes. But the approach was severely limited in recognizing complex patterns, given the difficulty, at that time, of connecting many perceptrons together to build a sizable network. Connectionism’s time would have to wait fifty years for three important developments to take place.
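As a software analogue of Rosenblatt’s machine, a single artificial neuron can be trained with the classic perceptron learning rule; the toy data and parameters below are invented for the example.

```python
import numpy as np

# Toy illustration of a single perceptron learning a simple pattern
# (here: the logical AND of two binary inputs). Data and parameters are
# invented for the example; Rosenblatt's original perceptron was hardware.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # desired outputs

w = np.zeros(2)   # connection weights
b = 0.0           # bias (threshold)
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        prediction = 1 if xi @ w + b > 0 else 0
        error = target - prediction
        # Perceptron learning rule: nudge weights toward the correct output.
        w += lr * error * xi
        b += lr * error

print([1 if xi @ w + b > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]
```

A single neuron of this kind can only separate patterns with a straight line, which is why recognizing complex patterns had to wait for networks of many connected neurons.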
The first development came when Geoffrey Hinton and other researchers at the University of Toronto found a way for software neurons to teach themselves by layering their training. The first layer learns to distinguish basic features, while successive layers identify progressively more complex ones, in a technique we now call “deep learning.” Deep learning is one approach within a much larger framework of techniques and algorithms for making computers learn about the world without explicitly programming them; this larger framework is called “machine learning.” The second development in connectionist AI owes a lot to the massive volumes of data that have become available as billions of people interact with digital systems and devices, and to the belief that valuable knowledge can be mined from that big data. Machine learning is the technology that can deliver powerful new solutions for making sense of massive and varied data sets. Data are absolutely necessary for training deep neural networks, and more data generally makes for better learning. For example, in 2017 Google made publicly available a data set for training voice recognition systems that would distinguish between different accents. For engineers to train a typical deep neural network to distinguish merely 60 words, more than 30,000 audio clips are needed. The third breakthrough took place when a team at Stanford led by Andrew Ng realized that graphics processing unit chips—called GPUs—originally invented for rendering graphics in video games, could be repurposed for training deep learning systems. To accelerate the rendering of graphics, GPUs perform parallel processing; that is, they split computations across many processing units and recombine the results. By exploiting the parallelism of GPUs, Ng and his colleagues showed that deep learning networks could learn in a day what used to take several weeks.
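A deliberately simplified sketch of the layering idea follows: each layer transforms the output of the layer below, so early layers can detect simple features and later layers can combine them into more abstract ones. The layer sizes are arbitrary and the weights are random placeholders; in a real system they would be learned from data, typically on GPUs.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    """One fully connected layer with a nonlinearity. The weights are random
    placeholders here; in practice they are learned during training."""
    W = rng.normal(size=(x.shape[-1], n_out))
    return np.maximum(0, x @ W)  # ReLU nonlinearity

x = rng.normal(size=(1, 784))            # e.g. a flattened 28x28 image
h1 = layer(x, 128)                       # first layer: simple features
h2 = layer(h1, 64)                       # second layer: combinations of features
scores = h2 @ rng.normal(size=(64, 10))  # final layer: scores for 10 classes
print(scores.shape)                      # -> (1, 10)
```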
Nowadays, there are neural networks made up of millions of software neurons with billions of interconnections, running on thousands of GPUs. Generally speaking, these networks learn in three different ways. The most common way is called “supervised learning,” whereby training data sets are “labeled” by humans; that is, the machine learns by being “told” what its output should be for a given input. For example, you can train a machine to recognize cats by showing it thousands of pictures of various animals and letting it know, each time a cat appears, that this is a cat. Machines also learn through “unsupervised” training, whereby there is no a priori knowledge of how the various data are correlated. In this case the machine discovers correlations in the data all by itself, usually by clustering together data points that share many common features. Third, there is “reinforcement” learning, which was impressively used by DeepMind, a Google company that is one of the global leaders in AI research, to build AlphaGo and AlphaGo Zero. In this case the machine is given a goal (for instance, “maximize the score in a game”) and is left to figure out a strategy for achieving it (and thus win a contest).
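The three ways of learning can be illustrated side by side with toy data; the examples below are minimal sketches invented for this purpose, not excerpts from any production system.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Supervised learning: inputs come with human-provided labels. ---
# Toy 1-D "features" for two classes (data invented for the example).
X = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
y = np.array([0] * 50 + [1] * 50)            # labels supplied by a human
means = np.array([X[y == 0].mean(), X[y == 1].mean()])
print(int(np.abs(4.8 - means).argmin()))     # classify a new point -> 1

# --- Unsupervised learning: no labels; the machine finds structure itself. ---
# A few k-means-style steps on the same data, ignoring the labels entirely.
centers = np.array([X.min(), X.max()])
for _ in range(5):
    assign = np.abs(X[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([X[assign == k].mean() for k in (0, 1)])
print(centers.round(1))                      # two cluster centres emerge

# --- Reinforcement learning: no labels, only a goal and a reward signal. ---
# A two-armed bandit; the agent learns by trial and error which action pays.
true_payoffs = [0.2, 0.8]
estimates, counts = np.zeros(2), np.zeros(2)
for _ in range(1000):
    a = rng.integers(2) if rng.random() < 0.1 else int(estimates.argmax())
    reward = float(rng.random() < true_payoffs[a])
    counts[a] += 1
    estimates[a] += (reward - estimates[a]) / counts[a]   # running average
print(int(estimates.argmax()))               # -> 1 (the higher-paying action)
```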
Hardware, as well as software, is used to improve the accuracy and efficacy of AI systems and to broaden the range of their applications. Google, for example, has developed special-purpose chips called Tensor Processing Units that accelerate the training and deployment of AI applications built on its open-source AI framework, TensorFlow. Much AI processing is nowadays pushed to the “edge,” that is, to the devices that collect data in the field; think of sensors in factories or cities, smartphones and wearables, and just about anything that will be connected in the “Internet of Things.” When these devices also collaborate to train a shared model without sending their raw data to a central server, the approach is called “federated learning,” and it has several advantages. First, it respects user privacy, since the machine learning crunches personal data on the device and no user data need to travel to a distant central server. Second, it provides more personalization and lower latency; the AI on the edge responds much faster, even when the device is not connected to the Internet. Federated learning is expected to become the norm for most consumer-oriented applications as more powerful AI-specific chips become available and are embedded in smartphones and remote devices.
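A stripped-down sketch of the federated idea, under the simplifying assumption of a one-parameter model: each device improves its local copy of the model on its own data, and only the model updates, never the raw data, travel to the server for averaging.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy federated averaging: fit y ~ w * x with gradient steps computed
# on each device, then average only the model updates on a server.
# The data and the one-parameter model are placeholders for illustration.
true_w = 3.0
devices = []
for _ in range(5):
    x = rng.normal(size=20)
    y = true_w * x + rng.normal(0.0, 0.1, size=20)
    devices.append((x, y))        # each device keeps its data locally

global_w = 0.0
for _ in range(10):               # communication rounds
    local_ws = []
    for x, y in devices:
        w = global_w
        for _ in range(5):                        # a few local gradient steps
            grad = 2 * np.mean((w * x - y) * x)   # d/dw of mean squared error
            w -= 0.1 * grad
        local_ws.append(w)                        # only the weight is shared
    global_w = float(np.mean(local_ws))           # the server averages updates

print(round(global_w, 2))                         # -> close to 3.0
```

This is what allows personal data to stay on the phone while the shared model still benefits from everyone’s usage.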
Impressive as the developments in connectionist AI may be, we are still at the beginning. Current AI systems are “narrow” in the sense that they can be trained to solve only specific problems in specific domains. For example, AlphaGo can win a game of Go but is completely useless at everything else. Moreover, current approaches in deep learning essentially emulate the pattern recognition capabilities of the brain. Human intelligence is much broader than that. We possess memory and common sense, which we use to make sense of, and act in, a continuously evolving and uncertain environment. We also have feelings and emotions that drive our actions and our interpersonal relations. Current AI systems are nowhere near that level. Nevertheless, the quest for intelligent systems with capabilities comparable to, or even surpassing, those of human brains continues and, indeed, has intensified.
In the summer of 2019 Microsoft announced an investment of US$1 billion in OpenAI, a research group originally founded by Elon Musk, Sam Altman, and others out of concern that AI may become a threat to humanity. The goal of the investment is to test the hypothesis that a neural network close to the size of a human brain, running on Microsoft’s Azure cloud infrastructure, can be trained to be an artificial general intelligence (AGI).2 Google, through its acquisition of DeepMind, is also pursuing human-level AI. Both companies treat AGI as a problem that requires orders of magnitude more computing power, hoping that by brute computing force their systems will crunch through such a wide variety of data sets that they acquire general capabilities across many fields.
An alternative approach to AGI looks into advancing biologically inspired hardware, such as “neuromorphic” chips that behave like interconnected groups of neurons.3 Such chips allow for an approach to AGI that is based not on mathematically manipulating weights in neural network algorithms but on electrical spikes similar to the actual dynamics of neurons in the human brain. The most developed experiment in using neuromorphic computing to emulate intelligence is currently taking place at the University of Manchester in the United Kingdom. Researchers on the SpiNNaker project4 have developed a massively parallel neuromorphic computer that can simulate a billion interconnected neurons. Their ambition is to test the machine in applications ranging from robotics to image recognition and gradually scale it toward interconnecting a hundred billion neurons, roughly the number in a typical human brain. Hybrid approaches, in which neuromorphic chips are also capable of training neural networks, are being pursued as well.5
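To give a flavour of spike-based computation (this is a generic textbook model, not SpiNNaker’s actual software), here is a minimal leaky integrate-and-fire neuron: its membrane potential integrates incoming current, leaks back toward rest, and emits a discrete spike whenever it crosses a threshold.

```python
import numpy as np

# Minimal leaky integrate-and-fire neuron: information is carried by the
# timing of discrete spikes rather than by continuous weight arithmetic.
# Parameters and input are illustrative, not from any real neuromorphic chip.
dt, tau, v_rest, v_thresh, v_reset = 1.0, 20.0, 0.0, 1.0, 0.0

rng = np.random.default_rng(0)
input_current = rng.uniform(0.0, 0.12, size=200)  # noisy input drive

v = v_rest
spike_times = []
for t, current in enumerate(input_current):
    # Membrane potential leaks toward rest and integrates the input current.
    v += dt * (-(v - v_rest) / tau + current)
    if v >= v_thresh:          # threshold crossed: emit a spike and reset
        spike_times.append(t)
        v = v_reset

print(spike_times)             # the neuron's output is this train of spikes
```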
And while researchers are trying to crack AGI using our existing understanding of brain function, much attention is also directed toward learning from new developments and breakthroughs in neuroscience. Perhaps one of the most interesting ideas from this field is the “free energy principle” developed by the British neuroscientist and brain imaging expert Karl Friston.6 One of the key observations behind Friston’s thinking is that brains consume much less power, and dissipate much less heat, than electronic computers performing similar tasks. In other words, biological brains have a way of reducing entropy and, just like the rest of life, they self-organize in the most energy-efficient way.
Inspired by this realization, Friston suggested that the reduction in entropy occurs because living things have a universal mechanism that constantly reduces the gap between their expectations and the information coming through their sensory inputs. That gap is what Friston calls “free energy.” In effect, Friston is telling us that intelligence is the minimization of surprise! There is a mathematical way to describe this mechanism, based on a construct called a “Markov blanket.” A Markov blanket is essentially a statistical “shield”: the set of variables that separates a system’s internal states from everything outside, so that inside and outside influence each other only through the blanket. Friston suggested that the universe is made up of Markov blankets inside other Markov blankets, like an endless series of Russian nesting dolls.7 His ideas are becoming increasingly influential in the machine-learning community because they provide a general theory of intelligence that is mathematical and testable in a machine. Moreover, Friston’s theory accounts not only for knowledge processing but for action too. Here is how the theory explains why we act: if, say, my expectation is that I should be scratching my nose and my sensory input tells me that my hand is doing something else, I will minimize the free energy gap by moving my hand and scratching my nose. By explaining action, Friston provides a new way of thinking about the design of autonomous robots and brings us closer to achieving AGI.
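For readers who want the formal core of the idea, the variational free energy that Friston minimizes can be written in its standard textbook form (this is the generic formulation, not a derivation from any specific paper of his):

\[
F(o, q) \;=\; \underbrace{\mathrm{KL}\big[\, q(s) \,\|\, p(s \mid o) \,\big]}_{\text{gap between belief and reality}} \;+\; \underbrace{\big(-\log p(o)\big)}_{\text{surprise}} \;\;\geq\;\; -\log p(o).
\]

Here \(o\) stands for sensory observations, \(s\) for the hidden states of the world on the far side of the Markov blanket, \(p\) for the organism’s internal model of how those states produce observations, and \(q(s)\) for its current belief about them. Because the first term can never be negative, free energy is an upper bound on surprise: an organism can lower it either by updating its beliefs \(q\) (perception) or by acting to change its observations \(o\) (action), which is exactly the nose-scratching logic above.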
Artificial intelligence does not need to reach the AGI level in order to have a profound impact on human civilization. The current state of narrow deep learning is perfectly capable of performing complex tasks of perception—for example, image and voice recognition—as well as cognition.8 Having machines capable of perception and cognition opens up boundless opportunities for innovation.9 More importantly, AI can take over and automate many human tasks that require perception and, to a certain extent, cognition. Because of that, AI is already impacting the workplace, and—most significantly—white-collar jobs.10 As an example, AI trading algorithms have almost obliterated Goldman Sachs’s Equity Trading desk, which was once run by 500 people and is now run by just three.11 Meanwhile, Goldman Sachs now employs 9,000 engineers and data scientists and is investing heavily in machine learning.
The example of Goldman Sachs illustrates the two opposing effects of technological disruption in the economy. Some workers are “displaced” by the new technology and lose their jobs to the new automation, as the trading desk workers did. But new work is also created, work that involves higher-value tasks and is better compensated. As those higher-value workers—the data scientists and engineers in the Goldman Sachs example—are more affluent, their increased disposable income flows back into the economy and “compensates” those who lost their jobs by creating demand for new services.
This interplay between the “displacement” and “compensation” effects has been observed in past industrial revolutions as well. Take, for example, the invention and popularization of the automobile in the early twentieth century. Until then, many people used horses for land transportation, and there was a plethora of jobs that supported the owning, renting, maintenance, stabling, and use of horses. All those jobs were lost when people shifted to owning and using cars. Nevertheless, the “automation” of horses created new needs and new opportunities for work, in car manufacturing as well as in servicing, driving, parking, washing, and selling cars. Moreover, the car created completely new opportunities, for example, for extending cities into suburbs and building new homes and infrastructure, with a resulting multiplier compensation effect rippling throughout the economy of the twentieth century. By “automating” the horse, the car “augmented” most humans to become more productive.
As we look into how AI will impact different categories of workers during the Fourth Industrial Revolution, the debate centers on the tension between “automation” and “augmentation” and on how the displacement and compensation effects will play out. There will certainly be winners and losers. Take, for example, Google Translate, the online AI-powered translation service offered by Google for free. The system automates the work of human translators, at zero cost, making them virtually obsolete.12 At the same time, it helps immigrants and refugees arriving in new countries learn the language quickly and integrate faster. By “automating” human translators, Google Translate “augments” human immigrants. But how balanced is this equation? In the Fourth Industrial Revolution, will human augmentation deliver enough of a “compensation effect” to overcome the “displacement effect”?
Although much of current economic research is generally upbeat about AI delivering handsome dividends to the workforce by increasing human productivity,13 it fails to factor in further advances in AI technologies, including the possibility of achieving AGI in the next decade or so. As many high-level human skills become automated because of advances in AI and robotics, companies will have a financial incentive to shed even more jobs in order to reduce production costs and increase operational efficiency. By simple extrapolation, if AI advances linearly—let alone exponentially—it will become increasingly harder to find a decently paying full-time job14 in the next ten to fifteen years. The greatest innovation of capitalism to date—the automation of the human intellect via AI—is replacing human workers with intelligent machines. With work being the fundamental bedrock of any modern economy, can liberal democracies survive in a world without work?