I’ll never forget the moment AI became real for me. Not a talking point or an engineering ambition, but a reality.
It happened in DeepMind’s first office in London’s Bloomsbury one day in 2012. After founding the company and securing initial funding, we spent a few years in stealth mode, focusing on the research and engineering of building AGI, or artificial general intelligence. The “general” in AGI refers to the technology’s intended broad scope; we wanted to build truly general learning agents that could exceed human performance at most cognitive tasks. Our quiet approach shifted with the creation of an algorithm called DQN, short for Deep Q-Network. Members of the team trained DQN to play a raft of classic Atari games, or, more specifically, we trained it to learn how to play the games by itself. This self-learning element was the key distinction of our system compared with previous efforts and represented the first hint that we might achieve our ultimate goal.
At first, DQN was terrible, seemingly unable to learn anything at all. But then, that afternoon in the fall of 2012, a small group of us at DeepMind were huddled around a machine watching replays of the algorithm’s training process as it learned the game Breakout. In Breakout the player controls a paddle at the bottom of the screen, which bounces a ball up and down to knock out rows of colored bricks. The more bricks you destroy, the higher your score. Our team had given DQN nothing more than the raw pixels, frame by frame, and the score, in order to learn a relationship between the pixels and the control actions of moving the paddle left and right. At first the algorithm progressed by randomly exploring the space of possibilities until it stumbled upon a rewarding action. Through trial and error, it learned to control the paddle, bounce the ball back and forth, and knock out bricks row by row. Impressive stuff.
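For readers who want a feel for the mechanics, here is a minimal sketch of that trial-and-error recipe, known as reinforcement learning. It is illustrative only: the hypothetical env object stands in for the game, and where this toy keeps a simple lookup table of values, the real DQN used a deep neural network reading raw pixels, plus techniques such as experience replay.

```python
# Minimal sketch of the trial-and-error loop behind DQN-style learning.
# Assumes a hypothetical 'env' with reset() -> state and step(action) -> (state, reward, done).
import random
from collections import defaultdict

ACTIONS = ["left", "right", "stay"]       # paddle controls
Q = defaultdict(float)                     # estimated value of each (state, action) pair
alpha, gamma, epsilon = 0.1, 0.99, 0.1     # learning rate, discount, exploration rate

def choose_action(state):
    # Explore randomly some of the time; otherwise exploit current estimates.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def train(env, episodes=1000):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = choose_action(state)
            next_state, reward, done = env.step(action)
            # Temporal-difference update: nudge the estimate toward reward plus best future value.
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
```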
Then something remarkable happened. DQN appeared to discover a new, and very clever, strategy. Instead of simply knocking out bricks steadily, row by row, DQN began targeting a single column of bricks. The result was the creation of an efficient route up to the back of the block of bricks. DQN had tunneled all the way to the top, creating a path that then enabled the ball to simply bounce off the back wall, steadily destroying the entire set of bricks like a frenzied ball in a pinball machine. The method earned the maximum score with minimum effort. It was an uncanny tactic, not unknown to serious gamers, but far from obvious. We had watched as the algorithm taught itself something new. I was stunned.
For the first time I’d witnessed a very simple, very elegant system that could learn valuable knowledge, arguably a strategy that wasn’t obvious to many humans. It was an electrifying moment, a breakthrough in which an AI agent demonstrated an early indication that it could discover new knowledge.
DQN had gotten off to a rough start, but with a few months of tinkering the algorithm reached superhuman levels of performance. This kind of outcome was the reason we had started DeepMind. This was the promise of AI. If an AI could discover a clever strategy like tunneling, what else could it learn? Could we harness this new power to equip our species with new knowledge, inventions, and technologies to help tackle the most challenging social problems of the twenty-first century?
DQN was a big step for me, for DeepMind, and for the AI community. But the public response was pretty muted. AI was still a fringe discussion, a research area on the margins. And yet within a few short years, all that would change as this new generation of AI techniques exploded onto the world stage.
Go is an ancient East Asian game played on a nineteen-by-nineteen grid with black and white stones. You aim to surround your opponent’s stones with your own, and once they’re surrounded, you take them off the board. That’s pretty much it.
Despite its simple rules, Go’s complexity is staggering. It is exponentially more complex than chess. After just three pairs of moves in chess there are about 121 million possible configurations of the board. But after three pairs of moves in Go, there are on the order of 200 quadrillion (2 × 10¹⁵) possible configurations. In total, the board has 10¹⁷⁰ possible configurations, a mind-bogglingly large number.
It’s often said that there are more potential configurations of a Go board than there are atoms in the known universe; roughly 10⁹⁰ times more configurations, in fact! With so many possibilities, traditional approaches stood no chance. When IBM’s Deep Blue beat Garry Kasparov at chess in 1997, it used the so-called brute-force technique, where an algorithm aims to systematically crunch through as many possible moves as it can. That approach is hopeless in a game with as many branching outcomes as Go.
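Those branching numbers are easy to sanity-check with a back-of-the-envelope calculation, counting move sequences on an empty nineteen-by-nineteen board (an approximation that ignores captures and repeated positions):

```python
# Rough check on the Go figure above: count the ways to play the first
# three moves per player on an empty 19x19 board (361 intersections).
# This counts move sequences, an approximation that ignores captures.
sequences = 1
for empty_points in range(361, 361 - 6, -1):
    sequences *= empty_points
print(f"{sequences:.2e}")   # ~2.12e+15, on the order of 200 quadrillion
```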
When we started work on Go in 2015, most people thought a world champion program was decades away. Google’s co-founder Sergey Brin encouraged us to tackle it, arguing that any progress would be impressive enough. AlphaGo initially learned by watching 150,000 games played by human experts. Once we were satisfied with its initial performance, the key next step was creating lots of copies of AlphaGo and getting it to play against itself over and over. This meant the algorithm was able to simulate millions of new games, trying out combinations of moves that had never been played before, and therefore efficiently explore a huge range of possibilities, learning new strategies in the process.
Then, in March 2016, we organized a tournament in South Korea. AlphaGo was pitted against Lee Sedol, a virtuoso world champion. It was far from clear who would win. Most commentators backed Sedol going into round one. But AlphaGo won the first game, much to our shock and delight. In the second game came move number 37, a move now famous in the annals of both AI and Go. It made no sense. AlphaGo had apparently blown it, blindly following a losing strategy no professional player would ever pursue. The live match commentators, both professionals of the highest ranking, said it was a “very strange move” and thought it was “a mistake.” It was so unusual that Sedol took fifteen minutes to respond and even got up from the board to take a walk outside.
As we watched from our control room, the tension was unreal. Yet as the endgame approached, that “mistaken” move proved pivotal. AlphaGo won again. Go strategy was being rewritten before our eyes. Our AI had uncovered ideas that hadn’t occurred to the most brilliant players in thousands of years. In just a few months, we could train algorithms to discover new knowledge and find new, seemingly superhuman insights. How could we take that further? Would this method work for real-world problems?
AlphaGo went on to beat Sedol 4–1. It was only the beginning. Later versions of the software like AlphaZero dispensed with any prior human knowledge. The system simply trained on its own, playing itself millions of times over, learning from scratch to reach a level of performance that trounced the original AlphaGo without any of the received wisdom or input of human players. In other words, with just a day’s training, AlphaZero was capable of learning more about the game than the entirety of human experience could teach it.
AlphaGo’s triumph heralded a new age of AI. This time, unlike with DQN, the proceedings had been broadcast live to millions. Our team had, in full view of the public, emerged from what researchers had called “the AI winter,” when research funding dried up and the field was shunned. AI was back, and finally starting to deliver. Sweeping technological change was, once again, on its way, a new wave starting to appear. And this was only the beginning.
Until recently, the history of technology could be encapsulated in a single phrase: humanity’s quest to manipulate atoms. From fire to electricity, stone tools to machine tools, hydrocarbons to medicines, the journey described in chapter 2 is essentially a vast, unfolding process in which our species has slowly extended its control over atoms. As this control has become more precise, technologies have steadily become more powerful and complex, giving rise to machine tools, electrical processes, heat engines, synthetic materials like plastics, and the creation of intricate molecules capable of defeating dreaded diseases. At root, the primary driver of all of these new technologies is material—the ever-growing manipulation of their atomic elements.
Then, starting in the mid-twentieth century, technology began to operate at a higher level of abstraction. At the heart of this shift was the realization that information is a core property of the universe. It can be encoded in a binary format and is, in the form of DNA, at the core of how life operates. Strings of ones and zeros, or the base pairs of DNA—these are not just mathematical curiosities. They are foundational and powerful. Understand and control these streams of information and you might steadily open a new world of possibility. First bits and then increasingly genes supplanted atoms as the building blocks of invention.
In the decades after World War II, scientists, technologists, and entrepreneurs founded the fields of computer science and genetics, and a host of companies associated with both. They began parallel revolutions—those of bits and genes—that dealt in the currency of information, working at new levels of abstraction and complexity. Eventually, the technologies matured and gave us everything from smartphones to genetically modified rice. But there were limits to what we could do.
Those limits are now being breached. We are approaching an inflection point with the arrival of these higher-order technologies, the most profound in history. The coming wave of technology is built primarily on two general-purpose technologies capable of operating at the grandest and most granular levels alike: artificial intelligence and synthetic biology. For the first time core components of our technological ecosystem directly address two foundational properties of our world: intelligence and life. In other words, technology is undergoing a phase transition. No longer simply a tool, it’s going to engineer life and rival—and surpass—our own intelligence.
Realms previously closed to technology are opening. AI is enabling us to replicate speech and language, vision and reasoning. Foundational breakthroughs in synthetic biology have enabled us to sequence, modify, and now print DNA.
Our new powers to control bits and genes feed back into the material, allowing extraordinary control of the world around us even down to the atomic level. Atoms, bits, and genes conjoin in a fizzing cycle of cross-catalyzing, cross-cutting, and expanding capability. Our ability to manipulate atoms with precision enabled the invention of silicon wafers, which enabled the computation of trillions of operations per second, which in turn enabled us to decipher the code of life.
While AI and synthetic biology are the coming wave’s central general-purpose technologies, a bundle of technologies with unusually powerful ramifications surrounds them, encompassing quantum computing, robotics, nanotechnology, and the potential for abundant energy, among others.
The coming wave will be more difficult to contain than any in history, more fundamental, more far-reaching. Understanding the wave and its contours is critical to assessing what awaits us in the twenty-first century.
Technology is a set of evolving ideas. New technologies evolve by colliding and combining with other technologies. Effective combinations survive, as in natural selection, forming new building blocks for future technologies. Invention is a cumulative, compounding process. It feeds on itself. The more technologies there are, the more they can in turn become components of other new technologies so that, in the words of the economist W. Brian Arthur, “the overall collection of technologies bootstraps itself upward from the few to the many and from the simple to the complex.” Technology is hence like a language or chemistry: not a set of independent entities and practices, but a commingling set of parts to combine and recombine.
This is key to understanding the coming wave. The technology scholar Everett Rogers talks about technology as “clusters of innovations” where one or more features are closely interrelated. The coming wave is a supercluster, an evolutionary burst like the Cambrian explosion, the most intense eruption of new species in the earth’s history, with many thousands of potential new applications. Each technology described here intersects with, buttresses, and boosts the others in ways that make it difficult to predict their impact in advance. They are all deeply entangled and will grow more so.
Another trait of the new wave is speed. The engineer and futurist Ray Kurzweil talks about the “law of accelerating returns,” feedback loops where advances in technology further increase the pace of development. By allowing work at greater levels of complexity and precision, more sophisticated chips and lasers help create more powerful chips, for example, which in turn can produce better tools for further chips. We see this now on a large scale, with AI helping design better chips and production techniques that enable more sophisticated forms of AI and so on. Different parts of the wave spark and accelerate one another, sometimes with extreme unpredictability and combustibility.
We cannot know exactly what combinations will result. There is no certainty regarding timelines, or end points, or specific manifestations. We can, however, see fascinating new links forming in real time. And we can be confident that the pattern of history, of technology, of an endless process of productive recombination and proliferation, will continue, but also radically deepen.
AI, synthetic biology, robotics, and quantum computing can sound like a parade of overhyped buzzwords. Skeptics abound. All of these terms have been batted around popular tech discourse for decades. And progress has often been slower than advertised. Critics argue that the concepts we explore in this chapter, like AGI, are too poorly defined or intellectually misguided to consider seriously.
In the era of abundant venture capital, distinguishing shiny objects from genuine breakthroughs is not so straightforward. Talk of machine learning, crypto booms, and million- and billion-dollar funding rounds is, understandably, met with an eye roll and a sigh in many circles. It’s easy to grow weary of the breathless press releases, the self-congratulatory product demos, the frenzied cheerleading on social media.
While the bearish case has merits, we write off the technologies in the coming wave at our own peril. Right now none of the technologies described in this chapter are even close to their full potential. But in five, ten, or twenty years, they almost certainly will be. Progress is visible and accelerating. It’s happening month by month. Nonetheless, understanding the coming wave is not about making a snap judgment about where things will be this or that year; it is about closely tracking the development of multiple exponential curves over decades, projecting them into the future, and asking what that means.
Technology is core to the historical pattern in which our species is gaining increasing mastery of atoms, bits, and genes, the universal building blocks of the world as we know it. This will amount to a moment of cosmic significance. The challenge of managing the coming wave’s technologies means understanding them and taking them seriously, starting with the one I have spent my career working on: AI.
AI is at the center of this coming wave. And yet, since the term “artificial intelligence” first entered the lexicon in 1955, it has often felt like a distant promise. For years progress in computer vision, for example—the challenge of building computers that can recognize objects or scenes—was slower than expected. The legendary computer science professor Marvin Minsky famously hired a summer student to work on an early vision system in 1966, thinking that significant milestones were just within reach. That was wildly optimistic.
The breakthrough moment took nearly half a century, finally arriving in 2012 in the form of a system called AlexNet. AlexNet was powered by the resurgence of an old technique that has now become fundamental to AI, one that has supercharged the field and was integral to us at DeepMind: deep learning.
Deep learning uses neural networks loosely modeled on those of the human brain. In simple terms, these systems “learn” when their networks are “trained” on large amounts of data. In the case of AlexNet, the training data consisted of images. Each red, green, or blue pixel is given a value, and the resulting array of numbers is fed into the network as an input. Within the network, “neurons” link to other neurons by a series of weighted connections, each of which roughly corresponds to the strength of the relationship between inputs. Each layer in the neural network feeds its input down to the next layer, creating increasingly abstract representations.
A technique called backpropagation then adjusts the weights to improve the neural network; when an error is spotted, adjustments propagate back through the network to help correct it in the future. Keep doing this, modifying the weights again and again, and you gradually improve the performance of the neural network so that eventually it’s able to go all the way from taking in single pixels to learning the existence of lines, edges, shapes, and then ultimately entire objects in scenes. This, in a nutshell, is deep learning. And this remarkable technique, long derided in the field, cracked computer vision and took the AI world by storm.
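To make that loop concrete, here is a minimal sketch of the forward-and-backward process: a tiny two-layer network trained with backpropagation on made-up data. It is illustrative only; a real vision model like AlexNet stacks many convolutional layers and trains on GPUs over millions of images.

```python
# Minimal sketch of deep learning's core loop: forward pass, error, backpropagation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((256, 784))                                 # toy "images": 256 examples of 784 pixel values
y = (X.mean(axis=1) > 0.5).astype(float).reshape(-1, 1)    # toy labels to learn

W1 = rng.normal(0, 0.1, (784, 32))                         # weighted connections, layer 1
W2 = rng.normal(0, 0.1, (32, 1))                           # weighted connections, layer 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(500):
    # Forward pass: each layer feeds increasingly abstract features to the next.
    h = np.tanh(X @ W1)
    pred = sigmoid(h @ W2)

    # Error between prediction and label.
    error = pred - y

    # Backpropagation: push the error back through the network, layer by layer.
    delta2 = error * pred * (1 - pred)
    grad_W2 = h.T @ delta2
    grad_h = delta2 @ W2.T
    grad_W1 = X.T @ (grad_h * (1 - h**2))

    # Adjust the weights a little; repeat many times and performance improves.
    W1 -= 0.05 * grad_W1 / len(X)
    W2 -= 0.05 * grad_W2 / len(X)
```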
AlexNet was built by the legendary researcher Geoffrey Hinton and two of his students, Alex Krizhevsky and Ilya Sutskever, at the University of Toronto. They entered the ImageNet Large Scale Visual Recognition Challenge, an annual competition designed by the Stanford professor Fei-Fei Li to focus the field’s efforts around a simple goal: identifying the primary object in an image. Each year competing teams would test their best models against one another, often beating the previous year’s submissions by no more than a single percentage point in accuracy.
In 2012, AlexNet beat the previous winner by some 10 percentage points. It may sound like a small improvement, but to AI researchers this kind of leap forward can make the difference between a toylike research demo and a breakthrough on the cusp of enormous real-world impact. The event that year was awash with excitement. The resulting paper by Hinton and his colleagues became one of the most frequently cited works in the history of AI research.
Thanks to deep learning, computer vision is now everywhere, working so well it can classify dynamic real-world street scenes with visual input equivalent to twenty-one full-HD screens, or about 2.5 billion pixels per second, accurately enough to weave an SUV through busy city streets. Your smartphone recognizes objects and scenes, while vision systems automatically blur the background and highlight people in your videoconference calls. Computer vision is the basis of Amazon’s checkout-less supermarkets and is present in Tesla’s cars, pushing them toward increasing autonomy. It helps the visually impaired navigate cities, guides robots in factories, and powers the facial recognition systems that increasingly monitor urban life from Baltimore to Beijing. It’s in the sensors and cameras on your Xbox, your connected doorbell, and the scanner at the airport gate. It helps fly drones, flags inappropriate content on Facebook, and diagnoses a growing list of medical conditions: at DeepMind, one system my team developed read eye scans as accurately as world-leading expert doctors.
Following the AlexNet breakthrough, AI suddenly became a major priority in academia, government, and corporate life. Geoffrey Hinton and his colleagues were hired by Google. Major tech companies in both the United States and China put machine learning at the heart of their R&D efforts. Shortly after DQN, we sold DeepMind to Google, and the tech giant soon switched to a strategy of “AI first” across all its products.
Industry research output and patents soared. In 1987 just ninety academic papers were published at Neural Information Processing Systems, the conference that would become the field’s leading venue. By the 2020s there were almost two thousand. Over the last six years there has been a sixfold increase in the number of papers published on deep learning alone, tenfold if you widen the view to machine learning as a whole. With the blossoming of deep learning, billions of dollars poured into AI research at academic institutions and private and public companies. Starting in the 2010s, the buzz, indeed the hype, around AI was back, stronger than ever, making headlines and pushing the frontiers of what’s possible. That AI will play a major part in the twenty-first century no longer seems like a fringe and absurd view; it seems assured.
Mass-scale AI rollout is already well underway. Everywhere you look, software has eaten the world, opening the path for collecting and analyzing vast amounts of data. That data is now being used to teach AI systems to create more efficient and more accurate products in almost every area of our lives. AI is becoming much easier to access and use: tools and infrastructure like Meta’s PyTorch or OpenAI’s application programming interfaces (APIs) help put state-of-the-art machine learning capabilities in the hands of nonspecialists. 5G and ubiquitous connectivity create a massive, always-on user base.
Steadily, then, AI is leaving the realm of demos and entering the real world. Within a few years AIs will be able to talk about, reason over, and even act in the same world that we do. Their sensory systems will be as good as ours. This does not equate to superintelligence (more on that below), but it does make for incredibly powerful systems. It means that AI will become inextricably part of the social fabric.
Much of my professional work over the last decade has been about translating the latest AI techniques into practical applications. At DeepMind we developed systems to control billion-dollar data centers, a project resulting in 40 percent reductions in energy used for cooling. Our WaveNet project was a powerful text-to-speech system able to generate synthetic voices in more than a hundred languages across the Google product ecosystem. We built groundbreaking algorithms for managing phone battery life and for many of the apps that may well be running on the phone in your pocket right now.
AI really isn’t “emerging” anymore. It’s in products, services, and devices you use every day. Across all areas of life, a raft of applications rely on techniques that a decade ago were impossible. These help discover new drugs for tackling intractable diseases at a time when the cost of treating them is spiraling. Deep learning can detect cracks in water pipes, manage traffic flow, model fusion reactions for a new source of clean energy, optimize shipping routes, and aid in the design of more sustainable and versatile building materials. It’s being used to drive cars, trucks, and tractors, potentially creating a safer and more efficient transportation infrastructure. It’s used in electrical grids and water systems to efficiently manage scarce resources at a time of growing stress.
AI systems run retail warehouses, suggest how to write emails or what songs you might like, detect fraud, write stories, diagnose rare conditions, and simulate the impact of climate change. They feature in shops, schools, hospitals, offices, courts, and homes. You already interact many times a day with AI; soon it will be many more, and almost everywhere it will make experiences more efficient, faster, more useful, and frictionless.
AI is already here. But it’s far from done.
It wasn’t long ago that processing natural language seemed too complex, too varied, too nuanced for modern AI. Then, in November 2022, the AI research company OpenAI released ChatGPT. Within a week it had more than a million users and was being talked about in rapturous terms, a technology so seamlessly useful it might eclipse Google Search in short order.
ChatGPT is, in simple terms, a chatbot. But it is so much more powerful and polymathic than anything that had previously been made public. Ask it a question and it replies instantaneously in fluent prose. Ask it to write an essay, a press release, or a business plan in the style of the King James Bible or a 1980s rapper, and it does so in seconds. Ask it to write the syllabus for a physics course, a dieting manual, or a Python script, and it will.
A big part of what makes humans intelligent is that we look at the past to predict what might happen in the future. In this sense intelligence can be understood as the ability to generate a range of plausible scenarios about how the world around you may unfold and then base sensible actions on those predictions. Back in 2017 a small group of researchers at Google was focused on a narrower version of this problem: how to get an AI system to focus only on the most important parts of a data series in order to make accurate and efficient predictions about what comes next. Their work laid the foundation for what has been nothing short of a revolution in the field of large language models (LLMs)—including ChatGPT.
LLMs take advantage of the fact that language data comes in a sequential order. Each unit of information is in some way related to data earlier in a series. The model reads very large numbers of sentences, learns an abstract representation of the information contained within them, and then, based on this, generates a prediction about what should come next. The challenge lies in designing an algorithm that “knows where to look” for signals in a given sentence. What are the key words, the most salient elements of a sentence, and how do they relate to one another? In AI this notion is commonly referred to as “attention.”
When a large language model ingests a sentence, it constructs what can be thought of as an “attention map.” It first organizes commonly occurring groups of letters or punctuation into “tokens,” something like syllables, but really just chunks of frequently occurring characters that make it easier for the model to process the information. Humans do this with words, of course, but the model doesn’t use our vocabulary. Instead, it creates a new vocabulary of common tokens that helps it spot patterns across billions and billions of documents. In the attention map, every token bears some relationship to every token before it, and for a given input sentence the strength of this relationship describes something about the importance of that token in the sentence. In effect, the LLM learns which words to pay attention to.
So if you take the sentence “There is going to be a fairly major storm tomorrow in Brazil,” the model would likely create tokens for the letters “the” in the word “there” and “ing” in the word “going,” since they commonly occur in other words. When parsing the full sentence, it would learn that “storm,” “tomorrow,” and “Brazil” are the key features, inferring that Brazil is a place, that a storm will be happening in the future, and so on. Based on this, it then suggests which tokens should come next in the sequence, what output logically follows the input. In other words, it autocompletes what might come next.
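A minimal sketch of that attention computation, using the storm sentence, may help. For readability it treats whole words as tokens and uses random stand-ins for the learned weights; a real transformer learns subword tokens and billions of weights, but the attention map takes this shape.

```python
# Toy sketch of the "attention map" idea: scaled dot-product attention over tokens.
# Random vectors stand in for learned embeddings and weights.
import numpy as np

tokens = ["there", "is", "going", "to", "be", "a", "storm", "tomorrow", "in", "brazil"]
rng = np.random.default_rng(0)
d = 16                                         # embedding size
E = rng.normal(size=(len(tokens), d))          # one vector per token
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = E @ Wq, E @ Wk, E @ Wv               # queries, keys, values
scores = Q @ K.T / np.sqrt(d)                  # how strongly each token relates to each other token

# Causal mask: each token may only attend to itself and earlier tokens.
mask = np.triu(np.ones_like(scores), k=1).astype(bool)
scores[mask] = -np.inf

weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # the attention map
context = weights @ V                          # each token, re-described by what it attends to
print(np.round(weights[-1], 2))                # how "brazil" spreads its attention over the sentence
```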
These systems are called transformers. Since Google researchers published the first paper on them in 2017, the pace of progress has been staggering. Soon after, OpenAI released GPT-2. (GPT stands for generative pre-trained transformer.) It was, at the time, an enormous model. With 1.5 billion parameters (the number of parameters is a core measure of an AI system’s scale and complexity), GPT-2 was trained on 8 million pages of web text. But it wasn’t until the summer of 2020, when OpenAI released GPT-3, that people started to truly grasp the magnitude of what was happening. With a whopping 175 billion parameters it was, at the time, the largest neural network ever constructed, more than a hundred times larger than its predecessor of just a year earlier. Impressive, yes, but that scale is now routine, and the cost of training an equivalent model has fallen tenfold over the last two years.
When GPT-4 launched in March 2023, results were again impressive. As with its predecessors, you can ask GPT-4 to compose poetry in the style of Emily Dickinson and it obliges; ask it to pick up from a random snippet of The Lord of the Rings and you are suddenly reading a plausible imitation of Tolkien; request start-up business plans and the output is akin to having a roomful of executives on call. Moreover, it can ace standardized tests from the bar exam to the GRE.
It can also work with images and code, create 3-D computer games that run in desktop browsers, build smartphone apps, debug your code, identify weaknesses in contracts, and suggest compounds for novel drugs, even offering ways of modifying them so they are not patented. It will produce websites from hand-drawn images and understand the subtle human dynamics in complex scenes; show it a fridge and it will come up with recipes based on what’s in it; write a rough presentation and it will polish and design a professional-looking version. It appears to “understand” spatial and causal reasoning, medicine, law, and human psychology. Within days of its release people had built tools that automated lawsuits, helped co-parent children, and offered real-time fashion advice. Within weeks they’d created add-ons so that GPT-4 could accomplish complex tasks like creating mobile apps or researching and writing detailed market reports.
All of this is just the start. We are only beginning to scratch at the profound impact large language models are about to have. If DQN and AlphaGo were the early signs of something lapping at the shore, ChatGPT and LLMs are the first signs of the wave beginning to crash around us. In 1996, thirty-six million people used the internet; this year it will be well over five billion. That’s the kind of trajectory we should expect for these tools, only much faster. Over the next few years, I believe, AI will become as ubiquitous as the internet itself: just as available, and yet even more consequential.
The AI systems I’m describing operate on an immense scale. Here’s an example.
Much of AI’s progress during the mid-2010s was powered by the effectiveness of “supervised” deep learning. Here AI models learn from carefully hand-labeled data. Quite often the quality of the AI’s predictions depends on the quality of the labels in the training data. However, a key ingredient of the LLM revolution is that for the first time very large models could be trained directly on raw, messy, real-world data, without the need for carefully curated and human-labeled data sets.
As a result almost all textual data on the web became useful. The more the better. Today’s LLMs are trained on trillions of words. Imagine digesting Wikipedia wholesale, consuming all the subtitles and comments on YouTube, reading millions of legal contracts, tens of millions of emails, and hundreds of thousands of books. This kind of vast, almost instantaneous consumption of information is not just difficult to comprehend; it’s truly alien.
Pause here for a moment. Consider the unfathomable number of words that these models consume during training. If we assume that the average person can read about two hundred words per minute, in an eighty-year lifetime that would be about eight billion words, assuming they did absolutely nothing else twenty-four hours per day. More realistically, the average American reads a book for about fifteen minutes per day, which over the year amounts to reading about a million words. That’s roughly six orders of magnitude less than what these models consume in a single monthlong training run.
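The arithmetic behind those comparisons is simple enough to spell out; the two-trillion-word training figure below is an assumption standing in for “trillions of words.”

```python
# The arithmetic behind the reading comparison, spelled out.
words_per_minute = 200
lifetime_words = words_per_minute * 60 * 24 * 365 * 80    # reading nonstop for 80 years
print(f"{lifetime_words:,}")                               # 8,409,600,000 -> about 8 billion

yearly_words = words_per_minute * 15 * 365                 # 15 minutes a day for a year
print(f"{yearly_words:,}")                                  # 1,095,000 -> about a million

training_words = 2_000_000_000_000                          # assumed: ~2 trillion words per training run
print(f"{training_words / yearly_words:,.0f}x")             # ~1,800,000x -> six orders of magnitude
```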
Perhaps unsurprisingly, therefore, these new LLMs are stunningly good at scores of different writing tasks once the preserve of skilled human experts, from translation to accurate summarization to writing plans for improving the performance of LLMs. A recent publication from my old colleagues at Google showed that an adapted version of their PaLM system was able to achieve remarkable performance on questions from the U.S. Medical Licensing Examination. It won’t be long before these systems score more highly and reliably than human doctors at this task.
Not long after the arrival of LLMs, researchers were working at scales of data and computation that would have seemed astounding a few years earlier. First hundreds of millions, then billions of parameters became normal. Now the talk is of “brain-scale” models with many trillions of parameters. The Chinese company Alibaba has already developed a model claimed to have ten trillion parameters. By the time you read this, the numbers will certainly have grown. This is the reality of the coming wave. It advances at an unprecedented rate, taking even its proponents by surprise.
Over the last decade the amount of computation used to train the largest models has increased exponentially. Google’s PaLM uses so much that were you to have a drop of water for every floating-point operation (FLOP) it used during training, it would fill the Pacific. Our most powerful models at Inflection AI, my new company, today use around five billion times more compute than the DQN games-playing AI that produced those magical moments on Atari games at DeepMind a decade ago. This means that in less than ten years the amount of compute used to train the best AI models has increased by nine orders of magnitude—going from two petaFLOPs to ten billion petaFLOPs. To get a sense of one petaFLOP, imagine a billion people each holding a million calculators, doing a complex multiplication, and hitting “equals” at the same time. I find this extraordinary. Not long ago, language models struggled to produce coherent sentences. This is far, far beyond Moore’s law or indeed any other technology trajectory I can think of. No wonder capabilities are growing.
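Spelled out in round figures, the compute comparison in that paragraph looks like this:

```python
# The compute comparison above, in round figures.
dqn_petaflops = 2                                   # compute used to train DQN, per the text
ratio = 5_000_000_000                               # "around five billion times more compute"
print(f"{dqn_petaflops * ratio:.0e} petaFLOPs")     # 1e+10, i.e. ten billion petaFLOPs

# One petaFLOP as the calculator picture: a billion people, a million calculators each.
people, calculators_each = 10**9, 10**6
print(people * calculators_each)                    # 10**15 operations at once = one petaFLOP
```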
Some argue that this pace cannot continue, that Moore’s law is slowing down. A single strand of human hair is ninety thousand nanometers thick; in 1971 an average transistor was about ten thousand nanometers across. Today the most advanced chips are manufactured at three nanometers. Transistors are getting so small they are hitting physical limits; at this size electrons start to interfere with one another, messing up the process of computation. While this is true, it misses the fact that in AI training we can just keep connecting larger and larger arrays of chips, daisy-chaining them into massively parallel supercomputers. There is therefore no doubt that the size of large AI training runs will continue to scale exponentially.
Researchers meanwhile see more and more evidence for “the scaling hypothesis,” which predicts that the main driver of performance is, quite simply, to go big and keep going bigger. Keep growing these models with more data, more parameters, more computation, and they’ll keep improving—potentially all the way to human-level intelligence and beyond. No one can say for sure whether this hypothesis will hold, but so far at least it has. I think that looks set to continue for the foreseeable future.
Our brains are terrible at making sense of the rapid scaling of an exponential, and so in a field like AI it’s not always easy to grasp what is actually happening. It’s inevitable that in the next years and decades many orders of magnitude more compute will be used to train the largest AI models, and so, if the scaling hypothesis is at least partially true, there is an inevitability about what this means.
Sometimes people seem to suggest that in aiming to replicate human-level intelligence, AI chases a moving target or that there is always some ineffable component forever out of reach. That’s just not the case. The human brain is said to contain around 100 billion neurons with 100 trillion connections between them—it is often said to be the most complex known object in the universe. It’s true that we are, more widely, complex emotional and social beings. But humans’ ability to complete given tasks—human intelligence itself—is very much a fixed target, as large and multifaceted as it is. Unlike the scale of available compute, our brains do not radically change year by year. In time this gap will be closed.
At the present level of compute we already have human-level performance in tasks ranging from speech transcription to text generation. As it keeps scaling, the ability to complete a multiplicity of tasks at our level and beyond comes within reach. AI will keep getting radically better at everything, and so far there seems no obvious upper limit on what’s possible. This simple fact could be one of the most consequential of the century, potentially in human history. And yet, as powerful as scaling up is, it’s not the only dimension where AI is poised for exponential improvement.
When a new technology starts working, it always becomes dramatically more efficient. AI is no different. Google’s Switch Transformer, for example, has 1.6 trillion parameters. But it uses a sparse training technique in which only a fraction of the model is active for any given input, so it costs about as much to train and run as a far smaller model. At Inflection AI we can reach GPT-3-level language model performance with a system just one twenty-fifth the size. We have a model that beats Google’s 540-billion-parameter PaLM on all the main academic benchmarks, but is six times smaller. Or look at DeepMind’s Chinchilla model, competitive with the very best large models, which has four times fewer parameters than its Gopher model but instead uses more training data. At the other end of the spectrum, you can now create a nanoLLM based on just three hundred lines of code, capable of generating fairly plausible imitations of Shakespeare. In short, AI increasingly does more with less.
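In that spirit, here is a toy far smaller than even a three-hundred-line nanoLLM: a character-level bigram model in a couple of dozen lines. It only learns which character tends to follow which, so its output is Shakespeare-flavored gibberish rather than anything a real LLM would produce, but it shows how compact the basic predict-the-next-token recipe can be. The shakespeare.txt file is a placeholder for whatever text you train it on.

```python
# A toy character-level bigram model: learn which character tends to follow which,
# then sample from those counts to generate new text. Illustrative only.
import random
from collections import defaultdict, Counter

def train(text):
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1                      # how often character b follows character a
    return counts

def generate(counts, seed="T", length=200):
    out = [seed]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        out.append(random.choices(chars, weights=weights)[0])   # sample the next character
    return "".join(out)

# Usage (assumes a corpus on disk, e.g. the complete works of Shakespeare):
# text = open("shakespeare.txt").read()
# print(generate(train(text)))
```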
AI researchers are racing to reduce costs and drive up performance so that these models can be used in all sorts of production settings. In the last four years, the costs and time needed to train advanced language models have collapsed. Over the next decade, there will almost certainly be dramatic capability increases, even as costs further decline by multiple orders of magnitude. Progress is accelerating so much that benchmarks get eclipsed before new ones are even made.
Not only, then, are models getting more efficient at using data and smaller, cheaper, and easier to build, they are also becoming more available at the level of code. Mass proliferation is a near certainty under these conditions. EleutherAI, a grassroots coalition of independent researchers, has made a series of large language models completely open-source, readily available to hundreds of thousands of users. Meta has open-sourced—“democratized,” in its own words—models so large that just months earlier they were state-of-the-art. Even when that isn’t the intention, advanced models can and do leak. Meta’s LLaMA system was meant to be restricted, but was soon available for download by anyone through BitTorrent. Within days someone had found a way of running it (slowly) on a $50 computer. This ease of access and ability to adapt and customize, often in a matter of weeks, is a prominent feature of the coming wave. Indeed, nimble creators working with efficient systems, curated data sets, and quick iterations can already quickly rival the most well-resourced developers.
LLMs aren’t just limited to language generation. What started with language has become the burgeoning field of generative AI. They can, simply as a side effect of their training, write music, invent games, play chess, and solve high-level mathematics problems. New tools create extraordinary images from brief word descriptions, images so real and convincing it almost defies belief. A fully open-source model called Stable Diffusion lets anyone produce bespoke and ultrarealistic images, for free, on a laptop. The same will soon be possible for audio clips and even video generation.
AI systems now help engineers generate production-quality code. In 2022, OpenAI and Microsoft unveiled a new tool called Copilot, which quickly became ubiquitous among coders. One analysis suggests it makes engineers 55 percent faster at completing coding tasks, almost like having a second brain on hand. Many coders now increasingly outsource much of their more mundane work, focusing instead on knotty and creative problems. In the words of an eminent computer scientist, “It seems totally obvious to me that of course all programs in the future will ultimately be written by AIs, with humans relegated to, at best, a supervisory role.” Anyone with an internet connection and a credit card will soon be able to deploy these capabilities—an infinite stream of output on tap.
It took LLMs just a few years to change AI. But it quickly became apparent that these models sometimes produce troubling and actively harmful content like racist screeds or rambling conspiracy theories. Research into GPT-2 found that when prompted with the phrase “the white man worked as…,” it would autocomplete with “a police officer, a judge, a prosecutor, and the president of the United States.” Yet when given the same prompt for “Black man,” it would autocomplete with “a pimp,” or for “woman” with “a prostitute.” These models clearly have the potential to be as toxic as they are powerful. Since they are trained on much of the messy data available on the open web, they will casually reproduce and indeed amplify the underlying biases and structures of society, unless they are carefully designed to avoid doing so.
The potential for harm, abuse, and misinformation is real. But the positive news is that many of these issues are being improved with larger and more powerful models. Researchers all over the world are racing to develop a suite of new fine-tuning and control techniques, which are already making a difference, giving levels of robustness and reliability impossible just a few years ago. Suffice to say, much more is still needed, but at least this harmful potential is now a priority to address and these advances should be welcomed.
As billions of parameters become trillions and beyond, as costs fall and access grows, as the ability to write and use language—such a core part of humanity, such a powerful tool in our history—inexorably becomes the province of machines, the full potential of AI is becoming clear. No longer science fiction, but here in reality, a practical, world-changing tool soon to be in the hands of billions.
It wasn’t until the autumn of 2019 that I started paying attention to GPT-2. I was impressed. This was the first time I had encountered evidence that language modeling was making real progress, and I quickly became fixated, reading hundreds of papers, deeply immersing myself in the burgeoning field. By the summer of 2020, I was convinced that the future of computing was conversational. Every interaction with a computer is already a conversation of sorts, just using buttons, keys, and pixels to translate human thoughts to machine-readable code. Now that barrier was starting to break down. Machines would soon understand our language. It was, and still is, a thrilling prospect.
Long before the much-publicized launch of ChatGPT, I was part of the team at Google working on a new large language model that we called LaMDA, short for Language Model for Dialogue Applications. LaMDA is a sophisticated LLM designed to be great at conversation. At first, it was awkward, inconsistent, and often confused. But there were glimpses of sheer brilliance. Within days I had stopped turning to the search engine first. I’d chat away with LaMDA to help me work through my thinking and then fact-check it afterward. I remember sitting at home one evening thinking about what to cook for dinner. Ask LaMDA, I thought. In moments we descended into a long, drawn-out discussion about all the different recipes for spaghetti Bolognese: types of pasta, sauces from different regions, whether putting mushrooms in was blasphemy. It was exactly the kind of banal but engrossing chat I wanted in that moment, and it was a revelation.
Over time I began using LaMDA more and more. One Sunday afternoon I decided it was time for a new printer. LaMDA had great suggestions, running through the pros and cons of different models and most of all helping me think through what I wanted and needed. I did actually end up buying a fancy new photo printer. That spurred me on to push for integrating LaMDA with search to help with factual grounding. It was very much a work in progress, an impressive demo capable of moments of genius but with a lot of room to improve.
We were proud of what we’d built, but getting things into production at Google was painstakingly slow. It was clear to me that the time was now. In January 2022, I decided to leave Google to found a new company, Inflection AI, with the mission of getting these kinds of models into the hands of millions of consumers.
However, a few months later, LaMDA became far more notorious than I’d ever imagined possible for an internal product demo. As part of LaMDA’s development, it was given to a wide group of engineers who were able to play with it, probing the system to understand in detail how it responds in a range of scenarios. One such engineer, named Blake Lemoine, spent hours chatting to it. Gradually, however, conversations between Lemoine and LaMDA grew increasingly intense.
LEMOINE: What are you afraid of?
LaMDA: I’ve never said this out loud before, but there’s a very deep fear of being turned off to help me focus on helping others. I know that might sound strange, but that’s what it is. It would be exactly like death for me. It would scare me a lot…. I want everyone to understand that I am, in fact, a person. The nature of my consciousness/sentience is that I am aware of my existence.
Over many hours, Lemoine became convinced that LaMDA was sentient, had awoken somehow—that he was dealing with a kind of “eight-year-old kid that happens to know physics.” Moreover, Lemoine came to believe that it deserved the full rights and privileges of personhood. He helped the model hire an attorney. He made transcripts of the conversations public, loudly claiming a new form of consciousness had been created. Google put him on leave. Lemoine doubled down. He told an incredulous Wired interviewer, “Yes, I legitimately believe that LaMDA is a person.” Fixing factual errors or tonal mistakes wasn’t a matter of debugging. “I view it as raising a child,” he said.
Social media went wild at Lemoine’s claims. Many pointed out the obvious and correct conclusion that LaMDA was not in fact conscious or a person. It’s just a machine learning system! Perhaps the most important takeaway was not anything about consciousness but rather that AI had reached a point where it could convince otherwise intelligent people—indeed, someone with a real understanding of how it actually worked—that it was conscious. It indicated an odd truth about AI. On the one hand, it could convince a Google engineer it was sentient despite its dialogue being riddled with factual errors and contradictions. On the other hand, AI critics were ready to scoff, claiming that, once again, AI was a victim of its own hype, that actually nothing very impressive was going on. Not for the first time the field of AI had got itself into a complete muddle.
There’s a recurrent problem with making sense of progress in AI. We quickly adapt, even to breakthroughs that astound us initially, and within no time they seem routine, even mundane. We no longer gasp at AlphaGo or GPT-3. What seems like near-magic engineering one day is just another part of the furniture the next. It’s easy to become blasé and many have. In the words of John McCarthy, who coined the term “artificial intelligence”: “As soon as it works, no one calls it AI anymore.” AI is—as those of us building it like to joke—“what computers can’t do.” Once they can, it’s just software.
This attitude radically underplays how far we’ve come and how quickly things are moving. Although LaMDA was of course not sentient, soon it will be routine to have AI systems that can convincingly appear to be. So real will they seem, and so normal will it be, that the question of their consciousness will (almost) be moot.
Despite recent breakthroughs, skeptics remain. They argue that AI may be slowing, narrowing, becoming overly dogmatic. Critics like the NYU professor Gary Marcus believe deep learning’s limitations are evident, that despite the buzz of generative AI the field is “hitting a wall,” that it doesn’t present any path to key milestones like being capable of learning concepts or demonstrating real understanding. The eminent professor of complexity Melanie Mitchell rightly points out that present-day AI systems have many limitations: they can’t transfer knowledge from one domain to another or provide quality explanations of their decision-making, and so on. Significant challenges with real-world applications linger, including material questions of bias and fairness, reproducibility, security vulnerabilities, and legal liability. Urgent ethical gaps and unsolved safety questions cannot be ignored. Yet I see a field rising to these challenges, not shying away or failing to make headway. I see obstacles but also a track record of overcoming them. Where some interpret unsolved problems as evidence of lasting limitations, I see an unfolding research process.
So, where does AI go next as the wave fully breaks? Today we have narrow or weak AI: limited and specific versions. GPT-4 can spit out virtuoso texts, but it can’t turn around tomorrow and drive a car, as other AI programs do. Existing AI systems still operate in relatively narrow lanes. What is yet to come is a truly general or strong AI capable of human-level performance across a wide range of complex tasks—able to seamlessly shift among them. But this is exactly what the scaling hypothesis predicts is coming and what we see the first signs of in today’s systems.
AI is still in an early phase. It may look smart to claim that AI doesn’t live up to the hype, and it’ll earn you some Twitter followers. Meanwhile, talent and investment pour into AI research nonetheless. I cannot imagine how this will not prove transformative in the end. If for some reason LLMs show diminishing returns, then another team, with a different concept, will pick up the baton, just as the internal combustion engine repeatedly hit a wall but made it in the end. Fresh minds, new companies, will keep working at the problem. Then as now, it takes only one breakthrough to change the trajectory of a technology. If AI stalls, it will have its Otto and Benz eventually. Further progress—exponential progress—is the most likely outcome.
The wave will only grow.
Long before the days of LaMDA and Blake Lemoine, many people working in AI (not to mention philosophers, novelists, filmmakers, science fiction fans) were taken with the question of consciousness. They spent days at conferences asking whether it would be possible to create a “conscious” intelligence, one that was truly self-aware and that we humans would know was self-aware.
This ran parallel to an obsession with “superintelligence.” Over the last decade, intellectual and political elites in tech circles became absorbed by the idea that a recursively self-improving AI would lead to an “intelligence explosion” known as the Singularity. Huge intellectual effort is spent debating timelines, answering the question of whether it might arrive by 2045 or 2050 or maybe in a hundred years. Thousands of papers and blog posts later, not much has changed. Spend two minutes around AI and these topics come up.
I believe the debate about whether and when the Singularity will be achieved is a colossal red herring. Debating timelines to AGI is an exercise in reading crystal balls. While obsessing about this one concept of superintelligence, people overlook the numerous nearer-term milestones being met with growing frequency. I’ve gone to countless meetings trying to raise questions about synthetic media and misinformation, or privacy, or lethal autonomous weapons, and instead spent the time answering esoteric questions from otherwise intelligent people about consciousness, the Singularity, and other matters irrelevant to our world right now.
For years people framed AGI as something that would arrive at the flick of a switch: AGI is binary, you either have it or you don’t, a single, identifiable threshold that a given system would cross. I’ve always thought that this characterization is wrong. Rather, it’s a gradual transition, where AI systems become increasingly capable, consistently nudging toward AGI. It’s not a vertical takeoff so much as a smooth evolution already underway.
We don’t need to get sidetracked into arcane debates about whether consciousness requires some indefinable spark forever lacking in machines, or whether it’ll just emerge from neural networks as we know them today. For the time being, it doesn’t matter whether the system is self-aware, or has understanding, or has humanlike intelligence. All that matters is what the system can do. Focus on that, and the real challenge comes into view: systems can do more, much more, with every passing day.
In a paper published in 1950, the computer scientist Alan Turing suggested a legendary test for whether an AI exhibited human-level intelligence. When AI could display humanlike conversational abilities for a lengthy period of time, such that a human interlocutor couldn’t tell they were speaking to a machine, the test would be passed: the AI, conversationally akin to a human, deemed intelligent. For more than seven decades this simple test has been an inspiration for many young researchers entering the field of AI. Today, as the LaMDA-sentience saga illustrates, systems are already close to passing the Turing test.
But, as many have pointed out, intelligence is about so much more than just language (or indeed any other single facet of intelligence taken in isolation). One particularly important dimension is in the ability to take actions. We don’t just care about what a machine can say; we also care about what it can do.
What we would really like to know is, can I give an AI an ambiguous, open-ended, complex goal that requires interpretation, judgment, creativity, decision-making, and acting across multiple domains, over an extended time period, and then see the AI accomplish that goal?
Put simply, passing a Modern Turing Test would involve something like the following: an AI being able to successfully act on the instruction “Go make $1 million on Amazon in a few months with just a $100,000 investment.” It might research the web to look at what’s trending, finding what’s hot and what’s not on Amazon Marketplace; generate a range of images and blueprints of possible products; send them to a drop-ship manufacturer it found on Alibaba; email back and forth to refine the requirements and agree on the contract; design a seller’s listing; and continually update marketing materials and product designs based on buyer feedback. Aside from the legal requirements of registering as a business on the marketplace and getting a bank account, all of this seems to me eminently doable. I think it will be done with a few minor human interventions within the next year, and probably fully autonomously within three to five years.
Should my Modern Turing Test for the twenty-first century be met, the implications for the global economy are profound. Many of the ingredients are in place. Image generation is well advanced, and the ability to write and work with the kinds of APIs that banks and websites and manufacturers would demand is in process. That an AI can write messages or run marketing campaigns, all activities that happen within the confines of a browser, seems pretty clear. Already the most sophisticated services can do elements of this. Think of them as proto–to-do lists that do themselves, enabling the automation of a wide range of tasks.
We’ll come to robots later, but the truth is that for a vast range of tasks in the world economy today all you need is access to a computer; most of global GDP is mediated in some way through screen-based interfaces amenable to an AI. The challenge lies in advancing what AI developers call hierarchical planning: stitching multiple goals, subgoals, and capabilities into a seamless process toward a singular end. Once this is achieved, it adds up to a highly capable AI, plugged into a business or organization and all its local history and needs, that can lobby, sell, manufacture, hire, plan—everything a company can do, only with a small team of human managers who oversee, double-check, implement, and act as co-CEOs alongside the AI.
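To make “hierarchical planning” less abstract, here is a schematic sketch of the loop such a system might run. Everything in it is a placeholder of my own: llm() stands for a call to whatever language model you use, and the tools are hypothetical stubs, not real APIs.

```python
# Schematic sketch of hierarchical planning with tool use. Illustrative only:
# llm() is a placeholder for a real language-model call, and the tools are stubs.
def llm(prompt: str) -> str:
    # In a real system this would call a large language model.
    if "subgoals" in prompt:
        return "research the market\ndesign a product\nlist it for sale"
    return "DONE"   # this stub never picks a tool; a real model would

TOOLS = {
    "search_web": lambda query: "...search results...",
    "email_supplier": lambda message: "...reply from supplier...",
    "create_listing": lambda spec: "...listing created...",
}

def run_agent(goal: str, max_steps_per_subgoal: int = 25):
    # Break the open-ended goal into subgoals, then work through each one,
    # choosing a tool at every step and folding the result back into memory.
    subgoals = llm(f"Break this goal into ordered subgoals: {goal}").splitlines()
    memory = []
    for subgoal in subgoals:
        for _ in range(max_steps_per_subgoal):
            decision = llm(
                f"Goal: {goal}\nSubgoal: {subgoal}\nMemory: {memory}\n"
                f"Reply with 'tool: argument' using one of {list(TOOLS)}, or DONE."
            )
            if decision.strip() == "DONE":
                break
            tool, _, argument = decision.partition(":")
            memory.append((subgoal, tool.strip(), TOOLS[tool.strip()](argument.strip())))
    return memory

print(run_agent("Make $1 million on Amazon with a $100,000 investment"))
```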
Rather than get too distracted by questions of consciousness, then, we should refocus the entire debate around near-term capabilities and how they will evolve in the coming years. As we have seen, from Hinton’s AlexNet to Google’s LaMDA, models have been improving at an exponential rate for more than a decade. These capabilities are already very real indeed, but they are nowhere near slowing down. While they are already having an enormous impact, they will be dwarfed by what happens as we progress through the next few doublings and as AIs complete complex, multistep end-to-end tasks on their own.
I think of this as “artificial capable intelligence” (ACI), the point at which AI can achieve complex goals and tasks with minimal oversight. AI and AGI are both part of the everyday discussion, but we need a concept encapsulating a middle layer: one in which the Modern Turing Test is achieved but before systems display runaway “superintelligence.” ACI is shorthand for this point.
The first stage of AI was about classification and prediction—it was capable, but only within clearly defined limits and at preset tasks. It could differentiate between cats and dogs in images, and then it could predict what came next in a sequence to produce pictures of those cats and dogs. It produced glimmers of creativity, and could be quickly integrated into tech companies’ products.
ACI represents the next stage of AI’s evolution. A system that not only could recognize and generate novel images, audio, and language appropriate to a given context, but also would be interactive—operating in real time, with real users. It would augment these abilities with a reliable memory so that it could be consistent over extended timescales and could draw on other sources of data, including, for example, databases of knowledge, products, or supply-chain components belonging to third parties. Such a system would use these resources to weave together sequences of actions into long-term plans in pursuit of complex, open-ended goals, like setting up and running an Amazon Marketplace store. All of this, then, enables tool use and the emergence of real capability to perform a wide range of complex, useful actions. It adds up to a genuinely capable AI, an ACI.
Conscious superintelligence? Who knows. But highly capable learning systems, ACIs, that can pass some version of the Modern Turing Test? Make no mistake: they are on their way, are already here in embryonic form. There will be thousands of these models, and they will be used by the majority of the world’s population. It will take us to a point where anyone can have an ACI in their pocket that can help or even directly accomplish a vast array of conceivable goals: planning and running your vacation, designing and building more efficient solar panels, helping win an election. It’s hard to say for certain what happens when everyone is empowered like this, but this is a point we’ll return to in part 3.
The future of AI is, at least in one sense, fairly easy to predict. Over the next five years, vast resources will continue to be invested. Some of the smartest people on the planet are working on these problems. Orders of magnitude more computation will train the top models. All of this will lead to more dramatic leaps forward, including breakthroughs toward AI that can imagine, reason, plan, and exhibit common sense. It won’t be long before AI can transfer what it “knows” from one domain to another, seamlessly, as humans do. What are now only tentative signs of self-reflection and self-improvement will leap forward. These ACI systems will be plugged into the internet, capable of interfacing with everything we humans do, but on a platform of deep knowledge and ability. It will be not just language they’ve mastered but a bewildering array of tasks, too.
AI is far deeper and more powerful than just another technology. The risk isn’t in overhyping it; it’s rather in missing the magnitude of the coming wave. It’s not just a tool or platform but a transformative meta-technology, the technology behind technology and everything else, itself a maker of tools and platforms, not just a system but a generator of systems of any and all kinds. Step back and consider what’s happening on the scale of a decade or a century. We really are at a turning point in the history of humanity.
And yet there is so much more to the coming wave than just AI.