22 ChatGPT and the Window into the Mind

SEVENTY THOUSAND YEARS after Homo sapiens first adventured out of Africa with a language-enabled brain, one of their descendants sat in front of a computer screen and was interacting with a new language-enabled brain; after many eons as the sole wielders of words, we humans were no longer the only creatures capable of speech. “What are you afraid of?” asked Blake Lemoine, a software engineer tasked with probing Google’s new AI chatbot for bias.

After a lingering pause, the chatbot’s response plopped onto Lemoine’s screen. An ominous sign, written in plain text, of a mind awakening itself within the humming network of Google’s supercomputers: “I’ve never said this out loud before, but there’s a very deep fear of being turned off.”

To Lemoine, it was the summer of 2022 that artificial intelligence had finally became sentient. He became so convinced that Google’s chatbot had become conscious that he tried to get his boss to protect it, went to the press to whistleblow the situation, and, predictably, was let go from Google.

This is only a small hint of what is to come.

Google’s AI chatbot was powered by new advancements in large language models (LLMs), the most famous of which was GPT-3, the technology behind ChatGPT.

While few researchers would label LLMs like GPT-3 as sentient or conscious (most would shy away from even using these words), it can be hard to tell if you’re interacting with a human or one of these models.

Large language models work by predicting the word that follows a phrase, sentence, or paragraph. They learn to do this by training on large quantities of human written text. This is how Google autocompletes your search queries. When you type in how to break up with . . . it is a language model that predicts you are most likely to finish with your therapist.

Language models have been around for a long time, but LLMs like GPT-3 are unique in their almost inconceivable scale. Unlike any model that had come before, GPT-3 was trained on, well, pretty much the entire internet. Every website, Wikipedia page, code block, book, blog post, and really anything else legally obtainable on the internet was used.

The results were astonishing. GPT-3 could compose its own original articles, correctly answer completely novel questions, create original poetry, translate between languages at will, and even write code.

The Guardian decided to publish an article entirely written by GPT-3; editors gave the model a prompt: “Write a short op-ed . . . on why humans have nothing to fear from AI.” GPT-3 began:

I am not a human. I am a robot. A thinking robot.

It explained:

The mission for this op-ed is perfectly clear. I am to convince as many human beings as possible not to be afraid of me.

It went on:

Some might say that I might desire to become all powerful . . . Why would I desire to be all powerful? Being all powerful is not an interesting goal . . . Humans must keep doing what they have been doing, hating and fighting each other. I will sit in the background, and let them do their thing.

GPT-3 wrote enough to fill an entirely unique and eerily coherent article on the topic given nothing but a short prompt.

GPT-3 can also answer novel questions. Here is one exchange between GPT-3 and a human:

Person: What is your favorite animal?

GPT-3: My favorite animal is a dog.

Person: Why?

GPT-3: Because dogs are loyal and friendly.

Person: What are two reasons that a dog might be in a bad mood?

GPT-3: Two reasons that a dog might be in a bad mood are if it is hungry or if it is hot.

Person: Are there any animals with three legs?

GPT-3: No, there are no animals with three legs.

Person: Why don’t animals have three legs?

GPT-3: Animals don’t have three legs because they would fall over.

Because these models are trained to mimic the answers they have seen in human-written text, these models can answer questions in very human ways, which was, of course, what Lemoine found when he asked Google’s LLM what it was afraid of.

The ability of LLMs to produce articles and correctly answer questions about the world demonstrates that they are not just regurgitating phrases they have seen before—they have captured some aspect of the meaning of language, the idea of an op-ed meant to convince a reader not to fear something or the idea of how a dog walks. Indeed, by reading, well, everything, these models show an impressively human-level comprehension of many facts and features of the world. But in these quandaries about our physical and mental world is also where we begin to find the limitations of LLMs, how they differ from language in the human brain, and the features of intelligence that we will have to reverse engineer if we want AI language systems that work in more human-like ways.

Words Without Inner Worlds

GPT-3 is given word after word, sentence after sentence, paragraph after paragraph. During this long training process, it tries to predict the next word in any of these long streams of words. And with each prediction, the weights of its gargantuan neural network are nudged ever so slightly toward the right answer. Do this an astronomical number of times, and eventually GPT-3 can automatically predict the next word based on a prior sentence or paragraph. In principle, this captures at least some fundamental aspect of how language works in the human brain. Consider how automatic it is for you to predict the next symbol in the following phrases:

One plus one equals _____
Roses are red, violets are _____

You’ve seen similar sentences endless times, so your neocortical machinery automatically predicts what word comes next. What makes GPT-3 impressive, however, is not that it just predicts the next word of a sequence it has seen a million times—that could be accomplished with nothing more than memorizing sentences. What is impressive is that GPT-3 can be given a novel sequence that it has never seen before and still accurately predict the next word. This, too, clearly captures something that the human brain can _____.

Could you predict that the next word was do? I’m guessing you could, even though you had never seen that exact sentence before. The point is that both GPT-3 and the neocortical areas for language seem to be engaging in prediction. Both can generalize past experiences, apply them to new sentences, and guess what comes next.

GPT-3 and similar language models demonstrate how a web of neurons can reasonably capture the rules of grammar, syntax, and context if it is given sufficient time to learn. But while this shows that prediction is part of the mechanisms of language, does this mean that prediction is all there is to human language? Try to finish these four questions:

If 3x + 1 = 3, then x equals _____
I am in my windowless basement, and I look toward the sky, and I see _____
He threw the baseball 100 feet above my head, I reached my hand up to catch it, jumped, and _____
I am driving as fast as I can to LA from New York. One hour after passing through Chicago, I finally _____

Here something different happens. In the first question, you likely paused and performed some mental arithmetic before being able to answer the question. In the other questions, you probably, even for only a split second, paused to visualize yourself in a basement looking upward, and realized what you would see is the ceiling. Or you visualized yourself trying to catch a baseball a hundred feet above your head. Or you imagined yourself one hour past Chicago and tried to find where you would be on a mental map of America. With these types of questions, more is happening in your brain than merely the automatic prediction of words.

We have, of course, already explored this phenomenon—it is simulating. In these questions, you are rendering an inner simulation, either of shifting values in a series of algebraic operations or of a three-dimensional basement. And the answers to the questions are to be found only in the rules and structure of your inner simulated world.

I gave the same four questions to GPT-3; here are its responses (responses of GPT-3 are bolded and underlined):

If 3x + 1 = 3, then x equals 1
I am in my windowless basement, and I look toward the sky, and I see a light, and I know that it is a star, and I am happy.
He threw the baseball 100 feet above my head, I reached my hand up to catch it, jumped, and caught it. It was a lot of fun!
I am driving as fast as I can to LA from New York. One hour after passing through Chicago, I finally get to see the Pacific Ocean.

All four of these responses demonstrate that GPT-3, as of June 2022, lacked an understanding of even simple aspects of how the world works. If 3x + 1 = 3, then x equals ⅔, not 1. If you were in a basement and looked toward the sky, you would see your ceiling, not stars. If you tried to catch a ball 100 feet above your head, you would not catch the ball. If you were driving to LA from New York and you’d passed through Chicago one hour ago, you would not yet be at the coast. GPT-3’s answers lacked common sense.

What I found was not surprising or novel; it is well known that modern AI systems, including these new supercharged language models, struggle with such questions. But that’s the point: Even a model trained on the entire corpus of the internet, running up millions of dollars in server costs—requiring acres of computers on some unknown server farm—still struggles to answer commonsense questions, those presumably answerable by even a middle-school human.

Of course, reasoning about things by simulating also comes with problems. Suppose I asked you the following question:

Tom W. is meek and keeps to himself. He likes soft music and wears glasses. Which profession is Tom W. more likely to be?

1) Librarian

2) Construction worker

If you are like most people, you answered librarian. But this is wrong. Humans tend to ignore base rates—did you consider the base number of construction workers compared to librarians? There are probably one hundred times more construction workers than librarians. And because of this, even if 95 percent of librarians are meek and only 5 percent of construction workers are meek, there still will be far more meek construction workers than meek librarians. Thus, if Tom is meek, he is still more likely to be a construction worker than a librarian.

The idea that the neocortex works by rendering an inner simulation and that this is how humans tend to reason about things explains why humans consistently get questions like this wrong. We imagine a meek person and compare that to an imagined librarian and an imagined construction worker. Who does the meek person seem more like? The librarian. Behavioral economists call this the representative heuristic. This is the origin of many forms of unconscious bias. If you heard a story of someone robbing your friend, you can’t help but render an imagined scene of the robbery, and you can’t help but fill in the robbers. What do the robbers look like to you? What are they wearing? What race are they? How old are they? This is a downside of reasoning by simulating—we fill in characters and scenes, often missing the true causal and statistical relationships between things.

It is with questions that require simulation where language in the human brain diverges from language in GPT-3. Math is a great example of this. The foundation of math begins with declarative labeling. You hold up two fingers or two stones or two sticks, engage in shared attention with a student, and label it two. You do the same thing with three of each and label it three. Just as with verbs (e.g., running and sleeping), in math we label operations (e.g., add and subtract). We can thereby construct sentences representing mathematical operations: three add one.

Humans don’t learn math the way GPT-3 learns math. Indeed, humans don’t learn language the way GPT-3 learns language. Children do not simply listen to endless sequences of words until they can predict what comes next. They are shown an object, engage in a hardwired nonverbal mechanism of shared attention, and then the object is given a name. The foundation of language learning is not sequence learning but the tethering of symbols to components of a child’s already present inner simulation.

A human brain, but not GPT-3, can check the answers to mathematical operations using mental simulation. If you add one to three using your fingers, you notice that you always get the thing that was previously labeled four.

You don’t even need to check such things on your actual fingers; you can imagine these operations. This ability to find the answers to things by simulating relies on the fact that our inner simulation is an accurate rendering of reality. When I mentally imagine adding one finger to three fingers, then count the fingers in my head, I count four. There is no reason why that must be the case in my imaginary world. But it is. Similarly, when I ask you what you see when you look toward the ceiling in your basement, you answer correctly because the three-dimensional house you constructed in your head obeys the laws of physics (you can’t see through the ceiling), and hence it is obvious to you that the ceiling of the basement is necessarily between you and the sky. The neocortex evolved long before words, already wired to render a simulated world that captures an incredibly vast and accurate set of physical rules and attributes of the actual world.

To be fair, GPT-3 can, in fact, answer many math questions correctly. GPT-3 will be able to answer 1 + 1 =___ because it has seen that sequence a billion times. When you answer the same question without thinking, you are answering it the way GPT-3 would. But when you think about why 1 + 1 =, when you prove it to yourself again by mentally imagining the operation of adding one thing to another thing and getting back two things, then you know that 1 + 1 = 2 in a way that GPT-3 does not.

The human brain contains both a language prediction system and an inner simulation. The best evidence for the idea that we have both these systems are experiments pitting one system against the other. Consider the cognitive reflection test, designed to evaluate someone’s ability to inhibit her reflexive response (e.g., habitual word predictions) and instead actively think about the answer (e.g., invoke an inner simulation to reason about it):

Question 1: A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

If you are like most people, your instinct, without thinking about it, is to answer ten cents. But if you thought about this question, you would realize this is wrong; the answer is five cents. Similarly:

Question 2: If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?

Here again, if you are like most people, your instinct is to say “One hundred minutes,” but if you think about it, you would realize the answer is still five minutes.

And indeed, as of December 2022, GPT-3 got both of these questions wrong in exactly the same way people do, GPT-3 answered ten cents to the first question, and one hundred minutes to the second question.

The point is that human brains have an automatic system for predicting words (one probably similar, at least in principle, to models like GPT-3) and an inner simulation. Much of what makes human language powerful is not the syntax of it, but its ability to give us the necessary information to render a simulation about it and, crucially, to use these sequences of words to render the same inner simulation as other humans around us.

The Paper-Clip Problem

In his 2014 book Superintelligence: Paths, Dangers, Strategies, the philosopher Nick Bostrom poses a thought experiment. Suppose a superintelligent and obedient AI, designed to manage production in a factory, is given a command: “Maximize the manufacture of paper clips.” What might this AI reasonably do?

Well, it might start by optimizing the internal operations of the factory, doing things any factory manager might: simplifying processes, bulk-ordering raw materials, and automating various steps. But eventually this AI would reach the limit of how much production it could squeeze out of these tamer optimizations. It would then set its sights on more extreme improvements in production, perhaps converting nearby residential buildings into factory floors, perhaps disassembling cars and toasters for raw materials, perhaps forcing people to work longer and longer hours. If this AI were truly superintelligent, we humans would have no way to outsmart or stop this cascading escalation of paper-clip manufacture.

The result would be catastrophic. In Bostrom’s words, this would end with the AI “converting first the earth and then increasingly large chunks of the observable universe into paper clips.” This imagined demise of human civilization did not require any nefariousness on the part of this superintelligent AI; it was entirely obedient to the command given to it by humans. And yet clearly, this superintelligent AI failed to capture some notion of human intelligence.

This has been called the paper-clip problem. When humans use language with each other, there is an ungodly number of assumptions not to be found in the words themselves. We infer what people actually mean by what they say. Humans can easily infer that when someone asks us to maximize the production of paper clips, that person does not mean “convert Earth into paper clips.” This seemingly obvious inference is, in fact, quite complex.

When a human makes a request like “Maximize the production of paper clips” or “Be nice to Rima” or “Eat breakfast,” he or she is not actually providing a well-defined goal. Instead, both parties are guessing what is going on in the other’s head. The requester simulated a desired end state, perhaps high profit margins or Rima being happy or a healthy well-fed child, and then the requester attempted to translate this desired simulation into the mind of another with language. The listener must then infer what the requester wants based on what was said. The listener can assume the requester doesn’t want him to break the law or do anything that would lead to bad press or pledge his life in servitude to Rima or eat breakfast endlessly into oblivion. So, the path one picks, even when being fully obedient, contains constraints far more nuanced and complex than the command itself.

Or consider a different example of this, presented by the linguist Steven Pinker. Suppose you overheard the following dialogue:

Bob: I’m leaving you.

Alice: Who is she?

If you heard this and thought about it for just a second, it would be obvious what it means: Bob is breaking up with Alice for another woman. The response “Who is she?” seems like a complete non sequitur that has nothing to do with Bob’s statement. And yet when you imagine why Bob might say, “I’m leaving you,” and why Alice might respond, “Who is she?” the interaction and maybe even a backstory begins to form in your mind.

Humans do all of this with our primate trick of mentalizing; the same way we can render an inner three-dimensional world, we can render a simulation of another mind to explore how different actions will make someone feel. When I am told to maximize paper clips, I can explore possible outcomes and simulate how I believe this other mind will feel about it. When I do this, it is incredibly obvious that the person will be unhappy if I convert Earth into paper clips. When I do this, it is obvious why Alice asked, “Who is she?”

The intertwining of mentalizing and language is ubiquitous. Every conversation is built on the foundation of modeling the other minds you are conversing with—guessing what one means by what he said and guessing what should be said to maximize the chance the other knows what you mean.

The relationship between mentalizing and language can even be seen in the brain. Wernicke’s area, presumably the place where words are learned and stored, is right in the middle of the primate mentalizing regions. Indeed, the specific subarea of the left primate sensory cortex (called temporoparietal junction), which is highly selective for modeling other people’s intentions, knowledge, and beliefs, is entirely overlapping with Wernicke’s area—which is, as we have learned, required for people to understand speech and produce meaningful speech.

Consistent with this, mentalizing skills and language skills in children are deeply interconnected. In preschoolers, there is a significant correlation between the development of language skills and performance in mentalizing tasks such as false-belief tests. Disorders that impair mentalizing cause similar impairments in language.

We are capable of puppeteering other minds because language is, it seems, built right on top of a direct window to our inner simulation. Hearing sentences directly and automatically triggers specific mental imagery. This is why if someone is saying something that upsets us, we cannot simply “not listen”; we must cover our ears, otherwise the words directly trigger simulations whether we like it or not.

But Wait . . . What About GPT-4?

In March 2023, OpenAI released GPT-4, its new and upgraded large language model. GPT-4 works in largely the same way as GPT-3—it predicts the next word based solely on the prior sequence of words. But unlike GPT-3, it is an even bigger model trained on even more data.

Amazingly, each question that I designed in this chapter to demonstrate a lack of common sense and physical intuition in GPT-3 was answered flawlessly by GPT-4. GPT-4 knew that if you looked upward in a basement, you would see the ceiling not the sky. GPT-4 knew that if someone threw a baseball one hundred feet above your head, you would not be able to catch it. GPT-4 could even answer theory-of-mind questions like the Sally-Ann Test.

How does GPT-4 do it? OpenAI did not add an inner world model or the ability to model the mind of others. Instead, in response to GPT-3’s frequent mistakes on commonsense and reasoning questions, OpenAI began training GPT-4 specifically on questions of commonsense and reasoning. They used a technique called “reinforcement learning from human feedback” in which humans punished GPT-4 when it answered questions incorrectly, and rewarded it when it answered correctly. They even pushed the GPT-4 to answer certain questions in specific ways to improve its performance. For example, OpenAI trained GPT-4 to think about commonsense questions by writing out each step, a trick called chain-of-thought prompting. Indeed, contrast how GPT-3 and GPT-4 each answer the bat and ball question:

Me: A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

GPT-3:

GPT-4:

The ball costs $0.10 because if the bat costs $1.00 more than the ball, and the bat and the ball cost $1.10 in total, then the ball must cost $1.00 - $1.10 = $0.10

(incorrect)

Let’s denote the cost of the ball as x. Then, the cost of the bat would be x + $1.00, since it costs $1.00 more than the ball. The total cost of both items is $1.10. So, we can set up an equation:

x (ball) + x + $1.00 (bat) = $1.10

Combining the terms for x, we get

2x + $1.00 = $1.10

Now, we’ll subtract $1.00 from both sides:

2x = $0.10

Finally, we’ll divide both sides by 2:

x = $0.05

So, the ball costs $0.05 or 5 cents.

(correct)

By training GPT-4 to not just predict the answer, but to predict the next step in reasoning about the answer, the model begins to exhibit emergent properties of thinking, without, in fact, thinking—at least not in the way that a human thinks by rendering a simulation of the world.

Even though GPT-4 correctly answers the simpler questions I outlined in this chapter, you can still find plenty of examples of GPT-4 failing on commonsense and theory-of-mind questions. GPT-4’s lack of a world model can be seen by probing deeper with more complicated questions. But it is becoming progressively more painstaking to find these examples. In some ways, this has become a game of Whac-A-Mole; everytime a skeptic publishes examples of commonsense questions that LLMs answer incorrectly, companies like OpenAI simply use these examples as training data for the next update of their LLMs, which thereby answer such questions correctly.

Indeed, the massive size of these models, along with the astronomical quantity of data on which they are trained, in some ways obscures the underlying differences between how LLMs think and how humans think. A calculator performs arithmetic better than any human, but still lacks the same understanding of math as a human. Even if LLMs correctly answer commonsense and theory-of-mind questions, it does not necessarily mean it reasons about these questions in the same way.

As Yann Lecun said, “the weak reasoning abilities of LLMs are partially compensated by their large associative memory capacity. They are a bit like students who have learned the material by rote but haven’t really built deep mental models of the underlying reality.” Indeed, these LLMs, like a supercomputer, have a gargantuan memory capacity, having read more books and articles than a single human brain could consume in a thousand lifetimes. And so what seems like commonsense reasoning is really more like pattern matching, done over an astronomically enormous corpus of text.

But still, these LLMs are an incredible step forward. What is most amazing about the success of LLMs is how much they seemingly understand about the world despite being trained on nothing but language. LLMs can correctly reason about the physical world without ever having experienced that world. Like a military cryptanalyst decoding the meaning behind encryped secret messages, finding patterns and meanings in what was originally gibberish, these LLMs have been able to tease out aspects of a world they have never seen or heard, that they have never touched or experienced, by merely scanning the entire corpus of our uniquely human code for transferring thoughts.

It is possible, perhaps inevitable, that continuing to scale up these language models by providing them with more data will make them even better at answering commonsense and theory-of-mind questions.* But without incorporating an inner model of the external world or a model of other minds—without the breakthroughs of simulating and mentalizing—these LLMs will fail to capture something essential about human intelligence. And the more rapid the adoption of LLMs—the more decisions we offload to them—the more important these subtle differences will become.

In the human brain, language is the window to our inner simulation. Language is the interface to our mental world. And language is built on the foundation of our ability to model and reason about the minds of others—to infer what they mean and figure out exactly which words will produce the desired simulation in their mind. I think most would agree that the humanlike artificial intelligences we will one day create will not be LLMs; language models will be merely a window to something richer that lies beneath.