Chapter 15. BARBARA J. GROSZ

BARBARA J. GROSZ

I’m thrilled that AI is actually out there in the world making a difference because I didn’t think that it would happen in my lifetime—because it seemed the problems were so hard.

HIGGINS PROFESSOR OF NATURAL SCIENCES, HARVARD UNIVERSITY

Barbara J. Grosz is Higgins Professor of Natural Sciences at Harvard University. Over the course of her career, she has made groundbreaking contributions in artificial intelligence that have led to the foundational principles of dialogue processing that are important for personal assistants like Apple’s Siri or Amazon’s Alexa. In 1993, she became the first woman to serve as president of the Association for the Advancement of Artificial Intelligence.

MARTIN FORD: What initially drove you to be interested in artificial intelligence, and how did your career progress?

BARBARA GROSZ: My career was a series of happy accidents. I went to college thinking I would be a 7th-grade math teacher because my 7th-grade math teacher was the only person I had met in my first 18 years of life who thought that women, in general, could do mathematics, and he told me that I was quite good at math. My world really opened up though when I went to Cornell for college, as they had just started a computer science faculty.

At the time there was no undergraduate major in computer science anywhere in the US, but Cornell provided the opportunity to take a few classes. I started in numerical analysis, a rather mathematical area of computer science, and ended up going to Berkeley to graduate school, initially for a master’s, then I moved into the PhD program.

I worked in what would come to be called computational science and then briefly in theoretical computer science. I decided that I liked the solutions in the mathematical areas of computer science, but not the problems. So when I needed a thesis topic, I talked with many people. Alan Kay said to me, “Listen. You have to do something ambitious for your thesis. Why don’t you write a program that will read a children’s story and tell it back from one of the character’s points of view?” That’s what spurred my interest in natural language processing and is the root of my becoming an AI researcher.

MARTIN FORD: Alan Kay? He invented the graphical user interface at Xerox PARC, right? That’s where Steve Jobs got the idea for the Macintosh.

BARBARA GROSZ: Yes, right, Alan was a key player in that Xerox PARC work. I actually worked with him on developing a programming language called Smalltalk, which was an object-oriented language. Our goal was to build a system suitable for students [K-12] and learning. My children’s story program was to be written in Smalltalk. Before the Smalltalk system was finished, though, I realized that children’s stories were not just stories to be read and understood, but that they’re meant to inculcate a culture, and that Alan’s challenge to me was going to be really hard to meet.

During that time, the first group of speech-understanding systems were also being developed through DARPA projects, and the people at SRI International who were working on one of them said to me, “If you’re willing to take the risk of working on children’s stories, why don’t you come work with us on a more objective kind of language, task-oriented dialogues, but using speech not text?” As a result, I got involved in the DARPA speech work, which was on systems that would assist people in getting tasks done, and that’s really when I started to do AI research.

It was that work which led to my discovery of how dialogue among people, when they’re working on a task together, has a structure that depends on the task structure—and that a dialogue is much more than just question-answer pairs. From that insight, I came to realize that as human beings we don’t in general ever speak in a sequence of isolated utterances, but that there’s always a larger structure, much like there is for a journal article, a newspaper article, a textbook, even for this book, and that we can model that structure. This was my first major contribution to natural-language processing and AI.

MARTIN FORD: You’ve touched on one of the natural language breakthroughs that you’re most known for: an effort to somehow model a conversation. The idea that a conversation can be computed, and that there’s some structure within a conversation that can be represented mathematically.

I assume that this has become very important, because we’ve seen a lot of progress in the field. Maybe you could talk about some of the work you’ve done there and how things have progressed. Has it astonished you where things are at now in terms of natural language processing, compared to where they were back when you started your research?

BARBARA GROSZ: It absolutely has astonished me. My early work was exactly in this area of how we might be able to build a computer system that could carry on a dialogue with a person fluently and in a way that seemed natural. One of the reasons I got connected to Alan Kay, and did that work with him, was because we shared an interest in building computer systems that would work with and adapt to people, rather than require people to adapt to them.

At the time that I took that work on, there was a lot of work in linguistics on syntax and on formal semantics in philosophy and linguistics, and on parsing algorithms in computer science. People knew there was more to language understanding than an individual sentence, and they knew that context mattered, but they had no formal tools, no mathematics, and no computational constructs to take that context into account in speech systems.

I said to people at the time that we couldn’t afford to just hypothesize about what was going on, that we couldn’t just carry on introspecting, that we had to get samples of how people actually carry on a dialogue when they’re doing a task. As a result, I invented this approach, which was later dubbed the “Wizard of Oz” approach by some psychologists. In this work, I sat two people—in this case, an expert and an apprentice—in two different rooms, and I had the expert explain to the apprentice how to get something done. It was by studying the dialogues that resulted from their working together that I recognized the structure in these dialogues and its dependence on task structure.

Later, I co-wrote a paper with Candy Sidner titled Attention, Intentions, and the Structure of Discourse. In that paper we argue that dialogues have a structure that comes in part from the language itself and in part from the intentional structure: why you’re speaking, and what your purposes are when speaking. This intentional structure was a generalization of task structure. These structural aspects are then moderated by a model of the attentional state.
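To make those three components concrete, here is a minimal Python sketch of the kind of representation the theory suggests: discourse segments embedded in one another (the linguistic structure), each segment carrying the purpose for which it was opened (the intentional structure), and a stack of focus spaces tracking what is currently salient (the attentional state). The class names, fields, and the small pump-repair example are my own illustration, not the paper’s formalism.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative sketch (hypothetical names) of the three components:
# linguistic structure = segments and their embedding,
# intentional structure = each segment's purpose and dominance relations,
# attentional state = a stack of focus spaces tracking salient entities.

@dataclass
class DiscourseSegment:
    purpose: str                                  # why the speaker opened this segment
    parent: Optional["DiscourseSegment"] = None   # embedding in the linguistic structure
    utterances: List[str] = field(default_factory=list)

    def dominates(self, other: "DiscourseSegment") -> bool:
        """A segment's purpose dominates the purposes of segments embedded within it."""
        seg = other.parent
        while seg is not None:
            if seg is self:
                return True
            seg = seg.parent
        return False

@dataclass
class FocusSpace:
    segment: DiscourseSegment
    salient_entities: set = field(default_factory=set)

class AttentionalState:
    """Stack of focus spaces: push when a segment opens, pop when it closes."""
    def __init__(self) -> None:
        self._stack: List[FocusSpace] = []

    def push(self, segment: DiscourseSegment) -> None:
        self._stack.append(FocusSpace(segment))

    def pop(self) -> FocusSpace:
        return self._stack.pop()

    def in_focus(self, entity: str) -> bool:
        return any(entity in fs.salient_entities for fs in self._stack)

# Example: a sub-dialogue about removing a pump opens inside a dialogue about
# fixing a compressor; its focus space is pushed while the subtask is underway.
dialogue = DiscourseSegment(purpose="fix the air compressor")
subtask = DiscourseSegment(purpose="remove the pump", parent=dialogue)
state = AttentionalState()
state.push(dialogue)
state.push(subtask)
state._stack[-1].salient_entities.add("the pump")
assert dialogue.dominates(subtask) and state.in_focus("the pump")
```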

MARTIN FORD: Let’s fast forward and talk about today. What’s the biggest difference that you’ve seen?

BARBARA GROSZ: The biggest difference I see is going from speech systems that were essentially deaf, to today’s systems that are incredibly good at processing speech. In the early days we really could not get much out of speech, and it proved very hard to get the right kinds of parses and meaning back then. We’ve also come a long way forward with how incredibly well today’s technology can process individual utterances or sentences, which you can see in modern search engines and machine translation systems.

If you consider any of the systems that purport to carry on dialogues, however, the bottom line is they essentially don’t work. They seem to do well if the dialogue system constrains the person to following a script, but people aren’t very good at following a script. There are claims that these systems can carry on a dialogue with a person, but in truth, they really can’t. For instance, the Barbie doll that supposedly can converse with a child is script-based and gets in trouble if the child responds in a way the designers didn’t anticipate. I’ve argued that the mistakes it makes actually raise some serious ethical challenges.

Similar examples arise with all the phone personal assistant systems. For example, if you ask where the nearest emergency room is, you’ll get the nearest hospital to wherever you are when you ask, but if you ask where you can go to get a sprained ankle treated, the system is likely to just take you to a web page that tells you how to treat a sprained ankle. That’s not a problem for a sprained ankle, but if you’re asking about a heart attack because you think someone’s had one, it could actually lead to death. People would assume that a system that can answer one of those questions can answer the other.

A related problem arises with dialogue systems based on learning from data. Last summer (2017), I was given the Association for Computational Linguistics Lifetime Achievement Award, and almost all the people listening to my talk at the conference work on deep-learning-based natural-language systems. I told them, “If you want to build a dialogue system, you have to recognize that Twitter is not a real dialogue.” To build a dialogue system that can handle dialogues of the sort people actually engage in, you need real data of real people having real dialogues, and that’s much harder to get than Twitter data.

MARTIN FORD: When you talk about going off script, it seems to me that this is the blurry line between pure language processing and real intelligence. The ability to go off script and deal with unpredictable situations is what true intelligence is all about; it’s the difference between an automaton or robot and a person.

BARBARA GROSZ: You’re exactly right, and that’s exactly the problem. With a lot of data, deep learning enables you to go from a sentence in one language to the same sentence in another language; or from a sentence with a question in it to an answer to that question; or from one sentence to a possible following sentence. But there’s no real understanding of what those sentences actually mean, so there’s no way to work off script with them.

This problem links back to a philosophical idea that was elaborated in the 1960s by Paul Grice, J. L. Austin, and John Searle that language is action. For example, if I say to the computer, “The printer is broken,” then what I don’t want is for it to say back to me, “Thanks, fact recorded.” What I actually want is for the system to do something that will get the printer fixed. For that to occur, the system needs to understand why I said something.

Current deep-learning-based natural-language systems perform poorly on these kinds of sentences in general. The reasons are deeply rooted: these systems are really good at statistical learning, pattern recognition, and large-scale data analysis, but they don’t go below the surface. They can’t reason about the purposes behind what someone says. Put another way, they ignore the intentional structure component of dialogue. Deep-learning-based systems more generally lack other hallmarks of intelligence: they cannot do counterfactual reasoning or common-sense reasoning.

You need all these capabilities to participate in a dialogue, unless you tightly constrain what a person says and does; but that makes it very hard for people to actually do what they want to do!

MARTIN FORD: What would you point to as being state-of-the-art right now? I was pretty astonished when I saw IBM Watson win at Jeopardy! I thought that was really remarkable. Was that as much of a breakthrough as it seemed to be, or would you point to something else as really being on the leading edge?

BARBARA GROSZ: I was impressed by Apple’s Siri and by IBM’s Watson; they were phenomenal achievements of engineering. I think that what is available today with natural language and speech systems is terrific. It’s changing the way that we interact with computer systems, and it’s enabling us to get a lot done. But these systems are nowhere near the human capacity for language, and you see that when you try to engage in a dialogue with them.

When Siri came out in 2011, it took me about three questions to break the system. The mistakes Watson makes are most interesting in that they show us where it is not processing language the way people do.

So yes, on the one hand, I think the progress in natural language and speech systems is phenomenal. We are far beyond what we could do in the ‘70s, partly because computers are way more powerful, and partly because there’s a lot more data out there. I’m thrilled that AI is actually out in the world making a difference because I didn’t think that it would happen in my lifetime—because it seemed the problems were so hard.

MARTIN FORD: Really, you didn’t think it would happen in your lifetime?

BARBARA GROSZ: Back in the 1970s? No, I didn’t.

MARTIN FORD: I was certainly very taken aback by Watson and especially by the fact that it could handle, for example, puns, jokes, and very complex presentations of language.

BARBARA GROSZ: But just going back to the “Wizard of Oz” analogy, if you look at what’s actually behind those systems, you realize they all have limitations. We’re at a moment where it’s really important to understand what these systems are good at and where they fail.

This is why I think it’s very important for the field, and frankly for the world, to understand that we could make a lot more progress on AI systems that would be good for people if we didn’t aim to replace people or build generalized artificial intelligence—but instead focused on understanding what all these great capabilities are and are not good for, and on how to complement people with these systems, and these systems with people.

MARTIN FORD: Let’s focus on this idea of going off script and being able to really have a conversation. That relates directly to the Turing test, and I know you’ve done some additional work in that area. What do you think Turing’s intentions were in coming up with that test? Is it a good test of machine intelligence?

BARBARA GROSZ: I remind people that Turing proposed his test in 1950, a time when people had new computing machines that they thought were amazing. Now of course, those systems could do nothing compared to what a smartphone can do today, but at the time many people wondered if these machines could think like a human thinks. Remember, Turing used “intelligence” and “thinking” similarly—he wasn’t talking about intelligence like, say, Nobel Prize-winning science type of intelligence.

Turing was posing a very interesting philosophical question, and he made some conjectures about whether or not machines could exhibit a certain kind of behavior. The 1950s was also a time when psychology was rooted in behaviorism, and so his test is not only an operational test but also a test where there would be no looking below the surface.

The Turing test is not a good test of intelligence. Frankly, I would probably fail the Turing test because I’m not very good at social banter. It’s also not a good guide for what the field should aim to do. Turing was an amazingly smart person, but I’ve conjectured, somewhat seriously, that if he were alive today—and if he knew what we now know about how learning works, how the brain and language work, and how people develop intelligence and thinking, then he would have proposed a different test.

MARTIN FORD: I know that you’ve proposed some enhancements or even a replacement for the Turing test.

BARBARA GROSZ: Who knows what Turing would have proposed, but I have made a proposal: given that we know that the development of human intelligence depends on social interaction, that language capacity depends on social interaction, and that human activity in many settings is collaborative, I recommend that we aim to build a system that is a good team partner, and works so well with us that we don’t recognize that it isn’t human. I mean, it’s not that we’re fooled into the idea that a laptop, robot, or phone is a human being, but that you don’t keep wondering “Why did it do that?” when it makes a mistake that no human would.

I think that this is a better goal for the field, in part because it has several advantages over the Turing test. One advantage is that you can meet it incrementally—so if you pick a small enough arena in which to build a system, you can build a system that’s intelligent in that arena, and it works well on that kind of task. We could find systems out there now that we would say are intelligent in that way—and of course children, as they develop, are intelligent in different limited ways, and then they get more and different kinds of smart in more varied kinds of ways.

With the Turing test, a system either succeeds or it fails, and there’s no guide for how to incrementally improve its reasoning. For science to develop, you need to be able to make steps along the way. The test I proposed also recognizes that for the foreseeable future people and computer systems will have complementary abilities, and it builds on that insight rather than ignoring it.

I first proposed this test in a talk in Edinburgh on the occasion of the 100th anniversary of Turing’s birth. I said that, given all the progress in computing and psychology, we should think of new tests, and I asked the attendees at that talk, and at subsequent talks, for their ideas. To date, the main response has been that this test is a good one.

MARTIN FORD: I’ve always thought that once we really have machine intelligence, we’ll just kind of know it when we see it. It’ll just be somehow obvious, and maybe there’s not a really explicit test that you can define. I’m not sure there’s a single test for human intelligence. I mean, how do you know another human being is intelligent?

BARBARA GROSZ: That’s a really good observation. If you think about what I said when I gave this example of “where’s the nearest emergency room and where can I go to get a heart attack treated?”, no human being you would consider intelligent would be able to answer one of those questions and not the other one.

There’s a possibility that the person you asked might not be able to answer either question, say if you plonked them in some foreign city; but if they could answer one question, they could answer the other question. The point is, if you have a machine that answers both questions, then that seems intelligent to you. If you have a machine that answers only one and not the other question, then it doesn’t seem so intelligent.

What you just said actually fits with the test that I proposed. If the AI system is going along and acting, as it were, as intelligently as you would expect another human to act, then you’d think it is intelligent. What happens right now with many AI systems, is that people think the AI system is smart and then it does something that takes them aback, and then they think it’s completely stupid. At that point, the human wants to know why the AI system worked that way or didn’t work the way they expected, and by the end they no longer think it’s so smart.

By the way, the test that I proposed is not time-limited; in fact, it is actually supposed to be extended in time. Turing’s test was also not supposed to have a time limit, but that characteristic has been frequently forgotten, in particular in various recent AI competitions.

MARTIN FORD: That seems silly. People aren’t intelligent for only half an hour. It has to be for an indefinite time period to demonstrate true intelligence. I think there’s something called the Loebner Prize where Turing tests are run under certain limited conditions each year.

BARBARA GROSZ: Right, and it proves what you say. It also makes clear what we learned very early on in the natural-language processing arena, which is that if you have only a fixed task with a fixed set of issues (and in this case, a fixed amount of time), then cheap hacks will always win over real intelligent processing, because you’ll just design your AI system to the test!

MARTIN FORD: The other area that you have worked in is multi-agent systems, which sounds pretty esoteric. Could you talk a little about that and explain what that means?

BARBARA GROSZ: When Candy Sidner and I were developing the intentional model of discourse that I mentioned earlier, we first tried to build on the work of colleagues who were using AI models of planning developed for individual robots to formalize work in philosophy on speech act theory. When we tried to use those techniques in the context of dialogue, we found that they were inadequate. This discovery led us to the realization that teamwork or collaborative activity, or working together, cannot be characterized as simply the sum of individual plans.

After all, it’s not as if you have a plan to do a certain set of actions and I have a plan to do a certain set of actions, and they just happen to fit together. At the time, because AI planning researchers often used examples involving building stacks of toy blocks, I used the particular example of one child having a stack of blue blocks and another child having a stack of red blocks, and the two of them building a tower that has both red and blue blocks. But it’s not that the child with the blue blocks has a plan with those blocks in spaces that just happen to match where the plan of the child with red blocks has empty spaces.

Sidner and I realized, at this point, that we had to come up with a new way of thinking about—and representing in a computer system—plans of multiple participants, whether people or computer agents or both. So that’s how I got into multi-agent systems research.

The goal of work in this field is to think about computer agents being situated among other agents. In the 1980s, work in this area mostly concerned situations with multiple computer agents, either multiple robots or multiple software agents, and asked questions about competition and coordination.

MARTIN FORD: Just to clarify: when you talk about a computer agent, what you mean is a program, a process that goes and performs some action or retrieves some information or does something.

BARBARA GROSZ: That’s right. In general, a computer agent is a system able to act autonomously. Originally, most computer agents were robots, but for several decades AI research has involved software agents as well. Today there are computer agents that search and ones that compete in auctions, among many other tasks. So, an agent doesn’t have to be a robot that’s actually out there physically in the world.

For instance, Jeff Rosenschein did some really interesting work in the early years of multi-agent systems research, which considered situations like having a bunch of delivery robots that need to get things all over the city, where maybe if they exchanged packages they could do it more efficiently. He considered questions like whether they would tell the truth or lie about the tasks they actually had to do, because if an agent lied, it might come out ahead.

This whole area of multi-agent systems now addresses a wide range of situations and problems. Some work focuses on strategic reasoning; other work on teamwork. And, I’m thrilled to say, more recently, much of it is now really looking at how computer agents can work with people, rather than just with other computer agents.

MARTIN FORD: Did this multi-agent work lead directly to your work in computational collaboration?

BARBARA GROSZ: Yes, one of the results of my work in multi-agent systems was to develop the first computational model of collaboration.

We asked, what does it mean to collaborate? People take an overall task and divide it up, delegating tasks to different people and leaving it to them to figure out the details. We make commitments to one another to do subtasks, and we (mostly) don’t wander off and forget what we committed to doing.

In business, a common message is that one person doesn’t try to do everything, but delegates tasks to other people depending on their expertise. This is the same in more informal collaborations.

I developed a model of collaboration that made these intuitions formal, in work with Sarit Kraus, and it then generated many new research questions, including how you decide who’s capable of doing what, what happens if something goes wrong, and what your obligation is to the team. So, you don’t just disappear or say, “Oh, I failed. Sorry. Hope you guys can do the task without me.”
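As a rough illustration of what such a model has to keep track of (who is capable of which subtask, what each participant has committed to, and what the obligation is when something goes wrong), here is a hypothetical Python sketch. It is only an illustration of those intuitions, not the formal model itself, and all of the names in it are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical sketch of the intuitions behind a collaborative plan:
# a shared task is decomposed into subtasks, subtasks are delegated
# according to capability, and each agent is committed to its piece and
# obliged to tell the team if it fails rather than silently disappearing.

@dataclass
class Agent:
    name: str
    capabilities: Set[str]

@dataclass
class Commitment:
    agent: Agent
    subtask: str
    done: bool = False
    failed: bool = False

class SharedTask:
    def __init__(self, goal: str, subtasks: List[str]):
        self.goal = goal
        self.subtasks = subtasks
        self.commitments: Dict[str, Commitment] = {}

    def delegate(self, subtask: str, agent: Agent) -> None:
        """Only delegate a subtask to an agent who is capable of doing it."""
        if subtask not in agent.capabilities:
            raise ValueError(f"{agent.name} is not capable of {subtask!r}")
        self.commitments[subtask] = Commitment(agent, subtask)

    def report_failure(self, subtask: str) -> List[str]:
        """On failure, the obligation is to notify teammates so the task can be replanned."""
        self.commitments[subtask].failed = True
        return [c.agent.name for c in self.commitments.values() if c.subtask != subtask]

# Invented usage example, loosely in the spirit of care coordination:
task = SharedTask("coordinate care plan", ["update medication list", "schedule cardiology visit"])
task.delegate("update medication list", Agent("pediatrician", {"update medication list"}))
task.delegate("schedule cardiology visit", Agent("care coordinator", {"schedule cardiology visit"}))
notify = task.report_failure("schedule cardiology visit")  # teammates to inform, not abandon
```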

In 2011-2012 I had a year’s sabbatical in California, and I decided that I wanted to see if this work on collaboration could make a difference in the world. So, pretty much since then, I have been working in the healthcare arena developing new methods for healthcare coordination, working with Stanford pediatrician Lee Sanders. The particular medical setting is children who have complex medical conditions and see 12 or 15 doctors. In this context, we’re asking: how can we provide systems that help those doctors share information and more successfully coordinate what they’re doing?

MARTIN FORD: Would you say that health care is one of the most promising areas for AI research? It certainly seems like the part of the economy that most needs to be transformed and made more productive. I’d say we’d be much better off as a society if we could give transforming medicine a higher priority than having robots that flip hamburgers and produce cheaper fast food.

BARBARA GROSZ: Right, and healthcare is an area, along with education, where it’s absolutely crucial that we focus on building systems that complement people, rather than systems that replace people.

MARTIN FORD: Let’s talk about the future of artificial intelligence. What do you think about all of the focus right now on deep learning? I feel a normal person reads the press and could come away with the impression that AI and deep learning are synonymous. What would you point to, speaking of AI generally, as the things that are absolutely on the forefront?

BARBARA GROSZ: Deep learning is not deep in any philosophical sense. The name comes from there being many layers to the neural network. It isn’t that deep learning is more intelligent in the sense of being a deeper “thinker” than other kinds of AI systems or learning. It functions well because it mathematically has more flexibility.

Deep learning is tremendously good for certain tasks, essentially ones that fit its end-to-end processing: a signal comes in and you get an answer out; but it is also limited by the data it gets. We see this limitation in systems that can recognize white males much better than other kinds of people because there are more white males in the training data. We see it also in machine translation that works very well for literal language, where it’s had a lot of examples, but not for the kind of language you see in novels or anything that’s literary or alliterative.

MARTIN FORD: Do you think there will be a backlash against all the hype surrounding deep learning when its limitations are more widely recognized?

BARBARA GROSZ: I have survived numerous AI Winters in the past and I’ve come away from them feeling both fearful and hopeful. I’m fearful that people, once they see the limitations of deep learning will say, “Oh, it doesn’t really work.” But I’m hopeful that, because deep learning is so powerful for so many things, and in so many areas, that there won’t be an AI Winter around deep learning.

I do think, however, that to avoid an AI Winter for deep learning, people in the field need to put deep learning in its correct place, and be clear about its limitations.

I said at one point that “AI systems are best if they’re designed with people in mind.” Ece Kamar has noted that the data from which these deep learning systems learn comes from people. Deep learning systems are trained by people. And these deep learning systems do better if there are people in the loop correcting them when they’re getting something wrong. On the one hand, deep learning is very powerful, and it’s enabled the development of a lot of fantastic things. But deep learning is not the answer to every AI question. It has, for instance, so far shown no usefulness for common-sense reasoning!

MARTIN FORD: I think people are working on, for example, figuring out how to build a system so it can learn from a lot less data. Right now, systems do depend on enormous datasets in order to get them to work at all.

BARBARA GROSZ: Right, but notice the issue is not just how much data they need, but the diversity of the data.

I’ve been thinking about this recently; simply put, why does it matter? If you or I were building a system to work in New York City or San Francisco, that would be one thing. But these systems are being used by people around the world from different cultures, with different languages, and with different societal norms. Your data has to sample all of that space. And we don’t have equal amounts of data for different groups. If we go to less data, we have to say something like (and I’m being a bit facetious here), “This is a system that works really well for white men, upper income.”

MARTIN FORD: But is that just because the example you’re using is facial recognition and they’re feeding in photographs of white people mostly? If they expanded and had data from a more diverse population, then that would be fixed, right?

BARBARA GROSZ: Right, but that’s just the easiest example I can give you. Let’s take healthcare. Until only a few years ago, medical research was done only on males, and I’m not talking only about human males, I’m even talking about only male mice in basic biomedical research. Why? Because the females had hormones! If you’re developing a new medicine, a related problem arises with young people versus old people as older people don’t need the same dosages as young people. If most of your studies are on younger people, you again have a problem of biased data. The face data is an easy example, but the problem of data bias permeates everything.

MARTIN FORD: Of course, that’s not a problem that’s exclusive to AI; humans are subject to the same issues when confronted with flawed data. It’s a bias in the data that results from past decisions that people doing research made.

BARBARA GROSZ: Right, but now look what’s going on in some areas of medicine. The computer system can “read all the papers” (more than a person could), do certain kinds of information retrieval from them and extract results, and then do statistical analyses. But if most of the papers are on scientific work that was done only on male mice, or only on male humans, then the conclusions the system is coming to are limited.

We’re also seeing this problem in the legal realm, with policing and fairness. So, as we build these systems, we have to think, “OK. What about how my data can be used?” Medicine is a place where I think it’s really dangerous to not be careful about the limitations of the data that you’re using.

MARTIN FORD: I want to talk about the path to AGI. I know you feel very strongly about building machines that work with people, but I can tell you from having done these interviews that a lot of your colleagues are very interested in building machines that are going to be independent, alien intelligences.

BARBARA GROSZ: They read too much science fiction!

MARTIN FORD: But just in terms of the technical path to true intelligence, I guess the first question is if you think that AGI is achievable? Maybe you think it can’t be done at all. What are the technical hurdles ahead?

BARBARA GROSZ: The first thing I want to tell you is that in the late 1970s, as I was finishing my dissertation, I had this conversation with another student who said, “Good thing we don’t care about making a lot of money, because AI will never amount to anything.” I reflect on that prediction often, and I know I have no crystal ball about the future.

I don’t think AGI is the right direction to go. I think the focus on AGI is actually ethically dangerous because it raises all sorts of issues of people not having jobs, and robots run amok. Those are fine issues to think about, but they are very far in the future. They’re a distraction. The real point is we have any number of ethical issues right now, with the AI systems we have now, and I think it’s unfortunate to distract attention from those because of scary futuristic scenarios.

Is AGI a worthwhile direction to go or not? You know, people have been wondering since at least The Golem of Prague, and Frankenstein, for many hundreds of years, if humanity could create something that is as smart as a human. I mean, you can’t stop people from fantasizing and wondering, and I am not going to try, but I don’t think that thinking about AGI is the best use of the resources we have, including our intelligence.

MARTIN FORD: What are the actual hurdles to AGI?

BARBARA GROSZ: I mentioned one hurdle, which is getting the wide range of data that would be needed, and getting that data ethically, because you’re essentially being Big Brother: watching a lot of behavior and, from that, taking a lot of data from a lot of people. I think that may be one of the biggest issues and biggest hurdles.

The second hurdle is that every AI system that exists today is an AI system with specialized abilities: robots that can clean your house, or systems that can answer questions about travel or restaurants. To go from that kind of individualized intelligence to general intelligence that flexibly moves from one domain to another, takes analogs from one domain to another, and can think not just about the present but also the future, those are really hard questions.

MARTIN FORD: One major concern is that AI is going to unleash a big economic disruption and that there might be a significant impact on jobs. That doesn’t require AGI, just narrow AI systems that do specialized things well enough to displace workers or deskill jobs. Where do you fall on the spectrum of concern about the potential economic impact? How worried should we be?

BARBARA GROSZ: So yes, I am concerned, but I’m concerned in a somewhat different way from how many other people are concerned. The first thing I want to say is that it’s not just an AI problem, but a wider technology problem. It’s a problem where those of us who are technologists of various sorts are partially responsible, but the business world carries a lot of responsibility as well.

Here’s an example. You used to call in to get customer service when something wasn’t working, and you got to talk to a human being. Not all of those human customer service agents were good, but the ones who were good understood your problem and got you an answer.

Of course, human beings are expensive, so now they’ve been replaced in many customer service settings by computer systems. At one stage, companies got rid of more intelligent people and hired the cheaper people who could only follow a script, and that wasn’t so good. But now, who needs a person who can only follow a script when you have a system? This approach makes for bad jobs, and it makes for bad customer service interactions.

When you think about AI and the increasingly intelligent systems, there are going to be more and more opportunities where you can think, “OK, we can replace the people.” But it’s problematic to do that if the system isn’t fully capable of doing the task it’s been assigned. It’s also why I’m on the soapbox about building systems that complement people.

MARTIN FORD: I’ve written quite a lot about this, and I guess the point I would make is that this is very much at the intersection of technology and capitalism.

BARBARA GROSZ: Exactly!

MARTIN FORD: There is an inherent drive within capitalism to make more money by cutting costs and historically that has been a positive thing. My view is that we need to adapt capitalism so that it can continue to thrive, even if we are at an inflection point, where capital will really start to displace labor to an unprecedented extent.

BARBARA GROSZ: I’m with you entirely on that. I spoke about this recently at the American Academy of Arts and Sciences, and for me there are two key points.

My first point was that it’s not a question of just what systems we can build but what systems we should build. As technologists, we have a choice about that, even in a capitalist system that will buy anything that saves money.

My second point was that we need to integrate ethics into the teaching of computer science, so students learn to think about this dimension of systems along with efficiency and elegance of code.

To the corporate and marketing people at this meeting, I gave the example of Volvo, who made a competitive advantage out of building cars that were safe. We need it to be a competitive advantage for companies to make systems that work well with people. But to do that is going to require engineers who don’t just think about replacing people, but who work with social scientists and ethicists to figure out, “OK. I can put this kind of capability in, but what does it mean if I do that? How does it fit with people?”

We need to support building the kind of systems we should build, not just the systems that in the short-term look like they’ll sell and save money.

MARTIN FORD: What about AI risks beyond the economic impact? What do you think we should be genuinely concerned about in terms of artificial intelligence, both in the near term and further out?

BARBARA GROSZ: From my perspective, there is a set of questions around the capabilities AI provides, the methods it has and what they can be used for, and the design of AI systems that go out in the world.

And there’s a choice. Even with weapons, there’s a choice. Are they fully autonomous? Where are the people in the loop? Even with cars, Elon Musk had a choice. He could have said that what Tesla cars had was driver-assist instead of saying he had a car with autopilot, because of course he doesn’t have a car with autopilot. People get in trouble because they buy into the autopilot idea, trust it will work, and then have accidents.

So, we have a choice in what we put in the systems, what claims we make about the systems, and how we test, verify and set up the systems. Will there be a disaster? That depends on what choices we make.

Now is an absolutely crucial time for everyone involved in building systems that incorporate AI in some way—because those are not just AI systems: they’re computer systems that have some AI involved. Everyone needs to sit down and have, as part of their design teams, people who are going to help them think more broadly about the unintended consequences of the systems they’re building.

I mean, the law talks about unintended consequences, and computer scientists talk about side effects. It’s time to stop, across technology development, as far as I’m concerned, saying, “Oh, I wonder if I can build a thing that does thus and such,” and then build it and foist it on the world. We have to think about the long-range implications of the systems we’re building. That’s a societal problem.

I have gone from teaching a course on Intelligent Systems: Design and Ethical Challenges to now mounting an effort with colleagues at Harvard, which we call Embedded EthiCS, to integrate the teaching of ethics into every computer science course. I think that people who are designing systems, should not only be thinking about efficient algorithms and efficient code, but they should also be thinking about the ethical implications of the system.

MARTIN FORD: Do you think there’s too much focus on existential threats? Elon Musk has set up OpenAI, which I think is an organization focused on working on this problem. Is that a good thing? Are these concerns something that we should take seriously, even though they may only be realized far in the future?

BARBARA GROSZ: Somebody could very easily put something very bad on a drone, and it could be very damaging. So yes, I’m in favor of people who are thinking about how they can design safe systems and what systems to build as well as how they can teach students to design programs that are more ethical. I would never say not to do that.

I do think, however, that it’s too extreme to say, as some people are saying, that we shouldn’t be doing any more AI research or development until we have figured out how to avoid all such threats. It would be harmful to stop all of the wonderful ways in which AI can make the world a better place because of perceived existential threats in the longer term.

I think we can continue to develop AI systems, but we have to be mindful of the ethical issues and be honest about the capabilities and limitations of AI systems.

MARTIN FORD: One phrase that you’ve used a lot is “we have a choice.” Given your strong feeling that we should build systems that work with people, are you suggesting that these choices should be made primarily by computer scientists and engineers, or by entrepreneurs? Decisions like that are pretty heavily driven by the incentives in the market. Should these choices be made by society as a whole? Is there a place for regulation or government oversight?

BARBARA GROSZ: One thing I want to say is that even if you don’t design the system to work with people, it’s got to eventually work with people, so you’d better think about people. I mean, the Microsoft Tay bot and Facebook fake news disasters are examples of designers and systems where people didn’t think enough about how they were releasing systems into the “wild,” into a world that is full of people, not all of whom are trying to be helpful and agreeable. You can’t ignore people!

So, I absolutely think there’s room for legislation, there’s room for policy, and there’s room for regulation. One of the reasons I have this hobbyhorse about designing systems to work well with people is that I think if you get social scientists and ethicists in the room when you’re thinking about your design, then you design better. As a result, the policies and the regulations will be needed only to do what you couldn’t do by design as opposed to over-reacting or retrofitting badly designed systems. I think we’ll always wind up with better systems if we design them to be the best systems they can be, and then the policy is on top of that.

MARTIN FORD: One concern that would be raised about regulation, within a country, or even in the West, is that there is an emerging competitive race with China. Is that something we should worry about, that the Chinese are going to leap ahead of us and set the pace, and that too much regulation might leave us at a disadvantage?

BARBARA GROSZ: There are two separate answers here right now. I know I sound like a broken record, but if we stop all AI research and development or severely restrict it, then the answer is yes.

If, however, we develop AI in a context that takes ethical reasoning and thinking into account, as well as the efficiency of code, then no, because we’ll keep developing AI.

The one place where there’s extraordinary danger is with weapons systems. A key issue is what would happen if we didn’t build AI-driven weapons and an enemy did; but that topic is so large that it would take another hour conversation.

MARTIN FORD: To wind up, I wanted to ask you about women in the field. Is there any advice you would offer to women, or men, or to students just getting started? What would you want to say about the role of women in the field of AI and how things have evolved over the course of your career?

BARBARA GROSZ: The first thing I would say to everybody is that this field has some of the most interesting questions of any field in the world. The set of questions that AI raises has always required a combination of thinking analytically, thinking mathematically, thinking about people and behavior, and thinking about engineering. You get to explore all sorts of ways of thinking and all sorts of design. I’m sure other people think their fields are the most exciting, but I think it’s even more exciting now for us in AI because we have much stronger tools: just look at our computing power. When I started in the field I had a colleague who’d knit a sweater waiting for a carriage return to echo!

Like all of computer science and all of technology, I think it’s essential that we have the broadest spectrum of people involved in designing our AI systems. I mean not just women as well as men, I mean people from different cultures, people of different races, because that’s who’s going to use the systems. If you don’t, you have two big dangers. One is the systems you design are only appropriate for certain populations, and the second is that you have work climates that aren’t welcoming to the broadest spectrum of people and therefore benefit from only certain subpopulations. We’ve got to all work together.

As for my experience, there were almost no women involved in AI at the beginning, and my experience depended entirely on what the men with whom I worked were like. Some of my experiences were fantastic, and some were horrible. Every university, every company that has a group doing technology, should take on the responsibility of making sure the environments encourage women as well as men, and people from under-represented minorities because, in the end, we know that the more diverse the design team, the better the design.

BARBARA GROSZ is Higgins Professor of Natural Sciences in the School of Engineering and Applied Sciences at Harvard University and a member of the External Faculty of Santa Fe Institute. She has made groundbreaking contributions to the field of artificial intelligence through pioneering research in natural language processing and in theories of multi-agent collaboration and their application to human-computer interaction. Her current research explores ways to use models developed in this research to improve health care coordination and science education.

Barbara received an AB in mathematics from Cornell University, and a master’s and PhD in computer science from the University of California, Berkeley. Her many awards and distinctions include election to the National Academy of Engineering, the American Philosophical Society, and the American Academy of Arts and Sciences, and as a fellow of the Association for the Advancement of Artificial Intelligence and the Association for Computing Machinery. She received the 2009 ACM/AAAI Allen Newell Award, the 2015 IJCAI Award for Research Excellence, and the 2017 Association for Computational Linguistics Lifetime Achievement Award. She is also known for her leadership of interdisciplinary institutions and contributions to the advancement of women in science.