Alexa, AI, and Machine Learning

Alexa is the agent in the cloud, running on the internet. Echo is the device with a multitude of microphones so it can do far-field voice recognition. From the time we started working on it in 2012, our vision was that, in the long term, it would become the Star Trek computer. You could ask it anything (ask it to do things for you, ask it to find things for you) and it would be easy to converse with in a very natural way.

Working on Alexa and Echo was very challenging from a technical point of view. There are thousands of people working on Echo and Alexa, with teams in many different locations, including Cambridge, Massachusetts; Berlin; and Seattle.

With Echo there were several different things that had to get solved. One of the key insights we had when we started planting that seed for Echo was that it should be an always-on device, plugged into wall power, so you didn’t need to charge it. It could sit in your bedroom or in your kitchen or in your living room and play music for you, answer questions, and ultimately even be the way you might control some of your home systems, like lighting and temperature control. Just saying, “Alexa, please turn the temperature up two degrees” or “Alexa, turn off all the lights” is a very natural way of interacting in that kind of environment. Before Echo and Alexa, the primary way people interacted with their home automation system was a bad one: an app on their phone. If you want to control your lights, it’s very inconvenient if you need to find your phone, take it out, open a particular app, and find the right screen to control the lights on that app.

The devices team has just done an amazing job, and there’s so much progress still to come. We have a fantastic road map for Echo and Alexa. We have a big third-party ecosystem now of other companies who’ve built what we call skills for Alexa, so it’s kind of expanding what Alexa can do.

We—as humanity, as a civilization, as a technological civilization—are still quite a ways away from making anything as magical and amazing as the Star Trek computer. That has been a dream for so long, kind of a science-fiction scenario. The things we’re solving with machine learning today are extraordinary, and we really are at a tipping point where the progress is accelerating. I think we’re entering a golden age of machine learning and artificial intelligence. But we’re still a long way away from being able to have machines do things the way humans do things.

Human-like intelligence is still pretty mysterious, even to the most advanced AI researchers. If you think about how humans learn, we’re incredibly data efficient. By contrast, when we train something like Alexa to recognize natural language, we use millions of data points. You also have to collect what’s called a ground-truth database, and it’s a huge, expensive effort to collect this database, which becomes the training set that Alexa learns from.

If today you are designing and building a machine-learning system for a self-driving car, you need millions of driving miles of data for that car to learn how to drive. Humans learn incredibly efficiently. Humans do not need to drive millions of miles before we learn how to drive. We’re probably doing something called “transfer learning” in the parlance of the machine-learning field.

Humans have already learned so many different skills, and we’re able to map those skills onto new skills in a very efficient way. The AlphaGo program that recently beat the world Go champion played millions of games of Go. The human champion has played thousands of games of Go, not millions. And they’re almost at the same level, the human champion and the computer program. Plus, the human is doing something fundamentally different; we know because humans are so power efficient.

I don’t remember the exact figure, but AlphaGo is one example that uses thousands and thousands of watts of power. I think it’s over one thousand servers running in parallel. And Lee Se-dol, the human champion, uses about fifty watts. Somehow we’re doing these unbelievably complex calculations incredibly power efficiently—we’re data efficient and power efficient. So when it comes to AI, we in the machine-learning community have a lot to learn.

But that’s what makes it such an exciting field. We’re solving unbelievably complex problems, not just in natural language and machine vision but also, in some cases, in the fusion of the two.

Privacy organizations take claims about privacy invasion related to devices or services and attempt to reproduce the claimed invasion. It’s actually pretty easy for privacy organizations to do this, and they do it all the time. They reverse-engineer devices to see if the manufacturers’ privacy claims are true. That’s very good behavior, and I’m grateful for all those privacy organizations that do that. They have uncovered honest mistakes that companies have made; sometimes maybe companies just weren’t careful enough.

Our device is not transmitting anything to the cloud until it hears the wake word, “Alexa.” And when it hears the wake word “Alexa,” the ring on the top lights up. When the ring is lit, the device is sending what you say to the cloud. It has to do that because we need access to all of the data in the cloud in order to be able to do the full range of things that Alexa can do—check the weather for you and so on.

Hacking is one of the great issues of our age, one that we have to figure out globally as a society and a civilization. Some of the solutions will become laws. Some of the problem is nation-states doing things that you wouldn’t want them to do, and it’s not clear at all how that’s going to be controlled.

With most devices and the technologies we have today, nation-states can easily listen in on any conversation by bouncing a laser beam off one of the windows in your home, or they can put a piece of malware on your phone and turn all the microphones on. A typical high-end phone today has four microphones. As a society, we’re going to have to figure this out. It’s probably easier to control certain institutions like the FBI because we can come together and decide what the rules should be, what the laws should be, and how courts should enforce them. But when it comes to nation-states cyberhacking and so on, I consider it an unsolved issue. I don’t know what we’re going to do.

I don’t know the answer to the question of whether an internet-connected society can ever be made really secure. We’ve lived with these technologies for a long time. People want to carry a phone around, and I think that the phone phenomenon is here to stay. And that phone is completely controlled by software. It has multiple microphones on it. The microphones are controlled by software. The radio in that phone can transmit the data anywhere in the world.

And so the technical capability is there to turn any mobile phone into a listening device surreptitiously. With Alexa, the team made a very interesting and, I think, noteworthy decision, one I hope other companies might emulate: to include a mute button that turns the microphones off on the Echo. When you press the mute button, it and the ring turn red, and that red light is connected to the microphones with analog electronics. So it’s actually impossible, when that red light is on, for the microphones to come on. You can’t do that remotely through hacking. But phones are not like that.