[Control of the Internet] is not a matter of scale so much as a question of diversity.
Everyone’s behavior has its own long tail.
—Ricardo Baeza-Yates
Ricardo Baeza-Yates is Chief Technology Officer, NTENT, California, and part-time Professor in the Department of Information and Communication Technologies of Universitat Pompeu Fabra, Barcelona, Spain, and the Department of Computer Science, at the University of Chile. Formerly he was Vice President of Research at Yahoo! Labs, leading teams in Europe, the United States, and Latin America. He is a founding member of the Chilean Academy of Engineering.
His research interests include algorithms and data structures, information retrieval, web data mining, and data visualization. His contributions include algorithms for string search, such as the shift-or algorithm, and algorithms for fuzzy string searching, which inspired the bitap algorithm.
Among his publications are the second edition of Modern Information Retrieval, co-authored with Berthier Ribeiro-Neto (Addison-Wesley, 2011); the second edition of Handbook of Algorithms and Data Structures, co-authored with Gaston H. Gonnet (Addison-Wesley, 1991); and Information Retrieval: Data Structures and Algorithms, co-edited with W. Frakes (Prentice-Hall, 1992).
Note: This dialogue took place in 2014 and was revised by R.B.-Y. in 2016.
Adolfo Plasencia:
I’ve come to your Yahoo! lab in Barcelona to talk to you about technologies and the digital revolution, which is proving to be something like Pandora’s box.
Ricardo Baeza-Yates:
Welcome!
A.P.:
Ricardo, I think the search technologies that you use can anticipate behaviors. In your research, you handle data from 700 million people at the same time. With your search technologies, you are able to “catch” and figure out what a user can do the day after tomorrow in a particular region of the world, with a specific culture and activity. Are we so predictable? Could you do something like “remembering the future” of what many people are going to do?
R.B.-Y.:
Your question has many facets. I’d rather divide my answer into two parts: one about the immediate future, that is, what you are going to do in the next few minutes, and another one about the future-future, that is, what you are going to do the rest of your life. They are two different futures.
I'll start with the immediate future. In this case technologies try to predict what a person is going to do. The answer about the immediate future is that, in part, it is true. We can predict the next app you will access on your smart phone. We have information on many mobile phones and on many apps. We can predict that you are going to access a specific application today, with a 90 percent success rate. We are all creatures of habit to a great extent—up to 80 percent, for instance. And that part of us can be predicted. So my answer is yes.
A.P.:
Then 80 percent of our behavior is determined?
R.B.-Y.:
Yes. Let me give you some figures, though it depends on the person. Some people have more “deterministic” behaviors than others. Some people are “strange.” With strange people, we are likely to predict their behavior in 50 percent of cases. But there is also the long tail of people’s behavior.
A.P.:
A long tail of human behavior?
R.B.-Y.:
Yes, everyone’s behavior has its own long tail, a different long tail, and that is very important. We found this out in a study by Yahoo! in 2009, analyzing five sets of big data on music, searching, movies, and so forth. Basically, we saw that there was a long tail in everybody’s behavior, not only strange people’s behavior. On the other hand, we are all much like everyone else; we all have to work, eat, and feed the cat. And there’s a part of what you do (20 percent, or 50 percent if you are very “strange” or very different) that includes specific things that matter only to you: your hobbies, your worries, or your personal motivations.
A.P.:
Is it the intellectual dimension that makes us more unpredictable? Or not necessarily?
R.B.-Y.:
Not necessarily. Maybe in your case, for example, your long tail includes a health problem, because you may be ill, something that has to do with your body, with your “animal” side. The long tail is made up of many specific problems, things that I want to do right away, such as: I’m very worried about something, I want to find information about it. That’s something intellectual because I'm looking for information, but perhaps only you are interested right now in graphene and its future possibilities. Things that are common to everyone are easier to predict because we have a lot of data, and people are predictable in that respect. But it is very difficult to predict things that are related to the very specific part of you, because maybe you didn’t even know these things ten minutes ago. Imagine that you think about a particular topic for the first time. It is impossible to predict that.
A.P.:
It’s true.
R.B.-Y.:
Let me answer the second part, the future-future. This is something that can be done and that we have actually achieved; we won an award in 2010 for the best news analysis software. Let’s say you have a very big news collection. You can find there what people think of the future. It is like analyzing wisdom. Some people share opinions about politics, economics, or the environment. Other people express opinions on conflicts or tensions between countries. They even do so for the conflicts that may arise over the next hundred years. Many people make assumptions about the future. In a collection of millions of pieces of news from the New York Times, one can find things many people speculate about, and that helps you understand what people think about the future. Among them there are also outstanding individuals whose decisions may influence the future. So there is a greater chance that what they say may actually come true, because they influence the present by expressing their opinion about future things. This facilitates what I call “looking into the future.” As Alan Kay said, “The best way to predict the future is to invent it.”1
A.P.:
According to the neurologist Alvaro Pascual-Leone, at Harvard Medical School, neuroscientists already know that the brain is a hypothesis generator and that it is always making hypotheses.2 We like imagining what will happen. Is that right?
R.B.-Y.:
Exactly. It is a variant of the big questions we want to answer. Where do we come from? Where do we go? I think they have no answer, but people want to and have to live with such questions.
A.P.:
In another conversation in this book, I asked Bernardo Cuenca, an ontology and Semantic Web researcher in the Department of Computer Science at Oxford, about his work: Can you briefly describe what you do here? He said (I paraphrase slightly): “We work on Semantic Web technologies to make the implicit explicit.” In relation to that, do you think the Semantic Web—which apparently will be part of the Internet’s mainstream in the near future—will make explicit the implicit in global knowledge?
R.B.-Y.:
My answer is the same as that to your first question: Yes, in part. Perhaps the Semantic Web will be 20 percent successful in that, but it will fail in 80 percent, and not because the technology does not work. The technologies that can do the conversion already exist. The problem is that people, even though they know them, do not really want to use them. How many people, when editing a Word document, fill in the metadata, in other words, the explicit part? Nobody. They do not try to be clear about the day, the subject, what keywords they used before when they wrote about the same topic. If you use different words every time, explicitness does not work.
A.P.:
Why do you think people do not want to make the most of such a fantastic opportunity?
R.B.-Y.:
I think it has to do with our nature. It has to do with what is not deterministic. Let’s see: How many people have everything organized on their computer? You can see it when you go to their home. Only the obsessive-compulsive are very orderly. But that’s why many people say of them that they are not normal. For example, who has everything sorted out in different drawers so that they know exactly where things are? The Semantic Web is just that, having everything in the right place. But people are not like that; we like having a bit of a mess. Chaos is part of us. I think chaos makes life interesting. Imagine if everything were totally tidy, it would be very boring. There would be no problems; there would be … nothing.
A.P.:
Human beings cope better with determinism than with uncertainty.
R.B.-Y.:
I think we’re more comfortable with determinism because that is exactly what we want to achieve. That’s our goal, but we are chaotic. How often do you forget things? Imagine you could stop that and be methodical. If you were methodical in everything you did, your life would be less enjoyable. I don’t think all people want to be methodical.
A.P.:
Maybe it’s because I can’t change the “wiring” in my brain that easily.
R.B.-Y.:
That’s why I say it is part of our nature. Even if we want to, we may not be able to. An example is violence: we do not want to have wars, but we seem to fail all the time.
A.P.:
Speaking about being willing to do things and being able to do them, and about technology: do you think we should use technology as we want to or as we are told to?
R.B.-Y.:
That’s a good question. I think we should use technology as we please. There are examples in technology—some are clearer than others—that we use for some things and not for what we are told to use it for. The Internet and the Web were not designed for what they are today. Many technologies are now being used for something other than what they were planned for. We take the best of technology and modify it to use what is best. But if someone designs a technology so that it is definitely the very best one for a particular purpose, we will surely use it the way we are told. For example, iTunes is for music and is designed for that. It is very good, and there is no dilemma between using it for what I want or for another purpose. But when you’re faced with a dilemma, in the end, you will do what you want, if it is possible and legal, of course.
A.P.:
Let’s talk about cybernetics. Cybernetics is defined by Norbert Wiener as what in ancient Greece was known as a kibernos, a helmsman in charge of the ship’s course.3 In that sense, who is the kibernos of Internet cybernetics today?—if there is one.
R.B.-Y.:
I think that, in Norbert Wiener’s sense, there is no such figure.
A.P.:
Is there at least a wheelhouse?
R.B.-Y.:
Let’s say, for example, that the United States must have its own wheelhouse or bridge for any Internet infrastructure in its territory, but it cannot control all of the Internet. Again, the answer is “in part.” But in a global sense, there isn’t a master helmsman. It is very difficult to exert control because it depends on many countries, companies, and ultimately on how people use technology. People adapt to the conditions in which they live, and the same goes for the Internet. The Internet is our own reflection. The virtual world is an illusion; it’s just a reflection of the real world.
A.P.:
But a lot of people are worried about being under surveillance by the National Security Agency and other agencies. Apparently, some government agencies are so powerful that they can conduct mass surveillance over the Internet. Some countries have banned the Internet or its applications—unsuccessfully, though. Is the Internet so huge that not a single country can constrain it under its command? Is it bigger than what a vertical power can cover?
R.B.-Y.:
It is not a matter of scale so much as a question of diversity—diversity in the countries and languages involved. The Internet is very heterogeneous, so heterogeneous that nobody can control it.
A.P.:
Then is it digital diversity that has an endless dimension for the known powers?
R.B.-Y.:
Let me use a metaphor. If I have a thousand sheep, I can use a dog to guide them. But if I have a thousand different animals, I have a much more complex problem in controlling them. I don’t need a dog; I need a thousand dogs, at least. That’s the problem with diversity. The diversity of the problem is so complex that no one can control it, and then you have all the legal and political issues. In some places, although there is a very powerful entity that wants access to everything, it can’t have full access. Some data only you can have access to.
A.P.:
Diversity in the sense of complexity?
R.B.-Y.:
Complexity comes from diversity. If everything were uniform, it would not be so complex.
A.P.:
I have another, somewhat provocative question, though not from your field. Michail Bletsas, director of computing for the MIT MediaLab, is sure that there will be nonbiological intelligence in this century, or at least an intelligence not based on Homo sapiens. Do you think that is possible? Is it feasible? Will it ever happen?
R.B.-Y.:
The answer depends on how we define intelligence. If intelligence means beating the best chess player, then yes; if intelligence means beating the best Jeopardy player (as was the case), then yes.
But the question is, is that intelligence or something else? In those two well-known examples, we find the same thing: the computer does it differently from a person. It’s more brute force, massive computing; it involves much more data; and in the end, the computer performs better than a person. But is it because of greater computing capacity? I don’t think so. Because a computer has more information capacity, more storage? Maybe. Perhaps the human brain stores much more information but we do not know how to access it so quickly.
If we stipulate it will be nonbiological intelligence, therefore a different intelligence, and that it must solve this type of problem, then my answer is yes. Of course, I think this kind of intelligence is not human intelligence; it’s another kind of intelligence.
A.P.:
So you agree with someone who, in Barcelona—the city where we are now—said something that made me think about this a lot. Roger Penrose, after a lecture, said: “Maybe, in the future, at some lab, someone might make an intelligent machine but, of course, it will not be a computer.” Do you agree with Penrose?
R.B.-Y.:
Not fully, because if it really is as smart as a person, it will not be a computer for sure, but maybe it'll be like a computer of the future, and we don’t know what computers will be in the future.
A.P.:
Maybe a quantum computer, like the one described by Ignacio Cirac?4
R.B.-Y.:
A quantum computer, or something else entirely. We are constrained by what we know today. The best example for this is Asimov’s famous story, “The Last Question” (1956), in which a huge computer stores all the knowledge in the world. Every now and then, they asked the gigantic machine how something intelligent could be done, how life could be created, and the computer said again and again: “I do not know (… there is insufficient data for a meaningful answer).” When the last human was about to die, he asked the question again. And the computer said the same thing: “I do not know.” Then, after a very long time, the computer suddenly said: “Let there be light.”
So, maybe yes, but I don’t know what that computer is. So far, we use machine learning techniques to predict individuals’ behavior, but basically, all we’re doing is trying to predict data.
A.P.:
You are a computer scientist. Does intelligence—artificial or not—have more to do with the critical mass of complexity or with computing capacity?
R.B.-Y.:
I would say it is related to both things. The more complex the world is, the more computing power you need, and the problem is that, as humans, our computing power is limited.
A.P.:
With your big data technology, a few individuals can handle a huge, superhuman amount of information, something formerly impossible. And some research with big data is simulating a massive data recombination, because it is thought the brain might do this to create something new. Do you think that by recombining numerous data, you can come up with a creative process?
R.B.-Y.:
The process of creating things is not something I have delved into, but if research is analyzed—which is one of the creative processes we do here—there are two types of creation. One is like the great idea that you did not have a second ago and that you know once that second has elapsed. Once you know it, it seems obvious and trivial. You say “of course, it’s obvious” (and then it is trivial because you already know it). They are great ideas, but this happens very rarely. I think there’s a combination of factors that ultimately lead to thinking about that. It does not happen spontaneously. There is a process that allows you to find them. But you have to be thinking about a problem. You’re thinking about it and suddenly it happens: “I’ve got it!” In the other creative style, innovations are quite frequently the result of transferring what you already know in one area to another one. In other words, there is something I want to solve, and by making a change I can do it. It is a process that does not come from ingenie (not ingenuo) but from having a comprehensive, overall view of things. The more global your gaze at the world, the better, but it is now more difficult to know all the knowledge that we have and all the technologies available to us.
People are able to innovate and create because they have a more global view of things, a holistic vision that allows them to lay bridges between subjects. Not everyone can do that. This takes us back to the problem of diversity and complexity. There is so much diversity and knowledge that we cannot have a holistic view. It is part of the problem. In the Renaissance, there were people who knew almost everything that was important in human knowledge at the time—people like Leonardo da Vinci, who knew mathematics, astronomy, geometry, physics. But today it is very difficult for a person to have such a comprehensive knowledge. That’s part of today’s complexity.
A.P.:
When you started to study computer science, did you imagine that the Internet would permeate all of your scientific life?
R.B.-Y.:
No, but when I was a student the Internet was almost unknown.
A.P.:
A group of people, not too big, invented what we now call the Internet, in stages. Before, there was nothing like it, in computing and technical terms. Jon Postel, Vinton Cerf, Robert Kahn. Later, Tim Berners-Lee invented the Web, then Tim O’Reilly formulated Web 2.0. Some people made the infrastructure, the communication mechanisms, the computing; and others connected that with the public. Do you think this group of people (to name just a few) opened a huge Pandora’s box?
R.B.-Y.:
I don’t think so. I'm sure because both the Internet and the Web were designed for other purposes. Those who were thinking about it devised the Internet to transmit information, files. The first widespread use was email, something they had not imagined yet, but that was a natural outcome. The Web was also intended to organize information. Now it is a platform for things as powerful as social media. I don’t think the earlier workers ever imagined the impact. People used it for something else; people used it for what they thought was more interesting.
A.P.:
People skipped the sequences of commands and the user guide. …
R.B.-Y.:
But that’s what we do all the time. Who reads manuals? Nobody. We are people. Maybe it’s part of our wild side: I want to find out for myself how it works, and if I find out something else, great. I'm inventing something, I’m teasing technology. The best recent examples of innovation have to do with the use of technologies for purposes other than those for which the technologies were created. It happens quite a lot. You see people using something in a completely different way. They are called crazy but they are actually inventing something new.
A.P.:
Bill Aulet, the managing director of the Martin Trust Center for MIT Entrepreneurship, told me that entrepreneurship is not an algorithm, and apparently, success isn’t either.
R.B.-Y.:
Sure. I agree with that. It is a creative process. Both things are creative processes.
A.P.:
You can never imagine that, out of something that you know, something you never imagined may arise. And that’s not an automatic thing.
R.B.-Y.:
If I could turn it into an algorithm, if I could innovate automatically, it would no longer be innovation to me. If you can repeat it on an industrial scale, it is not innovation any longer. Innovation is something you cannot repeat.
A.P.:
It is the opposite of falsifying, in Popper’s sense, a scientific experiment.
R.B.-Y.:
Exactly. Creating, designing, is an art, a craft. There is an artist behind it, a creator. If one could automatically paint as perfectly as Van Gogh, then we wouldn’t consider Van Gogh to be such a great artist. We make massive something we believe to be special. Art is something special. If everyone did it, it would not be art. What about you, Adolfo, do you agree with this?
A.P.:
I agree.
And here is the big question about the Internet. From the Web invented by Tim Berners-Lee, we moved on to Web 2.0. Tim O’Reilly formulated it in 2005, and it became social. Shortly after, they said: that’s it, the next one will be Web 3.0 … but not really, apparently. The Semantic Web, the mobile Internet (Mobile Web), the Internet of Things, and the Internet of Everything emerged. Do you think that the future of the Internet can be predicted? Or is it impossible to imagine what direction the mainstream Internet will take and how it will evolve in the near future?
R.B.-Y.:
It’s hard to predict. Five years ago my answer would have been different because I look at developments closely. One thing is clear: diversity is on the rise. The Internet does not consist only of connected computers. It includes smart phones, sensors, machines. Maybe we’ll see “wired” people, robots and brains directly connected. The Internet will more and more reflect the diversity of people and the world. And that is even more complex. I think there will be different Internet worlds to be shared. Some talk about what will happen and predict that the whole Internet will be social. But there is a big contradiction here: how can something that is inside something eat that something? It’s like telling your stomach to go and eat yourself.
A.P.:
Darwin said: we have found that evolution exists. But we do not know where it leads. Perhaps the evolution of the Internet could be Darwinian in that sense. We know it exists, how it works, that it generates growing complexity, but we do not know where it leads. Is it impossible to know, in the same way that Darwin did not know where evolution would take us?
R.B.-Y.:
Yes, exactly, and it has to do with the same thing, because the Internet is a collective creation by those who use it. More and more, it is a reflection of us. Currently, the Web 2.0 is a collective creation.5 None of the people you mentioned earlier could imagine that Wikipedia would be what it currently is. Nobody imagined that one billion people would be connected to a single online social network.
If we could answer the question of where the Internet is heading, we would be answering the question of where we are going.
A.P.:
Where the human condition is going and where humankind is going?
R.B.-Y.:
Interestingly, although people say that this creates a kind of chaos and things that are negative, that is not true. We are reflecting what we are, the positive and the negative things that were already there before. The difference is that we can now see it. Now, with the Internet, we can amplify the world and it is far more transparent, and you can even find things that you did not know of before.
On the Internet, you can find out who did something. In real life, if someone does something wrong, it is much more complicated to know. The Internet as an amplifier also amplifies new possibilities. What is happening is that the Internet also allows us to modulate the evil in ourselves, and that is very important. I do not want to use words like “censorship” or “blocking”; I’d rather use the term “modulate,” having control mechanisms to ensure freedom of expression, the democratization of the Internet and its diversity and, at the same time, exert some control over the misuse of technology with respect to issues that are important for a lot of people, such as privacy. These issues are being handled quite well, if one considers the global reach of the Internet.
A.P.:
So you’re an optimist. You are an optimistic technologist.
R.B.-Y.:
I am an optimistic technologist, but sometimes I am also a skeptic.
A.P.:
Well, we do not have all the answers. If you find the answer, please let me know, and I’ll be here straightaway.
R.B.-Y.:
If I find the answer I don’t think you will find me. I would have to hide!
A.P.:
Thank you very much, Ricardo. It’s been a pleasure.
R.B.-Y.:
The same to you. Thank you, Adolfo.