We Need Algorithms That Can Make Explicit What Is Implicit

Adolfo Plasencia:

Bernardo, I would like to have a conversation with you about computing, the Semantic Web, and ontologies.¹

Bernardo Cuenca Grau:

Thanks! It’s a pleasure for me to talk about those things.

A.P.:

Your research focuses on knowledge representation and ontology-based technologies and their application to the Semantic Web.² Let’s talk about ontologies.³

In philosophy, ontology is the study of the nature of being. However, in computer science, which is your field, ontology has a different meaning. It refers to the formulation of a comprehensive and rigorous conceptual schema within a given context or domain.

Bernardo, how do you understand ontology? And what does it mean in your research? Is it also associated with metaphysics or just with computing?

B.C.G.:

To me, ontology is just a document containing a description which—in a formal language—describes a domain. Let me give you an example that everybody can understand. Imagine you want to write in a language that a computer can understand … let’s say about biomedicine. If, for example, I want to say that psychosis is a type of mental disorder, I couldn’t do it in natural language; I would have to find a way to represent it so that a computer could process the information.

A.P.:

And what a computer can process can be processed by another computer. So the idea is for any computer to be able to process it.

B.C.G.:

For a machine to be able to process it. But what does “process” mean? Process, in our case, means automated reasoning. For example, let’s say that schizophrenia is a type of psychosis; psychosis is a type of mental disorder, and if someone has been diagnosed with psychosis correctly, then an algorithm can infer that that person has a mental condition.

Normally, such automatic inferences—and that was a very simple example—can be very complicated. At a large scale, it is something that would be impossible to do manually. That is why we need algorithms that can make “explicit” what is “implicit,” and that is valuable because you are then able to know exactly what the implications of what you said are.
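
[Editorial illustration] The inference described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the two "is a type of" statements come from the conversation, while the names IS_A, ancestors, and implicit_facts are hypothetical and do not correspond to any real ontology reasoner or to SNOMED itself.

```python
# Hypothetical toy ontology: each pair states "child is a type of parent".
IS_A = {
    ("schizophrenia", "psychosis"),
    ("psychosis", "mental disorder"),
}

def ancestors(concept, is_a=IS_A):
    """Return every concept that `concept` is a type of, directly or indirectly."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in is_a:
            if child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

def implicit_facts(is_a=IS_A):
    """Return the is-a statements that follow from `is_a` but are not stated in it."""
    concepts = {child for child, _ in is_a}
    inferred = {(c, a) for c in concepts for a in ancestors(c, is_a)}
    return inferred - set(is_a)

print(sorted(implicit_facts()))  # [('schizophrenia', 'mental disorder')]
```

With two statements the implicit fact is easy to spot by hand; with hundreds of thousands of concepts the same closure has to be computed by an algorithm, which is the point being made.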

A.P.:

Make explicit what is implicit. …

B.C.G.:

Yes, and this has a huge impact not only on the Semantic Web but also on bioinformatics. There is a huge ontology with over half a million concepts called SNOMED.⁴ The NHS, the UK’s health system, is digitizing the medical records of all of its patients using SNOMED. In this system, when a doctor assigns a diagnosis, only standardized terms can be used; all doctors use the same terms so that it is easier to share information between departments, for example. You can also make more sophisticated inquiries that would be impossible with a traditional database. This application, which cannot really be called a Semantic Web, is very important in bioinformatics and also in ontologies.

A.P.:

Speaking of ontologies. … In technology, ontologies are also associated with artificial intelligence (AI) and knowledge representation, which is another area of expertise of yours. Has the digital revolution changed knowledge representation disciplines? What is knowledge representation for you, and how does it fit in this field?

B.C.G.:

Obviously, ontology-based technologies are part of knowledge representation. Knowledge representation has been studied for quite a few years now.⁵ It is halfway between mathematics, philosophy, and computer science.

A.P.:

Not a bad meeting point. …

B.C.G.:

Yes, a nice intersection. In knowledge representation conferences, some people have a philosophy background and some are pure mathematicians. You see people interested in computing and people who don’t care about computers. It is a very interdisciplinary field.

As far as I am concerned, my interest is in the application; that is, we want what we do to serve a specific purpose. We really are—not just me but several people in my group—in a most unconventional situation. We are neither theorists nor pure mathematicians. When we attend very formal conferences, we are treated as hackers in a way. We also are not “application guys.” When we go to conferences such as those on the Semantic Web, which are more application-focused, then we are the theorists. … So we are definitely at a midpoint. We try to understand enough about theory to be able to develop it and design algorithms that work, and we then use that theory and program prototypes, and that’s it. We fill the gap between theory and pure practice.

We do not want to do pure theory without a direct application. We are not interested in creating a product from a proof-of-concept prototype, either. As researchers, that’s not our job.

A.P.:

Wolfram MathWorld is an outstanding collaborative environment on the Internet. They say that the universe could be modeled with the logic of computing software much better than using conventional mathematical terms or models, which is how it has been mostly done so far. Bernardo, isn’t this a bold statement? Is computational logic different from that of mathematics?

B.C.G.:

Computational logic and mathematical logic largely overlap. In our case, at a theoretical level, we deal with questions such as, is it possible to design an algorithm for a problem, which is purely computational theory; such questions are part of mathematics. But what we are ultimately interested in is designing computer programs, and it is there that our work differs from mathematics. What we do, as far as logic goes, is more meta-mathematics. We explore the boundaries of mathematics.

That is also done by pure mathematicians and philosophers. But we want to see which logic languages are suitable for practical applications.

A.P.:

So in some mathematics and pure science conferences, you are treated as hackers. Is the Internet the only place where Wolfram can claim that modeling the universe with algorithms might be more effective? You couldn’t make that type of statement in a pure science congress, could you? You would be expelled.

B.C.G.:

Let me tell you something about the hacker thing. I remember talking to a British professor once—a theorist, in my opinion. He was telling me a story. He had attended a conference on category theory, and another scientist said to him: “Ah, but what you do is ‘fixing’ the language, right?” Meaning, do you really study specific languages? Because we study the properties of all types of languages.

In other words, the researcher who was a theorist to me was a hacker to the scientist in his story. There are so many layers. …

A.P.:

Bernardo, let’s talk about the digital revolution. Hal Abelson, one of the most important scientists of the MIT CSAIL group, told me in his conversation for this book that the digital revolution, combined with the Internet—with billions of people already connected—is comparable to the revolution of the printing press, but that it does not compare to the invention of writing.⁶ The science philosopher Javier Echeverria says it does form part of the invention of writing.⁷ What do you think?

B.C.G.:

The invention of writing started it all, but the digital revolution—our access to information—it’s amazing. Here’s a very specific example. I use Wikipedia a lot. It takes only a click to know something, to obtain knowledge, or to read about anything. In the past you had to go to the library, find things in an encyclopedia, investigate. … And there were many things you couldn’t find. Besides, Wikipedia includes articles in different languages where you find different perspectives on the same topic. It has really changed everything, and so has Internet searching. As for social media, on Facebook you can find people you haven’t seen for ages.

It’s hard to say if it compares or not to the invention of writing or the printing press, but it is indeed one of the greatest revolutions of the modern world.

A.P.:

Now that you have mentioned that example, it reminds me of another conversation I had with Jimmy Wales and what he did when founding Wikipedia.

We are here speaking about orthodoxy in science, the layers between scientists and the relationship between a hacker and other scientists. Jimmy Wales took a gamble with Wikipedia; he went against the flow and stood up for utter trust and the radical decentralization of knowledge. I think Wikipedia is an example of how surprising this revolution can be.

Why are things still being published in the media saying that the contents of Wikipedia are not secure or not validated? Don’t you think this type of information comes from some commercial powers who want that utter trust and support from millions of people who use Wikipedia every day not to be so strong?

B.C.G.:

Perhaps. I don’t know. I always use Wikipedia and I find quality information, much more reliable than what you find on some websites because in Wikipedia, if something is not reliable, you can always discuss it. There are public discussions, and if something does not seem right, you can raise it and see previous discussions. You can really see where the controversy is. I use Wikipedia not for my field but for other topics. But in terms of scientific knowledge and as regards things that can be objectified, or even for historical events that are no longer in the spotlight, I think it is fairly reliable.

A.P.:

At the 2009 WWW Conference in Madrid, we celebrated the twentieth anniversary of the Web with Tim Berners-Lee and Vinton Cerf. Daniel Schwabe, who was a member of the original TCP/IP team led by Vinton Cerf, told us that when Tim Berners-Lee presented his project at CERN more than twenty-five years ago and later in the United States for the first time, at the Technology Exhibition in San Antonio, nothing happened in the following eighteen months. So they presented the Web, and nothing happened for almost two years. Nobody did anything. No scientific authority responded, not from CERN or elsewhere. Nothing was done.

Bernardo, how come something like the Web, which twenty-five years later connects billions of people together, went unnoticed for a year and a half?

Can decisive knowledge go undetected?

Is it a problem of knowledge representation?

B.C.G.:

To a great extent, whether a scientific idea in technology thrives or not depends much on chance. Some very good technologies have not made any progress while worse ones seem to have flourished. What we are now doing at Oxford, for example, might be considered a total failure in five years’ time, or perhaps the whole world will use it. We don’t know. Many candidate technologies are developed; some make it and some don’t. It is a combination of factors: how good the idea is, if there are people willing to develop it, the community that forms around it, luck, and also the quality of the technology, although in my opinion the last aspect only accounts for 30 percent.

The Web is not the only example of what you just said about good ideas going unnoticed at the beginning. When Douglas Engelbart and engineers at Xerox PARC invented the mouse and showed it to their company directors, the executives said, “What is this?” And when the same people at Xerox developed the first graphical interface, their people did not like it either. On their visits, Steve Jobs and Bill Gates saw it, and they immediately saw the potential. So basically, this has always happened.

A.P.:

On Google your name is associated with a very trendy term, the Semantic Web.

How would you define the Semantic Web in simple words so that we can understand it conceptually, and what is meant by this concept in Oxford’s scientific circles?

B.C.G.:

The Semantic Web has no definition. It is still a very vague concept that encompasses many things, though it has drawn a lot of attention. If you think not about the idea itself but about what will come out of it, the specific side of it, it is a number of techniques, from knowledge representation to natural language processing and information retrieval. That is, it is a framework in which some techniques have stimulated research in certain areas with a particular application. The outcome of it will be a set of methods to process information in a smarter way. And that’s coming from such techniques, and even from some older ideas about deductive databases, and so forth. It will be a great framework, a large “basket” with lots of things in it.

How can you define that? It’s very complicated; I think it has no definition. How will that be seen in the real world? There are already small applications where such things are working.

A.P.:

Should we see it as more like a tag cloud? Like a cloud with those semantic tags that we see on the Web?

B.C.G.:

It can be understood with a simple example. Imagine someone makes an inquiry, “I want to find all the science fiction books written by a particular author,” and with that you gain access to a number of websites containing information, maybe Amazon or another type of website where you would find such things. If Amazon’s ontology says that science fiction books and biographies are disjoint concepts (for example, that a biography cannot be a science fiction book), then an algorithm immediately “knows” that it has to ignore all the lists of biographies written by that author because they would be incorrect results.
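
[Editorial illustration] A minimal sketch of that pruning step, assuming a hypothetical catalogue in which one author's titles are already grouped into lists by category. The names DISJOINT, LISTS_BY_CATEGORY, is_disjoint, and candidate_titles are invented for this example; no real Amazon data, ontology, or API is involved.

```python
# Hypothetical ontology fragment: pairs of concepts declared disjoint.
DISJOINT = {frozenset({"science fiction", "biography"})}

# Hypothetical catalogue: one author's titles, grouped into lists by category.
LISTS_BY_CATEGORY = {
    "science fiction": ["Title A", "Title B"],
    "biography": ["Title C"],
    "fiction": ["Title A", "Title D"],
}

def is_disjoint(a, b, disjoint=DISJOINT):
    """True when the ontology states that nothing can be both `a` and `b`."""
    return frozenset({a, b}) in disjoint

def candidate_titles(wanted, lists=LISTS_BY_CATEGORY):
    """Collect titles from every list the ontology does not rule out for `wanted`."""
    titles = set()
    for category, items in lists.items():
        if is_disjoint(category, wanted):
            continue  # nothing in this list can be a `wanted` book
        titles.update(items)
    return sorted(titles)

print(candidate_titles("science fiction"))  # ['Title A', 'Title B', 'Title D']
```

The disjointness statement only tells the algorithm which lists to skip; the titles that remain are candidates and would still need to be checked against the query.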

A.P.:

In a debate on Internet governance that I attended in Madrid, during the question round, an executive in the audience said to Vinton Cerf and Tim Berners-Lee that he did not understand why they had let the Internet grow in such a “disordered” and “wild” way, almost without control.

Vinton Cerf replied, “We did it because both Tim and I continue to believe that the Internet should be an open place where no one has to ask permission of anyone to innovate.”

Do you agree? Should the Web be an “open” place?

B.C.G.:

Those terms, “open” and “closed,” are a bit blurred. In the end, there must be some type of regulation. Because with information. … It all depends on the reach of the violation of people’s privacy and data’s confidentiality, both for individuals and for companies. If people start reporting breaches and there are scandals, then people will consider regulation. Until these things start happening … we’ll see … we don’t know.

A.P.:

Bernardo, you are cooperating from Oxford with the World Wide Web Consortium (W3C), the body that coordinates the development of standards and technologies for the Web. In the World Summit on the Information Society in Tunis in November 2005, Viviane Reding, the European Commissioner for the Information Society, said she did not understand why the regulatory agencies of the Web, such as the Internet Corporation for Assigned Names and Numbers (ICANN) and the W3C, were not directly answerable to “democratic governments.” She did not understand that the bodies that make decisions on the progress of the Internet do not depend on the “political class.” What do you think?

B.C.G.:

I do not understand why the political class should have a say in this. I think the W3C does an important job.

A.P.:

The Internet is used by billions of people, and it works perfectly. At the conference, we said, “What’s this woman saying?”

B.C.G.:

I’ve taken part in standardization meetings and, in groups with twenty or more people—as was the case with the second version of OWL, with twenty-five or thirty participants—it is really hard to make decisions and move on, and there was already a very solid base, namely, the initial proposal that we put forward. It’s a heavy burden. And you have to deal not only with technical issues but also with interpersonal relationships. It’s complicated … and there are economic, commercial interests. In other words, it’s really hard to reach consensus even on small things. And that happens with a group of twenty people, all of them technically skilled and working for leading companies and universities. Imagine what would happen if bureaucracy became stronger.

A.P.:

It’s like what they say in design: a camel is a horse that was designed by a committee. The Web would end up becoming a camel instead of a horse.

B.C.G.:

In technology, things cannot be overly bureaucratized because changes are so rapid that it would be terrible to overly burden processes with further bureaucracy.

Bureaucratization confuses process with progress, and with technology issues one cannot afford such burdens because progress must be made. It is difficult enough to evolve in the current situation. We don’t want to complicate it any further by creating more committees, implementing additional processes, filters, and so on. If we need a standard for people to start using it now … it currently takes us a year and a half or two. Imagine if it took ten years! By the time the standard was finished, nobody would use it. It would be totally outdated.

A.P.:

The mathematician Godfrey Harold Hardy once said,

“A mathematician, like a painter or a poet, is a maker of patterns. … The mathematician’s patterns, like the painter’s or the poet’s, must be beautiful; the ideas, like the colours or the words, must fit together in a harmonious way. Beauty is the first test: there is no permanent place in the world for ugly mathematics.”⁸

Bernardo, do you think there is beauty in the algorithms of the Semantic Web?

B.C.G.:

I think we deal with ugly mathematics. And I think it should be so. For example, there are knowledge-representation formalisms in my field that have been designed by mathematicians. They always think about the elegance of the formalism and the properties that it should have to be easily manipulated. But when you compare that with applications, or reality, you realize those formalisms do not work the way they are. You have to modify them, and in doing so, obviously, the mathematics behind them “get dirty.” Then those of us who try to prevent mathematics from getting too dirty see what we can do. In fact, if you try to prove theorems or whatever and you have a formalism that is not precise or is too difficult, then it is very difficult to prove anything.

You need to reach a compromise between what is needed in practice and the little “hacks” that we must do to formalisms and to the degree of dirtiness they have, because if they get too dirty it would be impossible to do anything with them.

A.P.:

So you don’t believe in the beauty of equations? I saw one of your recent projects and the part with the mathematical equations is almost like a score, so beautiful. But I may be wrong because I am a neophyte. I did see beauty in those equations.

B.C.G.:

Maybe if a pure mathematician saw it, he would be outraged! When we do theory, we demonstrate, for instance, that an algorithm is suitable for a particular formalism, and so forth. We have to write proofs and deal with formalisms. One of the problems we usually have is that formalisms not designed by mathematicians are more difficult to handle, but we have no choice.

A.P.:

Bernardo, thank you very much for your time and for your words.

B.C.G.:

Thank you!

Notes