15 Conclusion: The Future of Machine Translation

Over the course of this book, we have traced the evolution of machine translation, from the first experiments in the 1950s to today's systems, which are operational and available on the Internet at no cost. We have also examined the main features of these systems: some are based on dictionaries and transfer rules, others on the statistical analysis of very large corpora. Lastly, we have described a new approach based on deep learning that seems highly promising and is especially exciting from a technical and cognitive point of view. But let us first have a look at the commercial challenges.

Commercial Challenges

As previously noted, machine translation has undergone a profound renewal since the 1990s, the period when very large quantities of bilingual documents became freely and directly available online. At the same time, the development of the Internet played a crucial role, since people can now communicate worldwide through email, blogs, and social networks. There is thus a need for tools that make it possible to communicate across languages without having to master them. This technological revival was therefore supported by commercial and strategic prospects, notably in the fields of telecommunications and information technology.

In the course of this book, we have described several kinds of applications. Everybody is familiar with the translation tools freely available on the Internet and created by Internet giants such as Google or Microsoft. In a multipolar, multilingual world, mastering this technology is an absolute must for Internet and telecommunications companies with global ambitions. Machine translation is the key to multiple products with great potential in the near future, such as live multilingual communication or multilingual access to patent databases. Some of these applications will generate significant revenue in the years to come.

Another type of application is probably not as well known: several specialized companies supply professional commercial translation products. These products are complex, adaptable, and often sold with specific services (especially the development of specialized dictionaries or the rapid integration of new languages on demand). This kind of product is primarily sold to large companies and government administrations, especially in the military and intelligence domains. Strategic interests thus play a major role in this context. The adaptability of such systems is also a key element: a solution provider’s ability to rapidly supply accurate translations for a new field or a new language is of utmost importance. In this framework, it is also often crucial for customers to be able to develop resources themselves, especially when the data to be analyzed are classified.

The development of communications networks, mobile Internet, and the miniaturization of electronic devices also highlight the need to move quickly toward audio applications that can translate speech directly. Speech processing has been the subject of intensive research in recent decades, and performance is now acceptable. However, the task remains difficult, since both speech recognition and machine translation have to be performed in real time, and errors accumulate (i.e., if a word has not been properly recognized by the speech recognition system, it will not be properly translated). Large companies producing connected devices (Apple, Google, Microsoft, or Samsung, to name a few) develop their own solutions and regularly acquire start-ups in these technological domains. They need to be first on the technological front and to offer new features that may become an important source of revenue in the future.

The future will likely see the integration of machine translation modules into new kinds of appliances, as seen in chapter 14. Microsoft has already presented live demonstrations of multilingual conversations, integrating speech translation into Skype. Google, Samsung, and Apple are creating similar applications for mobile phones, and even for “smart” eyeglasses. While it is not yet clear whether these gadgets will really be used in everyday life, they are interesting for specific professional contexts, such as the maintenance of complex systems in the aeronautics or nuclear industries, where technicians must be able to communicate while keeping their hands free. It is clear that commercial challenges will continue to drive research toward more powerful and accurate systems.

We live in a multilingual world, yet machine translation (and the field of natural language processing as a whole) raises problems of language domination, since the domain is of course not independent of economic and political considerations. As has been emphasized, even if the systems available on the Internet officially offer translation for several dozen languages, the quality is very poor for most of them, especially when English is not the source or, above all, the target language. Aside from the major Indo-European languages (English, Russian, French, German, etc.), some languages (such as Arabic or Chinese) are now the focus of intensive research. These are usually among the most widely spoken languages in the world and are associated with great economic potential. One can also find research projects addressing rarer languages, but they remain marginal, and the quality of the resulting systems is generally modest. Processing rarer languages remains a highly interesting challenge, provided it is not driven purely by economic interests.

A Cognitively Sound Approach to Machine Translation?

To conclude this journey, we would like to say a few words about cognitive issues. The most active researchers in the field of machine translation generally avoid addressing cognitive issues and draw few parallels with the way humans perform a translation. In the past, the field of artificial intelligence has suffered too much from spectacular and inflated claims, often made about systems that had nothing to do with the way humans think or reason. It may thus seem reasonable to focus on technological issues and to leave aside any parallel with human behavior, especially because we do not, in fact, know much about the way the human brain works.

However, despite what has just been said, it may be interesting in this conclusion to have a look at cognitive issues, because the evolution of the field of machine translation is arguably highly relevant from this point of view. The first systems were based on dictionaries and rules, and on the assumption that it was necessary to encode all kinds of knowledge about the source and target languages in order to produce a relevant translation. This approach largely failed because information is often partial and sometimes contradictory, and knowledge is contextual and fuzzy. Moreover, no one really knows what knowledge is, or where it begins and where it ends. In other words, humans cannot efficiently develop a comprehensive system of rules for machine translation, since the task is potentially infinite and it is not clear what should be encoded in practice.

Statistical systems then seemed like a good solution, since these systems are able to efficiently calculate complex contextual representations for thousands of words and expressions. This is something the brain probably does in a very different way, but nevertheless very efficiently: we have seen in chapter 2 that any language is full of ambiguities (cf. “the bank of a river” vs. “the bank that lends money”). Humans are not bothered at all by these ambiguities: most of the time we choose the right meaning in context without even considering the other meanings. In “I went to the bank to negotiate a mortgage,” it is clear that the word “bank” refers to the lending institution, and the fact that there is another meaning for “bank” is simply ignored by most humans. A computer still has to consider all options, but at least statistical systems offer interesting and efficient ways to model word senses based on the context of use.
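
To make this idea concrete, here is a minimal sketch, written for this discussion rather than drawn from any actual system, of how context can be used to choose between the two senses of “bank.” The tiny sense-tagged examples, the stopword list, and the scoring are all invented for illustration; real statistical systems rely on vastly larger corpora and more sophisticated probabilistic models.

```python
# Illustrative only: count which context words co-occur with each sense of
# "bank" in a tiny invented "corpus", then score a new sentence against
# those counts to pick the most likely sense.
from collections import Counter

STOPWORDS = {"the", "a", "an", "to", "of", "on", "i", "we", "my"}

# Invented sense-tagged examples (not real corpus data).
training = {
    "bank/finance": [
        "I went to the bank to negotiate a mortgage",
        "the bank approved the loan",
    ],
    "bank/river": [
        "we walked along the bank of the river",
        "the boat drifted toward the muddy bank",
    ],
}

def content_words(sentence):
    """Lowercase the sentence and drop 'bank' itself plus a few function words."""
    return [w for w in sentence.lower().split()
            if w != "bank" and w not in STOPWORDS]

# Per-sense counts of the context words observed in the training examples.
context_counts = {
    sense: Counter(w for s in sentences for w in content_words(s))
    for sense, sentences in training.items()
}

def disambiguate(sentence):
    """Return the sense whose observed context words best match the sentence."""
    words = content_words(sentence)
    scores = {sense: sum(counts[w] for w in words)
              for sense, counts in context_counts.items()}
    return max(scores, key=scores.get)

print(disambiguate("the bank refused my mortgage application"))       # bank/finance
print(disambiguate("they picnicked on the grassy bank of the river")) # bank/river
```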

We have also witnessed rapid progress, from the very first systems based on a word-for-word approach to segment-based approaches: progressively longer sequences of text have been taken into account, leading to better translations. The new generation of systems based on deep learning takes the whole sentence as the basic translation unit and thus offers a valuable solution to the limitations of previous approaches. We have also seen that this approach takes into account all kinds of relations between words in the sentence, which means that structural knowledge (i.e., some kind of syntax) is involved in the translation process. Because all this information is embedded and processed at the same time within a single learning process, one does not need to deal either with the delicate integration of various complex modules or with the propagation of analysis errors, contrary to what happened with most previous systems (note, however, that errors can still propagate within a neural network; the sole use of neural networks does not solve all problems magically, of course).
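
The following sketch illustrates this end-to-end idea in a few lines of code. It is only an illustration under simplifying assumptions: the vocabulary sizes, dimensions, and random “data” are invented, PyTorch is assumed to be available, and real neural translation systems add attention mechanisms, subword units, and many other refinements. The point is simply that a single network reads the whole source sentence and generates the target sentence, and that one loss trains everything jointly, with no hand-built pipeline of separate modules.

```python
# A toy encoder-decoder translation model (assumptions: PyTorch installed;
# all sizes and data are invented for illustration).
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 100, 100, 32, 64

class TinySeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src_ids, tgt_ids):
        # Encode the whole source sentence into a single hidden state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the target sentence conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # a score per target-vocabulary word per position

# One joint training step: a single loss updates encoder and decoder together.
model = TinySeq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

src = torch.randint(0, SRC_VOCAB, (8, 10))   # batch of 8 toy "source sentences"
tgt = torch.randint(0, TGT_VOCAB, (8, 12))   # corresponding "target sentences"

logits = model(src, tgt[:, :-1])             # predict each next target word
loss = loss_fn(logits.reshape(-1, TGT_VOCAB), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
print(float(loss))
```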

In practice, deep learning systems still suffer from important limitations, and we saw in chapter 12 a number of the research issues at stake (unknown words, long sentences, optimization problems, etc.). While we are still far from perfect machine translation systems, it is nonetheless interesting to see that the best-performing systems now operate directly at the sentence level, make limited use of manually defined syntactic or semantic knowledge, and produce translations on the fly based on the huge quantities of data used for training. They thus seem to reflect several characteristics of human language: for example, the fact that child language acquisition is based on exposure to language (and not on the explicit learning of grammar rules), and the fact that word distribution and linguistic complexity play a role (some words are more frequent than others and are learned earlier, and simpler syntactic structures are easier to acquire and easier to translate). It is not completely clear how neural networks work, what knowledge they actually use, or how their architecture influences the overall result, but it is clear that they bear interesting similarities to basic features of human language.

As already said, deep learning machine translation is still in its infancy. We can expect rapid progress as these systems achieve better quality and gradually appear in a wider range of professional contexts. Automatic systems will, of course, not replace human translation (this is neither a goal nor a desired outcome), but they will help millions of people gain access to information they could not grasp otherwise. Digital communication will continue to grow, as will research in the machine translation domain, and one can expect that in the not-too-distant future it will be possible to converse over the phone with someone speaking another language. One will then just need to insert a small device into one's ear to understand any language, and Douglas Adams' Babel fish will no longer be fiction, although the device may not be a fish!