In this chapter we examine the different possible approaches and the main trends observed in the domain of machine translation since its beginnings. It is important to have an idea of the main challenges and major developments of the field before diving into more detail. Each of these approaches will then be detailed in the following chapters.
Different approaches and different techniques have been used for machine translation. For example, translation can be direct, from one language to the other (i.e., with no intermediate representation), or indirect, when a system first tries to determine a more abstract representation of the content to be translated. This intermediate representation can also be language independent so as to make it possible to directly translate one source text into different target languages.
Each system is unique and implements a more or less original approach to the problem. However, for the sake of clarity and simplicity, the different approaches can be grouped into three different categories, as most textbooks on the topic do.
These three kinds of approaches can be considered to form a continuum, going from a strategy that stays very close to the surface of the text (a word-for-word translation) up to systems that try to build a fully artificial and abstract representation independent of any language. These varying strategies have been summarized in a very striking figure called the “Vauquois triangle,” named after Bernard Vauquois, a famous French researcher in machine translation in the 1960s (figure 2).
Direct transfer, represented at the bottom of the triangle, corresponds to word-for-word translation. In this framework, there is no need to analyze the source text and, in the simplest case, a simple bilingual dictionary is enough. Of course, this strategy does not work very well, since every language has its own specificities, and everybody knows that word-for-word translation is a bad strategy that should be avoided. It can nevertheless give a rough idea of the content of a text and may seem acceptable when the two languages considered are very close (same language family, similar syntax, etc.).
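As a rough illustration, here is a minimal sketch of word-for-word translation, assuming a toy bilingual dictionary whose entries are invented for the example:

```python
# A minimal sketch of direct, word-for-word translation: each source word is
# looked up in a toy bilingual dictionary and replaced by its equivalent.
# The dictionary below is invented for illustration only.
TOY_DICTIONARY = {          # English -> French
    "the": "le",
    "cat": "chat",
    "drinks": "boit",
    "milk": "lait",
}

def word_for_word(sentence: str) -> str:
    """Translate by replacing each word independently; unknown words are copied as is."""
    return " ".join(TOY_DICTIONARY.get(word, word) for word in sentence.lower().split())

print(word_for_word("The cat drinks milk"))  # -> "le chat boit lait"
```

Even on this simple sentence the output, “le chat boit lait,” misses the partitive article of the correct French “le chat boit du lait,” something no amount of additional dictionary lookup can fix.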
Researchers have, from the very beginning, also tried to develop more sophisticated strategies to take into account the structure of the languages involved. The notion of “transfer rules” appeared in the 1950s: to go from a source language to a target language, one needs information on how to translate groups of words that form a linguistic unit (an idiom or even a phrase). The structure of sentences is too variable to be taken into account directly as a whole, but sentences can be split into fragments (or chunks) that can be translated using specific rules. For example, adjectives in French are usually placed after the noun, whereas they are placed before the noun in English. This can be specified using transfer rules. More complex rules can apply to structures like “je veux qu’il vienne” ⇔ “I want him to come,” where there is no exact word-for-word correspondence between the two sentences (“I want that he comes” is not very good English, and “je veux lui de venir” is simply ungrammatical in French).
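The following sketch shows what such a reordering rule might look like, assuming the source chunk has already been part-of-speech tagged (the tag names DET, ADJ, and NOUN are illustrative, not taken from any particular system):

```python
def apply_adj_noun_rule(tagged_chunk):
    """Apply the transfer rule ADJ NOUN -> NOUN ADJ to a list of (word, tag) pairs."""
    output, i = [], 0
    while i < len(tagged_chunk):
        if (i + 1 < len(tagged_chunk)
                and tagged_chunk[i][1] == "ADJ"
                and tagged_chunk[i + 1][1] == "NOUN"):
            # Swap the pair: English "red car" takes the French noun-adjective order.
            output.extend([tagged_chunk[i + 1], tagged_chunk[i]])
            i += 2
        else:
            output.append(tagged_chunk[i])
            i += 1
    return output

print(apply_adj_noun_rule([("a", "DET"), ("red", "ADJ"), ("car", "NOUN")]))
# -> [('a', 'DET'), ('car', 'NOUN'), ('red', 'ADJ')]
```

After the reordering, a dictionary lookup of the kind shown earlier can turn “a red car” into “une voiture rouge.”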
The notion of transfer can also be applied to the semantic level in order to choose the right meaning of a word depending on the context (for example, to know whether a given occurrence of “bank” refers to the bank of a river or to a money-lending institution). In practice, this is a hard problem if done manually, since it is impossible to predict all the contexts of use of a given word. For exactly the same reason, this quickly proved to be one of the most difficult problems to solve during the early days of machine translation. We will see in the following chapters that, more recently, statistical techniques have produced much more satisfactory results, since the problem can be accurately approached by the observation of very large quantities of data—the kind of thing computers are very good at, and humans less so (at least when they try to provide an explicit and formal model of the problem).
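The statistical intuition can be illustrated with a small sketch: choose the sense of “bank” whose typical context words overlap most with the words surrounding the occurrence. The context words below are picked by hand; a real system would estimate them from very large corpora:

```python
from collections import Counter

# Hand-picked context words standing in for statistics that a real system
# would gather automatically from a large corpus.
SENSE_CONTEXTS = {
    "bank (finance)": {"money", "loan", "account", "deposit", "interest"},
    "bank (river)": {"river", "water", "fishing", "shore", "boat"},
}

def disambiguate(sentence: str) -> str:
    """Pick the sense whose typical context words overlap most with the sentence."""
    words = set(sentence.lower().split())
    scores = Counter({sense: len(words & context) for sense, context in SENSE_CONTEXTS.items()})
    return scores.most_common(1)[0][0]

print(disambiguate("he sat on the bank of the river with his fishing rod"))
# -> "bank (river)"
```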
Last but not least, another family of systems is based on the notion of interlingua, as we have already seen in the previous section. Transfer rules, by definition, always concern two specific languages (English to French in our examples so far) and thus need to be adapted for each new pair of languages considered. The notion of interlingua is supposed to solve this problem by providing a language-independent level of representation. Compared to transfer systems, the interlingual approach still needs an analysis component to go from the source text to the interlingual representation, but this representation can then be used to produce translations into several languages directly. The production of a target text from the interlingual representation, however, requires what is called a “generation module,” that is, a module able to go from a more or less abstract representation in the interlingual format to linguistically valid sentences in the different target languages.
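The architecture can be sketched as follows, with a single analysis module feeding one small generation module per target language; the predicate-argument representation and the tiny lexicons are, of course, purely illustrative:

```python
def analyze(sentence_en: str) -> dict:
    # A real system would perform a full linguistic analysis; the representation
    # of this one example sentence is hard-coded for the sake of the sketch.
    return {"event": "DRINK", "agent": "CAT", "patient": "MILK"}

# One minimal lexicon per target language (invented entries).
LEXICONS = {
    "fr": {"CAT": "le chat", "DRINK": "boit", "MILK": "du lait"},
    "es": {"CAT": "el gato", "DRINK": "bebe", "MILK": "leche"},
}

def generate(representation: dict, lexicon: dict) -> str:
    # Subject-verb-object order happens to suit both target languages here;
    # real generation modules handle word order, agreement, and much more.
    return " ".join(lexicon[representation[role]] for role in ("agent", "event", "patient"))

representation = analyze("The cat drinks milk")
for lang, lexicon in LEXICONS.items():
    print(lang, "->", generate(representation, lexicon))
```

The appeal of the design is visible even in this caricature: adding a new target language only requires a new generation module, whereas a transfer system needs a new set of rules for every pair of languages.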
Interlingual systems are very ambitious, since they require both a complete understanding of the sentence to be translated and accurate generation components to produce linguistically valid sentences in the different target languages. Moreover, we saw in the previous chapter that understanding text is to a great extent an abstract notion: what does it mean to “understand”? What information is required in order to be able to translate? To what extent is it possible to formalize the comprehension process given the current state of the art in the domain? As a result, and despite years of research by several very active groups, interlingual systems have never been deployed on a very large scale. The issues are too complex: understanding a text may mean representing a potentially infinite amount of expressed and inferred information, which is of course highly challenging and simply goes beyond the current state of the art.
The classification of machine translation systems provided in the previous section is challenged by new approaches that have appeared since the early 1990s. The availability of huge quantities of text, especially on the Internet, and the growth of computing power have revolutionized the domain.
Most current industrial machine translation systems, and especially the most popular ones (Google Translate, Bing Translator), are based on a statistical approach that does not completely fit into the previous classification. These systems are not primarily based on large bilingual dictionaries and sets of hand-crafted rules. The first statistical systems implemented a kind of direct translation approach, since they tried to find word equivalences between two different languages by directly looking at very large amounts of bilingual data (initially coming from specialized international institutions and, more recently, for the most part, harvested on the web).
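The core idea can be sketched very simply: count how often each source word co-occurs with each target word across aligned sentence pairs, and propose the most frequent pairing as a translation candidate. The three sentence pairs below are invented and stand in for the millions used in practice:

```python
from collections import defaultdict

# Toy aligned sentence pairs standing in for a very large parallel corpus.
parallel_corpus = [
    ("the cat drinks milk", "le chat boit du lait"),
    ("the cat sleeps", "le chat dort"),
    ("the dog drinks water", "le chien boit de l'eau"),
]

# Count how often each English word co-occurs with each French word in aligned pairs.
cooccurrence = defaultdict(lambda: defaultdict(int))
for english, french in parallel_corpus:
    for e in english.split():
        for f in french.split():
            cooccurrence[e][f] += 1

# Propose, for each English word, the French word it co-occurs with most often.
for e, counts in cooccurrence.items():
    print(e, "->", max(counts, key=counts.get))
```

Even on this toy corpus, frequent words such as “le” dominate the raw counts and many ties remain, which is precisely why real systems replace such counts with probabilistic alignment models estimated over very large amounts of data.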
Statistical approaches are now considerably more precise. They no longer deal with isolated words but are able to identify sequences of words (such as compounds, idioms, frozen expressions, or just regular sequences of several words) that need to be translated as a whole. The most recent approaches even try to tackle the problem directly at the sentence level. It should be noted that these systems have their own internal representations, which are generally not directly readable by a human being. It is thus necessary to consider the nature of these representations: to what extent do they capture semantic information? Can we draw any parallel between this approach and the way humans handle language?
The success of these systems lies in the fact that they are able to capture regular equivalences between languages, as well as more or less frozen sequences in a text, based on a purely statistical analysis. Since, as we have seen, meaning is not something formally defined but corresponds to the way words are used, a purely statistical approach can be quite powerful in discovering regularities and specific contexts of use. However, for a long time (and still today in most available online systems), equivalences were calculated at a local level and involved fragments of text that often overlap. The main difficulty was then to make sense of all these fragments and equivalences: statistical systems had to deal with a multiplicity of fragments that had to be assembled into a well-formed sentence. These fragments may provide contradictory information and are thus not always compatible with one another. The task could be compared to assembling a jigsaw puzzle using a set of pieces coming from 10 different puzzles. In chapter 12, we give a brief overview of the most recent approaches based on deep learning, yet another completely different paradigm, which attempts to tackle the problem directly at the sentence level. Deep learning approaches thus have the potential to give considerably more precise results.
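To make the jigsaw-puzzle analogy more concrete, here is a sketch of a naive decoder that covers the sentence from left to right with the longest fragment found in a toy phrase table; the table entries are invented and deliberately overlap:

```python
# A toy phrase table of English fragments and their French equivalents.
PHRASE_TABLE = {
    ("i", "want", "him", "to", "come"): "je veux qu'il vienne",
    ("i", "want"): "je veux",
    ("him",): "lui",
    ("to", "come"): "venir",
}

def greedy_translate(sentence: str) -> str:
    """Cover the sentence left to right with the longest fragment found in the table."""
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        for length in range(len(words) - i, 0, -1):   # try the longest fragment first
            fragment = tuple(words[i:i + length])
            if fragment in PHRASE_TABLE:
                output.append(PHRASE_TABLE[fragment])
                i += length
                break
        else:
            output.append(words[i])                   # unknown word: copy it as is
            i += 1
    return " ".join(output)

print(greedy_translate("I want him to come"))   # -> "je veux qu'il vienne"
```

Remove the five-word entry and the same procedure stitches together “je veux,” “lui,” and “venir,” producing the ungrammatical “je veux lui venir”: a small-scale version of the fragment-assembly problem described above, and the reason statistical decoders spend most of their effort scoring competing segmentations.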
The history of machine translation can thus be summarized as a succession of stages: word-for-word systems in the early days, then rule-based transfer and interlingual approaches, then statistical approaches from the early 1990s onward, and, most recently, approaches based on deep learning.
We now need to examine these different stages in more detail, so as to better understand the approaches and their main challenges as well as the limitations of each system.