Machine translation is closely tied to the advent of computers, which allowed scientists to imagine a fully automatic translation process. At the same time, we should not forget that, very early on, there were also philosophical, religious, and scholarly speculations about the possibility of automating translation that matter for the history of the field. Finally, the first half of the twentieth century saw the design of prototypes that prefigured some of the systems developed from the 1950s onward.
A long and relevant tradition from this perspective is the quest for a universal language. If such a language existed, it would by its very nature eliminate the need for translation. More realistically, one could imagine an artificial language that would facilitate translation between existing languages. Needless to say, this is a key point for machine translation. Different groups have explored this idea of a universal language; the systems developed on this basis are commonly called “interlingual,” as seen in the previous chapter.
In the Western tradition, the Ancients often refer to an Adamic language, that is, a hypothetical universal protolanguage spoken by humanity before the story of Babel. This idea aroused the interest of Leibniz, even though he was also among the first to partially abandon the tradition, since he believed it was impossible to rediscover this Adamic language from our modern languages. He nevertheless pursued a project intended to eliminate ambiguity from language by defining a new artificial language, no longer tied to any supposed Adamic tradition, for the purpose of solving various problems, such as moral, legal, or philosophical dilemmas (Leibniz, 1951).
Descartes1 is often cited along with Leibniz in this respect, since he was also interested in the idea of a universal language and its relationship with existing languages. Witness the following passage about a proposal for a universal language: “If [someone] put into [his] dictionary a single symbol corresponding to aymer, amare, philein and each of the synonyms, a book written in such symbols could be translated by all who possessed the dictionary” (Descartes, letter to Mersenne on November 20, 1629). This passage greatly inspired the machine translation pioneers, since Descartes’ proposal amounted to replacing words with unambiguous codes: the “symbols” correspond to numerical codes that are independent of the languages considered and take the place of words.
In the wake of these proposals, several attempts to develop a “numerical dictionary” emerged in Europe at the end of the seventeenth century. A numerical dictionary is a dictionary in which a specific number (an identifier) is associated with each word or concept. Those who attempted the task include Cave Beck in 1657, Johann Joachim Becher in 1661, Athanasius Kircher in 1663, and John Wilkins in 1668. Hutchins2 mentions that Becher’s dictionary was republished in Germany in 1962 as “On mechanical translation: A coding attempt from 1661.”3 Also worth mentioning in France are Joseph de Maimieux (who coined the term pasigraphy in 1797 to refer to a universal system of written symbols) and Armand-Charles-Daniel de Firmas-Périés, who developed such a system in 1811. The main application was the encoding and decoding of messages, essentially for military needs.
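To make the idea concrete, the short sketch below implements such a numerical dictionary in modern terms; the codes and word forms are invented for illustration and do not reproduce any of the historical systems mentioned above.

```python
# A minimal sketch of a "numerical dictionary": each concept receives a
# language-independent numeric code, and each language maps its own word
# forms onto that shared code. All codes and word forms here are invented.

CONCEPTS = {
    101: {"fr": "aimer", "la": "amare", "grc": "philein", "en": "to love"},
    102: {"fr": "livre", "la": "liber", "grc": "biblion", "en": "book"},
}

def encode(word, lang):
    """Return the numeric code associated with a word in a given language."""
    for code, forms in CONCEPTS.items():
        if forms.get(lang) == word:
            return code
    return None

def decode(code, lang):
    """Return the word form used in the target language for a given code."""
    return CONCEPTS.get(code, {}).get(lang)

# "Translating" a word amounts to passing through the shared code:
print(decode(encode("amare", "la"), "en"))  # -> to love
```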
However, it is important to refrain from seeing these initiatives as direct precursors of machine translation. Leibniz’s and Descartes’ projects essentially aimed to solve philosophical, logical, and moral problems. While they addressed questions of language and translation in their writings, their research by no means supported the idea of automatic translation (though the correspondence between Mersenne and Descartes regularly touched on the topic of translation). Leibniz’s and Descartes’ work, as well as the coding systems that followed them, were sources of inspiration for various researchers (and are often cited in the writings of the pioneers of machine translation), but they do not appear to have ever been used in the development of real systems.
The notion of a universal language brings to mind artificial languages, of which Volapük and Esperanto are the most well known. Volapük is an artificial language invented in 1879 by Johann Martin Schleyer (1831–1912), while Esperanto was invented by Ludwik Lejzer Zamenhof (1859–1917) with the goal of facilitating communication between people with different mother tongues. Zamenhof published his project, called Lingvo Internacia (International Language), in 1887 under the pseudonym Doktoro Esperanto (“Doctor who hopes”), the name by which the language became known afterward. All these projects emerged at the end of the nineteenth century in order to facilitate trade and peaceful cooperation between populations.
Although these projects resulted in relatively advanced proposals, with vocabularies and grammar systems, they have rarely been actively used for machine translation. Esperanto was used during the 1980s in the European Distributed Language Translation project and within the Fujitsu company in Japan, but neither project was carried through to completion. Artificial languages thus remain a source of inspiration rather than a resource actually used in automatic translation systems. One reason is probably that Esperanto remains a language designed for humans (Esperanto being itself based on various existing European languages): it does not have the characteristics of a language intended to be directly manipulated by computers. During the 1990s, the Universal Networking Language project aimed to develop such an artificial language to be used directly by computers, but it too has remained relatively little used to date.
During the 1930s, two researchers devised mechanical systems oriented toward multilingual dictionaries and semiautomatic translation (for more information, see Hutchins, 2004).
The first attempt was the work of Georges Artsrouni, a French engineer of Armenian origin who had completed his studies in Russia before emigrating to France in 1922. In July 1933, he filed a patent application for a “mechanical brain”: not so much a predecessor of modern computers as a machine to store and retrieve various types of information automatically. Two prototypes were built (probably between 1932 and 1935) and aroused great interest during public demonstrations. The machine even received a “grand prix” at the 1937 Universal Exposition in Paris (a third prototype was begun but never completed; the two existing models are stored at the Musée des Arts et Métiers in Paris).
In the late 1930s, various organizations handling large amounts of information showed considerable interest in this machine (in his patent, Artsrouni mentions that his machine could automate the consultation of railway schedules, telephone directories, and the search for words in dictionaries). Nevertheless, World War II prevented these contracts from succeeding. Finally, the emergence of computers after the war made these purely mechanical machines obsolete.
Artsrouni’s system was not specifically dedicated to translation, though the inventor stressed from the beginning that this field was one of the most promising. The machine could store linguistic data (i.e., simple words) in different languages on a simple strip of paper. Each word was encoded in a unique way by a set of perforations along the paper strip, following the principle of punch cards. A keyboard was used to enter the word being sought, and the machine could then automatically retrieve the corresponding translations from the coded strip.
Artsrouni did not take the system any further. He was not a linguist and never addressed the difficulties of machine translation, but his archives clearly show that he was one of the first to design a completely automatic system based on multilingual dictionaries. He also thought of fairly realistic uses for his machine, for example, telegrams written in an elliptical style that would lend itself well to word-for-word translation. Artsrouni also planned to store more complex linguistic units, such as compound words, directly in his machine: the only limit was the time and effort needed to encode the data.
Petr Petrovitch Smirnov-Trojanskij (1894–1950), who worked as a professor in Russia, filed a patent for a machine that would select and code words for translation between several languages. A prototype of the machine was probably never built.
Smirnov-Trojanskij imagined a device that was, to a certain extent, close to Artsrouni’s machine: a mechanism was used to indicate a word to the machine, which could then present its translations in the various languages available. Unlike the machine developed by Artsrouni, however, Smirnov-Trojanskij’s invention was concerned only with translation.
What makes Smirnov-Trojanskij’s invention remarkable is that it goes beyond the simple coding of words and their translation. He imagined a system of 200 primitives capable of representing the function of a word in a sentence, in order to generate the correct translation in the target language (Smirnov-Trojanskij was interested in Russian, where nouns and adjectives are inflected to reflect their function in the sentence). The analyst had to specify whether the word to be translated was the subject or the object, whether the verb was in the present or imperfect tense, and so on. The machine then took over, selecting the correct word form for the translation.
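As a rough illustration of the principle (and not of Smirnov-Trojanskij’s actual notation, which is not reproduced here), the sketch below pairs a dictionary form with grammatical tags supplied by a human analyst and lets a toy generation step select the inflected form in the target language; the tiny inflection table, limited to one Russian noun, is purely illustrative.

```python
# Toy illustration of the principle: the analyst supplies a base (dictionary)
# form together with grammatical-function tags, and the machine selects the
# corresponding inflected form in the target language. Invented example data.

TARGET_FORMS = {
    ("kniga", "noun", "subject"): "kniga",   # nominative: the book as subject
    ("kniga", "noun", "object"):  "knigu",   # accusative: the book as object
}

def generate(base, pos, function):
    """Select the target surface form from the base form plus grammatical tags."""
    return TARGET_FORMS.get((base, pos, function), base)

# The analyst marks "book" as the direct object of the sentence:
print(generate("kniga", "noun", "object"))  # -> knigu
```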
The invention focused on a work environment rather than on a simple device: Smirnov-Trojanskij’s system was designed in such a way that a translator could first simply look up translation elements at the word level with the help of the device. A professional editor or a translator then intervened at the very end to edit the text and make corrections from a stylistic point of view. The difficulties of machine translation are not described in detail in his proposal, but the project is interesting in that Smirnov-Trojanskij envisioned an environment for assisted translation rather than a completely automatic process. We will see in the following chapters that the quality of automatically obtained translations remains a major issue, along with the question of how machine translations can be efficiently corrected by human editors.
It should be noted that, despite the considerable interest of their proposals, these two inventors have remained largely ignored. Work on Artsrouni’s system was not resumed after the war, as it was clear that the future lay with electronic machines (much more powerful than mechanical ones). Smirnov-Trojanskij’s work environment, which never led to a working system, was also largely ignored in favor of completely automatic translation systems.