7

Machine Translation Systems

7.1 Computers that “translate”?

This chapter is about technology for automatic translation from one human language to another. This is an area where the gap between (science) fiction and (science) reality is especially large. In Star Trek, the crew uses a Universal Translator that provides instantaneous speech-to-speech translation for almost all alien languages. Sometimes there is a short pause while the Universal Translator adapts to a particularly challenging language, and very occasionally the system proves ineffective, but most of the time everything simply works. In The Hitchhiker’s Guide to the Galaxy, Douglas Adams’ affectionate parody of the conventions of science fiction, the same effect is achieved when Arthur Dent puts a Babel Fish in his ear. In Matt Groening’s Futurama (another parody), one of the characters invents a Universal Translator that translates everything into French (in Futurama, nobody speaks French any more, and almost everyone knows standard English anyway, so this device, while impressive, has no practical value whatsoever). In the real world, however, instantaneous general-purpose speech-to-speech translation is still a far-off research goal, so this chapter will focus on simpler uses of translation.

Before discussing uses of translation, it is useful to define the concept and introduce some terms. Translation is the process of moving text or speech from one human language to another, while preserving the intended message. We will be returning to the question of what exactly is meant by preserving the intended message throughout the chapter. For the moment, please accept the idea at its intuitive face value. We call the language that the translation starts from the source language and the language that it ends up in the target language. We say that two words, one from the source language, one from the target language, are translation equivalents if they convey the same meaning in context. The same idea applies to pairs of phrases and sentences.

Figure 7.1 Some overconfident translations into English

image

Figure 7.1 contains some examples of difficulties that can arise if you are not careful to check whether a translation is saying the right thing. If you are ever asked to put up a sign in a language that you do not fully understand, remember these examples, and get the content checked.

It is unclear what happened with the first one. Perhaps the source text was about some kind of special ice-cream, and the adjective that described the ice-cream was mistranslated. Or perhaps the word “fat” was omitted, and the translation should have been “no-fat ice cream”. The second one is presumably trying to describe a doctor who specializes in women’s and other diseases, but the possessive marker is omitted, making the sentence into an unintended suggestion that women are a kind of disease. The third one happens because the sense of “have children” as “give birth” was not salient in the mind of the person who created the sign. The fourth one is just a poor word choice: “valuables” is the word that the translator was searching for, and “values” is unintentionally funny. The French word “valeurs” means “values” and the French phrase “valeurs familiales” is used in much the same way as “family values” is in English. Of course, this kind of ambiguity can also happen when English native speakers write notices in a hurry and do not check them carefully.

The last three are a little different, because they are failures of the process of translation. Presumably Wikipedia was used to find an article about some foodstuff, but the amateur translator mistook the name of the website for the name of the foodstuff. We have a great deal of sympathy for the translator here, because “Wikipedia” does feel like a good name for a foodstuff, possibly some kind of seafood with legs. For the dining hall, the graphic artist must have thought that the string “Translate server error” actually was the translation.

We chose the examples in Figure 7.1 because we hope that they are funny. But in other settings, translation errors can have serious consequences. This happened when a young male patient was brought into a Florida hospital’s emergency room in a coma. His mother and girlfriend, speaking in Spanish, talked to the non-Spanish-speaking emergency room staff. They used the word “intoxicado”, which can mean several things, including “nauseous”. Because the emergency room staff were not professional interpreters, they thought that “intoxicado” must mean the same as the English “intoxicated” and treated him on the assumption that he was under the influence of drugs or alcohol. The patient was eventually diagnosed with a brain aneurysm and became quadriplegic. This case is famous, because it led to a lawsuit and drew attention to the principle that hospitals, courts, and other institutions that work with the public should plan for the needs of a multilingual population and provide the necessary translation and interpretation services. This is an issue of basic fairness, public safety, and perhaps also civil rights.

7.2 Applications of translation

7.2.1 Translation needs

In web search and information retrieval, the user can be thought of as having an information need (Section 4.3.1). In the same way, a potential user of translation technology can be thought of as having a translation need. If you have taken comparative literature classes at high school or college, you will have encountered translations into English of foreign-language poetry and novels. A publisher who wants to make a new English version of a novel by José Saramago has a translation need. This need is far beyond the capabilities of any present-day automatic system, because for this purpose, a good translation needs to be a literary work in its own right. As well as getting across the content, it must capture the “feel” of the original writing. The only way to do this is to employ a human translator who has excellent literary skills in both English and Portuguese. Because there are so many English-speaking readers in the world, and the author is a Nobel prize winner, the publisher can expect a good return on investment in expert help. Nobody would expect a machine-generated translation to be good enough for this purpose, although it would be cheap.

Another quite demanding translation need is that of a scholar needing to understand an academic paper written in a foreign language. Here, the scholar does not care much about the “feel” of the article, but will want to be sure about the arguments that are being made, the conclusions that are being drawn, and the evidence that is being used to support these conclusions. To do this well, even if the paper is in your native language, you need to have some training in the relevant academic field or fields. Academic papers are designed to be read by informed experts, so if the paper is on linguistics, what you are looking for is a translator who is expert enough in linguistics to be able to understand and accurately translate the arguments in the text. A translator who is a specialist in translating business documents will almost certainly struggle to make a useful translation of a linguistics paper. We should not expect a machine translation system to be any better at meeting this specialized need.

7.2.2 What is machine translation really for?

In Chapters 2 and 4, we described technology that supports the everyday scholarly activities of information gathering and writing. Each one of you already knows much about these activities. The role of the computer is to help out with the aspects of the process that it does well, and to keep out of the way when the human writer knows better. Translation is a little different, because it is usually done by trained experts rather than the general public. Professional translators sometimes worry that machines are going to take over their role, or that free web-based services will lead to big changes in the market for their skills. These worries are reasonable, since all knowledge workers will need to adapt as technology changes, but the reality is that professional translators are still going to be required in future. Unless you have had an internship with a translation company, you probably do not know as much about the market for translation as you do about the activities of writing and information gathering. In this chapter, as well as explaining some of the technology, we will go into detail about the business needs for translation and the changes that are resulting from the easy availability of free online translation.

From this perspective, literary translations are interesting to think about when we are trying to understand the process of translation, but are not the first place to look for practically important uses of translation technology. For that, we need to focus on more everyday translation needs. For example, if you are considering buying a new mobile phone, but the model you want is not yet available in your country, you may want to read customer reviews from another market, perhaps ones written in a language you do not know well.

This is a translation need, but literary quality is no longer a relevant criterion. You want to know about battery life, quality of the input devices, usability of the built-in software, and so on. (See Chapter 5 on classifying documents for other ways in which computers can assist in this.) If there are points in the review that you can understand, errors may not matter. The German text says that the phone is a little slimmer (schmaler) than the iPhone, and the English version is “The Magic is smaller, and therefore slightly better in the hand”. A human translator might write this as “The Magic is slimmer and therefore more pleasant to handle”, but the rather literal automatic translation serves the purpose perfectly well, even though, strictly speaking, it is inaccurate. Free web-based translation (specifically, Google Translate, in July 2009) is quite adequate for this translation need.

A third type of translation need turns up in large multinational organizations such as electronics companies. For them, there is often a legal requirement that the instruction manuals and other documentation be available in the native language of the user. Here, accuracy obviously matters, but fine points of literary style are not important. Indeed, if there is a choice between elegant literary phrasing and an uglier version using simple and direct language, the latter is preferable. For this translation need, there is an especially good choice of translation technology, called example-based translation. This relies on the fact that most of the sentences in an instruction manual are either the same as or very similar to sentences found somewhere in a collection of previously translated manuals. Electronics companies keep copies of all the manuals that they produce, as well as the translations. The software that does this is called a translation memory. The first step is for the human translator to use the search capability of the translation memory to find relevant sentences that have already been translated. These previously translated sentences can be used as a basis for the translation of the new sentence. For example, suppose that the sentence to be translated into German is:

(33) The FX380B has a color LCD screen and an automatic rangefinder.

and we have the following example sentences already translated:

(34) a. The ZX65 has a color LCD screen.
      = Die ZX65 hat einen LCD-Farbbildschirm.
  b. The FX809 has a 4 cm screen and a flashgun.
      = Die FX809 hat einen 4 cm-Bildschirm und ein Blitzgerät.
  c. The larger model has an automatic rangefinder.
      = Das größere Modell verfügt über einen automatischen Entfernungsmesser.

Then it should not be too hard to piece together the relevant fragments of these example translations to get:

(35) The FX380B has a color LCD screen and an automatic rangefinder.
= Die FX380B hat einen LCD-Farbbildschirm und einen automatischen Entfernungsmesser.

The simplest way of doing this is to write a program that uses search technology to find the best matching sentences, leaving to a human the task of working out how to put the fragments together. Alternatively, a fairly simple program could get most of the way. There would need to be a special routine to make sure that the name of the product “FX380B” was transferred appropriately, but the rest is pretty straightforward. The output should be checked by a human being, to make sure that the legal requirement for adequate documentation has been met. Once again, the standards that apply depend on the nature of the documentation need. The documentation for a medical x-ray machine that is going to be used by a multilingual staff, and that could hurt people if it is used incorrectly, needs to be excellent in all the relevant languages. The translations of the documentation for a mobile phone are not so safety-critical, so it may be acceptable to be a little less careful.
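For readers who want to experiment, the fragment-finding step can be sketched in a few lines of Python. The toy translation memory below simply reuses the example pairs (34a–c), and the standard library’s SequenceMatcher stands in for the fuzzier, word-level matching that a real translation memory product would use:

```python
from difflib import SequenceMatcher

# A toy translation memory: previously translated sentence pairs,
# taken from examples (34a-c).
memory = [
    ("The ZX65 has a color LCD screen.",
     "Die ZX65 hat einen LCD-Farbbildschirm."),
    ("The FX809 has a 4 cm screen and a flashgun.",
     "Die FX809 hat einen 4 cm-Bildschirm und ein Blitzgerät."),
    ("The larger model has an automatic rangefinder.",
     "Das größere Modell verfügt über einen automatischen Entfernungsmesser."),
]

def best_matches(new_sentence, memory, threshold=0.4):
    """Return stored pairs ranked by string similarity to the new sentence.

    SequenceMatcher.ratio() gives a crude character-level similarity
    score in [0, 1]; pairs below the threshold are discarded.
    """
    scored = [(SequenceMatcher(None, new_sentence.lower(), src.lower()).ratio(),
               src, tgt)
              for src, tgt in memory]
    return sorted((m for m in scored if m[0] >= threshold), reverse=True)

hits = best_matches(
    "The FX380B has a color LCD screen and an automatic rangefinder.", memory)
for score, src, tgt in hits:
    print(f"{score:.2f}  {src}  =>  {tgt}")
```

Both the sentence about the color LCD screen and the one about the automatic rangefinder score highly, which is exactly what a human translator needs in order to assemble (35); the threshold is a tuning knob that trades recall against noise.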

The important lesson of this section is that the translation technology that is used should depend on the details of the translation need, not just on some abstract notion of what counts as a good translation.

7.3 Translating Shakespeare

For fun, let us return to the idea of translating Shakespeare. Figures 7.2, 7.3, 7.4, and 7.5 contain a range of versions of Shakespeare’s Sonnet 29. One is the original Shakespeare, one a brilliant translation into German by the twentieth-century poet Stefan George. Then there are two machine translations from English to German, and a back-translation by Google Translate of George’s German into English.

Figure 7.2 Shakespeare’s Sonnet 29 in the original

image

Figure 7.3 Shakespeare’s Sonnet 29 as translated by Stefan George

image

Figure 7.4 Google’s translation of Sonnet 29

image

Figure 7.5 Google’s translation of Stefan George’s version

image

When translating poetry, it is important to do more than just get the meaning across: you have to also aim for a similar overall impression. Stefan George’s translation is almost line for line, and uses the same rhyme scheme as Shakespeare. “Deaf heaven” in Shakespeare’s third line finishes up as “tauben himmel” in George’s fourth, and the “rich in hope” in Shakespeare’s fifth becomes “von hoffnung voll” in George’s sixth. The crunchy contrast of “with what I most enjoy contented least” turns into the alliterative “Dess mindest froh was meist mich freuen soll”.

German and English are quite closely related – words like “king” and “König” or “earth” and “Erde” come from the same original source and mean the same. Linguists call these corresponding words cognates. Cognates sometimes make it possible to work out which lines go with which even if you don’t speak German well enough to read the text in detail.

Figures 7.4 and 7.5 show what happens with automatic translation. In Google’s German, the word “beweep” has defeated the system, and simply been transferred over unchanged. Also, the fourth line translates the English word “curse” as a noun “Fluch” when it should be a verb. Because the verb is missing from the German, the translation engine has a hard time finding an acceptable way of fitting in the material that should have been attached to it, and the result is very messy.

In Google’s English (which is a back-translation of George’s good German), there are lots of good word-for-word translations. But “los” ( = “fate”) has been “translated” in the same way that “beweep” was: the system has given up. The term “human-looking” is not quite right for “Menschenblick”: Shakespeare had “men’s eyes”, and “human view” is probably what a modern writer would produce. Also, the “For” in “For the deaf sky hopeless cry” should be “To”, thus rendering the final line as “To the deaf sky hopelessly cry”.

Overall, the conclusion regarding poetical translation has to be a qualified negative. Current systems are designed for quite a different translation need. Certainly, the output is not good poetry, or indeed particularly convincing English, but there might be circumstances in which knowing what the poem is about has some usefulness. For example, it comes through clearly that Shakespeare’s sensitive protagonist is not having an especially good time. And, more seriously, a human translator might be able to use the machine-translated output as a starting point for revisions.

7.4 The translation triangle

Figure 7.6 shows a diagram that explains one of the main tradeoffs in designing translation systems. At the bottom corners of the triangle in Figure 7.6 are the source and target languages. The captions in the body of the triangle indicate possible relationships between source and target language. If we are lucky, the words and concepts in the target language will match those in the source language, and direct word-for-word translation will just about work. In that case, there is no need to design a complex system, or to use linguistically sophisticated representations.

Figure 7.6 The translation triangle

image

The example-based translation that we discussed in Section 7.2.2 is a method that works directly with the words of the sentence, not analyzing them in any deep way. The translation triangle illustrates the fact that if you work at the level of words, source and target languages are quite far apart. For some uses, such as example-based methods for highly stereotyped and repetitious texts, such direct approaches to translation work well enough, but broadly speaking they are insufficient for more general tasks.

The labels on the body of the triangle represent various different kinds of similarities and differences that might turn up in translation. There can be differences in the linguistic rules used by the two languages and also differences in the way in which the languages map concepts onto words. If you are prepared to do work to produce abstract representations that reconcile these differences, you can make translation easier. The captions are placed according to our judgment of how easy it would be for an MT system to create them, with the ones that would take more work nearer the top.

The arrow labeled Abstraction points in the direction of increasing abstraction, as well as increasing distance from the specifics of the source and target languages. The higher you go, the fewer the differences between source and target languages, and the easier the task of converting the source language representation into the target language representation. This is represented by the fact that the sides of the triangle get closer as we go up. The corresponding arrow, labeled Concreteness, points in the direction of increasing concreteness (therefore, also, in the direction of decreasing abstraction).

However, although translation itself gets easier as the representation becomes more abstract, the task of moving from the words of the source language to a more abstract representation gets harder as the representation becomes more abstract, as does the task of moving from the abstract representation of the target language back down to the words of the target language. This is represented by the fact that the distance between the representations and the bottom of the triangle grows larger as we go up. Unless the consumers of your translations are highly sophisticated linguists, it will not do to give them abstract representations: they need words.

At the apex of the translation triangle we would have a so-called interlingua. At this point there would be no difference between the source and target language representations. If you could do full linguistic analysis and get all the way to a common language-neutral meaning, there would be no translation work to do.

Thus, an interlingua is a representation that can be reached by analyzing the source language, and that can then be used, unchanged, to generate the equivalent sentence in the target language. If this could be achieved, it would have the tremendous advantage that in order to add a new language (e.g., Hungarian) to our translation system we would only need to build a Hungarian-to-interlingua module and an interlingua-to-Hungarian module. Once that was done, we could translate to or from Hungarian into or out of any of the other languages in the system.
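The arithmetic behind this advantage is easy to check: with one module per ordered pair of languages you need N(N − 1) modules, while the interlingua design needs only one analysis module and one generation module per language, i.e., 2N. A two-line sketch makes the comparison concrete:

```python
# Number of modules needed to connect N languages in each design:
# one rule set per ordered language pair (pairwise), versus one
# analysis module plus one generation module per language (interlingua).
def pairwise_modules(n):
    return n * (n - 1)

def interlingua_modules(n):
    return 2 * n

for n in (3, 10, 24):
    print(n, pairwise_modules(n), interlingua_modules(n))
# For the 24 official EU languages: 552 pairwise rule sets,
# but only 48 interlingua modules.
```

The gap grows quadratically, which is why the interlingua ideal remains so attractive even though, as we discuss next, nobody has managed to build one.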

The fundamental problem with this approach is that the interlingua is an ideal rather than a reality. Despite great efforts, nobody has ever managed to design or build a suitable interlingua. Worse, it has turned out to be very difficult to build fully adequate solutions for the step that analyzes the source words and produces the interlingua. A useful partial solution for this task, which is known as parsing, is described in detail in Section 2.4.1. Finally, the remaining step, the one that generates the target language words from the interlingua, is also a difficult research topic on which much effort and ink have been spent. In summary, the interlingua idea is useful because it clarifies what would need to be done in order to build an excellent MT system based on solid linguistic principles. This is helpful as a focus for research, but it does not offer a ready-made solution.

Figure 7.7 English phrase structure rules for John goes on the roller coaster

image

So, while an interlingua-based system would be good in theory, we probably cannot have one any time soon. For practical purposes, it is often best to be somewhat less idealistic, and build systems that do simple things well. Sometimes a direct approach will work well, because the given languages express things in similar ways.

Figures 7.7 and 7.8 give an example of two sentences for which everything is parallel, so the direct approach would work well. If you know German, you will realize that the only really tricky aspect of this pair of sentences is getting the right feminine dative ending on the article “der” that translates “the”. For the record, the German is actually a little less ambiguous than the English: the German unambiguously says that John is riding on the roller coaster, not going onto the roller coaster, but the English allows either situation. In translating, you often have to make an educated guess about what was meant in order to translate correctly. An automatic system has to do the same, by whatever means it can.

In practice, many systems fall in the middle ground between the direct method and interlinguas, doing some linguistic analysis in order to move a little way up the translation triangle. The result of this is a language-specific transfer representation for the source language sentence. Then the system designer writes transfer rules to move the language-specific source language representation into a language-specific target language representation. Finally, a third module moves from the language-specific target language representation to an actual target language sentence. Systems that use this design are called transfer systems. The advantage of this approach is that the transfer representations can be designed in such a way that the transfer rules are reasonably easy to write and test; certainly easier than the kind of rules you have to write in order to switch around the word order in a direct approach to the translation between English and German. The disadvantage is that if you have N languages you have to write N(N − 1) sets of rules, one for each pair of languages and direction of translation.
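To make the three-step design concrete, here is a toy transfer system in Python for the single sentence pattern of Figures 7.7 and 7.8. Everything in it (the representations, the two-entry lexicons, and the rules) is invented for the illustration; real transfer systems have far richer analysis and generation components:

```python
# Toy transfer system: analysis -> transfer -> generation,
# covering only the sentence pattern of Figures 7.7 and 7.8.

# Step 1: analysis -- map the English sentence to a source-side representation.
def analyze_en(sentence):
    words = sentence.split()
    assert words[1] == "goes" and words[2] == "on"
    return {"pred": "go_on", "agent": words[0], "loc": " ".join(words[4:])}

# Step 2: transfer -- language-pair rules mapping the source representation
# to a target-side representation (the lexicon also records gender).
NOUN_DE = {"roller coaster": ("Achterbahn", "fem")}
PRED_DE = {"go_on": "fährt auf"}

def transfer_en_de(rep):
    noun, gender = NOUN_DE[rep["loc"]]
    return {"pred": PRED_DE[rep["pred"]], "agent": rep["agent"],
            "loc": noun, "loc_gender": gender}

# Step 3: generation -- pick the dative article from the noun's gender,
# which is exactly the tricky feminine-dative "der" mentioned above.
DATIVE_ARTICLE = {"fem": "der", "masc": "dem", "neut": "dem"}

def generate_de(rep):
    article = DATIVE_ARTICLE[rep["loc_gender"]]
    return f'{rep["agent"]} {rep["pred"]} {article} {rep["loc"]}'

print(generate_de(transfer_en_de(analyze_en("John goes on the roller coaster"))))
# -> John fährt auf der Achterbahn
```

Notice that the transfer rules themselves are trivial here; the point of the design is that even for harder language pairs, the rules stay manageable because the hard work has been pushed into the analysis and generation modules.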

Figure 7.8 German phrase structure rules for John fährt auf der Achterbahn

image

7.5 Translation and meaning

The main requirement that we can realistically expect from machine translation is that the translation process preserves meaning. That is, we want a person who speaks only the target language to be able to read the translation with an accurate understanding of what the source language original was saying. Before we can tell whether a translation is good, we will need to have a clear way of talking precisely about the meanings we are aiming to preserve. Therefore, before studying the details of translation, we will need to introduce some ideas from the linguistic subfield of semantics, which is the study of the part of human language grammar that is used to construct meanings (see also Section 3.3 for a discussion of semantic relations in the context of language tutoring systems).

Notice that we are no longer asking for the translated text to preserve aspects of the message other than meaning. The style of the translation could be very different from the style of the original. So long as the meaning is preserved, we will call the result a success.

In our discussion of grammar checking (Section 2.4.1), we saw that it is helpful to break up English sentences into smaller components, and give these components names like noun phrase and verb phrase. These components may in turn be broken down into subcomponents, with the whole process bottoming out either in words or in a unit slightly smaller than the word that is called the morpheme. The same kind of decomposition happens in semantics, and we thus need to study two aspects. The first is lexical semantics, which is the study of the meanings of words and the relationships between these meanings. Lexical semantics includes the study of synonyms (words that mean the same thing), antonyms (words that are opposites), and word senses (the subdivisions of meaning that a particular word can have). This matters for translation, because lexical semantics gives you the tools to understand some of the ways in which word meanings can shift as you move from one language to another.

The other part of semantics explains how a sentence like “Roger outplayed Andy” means something quite different from “Andy outplayed Roger”. The words are the same, but the way they are arranged differs, and this affects the meaning. But “Roger outplayed Andy” means much the same as “Andy was outplayed by Roger”. Here the words differ, but the meaning somehow comes out almost identical. The part of semantics that studies this is called compositional semantics.

The reason that this term (compositional) is used is that the meaning of a whole expression is composed of the meanings of its component parts. Thus, the meaning of the phrase “triangular hat box” is constructed (one way or another) from the meanings of the individual words “triangular”, “hat”, and “box”. It could mean a triangular box for hats, a box (shape unspecified) for triangular hats, or even a triangular box made out of hats, but each one of these meanings can be built out of the meanings of the smaller parts. Researchers in compositional semantics begin with the assumption that there is some kind of mechanism for assembling the word meanings into a sentence meaning and spend their research efforts on experiments and theories designed to shed light on the way this mechanism works. Different languages typically assemble meanings in similar ways, but the fine details of the process differ from language to language, so an automatic translator has to smooth over the differences somehow.

For example, in German, when you want to say that you like skiing, you use an adverb (gern) to express the idea of liking:

(36) Ich fahre gern Ski
   I drive (gladly) on-skis
  ‘I like skiing’.

There are two aspects to notice about this translation:

1. The word “skiing” translates into two words of German, one of which (fahren) means something like the English verb “to drive”, and the other (Ski) is rather like “on skis”.
2. “Gern” doesn’t really mean the same as “gladly”, but this is the best available English word if you want a literal translation.

The meaning comes across, and in each language it is easy to see how the meaning of the whole is related to the meaning of the parts, but the details differ.

Some phrases are not like this at all. Both German and English have idiomatic phrases for dying, but the usual English one is “kick the bucket” and the German one is “ins Gras beissen”, which is literally “to bite into the grass”. The German idiom is rather like the English “bites the dust”, and a tolerant reader can probably accept that the meaning is somehow related to the meaning of the component parts, but the English idiomatic meaning has no obvious connection with buckets or kicking, so linguists label this kind of phrase non-compositional. To understand “kick the bucket” you just have to learn the special idiomatic meaning for the phrase. As a bonus, once you have done that, the slang-term “bucket list” becomes more comprehensible as a way of talking about the list of things you want to do before you die. Later in the chapter, we will see a technology called phrase-based machine translation. This technology learns from data and would probably be able to get the translation right for many idiomatic phrases.

7.6 Words and meanings

7.6.1 Words and other languages

George W. Bush is reputed to have said: “The trouble with the French is that they don’t have a word for entrepreneur”. This is an example of a common type of claim made by journalists and others. But, of course, “entrepreneur” is a French word, listed in the dictionary, and means just about the same as it does in English. Claims like this are not really about the meanings of the words, but rather about cultural attitudes. If President Bush did actually say what was reported, people would have understood him as meaning something like: “The French people do not value the Anglo-Saxon concept of risk-taking and the charismatic business leader as much as they should”. When people say “<languageX> doesn’t have a word for <conceptY>”, they want their readers to assume that if the people who speak <languageX> really cared about <conceptY>, there would be a word for it. So if you want to assert that a culture does not care about a concept, you can say that there is no word for it. It is not important whether there really is a word or not: your intent will be understood. This is a good rhetorical trick, and a mark of someone who is a sensitive and effective language user, but it should not be mistaken for a scientific claim.

Vocabularies do differ from language to language, and even from dialect to dialect. One of the authors (who speaks British English) grew up using the word “pavements” to describe what the other two (speakers of standard American English) call “sidewalks”, and also prefers the terms “chips” and “crisps” to describe what the other two call “fries” and “chips”, respectively. These are actually easy cases because the words really are one-for-one substitutes. Trickier are words that could mean the same but carry extra senses, such as “fag”, which is an offensive term for a gay male in the USA, but (usually) a common slang term for a cigarette in Britain. In elite British private schools, “fagging” is also a highly specialized term for a kind of institutionalized but (hopefully) nonsexual hazing in which younger boys are required to, for example, shine the shoes of older boys. This one can be a source of real misunderstanding for Americans who are unaware of the specialized British meaning.

7.6.2 Synonyms and translation equivalents

In school, you may have been exposed to the concepts of synonyms and antonyms. Words like “tall” and “short”, “deep” and “shallow”, “good” and “bad”, “big” and “small” are called antonyms because they have opposite meanings. Words that have the same meaning are called synonyms. Examples of very similar words include pairs like “eat” and “devour” or “eat” and “consume”, “drink” and “beverage”, “hoover” and “vacuum”. As linguists, we actually doubt that there are any true synonyms, because there is no good reason for a language to have two words that mean exactly the same thing, so even words that are very similar in meaning will not be exactly equivalent. Nevertheless, the idea of synonyms is still useful.

Cross-linguistically, it also makes sense to talk about synonyms, but when you look carefully at the details it turns out that word meanings are subtle. The French word “chien” really does mean the same as the English word “dog”, but there is no single French word corresponding to all occurrences of the English word “know”. You should use “connaître” if you mean “to know someone” and “savoir” if you mean “to know something”. Translators have to choose, and so does a machine translation system. Sometimes, as below, you have to choose two different translations in the same sentence.

(37) Do you know that Freud and Conan-Doyle knew each other?
(38) Savez-vous que Freud et Conan-Doyle se connaissaient?

The right way to think about the relationships between words in different languages is to note when they are translation equivalents. That is, we try to notice that a particular French word corresponds, in context, to a particular English word.

7.7 Word alignment

This idea of translation equivalence is helpful, because it leads to a simple automatic method that a computer can use to learn something about translation. Called the bag-of-words method, this is based on the use of parallel corpora. As an example, we use the English and German versions of a Shakespeare sonnet that we discussed in Section 7.3. This text is not typical of real translation tasks, but makes a good example for showing off the technology.

The main idea is to make connections between the pairs of sentences that translate each other, then use these connections to establish connections between the words that translate each other. Figure 7.9 is a diagram of the alignment between the first four sentences of Shakespeare’s original and the first four lines of the German translation. Because this is a sonnet, and is organized into lines, it is natural to align the individual lines. In ordinary prose texts you would instead break the text up into sentences and align the sentences, producing a so-called sentence alignment.

Figure 7.9 Correspondence between the two versions

image

Figure 7.10 Word alignment for the first line

image

Once you have the sentence alignment, you can begin to build word alignments. Look carefully at Figure 7.10. Notice that the word “menschenblick” is associated with two words, and that the word “ich” is not associated with any word, because the line we are matching does not have the word “I” anywhere in it. It is useful to make sure that every word is connected to something, which we can do by introducing a special null word that has no other purpose than to provide a hook for the words that would otherwise not be connected. This is shown in Figure 7.11.

Figure 7.11 Revised word alignment for the first line

image

Once we have the word alignment, we have collected some evidence about the way in which individual German words can go with corresponding English words. We are saying that “wenn” is a possible translation of “When”, that “in disgrace” could be translated by “verbannt”, that “men’s eyes” could go with “menschenblick”, and so on. Notice that these are not supposed to be the only answers. In another poem “menschenblick” could easily turn out to be translated by “human view”, “verbannt” could go with “banned”, and “wenn” could turn out to correspond to “if”. But in Sonnet 29, the correspondences are as shown in the diagram.

In order to automate the idea of word alignment, we rely on the fact that it is fairly easy to identify corresponding sentences in a parallel corpus. Then, to get started, we just count the number of times each word pair occurs in corresponding sentences. If we try this for a little fragment of Hansard (the official record of Canadian parliamentary debates), which is conveniently published in French and English, we can find out that the French word “gouvernement” lines up with the frequent English words in the second column in Table 7.1.
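The counting step is easy to automate. Here is a minimal Python sketch of the idea, using an invented two-sentence fragment in place of the real Hansard data; the function name is ours, chosen for illustration.

```python
from collections import Counter

def count_word_pairs(aligned_sentences):
    """Count how often each (French, English) word pair occurs
    in corresponding sentences of a parallel corpus."""
    counts = Counter()
    for french, english in aligned_sentences:
        for f in french.lower().split():
            for e in english.lower().split():
                counts[(f, e)] += 1
    return counts

# A tiny invented fragment standing in for the Hansard corpus.
corpus = [
    ("le gouvernement du canada", "the government of canada"),
    ("le gouvernement a dit", "the government said"),
]
counts = count_word_pairs(corpus)
print(counts[("gouvernement", "government")])  # → 2
print(counts[("le", "the")])                   # → 2
```

Notice that the counts do not distinguish true translation pairs from accidental cooccurrences; “gouvernement” pairs up with “the” just as often as with “government” here, which is exactly the problem the tables below illustrate.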

Table 7.1 Word alignment of gouvernement in Hansard

French          English       Count
gouvernement    the           335
gouvernement    to            160
gouvernement    government    128
gouvernement    of            123
gouvernement    and           81
gouvernement    that          77
gouvernement    in            73
gouvernement    is            60
gouvernement    a             50
gouvernement    it            46

Likewise, the English word “government” lines up with the frequent French words in the first column of Table 7.2.

Table 7.2 Word alignment of government in Hansard

French          English       Count
de              government    195
le              government    189
gouvernement    government    128
que             government    91
?               government    86
la              government    80
les             government    79
et              government    74
des             government    69
en              government    46

We have left out the infrequently paired words in both lists, because we are expecting many accidental matches. But we are also expecting that word pairs that truly are translations will occur together more often than we would expect by chance. Unfortunately, as you see, most of the frequent pairs are also unsurprising, as the word for government is lining up with a common word of the other language, such as “the”. However, one pair is high up in both lists:

      gouvernement       government     128

Table 7.3 Selected word-pair statistics in a small aligned corpus

image

This gives us a clue that these words probably do translate as each other. You can do the same with phrases. You can use the word-pair statistics for “Président”, “Speaker”, and “Mr” to work out, just by counting, that “Monsieur le Président” and “Mr Speaker” are probably translation equivalents. Here is the data that you need to decide this:

French       English      Count
Président    Mr           135
Président    Speaker      132
Monsieur     Mr           129
Monsieur     Speaker      127

Because these numbers are all roughly the same, you can tell that this set of words are tightly linked to each other.

The tables we are showing are based on 1,923 sentences, but in a full system we would process many thousands, hundreds of thousands, or millions of sentences, so the tables would be correspondingly bigger. To make further progress on this, we need to automate a little more, because it is hard work poring over lists of word pairs looking for the interesting patterns.

The way to deal with this is to do statistics on the word pairs. Table 7.3 contains some of the higher-scoring pairs from a single file of the Canadian Hansard. Now, in addition to the number of times the word pair occurs together, we also collect other counts. The first column of the table is a statistical score called ϕ² (phi-squared), which is a measure of how closely related the words seem to be. The second column is the French word, the third the English word, and the fourth through seventh columns contain the counts from which the score is calculated.

In reality, the table would be much bigger and based on more words, but you can already see that good word pairings are beginning to appear. The point of this table is to motivate you to believe that statistical calculations about word pairings have some value.
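The ϕ² score can be computed from the four counts of a 2 × 2 contingency table over sentence pairs. The sketch below assumes the standard definition of ϕ²; the example counts are invented for illustration, not the real Hansard figures.

```python
def phi_squared(both, f_only, e_only, neither):
    """phi-squared association score for a word pair, computed from a
    2x2 contingency table of sentence-pair counts:
      both    - sentence pairs whose sentences contain both words
      f_only  - pairs containing only the French word
      e_only  - pairs containing only the English word
      neither - pairs containing neither word
    The score ranges from 0 (independent) to 1 (perfectly associated)."""
    a, b, c, d = both, f_only, e_only, neither
    return (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Invented counts for illustration: a genuine translation pair versus
# a pairing with a common function word.
strong = phi_squared(128, 30, 20, 1745)   # "gouvernement" / "government"
weak = phi_squared(128, 500, 700, 595)    # "gouvernement" / "the"
print(strong > weak)  # → True
```

Because “the” appears in almost every English sentence, its pairing with “gouvernement” has many counts in the off-diagonal cells, which drives the score down even though the raw cooccurrence count is high.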

7.8 IBM Model 1

Many of the tools of statistical machine translation were first developed at IBM Research in the 1980s. This work completely changed the nature of academic research in machine translation, made it much more practical to develop web services such as Google Translate, and will probably be recognized as the most important work in machine translation for several decades. IBM made five models of the translation process. The first and simplest one, called IBM Model 1, is explained here.

Taking the task of translating from English to French, Model 1 is based on the idea that in each aligned sentence each English word chooses a French word to align to. Initially, the model has no basis for choosing a particular French word, so it aligns all pairs equally.

Figure 7.12 Part of an initial alignment

image

Figure 7.12 shows part of the initial alignment between two sentences. The point of making such alignments is as a step along the road toward making a so-called translation model; we are not doing translation yet, just working with the aligned sentences in order to produce a mathematical model, which will later be used by the translation software to translate new sentences. Notice that we add a special NULL word, to cover the possibility that an English word has no French equivalent. We make an alignment like this for each of the sentences in the corpus, and count the number of times that each word pair occurs. After that, words that occur together frequently will have larger counts than words that occur infrequently. This is what we saw in the previous section. Now we use those counts to form a new alignment that drops some of the weaker edges from the original figure.

An idealized version of what might happen in this process is shown in Figure 7.13. The connections between the first part and the second part of the sentence have been broken, reflecting the fact that there are two distinct phrases.

Figure 7.13 Part of a second-stage alignment

image

The details of what really happens depend on the statistics that are collected over the whole corpus. The hope is that “années” will occur more in sentences aligned with ones having the word “years” than it does with ones aligned with “change”, thereby providing more weight for the “années”–”years” link.

Once we have created a new alignment for each sentence in the parallel corpus, dropping some edges that do not correspond to likely translation pairs, we have a new and more focused set of counts that can be used to form a third series of alignments, as shown in Figure 7.14.

Figure 7.14 Part of a third-stage alignment

image

The principle here is to use the statistics over a whole corpus to thin out the alignments such that the right words finish up aligned. We are going to create a model that represents the probabilities of many of the ways in which words can combine to form a translation. To do this, we use the information contained in the bilingual corpus of aligned sentences.
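The iterative re-estimation just described can be automated with the expectation-maximization (EM) procedure that underlies Model 1. Below is a minimal sketch in Python, using an invented two-sentence corpus and omitting the NULL word for brevity; the function and variable names are ours, not IBM's.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """Estimate t(f|e), the probability that English word e translates
    as French word f, by EM over a corpus of aligned sentence pairs."""
    # Start with uniform translation probabilities (no basis to prefer
    # any particular pairing, as in the initial alignment).
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                # Share each French word's alignment probability
                # among the English words in the paired sentence.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # Re-estimate: strong pairings gain probability, weak ones fade.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
]
t = train_model1(corpus)
print(round(t[("maison", "house")], 2))
```

Even on this tiny corpus, the fact that “la” cooccurs with “the” in both sentences lets EM pull probability away from the spurious “la”–“house” link and toward “maison”–“house”, exactly the thinning-out behavior shown in the figures.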

Since the model is just a representation of what has been learned by processing a set of aligned sentences, you may wonder why we bother with the whole elaborate apparatus of alignments, model building, and probabilities. Could we not simply memorize all the sentence pairs? That would allow us to achieve a perfect translation of the sentences in the bilingual corpus, but it is not really good enough, because we want a solution that generalizes to new sentences that we have never seen before. To do this, we need a way of assigning probabilities to sentence pairs that we have not previously seen. This is exactly what the translation model provides.

When we come to translate new sentences, we will choose a translation by using the probabilities provided by the translation model to piece together a good collection of words as equivalents for the French. Model 1 provides a translation model. We will soon see how to combine this translation model with a language model, in order to make sure that the translation output not only matches the French, but also works as a fluent sentence in English.

IBM Model 1 is clearly too simple (else why would the IBMers have made Models 2, 3, 4, and 5?). In particular, Model 1 takes no account of word order, treating every alignment position as equally likely wherever the words sit in the sentence, and it places no limit on the number of words that can be aligned to a single word.

Models 2, 3, 4, and 5 introduce different ways of building alignments, ones that are carefully tuned to avoid the deficiencies of Model 1. For a long time, systems based around variants of Models 3, 4, and 5 were state of the art, but lately a new and related idea of phrase-based translation has gained ground. This will be discussed later.


Under the Hood 11
The noisy channel model
To understand how much of machine translation works, we need to step back and look at something called the noisy channel model. The science of information theory was developed in the 1930s and 1940s in response to a need for mathematical tools for reasoning about telegraphy, and also about codes and ciphers. A key concept of information theory is the noisy channel. Standard telephones have a very limited frequency range (from 300 Hz to 3500 Hz), but human hearing goes from roughly 20 Hz to as much as 20,000 Hz, and typical human voices have significant energy in the range from 80 Hz to 5000 Hz (see more in Section 1.4). Male voices have a fundamental frequency that is well below the frequency range transmitted by the telephone. Yet, even over the telephone, we still perceive the low fundamental, and male voices still sound male. This is because hearing is an active process: our perceptual system reconstructs the missing fundamental by extrapolating from the higher harmonics that are transmitted through the telephone. In the same way, over the telephone it is very difficult to hear any difference between /f/ and /s/, because the acoustic differences between these sounds are in frequency bands that are too high to be transmitted. We say that the telephone is a noisy channel, because it fails to transmit the whole of the input signal. This is shown in Figure 7.15.

Figure 7.15 The noisy channel model

image
    The task of the listener at the far end of the noisy channel provided by the telephone is to guess successfully what the speaker actually said. Fortunately, after a little experience, we become accustomed to the effects of the noisy channel, and get better at interpreting telephone speech. We call this knowledge about the channel the channel model: it describes the way in which the channel degrades the input speech. Unfortunately, on its own, the degraded signal passed by the noisy channel may not be enough to recover the original message reliably. Fortunately, however, we have additional information, because we know something about the kinds of messages that we expect to hear. For telephone speech, we know, for example, that the strongest harmonics of the signal will be simple multiples of the missing fundamental frequency, which is why we are able to reconstruct the fundamental frequency even though it is not really transmitted. This knowledge about which messages are likely can be referred to as the message model. Taken together, the channel model and the message model can be used in a process that consumes the degraded form of the input message and produces a well-informed best guess about the original clean message.
    The noisy channel idea is widely applicable. We can think of spelling correction as a channel-encoding problem. Consider the nonword errors discussed in Section 2.2.1. Suppose that you get an email inviting you to a party at “Dsvid’s house”. Think of the original clean signal as being what the user meant to type and the degraded signal as what the user’s fingers actually did type. It is easy to tell that the clean signal must have been “David’s house”, and that the error is a key-for-key substitution of s for a. Here, the channel model says that substitutions of adjacent keys are common, and the message model says that “David” is a more plausible name for the mutual friend than “Dsvid”. (Can you see what must have happened if the degraded signal was instead “Dsbif’d houfr”?)
    We can turn the noisy channel model into math by writing

    ŷ = argmax_y P(y) P(x|y)

where x is the observed output of the channel (that is, the degraded message) and y is the hypothesized clean message. The notation argmax_y P(y)P(x|y) is a mathematician’s way of saying “search for the y that gives the highest value for the probability expression P(y)P(x|y)”. ŷ is therefore the system’s best guess at what the original message must have been. The message model is expressed as P(y), which is the probability that the writer will have intended a particular word y. This is what tells us “Dsvid” is unlikely and “Dscis” very unlikely. P(x|y) is the channel model, and tells us that the s for a substitution is likely. Obviously, all bets are off if you do have friends called “Dsvid” and “Dscis”, or if the writer was using a nonstandard keyboard.
    At the end of the chapter, in the exercises, there are a couple of rather more exotic examples of the noisy channel model: part-of-speech tagging and cryptography. If you want to stretch your mind, do those exercises! After that, there is a risk that you will start seeing the noisy channel model everywhere.
    You may already be able to see why this discussion belongs in a chapter about translation. In 1949, Warren Weaver put it this way:
It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code”. If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
Modern machine translation technology builds on this appealing idea. Indeed, the noisy channel framework carries over directly: if we have a message in Chinese c that we would prefer to read in English e, we can factor the task into three parts:
1. Estimating a translation model P(c|e)
2. Estimating a language model P(e)
3. Maximizing the product P(e)P(c|e) and returning the resulting English. This process is usually called decoding by analogy with the cryptography example above.
This decomposition gives rise to a pleasing division of labor. Imagine that we are trying to translate the Latin phrase “summa cum laude” into English, and that the system is considering three possible candidates. (This example is adapted from a lecture slide by Jason Eisner.)
(39) topmost with praise
(40) cheese and crackers
(41) with highest distinction
1. The language model is responsible for making sure that what is proposed as a translation actually looks like a reasonable piece of English. On this metric (40) and (41) should get fairly high probabilities, but (39) should get a low score, because nobody ever says anything like “topmost with praise”.
2. The translation model is responsible for making sure that the intended content gets across. (39) scores especially well on this, because “summa” goes well with “topmost”, “cum” with “with”, and “laude” with “praise”. (41) does pretty well too, because “summa” goes well with “highest”, “cum” and “with” still go well with each other, and “distinction” and “laude” are not horrible. The word matches with (40) are pretty awful, except perhaps for “cum” with “and”, so its score is low.
3. The decoder multiplies together the two components and (we hope) finishes up giving the highest overall score to “with highest distinction”. One of the two bad alternatives is vetoed because the language model hates it (as an unlikely phrase in English), the other because the translation model hates it (as an improbable equivalence between foodstuffs and academic honors).
It is not easy to build a good decoder, language model, or translation model. Each of these tasks has resulted in decades of research, but the basic idea of factoring the problem into these three parts has stood the test of time. The point of this section is to persuade the reader that the noisy channel model is a useful abstraction and can be used effectively in many situations, including the design of machine translation systems. In the translation setting, one major advantage is that the language model needs only monolingual text, and can be built separately using huge volumes of text from newspapers or from the web. The translation model still needs parallel text (as discussed in Section 7.7), which is much more difficult to find, but because the models can be built separately it is possible to make progress even if you do not have that much parallel text.


Under the Hood 12
Phrase-based statistical translation
We have been discussing translation models based on translating a single word into a single other word. But notice the word-pair statistics in Table 7.4.

Table 7.4 Word pairs in French and English

image

    Modern statistical machine translation systems are phrase-based. What this means is that instead of aligning individual words, the system aligns longer phrases as well. What is really going on here is that the phrase “l’Église orthodoxe” is lining up with the English phrase “The Orthodox Church”. Intuitively, translation is based on phrases rather than individual words. All we mean by phrase here is “sequence of words that might be longer than just one”; these phrases often turn out to have nothing to do with the verb phrases and noun phrases that we talked about in Chapter 2. The idea of phrase-based statistical translation is to build statistical models that reflect the intuition that phrases should be involved. We make the units of translation phrases instead of words, and build a table, called the phrase table, that takes over the role that was played by the word-pair tables earlier on. Everything that could have been part of the word-pair table can still be a part of the phrase table, but we can now also have pairings between collections of words.
    We hope that in a system trained on the corpus we have been working with, the phrase table will contain the pair:
image
with a moderately high score. The phrase table will also contain numerous pairs that are matched 1-1, and these pairs correspond to entries that were in the word-pair tables earlier. What is important is that when the data has enough evidence to be confident about a translation relationship that is many-to-one or many-to-many, the phrase table can include the relevant entry. This approach works well; current research focuses on methods for finding good ways of choosing which phrases should be placed in the phrase table and how the scores should be assigned (see Further reading).
    Returning to the discussion of idioms earlier in the chapter, you can probably see why a phrase-based system might be able to successfully translate the French idiomatic expression for dying, which is “casser sa pipe”, with the English “kick the bucket”. The system does not really have to understand what is going on with these phrases, it just has to notice that they occur together, put them in the phrase table, and use them in the translation. If the aligned sentences happened to make heavier use of some other common English idiomatic expression for dying, then “casser sa pipe” could equally well be translated as “meet one’s maker”.
    It is not obvious exactly where phrase-based translation belongs in relation to the translation triangle. There is certainly no interlingua in sight, and there is little explicit linguistic analysis going on, so it is tempting to call it a direct approach. However, the kinds of reasoning and generalization that it is doing about the properties of chunks are very similar to the kinds of reasoning and generalization that traditional transfer systems do about verb phrases, noun phrases, and sentences, so one can definitely make the case that these systems have something in common with transfer systems. The big difference is that the statistical systems learn directly from data, whereas more typical transfer systems rely on linguists and language technologists to create the necessary transfer rules. The statistical systems often work well, but because they represent their knowledge in the form of complex mathematical models, it can be very difficult for human beings to understand them or to correct the mistakes they make. Carefully built transfer systems are easier to understand and test, but require constant care and maintenance by experts.

7.9 Commercial automatic translation

In this section, we discuss the practicalities of using (or not using) automatic translation in a real-world commercial setting. This discussion is not concerned with difficult literary translations, or with the complexities of the technology, but with what the technology lets you do.

7.9.1 Translating weather reports

A system built for Environment Canada was used from 1981 to 2001 to translate government weather reports into French, as Canadian law requires. A typical report is shown in Figure 7.16.

This is a very structured text, as shown in Figure 7.17, but it is not really written in conventional English at all. The main part of each “sentence” is a weather condition such as “CLOUDY WITH A FEW SHOWERS”, “CLEARING”, “SNOW”, or “WINDS”. These do not correspond to any standard linguistic categories: the main part can be an adjective, a verb in the -ing form, a noun, and probably other things.

To translate into French, the METEO system relies on a detailed understanding of the conventions about what weather reports say and how they say them, and on a special grammar, largely separate from traditional grammars of English, which explains which words are used to mean what and in what combinations. This is called a sublanguage. The translation of the dateline is completely straightforward, since it simply consists of the date plugged into a standard template. Similarly, place names are easy to translate: they are the same in the English and French version. The body text does change, but the sentences are highly telegraphic, and mostly very similar from day to day.

Figure 7.16 A typical Canadian weather report

image

Figure 7.17 The parts of a Canadian weather report

image

Finally, METEO benefited from two facts about the way it was used. First, it replaced a task so crushingly boring for junior translators that they were not able, even with the best will in the world, to do it consistently well. Secondly, by law, METEO’s output had to be checked by a human before it was sent out, so occasional mistakes were acceptable provided that they were obvious to the human checker. METEO was used until 2001, when a controversial government contracting process caused it to be replaced by a competitor’s system. By that time, it had translated over a million words of weather reports.

7.9.2 Translation in the European Union

The European Union (EU) is a group of European countries that cooperate on trade policy and many other aspects of government. It was founded by 6 countries in 1957, has expanded to include 27 countries, and now has 23 official languages. There is a rule that official documents may be sent to the EU in any of the 23 languages and that the reply will come back in the same language. Regulations need to be translated into all the official languages. In practice, European institutions tend to use English, French, and German for internal communications, but anything that comes in front of the public has to be fully multilingual.

One way of meeting this need is to hire 23 × 22 = 506 teams of good translators, one for each pair of source language and target language. You might think that 253 would suffice, because someone who speaks two languages well can translate in either direction. This idea has been tested and found to be wrong, however, because a translator who is good at translating English into French will usually be a native speaker of French, and should not be used to translate from French into English. It is harder to write well in a foreign language than to understand it. Translators are expensive, and government documents long and boring, so the cost of maintaining the language infrastructure of the EU is conservatively estimated at hundreds of millions of euros. In practice, it makes little sense to employ highly trained teams to cover every one of the possibilities. While there will often be a need to translate from English into German, it is unusual to have a specific need to translate a document originally drafted in Maltese into Lithuanian. Nevertheless, the EU needs a plan to cover this case as well. In practice, the solution it adopts is to use English, French, and German as bridge languages, and to translate between uncommon language pairs by first translating into and then out of one of the bridge languages. With careful quality control by the senior translators, this achieves adequate results.

Because of the extent of its translation needs, the EU has been a major sponsor of research and development in translation technology. This includes fully automatic machine translation, but also covers machine-aided human translation. The idea is that it may be more efficient to provide technology that allows a human translator to do a better and faster job than to strive for systems that can do the job without human help. Despite substantial progress in machine translation, nobody is ready to hand over the job of translating crucial legislative documents to a machine, so, at least for the EU, there will probably always be a human in the loop. Since the EU was early in the game, it makes heavy use of highly tuned transfer systems, with transfer rules written and tested against real legislative documents. Over time, it may begin to make more use of statistical translation systems, especially if future research is able to find good methods for tuning these systems to do well on the very particular workloads of the EU. For the moment, the EU, with its cadre of good translators, is likely to stick with transfer methods. By contrast, Google, with statisticians aplenty, computational power to burn, and access to virtually limitless training data, seems likely to continue pushing the frontiers of statistical methods. For the casual consumer of machine translation, this diversity of approaches can only be good.

7.9.3 Prospects for translators

Translators entering the professional market will need to become experts on how to use translation memories, electronic dictionaries, and other computer-based tools. They may rely on automatic translation to provide first drafts of their work but will probably need to create the final drafts themselves; thus the need for expert human judgment will certainly not disappear. In part, this is because users of commercial translation need documents on which they can rely, so they want a responsible human being to vouch for the accuracy and appropriateness of the final result. Specialist commercial translators will certainly continue to be in demand for the foreseeable future; there is no need to worry about the risk of being replaced by a computer. As in most other workplaces, you have to be comfortable using technology, and you should expect to have to keep learning new things as the technology changes and offers you new ways of doing your job more efficiently.

The range of languages for which free web-based translation is available will continue to grow, in large part because the statistical techniques that are used are very general and do not require programmers to have detailed knowledge of the languages with which they are working. In principle, given a parallel corpus, a reasonable system can be created very rapidly. A possible example of this was seen in June 2009, when Google responded to a political crisis in Iran by rolling out a data-driven translation system for Persian. It appears that this system was created in response to an immediate need: the quality of Google's English translations from Persian did not yet match the results of its system for Arabic, which used essentially the same technology but had been tuned and tested over a much longer period.

Improvements in the quality of web-based translation are also likely. We think this will happen for three reasons: first, the amount of training data available will continue to increase as the web grows; secondly, the translation providers will devote effort to tuning and tweaking the systems to exploit opportunities offered by the different languages (notice that for this aspect of the work, having programmers with detailed knowledge of the source and target languages really would be useful after all); and thirdly, it seems likely that advances in statistical translation technology will feed from the research world into commercial systems. There will be a continuing need for linguists and engineers who understand how to incorporate linguistic insights into efficient statistical models.

Literary and historical translators could be unaffected by all of this, although some of them will probably benefit from tools for digital scholarship such as those offered by the Perseus Project. These allow scholars to compare different versions of a text, follow cross-references, seek out previous works that might have inspired the current work, and so on. For example, a scholar working on a Roman cookbook might want to check up on all uses of the word “callosiores” in cooking-related Latin, to see how often it seems to mean “harder” and how often it means “al dente.”

Many potential users want to use automatic translation as a tool for gathering a broader range of information than they otherwise could. Marketers who already do the kind of opinion mining mentioned in Chapter 5 will also want to collect opinions from their non-English-speaking customers. This is much the same as what we did to decode the German cellphone review.

Checklist

After reading the chapter, you should be able to:

Exercises

1. MATH: The three procedural languages of the EU are English, French, and German. There are 20 other official languages, making a total of 23. You want to make sure that it is possible to translate from any of these languages into any other. As we discussed, having a separate translation team for each ordered pair of languages results in a need for 506 teams. How many expert translation teams do you need if you adopt a policy that all documents will first be translated from the source language into English, French, or German, and then translated back out into the target language? How many would you need if you managed to agree that there should be just one procedural language (for example, German) and that all translations would either begin, end, or pass through German?
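The figure of 506 teams quoted in the exercise can be checked with a few lines of Python (a quick sketch; the function name is ours, and we deliberately compute only the direct-pairs count so as not to give away the rest of the answer):

```python
# Number of expert translation teams if every ordered pair of the 23
# official languages gets its own dedicated team: each of the n languages
# needs a team translating into each of the other n - 1 languages.
def direct_teams(n_languages):
    return n_languages * (n_languages - 1)

print(direct_teams(23))  # 506
```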
2. ALL: Find a native speaker of a language other than yours (and other than English) and sit down with them with a short passage of text in their native language. Discuss what problems there are in translating from their language into English (or into your own native language). Which kinds of sentences/constructions are fairly easy to translate? Which ones border on impossible?
3. ALL: Go to http://translate.google.com/. This online automatic translation system provides free translations between many languages, including French, English, Spanish, Arabic, Chinese, and Russian.
(a) Try translating three sentences from English into another language and then back to English. To do this, first translate from English to one of the other languages. Then, copy and paste the response and translate back to English. Try using different languages for each example. Write down your sentences, the languages you translate to/from, and what the back-translations are.
(b) Rate the intelligibility of the translations. Is the word order correct? Does the system choose words that make sense? How close is the back-translation to the original input sentence? Can you figure out the meaning or is it gibberish?
(c) Do some languages offer better back-translations than others? For example, does an English–Spanish–English translation produce better results than English–Russian–English? Why or why not?
4. ALL: Take the following German and English sentence pairs and do the following (you might find a German–English dictionary helpful, although it is not strictly necessary):
(42) a. Das Kind ist mir ans Herz gewachsen.
        I have grown fond of the child.
     b. Sie ist bloß ein Kind.
        She is but a child.
     c. Sie nahm das Kind mit.
        She took the baby with her.
(a) Describe how the bag-of-words method will derive a translation/alignment for “Kind” and for “but”. Explain why we have to use several iterations when calculating alignments between words.
(b) How will phrase-based translation improve on the translation models here?
5. MATH: The math of the noisy channel can also be used to work out the parts of speech mentioned in Sections 2.4.1 and 3.4.2. We have to imagine that speakers were once really cooperative, and that instead of speaking normally and saying things like:
(43) He checked out a book from the library.
they actually used to say:
(44) He/pronoun checked/verb out/adverb a/article book/noun from/preposition the/article library/noun ./punctuation
helpfully spelling out all the parts of speech for us. Unfortunately for us, people no longer speak like this, so, if we want the parts of speech, we must guess them. We can think of example (44) as y (the clean message) and example (43) as x (the degraded form of the message). In other words, we are imagining that people still really speak in a form that spells out the parts of speech, but that everything they say is filtered through a horrible channel that deletes the parts of speech and retains only the words. Of course, this is not actually what is going on, but we can still go through the steps of searching for the part-of-speech sequence that is most likely to go with the words we saw.
Finish this story by designing a part-of-speech tagging model that:
(a) Uses probabilities of the form p(tag2|tag1) to model the chance that, for example, a noun will follow a verb.
(b) Builds up the probabilities of longer series of tags by chaining together the individual probabilities p(tag2|tag1).
(c) Uses probabilities of the form p(word|tag) to model the chance that, for example, a randomly chosen noun will turn out to be the word “dog” (or zebra or axolotl).
Test whether you fully understand your model by seeing whether you can explain to yourself how it would give different probabilities to two different interpretations of the sentence “He saw her duck.”
   If you can spell out the details of this model on your own, without further clues, you will have reproduced one of the better achievements of computational linguistics. If you need further hints, go to the Further reading, especially Jurafsky and Martin (2009), which covers this approach to part-of-speech tagging.
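As a small illustration of ingredients (a)-(c), the following sketch scores two tag sequences for “He saw her duck”. All the probabilities are invented for illustration (a real tagger would estimate them from a tagged corpus), and the tag names and function are ours:

```python
# Toy noisy-channel part-of-speech model: p(tags) * p(words | tags)
# under a bigram tag model. All numbers below are invented.
def score(words, tags, trans, emit):
    p = trans[("START", tags[0])]
    for prev, cur in zip(tags, tags[1:]):
        p *= trans[(prev, cur)]          # chain p(tag2 | tag1)
    for w, tg in zip(words, tags):
        p *= emit[(w, tg)]               # p(word | tag)
    return p

trans = {("START", "PRON"): 0.4, ("PRON", "VERB"): 0.5,
         ("VERB", "POSS"): 0.2, ("POSS", "NOUN"): 0.8,
         ("VERB", "PRON"): 0.3}
emit = {("he", "PRON"): 0.3, ("saw", "VERB"): 0.1,
        ("her", "POSS"): 0.4, ("her", "PRON"): 0.2,
        ("duck", "NOUN"): 0.05, ("duck", "VERB"): 0.01}

words = ["he", "saw", "her", "duck"]
# Reading 1: "her duck" = possessive + noun (the waterfowl she owns).
noun_reading = score(words, ["PRON", "VERB", "POSS", "NOUN"], trans, emit)
# Reading 2: "her duck" = object pronoun + verb (he saw her duck down).
verb_reading = score(words, ["PRON", "VERB", "PRON", "VERB"], trans, emit)
print(noun_reading, verb_reading)
```

With these made-up numbers the noun reading comes out more probable; the point is that the same word sequence gets different scores under different tag sequences, which is what lets the model choose between them.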
6. ALL: You can apply the noisy channel model to cryptography. Imagine that you receive a coded message mentioning an impending invasion of Britain by “MXOLXV FDHVDU”. As an expert in cryptography, you know to shift each letter three places back in the alphabet and identify the culprit: “JULIUS CAESAR”. Here y is the original Latin name, x is the encoded message, the channel model says “shift each letter three places forward”, and the message model is about “who is a likely invasion threat”. The channel is specifically designed so that those who are in the know can undo its corrupting effect.
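Undoing this particular channel takes only a few lines; here is a sketch (the function name is ours) that shifts each letter three places back, leaving spaces untouched:

```python
# Decode a Caesar cipher: the channel shifts each letter three places
# forward, so decoding shifts each letter three places back.
def caesar_decode(ciphertext, shift=3):
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            # Map A-Z to 0-25, shift back modulo 26, map back to a letter.
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            out.append(ch)               # keep spaces and punctuation
    return "".join(out)

print(caesar_decode("MXOLXV FDHVDU"))  # JULIUS CAESAR
```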
You now receive a message using the rail fence cipher, which involves laying out the message in the form of a rail fence, then reading it off row by row. Here is an example of how the message TRANSPOSITION CIPHERS ARE FUN would be laid out in a four-rail cipher:

T . . . . . O . . . . . N . . . . . R . . . . . U .
. R . . . P . S . . . O . C . . . E . S . . . F . N
. . A . S . . . I . I . . . I . H . . . A . E . . .
. . . N . . . . . T . . . . . P . . . . . R . . . .

In cryptography we leave out the spaces between the words, and group the encoded message into fives, so this message would be encoded as TONRU RPSOC ESFNA SIIIH AENTP R. In order to read the message, the recipient has to know that when the message was sent, the writer used four rails. Knowing that, it is possible to recreate the layout and read off the message, once again in the conventional cryptographic groups of five, as TRANS POSIT IONCI PHERS AREFU N. All that remains is to regroup the letters into words, and the reader has decoded the message. If the reader uses three rails instead of four, this happens:

T . . . O . . . N . . . R . . . U . . . R . . . P .
. S . O . C . E . S . F . N . A . S . I . I . I . H
. . A . . . E . . . N . . . T . . . P . . . R . . .

and the “decoded” message is TSAOO CEENS NFRNT AUSPI RIRIP H. The fact that this is unrecognizable gibberish proves that a mistake has been made somewhere.
The process for encoding is: choose a number of rails, lay the message out in a zigzag across them, read it off row by row, and regroup the letters into fives.
Now consider the rail fence message TAEIS HRIFN ESAYE LCE and work out what it says.
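The worked example above (encoding with four rails, then misreading with three) can be sketched in Python; the function names are ours:

```python
def rail_pattern(n_rails, length):
    """Zigzag rail index for each position: 0,1,...,n-1,n-2,...,1,0,1,..."""
    cycle = list(range(n_rails)) + list(range(n_rails - 2, 0, -1))
    return [cycle[i % len(cycle)] for i in range(length)]

def encode(message, n_rails):
    """Lay the message out in a zigzag, then read it off row by row."""
    msg = message.replace(" ", "").upper()
    rails = rail_pattern(n_rails, len(msg))
    return "".join(msg[i] for r in range(n_rails)
                   for i in range(len(msg)) if rails[i] == r)

def decode(ciphertext, n_rails):
    """Recreate the layout for the assumed rail count and read the zigzag."""
    msg = ciphertext.replace(" ", "")
    rails = rail_pattern(n_rails, len(msg))
    # Positions sorted by rail give the order in which letters were read off.
    order = sorted(range(len(msg)), key=lambda i: rails[i])
    plain = [""] * len(msg)
    for letter, pos in zip(msg, order):
        plain[pos] = letter
    return "".join(plain)

def in_fives(s):
    """Regroup into the conventional cryptographic groups of five."""
    return " ".join(s[i:i + 5] for i in range(0, len(s), 5))

print(in_fives(encode("TRANSPOSITION CIPHERS ARE FUN", 4)))
# TONRU RPSOC ESFNA SIIIH AENTP R
print(in_fives(decode("TONRU RPSOC ESFNA SIIIH AENTP R", 4)))
# TRANS POSIT IONCI PHERS AREFU N
print(in_fives(decode("TONRU RPSOC ESFNA SIIIH AENTP R", 3)))
# TSAOO CEENS NFRNT AUSPI RIRIP H
```

The last line reproduces the gibberish from the text: decoding with the wrong number of rails recreates the wrong layout, so the zigzag read-off scrambles the letters.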
7. ALL: When translating from English into the Native American language Mam (spoken in Guatemala), a translator reported the terms “ntz?ica” and “witzin” used among siblings (in phonetic transcription here), one referring to an older sibling and the other to a younger sibling. Both words are used for males and females.
(a) In terms of hyponymy/hypernymy, describe the relationship between the English word “sibling” and these words.
(b) Draw a Venn diagram showing how the English words “brother” and “sister” overlap with the Mam words “ntz?ica” and “witzin”.
(c) You come across the text “Maxwell is the brother of Santiago”, but it gives no indication who is older. If you had to translate this into Mam and were being forced to preserve this age ambiguity, how would you do it?

Further reading

There is a large and growing literature on translation and machine translation. We provide a few pointers into this literature.

   Chapter 25 of Jurafsky and Martin’s textbook (Jurafsky and Martin, 2009) covers modern machine translation in depth. Philipp Koehn (Koehn, 2008) has written a comprehensive technical introduction to statistical machine translation.

   An older but more general introduction to machine translation is provided by John Hutchins and Harry Somers (Hutchins and Somers, 1992). Language Files (Mihaliček and Wilson, 2011) has a full chapter on machine translation.

   Doug Arnold and his colleagues have made a good introductory textbook available for free on the internet: http://purl.org/lang-and-comp/mtbook.

   The cautionary tale about “intoxicado” is from Harsham (1984).

   There have been a few attempts to make machine translation systems more capable of translating poetry. In Genzel, Uszkoreit, and Och (2010), the authors describe a system that aims to produce translations with appropriate meter and rhyme. This is an impressive technical achievement, but does not address the bigger problem of how to produce translations that have the beauty and precision of good human-created poetic translations.

   The Perseus project (Crane, 2009) presents beautiful web versions of literary texts, including 13 million words of Latin and 10 million words of Ancient Greek. These include commentary, translations, and all kinds of support for multilingual scholarship.