Fucking Stationary and Flagrant Mistranslations
Feb 12th, 2008 by admin
There’s been a flurry of posts lately about the ever-mistranslated 干. Victor Mair wrote a short essay at the Language Log asking why it is so frequently translated as “fuck” in inappropriate situations, while John Pasden and Brendan O’Kane have blogged about related cases recently as well.
I think there’s something really interesting about the fact that serious people are actually debating whether this is machine translation. At the very least, this is a major sign of how much public expectations about the quality of MT have risen over the past five years. But while machine translation has come a long way, there’s no mistaking the 干 mistranslations for human error. There are a few things that tip us off from the start:
- underlying segmentation problems: when 干锅 gets translated into something like “fuck the pot” we have a prime example of semantic missegmentation. This is the kind of mistake Chinese speakers don’t generally make (it’s foreigners who butcher segmentation); their errors tend toward inappropriate word choice and poor grammatical integration instead.
- strange part-of-speech transfers: whenever a noun in Chinese translates into a verb+object compound in English, the beast of machine translation is rarely far off. Machines using excessively rigid grammar parsers stumble into this problem quite naturally. It’s contextual knowledge that helps us differentiate between the grammatical structures of 我喜欢干果 and 我喜欢吃饭, and that is exactly the sort of thing that is difficult to teach a machine (although we’re trying).
- the egregious nature of the mistake: whenever English speakers joke about 干 translating as fuck, I think the humour is at least a little bit racist for implying that the translator was Chinese. The joke wouldn’t be funny if it weren’t such a stupid mistake to make, right? And no English speaker would EVER make that mistake, right??? What’s forgotten is that it’s equally ludicrous for a translator to make this mistake going the other way. People simply don’t translate specific and understood phrases in their own tongue into vague expletives in foreign languages.
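To make the segmentation point above concrete, here is a minimal sketch in Python. It is not how Adso or any commercial package works; the tiny lexicon, its glosses and the function names are all invented for illustration. It contrasts character-by-character splitting with greedy forward maximum matching, a common dictionary-based segmentation technique.

```python
# Hypothetical toy lexicon: these entries and glosses are illustrative only.
LEXICON = {
    "我": "I",
    "喜欢": "to like",
    "干": "to do",
    "锅": "pot",
    "干锅": "dry pot (a dish)",
    "干果": "dried fruit",
}


def naive_segment(text):
    """Split into single characters, ignoring multi-character words."""
    return list(text)


def max_match_segment(text, lexicon=LEXICON, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(naive_segment("干锅"))            # two separate characters
    print(max_match_segment("干锅"))        # one two-character word
    print(max_match_segment("我喜欢干果"))
```

Once 干 and 锅 have been split apart, each gets glossed on its own and the compound reading is lost. Keeping 干锅 together as one dictionary word avoids the problem before translation even starts.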
At best we could probably make a case for non-native English speakers using translation tools as crutches and simply being grossly incompetent. This seems quite reasonable, especially since I think the boundaries between machine and human translation are already quite blurry everywhere but (possibly) literary translation.
To get down to technical details, if anyone is curious how Adso gets around the problem… we simply treat 干什么 as a semantic compound and default the character to translate as “to do” in situations where the grammar parser thinks it’s a noun (“to fuck” is an uncommon subtranslation that is still in the dictionary and may be preferred in some cases). In most cases this replaces a truly awkward translation with a slightly awkward one. The real problems in translating the character arise when it is used to refer to work. SMT, semantic analysis and phrase-level translation approaches are the best ways to resolve these sorts of problems, and that is my guess as to what Kingsoft has done between 2002 and 2005, judging from Joel and Victor’s screenshots.
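The lookup order described above (known compound first, then a default gloss, with rarer glosses kept on file) could be sketched roughly as follows. This is not Adso’s actual code or data model: the table names, the `gloss` function and the English gloss given for 干什么 are all assumptions made up for the example.

```python
# Hypothetical tables; Adso's real database layout is not described in this post.
COMPOUNDS = {
    "干什么": "to do what",  # illustrative gloss for the compound
}

ENTRIES = {
    "干": {
        "default": "to do",         # used when nothing more specific applies
        "alternates": ["to fuck"],  # kept in the dictionary, rarely preferred
    },
}


def gloss(token):
    """Return the compound gloss if the token is a known compound,
    otherwise the default gloss, otherwise None."""
    if token in COMPOUNDS:
        return COMPOUNDS[token]
    entry = ENTRIES.get(token)
    return entry["default"] if entry else None


if __name__ == "__main__":
    for token in ("干什么", "干", "锅"):
        print(token, "->", gloss(token))
```

The alternates are stored but never chosen here. Picking between them needs context, which is the hard part.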
The new Adso is plenty nice, but I’ve caught it making weird parsing mistakes it didn’t used to. How much overlap is there between the new rules and the old ones? Also, the new popups are indisputably Teh Hawt, but what’s with the pin1yin1?
Most of the basic code is similar, although proper noun recognition is not as good as in the previous version yet and the software does some new things like try to build up phrases in ways that can lead to mistakes that weren’t there before. Most of the time this is a sign that something is mistagged in the database.
Major improvement with 将 this weekend, and I am spending time on an ongoing basis fixing problems as I stumble into them. This week was dates, numbers and the unification of some geographic place names. Prepositional phrases are a problem area now, but that will change soon. If you run into obvious problems that you know the old version got right, email me the text and I’ll do what I can.
Cool, will keep my eyes open. One sweet little irony: In the text “我喜欢干果” above, your Adso popup is giving the, forgive me, wrong fucking Pinyin: “gàn guǒ” rather than “gānguǒ.” (Another pain-in-the-ass nitpick: spaces between the syllables? There are rules, you know!)
Actually had to add 干果 when writing the post too - really surprised it wasn’t there before. Turned out the editing-api wasn’t updating the pinyin field. Is fixed, along with the pinyin.
Is it evil and wrong of me that I just want to know how to make Adso annotate 干 as “to fuck”? I tried “我想干范冰冰” (merely an invented example, I assure you), but only got “to want to do” as an annotation. Come on David, give us a clue!
@Todd,
Hypothetically, the simplest solution is adding a rule to the definition “to fuck” associated with 干 (or 想干) using the advanced editing page. This adds weight to the gloss whenever it’s found before a person’s name.
<NEXT me 1>Person</NEXT>
Non-hypothetically, this is a good suggestion so I’ve just done it. Opens up problems with sentences like 我想干范冰冰干的那个活, but who can say what that means for sure anyway.
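For anyone wondering what a rule like that amounts to, here is a rough sketch of the idea as described in this comment: a gloss gets extra weight when the next token is tagged as a person. It does not reproduce Adso’s rule syntax or scoring; the `Gloss` class, the weights and the tag names are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gloss:
    text: str
    weight: float
    # Hypothetical rule store: bonus added when the NEXT token carries this tag.
    next_tag_bonus: dict = field(default_factory=dict)


# Illustrative weights only; the real values are not given in this thread.
GAN_GLOSSES = [
    Gloss("to do", 1.0),
    Gloss("to fuck", 0.2, {"Person": 1.0}),
]


def choose_gloss(glosses, next_tag=None):
    """Pick the gloss with the highest base weight plus any context bonus."""
    def score(g):
        return g.weight + g.next_tag_bonus.get(next_tag, 0.0)
    return max(glosses, key=score).text


if __name__ == "__main__":
    print(choose_gloss(GAN_GLOSSES, next_tag="Person"))
    print(choose_gloss(GAN_GLOSSES, next_tag="Noun"))
```

A rule that only looks one token ahead has no way of telling the two readings of 我想干范冰冰干的那个活 apart, which is the problem mentioned above.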
I think “I want to do Fan Bingbing” is a pretty good translation. Also, “I want to do what Fan Bingbing does” is correct as well, no?
I was amused that the translation of 干果 you used is “nuts”. Seems apt.
Micah: yeah, it was quite a good *translation*, just not an accurate *gloss*.
dried fruit guys, you win.