Associate Teaching Professor of Linguistics at UC San Diego
Director of UCSD's Computational Social Science Program
Of Google Translate and Rugged Anatomy
This was originally posted on my blog, Notes from a Linguistic Mystic, in 2013.
I happened upon a thread over at Reddit featuring a screenshot of a popular knife manufacturer’s website, put through Google Translate.
Here’s the text:
A sharp blade with a distinct tip, an integrated ignition steel and a diamond sharpener makes Bushcraft Survival the ultimate knife to force Bush enthusiasts. On the rugged vagina is a well lit place for the steel and with the diamond it becomes easy to sharpen the blade. It’s easy to swap two supplied bältesclipsen that lets you choose how you want to carry your knife.
This is, of course, absolutely wonderful. As several in the thread pointed out, in Swedish, French, Danish, German (and likely others), the word for “sheath” is also used (through some pretty straightforward analogy) to refer to the vagina, and indeed, the original site is in Swedish.
It appears that when the original poster used Google Translate, it saw “slidan” and chose “vagina” instead of “sheath”, resulting in comedy gold. You can see it also stumbled on bältesclipsen (‘belt clips’) later in the note, refusing to translate the term at all. Of course, if you visit the manufacturer’s official English version of that page, the “rugged vagina” becomes a “robust sheath”, and oddly enough, the belt clips disappear entirely.
An illustrative example
However amusing, this is actually a wonderful example of one of machine translation’s key shortcomings: computers have no understanding of the real world.
Any human who was trying to translate that passage (and who was aware of both meanings of ‘slidan’) would likely use ‘sheath’ without a second thought. It’s an article about a knife, they’re referring to the sheath of a knife, and there are no women mentioned anywhere in the article, so the choice is clear.
To a computer, it’s all just words. The machine was processing along, then came to a point where a word could mean either ‘sheath’ or ‘vagina’. It had no understanding of knives, sheaths, vaginas, or tabooed subjects. It had likely been programmed to choose the more frequently used of the two words, and vagina (84,900,000 results) shows up more than three times as often in Google’s results as sheath (24,800,000 results). So, unaware of the meaning, the taboo, or the humor, it made the translation.
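To make that guess concrete, here is a toy sketch in Python of a "pick the most frequent candidate" rule. To be clear, this is not how Google Translate actually works; it only illustrates the kind of context-blind decision described above. The lexicon, the function name, and the idea of using raw result counts as a score are all my own assumptions, and the counts are just the search-result figures quoted above.

```python
# Toy illustration only: NOT Google Translate's actual method.
# Counts are the search-result figures quoted in the post.
RESULT_COUNTS = {"vagina": 84_900_000, "sheath": 24_800_000}

# Hypothetical one-entry lexicon: Swedish word -> candidate English translations.
LEXICON = {"slidan": ["sheath", "vagina"]}


def translate_by_frequency(word: str) -> str:
    """Pick the candidate with the highest raw count, ignoring all context."""
    # Words missing from the lexicon pass through untranslated.
    candidates = LEXICON.get(word, [word])
    return max(candidates, key=lambda c: RESULT_COUNTS.get(c, 0))


# How much more frequent the "wrong" candidate is (about 3.4 times).
ratio = RESULT_COUNTS["vagina"] / RESULT_COUNTS["sheath"]

print(translate_by_frequency("slidan"))         # vagina
print(translate_by_frequency("bältesclipsen"))  # bältesclipsen
print(round(ratio, 1))                          # 3.4
```

Nothing in that function ever looks at the words around “slidan”, so the fact that the whole page is about a knife never gets a vote. A word the lexicon doesn’t know simply comes back unchanged, which is one plausible way an untranslated bältesclipsen could end up in the output.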
Machine translation is hard, and although we laugh at these occasional funny results, we should be amazed at how good it already is. More importantly, though, it’s crucial that we understand the shortcomings of these programs, because your company’s website is only a word-frequency-based decision away from selling rugged vaginas.