Follow Mayo Takeuchi on Quora

Tuesday, November 29, 2011

Language ID part 2: Callooh! Callay!

As mentioned in part 1, thinking about language identification leads to a fundamental question: "What defines a language?" The example our team used came from Lewis Carroll's children's book: a poem that the protagonist reads and, although she does not comprehend it, finds "pretty", adding that it was "clear" that "somebody killed something". Here it is, courtesy of the English Wikipedia article:
"Jabberwocky"

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!"

He took his vorpal sword in hand:
Long time the manxome foe he sought--
So rested he by the Tumtum tree,
And stood awhile in thought.

And as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!

One, two! One, two! and through and through
The vorpal blade went snicker-snack!
He left it dead, and with its head
He went galumphing back.

"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!"
He chortled in his joy.

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

from Through the Looking-Glass, and What Alice Found There (1872).
Since this novel has enjoyed worldwide popularity, it is unsurprising that translations exist in dozens of languages. Translators have therefore been challenged to render Carroll's "nonce" words in whatever ways they deemed fit. Here's a site that lists various translations (including Klingon!).
I've chosen one of the three German versions listed there to quote here, the one also found in Gödel, Escher, Bach.

Der Jammerwoch

Es brillig war. Die schlichte Toven
Wirrten und wimmelten in Waben;
Und aller-mümsige Burggoven
Die mohmen Räth' ausgraben.

»Bewahre doch vor Jammerwoch!
Die Zähne knirschen, Krallen kratzen!
Bewahr' vor Jubjub-Vogel, vor
Frumiösen Banderschnätzchen!«

Er griff sein vorpals Schwertchen zu,
Er suchte lang das manchsan' Ding;
Dann, stehend unterm Tumtum Baum,
Er an-zu-denken-fing.

Als stand er tief in Andacht auf,
Des Jammerwochen's Augen-feuer
Durch tulgen Wald mit Wiffek kam
Ein burbelnd Ungeheuer!

Eins, Zwei! Eins, Zwei! Und durch und durch
Sein vorpals Schwert zerschnifer-schnück,
Da blieb es todt! Er, Kopf in Hand,
Geläumfig zog zurück.

»Und schlugst Du ja den Jammerwoch?
Umarme mich, mein Böhm'sches Kind!
O Freuden-Tag! O Halloo-Schlag!«
Er schortelt froh-gesinnt.

Es brillig war. Die schlichte Toven
Wirrten und wimmelten in Waben;
Und aller-mümsige Burggoven
Die mohmen Räth' ausgraben.


Original source:
Scott, Robert. "The Jabberwock Traced to Its True Source", MacMillan's Magazine, Feb 1872.

Compared with the other available German versions (those by Lieselotte & Martin Remané and by Christian Enzensberger, the two cited in Wikipedia), this one seemed closest in "feel" to the original, though with more consistent rhyming. Perhaps my German-speaking readers could share their impressions of how it compares with the others?

Meanwhile, I was disappointed that the translations list site mentioned above (although apparently authoritative enough to be cited as a reference in Wikipedia) didn't include a Japanese version. Here I'm using the one presented on the Wikipedia page:
ジャバウォックの詩 (literally, Jabawock's poem)


夕火（あぶり）の刻、粘滑（ねばらか）なるトーヴ
遥場（はるば）にありて回儀（まわりふるま）錐穿（きりうが）つ。
総て弱ぼらしきはボロゴーヴ、
かくて郷遠（さととお）しラースのうずめき叫ばん。

『我が息子よ、ジャバウォックに用心あれ!
 喰らいつくあぎと、引き掴む鈎爪!
 ジャブジャブ鳥にも心配るべし、そしてゆめ
 燻り狂えるバンダースナッチの傍に寄るべからず!』

ヴォーパルのつるぎぞ手に取りて
尾揃（おそろ）しき物探すこと永きに渉れり
憩う傍らにあるはタムタムの樹、 
物想いに耽りて足を休めぬ。 

かくてぼうなる想いに立ち止まりしその折、 
両のまなこ炯々（けいけい）と燃やしたるジャバウォック、
そよそよとタルジイの森移ろい抜けて、 
めきずりつつもそこに迫り来たらん! 

一、二! 一、二! 貫きて尚も貫く 
ヴォーパルのつるぎが刻み刈り獲らん! 
ジャバウォックからは命を、勇士へは首を。 
彼は意気踏々（いきとうとう）たる凱旋のギャロップを踏む。

『さてもジャバウォックの討ち倒されしはまことなりや? 
 我がかいなに来たれ、赤射（せきしゃ）男子（おのこ）よ!
 おお芳晴（かんばら）しき日よ! 花柳かな! 華麗かな!』
父は喜びにクスクスと鼻を鳴らせり。 

夕火（あぶり）の刻、粘滑（ねばらか）なるトーヴ
遥場（はるば）にありて回儀（まわりふるま）錐穿（きりうが）つ。
すべて弱ぼらしきはボロゴーヴ、
かくて郷遠（さととお）しラースのうずめき叫ばん。

Whoever created this Japanese rendition had clearly studied Carroll's own annotations (e.g. "brillig" literally becomes "the hour of twilight-fire"). I'm not convinced by some of the straight phonetic transliterations, but I like the archaic inflections used.

Hopefully these examples show the reader what was meant by "stop words" in the context of language identification. Indeed, with this much text the traditional approach works, even though so many of the words are nonce words or portmanteaux: the small function words around them ("the", "and", "in"; "die", "und", "es") remain ordinary English or German. But as mentioned before, newspaper headlines omit these very same identifying words. And as we'll see in the next part, some languages share so much history (yet are politically segregated) that even their stop words overlap enough to cause false positives and consternation.
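
To make the stop-word idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the approach our team actually shipped: the word lists are tiny and hand-picked by me for this example, and the names `STOP_WORDS`, `tokenize`, `stop_word_scores` and `guess_language` are all hypothetical.

```python
import re

# Hypothetical, hand-picked stop-word lists (an assumption for illustration);
# a real identifier would use much larger, corpus-derived lists.
STOP_WORDS = {
    "english": {"the", "and", "in", "of", "to", "a", "he", "it",
                "that", "was", "were", "did", "his", "my", "as", "with"},
    "german": {"der", "die", "das", "und", "in", "es", "er", "war",
               "vor", "sein", "mit", "zu", "ein", "den", "ich", "nicht"},
}


def tokenize(text):
    """Lowercase the text and return runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stop_word_scores(text):
    """Count, per language, how many tokens are in that language's list."""
    tokens = tokenize(text)
    return {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOP_WORDS.items()
    }


def guess_language(text):
    """Return the language with the most stop-word hits, or None if no hits.

    Ties are not resolved in any clever way: max() simply returns the
    first language that reaches the highest score.
    """
    scores = stop_word_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


english_stanza = """'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe."""

german_stanza = """Es brillig war. Die schlichte Toven
Wirrten und wimmelten in Waben;
Und aller-mümsige Burggoven
Die mohmen Räth' ausgraben."""

print(guess_language(english_stanza))
print(guess_language(german_stanza))
print(guess_language("Jabberwock slain, boy chortles"))
```

With these toy lists, each stanza is attributed to its own language despite the nonsense vocabulary, while the made-up headline contains no stop words at all and yields no guess. Note also that "in" sits in both lists, a small taste of the overlap problem that grows much worse for closely related languages.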

Wishing my readers a frabjous day!

Here's the prior post about Language ID.
Part 3 of Language ID is here, or you can skip to the final post.

About Mayo

Professional: As "Senior Enterprise SEO Strategist" in IBM's Digital Marketing division, I provide consulting and training services for both internal and external clients. Formerly I was involved in Natural Language Processing, software localization, quality assurance and documentation authoring.
Personal: INTJ Nikkei Nisei expatriated Canadian who takes photographs and enjoys Baroque through late Classical music. The G+ page shares some of the "best of" photos.