Showing posts with the label language

LanguageWare's robust, extensible Language ID (part 4)

Having introduced the "prior art" approaches to identifying textual language in part 1 and part 3 (to wit: stop word presence and n-gram detection), I can now speak to the patented idea which we implemented as part of LanguageWare, a set of Java libraries that offer NLP functionality. Simply put, our solution involves a dictionary that is highly compactible (I may ask a guest blogger from my former team to delve into this aspect), which made it possible to store the following information in each entry: the term or n-gram; the language(s) with which it's associated; whether it can occur as a standalone term, at the beginning of a word, in the middle of a word, at the end of a word, or some combination of these; and an integer weighting value (per term/language pairing). Thus, for the Chinese Simplified/Traditional and Japanese disambiguation problem, the Japanese-specific kana (listed as unigrams) were given large positive va
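To make the entry layout above concrete, here is a minimal sketch in Python (not LanguageWare's actual Java API, which this excerpt does not show). The names `Position`, `Entry`, `ENTRIES`, `score` and `identify` are hypothetical, the sample terms and weights are invented for illustration, and summing the weights per language is an assumption about how the values might be combined.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Flag, auto


class Position(Flag):
    """Where inside a word a term or n-gram is allowed to match."""
    STANDALONE = auto()
    START = auto()
    MIDDLE = auto()
    END = auto()


ANYWHERE = Position.STANDALONE | Position.START | Position.MIDDLE | Position.END


@dataclass
class Entry:
    term: str            # term or n-gram
    positions: Position  # allowed positions, possibly combined
    weights: dict        # language code -> integer weight


# Hypothetical sample entries; the weights are made up for this sketch.
ENTRIES = [
    Entry("the", Position.STANDALONE, {"en": 5}),
    Entry("ing", Position.END, {"en": 3}),
    Entry("sch", Position.START | Position.MIDDLE, {"de": 3}),
    Entry("の", ANYWHERE, {"ja": 10}),  # a kana unigram with a large weight
]


def position_of(word, start, length):
    """Classify a match by where it sits inside the word."""
    end = start + length
    if start == 0 and end == len(word):
        return Position.STANDALONE
    if start == 0:
        return Position.START
    if end == len(word):
        return Position.END
    return Position.MIDDLE


def score(text):
    """Sum the weights of every entry that matches at an allowed position."""
    totals = defaultdict(int)
    for word in text.lower().split():
        for entry in ENTRIES:
            start = word.find(entry.term)
            while start != -1:
                if position_of(word, start, len(entry.term)) & entry.positions:
                    for lang, weight in entry.weights.items():
                        totals[lang] += weight
                start = word.find(entry.term, start + 1)
    return dict(totals)


def identify(text):
    """Return the highest-scoring language code, or None if nothing matched."""
    totals = score(text)
    return max(totals, key=totals.get) if totals else None
```

In this sketch, `identify("the singing")` picks English because the standalone "the" and the word-final "ing" both contribute, whereas "bathe" contributes nothing: "the" appears there only at the end of a word, a position its entry does not allow.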

Part 1 of Cross-cultural communication conundrums ("It's a slam dunk!")

A US-based colleague of mine, whom I'd recently had the pleasure of meeting, spoke to my immediate team about working effectively across various cultures. The basketball expression "it's a slam dunk!" was, in his experience, as bewildering to his overseas colleagues as what we might consider to be less esoteric expressions, like "no kidding". This prompted me to consider what lessons I'd learned in my position as a thrice-expatriated person with a working knowledge of Japanese and French (and now challenged with Austrian German). In Ireland, it took me a surprisingly long time to truly understand Hiberno-English. Never mind the phonetic shifts (aka "accent"); even transcribed, common interjections and vocabulary can quickly mystify the uninitiated. Take this example sentence, which artificially conflates a lot of native (Dublin-centric) expressions: "Oh be the hokie: my laptop was banjaxed. I felt so knackered after trying