Posts

Showing posts with the label parts_of_speech

Thoughts on Google's Knowledge Graph

Disambiguating "Taj Mahal" - structure or music band? Courtesy of Google's own blog.

In what is otherwise known as the semantic web, Google has announced its roll-out of ways to prompt the user to help disambiguate query terms ("strings", as in sequences of textual characters) to more specific concepts ("things"). Very catchy slogan. The Mashable article provides a basic overview of what this news means, and as I read it, my thoughts inevitably turned to my former job in LanguageWare (which was partially described over four non-contiguous blog posts last year, related to Language Identification). When one is first exposed to linguistic data which has been amassed for the purpose of spell-check, it quickly becomes clear that in order to use these same word lists to effect grammatical checks and even orthographical ones (e.g. whether a proper noun needs to be title-cased even when it doesn't commence a sentence), the part of speech is important. The afor

Language ID Part 3 - more challenges

Stop word detection usually works...

Hopefully the Jabberwocky poem examples in my prior post on this subject demonstrated that a language label can still be assigned to text when it contains words that can be identified as belonging to a language's pronoun, conjunction or adposition parts of speech. Such identifiers are, in this context, considered to be stop words. The presence of such terms was sufficient for us to recognize the language even when nouns, verbs, adjectives and adverbs were unidentifiable (nonexistent in our vocabulary). However, it's useful to note that in those examples there were some inflections that hinted at the nonsensical words having specific qualities. Specifically, nouns could be spotted when, in the inflected languages, pluralization or possession was shown via -s/'s endings (English, though -s can also indicate possession in German) or a combination of title-case capitalization and (when plu
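The stop-word idea in this excerpt can be illustrated with a minimal sketch. It is not the LanguageWare implementation: the word lists, the function names and the simple "most hits wins" scoring are all assumptions made for illustration, and the German sample sentence is invented nonsense in the spirit of Jabberwocky.

```python
import re

# Tiny hand-picked stop-word lists (assumption: illustrative only, far from complete).
STOP_WORDS = {
    "english": {"the", "and", "in", "of", "he", "his", "my", "that",
                "with", "by", "were", "did", "all", "as", "it", "was"},
    "german": {"der", "die", "das", "und", "in", "mit", "von", "er",
               "sein", "ist", "nicht", "zu", "den", "ein", "eine"},
    "french": {"le", "la", "les", "et", "dans", "de", "il", "son",
               "est", "que", "avec", "un", "une", "des"},
}


def tokenize(text):
    """Lowercase the text and return its runs of letters as tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stop_word_scores(text):
    """Count, per language, how many tokens are known stop words."""
    tokens = tokenize(text)
    return {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOP_WORDS.items()
    }


def identify_language(text):
    """Return the language with the most stop-word hits, or None if there are none.

    Ties go to whichever language is listed first in STOP_WORDS.
    """
    scores = stop_word_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


jabberwocky = "'Twas brillig, and the slithy toves Did gyre and gimble in the wabe;"
print(identify_language(jabberwocky))
print(identify_language("Der flirbe Wumpel und die Glonke mit dem Schnarf"))
```

Neither sample contains a recognizable noun, verb or adjective, yet the conjunctions, articles and prepositions alone are enough to separate the two. Text with no stop words at all, such as "brillig slithy toves", yields None in this sketch, which is where hints from inflection would have to take over.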

Language ID part 2: Callooh! Callay!

As mentioned in part 1, thinking about identifying language leads one to the fundamental question: "what defines a language?" The example that our team used was from the children's book by Lewis Carroll - a poem that the protagonist reads and, although she does not comprehend it, thinks "pretty", finding it "clear" that "somebody killed something". Here it is, courtesy of the Wikipedia (English) article:

"Jabberwocky"

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!"

He took his vorpal sword in hand:
Long time the manxome foe he sought--
So rested he by the Tumtum tree,
And stood awhile in thought.

And as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling