Language ID (textual) - part 1
Word cloud of one person's compilation of English stop words, courtesy of Armand Brahaj (whose site has been infected by malware). Here instead is ranks.nl's list Now that half a year has lapsed since the inception of this blog, some readers may be wondering when I might share more topics that are related to the "Linguistics" part of "SEO, Linguistics, Localization". In fact, one of the triggers of my instigating this blog arose from the issuance of two patents, which had been filed in 2005 and 2006 for which I was a co-inventor and sole inventor , respectively. Both filings concerned language identification (from textual input): the first approached the challenges of identifying a text's (primary) language, and the second was an application of the first, as combined with messaging software. Rather than overwhelm the reader with extensive explanations, I'm going to attempt to create a series of posts that will cover everything in the way