Supported Languages

The distributed version includes morphological dictionaries for the covered languages: English, Spanish, Catalan, Galician, and Italian.

The smaller dictionaries (Catalan and Galician) are expected to cover over 80% of the open-category tokens in a text. The larger dictionaries are expected to cover between 90% and 95%. For words not found in the dictionary, all open categories are assumed, with a probability distribution based on word suffixes. This assignment includes the right tag for 99% of the words, and allows the tagger to make the most suitable choice based on tag sequence probability.
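
The suffix-based treatment of unknown words can be illustrated with a small sketch. This is not the code shipped in the distribution; it is a minimal Python example under stated assumptions: the training pairs, the tag names in OPEN_TAGS, the maximum suffix length, and the add-one smoothing are all hypothetical choices made for illustration. It shows how every open category can receive a nonzero probability while the longest known suffix shapes the distribution.

```python
from collections import Counter, defaultdict

# Hypothetical open categories and toy training data (illustration only).
OPEN_TAGS = ["NOUN", "VERB", "ADJ", "ADV"]
TRAINING = [
    ("walking", "VERB"), ("running", "VERB"), ("building", "NOUN"),
    ("quickly", "ADV"), ("slowly", "ADV"), ("friendly", "ADJ"),
    ("nation", "NOUN"), ("station", "NOUN"),
]
MAX_SUFFIX = 3  # assumed longest suffix length to consider


def train_suffix_model(pairs, max_suffix=MAX_SUFFIX):
    """Count how often each tag occurs with each word-final suffix."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        for n in range(1, min(max_suffix, len(word)) + 1):
            counts[word[-n:]][tag] += 1
    return counts


def guess_tags(word, counts, open_tags=OPEN_TAGS, max_suffix=MAX_SUFFIX):
    """Return a probability for every open category for an unknown word.

    The longest suffix seen in training is used; add-one smoothing keeps
    every open category possible, so the tagger can still pick any of them.
    """
    for n in range(min(max_suffix, len(word)), 0, -1):
        tag_counts = counts.get(word[-n:])
        if tag_counts:
            break
    else:
        tag_counts = Counter()  # no known suffix: fall back to uniform
    total = sum(tag_counts.values()) + len(open_tags)
    return {tag: (tag_counts[tag] + 1) / total for tag in open_tags}


model = train_suffix_model(TRAINING)
dist = guess_tags("jumping", model)
best = max(dist, key=dist.get)
```

In this sketch the unknown word "jumping" shares the suffix "ing" with the training words, so the distribution favours the tag most often seen with that suffix, but no open category is ruled out. A tagger would then combine these probabilities with tag sequence probabilities to choose the final tag.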

This version also includes WordNet-based sense dictionaries for the covered languages, as well as some knowledge extracted from WordNet, such as semantic file codes and hypernymy relationships.

2008-01-24