The main processing classes in the library are:
- tokenizer: Receives plain text and returns a list of word objects.
- splitter: Receives a list of word objects and
returns a list of sentence objects.
- maco: Receives a list of sentence objects and
morphologically annotates each word object in the given
sentences. Includes specific submodules (e.g., detection of dates,
numbers, and multiwords) which can be activated at will.
- tagger: Receives a list of sentence objects and
disambiguates the PoS of each word object in the given
sentences.
- parser: Receives a list of sentence objects and
associates a parse_tree object with each of them.
- dependency: Receives a list of parsed sentence
objects and associates a dep_tree object with each of them.
You may create as many instances of each as you need.
Constructors for each of them receive the appropriate options
(e.g. the name of a dictionary, hmm, or grammar file), so you can
create each instance with the required capabilities (for instance,
a tagger for English and another for Spanish).
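
The chaining described above (each class consumes the output of the
previous one, and each instance is configured through its constructor)
can be illustrated with a small toy model. This is only a sketch and
not the library's actual API: every name in it (Word, ToyTokenizer,
ToySplitter, ToyMaco, ToyTagger, analyze, the tag names, and the
dictionary contents) is hypothetical and invented for illustration.
The real classes take option and data files as described above; here
plain Python dictionaries and lists stand in for them.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical stand-in for a word object."""
    form: str
    analyses: list = field(default_factory=list)  # candidate PoS tags
    tag: str = ""                                 # tag chosen by the tagger


class ToyTokenizer:
    """Receives plain text and returns a list of Word objects."""

    def tokenize(self, text):
        words = []
        for chunk in text.split():
            # Detach a sentence-final period from the preceding token.
            if chunk.endswith(".") and len(chunk) > 1:
                words.append(Word(chunk[:-1]))
                words.append(Word("."))
            else:
                words.append(Word(chunk))
        return words


class ToySplitter:
    """Receives a list of Words and returns a list of sentences."""

    def split(self, words):
        sentences, current = [], []
        for word in words:
            current.append(word)
            if word.form == ".":
                sentences.append(current)
                current = []
        if current:  # keep trailing words that lack a final period
            sentences.append(current)
        return sentences


class ToyMaco:
    """Annotates each word with its candidate tags from a dictionary."""

    def __init__(self, dictionary):
        # Assumption: a dict stands in for a dictionary file.
        self.dictionary = dictionary

    def analyze(self, sentences):
        for sentence in sentences:
            for word in sentence:
                found = self.dictionary.get(word.form.lower(), ["UNK"])
                word.analyses = list(found)
        return sentences


class ToyTagger:
    """Disambiguates by picking the highest-priority candidate tag."""

    def __init__(self, priority):
        # Assumption: a priority list stands in for a trained model file.
        self.priority = priority

    def _rank(self, tag):
        if tag in self.priority:
            return self.priority.index(tag)
        return len(self.priority)

    def tag(self, sentences):
        for sentence in sentences:
            for word in sentence:
                word.tag = min(word.analyses, key=self._rank)
        return sentences


# Language-independent instances can be shared ...
tokenizer = ToyTokenizer()
splitter = ToySplitter()

# ... while language-specific ones are created once per language.
maco_en = ToyMaco({"the": ["DET"], "dogs": ["NOUN"],
                   "bark": ["NOUN", "VERB"], ".": ["PUNCT"]})
maco_es = ToyMaco({"los": ["DET"], "perros": ["NOUN"],
                   "ladran": ["VERB"], ".": ["PUNCT"]})
tagger_en = ToyTagger(["VERB", "NOUN"])
tagger_es = ToyTagger(["NOUN", "VERB"])


def analyze(text, maco, tagger):
    """Run the toy chain: tokenizer, splitter, maco, tagger."""
    words = tokenizer.tokenize(text)
    sentences = splitter.split(words)
    return tagger.tag(maco.analyze(sentences))
```

Calling analyze("The dogs bark.", maco_en, tagger_en) and
analyze("Los perros ladran.", maco_es, tagger_es) runs the same chain
with differently configured instances, which is the point of passing
the options to the constructors rather than to the processing calls.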
2008-01-24