The Named Entity Classification module requires three configuration files that share the same path and basename and differ only in their suffixes: .rgf, .lex, and .abm. Only the basename must be given as a configuration option; the suffixes are added automatically.
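The naming convention can be illustrated with a minimal sketch. This is not FreeLing code: the function names `nec_file_paths` and `missing_files` and the example basename are hypothetical. It assumes only what is stated above, namely that the three suffixes are appended to a single basename.

```python
import os

# The three suffixes the module appends to the configured basename.
SUFFIXES = (".rgf", ".lex", ".abm")


def nec_file_paths(basename):
    """Map each suffix to the file path derived from the shared basename.

    The suffix is appended, not substituted, so a basename that already
    contains dots is left intact.
    """
    return {suffix: basename + suffix for suffix in SUFFIXES}


def missing_files(basename):
    """List the derived files that do not exist on disk."""
    return [path for path in nec_file_paths(basename).values()
            if not os.path.isfile(path)]


if __name__ == "__main__":
    # Hypothetical basename, for illustration only.
    for suffix, path in nec_file_paths("data/nec/nec").items():
        print(suffix, "->", path)
```

A check such as `missing_files` can be run before starting the module to confirm that all three files are present under the configured basename.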
The .abm file contains an AdaBoost model based on shallow Decision Trees (see [CMP03] for details). You don't need to understand this file unless you want to work on the code of the AdaBoost classifier.
The .lex file is a dictionary that assigns a number to each symbolic feature used in the AdaBoost model. You don't need to understand this file either, unless you are a Machine Learning hacker.
Both the .abm and .lex files may be generated from an annotated corpus using the training programs in the Omlet package, a great machine-learning library, available at http://www.lsi.upc.edu/~nlp/omlet+fries
The important file in the set is the .rgf file. It defines the context features that must be extracted for each named entity. The feature extraction language is that of [RCSY04], with some useful extensions.
If you need to know more about this (e.g. to develop a NE classifier for your language), please contact the FreeLing authors.
2008-01-24