Named entity classification data files

The Named Entity Classification module requires three configuration files that share the same path and basename and differ only in their suffixes: .rgf, .lex, and .abm. Only the basename is given as a configuration option; the suffixes are added automatically.
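The naming convention can be illustrated with a minimal Python sketch. It is not part of FreeLing: the helper names nec_data_files and missing_nec_files and the example basename are hypothetical, and the code only shows how three paths follow from one basename and how their presence could be checked before the module is configured.

```python
from pathlib import Path

# Suffixes named in the text above.
NEC_SUFFIXES = (".rgf", ".lex", ".abm")


def nec_data_files(basename):
    """Return the three paths derived from a shared basename (hypothetical helper)."""
    # Append each suffix to the basename as-is. Path.with_suffix is avoided
    # because it would replace an extension already present in the basename.
    return [Path(str(basename) + suffix) for suffix in NEC_SUFFIXES]


def missing_nec_files(basename):
    """Return the derived paths that do not exist as regular files."""
    return [path for path in nec_data_files(basename) if not path.is_file()]


if __name__ == "__main__":
    # "data/nec/nec" is an illustrative basename, not a real FreeLing path.
    for path in nec_data_files("data/nec/nec"):
        print(path)
```

A check like missing_nec_files is useful mainly for reporting a clear error when one of the three files was not copied along with the others.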

The .abm file contains an AdaBoost model based on shallow decision trees (see [CMP03] for details). You don't need to understand its contents unless you want to work on the code of the AdaBoost classifier.

The .lex file is a dictionary that assigns a number to each symbolic feature used in the AdaBoost model. You don't need to understand this either, unless you are a Machine Learning hacker.
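The idea of a feature dictionary can be sketched in a few lines of Python. This is a conceptual illustration only: it does not read or write the actual .lex file format, it is not Omlet or FreeLing code, and the function names and the choice to start numbering at 1 are assumptions made for the example.

```python
def build_feature_index(feature_names):
    """Assign consecutive integer ids to symbolic feature names, in first-seen order."""
    index = {}
    for name in feature_names:
        if name not in index:
            # Ids start at 1 here; this is an arbitrary choice for the sketch.
            index[name] = len(index) + 1
    return index


def encode(features, index):
    """Map the symbolic features of one example to sorted ids, dropping unknown features."""
    return sorted(index[name] for name in set(features) if name in index)
```

A classifier that works on integer ids needs exactly this kind of lookup at prediction time, which is why the dictionary and the model have to be generated together from the same training data.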

Both .abm and .lex files may be generated from an annotated corpus using the training programs in the Omlet package, a great machine-learning library, available at http://www.lsi.upc.edu/~nlp/omlet+fries

The important file in the set is the .rgf file. It defines the context features that must be extracted for each named entity. The feature extraction language is that of [RCSY04], with some useful extensions.

If you need to know more about this (e.g. to develop an NE classifier for your language), please contact the FreeLing authors.

2008-01-24