Sense annotation

The sense annotation module uses a BerkeleyDB indexed file. This file may also be used by the dependency labeling module (see section 2.17).

For a new language, this file can be created with the indexdict program (src/utilities/indexdict) provided with FreeLing.

The source file must contain the sense list for one lemma-PoS pair on each line.

Each line has the format W:lemma:PoS synset1 synset2 ..., e.g.:
W:cebolla:N 05760066 08734429 08734702
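As an illustration of this line format, here is a minimal Python sketch that splits a W line into its lemma, PoS and ordered synset list. The function name parse_w_line is hypothetical and not part of FreeLing; it assumes fields are separated by whitespace and that the PoS is the last colon-separated part of the key.

```python
def parse_w_line(line):
    """Split a 'W:lemma:PoS synset1 synset2 ...' line into (lemma, PoS, synsets)."""
    key, *synsets = line.split()
    tag, rest = key.split(":", 1)
    if tag != "W":
        raise ValueError(f"not a lemma-indexed line: {line!r}")
    # rsplit keeps the PoS as the last colon-separated field of the key
    lemma, pos = rest.rsplit(":", 1)
    return lemma, pos, synsets


lemma, pos, synsets = parse_w_line("W:cebolla:N 05760066 08734429 08734702")
print(lemma, pos, synsets)
```

The synset list is kept in file order, since the order of the codes is significant for the sense annotation module.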

The sense annotation module assumes that the first sense code in the list is the most frequent sense for that lemma-PoS pair. This only takes effect when the value mfs (most frequent sense) is selected for the SenseAnnotation option.
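The ordering convention can be pictured with a small hypothetical helper (not FreeLing's API or its actual implementation): when only the most frequent sense is wanted, just the first code of the list is kept; otherwise every listed sense is attached. The boolean flag stands in for the SenseAnnotation setting and is an assumption of this sketch.

```python
def senses_to_attach(synsets, most_frequent_only):
    """Choose which senses of a lemma-PoS pair to attach to a word.

    synsets is the list from the W line, in file order. When only the most
    frequent sense is wanted, that sense is assumed to be the first code.
    """
    if most_frequent_only:
        return synsets[:1]  # an empty list stays empty
    return list(synsets)


senses = ["05760066", "08734429", "08734702"]
print(senses_to_attach(senses, most_frequent_only=True))   # ['05760066']
print(senses_to_attach(senses, most_frequent_only=False))
```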

The file may also contain the same information indexed by synset (that is, the list of synonyms for a given synset). This is useful if you use the synon function in your dependency rules (see section 2.17). These lines have the format S:synset:PoS lemma1 lemma2 ..., e.g.:
S:07389783:N chaval chico joven mozo muchacho niño
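Since the S lines carry the same information as the W lines, only inverted, they can be derived from the lemma-indexed entries. A minimal Python sketch, assuming entries are (lemma, PoS, synsets) tuples; the function name w_to_s_lines and the toy entries are hypothetical, and lemmas are sorted by code point rather than by any language-specific collation.

```python
from collections import defaultdict


def w_to_s_lines(w_entries):
    """Invert (lemma, PoS, synsets) entries into 'S:synset:PoS lemma1 lemma2 ...' lines."""
    lemmas_by_synset = defaultdict(set)
    for lemma, pos, synsets in w_entries:
        for synset in synsets:
            lemmas_by_synset[(synset, pos)].add(lemma)
    lines = []
    for synset, pos in sorted(lemmas_by_synset):
        # Lemmas are sorted by code point, which is not Spanish collation.
        lemmas = " ".join(sorted(lemmas_by_synset[(synset, pos)]))
        lines.append(f"S:{synset}:{pos} {lemmas}")
    return lines


entries = [("chico", "N", ["07389783"]), ("niño", "N", ["07389783"]),
           ("chaval", "N", ["07389783"])]
print(w_to_s_lines(entries))  # ['S:07389783:N chaval chico niño']
```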

To create a sense file for a new language, just list the sense codes for each lemma-PoS combination in a text file (e.g. sensefile.txt), with lines in the format described above, and then issue:
indexdict sense.db < sensefile.txt
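The text file itself can be produced by any script. A minimal Python sketch that writes W lines from a dictionary keyed by (lemma, PoS); the function name write_sense_file is hypothetical, and the default output encoding is an assumption: use whatever encoding your FreeLing version and language data expect.

```python
def write_sense_file(path, senses, encoding="utf-8"):
    """Write one 'W:lemma:PoS synset1 synset2 ...' line per lemma-PoS pair.

    senses maps (lemma, PoS) to its synset codes, most frequent sense first.
    The encoding is an assumption: match what your FreeLing setup expects.
    """
    with open(path, "w", encoding=encoding) as out:
        for (lemma, pos), synsets in sorted(senses.items()):
            out.write(f"W:{lemma}:{pos} {' '.join(synsets)}\n")


write_sense_file("sensefile.txt",
                 {("cebolla", "N"): ["05760066", "08734429", "08734702"]})
```

The resulting sensefile.txt is then fed to indexdict exactly as in the command above. The synset lists are written unchanged, so the most frequent sense must already be first in each list.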

This produces an indexed file sense.db, which is passed to the analyzer via the SenseFile option in the configuration file or via the -fsense option on the command line. It can also be referenced in the WNFile entry of the <SEMDB> section of a dependency labeling rules file (see section 2.17).

2008-01-24