It may be created for a new language with the src/utilities/indexdict program provided with FreeLing.
The source file must have the sense list for one lemma-PoS on each line.
Each line has the format: W:lemma:PoS synset1 synset2 .... E.g.
W:cebolla:N 05760066 08734429 08734702
The sense annotation module assumes that the first sense code in the list is the most frequent sense for that lemma-PoS. This only takes effect when the value msf is selected for the SenseAnnotation option.
The file may also contain the same information indexed by synset (that is, the list of synonyms for a given synset). This is useful if you are using the synon
function in your dependency rules (see section 2.17).
The lines with this information have the format: S:synset:PoS lemma1 lemma2 .... E.g.
S:07389783:N chaval chico joven mozo muchacho niño
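The two line formats above can be checked before indexing with a few lines of Python. The following is only a minimal sketch, not part of FreeLing: the names SenseEntry and parse_sense_line are hypothetical, and it assumes that fields are separated by whitespace and that the PoS tag itself contains no colon.

```python
from typing import List, NamedTuple


class SenseEntry(NamedTuple):
    kind: str          # "W" (indexed by lemma) or "S" (indexed by synset)
    key: str           # the lemma for W lines, the synset code for S lines
    pos: str           # part-of-speech tag, e.g. "N"
    values: List[str]  # synset codes for W lines, lemmas for S lines


def parse_sense_line(line: str) -> SenseEntry:
    """Split one source line into its record type, key, PoS and value list."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a header and at least one value: {line!r}")
    head, values = fields[0], fields[1:]
    kind, rest = head.split(":", 1)
    if kind not in ("W", "S") or ":" not in rest:
        raise ValueError(f"malformed header: {head!r}")
    # Split on the last colon so that a key containing ':' is kept whole.
    key, pos = rest.rsplit(":", 1)
    return SenseEntry(kind, key, pos, values)


entry = parse_sense_line("W:cebolla:N 05760066 08734429 08734702")
# For a W line, the first value is the sense treated as most frequent.
print(entry.key, entry.pos, entry.values[0])
```

For the example line, entry.values[0] is 05760066, the code that would be used as the most frequent sense of cebolla as a noun.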
To create a sense file for a new language, just list the sense codes for each lemma-PoS combination in a text file (e.g. sensefile.txt), with lines in the format described above, and then issue:
indexdict sense.db < sensefile.txt
This will produce an indexed file sense.db, which is to be
given to the analyzer via the SenseFile option in the
configuration file, or via the -fsense option on the command line.
It can also be referred to in the entry WNFile of the <SEMDB>
section of a file of dependency labeling rules (section 2.17).
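Since the synset-indexed lines carry the same information as the lemma-indexed ones, they could be derived from the W: lines when preparing the source file. The sketch below is one possible way to do so and is not a FreeLing tool: the function name synset_index_lines is hypothetical, and listing the lemmas of each synset in alphabetical order is an assumption, not a requirement stated by the format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def synset_index_lines(w_lines: Iterable[str]) -> List[str]:
    """Invert W:lemma:PoS lines into S:synset:PoS lines."""
    by_synset: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for line in w_lines:
        fields = line.split()
        if not fields or not fields[0].startswith("W:"):
            continue  # skip blank lines and anything that is not a W record
        lemma, pos = fields[0][2:].rsplit(":", 1)
        for synset in fields[1:]:
            by_synset[(synset, pos)].add(lemma)
    # Assumption: lemmas are written in alphabetical order within each synset.
    return [
        f"S:{synset}:{pos} " + " ".join(sorted(lemmas))
        for (synset, pos), lemmas in sorted(by_synset.items())
    ]


for out in synset_index_lines(["W:cebolla:N 05760066 08734429 08734702"]):
    print(out)
# S:05760066:N cebolla
# S:08734429:N cebolla
# S:08734702:N cebolla
```

The resulting lines would be appended to the text file (sensefile.txt in the example above) before running indexdict on it.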
2008-01-24