Word form dictionary file

The word form dictionary is a Berkeley DB indexed file.

It may be created with the src/utilities/indexdict program provided with FreeLing. The source file must contain one word form per line, followed by the list of lemma-PoS pairs for that form.

Each line has the format: form lemma1 PoS1 lemma2 PoS2 .... For example:
casa casa NCFS000 casar VMIP3S0 casar VMM02S0
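
As a minimal sketch of how a source line in this format could be read, the Python function below splits a line into the form and its (lemma, PoS) pairs. The name parse_entry is hypothetical and is not part of FreeLing; the sketch assumes fields are separated by whitespace and does not handle the contraction format.

```python
def parse_entry(line):
    """Split a source dictionary line into (form, [(lemma, PoS), ...]).

    Hypothetical helper for illustration only; not part of FreeLing.
    Assumes whitespace-separated fields.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    form, rest = fields[0], fields[1:]
    # After the form, fields must come in lemma/PoS pairs.
    if not rest or len(rest) % 2 != 0:
        raise ValueError("expected one or more lemma-PoS pairs: %r" % line)
    # Even positions are lemmas, odd positions are their PoS tags.
    return form, list(zip(rest[0::2], rest[1::2]))


form, analyses = parse_entry("casa casa NCFS000 casar VMIP3S0 casar VMM02S0")
```

For the example line above, form is casa and analyses holds three pairs: one for the lemma casa and two for the lemma casar.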

Lines for word forms that are contractions may use an alternative format if the contraction is to be split. The format is form form1+form2+... PoS1+PoS2+....
For instance:

del de+el SPS+DA

This line states that whenever the form del is found, it is replaced with two words: de and el. Each of the two new word forms is looked up in the dictionary and assigned every tag that matches its corresponding tag in the third field. So, de will be assigned all tags starting with SPS that its own dictionary entry contains, and el will get any tag starting with DA.

If all tags for one of the new forms are to be used, a wildcard may be written as a tag. E.g.:

pal para+el SPS+*

This replaces pal with two words: para, with only its SPS analysis, and el, with all its possible tags.
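
The splitting rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not FreeLing's implementation: split_contraction is a hypothetical name, the dictionary is modelled as a plain dict from form to (lemma, PoS) pairs rather than a Berkeley DB file, and the toy entries and their tags are invented for the example.

```python
def split_contraction(entry, dictionary):
    """Expand a contraction line such as 'pal para+el SPS+*'.

    Hypothetical helper for illustration only; not part of FreeLing.
    `dictionary` maps a form to its list of (lemma, PoS) pairs.
    Returns one (form, analyses) tuple per component word.
    """
    _contraction, parts, tags = entry.split()
    forms = parts.split("+")
    prefixes = tags.split("+")
    if len(forms) != len(prefixes):
        raise ValueError("forms and tags do not line up: %r" % entry)
    result = []
    for form, prefix in zip(forms, prefixes):
        analyses = dictionary.get(form, [])
        if prefix != "*":
            # Keep only the analyses whose tag starts with the given prefix.
            analyses = [(lemma, tag) for lemma, tag in analyses
                        if tag.startswith(prefix)]
        # With the wildcard "*", all analyses of the form are kept.
        result.append((form, analyses))
    return result


# Toy dictionary with assumed entries, for illustration only.
toy_dict = {
    "para": [("para", "SPS00"), ("parar", "VMIP3S0")],
    "el": [("el", "DA0MS0")],
}
words = split_contraction("pal para+el SPS+*", toy_dict)
```

With this toy data, para keeps only its SPS analysis and loses the verb reading, while el keeps every analysis it has.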

Note that a contraction cannot be split in two different ways, so only one combination of lemmas and one combination of tags may appear in the dictionary for each contraction.

2008-01-24