Multiword definition file

The file contains a list of multiwords to be recognized, one multiword per line. Each line has three fields: the multiword form, the multiword lemma, and the multiword PoS tag.

In the multiword form, a component may be written as a lemma in angle brackets. This means that any form with that lemma is accepted as a valid component of the multiword.

For instance:

a_buenas_horas a_buenas_horas RG
a_causa_de a_causa_de SPS00
<accidente>_de_trabajo accidente_de_trabajo $1:NC
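To make the line format concrete, the following is a minimal parsing sketch. It assumes, based only on the example above, that the three fields are whitespace-separated, that components in the form and lemma are joined with underscores, and that a tag reference has the shape "$N:PREFIX". The names Component, MultiwordEntry, parse_line and parse_entries are hypothetical and are not part of any existing library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Component:
    text: str
    is_lemma: bool  # True if written as <lemma> in the file


@dataclass
class MultiwordEntry:
    components: List[Component]
    lemma: str
    tag: str  # literal tag, or the raw "$N:PREFIX" reference
    tag_ref: Optional[Tuple[int, str]]  # (component index, tag prefix) if tag is a reference


def parse_component(token: str) -> Component:
    # "<accidente>" means "any form whose lemma is accidente".
    if token.startswith("<") and token.endswith(">"):
        return Component(token[1:-1], True)
    return Component(token, False)


def parse_line(line: str) -> MultiwordEntry:
    # Assumption: the three fields are separated by whitespace.
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    form, lemma, tag = fields
    # Assumption: components of the form are joined with underscores.
    components = [parse_component(t) for t in form.split("_")]
    tag_ref = None
    if tag.startswith("$"):
        # Assumed shape "$N:PREFIX", e.g. "$1:NC".
        index, _, prefix = tag[1:].partition(":")
        tag_ref = (int(index), prefix)
    return MultiwordEntry(components, lemma, tag, tag_ref)


def parse_entries(text: str) -> List[MultiwordEntry]:
    # One multiword per line; blank lines are skipped.
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

For example, parsing the third line above yields three components, the first marked as a lemma, and a tag reference of (1, "NC").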

The tag may be specified directly, or as a reference to the tag of one of the multiword components. In the previous example, the last entry builds a multiword from either of the forms "accidente de trabajo" or "accidentes de trabajo". The multiword takes the tag of its first component ($1), choosing the tag of that component that starts with NC. This assigns the right singular or plural tag to the multiword, depending on whether the form was "accidente" or "accidentes".
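The matching and tag-resolution behaviour described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the recognizer's actual implementation: it only checks whether one given window of tokens matches an entry (a real recognizer would scan the text), it represents each token as a hypothetical (form, [(lemma, tag), ...]) pair, and it resolves "$N:PREFIX" by taking the first analysis of component N whose tag starts with PREFIX. The EAGLES-style tags in the usage example are illustrative data.

```python
from typing import List, Optional, Tuple

# Hypothetical token representation: surface form plus candidate (lemma, tag) analyses.
Token = Tuple[str, List[Tuple[str, str]]]


def component_matches(pattern: str, token: Token) -> bool:
    form, analyses = token
    if pattern.startswith("<") and pattern.endswith(">"):
        # Lemma component: any form with this lemma is a valid component.
        wanted = pattern[1:-1]
        return any(lemma == wanted for lemma, _ in analyses)
    # Plain component: the surface form must match (case-insensitively here, an assumption).
    return form.lower() == pattern.lower()


def resolve_tag(tag_spec: str, tokens: List[Token]) -> Optional[str]:
    if not tag_spec.startswith("$"):
        return tag_spec  # tag given directly
    index, _, prefix = tag_spec[1:].partition(":")
    _, analyses = tokens[int(index) - 1]  # $1 refers to the first component
    for _, tag in analyses:
        if tag.startswith(prefix):
            return tag
    return None  # no analysis of that component has the required prefix


def match_multiword(form_spec: str, lemma: str, tag_spec: str,
                    tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    patterns = form_spec.split("_")
    if len(patterns) != len(tokens):
        return None
    if not all(component_matches(p, t) for p, t in zip(patterns, tokens)):
        return None
    tag = resolve_tag(tag_spec, tokens)
    if tag is None:
        return None
    form = "_".join(t[0] for t in tokens)
    return form, lemma, tag


plural = [("accidentes", [("accidente", "NCMP000")]),
          ("de", [("de", "SPS00")]),
          ("trabajo", [("trabajo", "NCMS000"), ("trabajar", "VMIP1S0")])]
print(match_multiword("<accidente>_de_trabajo", "accidente_de_trabajo", "$1:NC", plural))
# prints ('accidentes_de_trabajo', 'accidente_de_trabajo', 'NCMP000')
```

With "accidente" (tagged NCMS000 in this illustrative data) as the first token, the same entry would instead yield the singular tag, which is the effect the $1:NC reference is meant to achieve.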

2008-01-24