Quantity recognition data file

This file contains the data needed to recognize currency amounts and physical magnitudes. It consists of three sections: <Currency>, <Measure>, and <MeasureNames>.

Section <Currency> contains a single line giving the code, among those used in section <Measure>, that stands for 'currency amount'.

E.g.:

<Currency>
CUR
</Currency>

Section <Measure> indicates the type of measure corresponding to each possible unit. Each line contains two fields: the measure code and the unit code. The codes are arbitrary and chosen by the user; they are used to build the lemma of the recognized quantity multiword.

E.g., the following section states that USD and FRF are of type CUR (currency), mm is of type LN (length), and ft/s is of type SP (speed):

<Measure>
CUR USD
CUR FRF
LN mm
SP ft/s
</Measure>
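
As an illustration only, the two sections described so far could be loaded with a few lines of Python. This is a minimal sketch, not the analyzer's own loader: the function names parse_sections and unit_to_measure are hypothetical, and the sketch assumes each section tag sits alone on its own line.

```python
def parse_sections(text):
    """Split a data file into {section_name: [non-empty content lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            # Closing tag: leave the current section.
            current = None
        elif current is None and line.startswith("<") and line.endswith(">"):
            # Opening tag: start collecting lines for this section.
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def unit_to_measure(measure_lines):
    """Map each unit code to its measure code ('CUR USD' -> {'USD': 'CUR'})."""
    mapping = {}
    for line in measure_lines:
        measure_code, unit_code = line.split()
        mapping[unit_code] = measure_code
    return mapping


SAMPLE = """<Currency>
CUR
</Currency>
<Measure>
CUR USD
CUR FRF
LN mm
SP ft/s
</Measure>
"""

sections = parse_sections(SAMPLE)
currency_code = sections["Currency"][0]
units = unit_to_measure(sections["Measure"])
```

With the sample data, currency_code holds the single line of <Currency>, and units maps every unit code to the measure code it was declared with.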

Finally, section <MeasureNames> describes which multiwords have to be interpreted as a measure, and which unit they represent. The unit must appear in section <Measure> with its associated code. Each line has the format:

multiword_description code tag
where multiword_description is a multiword pattern as in the multiwords file described in section 2.5, code is the unit code the multiword represents (which section <Measure> maps to a type of magnitude such as currency or speed), and tag is a constraint on the lemmatized components of the multiword, following the same conventions as in the multiwords file (section 2.5).

E.g.,

<MeasureNames>
french_<franc> FRF $2:N
<franc> FRF $1:N
<dollar> USD $1:N
american_<dollar> USD $2:N
us_<dollar> USD $2:N
<millimeter> mm $1:N
<foot>_per_second ft/s $1:N
<foot>_Fh_second ft/s $1:N
<foot>_Fh_s ft/s $1:N
<foot>_second ft/s $1:N
</MeasureNames>

This section will recognize strings such as the following:

 234_french_francs CUR_FRF:234 Zm
 one_dollar CUR_USD:1 Zm
 two_hundred_fifty_feet_per_second SP_ft/s:250 Zu
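
The lemmas in the examples above follow the shape measure-code, underscore, unit-code, colon, value. The sketch below reproduces that shape in Python. It is an illustration under assumptions: the function name quantity_analysis is hypothetical, and the rule "Zm for the currency code, Zu otherwise" is inferred from the three examples rather than stated by this document.

```python
# Unit code -> measure code, as declared in the <Measure> example.
UNITS = {"USD": "CUR", "FRF": "CUR", "mm": "LN", "ft/s": "SP"}
# Code declared in the <Currency> example.
CURRENCY_CODE = "CUR"


def quantity_analysis(value, unit_code, units=UNITS, currency_code=CURRENCY_CODE):
    """Return (lemma, tag) for a number followed by a recognized unit."""
    measure_code = units[unit_code]
    lemma = f"{measure_code}_{unit_code}:{value}"
    # Assumption inferred from the examples: currency gets Zm, the rest Zu.
    tag = "Zm" if measure_code == currency_code else "Zu"
    return lemma, tag


francs = quantity_analysis(234, "FRF")
speed = quantity_analysis(250, "ft/s")
```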

Quantity multiwords are recognized only when they follow a number. For instance, in the sentence "There were many french francs", the multiword is not recognized, since it does not assign units to a specific quantity.
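
The "only after a number" rule can be sketched as follows. This is a simplified illustration, not the actual recognizer: the names UNIT_PATTERNS and find_quantities are hypothetical, tokens are assumed to arrive as (lemma, numeric value or None) pairs, and every pattern component is matched against the lemma, whereas the real patterns mix literal forms and lemmas.

```python
# Hypothetical lemma sequences -> unit code, loosely based on <MeasureNames>.
UNIT_PATTERNS = {
    ("french", "franc"): "FRF",
    ("franc",): "FRF",
    ("dollar",): "USD",
}


def find_quantities(tokens):
    """Return (value, unit_code) pairs for units that directly follow a number."""
    found = []
    i = 0
    while i < len(tokens):
        _, value = tokens[i]
        if value is not None:
            # Try the longest pattern first, starting right after the number.
            for length in (2, 1):
                candidate = tuple(lemma for lemma, _ in tokens[i + 1:i + 1 + length])
                if len(candidate) == length and candidate in UNIT_PATTERNS:
                    found.append((value, UNIT_PATTERNS[candidate]))
                    i += length  # skip the tokens consumed by the unit
                    break
        i += 1
    return found


no_number = [("there", None), ("be", None), ("many", None),
             ("french", None), ("franc", None)]
with_number = [("234", 234), ("french", None), ("franc", None)]

nothing = find_quantities(no_number)
something = find_quantities(with_number)
```

In this sketch the first sentence yields no quantity because no token carries a numeric value, while the second yields one quantity in FRF.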

Note that lemmatized multiword expressions (those that contain angle brackets) are recognized only if the lemma is present in the dictionary with its corresponding inflected forms.

2008-01-24