This file contains the data necessary to perform currency amount and
physical magnitude recognition.
It consists of three sections: <Currency>, <Measure>, and
<MeasureNames>.
Section <Currency> contains a single line indicating which of the
codes used in section <Measure> stands for 'currency amount'.
E.g.:
<Currency>
CUR
</Currency>
Section <Measure> indicates the type of measure corresponding
to each possible unit. Each line contains two fields: the measure code
and the unit code. The codes may be anything, at the user's choice, and
will be used to build the lemma of the recognized quantity multiword.
E.g., the following section states that USD and FRF are of type CUR
(currency), mm is of type LN (length), and ft/s is of type SP (speed):
<Measure>
CUR USD
CUR FRF
LN mm
SP ft/s
</Measure>
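As a minimal sketch of how the two-field lines above could be read, the following Python snippet builds a table from unit code to measure code. The function name parse_measure_section is hypothetical and this is not the library's own loader; it only assumes one "measure unit" pair per line between the <Measure> and </Measure> tags.

```python
def parse_measure_section(text):
    """Map each unit code to its measure code (illustrative helper)."""
    unit_to_measure = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "<Measure>":
            inside = True
        elif line == "</Measure>":
            inside = False
        elif inside and line:
            # Each line holds two fields: measure code, then unit code.
            measure, unit = line.split()
            unit_to_measure[unit] = measure
    return unit_to_measure


sample = """<Measure>
CUR USD
CUR FRF
LN mm
SP ft/s
</Measure>"""

table = parse_measure_section(sample)
print(table)
```

Keying the table by unit code reflects that each unit belongs to exactly one measure type, while one measure type (such as CUR) may cover several units.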
Finally, section <MeasureNames> describes which multiwords have
to be interpreted as a measure, and which unit they represent. The
unit must appear in section <Measure> with its associated code.
Each line has the format:
multiword_description code tag
where multiword_description is a multiword pattern as in the multiwords
file described in section 2.5, code is the type of magnitude the unit
describes (currency, speed, etc.), and tag is a constraint on the
lemmatized components of the multiword, following the same conventions
as in the multiwords file (section 2.5).
E.g.,
<MeasureNames>
french_<franc> FRF $2:N
<franc> FRF $1:N
<dollar> USD $1:N
american_<dollar> USD $2:N
us_<dollar> USD $2:N
<millimeter> mm $1:N
<foot>_per_second ft/s $1:N
<foot>_Fh_second ft/s $1:N
<foot>_Fh_s ft/s $1:N
<foot>_second ft/s $1:N
</MeasureNames>
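The three-field format and the requirement that each unit be declared in section <Measure> can be illustrated with a short Python sketch. The helper parse_measure_names is hypothetical and is not part of the library; it assumes whitespace-separated fields and that the set of declared units is already available.

```python
def parse_measure_names(lines, known_units):
    """Split each entry into (pattern, unit, tag) and check the unit."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Three whitespace-separated fields per line.
        pattern, unit, tag = line.split()
        if unit not in known_units:
            raise ValueError(f"unit {unit!r} is not declared in <Measure>")
        entries.append((pattern, unit, tag))
    return entries


known_units = {"USD", "FRF", "mm", "ft/s"}
entries = parse_measure_names(
    ["french_<franc> FRF $2:N", "<foot>_per_second ft/s $1:N"],
    known_units,
)
print(entries)
```

Rejecting undeclared units early is a design choice of this sketch; it mirrors the rule stated above but says nothing about how the actual analyzer reports such errors.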
This section will recognize strings such as the following:
234_french_francs CUR_FRF:234 Zm
one_dollar CUR_USD:1 Zm
two_hundred_fifty_feet_per_second SP_ft/s:250 Zu
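The lemmas in these outputs appear to follow the shape measure code, underscore, unit code, colon, numeric value. The Python sketch below reproduces that shape. The function names are hypothetical, and the tag rule (Zm when the measure is the code named in <Currency>, Zu otherwise) is only inferred from the three sample outputs shown here, not from the library's source.

```python
def quantity_lemma(measure_of, unit, value):
    """Build a lemma such as CUR_FRF:234 from a unit code and a value."""
    return f"{measure_of[unit]}_{unit}:{value}"


def quantity_tag(measure_of, unit, currency_code):
    """Assumed rule: Zm for currency amounts, Zu for other magnitudes."""
    return "Zm" if measure_of[unit] == currency_code else "Zu"


# Unit-to-measure table matching the <Measure> example in this file.
measure_of = {"USD": "CUR", "FRF": "CUR", "mm": "LN", "ft/s": "SP"}

print(quantity_lemma(measure_of, "FRF", 234), quantity_tag(measure_of, "FRF", "CUR"))
print(quantity_lemma(measure_of, "ft/s", 250), quantity_tag(measure_of, "ft/s", "CUR"))
```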
Quantity multiwords will be recognized only when they follow a number.
For instance, in the sentence "There were many french francs", the
multiword won't be recognized, since it does not assign units to a
specific quantity.
It is important to note that the lemmatized multiword expressions (the
ones that contain angle brackets) will only be recognized if the lemma
is present in the dictionary with its corresponding inflected forms.
2008-01-24