Assume we have the following input file mytext.txt (Spanish for "The cat eats fish. But Don Jaime does not like cats."):
El gato come pescado. Pero a Don Jaime no le gustan los gatos.
We can analyze it with the command:
analyzer -f myconfig.cfg <mytext.txt >mytext.mrf
Let's assume that myconfig.cfg is the file presented in section 2.2.2. Given the options there, the output is in morfo format (i.e., morphological analysis but no PoS tagging). The expected results are:
El el DA0MS0 1
gato gato NCMS000 1
come comer VMIP3S0 0.75 comer VMM02S0 0.25
pescado pescado NCMS000 0.833333 pescar VMP00SM 0.166667
. . Fp 1
Pero pero CC 0.99878 pero NCMS000 0.00121951 Pero NP00000 0.00121951
a a NCFS000 0.0054008 a SPS00 0.994599
Don_Jaime Don_Jaime NP00000 1
no no NCMS000 0.00231911 no RN 0.997681
le él PP3CSD00 1
gustan gustar VMIP3P0 1
los el DA0MP0 0.975719 lo NCMP000 0.00019425 él PP3MPA00 0.024087
gatos gato NCMP000 1
. . Fp 1
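Each morfo line holds the word followed by one (lemma, tag, probability) triple per candidate analysis. The sketch below reads such a line and picks the most probable analysis. The function names are hypothetical, and this context-free choice is only a baseline, not what FreeLing's tagger does: on this example it happens to agree with the tagged output for every word, but the tagger takes context into account and can disagree in general.

```python
def parse_morfo_line(line):
    """Split one morfo-format line into (word, [(lemma, tag, prob), ...])."""
    fields = line.split()
    if not fields:
        raise ValueError("empty morfo line")
    word, rest = fields[0], fields[1:]
    # Everything after the word must come in (lemma, tag, probability) triples.
    if len(rest) == 0 or len(rest) % 3 != 0:
        raise ValueError(f"malformed morfo line: {line!r}")
    analyses = [
        (rest[i], rest[i + 1], float(rest[i + 2]))
        for i in range(0, len(rest), 3)
    ]
    return word, analyses


def most_probable(analyses):
    """Return the (lemma, tag, prob) triple with the highest probability."""
    return max(analyses, key=lambda analysis: analysis[2])
```

For instance, the line for "pescado" yields two analyses, and most_probable returns ("pescado", "NCMS000", 0.833333).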
If we also wanted PoS tagging, we could have issued the command:
analyzer -f myconfig.cfg --outf tagged <mytext.txt >mytext.tag
to obtain the tagged output:
El el DA0MS0
gato gato NCMS000
come comer VMIP3S0
pescado pescado NCMS000
. . Fp
Pero pero CC
a a SPS00
Don_Jaime Don_Jaime NP00000
no no RN
le él PP3CSD00
gustan gustar VMIP3P0
los el DA0MP0
gatos gato NCMP000
. . Fp
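In the tagged output each word keeps a single (lemma, tag) pair, and in this example every chosen pair is one of the candidates listed in the morfo output. The sketch below checks that relationship between the two files. It assumes both outputs come from the same input and are aligned token by token; the function names are hypothetical.

```python
def split_fields(lines):
    """Drop blank lines and split each remaining line on whitespace."""
    return [line.split() for line in lines if line.strip()]


def unexpected_tags(morfo_lines, tagged_lines):
    """Return words whose tagged (lemma, tag) is not among their morfo candidates.

    Assumes both outputs were produced from the same input text and are
    aligned token by token (one token per non-blank line).
    """
    morfo = split_fields(morfo_lines)
    tagged = split_fields(tagged_lines)
    if len(morfo) != len(tagged):
        raise ValueError("outputs are not aligned token by token")
    unexpected = []
    for m, t in zip(morfo, tagged):
        word, rest = m[0], m[1:]
        if t[0] != word:
            raise ValueError(f"token mismatch: {word!r} vs {t[0]!r}")
        # Morfo fields after the word come in (lemma, tag, probability) triples.
        candidates = {(rest[i], rest[i + 1]) for i in range(0, len(rest) - 2, 3)}
        if (t[1], t[2]) not in candidates:
            unexpected.append(word)
    return unexpected
```

Run on mytext.mrf and mytext.tag, the check is expected to return an empty list.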
We can also ask for the synsets of the tagged words:
analyzer -f myconfig.cfg --outf sense --sense=all <mytext.txt >mytext.sen
obtaining the output:
El el DA0MS0
gato gato NCMS000 01630731:07221232:01631653
come comer VMIP3S0 00794578:00793267
pescado pescado NCMS000 05810856:02006311
. . Fp
Pero pero CC
a a SPS00
Don_Jaime Don_Jaime NP00000
no no RN
le él PP3CSD00
gustan gustar VMIP3P0 01244897:01213391:01241953
los el DA0MP0
gatos gato NCMP000 01630731:07221232:01631653
. . Fp
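In the sense output, a fourth column (present only for words that have senses) lists synset identifiers separated by colons. The sketch below splits that column and groups synsets by lemma, so that "gato" and "gatos" end up under the same entry. The function names are hypothetical.

```python
def parse_sense_line(line):
    """Split a sense-format line into (word, lemma, tag, [synset ids])."""
    fields = line.split()
    if len(fields) not in (3, 4):
        raise ValueError(f"unexpected sense line: {line!r}")
    word, lemma, tag = fields[:3]
    # The optional fourth column lists synsets separated by colons.
    synsets = fields[3].split(":") if len(fields) == 4 else []
    return word, lemma, tag, synsets


def synsets_by_lemma(lines):
    """Map each lemma to the set of synsets seen for it across all lines."""
    table = {}
    for line in lines:
        if not line.strip():
            continue
        _, lemma, _, synsets = parse_sense_line(line)
        if synsets:
            table.setdefault(lemma, set()).update(synsets)
    return table
```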
Alternatively, to avoid repeating the steps already performed, we can use the output of the morphological analyzer as input to the tagger:
analyzer -f myconfig.cfg --inpf morfo --outf tagged <mytext.mrf >mytext.tag
See the InputFormat and OutputFormat options in section 2.2.1 for details on the valid input and output formats.
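When these steps are scripted, it can help to build the command lines programmatically. The sketch below assembles analyzer invocations using only the options shown in this section (-f, --inpf, --outf, --sense) and runs them with redirected files, mirroring the shell redirections above. The helper names are hypothetical, and it assumes the analyzer binary is on the PATH.

```python
import shutil
import subprocess


def analyzer_command(config, inpf=None, outf=None, sense=None):
    """Build an analyzer argv using only the options shown in this section."""
    cmd = ["analyzer", "-f", config]
    if inpf is not None:
        cmd += ["--inpf", inpf]
    if outf is not None:
        cmd += ["--outf", outf]
    if sense is not None:
        cmd.append(f"--sense={sense}")
    return cmd


def run_analyzer(cmd, input_path, output_path):
    """Run cmd with input_path as stdin and output_path as stdout."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    # Binary mode passes bytes through untouched, whatever the text encoding.
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        subprocess.run(cmd, stdin=src, stdout=dst, check=True)
```

For example, run_analyzer(analyzer_command("myconfig.cfg"), "mytext.txt", "mytext.mrf") corresponds to the first command, and run_analyzer(analyzer_command("myconfig.cfg", inpf="morfo", outf="tagged"), "mytext.mrf", "mytext.tag") reuses its output as in the last one.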
2008-01-24