Sample program

A very simple sample program using the library is shown below. It reads text from stdin, morphologically analyzes it, and processes the results. Depending on the application, the input text could come from a speech recognition system, an XML parser, or any other source that suits the application's goals.

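#include <iostream>
#include <list>
#include <string>
// ... plus the FreeLing headers that declare tokenizer, splitter, maco_options,
// maco, hmm_tagger, word and sentence (see sample.cc in the distribution)
using namespace std;

// defined further below
void ProcessResults(const list<sentence> &ls);

int main() {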
  string text;
  list<word> lw;
  list<sentence> ls;
  
  // create analyzers
  tokenizer tk("myTokenizerFile.dat"); 
  splitter sp(false,0);
  
  // morphological analysis has a lot of options, and for simplicity they are packed up
  // in a maco_options object. First, create the maco_options object with default values.
  maco_options opt("es");
  
  // set required options  
  opt.noQuantitiesDetection = true;  // deactivate quantities submodule
  
  // Data files for morphological submodules. Note that it is not necessary
  // to set opt.CurrencyFile, since quantities module was deactivated.
  opt.LocutionsFile="myMultiwordsFile.dat";       opt.SuffixFile="mySuffixesFile.dat";
  opt.ProbabilityFile="myProbabilitiesFile.dat";  opt.DictionaryFile="myDictionaryFile.dat";
  opt.NPdataFile="myNPdatafile.dat";              opt.PunctuationFile="myPunctuationFile.dat"; 
  
  // create the analyzer with the given set of maco_options
  maco morfo(opt);    
  
  // create a hmm tagger
  hmm_tagger tagger("es", "myTaggerFile.dat"); 
  
  // get plain text input lines while not EOF.
  while (getline(cin,text)) {
    
    // clear temporary lists;
    lw.clear(); ls.clear();
    
    // tokenize input line into a list of words
    lw=tk.tokenize(text);
    
    // accumulate list of words in splitter buffer, returning a list of sentences.
    // The resulting list of sentences may be empty if the splitter does not yet
    // have enough evidence to decide that a complete sentence has been found. The list
    // may contain more than one sentence (since a single input line may consist 
    // of several complete sentences).
    ls=sp.split(lw, false);
    
    // analyze all words in all sentences of the list, enriching them with lemma and PoS 
    // information. Some of the words may be glued in one (e.g. dates, multiwords, etc.)
    morfo.analyze(ls);
    
    // disambiguate words in each sentence of given sentence list.
    tagger.analyze(ls);
    
    // Process the enriched/disambiguated objects in the list of sentences
    ProcessResults(ls);
  }
  
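  // No more lines to read. Make sure the splitter doesn't retain anything.
  // lw still holds the tokens of the last line; assuming split() appends lw to
  // its buffer (as in the loop above), empty it first so they are not fed twice.
  lw.clear();
  ls=sp.split(lw, true);  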
  
  // morphologically enrich and disambiguate last sentence(s)
  morfo.analyze(ls);
  tagger.analyze(ls);
  
  // process last sentence(s)   
  ProcessResults(ls);
}
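
The reason for the final call with the flush flag set can be illustrated with a toy buffer. The sketch below is not the FreeLing splitter: ToySplitter, its split() signature, and the rule that a lone "." token ends a sentence are all assumptions made for illustration only. It shows how tokens that arrive without a sentence end stay in the buffer across calls, and are only returned once a boundary is seen or a flush is requested.

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for a sentence splitter; NOT the FreeLing class.
class ToySplitter {
  std::list<std::string> buffer;  // tokens received but not yet returned

public:
  // Append the given tokens to the internal buffer and return every
  // complete sentence found so far. With flush=true, whatever is left
  // in the buffer is returned as one last sentence.
  std::list<std::list<std::string>> split(const std::list<std::string>& tokens,
                                          bool flush) {
    std::list<std::list<std::string>> sentences;
    for (const std::string& t : tokens) {
      buffer.push_back(t);
      if (t == ".") {  // toy rule (assumption): a period ends a sentence
        sentences.push_back(buffer);
        buffer.clear();
      }
    }
    if (flush && !buffer.empty()) {
      sentences.push_back(buffer);
      buffer.clear();
    }
    return sentences;
  }
};
```

With this toy class, a first call with the tokens "Hello", ".", "How", "are" and flush=false returns one sentence and keeps "How", "are" buffered. A later call with an empty token list and flush=true returns the buffered remainder as the last sentence. Passing the previous line's tokens again in that final call would append them a second time, which is why the token list should be empty when flushing.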

The processing performed on the results depends on the goal of the application (translation, indexing, etc.). To illustrate the structure of the linguistic data objects, a simple procedure is presented below, which merely prints the results to stdout in XML format.

void ProcessResults(const list<sentence> &ls) {
  
  list<sentence>::const_iterator s;
  word::const_iterator a;   // iterator over all analyses of a word
  sentence::const_iterator w;
  
  // for each sentence in list
  for (s=ls.begin(); s!=ls.end(); s++) {
    
    // print sentence XML tag
    cout<<"<SENT>"<<endl;
      
    // for each word in sentence
    for (w=s->begin(); w!=s->end(); w++) {
      
      // print word form, with PoS and lemma chosen by the tagger
      cout<<"  <WORD form=\""<<w->get_form();
      cout<<"\" lemma=\""<<w->get_lemma();
      cout<<"\" pos=\""<<w->get_parole();
      cout<<"\">"<<endl;
      
      // for each possible analysis of the word, output lemma, parole and probability
      for (a=w->analysis_begin(); a!=w->analysis_end(); ++a) {
	
        // print analysis info
        cout<<"    <ANALYSIS lemma=\""<<a->get_lemma();
        cout<<"\" pos=\""<<a->get_parole();
        cout<<"\" prob=\""<<a->get_prob();
        cout<<"\"/>"<<endl;
      }
      
      // close word XML tag after the list of analyses
      cout<<"  </WORD>"<<endl;
    }
    
    // close sentence XML tag
    cout<<"</SENT>"<<endl;
  }
}
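
Note that the procedure above writes forms and lemmas into attribute values as they are. If a token contains a double quote, an ampersand, or an angle bracket, the output would not be well-formed XML. A minimal sketch of an escaping helper follows. The name xml_escape is hypothetical and is not part of FreeLing; it only assumes that the strings to print are available as std::string.

```cpp
#include <string>

// Hypothetical helper, not part of FreeLing: replace the characters that
// cannot appear verbatim inside a double-quoted XML attribute value.
std::string xml_escape(const std::string& s) {
  std::string out;
  out.reserve(s.size());
  for (char c : s) {
    switch (c) {
      case '&': out += "&amp;";  break;
      case '<': out += "&lt;";   break;
      case '>': out += "&gt;";   break;
      case '"': out += "&quot;"; break;
      default:  out += c;        break;
    }
  }
  return out;
}
```

In the printing code, each string value would then be wrapped before being sent to cout, for instance xml_escape(w->get_form()) in place of w->get_form().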

The above sample program may be found in file FreeLing-build-dir/src/main/sample.cc

Once you have compiled and installed FreeLing, you can build this sample program (or any other you may want to write) with the command:
g++ -o sample sample.cc -lmorfo -ldb_cxx -lpcre -lomlet -lfries

The -lmorfo option links against the libmorfo library, which is the final result of the FreeLing compilation process. The other options refer to the libraries required by FreeLing, mentioned above. You may have to add some -I and/or -L options to the compilation command, depending on where the headers and binaries of the required libraries are located. For instance, if you installed some of the libraries in /usr/local/mylib instead of the default location /usr/local, you will have to add the options
-I/usr/local/mylib/include -L/usr/local/mylib/lib
to the command above.

More clues on how to use the FreeLing library from your own program may be obtained by looking at the source code of the main program provided in the package. The program is quite simple and well commented, so it should be easy to understand what it does. The source can be found in the file FreeLing-build-dir/src/main/analyzer.cc

2008-01-24