- -

GREAT: open source software for statistical machine translation

RiuNet: Institutional repository of the Polithecnic University of Valencia

Share/Send to

Cited by

Statistics

GREAT: open source software for statistical machine translation

Show simple item record

Files in this item

dc.contributor.author González Mollá, Jorge es_ES
dc.contributor.author Casacuberta Nolla, Francisco es_ES
dc.date.accessioned 2014-03-24T10:34:23Z
dc.date.issued 2011-06-01
dc.identifier.issn 0922-6567
dc.identifier.uri http://hdl.handle.net/10251/36594
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/s10590-011-9097-6 es_ES
dc.description.abstract [EN] In this article, the first public release of GREAT as an open-source, statistical machine translation (SMT) software toolkit is described. GREAT is based on a bilingual language modelling approach for SMT, which is so far implemented for n-gram models based on the framework of stochastic finite-state transducers. The use of finite-state models is motivated by their simplicity, their versatility, and the fact that they present a lower computational cost, if compared with other more expressive models. Moreover, if translation is assumed to be a subsequential process, finite-state models are enough for modelling the existing relations between a source and a target language. GREAT includes some characteristics usually present in state-of-the-art SMT, such as phrase-based translation models or a log-linear framework for local features. Experimental results on a well-known corpus such as Europarl are reported in order to validate this software. A competitive translation quality is achieved, yet using both a lower number of model parameters and a lower response time than the widely-used, state-of-the-art SMT system Moses. © 2011 Springer Science+Business Media B.V. es_ES
dc.description.sponsorship Study was supported by the EC (FEDER, FSE), the Spanish government (MICINN, MITyC, “Plan E”, under Grants MIPRCV “Consolider Ingenio 2010”, iTrans2 TIN2009-14511, and erudito.com TSI-020110-2009-439), and the Generalitat Valenciana (Grant Prometeo/2009/014).
dc.format.extent 16 es_ES
dc.language Inglés es_ES
dc.publisher Springer Netherlands es_ES
dc.relation MICINN/TIN2009-14511 es_ES
dc.relation MICINN/TSI-020110-2009-439 es_ES
dc.relation GV/PROMETEO/2009/014 es_ES
dc.relation.ispartof Machine Translation es_ES
dc.rights Reserva de todos los derechos es_ES
dc.subject Grammatical inference es_ES
dc.subject Language modelling es_ES
dc.subject Monotonic bilingual segmentation es_ES
dc.subject Statistical machine translation es_ES
dc.subject Stochastic finite-state transducers es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title GREAT: open source software for statistical machine translation es_ES
dc.type Artículo es_ES
dc.embargo.lift 10000-01-01
dc.embargo.terms forever es_ES
dc.identifier.doi 10.1007/s10590-011-9097-6
dc.rights.accessRights Abierto es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation González Mollá, J.; Casacuberta Nolla, F. (2011). GREAT: open source software for statistical machine translation. Machine Translation. 25(2):145-160. doi:10.1007/s10590-011-9097-6 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://link.springer.com/article/10.1007%2Fs10590-011-9097-6 es_ES
dc.description.upvformatpinicio 145 es_ES
dc.description.upvformatpfin 160 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 25 es_ES
dc.description.issue 2 es_ES
dc.relation.senia 201860
dc.contributor.funder Ministerio de Ciencia e Innovación
dc.contributor.funder Generalitat Valenciana
dc.relation.references Amengual JC, Benedí JM, Casacuberta F, Castaño MA, Castellanos A, Jiménez VM, Llorens D, Marzal A, Pastor M, Prat F, Vidal E, Vilar JM (2000) The EUTRANS-I speech translation system. Mach Transl 15(1-2): 75–103 es_ES
dc.relation.references Andrés-Ferrer J, Juan-Císcar A, Casacuberta F (2008) Statistical estimation of rational transducers applied to machine translation. Appl Artif Intell 22(1–2): 4–22 es_ES
dc.relation.references Bangalore S, Riccardi G (2002) Stochastic finite-state models for spoken language machine translation. Mach Transl 17(3): 165–184 es_ES
dc.relation.references Berstel J (1979) Transductions and context-free languages. B.G. Teubner, Stuttgart, Germany es_ES
dc.relation.references Casacuberta F, Vidal E (2004) Machine translation with inferred stochastic finite-state transducers. Comput Linguist 30(2): 205–225 es_ES
dc.relation.references Casacuberta F, Vidal E (2007) Learning finite-state models for machine translation. Mach Learn 66(1): 69–91 es_ES
dc.relation.references Foster G, Kuhn R, Johnson H (2006) Phrasetable smoothing for statistical machine translation. In: Proceedings of the 11th Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, pp 53–61 es_ES
dc.relation.references González J (2009) Aprendizaje de transductores estocásticos de estados finitos y su aplicación en traducción automática. PhD thesis, Universitat Politècnica de València. Advisor: Casacuberta F es_ES
dc.relation.references González J, Casacuberta F (2009) GREAT: a finite-state machine translation toolkit implementing a grammatical inference approach for transducer inference (GIATI). In: Proceedings of the EACL Workshop on Computational Linguistic Aspects of Grammatical Inference, Athens, Greece, pp 24–32 es_ES
dc.relation.references Kanthak S, Vilar D, Matusov E, Zens R, Ney H (2005) Novel reordering approaches in phrase-based statistical machine translation. In: Proceedings of the ACL Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, Ann Arbor, MI, pp 167–174 es_ES
dc.relation.references Karttunen L (2001) Applications of finite-state transducers in natural language processing. In: Proceedings of the 5th Conference on Implementation and Application of Automata, London, UK, pp 34–46 es_ES
dc.relation.references Kneser R, Ney H (1995) Improved backing-off for n-gram language modeling. In: Proceedings of the 20th IEEE International Conference on Acoustic, Speech and Signal Processing, Detroit, MI, pp 181–184 es_ES
dc.relation.references Knight K, Al-Onaizan Y (1998) Translation with finite-state devices. In: Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas, Langhorne, PA, pp 421–437 es_ES
dc.relation.references Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the 9th Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, pp 388–395 es_ES
dc.relation.references Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: Proceedings of the 10th Machine Translation Summit, Phuket, Thailand, pp 79–86 es_ES
dc.relation.references Koehn P (2010) Statistical machine translation. Cambridge University Press, Cambridge, UK es_ES
dc.relation.references Koehn P, Hoang H (2007) Factored translation models. In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic, pp 868–876 es_ES
dc.relation.references Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic, pp 177–180 es_ES
dc.relation.references Kumar S, Deng Y, Byrne W (2006) A weighted finite state transducer translation template model for statistical machine translation. Nat Lang Eng 12(1): 35–75 es_ES
dc.relation.references Li Z, Callison-Burch C, Dyer C, Ganitkevitch J, Khudanpur S, Schwartz L, Thornton WNG, Weese J, Zaidan OF (2009) Joshua: an open source toolkit for parsing-based machine translation. In: Procee- dings of the ACL Workshop on Statistical Machine Translation, Morristown, NJ, pp 135–139 es_ES
dc.relation.references Llorens D, Vilar JM, Casacuberta F (2002) Finite state language models smoothed using n-grams. Int J Pattern Recognit Artif Intell 16(3): 275–289 es_ES
dc.relation.references Marcu D, Wong W (2002) A phrase-based, joint probability model for statistical machine translation. In: Proceedings of the 7th Conference on Empirical Methods in Natural Language Processing, Morristown, NJ, pp 133–139 es_ES
dc.relation.references Mariño JB, Banchs RE, Crego JM, de Gispert A, Lambert P, Fonollosa JAR, Costa-jussà MR (2006) N-gram-based machine translation. Comput Linguist 32(4): 527–549 es_ES
dc.relation.references Medvedev YT (1964) On the class of events representable in a finite automaton. In: Moore EF (eds) Sequential machines selected papers. Addison Wesley, Reading, MA es_ES
dc.relation.references Mohri M, Pereira F, Riley M (2002) Weighted finite-state transducers in speech recognition. Comput Speech Lang 16(1): 69–88 es_ES
dc.relation.references Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, pp 295–302 es_ES
dc.relation.references Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1): 19–51 es_ES
dc.relation.references Ortiz D, García-Varea I, Casacuberta F (2005) Thot: a toolkit to train phrase-based statistical translation models. In: Proceedings of the 10th Machine Translation Summit, Phuket, Thailand, pp 141–148 es_ES
dc.relation.references Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, pp 311–318 es_ES
dc.relation.references Pérez A, Torres MI, Casacuberta F (2008) Joining linguistic and statistical methods for Spanish-to-Basque speech translation. Speech Commun 50: 1021–1033 es_ES
dc.relation.references Picó D, Casacuberta F (2001) Some statistical-estimation methods for stochastic finite-state transducers. Mach Learn 44: 121–142 es_ES
dc.relation.references Rosenfeld R (1996) A maximum entropy approach to adaptive statistical language modeling. Comput Speech Lang 10: 187–228 es_ES
dc.relation.references Simard M, Plamondon P (1998) Bilingual sentence alignment: balancing robustness and accuracy. Mach Transl 13(1): 59–80 es_ES
dc.relation.references Singh AK, Husain S (2007) Exploring translation similarities for building a better sentence aligner. In: Proceedings of the 3rd Indian International Conference on Artificial Intelligence, Pune, India, pp 1852–1863 es_ES
dc.relation.references Steinbiss V, Tran BH, Ney H (1994) Improvements in beam search. In: Proceedings of the 3rd International Conference on Spoken Language Processing, Yokohama, Japan, pp 2143–2146 es_ES
dc.relation.references Torres MI, Varona A (2001) k-TSS language models in speech recognition systems. Comput Speech Lang 15(2): 127–149 es_ES
dc.relation.references Vidal E (1997) Finite-state speech-to-speech translation. In: Proceedings of the 22nd IEEE International Conference on Acoustic, Speech and Signal Processing, Munich, Germany, pp 111–114 es_ES
dc.relation.references Vidal E, Thollard F, de la Higuera C, Casacuberta F, Carrasco RC (2005) Probabilistic finite-state machines–Part II. IEEE Trans Pattern Anal Mach Intell 27(7): 1025–1039 es_ES
dc.relation.references Viterbi A (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 13(2): 260–269 es_ES


This item appears in the following Collection(s)

Show simple item record