
Comparison of Data Selection Techniques for the Translation of Video Lectures

RiuNet: Institutional repository of the Polytechnic University of Valencia


dc.contributor.author Wuebker, Joern es_ES
dc.contributor.author Ney, Hermann es_ES
dc.contributor.author Martínez-Villaronga, Adrià es_ES
dc.contributor.author Giménez Pastor, Adrián es_ES
dc.contributor.author Juan Císcar, Alfonso es_ES
dc.contributor.author Servan, Christophe es_ES
dc.contributor.author Dymetman, Marc es_ES
dc.contributor.author Mirkin, Shashar es_ES
dc.date.accessioned 2015-09-09T10:38:30Z
dc.date.available 2015-09-09T10:38:30Z
dc.date.issued 2014-10-22
dc.identifier.uri http://hdl.handle.net/10251/54431
dc.description.abstract [EN] For the task of online translation of scientific video lectures, using huge models is not possible. In order to get smaller and efficient models, we perform data selection. In this paper, we perform a qualitative and quantitative comparison of several data selection techniques, based on cross-entropy and infrequent n-gram criteria. In terms of BLEU, a combination of translation and language model cross-entropy achieves the most stable results. As another important criterion for measuring translation quality in our application, we identify the number of out-of-vocabulary words. Here, infrequent n-gram recovery shows superior performance. Finally, we combine the two selection techniques in order to benefit from both their strengths. es_ES
dc.description.sponsorship The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 287755 (transLectures), and the Spanish MINECO Active2Trans (TIN2012-31723) research project. es_ES
dc.format.extent 15 es_ES
dc.language English es_ES
dc.publisher Association for Machine Translation in the Americas es_ES
dc.relation MINECO/TIN2012-31723 es_ES
dc.rights All rights reserved es_ES
dc.subject Data selection techniques es_ES
dc.subject Translation quality es_ES
dc.subject.classification Computer Languages and Systems es_ES
dc.title Comparison of Data Selection Techniques for the Translation of Video Lectures es_ES
dc.type Conference paper es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/287755/EU es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Área de Sistemas de Información y Comunicaciones - Àrea de Sistemes d'Informació i Comunicacions es_ES
dc.description.bibliographicCitation Wuebker, J.; Ney, H.; Martínez-Villaronga, A.; Giménez Pastor, A.; Juan Císcar, A.; Servan, C.; Dymetman, M.... (2014). Comparison of Data Selection Techniques for the Translation of Video Lectures. Association for Machine Translation in the Americas. http://hdl.handle.net/10251/54431 es_ES
dc.description.accrualMethod Senia es_ES
dc.relation.conferencename AMTA 2014 Workshop on Interactive and Adaptive Machine Translation es_ES
dc.relation.conferencedate October 22, 2014 es_ES
dc.relation.conferenceplace Vancouver, Canada es_ES
dc.relation.publisherversion http://www.mt-archive.info/10/AMTA-2014-TOC.htm es_ES
dc.relation.senia 277406 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES

