dc.contributor.author | Wuebker, Joern | es_ES |
dc.contributor.author | Ney, Hermann | es_ES |
dc.contributor.author | Martínez-Villaronga, Adrià | es_ES |
dc.contributor.author | Giménez Pastor, Adrián | es_ES |
dc.contributor.author | Juan Císcar, Alfonso | es_ES |
dc.contributor.author | Servan, Christophe | es_ES |
dc.contributor.author | Dymetman, Marc | es_ES |
dc.contributor.author | Mirkin, Shashar | es_ES |
dc.date.accessioned | 2015-09-09T10:38:30Z | |
dc.date.available | 2015-09-09T10:38:30Z | |
dc.date.issued | 2014-10-22 | |
dc.identifier.uri | http://hdl.handle.net/10251/54431 | |
dc.description.abstract | [EN] For the task of online translation of scientific video lectures, using huge models is not possible. In order to get smaller and more efficient models, we perform data selection. In this paper, we perform a qualitative and quantitative comparison of several data selection techniques, based on cross-entropy and infrequent n-gram criteria. In terms of BLEU, a combination of translation and language model cross-entropy achieves the most stable results. As another important criterion for measuring translation quality in our application, we identify the number of out-of-vocabulary words. Here, infrequent n-gram recovery shows superior performance. Finally, we combine the two selection techniques in order to benefit from the strengths of both. | es_ES |
dc.description.sponsorship | The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 287755 (transLectures), and the Spanish MINECO Active2Trans (TIN2012-31723) research project. | es_ES |
dc.format.extent | 15 | es_ES |
dc.language | English | es_ES |
dc.publisher | Association for Machine Translation in the Americas | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | Data selection techniques | es_ES |
dc.subject | Translation quality | es_ES |
dc.subject.classification | COMPUTER LANGUAGES AND SYSTEMS | es_ES |
dc.title | Comparison of Data Selection Techniques for the Translation of Video Lectures | es_ES |
dc.type | Conference paper | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2012-31723/ES/INTERACCION ACTIVA PARA TRANSCRIPCION DE HABLA Y TRADUCCION/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/FP7/287755/EU/Transcription and Translation of Video Lectures/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Área de Sistemas de Información y Comunicaciones - Àrea de Sistemes d'Informació i Comunicacions | es_ES |
dc.description.bibliographicCitation | Wuebker, J.; Ney, H.; Martínez-Villaronga, A.; Giménez Pastor, A.; Juan Císcar, A.; Servan, C.; Dymetman, M.... (2014). Comparison of Data Selection Techniques for the Translation of Video Lectures. Association for Machine Translation in the Americas. http://hdl.handle.net/10251/54431 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.conferencename | AMTA 2014 Workshop on Interactive and Adaptive Machine Translation | es_ES |
dc.relation.conferencedate | October 22, 2014 | es_ES |
dc.relation.conferenceplace | Vancouver, Canada | es_ES |
dc.relation.publisherversion | http://www.mt-archive.info/10/AMTA-2014-TOC.htm | es_ES |
dc.relation.senia | 277406 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
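The abstract names two families of data selection criteria: cross-entropy scoring and infrequent n-gram recovery. As a rough illustration only, the Python sketch that follows shows the monolingual language-model cross-entropy difference criterion of Moore and Lewis (2010) and a simplified greedy form of infrequent n-gram recovery. Everything in it is an assumption: the helper names (unigram_lm, cross_entropy_difference, infrequent_ngram_recovery), the add-one smoothed unigram models, the threshold and maximum n-gram order, and the toy sentences. It is not the authors' implementation, and it omits the translation-model cross-entropy term that the abstract reports combining with the language-model term.

```python
import math
from collections import Counter


def unigram_lm(sentences):
    """Hypothetical helper: add-one smoothed unigram log-probability function."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-word cross-entropy (in nats) of a sentence under a model."""
    words = sentence.split()
    return -sum(logprob(w) for w in words) / max(len(words), 1)


def cross_entropy_difference(in_domain, pool):
    """Rank pool sentences by H_in(s) - H_gen(s); lower scores look more in-domain.

    Assumption: the general-domain model is trained on the candidate pool itself.
    """
    lm_in = unigram_lm(in_domain)
    lm_gen = unigram_lm(pool)
    scored = [(cross_entropy(lm_in, s) - cross_entropy(lm_gen, s), s) for s in pool]
    return [s for _, s in sorted(scored)]


def ngrams(words, n_max):
    """All n-grams of order 1..n_max as tuples."""
    return [tuple(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def infrequent_ngram_recovery(in_domain, pool, threshold=2, n_max=2):
    """Greedily select pool sentences covering in-domain n-grams that the
    already-selected data contains fewer than `threshold` times."""
    targets = {g for s in in_domain for g in ngrams(s.split(), n_max)}
    seen = Counter()  # occurrences of target n-grams in the selected data
    selected, remaining = [], list(pool)

    def gain(sentence):
        # Each distinct target n-gram contributes its remaining deficit.
        grams = set(ngrams(sentence.split(), n_max)) & targets
        return sum(max(threshold - seen[g], 0) for g in grams)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no candidate covers any still-infrequent n-gram
        selected.append(best)
        remaining.remove(best)
        seen.update(g for g in ngrams(best.split(), n_max) if g in targets)
    return selected
```

With the hypothetical in-domain text ["the lecture starts now"] and pool ["the lecture starts", "stock prices fell", "now we begin"], the cross-entropy ranking places "stock prices fell" last, while infrequent n-gram recovery with threshold=1 keeps only the two sentences that contribute uncovered in-domain n-grams. A real system would use higher-order smoothed language models and bilingual scores.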