dc.contributor.author | Parcheta, Zuzanna | es_ES |
dc.contributor.author | Sanchis Trilles, Germán | es_ES |
dc.contributor.author | Casacuberta Nolla, Francisco | es_ES |
dc.date.accessioned | 2022-02-08T12:40:08Z | |
dc.date.available | 2022-02-08T12:40:08Z | |
dc.date.issued | 2019-08-02 | es_ES |
dc.identifier.isbn | 978-1-950737-27-7 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/180620 | |
dc.description.abstract | [EN] The WMT 2019 shared task on filtering noisy parallel corpora challenges participants to create filtering methods that are useful for training machine translation systems. In this work, we introduce a filtering system for noisy parallel corpora based on generating hypotheses by means of a translation model. We train translation models for both language pairs, Nepali-English and Sinhala-English, using the provided parallel corpora. To create the best possible translation model, we first join all provided parallel corpora (Nepali, Sinhala and Hindi to English) and then apply bilingual cross-entropy selection for both language pairs. Once the translation models are trained, we translate the noisy corpora and generate a hypothesis for each sentence pair. We compute the smoothed BLEU score between the target sentence and the generated hypothesis. In addition, we apply several rules to discard very noisy or inadequate sentences that can lower the translation score. These heuristics are based on sentence length, source and target similarity, and source language detection. We compare our results with the baseline published on the shared task website, which uses the Zipporah model, and achieve significant improvements over it in one of the conditions of the shared task. The filtering system is domain independent, and all experiments are conducted using neural machine translation. | es_ES |
dc.description.sponsorship | Work partially supported by MINECO under grant DI-15-08169 and by Sciling under its R+D programme. The authors would like to thank NVIDIA for the donation of the Titan Xp GPU that made this research possible. | es_ES |
dc.language | English | es_ES |
dc.publisher | The Association for Computational Linguistics | es_ES |
dc.relation.ispartof | Proceedings of the Conference | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | Noisy corpora | es_ES |
dc.subject | Corpus filtering | es_ES |
dc.subject | Low-resource languages | es_ES |
dc.subject | Hypothesis Generation | es_ES |
dc.subject.classification | COMPUTER LANGUAGES AND SYSTEMS | es_ES |
dc.title | Filtering of Noisy Parallel Corpora Based on Hypothesis Generation | es_ES |
dc.type | Conference paper | es_ES |
dc.type | Book chapter | es_ES |
dc.identifier.doi | 10.18653/v1/W19-5439 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//DI-15-08169/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Parcheta, Z.; Sanchis Trilles, G.; Casacuberta Nolla, F. (2019). Filtering of Noisy Parallel Corpora Based on Hypothesis Generation. The Association for Computational Linguistics. 284-290. https://doi.org/10.18653/v1/W19-5439 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.conferencename | Fourth Conference on Machine Translation (WMT 2019) | es_ES |
dc.relation.conferencedate | August 01-02, 2019 | es_ES |
dc.relation.conferenceplace | Florence, Italy | es_ES |
dc.relation.publisherversion | https://doi.org/10.18653/v1/W19-5439 | es_ES |
dc.description.upvformatpinicio | 284 | es_ES |
dc.description.upvformatpfin | 290 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.pasarela | S\394621 | es_ES |
dc.contributor.funder | Nvidia | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
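
The scoring step described in the abstract (compare the target sentence with a machine-translated hypothesis, after rule-based pre-filtering) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the record does not say which BLEU smoothing variant or thresholds were used, so the add-one smoothing, the `max_len_ratio` value, and the function names `smoothed_bleu` and `score_pair` are all hypothetical. "Source and target similarity" is read here as rejecting pairs whose two sides are identical, and language detection is left out because it would need an external library. The hypothesis is assumed to come from an already trained translation model.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n > 1 (assumed variant)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:  # add-one smoothing keeps short sentences from scoring zero
            matches, total = matches + 1, total + 1
        if matches == 0:  # no unigram overlap at all
            return 0.0
        log_precision += math.log(matches / total) / max_n
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity_penalty * math.exp(log_precision)


def score_pair(source, target, hypothesis, max_len_ratio=3.0):
    """Score one sentence pair; pairs rejected by a rule get 0.0.

    `hypothesis` is the translation of `source` produced by the model.
    `max_len_ratio` is a hypothetical threshold, not a value from the paper.
    """
    src_len, tgt_len = len(source.split()), len(target.split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    # Sentence-length rule: reject pairs with very different lengths.
    if max(src_len, tgt_len) / min(src_len, tgt_len) > max_len_ratio:
        return 0.0
    # Similarity rule (assumed reading): reject untranslated copies.
    if source.strip() == target.strip():
        return 0.0
    return smoothed_bleu(target, hypothesis)
```

Returning a score rather than a keep/drop decision lets the pairs be ranked afterwards, so any cut-off can be chosen later without rescoring.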