
Filtering of Noisy Parallel Corpora Based on Hypothesis Generation

RiuNet: Institutional Repository of the Universitat Politècnica de València


dc.contributor.author Parcheta, Zuzanna es_ES
dc.contributor.author Sanchis Trilles, Germán es_ES
dc.contributor.author Casacuberta Nolla, Francisco es_ES
dc.date.accessioned 2022-02-08T12:40:08Z
dc.date.available 2022-02-08T12:40:08Z
dc.date.issued 2019-08-02 es_ES
dc.identifier.isbn 978-1-950737-27-7 es_ES
dc.identifier.uri http://hdl.handle.net/10251/180620
dc.description.abstract [EN] The WMT 2019 shared task on filtering noisy parallel corpora challenges participants to create filtering methods that are useful for training machine translation systems. In this work, we introduce a filtering system for noisy parallel corpora based on generating hypotheses by means of a translation model. We train translation models for both language pairs, Nepali-English and Sinhala-English, using the provided parallel corpora. To create the best possible translation model, we first join all the provided parallel corpora (Nepali, Sinhala and Hindi to English) and then apply bilingual cross-entropy selection for both language pairs. Once the translation models are trained, we translate the noisy corpora and generate a hypothesis for each sentence pair. We then compute the smoothed BLEU score between the target sentence and the generated hypothesis. In addition, we apply several rules to discard very noisy or inadequate sentences, which can lower the translation score. These heuristics are based on sentence length, source and target similarity, and source language detection. We compare our results with the baseline published on the shared task website, which uses the Zipporah model, and achieve significant improvements over it in one of the conditions of the shared task. The designed filtering system is domain independent, and all experiments are conducted using neural machine translation. es_ES
dc.description.sponsorship Work partially supported by MINECO under grant DI-15-08169 and by Sciling under its R+D programme. The authors would like to thank NVIDIA for the donation of a Titan Xp GPU, which made it possible to conduct this research. es_ES
dc.language English es_ES
dc.publisher The Association for Computational Linguistics es_ES
dc.relation.ispartof Proceedings of the Conference es_ES
dc.rights Attribution (by) es_ES
dc.subject Noisy corpora es_ES
dc.subject Corpus filtering es_ES
dc.subject Low-resource languages es_ES
dc.subject Hypothesis Generation es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Filtering of Noisy Parallel Corpora Based on Hypothesis Generation es_ES
dc.type Conference paper es_ES
dc.type Book chapter es_ES
dc.identifier.doi 10.18653/v1/W19-5439 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//DI-15-08169/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Parcheta, Z.; Sanchis Trilles, G.; Casacuberta Nolla, F. (2019). Filtering of Noisy Parallel Corpora Based on Hypothesis Generation. The Association for Computational Linguistics. 284-290. https://doi.org/10.18653/v1/W19-5439 es_ES
dc.description.accrualMethod S es_ES
dc.relation.conferencename Fourth Conference on Machine Translation (WMT 2019) es_ES
dc.relation.conferencedate August 01-02, 2019 es_ES
dc.relation.conferenceplace Florence, Italy es_ES
dc.relation.publisherversion https://doi.org/10.18653/v1/W19-5439 es_ES
dc.description.upvformatpinicio 284 es_ES
dc.description.upvformatpfin 290 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela S\394621 es_ES
dc.contributor.funder Nvidia es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
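
The scoring step described in the abstract (translate each noisy pair, then compare the hypothesis with the target using smoothed BLEU, after discarding pairs that fail simple rules) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which BLEU smoothing variant or which thresholds were used, so the add-one smoothing, the `max_length_ratio` value, and the exact-copy check below are assumptions, and `translate` is a hypothetical stand-in for the trained translation model. Source language detection is left out.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n > 1 (assumed variant)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:
            matches, total = matches + 1, total + 1
        if matches == 0:  # only possible for unigrams
            return 0.0
        log_precision += math.log(matches / total) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_precision)


def score_pair(source, target, translate, max_length_ratio=3.0):
    """Score one sentence pair; 0.0 means the pair is discarded.

    `translate` is a hypothetical callable wrapping a translation model.
    `max_length_ratio` is an illustrative threshold, not a value from the paper.
    """
    src_len, tgt_len = len(source.split()), len(target.split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    # Length rule: very unbalanced pairs are unlikely to be translations.
    if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
        return 0.0
    # Similarity rule (simplified here to an exact-copy check).
    if source.strip() == target.strip():
        return 0.0
    return smoothed_bleu(target, translate(source))
```

With a toy lookup table standing in for the model, for example `translate = {"a b c d e": "the cat sat on mat"}.get`, a pair whose target equals the hypothesis scores close to 1, while a pair whose target is a verbatim copy of the source is discarded with a score of 0. Pairs can then be ranked by score and the highest-scoring ones kept.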

