Vector sentences representation for data selection in statistical machine translation

Chinea-Rios, Mara; Sanchis Trilles, Germán; Casacuberta Nolla, Francisco

doi:10.1016/j.csl.2018.12.005

Files in this item

Name: Chinea-Rios;Sanch ...

Size: 513.6Kb

Format: PDF

Description: Author's version.

Name: 1-s2.0-S088523081 ...

Size: 736.2Kb

Format: PDF

Description: Publisher's version.

dc.contributor.author	Chinea-Rios, Mara	es_ES
dc.contributor.author	Sanchis Trilles, Germán	es_ES
dc.contributor.author	Casacuberta Nolla, Francisco	es_ES
dc.date.accessioned	2020-11-20T04:31:36Z
dc.date.available	2020-11-20T04:31:36Z
dc.date.issued	2019-07	es_ES
dc.identifier.issn	0885-2308	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/155404
dc.description.abstract	[EN] One of the most popular approaches to machine translation consists in formulating the problem as a pattern recognition approach. Under this perspective, bilingual corpora are precious resources, as they allow for a proper estimation of the underlying models. In this framework, selecting the best possible corpus is critical, and data selection aims to find the best subset of the bilingual sentences from an available pool of sentences such that the final translation quality is improved. In this paper, we present a new data selection technique that leverages a continuous vector-space representation of sentences. Experimental results report improvements compared not only with a system trained only with in-domain data, but also compared with a system trained on all the available data. Finally, we compared our proposal with other state-of-the-art data selection techniques (Cross-entropy selection and Infrequent ngrams recovery) in two different scenarios, obtaining very promising results with our proposal: our data selection strategy is able to yield results that are at least as good as the best-performing strategy for each scenario. The empirical results reported are coherent across different language pairs.	es_ES
dc.description.sponsorship	Work supported by the Generalitat Valenciana under grant ALMAMATER (PrometeoII/2014/030) and the FPI (2014) grant by Universitat Politècnica de València.	es_ES
dc.language	English	es_ES
dc.publisher	Elsevier	es_ES
dc.relation.ispartof	Computer Speech & Language	es_ES
dc.rights	Attribution - NonCommercial - NoDerivatives (by-nc-nd)	es_ES
dc.subject	Statistical machine translation	es_ES
dc.subject	Data selection	es_ES
dc.subject	Continuous vector-space representation	es_ES
dc.subject	Cross-entropy	es_ES
dc.subject	Infrequent ngrams recovery	es_ES
dc.subject.classification	LENGUAJES Y SISTEMAS INFORMATICOS	es_ES
dc.title	Vector sentences representation for data selection in statistical machine translation	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1016/j.csl.2018.12.005	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2014%2F030/ES/ Adaptive learning and multimodality in machine translation and text transcription/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/UPV//FPI-2014	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Chinea-Rios, M.; Sanchis Trilles, G.; Casacuberta Nolla, F. (2019). Vector sentences representation for data selection in statistical machine translation. Computer Speech & Language. 56:1-16. https://doi.org/10.1016/j.csl.2018.12.005	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.csl.2018.12.005	es_ES
dc.description.upvformatpinicio	1	es_ES
dc.description.upvformatpfin	16	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	56	es_ES
dc.relation.pasarela	S\403697	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Universitat Politècnica de València	es_ES

This item appears in the following collection(s)

Articles, conferences, monographs [47184]

RiuNet: Institutional Repository of the Universitat Politècnica de València
