Dimensionality reduction methods for machine translation quality estimation

González Rubio, Jesús; Navarro Cerdán, José Ramón; Casacuberta Nolla, Francisco

doi:10.1007/s10590-013-9139-3

Files in this item

Name: González;Navarro; ...

Size: 641.7Kb

Format: PDF

Description: Author's version

Name: art%3A10.1007%2Fs ...

Size: 548.9Kb

Format: PDF

Description: Publisher's version

dc.contributor.author	González Rubio, Jesús	es_ES
dc.contributor.author	Navarro Cerdán, José Ramón	es_ES
dc.contributor.author	Casacuberta Nolla, Francisco	es_ES
dc.date.accessioned	2014-07-24T15:25:03Z
dc.date.issued	2013-12
dc.identifier.issn	0922-6567
dc.identifier.uri	http://hdl.handle.net/10251/39000
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1007/s10590-013-9139-3	es_ES
dc.description.abstract	[EN] Quality estimation (QE) for machine translation is usually addressed as a regression problem in which a learning model predicts a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.	es_ES
dc.description.sponsorship	This work was supported by the European Union Seventh Framework Program (FP7/2007-2013) under the CasMaCat project (grant agreement no. 287576), by the Spanish MICINN under the TIASA project (TIN2009-14205-C04-02), and by the Generalitat Valenciana under grant ALMPR (Prometeo/2009/014).
dc.language	English	es_ES
dc.publisher	Springer Verlag (Germany)	es_ES
dc.relation.ispartof	Machine Translation	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Machine translation	es_ES
dc.subject	Quality estimation	es_ES
dc.subject	Dimensionality reduction	es_ES
dc.subject	Partial least squares regression	es_ES
dc.subject.classification	STATISTICS AND OPERATIONS RESEARCH	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Dimensionality reduction methods for machine translation quality estimation	es_ES
dc.type	Article	es_ES
dc.embargo.lift	10000-01-01
dc.embargo.terms	forever	es_ES
dc.identifier.doi	10.1007/s10590-013-9139-3
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/287576/EU/Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MICINN//TIN2009-14205-C04-02/ES/Tecnicas Interactivas Y Adaptativas Para Sistemas Automaticos De Reconocimiento, Aprendizaje Y Percepcion/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO09%2F2009%2F014/ES/Adaptive learning and multimodality in pattern recognition (Almapater)/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	González Rubio, J.; Navarro Cerdán, JR.; Casacuberta Nolla, F. (2013). Dimensionality reduction methods for machine translation quality estimation. Machine Translation. 27(3-4):281-301. https://doi.org/10.1007/s10590-013-9139-3	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://link.springer.com/article/10.1007/s10590-013-9139-3	es_ES
dc.description.upvformatpinicio	281	es_ES
dc.description.upvformatpfin	301	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	27	es_ES
dc.description.issue	3-4	es_ES
dc.relation.senia	254547
dc.contributor.funder	European Commission
dc.contributor.funder	Ministerio de Ciencia e Innovación
dc.contributor.funder	Generalitat Valenciana

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia
