
Dimensionality reduction methods for machine translation quality estimation

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

dc.contributor.author González Rubio, Jesús es_ES
dc.contributor.author Navarro Cerdán, José Ramón es_ES
dc.contributor.author Casacuberta Nolla, Francisco es_ES
dc.date.accessioned 2014-07-24T15:25:03Z
dc.date.issued 2013-12
dc.identifier.issn 0922-6567
dc.identifier.uri http://hdl.handle.net/10251/39000
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/s10590-013-9139-3 es_ES
dc.description.abstract [EN] Quality estimation (QE) for machine translation is usually addressed as a regression problem in which a learning model predicts a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets. es_ES
dc.description.sponsorship This work was supported by the European Union Seventh Framework Program (FP7/2007-2013) under the CasMaCat project (grant agreement no. 287576), by the Spanish MICINN under the TIASA project (TIN2009-14205-C04-02), and by the Generalitat Valenciana under grant ALMPR (Prometeo/2009/014).
dc.language English es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation.ispartof Machine Translation es_ES
dc.rights All rights reserved es_ES
dc.subject Machine translation es_ES
dc.subject Quality estimation es_ES
dc.subject Dimensionality reduction es_ES
dc.subject Partial least squares regression es_ES
dc.subject.classification Statistics and Operations Research es_ES
dc.subject.classification Computer Languages and Systems es_ES
dc.title Dimensionality reduction methods for machine translation quality estimation es_ES
dc.type Article es_ES
dc.embargo.lift 10000-01-01
dc.embargo.terms forever es_ES
dc.identifier.doi 10.1007/s10590-013-9139-3
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/287576/EU/Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MICINN//TIN2009-14205-C04-02/ES/Tecnicas Interactivas Y Adaptativas Para Sistemas Automaticos De Reconocimiento, Aprendizaje Y Percepcion/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO09%2F2009%2F014/ES/Adaptive learning and multimodality in pattern recognition (Almapater)/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation González Rubio, J.; Navarro Cerdán, JR.; Casacuberta Nolla, F. (2013). Dimensionality reduction methods for machine translation quality estimation. Machine Translation. 27(3-4):281-301. https://doi.org/10.1007/s10590-013-9139-3 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://link.springer.com/article/10.1007/s10590-013-9139-3 es_ES
dc.description.upvformatpinicio 281 es_ES
dc.description.upvformatpfin 301 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 27 es_ES
dc.description.issue 3-4 es_ES
dc.relation.senia 254547
dc.contributor.funder European Commission
dc.contributor.funder Ministerio de Ciencia e Innovación
dc.contributor.funder Generalitat Valenciana
dc.description.references Amaldi E, Kann V (1998) On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theor Comput Sci 209(1–2):237–260 es_ES
dc.description.references Anderson TW (1958) An introduction to multivariate statistical analysis. Wiley, New York es_ES
dc.description.references Avramidis E (2012) Quality estimation for machine translation output using linguistic analysis and decoding features. In: Proceedings of the seventh workshop on statistical machine translation, pp 84–90 es_ES
dc.description.references Bellman RE (1961) Adaptive control processes: a guided tour. Rand Corporation research studies. Princeton University Press, Princeton es_ES
dc.description.references Bisani M, Ney H (2004) Bootstrap estimates for confidence intervals in ASR performance evaluation. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, vol 1, pp 409–412 es_ES
dc.description.references Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N (2004) Confidence estimation for machine translation. In: Proceedings of the international conference on Computational Linguistics, pp 315–321 es_ES
dc.description.references Callison-Burch C, Koehn P, Monz C, Post M, Soricut R, Specia L (2012) Findings of the 2012 workshop on statistical machine translation. In: Proceedings of the seventh workshop on statistical machine translation, pp 10–51 es_ES
dc.description.references Chong I, Jun C (2005) Performance of some variable selection methods when multicollinearity is present. Chemom Intell Lab Syst 78(1–2):103–112 es_ES
dc.description.references Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297 es_ES
dc.description.references Gamon M, Aue A, Smets M (2005) Sentence-Level MT evaluation without reference translations: beyond language modeling. In: Proceedings of the conference of the European Association for Machine Translation es_ES
dc.description.references Gandrabur S, Foster G (2003) Confidence estimation for text prediction. In: Proceedings of the conference on computational natural language learning, pp 315–321 es_ES
dc.description.references Geladi P, Kowalski BR (1986) Partial least-squares regression: a tutorial. Anal Chim Acta 185(1):1–17 es_ES
dc.description.references González-Rubio J, Ortiz-Martínez D, Casacuberta F (2010) Balancing user effort and translation error in interactive machine translation via confidence measures. In: Proceedings of the meeting of the association for computational linguistics, pp 173–177 es_ES
dc.description.references González-Rubio J, Sanchís A, Casacuberta F (2012) PRHLT submission to the WMT12 quality estimation task. In: Proceedings of the seventh workshop on statistical machine translation, pp 104–108 es_ES
dc.description.references Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. Machine Learning Research 3:1157–1182 es_ES
dc.description.references Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18 es_ES
dc.description.references Hotelling H (1931) The generalization of Student’s ratio. Ann Math Stat 2(3):360–378 es_ES
dc.description.references Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the association for computational linguistics, demonstration session es_ES
dc.description.references Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324 es_ES
dc.description.references Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2:559–572 es_ES
dc.description.references Platt JC (1999) Using analytic QP and sparseness to speed training of support vector machines. In: Proceedings of the conference on advances in neural information processing systems II, pp 557–563 es_ES
dc.description.references Quinlan RJ (1992) Learning with continuous classes. In: Proceedings of the Australian joint conference on artificial intelligence, pp 343–348 es_ES
dc.description.references Quirk C (2004) Training a sentence-level machine translation confidence measure. In: Proceedings of conference on language resources and evaluation, pp 825–828 es_ES
dc.description.references Sanchis A, Juan A, Vidal E (2007) Estimation of confidence measures for machine translation. In: Proceedings of the machine translation summit XI, pp 407–412 es_ES
dc.description.references Scott DW, Thompson JR (1983) Probability density estimation in higher dimensions. In: Proceedings of the fifteenth symposium on the interface, computer science and statistics, pp 173–179 es_ES
dc.description.references Soricut R, Echihabi A (2010) TrustRank: inducing trust in automatic translations via ranking. In: Proceedings of the meeting of the association for computational linguistics, pp 612–621 es_ES
dc.description.references Soricut R, Bach N, Wang Z (2012) The SDL language weaver systems in the WMT12 quality estimation shared task. In: Proceedings of the seventh workshop on statistical machine translation. Montreal, Canada, pp 145–151 es_ES
dc.description.references Specia L, Saunders C, Wang Z, Shawe-Taylor J, Turchi M (2009a) Improving the confidence of machine translation quality estimates. In: Proceedings of the machine translation summit XII es_ES
dc.description.references Specia L, Turchi M, Cancedda N, Dymetman M, Cristianini N (2009b) Estimating the sentence-level quality of machine translation systems. In: Proceedings of the meeting of the European Association for Machine Translation, pp 28–35 es_ES
dc.description.references Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288 es_ES
dc.description.references Ueffing N, Ney H (2007) Word-level confidence estimation for machine translation. Comput Ling 33:9–40 es_ES
dc.description.references Ueffing N, Macherey K, Ney H (2003) Confidence measures for statistical machine translation. In: Proceedings of the MT summit IX, pp 394–401 es_ES
dc.description.references Wold H (1966) Estimation of principal components and related models by iterative least squares. Academic Press, New York es_ES

