
Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: theoretical aspects

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Camacho Páez, José es_ES
dc.contributor.author Ferrer Riquelme, Alberto José es_ES
dc.date.accessioned 2016-05-26T12:08:42Z
dc.date.available 2016-05-26T12:08:42Z
dc.date.issued 2012-07
dc.identifier.issn 0886-9383
dc.identifier.uri http://hdl.handle.net/10251/64795
dc.description.abstract [EN] Cross-validation has become one of the principal methods for adjusting the meta-parameters of predictive models. Extensions of the cross-validation idea have been proposed to select the number of components in principal component analysis (PCA). Element-wise k-fold (ekf) cross-validation is among the most widely used algorithms for PCA cross-validation. It is the method implemented in the PLS_Toolbox, and in a numerical experiment it has been reported to outperform other methods under most circumstances. The ekf algorithm is based on missing-data imputation and can be programmed with any imputation method. In this paper, the ekf algorithm is analyzed with the simplest missing-data imputation method, trimmed score imputation. A theoretical study is carried out to identify the situations in which applying ekf is adequate and, more importantly, those in which it is not. The results show that the ekf method may be unable to assess the extent to which a model represents a test set and may lead to discarding principal components that carry important information. In a second paper of this series, other imputation methods are studied within the ekf algorithm. es_ES
dc.description.sponsorship Research in this area is partially supported by the Spanish Ministry of Economy and Competitiveness and FEDER funds from the European Union through grant DPI2011-28112-C04-02. José Camacho was funded by the Juan de la Cierva program, Ministry of Science and Innovation, Spain. This study was carried out when José Camacho was at the Universidad Politécnica de Valencia and Universitat de Girona, Spain. en_EN
dc.language English es_ES
dc.publisher Wiley es_ES
dc.relation.ispartof Journal of Chemometrics es_ES
dc.rights All rights reserved es_ES
dc.subject Principal component analysis es_ES
dc.subject Number of components es_ES
dc.subject Cross-validation es_ES
dc.subject Missing data es_ES
dc.subject Compression es_ES
dc.subject.classification STATISTICS AND OPERATIONS RESEARCH es_ES
dc.subject.classification SYSTEMS ENGINEERING AND AUTOMATION es_ES
dc.title Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: theoretical aspects es_ES
dc.type Article es_ES
dc.identifier.doi 10.1002/cem.2440
dc.relation.projectID info:eu-repo/grantAgreement/MICINN//DPI2011-28112-C04-02/ES/MONITORIZACION, INFERENCIA, OPTIMIZACION Y CONTROL MULTI-ESCALA: DE CELULAS A BIORREACTORES. (MULTISCALES)/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Ingeniería de Sistemas y Automática - Departament d'Enginyeria de Sistemes i Automàtica es_ES
dc.description.bibliographicCitation Camacho Páez, J.; Ferrer Riquelme, AJ. (2012). Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: theoretical aspects. Journal of Chemometrics. 26(1):361-373. https://doi.org/10.1002/cem.2440 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://dx.doi.org/10.1002/cem.2440 es_ES
dc.description.upvformatpinicio 361 es_ES
dc.description.upvformatpfin 373 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 26 es_ES
dc.description.issue 1 es_ES
dc.relation.senia 242711 es_ES
dc.contributor.funder Ministerio de Ciencia e Innovación es_ES
dc.contributor.funder Universitat de Girona es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
dc.description.references Wold, S. (1978). Cross-Validatory Estimation of the Number of Components in Factor and Principal Components Models. Technometrics, 20(4), 397-405. doi:10.1080/00401706.1978.10489693 es_ES
dc.description.references Eastment, H. T., & Krzanowski, W. J. (1982). Cross-Validatory Choice of the Number of Components From a Principal Component Analysis. Technometrics, 24(1), 73-77. doi:10.1080/00401706.1982.10487712 es_ES
dc.description.references Nomikos, P., & MacGregor, J. F. (1995). Multivariate SPC Charts for Monitoring Batch Processes. Technometrics, 37(1), 41-59. doi:10.1080/00401706.1995.10485888 es_ES
dc.description.references Bro, R., Kjeldahl, K., Smilde, A. K., & Kiers, H. A. L. (2008). Cross-validation of component models: A critical look at current methods. Analytical and Bioanalytical Chemistry, 390(5), 1241-1251. doi:10.1007/s00216-007-1790-1 es_ES
dc.description.references Wise, B. M., Gallagher, N. B., Bro, R., Shaver, J. M., Windig, W., & Koch, R. S. (2005). PLS_Toolbox 3.5 for use with MATLAB. es_ES
dc.description.references Nelson, P. R. C., Taylor, P. A., & MacGregor, J. F. (1996). Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemometrics and Intelligent Laboratory Systems, 35(1), 45-65. doi:10.1016/s0169-7439(96)00007-x es_ES
dc.description.references Arteaga, F., & Ferrer, A. (2002). Dealing with missing data in MSPC: several methods, different interpretations, some examples. Journal of Chemometrics, 16(8-10), 408-418. doi:10.1002/cem.750 es_ES
dc.description.references Arteaga, F., & Ferrer, A. (2005). Framework for regression-based missing data imputation methods in on-line MSPC. Journal of Chemometrics, 19(8), 439-447. doi:10.1002/cem.946 es_ES
dc.description.references Zhang, P. (1993). Model Selection Via Multifold Cross Validation. The Annals of Statistics, 21(1), 299-313. doi:10.1214/aos/1176349027 es_ES
dc.description.references Louwerse, D. J., Smilde, A. K., & Kiers, H. A. L. (1999). Cross-validation of multiway component models. Journal of Chemometrics, 13(5), 491-510. doi:10.1002/(sici)1099-128x(199909/10)13:5<491::aid-cem537>3.0.co;2-2 es_ES
dc.description.references Lei, F., Rotbøll, M., & Jørgensen, S. B. (2001). A biochemically structured model for Saccharomyces cerevisiae. Journal of Biotechnology, 88(3), 205-221. doi:10.1016/s0168-1656(01)00269-3 es_ES
dc.description.references López, F., Miguel Valiente, J., Manuel Prats, J., & Ferrer, A. (2008). Performance evaluation of soft color texture descriptors for surface grading using experimental design and logistic regression. Pattern Recognition, 41(5), 1744-1755. doi:10.1016/j.patcog.2007.09.011 es_ES
dc.description.references Camacho, J., Picó, J., & Ferrer, A. (2010). Data understanding with PCA: Structural and Variance Information plots. Chemometrics and Intelligent Laboratory Systems, 100(1), 48-56. doi:10.1016/j.chemolab.2009.10.005 es_ES
dc.description.references Mercer, A. M., & Mercer, P. R. (2000). Cauchy’s interlace theorem and lower bounds for the spectral radius. International Journal of Mathematics and Mathematical Sciences, 23(8), 563-566. doi:10.1155/s016117120000257x es_ES
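The abstract describes ekf as a row-wise split into training and test observations combined with missing-data imputation on the test rows, analyzed here with trimmed score (TRI) imputation. The Python sketch below illustrates that idea under stated assumptions. For each test row, the scores are computed from the observed variables and their loadings only, t = x_obs P_obs. The left-out variables are then reconstructed as t P_miss^T, and the squared errors are accumulated into a PRESS value per number of components. It is not the PLS_Toolbox implementation or the authors' code. The function names `ekf_press_tri` and `pca_loadings`, the contiguous fold layout, and the mean-centring-only preprocessing are hypothetical choices made for illustration.

```python
import numpy as np


def pca_loadings(Xc, n_comp):
    """Return the first n_comp PCA loadings (as columns) of mean-centred Xc."""
    # The right singular vectors of the centred data are the PCA loadings.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_comp].T  # shape (n_vars, at most n_comp)


def ekf_press_tri(X, max_comp, n_row_folds=5, n_var_folds=5):
    """Hypothetical sketch of ekf PCA cross-validation with trimmed score (TRI)
    imputation. Returns PRESS[a - 1] for a = 1..max_comp components."""
    X = np.asarray(X, dtype=float)
    n_obs, n_vars = X.shape
    # Assumption: contiguous blocks of rows (test sets) and of variables
    # (groups left out together); other fold layouts are equally possible.
    row_folds = np.array_split(np.arange(n_obs), min(n_row_folds, n_obs))
    var_folds = np.array_split(np.arange(n_vars), min(n_var_folds, n_vars))
    press = np.zeros(max_comp)

    for test_rows in row_folds:
        # Fit the PCA model on the training rows only (mean-centring assumed).
        train = np.delete(X, test_rows, axis=0)
        mean = train.mean(axis=0)
        P_full = pca_loadings(train - mean, max_comp)
        if P_full.shape[1] < max_comp:
            raise ValueError("max_comp exceeds min(n_train, n_vars) in a fold")
        X_test = X[test_rows] - mean

        for a in range(1, max_comp + 1):
            P = P_full[:, :a]
            for miss in var_folds:
                obs = np.setdiff1d(np.arange(n_vars), miss)
                # TRI: scores from the observed variables and their loadings
                # only, i.e. the left-out entries are set to the training mean.
                T = X_test[:, obs] @ P[obs]
                # Reconstruct the left-out variables from those scores.
                X_hat = T @ P[miss].T
                press[a - 1] += np.sum((X_test[:, miss] - X_hat) ** 2)
    return press


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
print(ekf_press_tri(X, max_comp=3))
```

Because P_obs^T P_obs is generally not the identity matrix, the TRI scores differ from the scores obtained with all variables, even for noise-free low-rank data. The PRESS curve from this sketch should therefore be read with that bias in mind.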
