dc.contributor.author | Camacho Páez, José | es_ES |
dc.contributor.author | Ferrer Riquelme, Alberto José | es_ES |
dc.date.accessioned | 2016-05-26T12:08:42Z | |
dc.date.available | 2016-05-26T12:08:42Z | |
dc.date.issued | 2012-07 | |
dc.identifier.issn | 0886-9383 | |
dc.identifier.uri | http://hdl.handle.net/10251/64795 | |
dc.description.abstract | [EN] Cross-validation has become one of the principal methods to adjust the meta-parameters in predictive models. Extensions of the cross-validation idea have been proposed to select the number of components in principal components analysis (PCA). The element-wise k-fold (ekf) cross-validation is among the most used algorithms for principal components analysis cross-validation. This is the method programmed in the PLS_Toolbox, and it has been reported to outperform other methods under most circumstances in a numerical experiment. The ekf algorithm is based on missing data imputation, and it can be programmed using any method for this purpose. In this paper, the ekf algorithm with the simplest missing data imputation method, trimmed score imputation, is analyzed. A theoretical study is conducted to identify in which situations the application of ekf is adequate and, more importantly, in which situations it is not. The results presented show that the ekf method may be unable to assess the extent to which a model represents a test set and may lead to discarding principal components with important information. In a second paper of this series, other imputation methods are studied within the ekf algorithm. | es_ES |
dc.description.sponsorship | Research in this area is partially supported by the Spanish Ministry of Economy and Competitiveness and FEDER funds from the European Union through grant DPI2011-28112-C04-02. Jose Camacho was funded by the Juan de la Cierva program, Ministry of Science and Innovation, Spain. This study was carried out when Jose Camacho was at the Universidad Politecnica de Valencia and Universitat de Girona, Spain. | en_EN |
dc.language | English | es_ES |
dc.publisher | Wiley | es_ES |
dc.relation.ispartof | Journal of Chemometrics | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | Principal component analysis | es_ES |
dc.subject | Number of components | es_ES |
dc.subject | Cross-validation | es_ES |
dc.subject | Missing data | es_ES |
dc.subject | Compression | es_ES |
dc.subject.classification | ESTADISTICA E INVESTIGACION OPERATIVA | es_ES |
dc.subject.classification | INGENIERIA DE SISTEMAS Y AUTOMATICA | es_ES |
dc.title | Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: theoretical aspects | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1002/cem.2440 | |
dc.relation.projectID | info:eu-repo/grantAgreement/MICINN//DPI2011-28112-C04-02/ES/MONITORIZACION, INFERENCIA, OPTIMIZACION Y CONTROL MULTI-ESCALA: DE CELULAS A BIORREACTORES. (MULTISCALES)/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Ingeniería de Sistemas y Automática - Departament d'Enginyeria de Sistemes i Automàtica | es_ES |
dc.description.bibliographicCitation | Camacho Páez, J.; Ferrer Riquelme, AJ. (2012). Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: theoretical aspects. Journal of Chemometrics. 26(1):361-373. https://doi.org/10.1002/cem.2440 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://dx.doi.org/10.1002/cem.2440 | es_ES |
dc.description.upvformatpinicio | 361 | es_ES |
dc.description.upvformatpfin | 373 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 26 | es_ES |
dc.description.issue | 1 | es_ES |
dc.relation.senia | 242711 | es_ES |
dc.contributor.funder | Ministerio de Ciencia e Innovación | es_ES |
dc.contributor.funder | Universitat de Girona | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.contributor.funder | Universitat Politècnica de València | es_ES |
dc.description.references | Wold, S. (1978). Cross-Validatory Estimation of the Number of Components in Factor and Principal Components Models. Technometrics, 20(4), 397-405. doi:10.1080/00401706.1978.10489693 | es_ES |
dc.description.references | Eastment, H. T., & Krzanowski, W. J. (1982). Cross-Validatory Choice of the Number of Components From a Principal Component Analysis. Technometrics, 24(1), 73-77. doi:10.1080/00401706.1982.10487712 | es_ES |
dc.description.references | Nomikos, P., & MacGregor, J. F. (1995). Multivariate SPC Charts for Monitoring Batch Processes. Technometrics, 37(1), 41-59. doi:10.1080/00401706.1995.10485888 | es_ES |
dc.description.references | Bro, R., Kjeldahl, K., Smilde, A. K., & Kiers, H. A. L. (2008). Cross-validation of component models: A critical look at current methods. Analytical and Bioanalytical Chemistry, 390(5), 1241-1251. doi:10.1007/s00216-007-1790-1 | es_ES |
dc.description.references | Wise, B. M., Gallagher, N. B., Bro, R., Shaver, J. M., Windig, W., & Koch, R. S. (2005). PLS_Toolbox 3.5 for use with MATLAB. | es_ES |
dc.description.references | Nelson, P. R. C., Taylor, P. A., & MacGregor, J. F. (1996). Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemometrics and Intelligent Laboratory Systems, 35(1), 45-65. doi:10.1016/s0169-7439(96)00007-x | es_ES |
dc.description.references | Arteaga, F., & Ferrer, A. (2002). Dealing with missing data in MSPC: several methods, different interpretations, some examples. Journal of Chemometrics, 16(8-10), 408-418. doi:10.1002/cem.750 | es_ES |
dc.description.references | Arteaga, F., & Ferrer, A. (2005). Framework for regression-based missing data imputation methods in on-line MSPC. Journal of Chemometrics, 19(8), 439-447. doi:10.1002/cem.946 | es_ES |
dc.description.references | Zhang, P. (1993). Model Selection Via Multifold Cross Validation. The Annals of Statistics, 21(1), 299-313. doi:10.1214/aos/1176349027 | es_ES |
dc.description.references | Louwerse, D. J., Smilde, A. K., & Kiers, H. A. L. (1999). Cross-validation of multiway component models. Journal of Chemometrics, 13(5), 491-510. doi:10.1002/(sici)1099-128x(199909/10)13:5<491::aid-cem537>3.0.co;2-2 | es_ES |
dc.description.references | Lei, F., Rotbøll, M., & Jørgensen, S. B. (2001). A biochemically structured model for Saccharomyces cerevisiae. Journal of Biotechnology, 88(3), 205-221. doi:10.1016/s0168-1656(01)00269-3 | es_ES |
dc.description.references | López, F., Miguel Valiente, J., Manuel Prats, J., & Ferrer, A. (2008). Performance evaluation of soft color texture descriptors for surface grading using experimental design and logistic regression. Pattern Recognition, 41(5), 1744-1755. doi:10.1016/j.patcog.2007.09.011 | es_ES |
dc.description.references | Camacho, J., Picó, J., & Ferrer, A. (2010). Data understanding with PCA: Structural and Variance Information plots. Chemometrics and Intelligent Laboratory Systems, 100(1), 48-56. doi:10.1016/j.chemolab.2009.10.005 | es_ES |
dc.description.references | Mercer, A. M., & Mercer, P. R. (2000). Cauchy’s interlace theorem and lower bounds for the spectral radius. International Journal of Mathematics and Mathematical Sciences, 23(8), 563-566. doi:10.1155/s016117120000257x | es_ES |
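
The abstract describes ekf as cross-validation built on missing data imputation, with trimmed score imputation as the simplest choice. The following is a minimal sketch of that idea, not the authors' code and not the PLS_Toolbox implementation. It assumes mean-centering with the calibration mean, random row folds, and one column group per variable (each variable is left out in turn); the function name `ekf_press` and its parameters are hypothetical.

```python
import numpy as np


def ekf_press(X, max_components, row_folds=5, seed=0):
    """Return PRESS for 0..max_components PCs (illustrative ekf with
    trimmed score imputation; all names here are assumptions)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), row_folds)
    press = np.zeros(max_components + 1)

    for test_idx in folds:
        # Calibration rows: fit the PCA loadings without the test rows.
        train = np.delete(X, test_idx, axis=0)
        mean = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        test = X[test_idx] - mean

        for a in range(max_components + 1):
            P = Vt[:a].T  # loadings, shape (m, a)
            for j in range(m):
                # Treat variable j of the test rows as missing. Trimmed
                # score imputation projects the remaining variables only,
                # which equals setting the missing entry to zero.
                masked = test.copy()
                masked[:, j] = 0.0
                scores = masked @ P
                pred_j = scores @ P[j]  # reconstruction of variable j
                press[a] += np.sum((test[:, j] - pred_j) ** 2)
    return press


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = rng.normal(size=(40, 1))
    X = t @ np.ones((1, 8)) + 0.05 * rng.normal(size=(40, 8))
    print(ekf_press(X, max_components=3))
```

In this sketch the number of components would be read off the PRESS curve. The abstract's caution applies: with trimmed score imputation this curve may not reflect how well the model represents the test set.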