
Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia



dc.contributor.author Hayet-Otero, Miren es_ES
dc.contributor.author García-García, Fernando es_ES
dc.contributor.author Lee, Dae-Jin es_ES
dc.contributor.author Martínez-Minaya, Joaquín es_ES
dc.contributor.author España Yandiola, Pedro Pablo es_ES
dc.contributor.author Urrutia Landa, Isabel es_ES
dc.contributor.author Nieves Ermecheo, Mónica es_ES
dc.contributor.author Quintana, José María es_ES
dc.contributor.author Menéndez, Rosario es_ES
dc.contributor.author Torres, Antoni es_ES
dc.contributor.author Zalacain Jorge, Rafael es_ES
dc.contributor.author Arostegui, Inmaculada es_ES
dc.date.accessioned 2024-10-23T18:08:56Z
dc.date.available 2024-10-23T18:08:56Z
dc.date.issued 2023-04-13 es_ES
dc.identifier.issn 1932-6203 es_ES
dc.identifier.uri http://hdl.handle.net/10251/210810
dc.description.abstract [EN] With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration of which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML). In particular, feature selection (FS) techniques, designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis. We conducted a multi-centre clinical study, enrolling n = 1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 792, 238, and 518 experienced low, medium and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing > 60% missing values. Alongside 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d = 148 features after variable encoding. We addressed this ordinal classification problem both as an ML classification task and as a regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers and 20 embedded methods. Of these, 21 setups achieved satisfactory bootstrap stability (> 0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embedded methods. The subsets of features selected by each technique showed modest Jaccard similarities to one another. However, they consistently pointed to the importance of certain explanatory variables, namely: the patient's C-reactive protein (CRP), pneumonia severity index (PSI), respiratory rate (RR) and oxygen levels (saturation SpO2, quotients SpO2/RR and arterial SatO2/FiO2), the neutrophil-to-lymphocyte ratio (NLR; to a certain extent, also neutrophil and lymphocyte counts separately), lactate dehydrogenase (LDH), and procalcitonin (PCT) levels in blood. A remarkable agreement has been found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives. es_ES
dc.description.sponsorship This research is supported by the Spanish State Research Agency AEI under the project S3M1P4R PID2020-115882RB-I00, as well as by the Basque Government EJ-GV under the grant 'Artificial Intelligence in BCAM' 2019/00432, under the strategy 'Mathematical Modelling Applied to Health', and under the BERC 2018-2021 and 2022-2025 programmes, and also by the Spanish Ministry of Science and Innovation: BCAM Severo Ochoa accreditation CEX2021-001142-S/MICIN/AEI/10.13039/501100011033. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. es_ES
dc.language English es_ES
dc.publisher Public Library of Science es_ES
dc.relation.ispartof PLoS ONE es_ES
dc.rights Attribution (by) es_ES
dc.subject COVID-19 pandemic es_ES
dc.subject SARS-CoV-2 pneumonia es_ES
dc.subject Machine learning (ML) es_ES
dc.subject Feature selection (FS) es_ES
dc.subject Pneumonia severity prediction es_ES
dc.subject Clinical variables es_ES
dc.subject.classification STATISTICS AND OPERATIONS RESEARCH es_ES
dc.title Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques es_ES
dc.type Article es_ES
dc.identifier.doi 10.1371/journal.pone.0284150 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-115882RB-I00/ES/NUEVAS PROPUESTAS PARA LA ESTIMACION, PREDICCION Y VALIDACION DE MODELOS SEMIPARAMETRICOS PARA EL ANALISIS DE DATOS COMPLEJOS CON APLICACIONES EN SALUD Y CAMBIO CLIMATICO/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/Eusko Jaurlaritza//2019%2F00432/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI//CEX2021-001142-S / es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Facultad de Administración y Dirección de Empresas - Facultat d'Administració i Direcció d'Empreses es_ES
dc.description.bibliographicCitation Hayet-Otero, M.; García-García, F.; Lee, D.; Martínez-Minaya, J.; España Yandiola, PP.; Urrutia Landa, I.; Nieves Ermecheo, M.... (2023). Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques. PLoS ONE. 18(4). https://doi.org/10.1371/journal.pone.0284150 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1371/journal.pone.0284150 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 18 es_ES
dc.description.issue 4 es_ES
dc.identifier.pmid 37053151 es_ES
dc.identifier.pmcid PMC10101453 es_ES
dc.relation.pasarela S\487629 es_ES
dc.contributor.funder Eusko Jaurlaritza es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder Ministerio de Ciencia, Innovación y Universidades es_ES

