dc.contributor.author | Sáez Silvestre, Carlos | es_ES |
dc.contributor.author | Romero, Nekane | es_ES |
dc.contributor.author | Conejero, J. Alberto | es_ES |
dc.contributor.author | Garcia-Gomez, Juan M | es_ES |
dc.date.accessioned | 2022-11-14T19:02:07Z | |
dc.date.available | 2022-11-14T19:02:07Z | |
dc.date.issued | 2021-02 | es_ES |
dc.identifier.issn | 1067-5027 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/189724 | |
dc.description.abstract | [EN] Objective: The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. Materials and Methods: We used the publicly available nCov2019 dataset, including patient-level data from several countries. We aimed at the discovery and classification of severity subgroups using symptoms and comorbidities. Results: Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model target populations and increase model complexity at the risk of overfitting. Conclusions: Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning. | es_ES |
dc.description.sponsorship | This work was supported by Universitat Politècnica de València contract no. UPV-SUB.2-1302 and FONDO SUPERA COVID-19 by CRUE-Santander Bank grant "Severity Subgroup Discovery and Classification on COVID-19 Real World Data through Machine Learning and Data Quality assessment (SUBCOVERWD-19)." | es_ES |
dc.language | English | es_ES |
dc.publisher | Oxford University Press | es_ES |
dc.relation.ispartof | Journal of the American Medical Informatics Association | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | COVID-19 | es_ES |
dc.subject | Data quality | es_ES |
dc.subject | Machine learning | es_ES |
dc.subject | Biases | es_ES |
dc.subject | Data sharing | es_ES |
dc.subject | Distributed research networks | es_ES |
dc.subject | Multi-site data | es_ES |
dc.subject | Variability | es_ES |
dc.subject | Heterogeneity | es_ES |
dc.subject | Dataset shift | es_ES |
dc.subject.classification | APPLIED MATHEMATICS | es_ES |
dc.subject.classification | APPLIED PHYSICS | es_ES |
dc.title | Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1093/jamia/ocaa258 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/UPV//UPV-SUB.2-1302/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escuela Técnica Superior de Ingenieros Industriales - Escola Tècnica Superior d'Enginyers Industrials | es_ES |
dc.description.bibliographicCitation | Sáez Silvestre, C.; Romero, N.; Conejero, JA.; Garcia-Gomez, JM. (2021). Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset. Journal of the American Medical Informatics Association. 28(2):360-364. https://doi.org/10.1093/jamia/ocaa258 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1093/jamia/ocaa258 | es_ES |
dc.description.upvformatpinicio | 360 | es_ES |
dc.description.upvformatpfin | 364 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 28 | es_ES |
dc.description.issue | 2 | es_ES |
dc.identifier.pmid | 33027509 | es_ES |
dc.identifier.pmcid | PMC7797735 | es_ES |
dc.relation.pasarela | S\435767 | es_ES |
dc.contributor.funder | BANCO SANTANDER, S.A. | es_ES |
dc.contributor.funder | Universitat Politècnica de València | es_ES |
dc.subject.ods | 03.- Ensure healthy lives and promote well-being for all at all ages | es_ES |
dc.subject.ods | 10.- Reduce inequality within and among countries | es_ES |