
EHRtemporalVariability: delineating temporal data-set shifts in electronic health records

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Sáez Silvestre, Carlos es_ES
dc.contributor.author Gutiérrez-Sacristán, Alba es_ES
dc.contributor.author Kohane, Isaac es_ES
dc.contributor.author Garcia-Gomez, Juan M es_ES
dc.contributor.author Avillach, Paul es_ES
dc.date.accessioned 2021-05-28T03:33:51Z
dc.date.available 2021-05-28T03:33:51Z
dc.date.issued 2020-07-30 es_ES
dc.identifier.uri http://hdl.handle.net/10251/166908
dc.description.abstract [EN] Background: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce data-set shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population research and data-driven research such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal data-set shifts to ensure reliable data reuse. Results: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal data-set shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate data-set shifts in three impact case studies, one of which is available for reproducibility. Conclusions: EHRtemporalVariability enables the exploration and identification of data-set shifts, contributing to the broad examination and repurposing of large, longitudinal data sets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is designed both for technical users, who can use the R package programmatically, and for users unfamiliar with programming, who can use the Shiny user interface. es_ES
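As a rough illustration of the pipeline the abstract describes (per-period distributions, a non-parametric distance between them, and a low-dimensional projection of their temporal evolution), the Python sketch below is offered under explicit assumptions. It is not the EHRtemporalVariability R API: the function names `monthly_distributions`, `js_distance`, `js_distance_matrix` and `classical_mds` are hypothetical, the Jensen-Shannon distance and classical multidimensional scaling are assumed choices standing in for the package's information geometric temporal plot, and the toy data are invented. The `probs` matrix, read as time by category, is the kind of matrix a data temporal heat map would color.

```python
import numpy as np


def monthly_distributions(months, codes, categories):
    """Hypothetical helper: relative frequency of each category per time batch.

    Returns the sorted batch labels and a (batches x categories) matrix whose
    rows sum to 1. Transposed, this is what a temporal heat map would display.
    """
    batches = sorted(set(months))
    row = {b: i for i, b in enumerate(batches)}
    col = {c: j for j, c in enumerate(categories)}
    counts = np.zeros((len(batches), len(categories)))
    for m, c in zip(months, codes):
        counts[row[m], col[c]] += 1
    return batches, counts / counts.sum(axis=1, keepdims=True)


def js_distance(p, q):
    """Jensen-Shannon distance (base 2, so values lie in [0, 1]); an assumed metric."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) is taken as 0; b > 0 wherever a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))


def js_distance_matrix(probs):
    """Symmetric matrix of pairwise distances between batch distributions."""
    n = probs.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for k in range(i + 1, n):
            d[i, k] = d[k, i] = js_distance(probs[i], probs[k])
    return d


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: embed points so Euclidean distances approximate d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    b = -0.5 * j @ (d ** 2) @ j          # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]   # keep the k largest eigenvalues
    vals = np.clip(vals[order], 0, None)
    return vecs[:, order] * np.sqrt(vals)


# Invented toy data: four monthly batches of a coded variable whose dominant
# code switches from "A" to "B" between month 2 and month 3.
months = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4
codes = ["A", "A", "A", "B"] * 2 + ["B", "B", "B", "A"] * 2
batches, probs = monthly_distributions(months, codes, ["A", "B"])
d = js_distance_matrix(probs)
coords = classical_mds(d)  # one 2-D point per month
```

In the toy data, months 1 and 2 collapse onto one point and months 3 and 4 onto another, separated by the Jensen-Shannon distance between the two regimes, which is the kind of abrupt change such a plot is meant to expose. Classical MDS is used here only because it is short to write; any embedding that preserves the pairwise distances would serve the same illustrative purpose.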
dc.description.sponsorship This work was supported by Universitat Politècnica de València grant PAID-00-17, Generalitat Valenciana grant BEST/2018, and projects H2020-SC1-2016-CNECT No. 727560 and H2020-SC1-BHC-2018-2020 No. 825750 es_ES
dc.language English es_ES
dc.publisher Oxford University Press es_ES
dc.relation.ispartof GigaScience es_ES
dc.rights All rights reserved es_ES
dc.subject Data-set shifts es_ES
dc.subject Data quality es_ES
dc.subject Temporal variability es_ES
dc.subject Scientific data sets es_ES
dc.subject Electronic health records es_ES
dc.subject Claims data es_ES
dc.subject Research repositories es_ES
dc.subject Information geometry es_ES
dc.subject Visual analytics es_ES
dc.subject R package es_ES
dc.subject.classification APPLIED PHYSICS es_ES
dc.title EHRtemporalVariability: delineating temporal data-set shifts in electronic health records es_ES
dc.type Article es_ES
dc.identifier.doi 10.1093/gigascience/giaa079 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/727560/EU/Collective wisdom driving public health policies/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/UPV//PAID-00-17/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/825750/EU/Patient-centred pathways of early palliative care, supportive ecosystems and appraisal standard/ es_ES
dc.rights.accessRights Open access es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Física Aplicada - Departament de Física Aplicada es_ES
dc.description.bibliographicCitation Sáez Silvestre, C.; Gutiérrez-Sacristán, A.; Kohane, I.; Garcia-Gomez, JM.; Avillach, P. (2020). EHRtemporalVariability: delineating temporal data-set shifts in electronic health records. GigaScience. 9(8):1-7. https://doi.org/10.1093/gigascience/giaa079 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1093/gigascience/giaa079 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 7 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 9 es_ES
dc.description.issue 8 es_ES
dc.identifier.eissn 2047-217X es_ES
dc.identifier.pmid 32729900 es_ES
dc.identifier.pmcid PMC7391413 es_ES
dc.relation.pasarela S\418366 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Universitat Politècnica de València es_ES