Souza, J.; Caballero, I.; Vasco Santos, J.; Lobo, M.; Pinto, A.; Viana, J.; Sáez Silvestre, C.... (2022). Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications. Journal of Biomedical Informatics. 136:1-11. https://doi.org/10.1016/j.jbi.2022.104242
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/194472

Title:
Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications

Authors:
Souza, Júlio
Caballero, Ismael
Vasco Santos, Joao
Lobo, Mariana
Pinto, Andreia
Viana, Joao
Sáez Silvestre, Carlos
Lopes, Fernando
Freitas, Alberto

UPV entity:
Universitat Politècnica de València. Escuela Técnica Superior de Ingenieros Industriales - Escola Tècnica Superior d'Enginyers Industrials

Release date:

Abstract:
Background: Unexpected variability across healthcare datasets may indicate data quality issues and thereby affect the credibility of these data for reuse. No gold-standard reference dataset or methods for variability assessment are usually available for these datasets. In this study, we aim to describe the process of discovering data quality implications by applying a set of methods for assessing variability between sources and over time in a large hospital database.

Methods: We described and applied a set of multisource and temporal variability assessment methods to a large Portuguese hospitalization database, assessing variation in condition-specific hospitalization ratios derived from clinically coded data between hospitals (sources) and over time. We identified condition-specific admissions using the Clinical Classifications Software (CCS), developed by the Agency for Healthcare Research and Quality. A Statistical Process Control (SPC) approach based on funnel plots of condition-specific standardized hospitalization ratios (SHR) was used to assess multisource variability, whereas temporal heat maps and Information-Geometric Temporal (IGT) plots were used to assess temporal variability by displaying abrupt changes in data distributions over time. Results are presented for the 15 most common inpatient conditions (CCS categories) in Portugal.

Main findings: Funnel plot assessment allowed the detection of several outlying hospitals whose SHRs were much lower or higher than expected. Adjusting SHRs for hospital characteristics, beyond age and sex, considerably affected the degree of multisource variability for most diseases. Overall, probability distributions changed over time for most diseases, although heterogeneously. Abrupt temporal changes in data distributions for acute myocardial infarction and congestive heart failure coincided with the periods comprising the transition to the International Classification of Diseases, 10th Revision, Clinical Modification, whereas changes in the Diagnosis-Related Groups software seem to have driven changes in data distributions for both acute myocardial infarction and liveborn admissions. The analysis of heat maps also allowed the detection of several discontinuities at hospital level over time, in some cases also coinciding with the aforementioned factors.

Conclusions: This paper describes the successful application of a set of reproducible, generalizable and systematic methods for variability assessment, including visualization tools that can be useful for detecting abnormal patterns in healthcare data, while also addressing some limitations of common approaches. The presented method for multisource variability assessment is based on SPC, which is an advantage given the lack of a gold standard for such a process. Properly controlling for hospital characteristics and differences in case-mix when estimating SHRs is critical for isolating data quality-related variability among data sources. IGT plots offer an advantage over common methods for temporal variability assessment due to their suitability for multitype and multimodal data, which are common characteristics of healthcare data. The novelty of this work lies in using a set of methods to discover new data quality insights in healthcare data.
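
The multisource step (SHR plus funnel plot) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on stated assumptions: the SHR is taken as observed over expected condition-specific admissions, expected counts come from indirect standardization over hypothetical (age band, sex) strata using pooled reference shares, and the funnel limits use the common normal approximation to Poisson limits, 1 ± z/√E, at an assumed 99.8% coverage. The function names, strata and toy numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def expected_count(admissions_by_stratum, reference_rates):
    """Indirectly standardized expected number of condition-specific admissions.

    admissions_by_stratum: {stratum: all admissions at this hospital}
    reference_rates: {stratum: pooled share of admissions with the condition}
    """
    return sum(n * reference_rates[s] for s, n in admissions_by_stratum.items())


def funnel_limits(expected, coverage=0.998):
    """Normal approximation to Poisson funnel limits around a target ratio of 1."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z / sqrt(expected)
    return max(0.0, 1 - half_width), 1 + half_width


def assess_hospital(observed, expected, coverage=0.998):
    """Return the SHR, its funnel limits, and whether it falls outside them."""
    shr = observed / expected
    lower, upper = funnel_limits(expected, coverage)
    return shr, lower, upper, not (lower <= shr <= upper)


# Hypothetical (age band, sex) strata and pooled reference shares.
rates = {("<65", "M"): 0.02, ("<65", "F"): 0.01, ("65+", "M"): 0.10, ("65+", "F"): 0.08}
hospital = {("<65", "M"): 1000, ("<65", "F"): 1500, ("65+", "M"): 500, ("65+", "F"): 600}
exp_count = expected_count(hospital, rates)
print(assess_hospital(observed=190, expected=exp_count))
```

Plotting each hospital's SHR against its expected count, with the limits drawn over a range of expected values, produces the funnel shape; the limits narrow as the expected count grows. In this sketch, adjusting for hospital characteristics beyond age and sex would only change how the expected counts are computed, not the limit formula.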
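
The temporal step can be sketched in the same hedged way. IGT plots are typically built by computing the Jensen-Shannon distance between the probability distributions of every pair of temporal batches (e.g. monthly admissions) and embedding the batches as points in two dimensions so that their pairwise distances approximate those Jensen-Shannon distances; abrupt changes then appear as jumps between clusters of points. The sketch below assumes categorical data given as per-batch category counts and uses classical (Torgerson) multidimensional scaling via NumPy; it is not the authors' implementation, and the function names and toy data are hypothetical.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance (base 2, bounded in [0, 1]) between two
    discrete distributions given as counts over the same categories."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(jsd, 0.0)))  # guard against tiny negative round-off


def igt_coordinates(batches, dims=2):
    """Embed temporal batches in `dims` dimensions with classical MDS
    applied to their pairwise Jensen-Shannon distances."""
    n = len(batches)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = js_distance(batches[i], batches[j])
    # Double-centre the squared distance matrix.
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:dims]
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return d, eigvecs[:, order] * scale


# Toy series: six months with one category mix, then six months with a
# different mix (a simulated coding change between months 6 and 7).
before = [[50, 30, 20]] * 6
after = [[20, 30, 50]] * 6
distances, coords = igt_coordinates(before + after)
```

Drawing `coords` as points labelled by month and joined in temporal order gives the IGT-style view: here the two periods collapse into two separated clusters, the pattern one would look for around events such as a classification or grouper software change.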

Keywords:
Data quality, Clinical coding, Data variability, Clinical classification software, International classification of diseases

Rights:
Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Source:
Journal of Biomedical Informatics (ISSN: 1532-0464)

DOI:
10.1016/j.jbi.2022.104242

Publisher:
Elsevier

Publisher's version:
https://doi.org/10.1016/j.jbi.2022.104242

Project codes:
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-112540RB-C42/ES/UNA APROXIMACION HOLISTICA DE SMART DATA PARA EL ANALISIS DE DATOS GUIADO POR EL CONTEXTO CENTRADA EN LA CALIDAD Y LA SEGURIDAD /
info:eu-repo/grantAgreement/UPV//UPV-SUB.2-1302/
info:eu-repo/grantAgreement/FEDER//POCI-01-0145-FEDER-030766//1st.IndiQare-Quality indicators in primary health care: validation and implementation of quality indicators as an assessment and comparison tool/
info:eu-repo/grantAgreement/JCCM//SBPLY%2F17%2F180501%2F000293//GEMA-Generation and Evaluation of Models for Data Quality/
info:eu-repo/grantAgreement/JCCM//SBPLY%2F21%2F180501%2F000061//ADAGIO Alarcos Data Governance framework and systems generation/

Acknowledgements:
The authors would like to thank the Central Authority for Health Services, I.P. (ACSS) for providing access to the data. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financed by FEDER (Fundo Europeu de Desenvolvimento Regional) funds through COMPETE 2020, the Operational Programme for Competitiveness and Internationalisation (POCI), and by Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia) in the framework of the project POCI-01-0145-FEDER-030766 ("1st.IndiQare-Quality indicators in primary health care: validation and implementation of quality indicators as an assessment and comparison tool"). In addition, we would like to thank the projects GEMA (SBPLY/17/180501/000293), Generation and Evaluation of Models for Data Quality, and ADAGIO (SBPLY/21/180501/000061), Alarcos Data Governance framework and systems generation, both funded by the Department of Education, Culture and Sports of the JCCM and FEDER; and the project AETHER-UCLM: A smart data holistic approach for context-aware data analytics focused on Quality and Security (Ministerio de Ciencia e Innovación, PID2020-112540RB-C42). CSS thanks the Universitat Politècnica de València (contract no. UPV-SUB.2-1302) and FONDO SUPERA COVID-19 by the CRUE-Santander Bank grant "Severity Subgroup Discovery and Classification on COVID-19 Real World Data through Machine Learning and Data Quality assessment (SUBCOVERWD-19)."

Type:
Article