Applying probabilistic temporal and multi-site data quality control methods to a public health mortality registry in Spain: A systematic approach to quality control of repositories

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Sáez Silvestre, Carlos es_ES
dc.contributor.author Zurriaga, Oscar es_ES
dc.contributor.author Pérez-Panadés, Jordi es_ES
dc.contributor.author Melchor, Inma es_ES
dc.contributor.author Robles Viejo, Montserrat es_ES
dc.contributor.author García Gómez, Juan Miguel es_ES
dc.date.accessioned 2017-05-11T08:46:41Z
dc.date.available 2017-05-11T08:46:41Z
dc.date.issued 2016-04-23
dc.identifier.issn 1067-5027
dc.identifier.uri http://hdl.handle.net/10251/80897
dc.description.abstract OBJECTIVE: To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository as a systematic approach to data quality (DQ). MATERIALS AND METHODS: Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512 143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes with time. The methods are suited to big data and multitype, multivariate, and multimodal data. RESULTS: The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Punctual temporal anomalies were noticed due to a punctual increment in the missing data, along with outlying and clustered health departments due to differences in populations or in practices. DISCUSSION: Changes in protocols, differences in populations, biased practices, or other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed. CONCLUSION: Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be a part of systematic DQ procedures. es_ES
dc.description.sponsorship This work was supported by the Spanish Ministry of Economy and Competitiveness grant numbers RTC-2014-1530-1 and TIN-2013-43457-R, and by the Universitat Politècnica de València grant number SP20141432. en_EN
dc.language English es_ES
dc.publisher Oxford University Press (OUP) es_ES
dc.relation.ispartof Journal of the American Medical Informatics Association es_ES
dc.rights All rights reserved es_ES
dc.subject data mining es_ES
dc.subject data monitoring es_ES
dc.subject data quality es_ES
dc.subject data reuse es_ES
dc.subject multisite repositories es_ES
dc.subject statistical data analysis es_ES
dc.subject.classification STATISTICS AND OPERATIONS RESEARCH es_ES
dc.subject.classification APPLIED PHYSICS es_ES
dc.title Applying probabilistic temporal and multi-site data quality control methods to a public health mortality registry in Spain: A systematic approach to quality control of repositories es_ES
dc.type Article es_ES
dc.identifier.doi 10.1093/jamia/ocw010
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//RTC-2014-1530-1Q4618002BC.VALENCIANA/ES/Servicio de evaluación y rating de la calidad de repositorios de datos biomédicos/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/692023/EU/Linking excellence in biomedical knowledge and computational intelligence research for personalized management of CVD within PHC/
dc.relation.projectID info:eu-repo/grantAgreement/UPV//SP20141432/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2013-43457-R/ES/CARACTERIZACION DE FIRMAS BIOLOGICAS DE GLIOBLASTOMAS MEDIANTE MODELOS NO-SUPERVISADOS DE PREDICCION ESTRUCTURADA BASADOS EN BIOMARCADORES DE IMAGEN/
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escuela Técnica Superior de Ingenieros Industriales - Escola Tècnica Superior d'Enginyers Industrials es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Sáez Silvestre, C.; Zurriaga, O.; Pérez-Panadés, J.; Melchor, I.; Robles Viejo, M.; García Gómez, JM. (2016). Applying probabilistic temporal and multi-site data quality control methods to a public health mortality registry in Spain: A systematic approach to quality control of repositories. Journal of the American Medical Informatics Association. 23(6):1085-1095. https://doi.org/10.1093/jamia/ocw010 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1093/jamia/ocw010 es_ES
dc.description.upvformatpinicio 1085 es_ES
dc.description.upvformatpfin 1095 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 23 es_ES
dc.description.issue 6 es_ES
dc.relation.senia 307797 es_ES
dc.identifier.eissn 1527-974X
dc.identifier.pmid 27107447
dc.contributor.funder Universitat Politècnica de València es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES