
Exploring Hybrid Parallel Systems for Probabilistic Record Linkage

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

dc.contributor.author Boratto, Murilo es_ES
dc.contributor.author Alonso-Jordá, Pedro es_ES
dc.contributor.author Pinto, Clicia es_ES
dc.contributor.author Melo, Pedro es_ES
dc.contributor.author Barreto, Marcos es_ES
dc.contributor.author Denaxas, Spiros es_ES
dc.date.accessioned 2020-07-15T03:32:17Z
dc.date.available 2020-07-15T03:32:17Z
dc.date.issued 2019-03 es_ES
dc.identifier.issn 0920-8542 es_ES
dc.identifier.uri http://hdl.handle.net/10251/148002
dc.description.abstract [EN] Record linkage is a technique widely used to gather data stored in disparate data sources that presumably pertain to the same real-world entity. This integration can be done deterministically or probabilistically, depending on the existence of common key attributes among all data sources involved. The probabilistic approach is very time-consuming due to the number of records that must be compared, particularly in big data scenarios. In this paper, we propose and evaluate a methodology that simultaneously exploits multicore and multi-GPU architectures in order to perform the probabilistic linkage of large-scale Brazilian governmental databases. We present some algorithmic optimizations that provide high accuracy and improve performance by defining the best algorithm-architecture combination for a problem given its input size. We also discuss performance results obtained with different data samples, showing that a hybrid approach outperforms other configurations, providing an average speedup of 7.9 when linking up to 20.000 million records. es_ES
dc.description.sponsorship This work has been partially supported by CNPq, FAPESB, Bill & Melinda Gates Foundation, The Royal Society (UK), Medical Research Council (UK), NVIDIA Hardware Grant Program, Generalitat Valenciana (Grant PROMETEOII/2014/003), Spanish Government and European Commission through TEC2015-67387-C4-1-R (MINECO/FEDER), and network CAPAP-H. We have also worked in cooperation with the EU-COST Programme Action IC1305, "Network for Sustainable Ultrascale Computing (NESUS)". es_ES
dc.language English es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof The Journal of Supercomputing es_ES
dc.rights All rights reserved es_ES
dc.subject Probabilistic linkage es_ES
dc.subject Public health es_ES
dc.subject Performance evaluation es_ES
dc.subject Multicore es_ES
dc.subject Multi-GPU es_ES
dc.subject.classification Computer Science and Artificial Intelligence es_ES
dc.title Exploring Hybrid Parallel Systems for Probabilistic Record Linkage es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s11227-018-2328-3 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/COST//IC1305/EU/Network for Sustainable Ultrascale Computing (NESUS)/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2014%2F003/ES/Computación y comunicaciones de altas prestaciones y aplicaciones en ingeniería/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TEC2015-67387-C4-1-R/ES/SMART SOUND PROCESSING FOR THE DIGITAL LIVING/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Boratto, M.; Alonso-Jordá, P.; Pinto, C.; Melo, P.; Barreto, M.; Denaxas, S. (2019). Exploring Hybrid Parallel Systems for Probabilistic Record Linkage. The Journal of Supercomputing. 75:1137-1149. https://doi.org/10.1007/s11227-018-2328-3 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s11227-018-2328-3 es_ES
dc.description.upvformatpinicio 1137 es_ES
dc.description.upvformatpfin 1149 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 75 es_ES
dc.relation.pasarela S\382115 es_ES
dc.contributor.funder Royal Society, United Kingdom es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Bill and Melinda Gates Foundation es_ES
dc.contributor.funder Medical Research Council, United Kingdom es_ES
dc.contributor.funder European Cooperation in Science and Technology es_ES
dc.contributor.funder Fundação de Amparo à Pesquisa do Estado da Bahia es_ES
dc.contributor.funder Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brasil es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.description.references Andrade G, Viegas F, Ramos GS, Almeida J, Rocha L, Gonçalves M, Ferreira R (2013) GPU-NB: a fast CUDA-based implementation of Naïve Bayes. In: 2013 25th International Symposium on Computer Architecture and High Performance Computing, pp 168–175 es_ES
dc.description.references Bloom BH (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426 es_ES
dc.description.references Cook S (2013) CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs, 1st edn. Morgan Kaufmann, San Francisco es_ES
dc.description.references Doan A, Halevy A, Ives Z (2012) Principles of Data Integration. Elsevier, Amsterdam es_ES
dc.description.references Étienne EY (2012) Hyper-threading. TurbsPublishing, Saarbrücken es_ES
dc.description.references Fellegi IP, Sunter AB (1969) A theory for record linkage. J Am Stat Assoc 64:1183–1210 es_ES
dc.description.references Feng X, Jin H, Zheng R, Zhu L (2014) Near-duplicate detection using GPU-based simhash scheme. In: 2014 International Conference on Smart Computing, pp 223–228 es_ES
dc.description.references Forchhammer B, Papenbrock T, Stening T, Viehmeier S, Draisbach U, Naumann F (2013) Duplicate detection on GPUs. In: BTW. Köllen-Verlag, pp 165–184 es_ES
dc.description.references Kim HS, Lee D (2007) Parallel linkage. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007. ACM, New York, NY, USA, pp 283–292 es_ES
dc.description.references Mamun AA, Aseltine R, Rajasekaran S (2015) RLT-S: a web system for record linkage. PLoS ONE 10(5):1–9 es_ES
dc.description.references Mamun AA, Aseltine R, Rajasekaran S (2016) Efficient record linkage algorithms using complete linkage clustering. PLoS ONE 11(4):1–21 es_ES
dc.description.references Mamun AA, Mi T, Aseltine R, Rajasekaran S (2014) Efficient sequential and parallel algorithms for record linkage. J Am Med Inform Assoc 21(2):252–262 es_ES
dc.description.references Mizell E, Biery R (2017) How GPUs are defining the future of data analytics es_ES
dc.description.references Munshi A, Gaster B, Mattson TG, Fung J, Ginsburg D (2011) OpenCL Programming Guide, 1st edn. Addison-Wesley, Reading es_ES
dc.description.references NVIDIA Corporation: NVIDIA CUDA C programming guide (2010). Version 3.2 es_ES
dc.description.references OpenMP Architecture Review Board: OpenMP application program interface version 4.0 (2013) es_ES
dc.description.references Pokorny J (2011) NoSQL databases: a step to database scalability in web environment. In: Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services, iiWAS ’11. ACM, New York, NY, USA, pp 278–283 es_ES
dc.description.references Rendle S, Schmidt-Thieme L (2008) Scaling Record Linkage to Non-uniform Distributed Class Sizes. Springer, Berlin, pp 308–319 es_ES
dc.description.references Sehili Z, Kolb L, Borgs C, Schnell R, Rahm E (2015) Privacy preserving record linkage with ppjoin. In: Datenbanksysteme für Business, Technologie und Web (BTW), pp 85–104 es_ES
dc.description.references Winkler WE (1999) The state of record linkage and current research problems es_ES
dc.description.references Zhong Z, Rychkov V, Lastovetsky A (2015) Data partitioning on multicore and multi-GPU platforms using functional performance models. IEEE Trans Comput 64(9):2506–2518 es_ES

