dc.contributor.author | Boratto, Murilo | es_ES |
dc.contributor.author | Alonso-Jordá, Pedro | es_ES |
dc.contributor.author | Pinto, Clicia | es_ES |
dc.contributor.author | Melo, Pedro | es_ES |
dc.contributor.author | Barreto, Marcos | es_ES |
dc.contributor.author | Denaxas, Spiros | es_ES |
dc.date.accessioned | 2020-07-15T03:32:17Z | |
dc.date.available | 2020-07-15T03:32:17Z | |
dc.date.issued | 2019-03 | es_ES |
dc.identifier.issn | 0920-8542 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/148002 | |
dc.description.abstract | [EN] Record linkage is a technique widely used to gather data stored in disparate data sources that presumably pertain to the same real-world entity. This integration can be done deterministically or probabilistically, depending on the existence of common key attributes among all data sources involved. The probabilistic approach is very time-consuming due to the number of records that must be compared, especially in big data scenarios. In this paper, we propose and evaluate a methodology that simultaneously exploits multicore and multi-GPU architectures in order to perform the probabilistic linkage of large-scale Brazilian governmental databases. We present some algorithmic optimizations that provide high accuracy and improve performance by defining the best algorithm-architecture combination for a problem given its input size. We also discuss performance results obtained with different data samples, showing that a hybrid approach outperforms other configurations, providing an average speedup of 7.9 when linking up to 20.000 million records. | es_ES |
dc.description.sponsorship | This work has been partially supported by CNPq, FAPESB, Bill & Melinda Gates Foundation, The Royal Society (UK), Medical Research Council (UK), NVIDIA Hardware Grant Program, Generalitat Valenciana (Grant PROMETEOII/2014/003), Spanish Government and European Commission through TEC2015-67387-C4-1-R (MINECO/FEDER), and network CAPAP-H. We have also worked in cooperation with the EU-COST Programme Action IC1305, "Network for Sustainable Ultrascale Computing (NESUS)". | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer-Verlag | es_ES |
dc.relation.ispartof | The Journal of Supercomputing | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | Probabilistic linkage | es_ES |
dc.subject | Public health | es_ES |
dc.subject | Performance evaluation | es_ES |
dc.subject | Multicore | es_ES |
dc.subject | Multi-GPU | es_ES |
dc.subject.classification | COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE | es_ES |
dc.title | Exploring Hybrid Parallel Systems for Probabilistic Record Linkage | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1007/s11227-018-2328-3 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/COST//IC1305/EU/Network for Sustainable Ultrascale Computing (NESUS)/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2014%2F003/ES/Computación y comunicaciones de altas prestaciones y aplicaciones en ingeniería/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TEC2015-67387-C4-1-R/ES/SMART SOUND PROCESSING FOR THE DIGITAL LIVING/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Boratto, M.; Alonso-Jordá, P.; Pinto, C.; Melo, P.; Barreto, M.; Denaxas, S. (2019). Exploring Hybrid Parallel Systems for Probabilistic Record Linkage. The Journal of Supercomputing. 75:1137-1149. https://doi.org/10.1007/s11227-018-2328-3 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1007/s11227-018-2328-3 | es_ES |
dc.description.upvformatpinicio | 1137 | es_ES |
dc.description.upvformatpfin | 1149 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 75 | es_ES |
dc.relation.pasarela | S\382115 | es_ES |
dc.contributor.funder | Royal Society, United Kingdom | es_ES |
dc.contributor.funder | Generalitat Valenciana | es_ES |
dc.contributor.funder | Bill and Melinda Gates Foundation | es_ES |
dc.contributor.funder | Medical Research Council, United Kingdom | es_ES |
dc.contributor.funder | European Cooperation in Science and Technology | es_ES |
dc.contributor.funder | Fundação de Amparo à Pesquisa do Estado da Bahia | es_ES |
dc.contributor.funder | Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.description.references | Andrade G, Viegas F, Ramos GS, Almeida J, Rocha L, Gonçalves M, Ferreira R (2013) GPU-NB: a fast CUDA-based implementation of Naïve Bayes. In: 2013 25th International Symposium on Computer Architecture and High Performance Computing, pp 168–175 | es_ES |
dc.description.references | Bloom BH (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426 | es_ES |
dc.description.references | Cook S (2013) CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs, 1st edn. Morgan Kaufmann, San Francisco | es_ES |
dc.description.references | Doan A, Halevy A, Ives Z (2012) Principles of Data Integration. Elsevier, Amsterdam | es_ES |
dc.description.references | Étienne EY (2012) Hyper-threading. TurbsPublishing, Saarbrücken | es_ES |
dc.description.references | Fellegi IP, Sunter AB (1969) A theory for record linkage. J Am Stat Assoc 64:1183–1210 | es_ES |
dc.description.references | Feng X, Jin H, Zheng R, Zhu L (2014) Near-duplicate detection using GPU-based simhash scheme. In: 2014 International Conference on Smart Computing, pp 223–228 | es_ES |
dc.description.references | Forchhammer B, Papenbrock T, Stening T, Viehmeier S, Draisbach U, Naumann F (2013) Duplicate detection on GPUs. In: BTW. Köllen-Verlag, pp 165–184 | es_ES |
dc.description.references | Kim HS, Lee D (2007) Parallel linkage. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007. ACM, New York, NY, USA, pp 283–292 | es_ES |
dc.description.references | Mamun AA, Aseltine R, Rajasekaran S (2015) RLT-S: a web system for record linkage. PLoS ONE 10(5):1–9 | es_ES |
dc.description.references | Mamun AA, Aseltine R, Rajasekaran S (2016) Efficient record linkage algorithms using complete linkage clustering. PLoS ONE 11(4):1–21 | es_ES |
dc.description.references | Mamun AA, Mi T, Aseltine R, Rajasekaran S (2014) Efficient sequential and parallel algorithms for record linkage. J Am Med Inform Assoc 21(2):252–262 | es_ES |
dc.description.references | Mizell E, Biery R (2017) How GPUs are defining the future of data analytics | es_ES |
dc.description.references | Munshi A, Gaster B, Mattson TG, Fung J, Ginsburg D (2011) OpenCL Programming Guide, 1st edn. Addison-Wesley, Reading | es_ES |
dc.description.references | NVIDIA Corporation: NVIDIA CUDA C programming guide (2010). Version 3.2 | es_ES |
dc.description.references | OpenMP Architecture Review Board: OpenMP application program interface version 4.0 (2013) | es_ES |
dc.description.references | Pokorny J (2011) NoSQL databases: a step to database scalability in web environment. In: Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services, iiWAS ’11. ACM, New York, NY, USA, pp 278–283 | es_ES |
dc.description.references | Rendle S, Schmidt-Thieme L (2008) Scaling Record Linkage to Non-uniform Distributed Class Sizes. Springer, Berlin, pp 308–319 | es_ES |
dc.description.references | Sehili Z, Kolb L, Borgs C, Schnell R, Rahm E (2015) Privacy preserving record linkage with ppjoin. In: Datenbanksysteme für Business, Technologie und Web (BTW), pp 85–104 | es_ES |
dc.description.references | Winkler WE (1999) The state of record linkage and current research problems | es_ES |
dc.description.references | Zhong Z, Rychkov V, Lastovetsky A (2015) Data partitioning on multicore and multi-GPU platforms using functional performance models. IEEE Trans Comput 64(9):2506–2518 | es_ES |