dc.contributor.author | Anzt, Hartwig | es_ES |
dc.contributor.author | Dongarra, Jack | es_ES |
dc.contributor.author | Flegar, Goran | es_ES |
dc.contributor.author | Quintana Ortí, Enrique S. | es_ES |
dc.date.accessioned | 2020-12-31T04:31:23Z | |
dc.date.available | 2020-12-31T04:31:23Z | |
dc.date.issued | 2019-01 | es_ES |
dc.identifier.issn | 0167-8191 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/158177 | |
dc.description.abstract | [EN] In this work, we address the efficient realization of block-Jacobi preconditioning on graphics processing units (GPUs). This task requires the solution of a collection of small and independent linear systems. To fully realize this implementation, we develop a variable-size batched matrix inversion kernel that uses Gauss-Jordan elimination (GJE) along with a variable-size batched matrix-vector multiplication kernel that transforms the linear systems' right-hand sides into the solution vectors. Our kernels make heavy use of the increased register count and the warp-local communication associated with newer GPU architectures. Moreover, in the matrix inversion, we employ an implicit pivoting strategy that migrates the workload (i.e., operations) to the place where the data resides instead of moving the data to the executing cores. We complement the matrix inversion with extraction and insertion strategies that allow the block-Jacobi preconditioner to be set up rapidly. The experiments on NVIDIA's K40 and P100 architectures reveal that our variable-size batched matrix inversion routine outperforms the CUDA basic linear algebra subroutine (cuBLAS) library functions that provide the same (or even less) functionality. We also show that the preconditioner setup and preconditioner application cost can be somewhat offset by the faster convergence of the iterative solver. (C) 2018 Elsevier B.V. All rights reserved. | es_ES |
dc.description.sponsorship | This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC-0010042. H. Anzt was supported by the "Impuls and Vernetzungsfond of the Helmholtz Association" under grant VH-NG-1241. G. Flegar and E. S. Quintana-Ortí were supported by project TIN2014-53495-R of the MINECO-FEDER; and project OPRECOMP (http://oprecomp.eu) with the financial support of the Future and Emerging Technologies (FET) programme within the European Union's Horizon 2020 research and innovation programme, under grant agreement No 732631. The authors would also like to acknowledge the Swiss National Supercomputing Centre (CSCS) for granting computing resources in the Small Development Project entitled "Energy-Efficient preconditioning for iterative linear solvers" (#d65). | es_ES |
dc.language | English | es_ES |
dc.publisher | Elsevier | es_ES |
dc.relation.ispartof | Parallel Computing | es_ES |
dc.rights | Attribution - NonCommercial - NoDerivatives (by-nc-nd) | es_ES |
dc.subject | Batched algorithms | es_ES |
dc.subject | Matrix inversion | es_ES |
dc.subject | Gauss-Jordan elimination | es_ES |
dc.subject | Block-Jacobi | es_ES |
dc.subject | Sparse linear systems | es_ES |
dc.subject | Graphics processor | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1016/j.parco.2017.12.006 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/732631/EU/Open transPREcision COMPuting/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/Helmholtz Association of German Research Centers//VH-NG-1241/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/DOE//DE-SC-0010042/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/CSCS//#d65/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Anzt, H.; Dongarra, J.; Flegar, G.; Quintana Ortí, ES. (2019). Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors. Parallel Computing. 81:131-146. https://doi.org/10.1016/j.parco.2017.12.006 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1016/j.parco.2017.12.006 | es_ES |
dc.description.upvformatpinicio | 131 | es_ES |
dc.description.upvformatpfin | 146 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 81 | es_ES |
dc.relation.pasarela | S\379961 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | U.S. Department of Energy | es_ES |
dc.contributor.funder | European Regional Development Fund | es_ES |
dc.contributor.funder | Swiss National Supercomputing Centre | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.contributor.funder | Helmholtz Association of German Research Centers | es_ES |
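The abstract in this record outlines the two computational steps of the preconditioner: setup, which extracts and inverts each diagonal block with Gauss-Jordan elimination, and application, which multiplies each inverted block by the matching slice of the right-hand side. The following is a minimal serial sketch in plain Python to illustrate that arithmetic, not the authors' GPU kernels. Assumptions: the matrix is stored in CSR form (`row_ptr`, `col_idx`, `vals`), the block boundaries `block_starts` are already known, and all function names are hypothetical. The inversion uses partial pivoting with explicit row swaps and a final column unscramble, which differs from the implicit pivoting strategy described in the abstract, where rows are not physically moved.

```python
from typing import List

Matrix = List[List[float]]


def gje_invert(a: Matrix) -> Matrix:
    """Return the inverse of square matrix `a` via Gauss-Jordan elimination.

    Works on a copy. Uses partial pivoting with explicit row swaps (NOT the
    implicit pivoting of the abstract) and undoes the permutation at the end
    by swapping columns in reverse order.
    """
    n = len(a)
    w = [list(map(float, row)) for row in a]  # working copy, overwritten by the inverse
    pivots = []  # pivots[k] = row swapped into position k at step k
    for k in range(n):
        # Partial pivoting: largest |w[i][k]| among the not-yet-eliminated rows.
        r = max(range(k, n), key=lambda i: abs(w[i][k]))
        if w[r][k] == 0.0:
            raise ValueError("singular diagonal block")
        w[k], w[r] = w[r], w[k]
        pivots.append(r)
        # Scale the pivot row; column k is reused in place to store the inverse.
        p = w[k][k]
        w[k][k] = 1.0
        w[k] = [x / p for x in w[k]]
        # Eliminate column k from every other row (above and below the pivot).
        for i in range(n):
            if i != k:
                f = w[i][k]
                w[i][k] = 0.0
                w[i] = [x - f * y for x, y in zip(w[i], w[k])]
    # Undo the row interchanges by swapping columns in reverse order.
    for k in range(n - 1, -1, -1):
        r = pivots[k]
        if r != k:
            for row in w:
                row[k], row[r] = row[r], row[k]
    return w


def extract_blocks(row_ptr: List[int], col_idx: List[int], vals: List[float],
                   block_starts: List[int]) -> List[Matrix]:
    """Hypothetical helper: copy each diagonal block of a CSR matrix into a dense array.

    Block b covers rows and columns [block_starts[b], block_starts[b + 1]).
    """
    blocks = []
    for b in range(len(block_starts) - 1):
        lo, hi = block_starts[b], block_starts[b + 1]
        d = [[0.0] * (hi - lo) for _ in range(hi - lo)]
        for i in range(lo, hi):
            for nz in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[nz]
                if lo <= j < hi:  # keep only entries inside the diagonal block
                    d[i - lo][j - lo] = vals[nz]
        blocks.append(d)
    return blocks


def setup_block_jacobi(row_ptr: List[int], col_idx: List[int], vals: List[float],
                       block_starts: List[int]) -> List[Matrix]:
    """Setup phase: invert every (variable-size) diagonal block independently."""
    return [gje_invert(d) for d in extract_blocks(row_ptr, col_idx, vals, block_starts)]


def apply_block_jacobi(inv_blocks: List[Matrix], block_starts: List[int],
                       r: List[float]) -> List[float]:
    """Application phase: z = D^{-1} r as one small matrix-vector product per block."""
    z = [0.0] * len(r)
    for b, inv in enumerate(inv_blocks):
        lo = block_starts[b]
        for i, row in enumerate(inv):
            z[lo + i] = sum(x * r[lo + j] for j, x in enumerate(row))
    return z
```

For example, a 3 × 3 matrix split into blocks of sizes 2 and 1 (`block_starts = [0, 2, 3]`) yields one 2 × 2 and one 1 × 1 inverse from `setup_block_jacobi`; `apply_block_jacobi` would then be called on the current residual in each iteration of the Krylov solver. The blocks are independent, which is what allows the batched GPU kernels described in the abstract to process them concurrently; here they are simply handled in a serial loop.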