Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Anzt, Hartwig es_ES
dc.contributor.author Dongarra, Jack es_ES
dc.contributor.author Flegar, Goran es_ES
dc.contributor.author Quintana Ortí, Enrique S. es_ES
dc.date.accessioned 2020-12-31T04:31:23Z
dc.date.available 2020-12-31T04:31:23Z
dc.date.issued 2019-01 es_ES
dc.identifier.issn 0167-8191 es_ES
dc.identifier.uri http://hdl.handle.net/10251/158177
dc.description.abstract [EN] In this work, we address the efficient realization of block-Jacobi preconditioning on graphics processing units (GPUs). This task requires the solution of a collection of small and independent linear systems. To fully realize this implementation, we develop a variable-size batched matrix inversion kernel that uses Gauss-Jordan elimination (GJE) along with a variable-size batched matrix-vector multiplication kernel that transforms the linear systems' right-hand sides into the solution vectors. Our kernels make heavy use of the increased register count and the warp-local communication associated with newer GPU architectures. Moreover, in the matrix inversion, we employ an implicit pivoting strategy that migrates the workload (i.e., operations) to the place where the data resides instead of moving the data to the executing cores. We complement the matrix inversion with extraction and insertion strategies that allow the block-Jacobi preconditioner to be set up rapidly. The experiments on NVIDIA's K40 and P100 architectures reveal that our variable-size batched matrix inversion routine outperforms the CUDA basic linear algebra subroutine (cuBLAS) library functions that provide the same (or even less) functionality. We also show that the preconditioner setup and preconditioner application cost can be somewhat offset by the faster convergence of the iterative solver. (C) 2018 Elsevier B.V. All rights reserved. es_ES
dc.description.sponsorship This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC-0010042. H. Anzt was supported by the "Impuls und Vernetzungsfond of the Helmholtz Association" under grant VH-NG-1241. G. Flegar and E. S. Quintana-Ortí were supported by project TIN2014-53495-R of the MINECO-FEDER; and project OPRECOMP (http://oprecomp.eu) with the financial support of the Future and Emerging Technologies (FET) programme within the European Union's Horizon 2020 research and innovation programme, under grant agreement No 732631. The authors would also like to acknowledge the Swiss National Supercomputing Centre (CSCS) for granting computing resources in the Small Development Project entitled "Energy-Efficient preconditioning for iterative linear solvers" (#d65). es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Parallel Computing es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Batched algorithms es_ES
dc.subject Matrix inversion es_ES
dc.subject Gauss-Jordan elimination es_ES
dc.subject Block-Jacobi es_ES
dc.subject Sparse linear systems es_ES
dc.subject Graphics processor es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.parco.2017.12.006 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/732631/EU/Open transPREcision COMPuting/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/Helmholtz Association of German Research Centers//VH-NG-1241/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/DOE//DE-SC-0010042/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/CSCS//#d65/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Anzt, H.; Dongarra, J.; Flegar, G.; Quintana Ortí, ES. (2019). Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors. Parallel Computing. 81:131-146. https://doi.org/10.1016/j.parco.2017.12.006 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.parco.2017.12.006 es_ES
dc.description.upvformatpinicio 131 es_ES
dc.description.upvformatpfin 146 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 81 es_ES
dc.relation.pasarela S\379961 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder U.S. Department of Energy es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Swiss National Supercomputing Centre es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Helmholtz Association of German Research Centers es_ES
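
The abstract describes two steps: inverting each small diagonal block with Gauss-Jordan elimination under implicit pivoting, then applying the preconditioner through block-wise matrix-vector products. The following is a minimal CPU sketch of that idea in plain Python, not the authors' GPU kernels. It assumes an augmented-matrix formulation instead of the in-place register-based scheme, and every function name (`gje_inverse_implicit_pivoting`, `invert_blocks`, `apply_block_jacobi`) is hypothetical. Rows are never swapped during elimination; the chosen pivot rows are only recorded, and the permutation is undone once at the end.

```python
def gje_inverse_implicit_pivoting(block):
    """Invert one small dense block (list of rows) by Gauss-Jordan elimination.

    Illustrative assumption: works on the augmented matrix [A | I] rather
    than in place, and runs sequentially on the CPU.
    """
    n = len(block)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(block)]
    pivot_row = [-1] * n   # pivot_row[k]: row selected as pivot for column k
    used = [False] * n     # rows already consumed as pivots

    for k in range(n):
        # Pick the largest entry of column k among rows not yet used,
        # but do NOT swap rows: only remember which row was picked.
        p = max((i for i in range(n) if not used[i]),
                key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("block is singular")
        used[p] = True
        pivot_row[k] = p

        scale = 1.0 / m[p][k]
        m[p] = [v * scale for v in m[p]]
        # Eliminate column k from every other row (above and below).
        for i in range(n):
            if i != p and m[i][k] != 0.0:
                f = m[i][k]
                m[i] = [a - f * b for a, b in zip(m[i], m[p])]

    # The left half is now a permutation matrix; row k of the inverse is
    # the right half of the row that served as pivot for column k.
    return [m[pivot_row[k]][n:] for k in range(n)]


def invert_blocks(blocks):
    """Preconditioner setup: invert every diagonal block (sizes may differ)."""
    return [gje_inverse_implicit_pivoting(b) for b in blocks]


def apply_block_jacobi(inv_blocks, r):
    """Preconditioner application: z = blockdiag(inv_blocks) * r."""
    z, offset = [], 0
    for inv in inv_blocks:
        n = len(inv)
        seg = r[offset:offset + n]
        z.extend(sum(inv[i][j] * seg[j] for j in range(n)) for i in range(n))
        offset += n
    return z


if __name__ == "__main__":
    blocks = [[[0, 2], [1, 0]], [[4]]]   # one 2x2 block and one 1x1 block
    inv = invert_blocks(blocks)
    print(apply_block_jacobi(inv, [2, 4, 8]))  # prints [4.0, 1.0, 2.0]
```

The first block has a zero in its leading position, so the elimination only succeeds because a pivot is selected from the second row; the sketch handles this without moving any row data until the final read-out.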

