Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors

Anzt, Hartwig; Dongarra, Jack; Flegar, Goran; Quintana Ortí, Enrique S.

doi:10.1016/j.parco.2017.12.006

Files in this item

Name: Anzt;Dongarra;Flegar ...

Size: 908.1Kb

Format: PDF

Description: Author's version.

Name: 1-s2.0-S016781911 ...

Size: 1.961Mb

Format: PDF

Description: Publisher's version.

dc.contributor.author	Anzt, Hartwig	es_ES
dc.contributor.author	Dongarra, Jack	es_ES
dc.contributor.author	Flegar, Goran	es_ES
dc.contributor.author	Quintana Ortí, Enrique S.	es_ES
dc.date.accessioned	2020-12-31T04:31:23Z
dc.date.available	2020-12-31T04:31:23Z
dc.date.issued	2019-01	es_ES
dc.identifier.issn	0167-8191	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/158177
dc.description.abstract	[EN] In this work, we address the efficient realization of block-Jacobi preconditioning on graphics processing units (GPUs). This task requires the solution of a collection of small and independent linear systems. To fully realize this implementation, we develop a variable-size batched matrix inversion kernel that uses Gauss-Jordan elimination (GJE) along with a variable-size batched matrix-vector multiplication kernel that transforms the linear systems' right-hand sides into the solution vectors. Our kernels make heavy use of the increased register count and the warp-local communication associated with newer GPU architectures. Moreover, in the matrix inversion, we employ an implicit pivoting strategy that migrates the workload (i.e., operations) to the place where the data resides instead of moving the data to the executing cores. We complement the matrix inversion with extraction and insertion strategies that allow the block-Jacobi preconditioner to be set up rapidly. The experiments on NVIDIA's K40 and P100 architectures reveal that our variable-size batched matrix inversion routine outperforms the CUDA basic linear algebra subroutine (cuBLAS) library functions that provide the same (or even less) functionality. We also show that the preconditioner setup and preconditioner application cost can be somewhat offset by the faster convergence of the iterative solver. (C) 2018 Elsevier B.V. All rights reserved.	es_ES
dc.description.sponsorship	This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC-0010042. H. Anzt was supported by the "Impuls and Vernetzungsfond of the Helmholtz Association" under grant VH-NG-1241. G. Flegar and E. S. Quintana-Orti were supported by project TIN2014-53495-R of the MINECO-FEDER; and project OPRECOMP (http://oprecomp.eu) with the financial support of the Future and Emerging Technologies (FET) programme within the European Union's Horizon 2020 research and innovation programme, under grant agreement No 732631. The authors would also like to acknowledge the Swiss National Computing Centre (CSCS) for granting computing resources in the Small Development Project entitled "Energy-Efficient preconditioning for iterative linear solvers" (#d65).	es_ES
dc.language	English	es_ES
dc.publisher	Elsevier	es_ES
dc.relation.ispartof	Parallel Computing	es_ES
dc.rights	Attribution - NonCommercial - NoDerivatives (by-nc-nd)	es_ES
dc.subject	Batched algorithms	es_ES
dc.subject	Matrix inversion	es_ES
dc.subject	Gauss-Jordan elimination	es_ES
dc.subject	Block-Jacobi	es_ES
dc.subject	Sparse linear systems	es_ES
dc.subject	Graphics processor	es_ES
dc.subject.classification	COMPUTER ARCHITECTURE AND TECHNOLOGY	es_ES
dc.title	Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1016/j.parco.2017.12.006	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/732631/EU/Open transPREcision COMPuting/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/Helmholtz Association of German Research Centers//VH-NG-1241/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/DOE//DE-SC-0010042/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/CSCS//#d65/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Anzt, H.; Dongarra, J.; Flegar, G.; Quintana Ortí, ES. (2019). Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors. Parallel Computing. 81:131-146. https://doi.org/10.1016/j.parco.2017.12.006	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.parco.2017.12.006	es_ES
dc.description.upvformatpinicio	131	es_ES
dc.description.upvformatpfin	146	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	81	es_ES
dc.relation.pasarela	S\379961	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	U.S. Department of Energy	es_ES
dc.contributor.funder	European Regional Development Fund	es_ES
dc.contributor.funder	Swiss National Supercomputing Centre	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.contributor.funder	Helmholtz Association of German Research Centers	es_ES
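
The abstract describes inverting many small diagonal blocks of different sizes with Gauss-Jordan elimination and then applying the preconditioner through matrix-vector products. The following is a minimal sequential Python sketch of that idea, not the authors' GPU kernels. The function names (`gje_inverse`, `block_jacobi_setup`, `block_jacobi_apply`), the dense list-of-lists storage, and the augmented-matrix formulation are all assumptions made for illustration. Pivoting is "implicit" here only in the sense that rows are never physically swapped: the pivot row chosen at each step is recorded, and the permutation is undone once when the result is read out.

```python
def gje_inverse(block):
    """Invert a small dense matrix via Gauss-Jordan elimination.

    Illustrative sketch: rows are never swapped; the pivot row picked
    for each column is recorded and the permutation is resolved at the end.
    """
    n = len(block)
    # Augmented matrix [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(block)]
    used = [False] * n        # rows that have already served as a pivot
    pivot_row = [-1] * n      # pivot_row[k] = row chosen for column k
    for k in range(n):
        # Pick the unused row with the largest magnitude entry in column k.
        p = max((i for i in range(n) if not used[i]),
                key=lambda i: abs(aug[i][k]))
        if aug[p][k] == 0.0:
            raise ValueError("singular block")
        used[p] = True
        pivot_row[k] = p
        scale = 1.0 / aug[p][k]
        aug[p] = [v * scale for v in aug[p]]
        # Eliminate column k from every other row.
        for i in range(n):
            factor = aug[i][k]
            if i != p and factor != 0.0:
                aug[i] = [vi - factor * vp for vi, vp in zip(aug[i], aug[p])]
    # Row k of the inverse sits in the row that was the pivot for column k.
    return [aug[pivot_row[k]][n:] for k in range(n)]


def block_jacobi_setup(matrix, block_sizes):
    """Extract the diagonal blocks (sizes may differ) and invert each one."""
    inverses, start = [], 0
    for size in block_sizes:
        block = [row[start:start + size] for row in matrix[start:start + size]]
        inverses.append(gje_inverse(block))
        start += size
    return inverses


def block_jacobi_apply(inverses, vec):
    """Apply the preconditioner: one small matrix-vector product per block."""
    out, start = [], 0
    for inv in inverses:
        segment = vec[start:start + len(inv)]
        out.extend(sum(a * b for a, b in zip(row, segment)) for row in inv)
        start += len(inv)
    return out


# Usage: a 3x3 matrix split into diagonal blocks of size 2 and 1.
A = [[0.0, 2.0, 9.0],
     [1.0, 1.0, 9.0],
     [9.0, 9.0, 4.0]]
inverses = block_jacobi_setup(A, [2, 1])
z = block_jacobi_apply(inverses, [2.0, 3.0, 8.0])
```

Each block is independent of the others, which is what makes the setup and the application amenable to batched execution. On a GPU the loops over blocks would run in parallel; this sketch keeps them sequential for readability.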

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia