Utilising multiple GPU cards and multiple hosts

Palacios Piqueres, David

Files in this item

Name: memoria.pdf

Size: 1.371Mb

Format: PDF

dc.contributor.advisor	Sahuquillo Borrás, Julio	es_ES
dc.contributor.advisor	Barnes, Stuart
dc.contributor.author	Palacios Piqueres, David	es_ES
dc.date.accessioned	2011-12-21T08:23:47Z
dc.date.available	2011-12-21T08:23:47Z
dc.date.created	2011-09-30
dc.date.issued	2011-12-21
dc.identifier.uri	http://hdl.handle.net/10251/14101
dc.description.abstract	Sparse Matrix-Vector multiplication is a key operation in science and engineering, as is the Conjugate Gradient method. Both are therefore being actively studied with the aim of improving their performance, mainly on GPU devices, following the GPGPU (General Purpose GPU computing) trend. This thesis presents a study of the speedup gained when performing sparse Matrix-Vector multiplication based on the ELLR-T storage format, and when running a Conjugate Gradient solver that uses this algorithm, in different computing environments, including multiple GPU cards and multiple hosts. The code has been designed specifically to exploit the computational architecture of the GPU through the Nvidia CUDA (Compute Unified Device Architecture) API, and its performance bottlenecks have been carefully analysed. The analysis shows that the performance of the sparse Matrix-Vector algorithm, and therefore of the Conjugate Gradient method, is limited by the memory bandwidth of the architecture on which it is executed. When executed on multiple GPUs and/or multiple nodes, however, performance is bounded by the vector transfers between cards and nodes and by the synchronization time. In fact, the multi-GPU version of the Conjugate Gradient solver performs approximately the same as the sequential one. The sequential Conjugate Gradient solver implemented in this thesis achieves a speedup of up to 26 on a Tesla C1060 over an Intel Xeon E5462 and of up to 14 on a Tesla C2050 over an Intel Core i7 X980, for matrices that represent real problems obtained from the University of Florida Sparse Matrix Collection.	es_ES
dc.format.extent	102	es_ES
dc.language	English	es_ES
dc.publisher	Universitat Politècnica de València	es_ES
dc.rights	All rights reserved	es_ES
dc.subject.other	Ingeniería Informática-Enginyeria Informàtica	es_ES
dc.title	Utilising multiple GPU cards and multiple hosts	es_ES
dc.type	Final degree project/thesis	es_ES
dc.rights.accessRights	Closed	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.description.bibliographicCitation	Palacios Piqueres, D. (2011). Utilising multiple GPU cards and multiple hosts. http://hdl.handle.net/10251/14101.	es_ES
dc.description.accrualMethod	Delegated deposit	es_ES

This item appears in the following collection(s)

ETSINF - Academic works [4769]
Escola Tècnica Superior d'Enginyeria Informàtica

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia