
Utilising multiple GPU cards and multiple hosts

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

dc.contributor.advisor Sahuquillo Borrás, Julio es_ES
dc.contributor.advisor Barnes, Stuart
dc.contributor.author Palacios Piqueres, David es_ES
dc.date.accessioned 2011-12-21T08:23:47Z
dc.date.available 2011-12-21T08:23:47Z
dc.date.created 2011-09-30
dc.date.issued 2011-12-21
dc.identifier.uri http://hdl.handle.net/10251/14101
dc.description.abstract Sparse matrix-vector multiplication (SpMV) is a key operation in science and engineering, as is the Conjugate Gradient (CG) method. Both are therefore being actively studied with the aim of increasing their performance, mainly on GPU devices, following the GPGPU (General-Purpose computing on GPUs) trend. This thesis studies the speedup obtained when performing SpMV based on the ELLR-T storage format, and a CG solver that uses this algorithm, in different computing environments, including multiple GPU cards and multiple hosts. The code has been specifically designed to exploit the computational architecture of the GPU through the NVIDIA CUDA (Compute Unified Device Architecture) API, and the bottlenecks to its performance have been carefully analysed. The analysis shows that the performance of the SpMV algorithm, and therefore of the CG method, is bounded by the memory bandwidth of the architecture on which it runs. However, when it runs on multiple GPUs and/or multiple nodes, performance is bounded by the vector transfers between cards and nodes and by the synchronization time. In fact, the multi-GPU version of the CG solver performs approximately the same as the sequential one. The sequential CG solver implemented in this thesis achieves a speedup of up to 26 on a Tesla C1060 over an Intel Xeon E5462, and of up to 14 on a Tesla C2050 over an Intel Core i7 X980, for matrices representing real problems taken from the University of Florida Sparse Matrix Collection. es_ES
dc.format.extent 102 es_ES
dc.language English es_ES
dc.publisher Universitat Politècnica de València es_ES
dc.rights All rights reserved es_ES
dc.subject.other Computer Engineering (Ingeniería Informática-Enginyeria Informàtica) es_ES
dc.title Utilising multiple GPU cards and multiple hosts es_ES
dc.type Final degree project (Proyecto/Trabajo fin de carrera/grado) es_ES
dc.rights.accessRights Closed es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Palacios Piqueres, D. (2011). Utilising multiple GPU cards and multiple hosts. http://hdl.handle.net/10251/14101. es_ES
dc.description.accrualMethod Delegated archiving es_ES
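The abstract describes an SpMV kernel built on the ELLR-T format and a CG solver that calls it once per iteration. The thesis's CUDA code is not part of this record, so what follows is only a minimal CPU-side sketch of that structure in Python. It assumes the usual ELLPACK-R representation: padded value and column-index arrays stored column-major, plus a row-length array `rl` so that padding is skipped. It emulates ELLR-T's T threads per row with an inner loop, and a plain sum stands in for the on-chip reduction. The function names `to_ellr`, `spmv_ellr_t` and `conjugate_gradient` and the parameters `T`, `tol` and `max_iter` are hypothetical. ELLR-T's GPU-specific data reordering and memory-access details are not modelled.

```python
import math


def to_ellr(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R arrays (assumed layout).

    Returns (vals, cols, rl, width). vals and cols are padded to `width`
    entries per row and stored column-major: entry k of row i lives at
    index k * n + i. rl[i] is the real number of nonzeros in row i.
    This is the plain ELLPACK-R ordering; ELLR-T's GPU reordering is not reproduced.
    """
    n = len(dense)
    rows = [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]
    rl = [len(r) for r in rows]
    width = max(rl) if rl else 0
    vals = [0.0] * (width * n)
    cols = [0] * (width * n)
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            vals[k * n + i] = v
            cols[k * n + i] = j
    return vals, cols, rl, width


def spmv_ellr_t(vals, cols, rl, x, T=4):
    """Compute y = A x, emulating ELLR-T's T cooperating threads per row."""
    n = len(rl)
    y = [0.0] * n
    for i in range(n):                    # on the GPU: one group of T threads per row
        partial = [0.0] * T
        for t in range(T):                # "thread" t handles nonzeros t, t+T, t+2T, ...
            for k in range(t, rl[i], T):  # stops at rl[i], so padding is never read
                idx = k * n + i
                partial[t] += vals[idx] * x[cols[idx]]
        y[i] = sum(partial)               # stands in for the reduction across the T threads
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(vals, cols, rl, b, T=4, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive-definite A in ELLPACK-R form."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                           # r0 = b - A*x0 with x0 = 0
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:       # converged (also guards the b = 0 case)
            break
        Ap = spmv_ellr_t(vals, cols, rl, p, T)  # the single SpMV per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Small symmetric positive-definite example; exact solution is (2/9, 1/9, 13/9).
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    vals, cols, rl, _ = to_ellr(A)
    print(conjugate_gradient(vals, cols, rl, [1.0, 2.0, 3.0], T=2))
```

Each CG iteration is dominated by the single SpMV call, which reads every stored value and column index once. This is consistent with the abstract's finding that performance is bound by memory bandwidth. If the rows were split across several GPUs or hosts, each partition would need the whole vector `p` before its SpMV. A vector exchange and a synchronization point per iteration would then be unavoidable, which matches the transfer and synchronization costs the abstract identifies.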

