
A complete and efficient CUDA-sharing solution for HPC clusters

RiuNet: Institutional repository of the Polytechnic University of Valencia

dc.contributor.author Peña Monferrer, Antonio José es_ES
dc.contributor.author Reaño González, Carlos es_ES
dc.contributor.author Silla Jiménez, Federico es_ES
dc.contributor.author Mayo Gual, Rafael es_ES
dc.contributor.author Quintana-Orti, Enrique S. es_ES
dc.contributor.author Duato Marín, José Francisco es_ES
dc.date.accessioned 2015-05-12T09:15:54Z
dc.date.available 2015-05-12T09:15:54Z
dc.date.issued 2014-12
dc.identifier.issn 0167-8191
dc.identifier.uri http://hdl.handle.net/10251/50089
dc.description.abstract In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework that enables remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than nodes, and also permits a single node to exploit the whole set of GPUs installed in the cluster. In our proposal, CUDA applications can seamlessly interact with any GPU in the cluster, independently of its physical location. Thus, GPUs can be either distributed among compute nodes or concentrated in dedicated GPGPU servers, depending on the cluster administrator's policy. This proposal leads to savings not only in space but also in energy, acquisition, and maintenance costs. The performance evaluation in this paper, with a series of benchmarks and a production application, clearly demonstrates the viability of this proposal. Specifically, experiments with the matrix–matrix product reveal excellent performance compared with regular executions on the local GPU; on a much more complex application, the GPU-accelerated LAMMPS, we attain up to 11× speedup employing 8 remote accelerators from a single node with respect to a 12-core CPU-only execution. GPGPU service interaction in compute nodes, remote acceleration in dedicated GPGPU servers, and data transfer performance of similar GPU virtualization frameworks are also evaluated. © 2014 Elsevier B.V. All rights reserved. es_ES
dc.description.sponsorship This work was supported by the Spanish Ministerio de Economia y Competitividad (MINECO) and by FEDER funds under Grant TIN2012-38341-004-01. It was also supported by MINECO, FEDER funds, under Grant TIN2011-23283, and by the Fundacion Caixa-Castello Bancaixa, Grant P11B2013-21. This work was also supported in part by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357. The authors are grateful for the generous support provided by Mellanox Technologies to the rCUDA Project. The authors would also like to thank Adrian Castello, member of The rCUDA Development Team, for his hard work on rCUDA. en_EN
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation Spanish Ministerio de Economia y Competitividad (MINECO) es_ES
dc.relation FEDER [TIN2012-38341-004-01] es_ES
dc.relation MINECO, FEDER [TIN2011-23283] es_ES
dc.relation Fundacion Caixa-Castello Bancaixa [P11B2013-21] es_ES
dc.relation U.S. Department of Energy, Office of Science [DE-AC02-06CH11357] es_ES
dc.relation Mellanox Technologies es_ES
dc.relation.ispartof Parallel Computing es_ES
dc.rights All rights reserved es_ES
dc.subject Graphics processors es_ES
dc.subject Virtualization es_ES
dc.subject High performance computing es_ES
dc.subject Clusters es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title A complete and efficient CUDA-sharing solution for HPC clusters es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.parco.2014.09.011
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Peña Monferrer, AJ.; Reaño González, C.; Silla Jiménez, F.; Mayo Gual, R.; Quintana-Orti, ES.; Duato Marín, JF. (2014). A complete and efficient CUDA-sharing solution for HPC clusters. Parallel Computing. 40(10):574-588. doi:10.1016/j.parco.2014.09.011 es_ES
dc.description.accrualMethod Senia es_ES
dc.relation.publisherversion http://dx.doi.org/10.1016/j.parco.2014.09.011 es_ES
dc.description.upvformatpinicio 574 es_ES
dc.description.upvformatpfin 588 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 40 es_ES
dc.description.issue 10 es_ES
dc.relation.senia 277383
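The abstract above describes GPUs that are decoupled from compute nodes and used transparently over the network. Remote GPU virtualization frameworks of this kind commonly rely on API remoting. In this approach, a client-side library exposes CUDA-like calls and forwards each one as a message to a server process running on the node that physically hosts the GPU. The Python sketch below illustrates only that general pattern. The opcodes, message layout, class names (`ToyGpuServer`, `ToyGpuClient`) and the in-process transport are all hypothetical and do not reflect rCUDA's actual protocol or API. Device memory is simulated with byte arrays rather than a real GPU.

```python
import struct

# Hypothetical opcodes for a toy remoting protocol (NOT rCUDA's actual wire format).
OP_MALLOC = 1
OP_MEMCPY_H2D = 2
OP_MEMCPY_D2H = 3
OP_FREE = 4


class ToyGpuServer:
    """Runs on the GPU-owning node; 'device memory' is simulated with bytearrays."""

    def __init__(self):
        self._buffers = {}       # handle -> simulated device allocation
        self._next_handle = 1

    def handle(self, request: bytes) -> bytes:
        # Every request starts with a little-endian uint32 opcode.
        (op,) = struct.unpack_from("<I", request, 0)
        body = request[4:]
        if op == OP_MALLOC:
            (size,) = struct.unpack("<Q", body)
            handle = self._next_handle
            self._next_handle += 1
            self._buffers[handle] = bytearray(size)
            return struct.pack("<Q", handle)
        if op == OP_MEMCPY_H2D:
            (handle,) = struct.unpack_from("<Q", body, 0)
            data = body[8:]
            buf = self._buffers[handle]
            if len(data) > len(buf):
                raise ValueError("copy exceeds allocation")
            buf[: len(data)] = data
            return b""
        if op == OP_MEMCPY_D2H:
            handle, size = struct.unpack("<QQ", body)
            return bytes(self._buffers[handle][:size])
        if op == OP_FREE:
            (handle,) = struct.unpack("<Q", body)
            del self._buffers[handle]
            return b""
        raise ValueError(f"unknown opcode {op}")


class ToyGpuClient:
    """Client-side stub: the application calls these methods as if the GPU were local."""

    def __init__(self, transport):
        # transport: any callable mapping request bytes to reply bytes.
        # A real deployment would use a network link here (assumption).
        self._send = transport

    def malloc(self, size: int) -> int:
        reply = self._send(struct.pack("<IQ", OP_MALLOC, size))
        return struct.unpack("<Q", reply)[0]

    def memcpy_to_device(self, handle: int, data: bytes) -> None:
        self._send(struct.pack("<IQ", OP_MEMCPY_H2D, handle) + bytes(data))

    def memcpy_to_host(self, handle: int, size: int) -> bytes:
        return self._send(struct.pack("<IQQ", OP_MEMCPY_D2H, handle, size))

    def free(self, handle: int) -> None:
        self._send(struct.pack("<IQ", OP_FREE, handle))


if __name__ == "__main__":
    server = ToyGpuServer()
    client = ToyGpuClient(server.handle)  # in-process stand-in for the network
    h = client.malloc(4)
    client.memcpy_to_device(h, b"\x01\x02\x03\x04")
    print(client.memcpy_to_host(h, 4))  # b'\x01\x02\x03\x04'
    client.free(h)
```

Because the client only depends on the `transport` callable, the same stub could, in principle, target a GPU on the local node or on a dedicated GPGPU server. That location independence is the property the abstract emphasizes.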