Iserte Agut, S.; Castello Gimeno, A.; Mayo Gual, R.; Quintana Ortí, E. S.; Silla Jiménez, F.; Duato Marín, J. F.; Reaño González, C.; Prades Gasulla, J. (2014). SLURM Support for Remote GPU Virtualization: Implementation and Performance Study. In: Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on. IEEE. 318-325. https://doi.org/10.1109/SBAC-PAD.2014.49
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/66693
Title:
SLURM Support for Remote GPU Virtualization: Implementation and Performance Study
Authors:
Iserte Agut, Sergio
Castello Gimeno, Adrián
Mayo Gual, Rafael
Quintana Ortí, Enrique Salvador
Silla Jiménez, Federico
Duato Marín, José Francisco
Reaño González, Carlos
Prades Gasulla, Javier
UPV Entity:
Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors
Issue date:
Abstract:
SLURM is a resource manager that can be leveraged to share a collection of heterogeneous resources among the jobs in execution in a cluster. However, SLURM is not designed to handle resources such as graphics processing units (GPUs). Concretely, although SLURM can use a generic resource plugin (GRes) to manage GPUs, with this solution the hardware accelerators can only be accessed by the job that is in execution on the node to which the GPU is attached. This is a serious constraint for remote GPU virtualization technologies, which aim at providing user-transparent access to all GPUs in the cluster, independently of the specific location of the node where the application is running with respect to the GPU node. In this work we introduce a new type of device in SLURM, "rgpu", in order to gain access from any application node to any GPU node in the cluster using rCUDA as the remote GPU virtualization solution. With this new scheduling mechanism, a user can access any number of GPUs, as SLURM schedules the tasks taking into account all the graphics accelerators available in the complete cluster. We present experimental results that show the benefits of this new approach in terms of increased flexibility for the job scheduler.
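The user-transparent access described in the abstract can be illustrated with a minimal sketch (not taken from the paper): assuming rCUDA intercepts the standard CUDA runtime API, an application launched by SLURM with the new "rgpu" resource type needs no source changes, since a plain device-enumeration program simply reports whichever GPUs, local or remote, the scheduler has granted to the job.

// Minimal sketch: enumerate the GPUs visible to this job. Under rCUDA the
// devices listed here may be attached to other nodes of the cluster.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("GPUs visible to this job: %d\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            // With remote GPU virtualization, device i need not reside on this node.
            std::printf("  device %d: %s, %zu MiB of global memory\n",
                        i, prop.name, prop.totalGlobalMem >> 20);
        }
    }
    return 0;
}

Compiled with nvcc, the same binary runs unmodified whether the GPUs it sees are local devices managed by SLURM's stock GRes plugin or remote devices provided through the rgpu/rCUDA mechanism described above.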
Keywords:
HPC cluster, Job scheduler, Remote GPU virtualization, Resource management
Rights:
All rights reserved
ISBN:
9781467347907
Source:
Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on (ISSN: 1550-6533)
DOI:
10.1109/SBAC-PAD.2014.49
Publisher:
IEEE
Publisher's version:
http://dx.doi.org/10.1109/SBAC-PAD.2014.49
Conference title:
26th IEEE Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2014)
Conference location:
Paris, France
Conference date:
October 22-24, 2014
Project code:
info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2013%2F009/ES/DESARROLLO DE LIBRERIAS PARA GESTIONAR EL ACCESO A DISPOSITIVOS REMOTOS COMPARTIDOS EN SERVIDORES DE ALTAS PRESTACIONES/
info:eu-repo/grantAgreement/MICINN//TIN2011-23283/ES/POWER-AWARE HIGH PERFORMANCE COMPUTING/
info:eu-repo/grantAgreement/UJI//P1·1B2013-21/
Description:
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Acknowledgements:
The researchers at UPV were supported by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. Researchers at UJI were supported by MINECO, by FEDER funds under Grant TIN2011-23283, and by the Fundación Caixa-Castelló Bancaixa (Grant P11B2013-21).
Type:
Book chapter
Conference paper