On the Benefits of the Remote GPU Virtualization Mechanism: the rCUDA Case

RiuNet: Institutional repository of the Polytechnic University of Valencia

Silla Jiménez, F.; Iserte Agut, S.; Reaño González, C.; Prades, J. (2017). On the Benefits of the Remote GPU Virtualization Mechanism: the rCUDA Case. Concurrency and Computation Practice and Experience. 29(13):1-17. https://doi.org/10.1002/cpe.4072

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/152813

Item Metadata

Title: On the Benefits of the Remote GPU Virtualization Mechanism: the rCUDA Case
Author: Silla Jiménez, Federico; Iserte Agut, Sergio; Reaño González, Carlos; Prades, Javier
UPV Unit: Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors
Issued date:
Abstract: Graphics processing units (GPUs) are being adopted in many computing facilities given their extraordinary computing power, which makes it possible to accelerate many general purpose applications from different domains. ...
Subjects: CUDA, GPU migration, GPU virtualization, InfiniBand, Slurm, Xen
Copyright: All rights reserved
Source: Concurrency and Computation: Practice and Experience (ISSN: 1532-0626)
DOI: 10.1002/cpe.4072
Publisher: John Wiley & Sons
Publisher version: https://doi.org/10.1002/cpe.4072
Project ID:
info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/
GENERALITAT VALENCIANA/PROMETEOII/2013/009
Thanks:
Generalitat Valenciana, Grant/Award Number: PROMETEOII/2013/009; MINECO and FEDER, Grant/Award Number: TIN2014-53495-R
Type: Article

References

Wu H, Diamos G, Sheard T. Red Fox: An execution environment for relational query processing on GPUs. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14. Orlando, FL, USA: ACM; 2014: 44:44-44:54.

Playne DP, Hawick KA. Data parallel three-dimensional Cahn-Hilliard field equation simulation on GPUs with CUDA. Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA. Las Vegas, Nevada, USA; 2009.

Yamazaki, I., Dong, T., Solcà, R., Tomov, S., Dongarra, J., & Schulthess, T. (2013). Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems. Concurrency and Computation: Practice and Experience, 26(16), 2652-2666. doi:10.1002/cpe.3152

Luo Y, Duraiswami R. Canny edge detection on NVIDIA CUDA. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008 (CVPRW '08). Anchorage, AK, USA: IEEE; 2008: 1-8.

Surkov, V. (2010). Parallel option pricing with Fourier space time-stepping method on graphics processing units. Parallel Computing, 36(7), 372-380. doi:10.1016/j.parco.2010.02.006

Agarwal, P. K., Hampton, S., Poznanovic, J., Ramanthan, A., Alam, S. R., & Crozier, P. S. (2012). Performance modeling of microsecond scale biological molecular dynamics simulations on heterogeneous architectures. Concurrency and Computation: Practice and Experience, 25(10), 1356-1375. doi:10.1002/cpe.2943

Yoo, A. B., Jette, M. A., & Grondona, M. (2003). SLURM: Simple Linux Utility for Resource Management. Lecture Notes in Computer Science, 44-60. doi:10.1007/10968987_3

Silla F, Prades J, Iserte S, Reaño C. Remote GPU virtualization: Is it useful? The 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era. Barcelona, Spain: IEEE Computer Society; 2016: 41-48.

Liang TY, Chang YW. GridCuda: A grid-enabled CUDA programming toolkit. 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA). Biopolis, Singapore: IEEE; 2011: 141-146.

Oikawa M, Kawai A, Nomura K, Yasuoka K, Yoshikawa K, Narumi T. DS-CUDA: A middleware to use many GPUs in the cloud environment. Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC '12. Washington, DC, USA: IEEE Computer Society; 2012: 1207-1214.

Giunta G, Montella R, Agrillo G, Coviello G. A GPGPU transparent virtualization component for high performance computing clouds. Euro-Par 2010 - Parallel Processing. Ischia, Italy: Springer; 2010.

Shi L, Chen H, Sun J. vCUDA: GPU accelerated high performance computing in virtual machines. IEEE International Symposium on Parallel & Distributed Processing, 2009 (IPDPS 2009). Rome, Italy: IEEE; 2009: 1-11.

Gupta V, Gavrilovska A, Schwan K. GViM: GPU-accelerated virtual machines. Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing. Nuremberg, Germany; 2009: 17-24.

Peña, A. J., Reaño, C., Silla, F., Mayo, R., Quintana-Ortí, E. S., & Duato, J. (2014). A complete and efficient CUDA-sharing solution for HPC clusters. Parallel Computing, 40(10), 574-588. doi:10.1016/j.parco.2014.09.011

CUDA API Reference Manual 7.5. https://developer.nvidia.com/cuda-toolkit. 2016.

Merritt AM, Gupta V, Verma A, Gavrilovska A, Schwan K. Shadowfax: Scaling in heterogeneous cluster systems via GPGPU assemblies. Proceedings of the 5th International Workshop on Virtualization Technologies in Distributed Computing, VTDC '11. New York, NY, USA: ACM; 2011: 3-10.

Shadowfax II - scalable implementation of GPGPU assemblies. http://keeneland.gatech.edu/software/keeneland/kidron

NVIDIA. The NVIDIA GPU Computing SDK Version 5.5. 2013.

iperf3: A TCP, UDP, and SCTP network bandwidth measurement tool. https://github.com/esnet/iperf. 2016.

Reaño C, Silla F, Shainer G, Schultz S. Local and remote GPUs perform similar with EDR 100G InfiniBand. Proceedings of the Industrial Track of the 16th International Middleware Conference, Middleware Industry '15. Vancouver, Canada; 2015.

Reaño, C., Silla, F., Castelló, A., Peña, A. J., Mayo, R., Quintana-Ortí, E. S., & Duato, J. (2014). Improving the user experience of the rCUDA remote GPU virtualization framework. Concurrency and Computation: Practice and Experience, 27(14), 3746-3770. doi:10.1002/cpe.3409

Iserte S, Castelló A, Mayo R. Slurm support for remote GPU virtualization: Implementation and performance study. 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD); 2014: 318-325.

Vouzis, P. D., & Sahinidis, N. V. (2010). GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics, 27(2), 182-188. doi:10.1093/bioinformatics/btq644

Brown, W. M., Kohlmeyer, A., Plimpton, S. J., & Tharrington, A. N. (2012). Implementing molecular dynamics on hybrid high performance computers – Particle–particle particle-mesh. Computer Physics Communications, 183(3), 449-459. doi:10.1016/j.cpc.2011.10.012

Liu, Y., Schmidt, B., Liu, W., & Maskell, D. L. (2010). CUDA–MEME: Accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units. Pattern Recognition Letters, 31(14), 2170-2177. doi:10.1016/j.patrec.2009.10.009

Pronk, S., Páll, S., Schulz, R., Larsson, P., Bjelkmar, P., Apostolov, R., … Lindahl, E. (2013). GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics, 29(7), 845-854. doi:10.1093/bioinformatics/btt055

Klus, P., Lam, S., Lyberg, D., Cheung, M., Pullan, G., McFarlane, I., … Lam, B. Y. (2012). BarraCUDA - a fast short read sequence aligner using graphics processing units. BMC Research Notes, 5(1), 27. doi:10.1186/1756-0500-5-27

Kurtz, S., Phillippy, A., Delcher, A. L., Smoot, M., Shumway, M., Antonescu, C., & Salzberg, S. L. (2004). Genome Biology, 5(2), R12. doi:10.1186/gb-2004-5-2-r12

Chang, C.-C., & Lin, C.-J. (2011). LIBSVM. ACM Transactions on Intelligent Systems and Technology, 2(3), 1-27. doi:10.1145/1961189.1961199

Phillips, J. C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., … Schulten, K. (2005). Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 26(16), 1781-1802. doi:10.1002/jcc.20289

NVIDIA. Popular GPU-Accelerated Applications Catalog. http://www.nvidia.es/content/tesla/pdf/gpu-accelerated-applications-for-hpc.pdf. 2016.

Walters JP, Younge AJ, Kang D-I. GPU-passthrough performance: A comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL applications. 7th IEEE International Conference on Cloud Computing (CLOUD 2014). Anchorage, AK, USA; 2014.

Yang C-T, Wang H-Y, Ou W-S, Liu Y-T, Hsu C-H. On implementation of GPU virtualization using PCI pass-through. 2012 IEEE 4th International Conference on Cloud Computing Technology and Science (CLOUDCOM). Taipei, Taiwan; 2012: 711-716.

Pérez F, Reaño C, Silla F. Providing CUDA acceleration to KVM virtual machines in InfiniBand clusters with rCUDA. Proceedings of the International Conference on Distributed Applications and Interoperable Systems. Crete, Greece; 2016.

Jo, H., Jeong, J., Lee, M., & Choi, D. H. (2013). Exploiting GPUs in Virtual Machine for BioCloud. BioMed Research International, 2013, 1-11. doi:10.1155/2013/939460

Prades J, Reaño C, Silla F. CUDA acceleration for Xen virtual machines in InfiniBand clusters with rCUDA. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '16. Barcelona, Spain; 2016.

Mellanox. Mellanox OFED for Linux User Manual. 2015.

Liu, Y., Wirawan, A., & Schmidt, B. (2013). CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformatics, 14(1). doi:10.1186/1471-2105-14-117

Takizawa H, Sato K, Komatsu K, Kobayashi H. CheCUDA: A checkpoint/restart tool for CUDA applications. Proceedings of the 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies. Hiroshima, Japan; 2009.
