InfiniBand Trade Association (IBTA) (2015) [Online].
D'Ambrosia J (2014) Ethernet in the TOP500 [Online].
TOP500 Supercomputer Sites (2014) [Online].
InfiniBand Trade Association (IBTA) (2007) The InfiniBand Trade Association Specification
Kerr G (2011) Dissecting a small InfiniBand application using the verbs API. CoRR abs/1105.1827 [Online]. arXiv:1105.1827
Woodruff B, Hefty S, Dreier R, Rosenstock H (2005) Introduction to the InfiniBand core software. In: Linux symposium, vol 2
Bedeir T (2010) Building an RDMA-capable application with IB verbs. Technical report, HPC Advisory Council
Liu Q, Russell RD (2014) A performance study of InfiniBand fourteen data rate (FDR). In: Proceedings of the high performance computing symposium, HPC ’14. Society for Computer Simulation International, San Diego, CA, USA, pp 16:1–16:10 [Online].
Hjelm N (2014) Optimizing one-sided operations in Open MPI. In: Proceedings of the 21st European MPI Users’ Group Meeting, EuroMPI/ASIA ’14. ACM, New York, NY, USA, pp 123:123–123:124 [Online].
Subramoni H, Hamidouche K, Venkatesh A, Chakraborty S, Panda D (2014) Designing MPI library with dynamic connected transport (DCT) of InfiniBand: early experiences. In: Kunkel J, Ludwig T, Meuer H (eds) Supercomputing. Lecture notes in computer science, vol 8488. Springer International Publishing, pp 278–295 [Online]. doi: 10.1007/978-3-319-07518-1_18
Unified Communication X (UCX) (2015) [Online].
NVIDIA (2014) CUDA C Programming Guide 6.5
Peña AJ, Reaño C, Silla F, Mayo R, Quintana-Ortí ES, Duato J (2014) A complete and efficient CUDA-sharing solution for HPC clusters. Parallel Comput 40(10):574–588 [Online].
Reaño C, Silla F, Gimeno AC, Peña AJ, Mayo R, Quintana-Ortí ES, Duato J (2015) Improving the user experience of the rCUDA remote GPU virtualization framework. Concurr Comput Pract Exp 27(14):3746–3770 [Online]. doi: 10.1002/cpe.3409
Prades J, Reaño C, Silla F (2016) Flexible access to CUDA accelerators from Xen virtual machines in InfiniBand clusters using rCUDA. In: 21st ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP 2016
Iserte S, Gimeno AC, Mayo R, Quintana-Ortí ES, Silla F, Duato J, Reaño C, Prades J (2014) SLURM support for remote GPU virtualization: implementation and performance study. In: 26th IEEE international symposium on computer architecture and high performance computing, SBAC-PAD, pp 318–325 [Online]. doi: 10.1109/SBAC-PAD.2014.49
NVIDIA (2014) NVIDIA CUDA Samples 6.5
Che S, Boyer M, Meng J, Tarjan D, Sheaffer J, Lee S-H, Skadron K (2009) Rodinia: a benchmark suite for heterogeneous computing. In: IEEE international symposium on workload characterization, IISWC 2009, pp 44–54
University of Tennessee, MAGMA: matrix algebra on GPU and multicore architectures [Online].
Bosma W, Cannon J, Playoust C (1997) The Magma algebra system. I. The user language. Computational algebra and number theory (London, 1993). J Symbol Comput 24(3–4):235–265 [Online]. doi: 10.1006/jsco.1996.0125
GROMACS web page (2014) [Online].
Pronk S, Páll S, Schulz R, Larsson P, Bjelkmar P, Apostolov R, Shirts MR, Smith JC, Kasson PM, van der Spoel D, Hess B, Lindahl E (2013) GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29(7):845–854 [Online].
Brown WM, Kohlmeyer A, Plimpton SJ, Tharrington AN (2012) Implementing molecular dynamics on hybrid high performance computers: particle–particle particle–mesh. Comput Phys Commun 183(3):449–459
Athanasopoulos A, Dimou A, Mezaris V, Kompatsiaris I (2011) GPU acceleration for support vector machines. In: 12th international workshop on image analysis for multimedia interactive services (WIAMIS)