- -

Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems

RiuNet: Institutional repository of the Polithecnic University of Valencia

Share/Send to

Cited by

Statistics

Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems

Show full item record

Boratto, M.; Alonso-Jordá, P.; Gimenez, D.; Lastovetsky, A. (2017). Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems. The Journal of Supercomputing. 73(1):227-239. https://doi.org/10.1007/s11227-016-1694-y

Por favor, use este identificador para citar o enlazar este ítem: http://hdl.handle.net/10251/152271

Files in this item

Item Metadata

Title: Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems
Author: Boratto, Murilo Alonso-Jordá, Pedro Gimenez, Domingo Lastovetsky, Alexey
UPV Unit: Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
Issued date:
Abstract:
[EN] Automatic tuning methodologies have been used in the design of routines in recent years. The goal of these methodologies is to develop routines which automatically adapt to the conditions of the underlying computational ...[+]
Subjects: Automatic Tuning , Matrix Polynomials , Performance , Multicore , Multi-GPU
Copyrigths: Reserva de todos los derechos
Source:
The Journal of Supercomputing. (issn: 0920-8542 )
DOI: 10.1007/s11227-016-1694-y
Publisher:
Springer-Verlag
Publisher version: https://doi.org/10.1007/s11227-016-1694-y
Project ID:
info:eu-repo/grantAgreement/COST//IC1305/EU/Network for Sustainable Ultrascale Computing (NESUS)/
info:eu-repo/grantAgreement/MINECO//TIN2015-66972-C5-3-R/ES/TECNICAS PARA LA MEJORA DE LAS PRESTACIONES, FIABILIDAD Y CONSUMO DE ENERGIA DE LOS SERVIDORES. OPTIMIZACION DE APLICACIONES CIENTIFICAS, MEDICAS Y DE VISION ARTIFICIAL/
info:eu-repo/grantAgreement/MINECO//TEC2015-67387-C4-1-R/ES/SMART SOUND PROCESSING FOR THE DIGITAL LIVING/
info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2014%2F003/ES/Computación y comunicaciones de altas prestaciones y aplicaciones en ingeniería/
Thanks:
This work has been partially supported by Generalitat Valenciana under Grant PROM-ETEOII/2014/003, and by the Spanish MINECO, as well as European Commission FEDER funds, under Grant TEC2015-67387-C4-1-R and TIN2015-66972-C5-3-R, ...[+]
Type: Artículo

References

Alberti PV, Alonso P, Vidal AM, Cuenca J, Giménez D (2004) Designing polylibraries to speed up linear algebra computations. IJHPCN 1(1/2/3):75–84

Alonso P, Boratto M, Pinilla J, Ibañez J, Martinez J (2014) On the evaluation of matrix polynomials using several GPGPUs. Tech Rep Riunet/E10251/39615

Anderson E, Bai Z, Bischof C, Demmel J, Dongarra J, Croz JD, Greenbaum A, Hammarling S, McKenney A, Ostrouchov S, Sorensen D (2013) LAPACK users guide, 2nd edn. SIAM, Philadelphia [+]
Alberti PV, Alonso P, Vidal AM, Cuenca J, Giménez D (2004) Designing polylibraries to speed up linear algebra computations. IJHPCN 1(1/2/3):75–84

Alonso P, Boratto M, Pinilla J, Ibañez J, Martinez J (2014) On the evaluation of matrix polynomials using several GPGPUs. Tech Rep Riunet/E10251/39615

Anderson E, Bai Z, Bischof C, Demmel J, Dongarra J, Croz JD, Greenbaum A, Hammarling S, McKenney A, Ostrouchov S, Sorensen D (2013) LAPACK users guide, 2nd edn. SIAM, Philadelphia

Blackford LS, Demmel J, Dongarra J, Duff I, Hammarling S, Henry G, Heroux M, Kaufman L, Lumsdaine A, Petitet A, Pozo R, Remington K, Whaley RC (2001) An updated set of basic linear algebra subprograms (blas). ACM Trans Math Softw 28:135–151

Caron E, Uter F (2002) Parallel extension of a dynamic performance forecasting tool. Sci Ann Cuza Univ 11:80–93

Chandra R (2001) Parallel programming in OpenMP. Morgan Kaufmann, Burlington

Demmel J, Marques O, Parlett BN, Vömel C (2008) Performance and accuracy of LAPACK’s symmetric tridiagonal eigensolvers. SIAM J.Sci Comput 30(3):1508–1526

Frigo M, Johnson S (1998) FFTW: an adaptive software architecture for the FFT. In: Proceedings of IEEE International Conference on Acoustics Speech and Signal Processing vol. 3, pp 1381–1384

García L, Cuenca J, Giménez D (2007) Including improvement of the execution time in a software architecture of libraries with self-optimisation. In: ICSOFT 2007, Proceedings of the Second International Conference on Software and Data Technologies, Volume SE, Barcelona, Spain, pp 156–161, 22–25 July

García LP, Cuenca J, Giménez D (2014) On optimization techniques for the matrix multiplication on hybrid cpu+gpu platforms. Ann Multicore GPU Program 1(1):10–18

Hasanov K, Quintin JN, Lastovetsky A (2014) Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms. J Supercomput 71(11):24–34

Katagiri T, Kise K, Honda H (2005) RAO-SS: a prototype of run-time auto-tuning facility for sparse direct solvers. Tech Rep 22(1):1–10

Katagiri T, Kise K, Honda H, Yuba T (2004) Effect of auto-tuning with user’s knowledge for numerical software. Proceedings of the 1st conference on computing frontiers, Ischia, Italy. ACM, New York, NY, USA, pp 12–25

Nath R, Tomov S, Dongarra J (2010) An improved magma gemm for fermi graphics processing units. Int J High Perform Comput Appl 24(4):511–515

Paterson MS, Stockmeyer LJ (1973) On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J Comput 2(1):60–66

PLASMA (2015) Parallel linear algebra software for multicore architectures. Available in: http://www.netlib.org/plasma/ . Accessed 1 June 2015

Tanaka T, Katagiri T, Yuba T (2007) D-spline based incremental parameter estimation in automatic performance tuning. In: International Conference on Applied Parallel Computing: State of the Art in Scientific Computing, PARA’06. Springer-Verlag, Berlin, Heidelberg, pp 986–995

Vuduc R, Demmel J, Bilmes J (2004) Statistical models for empirical search-based performance tuning. Int J High Perform Comput Appl 18:65–94

Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimizations of software and the ATLAS project. Parallel Comput 27:21–37

[-]

recommendations

 

This item appears in the following Collection(s)

Show full item record