Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems

Boratto, Murilo; Alonso-Jordá, Pedro; Gimenez, Domingo; Lastovetsky, Alexey

doi:10.1007/s11227-016-1694-y

Boratto, M.; Alonso-Jordá, P.; Gimenez, D.; Lastovetsky, A. (2017). Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems. The Journal of Supercomputing. 73(1):227-239. https://doi.org/10.1007/s11227-016-1694-y

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/152271

Files in this item

Name: Boratto;Alonso-Jo ...

Size: 416.7Kb

Format: PDF

Description: Author's version.

Name: art:10.1007/s1122 ...

Size: 508.7Kb

Format: PDF

Description: Publisher's version.

Item metadata

Title:

Automatic Tuning to Performance Modelling of Matrix Polynomials on Multicore and Multi-GPU Systems

Authors:

Boratto, Murilo

Alonso-Jordá, Pedro

Gimenez, Domingo

Lastovetsky, Alexey

UPV entity:

Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació

Date issued:

2017-01

Abstract:

[EN] Automatic tuning methodologies have been used in the design of routines in recent years. The goal of these methodologies is to develop routines which automatically adapt to the conditions of the underlying computational ...

Keywords:

Automatic Tuning, Matrix Polynomials, Performance, Multicore, Multi-GPU

Rights:

All rights reserved

Source:

The Journal of Supercomputing (ISSN: 0920-8542)

DOI:

10.1007/s11227-016-1694-y

Publisher:

Springer-Verlag

Publisher's version:

https://doi.org/10.1007/s11227-016-1694-y

Project code:

info:eu-repo/grantAgreement/COST//IC1305/EU/Network for Sustainable Ultrascale Computing (NESUS)/
info:eu-repo/grantAgreement/MINECO//TIN2015-66972-C5-3-R/ES/TECNICAS PARA LA MEJORA DE LAS PRESTACIONES, FIABILIDAD Y CONSUMO DE ENERGIA DE LOS SERVIDORES. OPTIMIZACION DE APLICACIONES CIENTIFICAS, MEDICAS Y DE VISION ARTIFICIAL/
info:eu-repo/grantAgreement/MINECO//TEC2015-67387-C4-1-R/ES/SMART SOUND PROCESSING FOR THE DIGITAL LIVING/
info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2014%2F003/ES/Computación y comunicaciones de altas prestaciones y aplicaciones en ingeniería/

Acknowledgements:

This work has been partially supported by Generalitat Valenciana under Grant PROMETEOII/2014/003, and by the Spanish MINECO, as well as European Commission FEDER funds, under Grant TEC2015-67387-C4-1-R and TIN2015-66972-C5-3-R, ...

Type:

Article

This item appears in the following collection(s)

Articles, conferences, monographs [48344]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
