Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning

Cámara, Jesús; Cuenca, Javier; Giménez, Domingo; García, Luis Pedro; Vidal Maciá, Antonio Manuel

doi:10.1007/s10766-013-0249-6


Files in this item

Name: IJPP-D-12-00395ma ...

Size: 1.347Mb

Format: PDF

Description: Author's version.

Name: Journal of parallel ...

Size: 516.9Kb

Format: PDF

Description: Publisher's version

dc.contributor.author	Cámara, Jesús	es_ES
dc.contributor.author	Cuenca, Javier	es_ES
dc.contributor.author	Giménez, Domingo	es_ES
dc.contributor.author	García, Luis Pedro	es_ES
dc.contributor.author	Vidal Maciá, Antonio Manuel	es_ES
dc.date.accessioned	2015-04-27T08:10:03Z
dc.date.available	2015-04-27T08:10:03Z
dc.date.issued	2014-06
dc.identifier.issn	0885-7458
dc.identifier.uri	http://hdl.handle.net/10251/49284
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1007/s10766-013-0249-6	es_ES
dc.description.abstract	The introduction of auto-tuning techniques in linear algebra shared-memory routines is analyzed. Information obtained when the routines are installed is used at run time to make decisions that reduce the total execution time. The study is carried out with routines at different levels (matrix multiplication, LU and Cholesky factorizations, and routines for solving symmetric or general linear systems) and with calls to routines in the LAPACK and PLASMA libraries with multithreaded implementations. Medium NUMA and large cc-NUMA systems are used in the experiments. This variety of routines, libraries and systems allows us to draw general conclusions about the methodology to use for auto-tuning linear algebra shared-memory routines. Satisfactory execution times are obtained with the proposed methodology.	es_ES
dc.description.sponsorship	Partially supported by Fundacion Seneca, Consejeria de Educacion de la Region de Murcia, 08763/PI/08, PROMETEO/2009/013 from Generalitat Valenciana, the Spanish Ministry of Education and Science through TIN2012-38341-C04-03, and the High-Performance Computing Network on Parallel Heterogeneous Architectures (CAPAP-H). The authors gratefully acknowledge the computer resources and assistance provided by the Supercomputing Centre of the Scientific Park Foundation of Murcia and by the Centre de Supercomputacio de Catalunya.	en_EN
dc.language	English	es_ES
dc.publisher	Springer Verlag (Germany)	es_ES
dc.relation.ispartof	International Journal of Parallel Programming	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Linear algebra libraries	es_ES
dc.subject	Linear algebra routines	es_ES
dc.subject	Empirical installation	es_ES
dc.subject	Shared-memory	es_ES
dc.subject	Auto-tuning	es_ES
dc.subject.classification	Computer Science and Artificial Intelligence	es_ES
dc.title	Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1007/s10766-013-0249-6
dc.relation.projectID	info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO09%2F2009%2F013/ES/Computacion de altas prestaciones sobre arquitecturas actuales en porblemas de procesado múltiple de señal/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-38341-C04-03/ES/MEJORA DE ARQUITECTURA DE SERVIDORES, SERVICIOS Y APLICACIONES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/f SéNeCa//08763%2FPI%2F08/ES/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Cámara, J.; Cuenca, J.; Giménez, D.; García, LP.; Vidal Maciá, AM. (2014). Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning. International Journal of Parallel Programming. 42(3):408-434. https://doi.org/10.1007/s10766-013-0249-6	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.1007/s10766-013-0249-6	es_ES
dc.description.upvformatpinicio	408	es_ES
dc.description.upvformatpfin	434	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	42	es_ES
dc.description.issue	3	es_ES
dc.relation.senia	264296
dc.contributor.funder	Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Comunidad Autónoma de la Región de Murcia	es_ES

This item appears in the following collection(s)

Articles, conferences, monographs

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia