
Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning

RiuNet: Institutional repository of the Polytechnic University of Valencia

dc.contributor.author Cámara, Jesús es_ES
dc.contributor.author Cuenca, Javier es_ES
dc.contributor.author Giménez, Domingo es_ES
dc.contributor.author García, Luis Pedro es_ES
dc.contributor.author Vidal Maciá, Antonio Manuel es_ES
dc.date.accessioned 2015-04-27T08:10:03Z
dc.date.available 2015-04-27T08:10:03Z
dc.date.issued 2014-06
dc.identifier.issn 0885-7458
dc.identifier.uri http://hdl.handle.net/10251/49284
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/s10766-013-0249-6 es_ES
dc.description.abstract This work analyzes the introduction of auto-tuning techniques into shared-memory linear algebra routines. Information gathered when the routines are installed is used at run time to make decisions that reduce the total execution time. The study covers routines at different levels (matrix multiplication, LU and Cholesky factorizations, and solvers for symmetric and general linear systems) and uses calls to multithreaded implementations of routines in the LAPACK and PLASMA libraries. The experiments use medium-sized NUMA systems and large cc-NUMA systems. This variety of routines, libraries, and systems supports general conclusions about the methodology to use when auto-tuning shared-memory linear algebra routines. The proposed methodology yields satisfactory execution times. es_ES
dc.description.sponsorship Partially supported by Fundación Séneca, Consejería de Educación de la Región de Murcia, 08763/PI/08, PROMETEO/2009/013 from Generalitat Valenciana, the Spanish Ministry of Education and Science through TIN2012-38341-C04-03, and the High-Performance Computing Network on Parallel Heterogeneous Architectures (CAPAP-H). The authors gratefully acknowledge the computer resources and assistance provided by the Supercomputing Centre of the Scientific Park Foundation of Murcia and by the Centre de Supercomputació de Catalunya. en_EN
dc.language English es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation Fundación Séneca es_ES
dc.relation Consejería de Educación de la Región de Murcia es_ES
dc.relation Generalitat Valenciana [08763/PI/08 PROMETEO/2009/013] es_ES
dc.relation Spanish Ministry of Education and Science [TIN2012-38341-C04-03] es_ES
dc.relation High-Performance Computing Network on Parallel Heterogeneous Architectures (CAPAP-H) es_ES
dc.relation.ispartof International Journal of Parallel Programming es_ES
dc.rights All rights reserved es_ES
dc.subject Linear algebra libraries es_ES
dc.subject Linear algebra routines es_ES
dc.subject Empirical installation es_ES
dc.subject Shared-memory es_ES
dc.subject Auto-tuning es_ES
dc.subject.classification COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE es_ES
dc.title Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s10766-013-0249-6
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Cámara, J.; Cuenca, J.; Giménez, D.; García, LP.; Vidal Maciá, AM. (2014). Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning. International Journal of Parallel Programming. 42(3):408-434. doi:10.1007/s10766-013-0249-6 es_ES
dc.description.accrualMethod Senia es_ES
dc.relation.publisherversion http://dx.doi.org/10.1007/s10766-013-0249-6 es_ES
dc.description.upvformatpinicio 408 es_ES
dc.description.upvformatpfin 434 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 42 es_ES
dc.description.issue 3 es_ES
dc.relation.senia 264296

