Programming parallel dense matrix factorizations with look-ahead and OpenMP

Catalán, Sandra; Castelló, Adrián; Igual, Francisco D.; Rodríguez-Sánchez, Rafael; Quintana Ortí, Enrique Salvador

doi:10.1007/s10586-019-02927-z

Files in this item

Name: Catalán;Castelló;Igual ...
Size: 568.0Kb
Format: PDF
Description: Author's version.

Name: Catalán2020_Artic ...
Size: 1.930Mb
Format: PDF
Description: Publisher's version

dc.contributor.author	Catalán, Sandra	es_ES
dc.contributor.author	Castelló, Adrián	es_ES
dc.contributor.author	Igual, Francisco D.	es_ES
dc.contributor.author	Rodríguez-Sánchez, Rafael	es_ES
dc.contributor.author	Quintana Ortí, Enrique Salvador	es_ES
dc.date.accessioned	2021-05-14T03:31:47Z
dc.date.available	2021-05-14T03:31:47Z
dc.date.issued	2020-03	es_ES
dc.identifier.issn	1386-7857	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/166343
dc.description.abstract	[EN] We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded version of basic linear algebra subroutines (BLAS). The proposed approach is also different from the more sophisticated runtime-based implementations, which decompose the operation into tasks and identify dependencies via directives and runtime support. Instead, our strategy attains high performance by explicitly embedding a static look-ahead technique into the DMF code, in order to overcome the performance bottleneck of the panel factorization, and realizing the trailing update via a cache-aware multi-threaded implementation of the BLAS. Although the parallel algorithms are specified with a high level of abstraction, the actual implementation can be easily derived from them, paving the road to deriving a high performance implementation of a considerable fraction of linear algebra package (LAPACK) functionality on any multicore platform with an OpenMP-like runtime.	es_ES
dc.description.sponsorship	The researchers from Universidad Jaume I were supported by the CICYT Projects TIN2014-53495-R and TIN2017-82972-R of the MINECO and FEDER, and the H2020 EU FETHPC Project 671602 "INTERTWinE". The researchers from Universidad Complutense de Madrid were supported by the CICYT Project TIN2015-65277-R of the MINECO and FEDER. Sandra Catalán was supported during part of this time by the FPU program of the Ministerio de Educación, Cultura y Deporte. Adrián Castelló was supported by the ValI+D 2015 FPI program of the Generalitat Valenciana.	es_ES
dc.language	English	es_ES
dc.publisher	Springer-Verlag	es_ES
dc.relation.ispartof	Cluster Computing	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Matrix factorizations	es_ES
dc.subject	Look-ahead	es_ES
dc.subject	Multi-threading	es_ES
dc.subject	OpenMP	es_ES
dc.subject	Lightweight threads	es_ES
dc.subject	High performance computing	es_ES
dc.subject.classification	Computer Architecture and Technology	es_ES
dc.title	Programming parallel dense matrix factorizations with look-ahead and OpenMP	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1007/s10586-019-02927-z	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/671602/EU/Programming Model INTERoperability ToWards Exascale (INTERTWinE)/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2015-65277-R/ES/COMPPUTACION HETEROGENEA EFICIENTE: DEL PROCESADOR AL DATACENTER/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Catalán, S.; Castelló, A.; Igual, FD.; Rodríguez-Sánchez, R.; Quintana Ortí, ES. (2020). Programming parallel dense matrix factorizations with look-ahead and OpenMP. Cluster Computing. 23(1):359-375. https://doi.org/10.1007/s10586-019-02927-z	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1007/s10586-019-02927-z	es_ES
dc.description.upvformatpinicio	359	es_ES
dc.description.upvformatpfin	375	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	23	es_ES
dc.description.issue	1	es_ES
dc.relation.pasarela	S\402140	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	European Regional Development Fund	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.contributor.funder	Ministerio de Educación, Cultura y Deporte	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
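
The static look-ahead strategy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' OpenMP implementation: it assumes an LU factorization without pivoting on a diagonally dominant matrix, replaces the multi-threaded BLAS with plain loops, and uses two Python threads only to mirror the two concurrent branches of each iteration. All function names (`factor_panel`, `update_columns`, `lu_lookahead`, `lu_residual`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def factor_panel(A, k, kb):
    """Unblocked LU (no pivoting, an assumption) of columns k..k+kb-1, rows k..n-1."""
    n = len(A)
    for j in range(k, k + kb):
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]                    # multiplier l_ij
            for c in range(j + 1, k + kb):
                A[i][c] -= A[i][j] * A[j][c]      # update inside the panel only


def update_columns(A, k, kb, c0, c1):
    """Apply the already factored panel k to columns c0..c1-1.

    Rows k..k+kb-1 receive the triangular solve, the rows below receive
    the rank-kb update; both reduce to the same row operation.
    """
    n = len(A)
    for j in range(k, k + kb):
        for i in range(j + 1, n):
            lij = A[i][j]
            for c in range(c0, c1):
                A[i][c] -= lij * A[j][c]


def lu_lookahead(A, b):
    """In-place blocked LU with a static look-ahead of depth one."""
    n = len(A)
    factor_panel(A, 0, min(b, n))                 # first panel: no overlap possible
    with ThreadPoolExecutor(max_workers=2) as pool:
        for k in range(0, n, b):
            kb = min(b, n - k)
            nxt = k + kb                          # first column of the next panel
            nb = min(b, n - nxt)                  # its width (0 in the last iteration)

            def lookahead_branch():
                # Update only the next panel, then factor it right away.
                update_columns(A, k, kb, nxt, nxt + nb)
                factor_panel(A, nxt, nb)

            def trailing_branch():
                # Update the rest of the trailing submatrix.
                update_columns(A, k, kb, nxt + nb, n)

            # The branches write disjoint column ranges and only read panel k.
            futures = [pool.submit(lookahead_branch), pool.submit(trailing_branch)]
            for f in futures:
                f.result()                        # synchronize before the next iteration
    return A


def lu_residual(A0, LU):
    """Largest entry of |A0 - L*U|, with L unit lower and U upper triangular in LU."""
    n = len(A0)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            s = sum((1.0 if p == i else LU[i][p]) * LU[p][j]
                    for p in range(min(i, j) + 1))
            worst = max(worst, abs(s - A0[i][j]))
    return worst
```

With a diagonally dominant `A0`, the value `lu_residual(A0, lu_lookahead([row[:] for row in A0], 3))` should be close to zero. The point of the restructuring is that the factorization of the next panel, which offers little parallelism, overlaps with the bulk of the trailing update instead of sitting on the critical path.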

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia