Integration and exploitation of intra-routine malleability in BLIS

Rodríguez-Sánchez, Rafael; Igual, Francisco D.; Quintana-Ortí, Enrique S.

doi:10.1007/s11227-019-03078-z


Files in this item

Name: Rodriguez-Sanchez ...
Size: 477.8Kb
Format: PDF
Description: Author's version.

Name: Rodríguez-Sánch ...
Size: 1.256Mb
Format: PDF
Description: Publisher's version

dc.contributor.author	Rodríguez-Sánchez, Rafael	es_ES
dc.contributor.author	Igual, Francisco D.	es_ES
dc.contributor.author	Quintana-Ortí, Enrique S.	es_ES
dc.date.accessioned	2021-11-10T19:05:36Z
dc.date.available	2021-11-10T19:05:36Z
dc.date.issued	2020-04	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/176790
dc.description.abstract	[EN] Malleability is a property of certain applications (or tasks) that, given an external request or autonomously, can accommodate a dynamic modification of the degree of parallelism being exploited at runtime. Malleability improves resource usage (core occupation) on modern multicore architectures for applications that exhibit irregular and divergent execution paths and depend heavily on the performance of the underlying library to attain high performance. The integration of malleability within high-performance instances of the Basic Linear Algebra Subprograms (BLAS) is nonexistent and, in addition, difficult to attain given the rigidity of current application programming interfaces (APIs). In this paper, we overcome these issues by presenting the integration of a malleability mechanism within BLIS, a high-performance and portable framework to implement BLAS-like operations. For this purpose, we leverage low-level (yet simple) APIs to integrate on-demand malleability across all Level-3 BLAS routines, and we demonstrate the performance benefits of this approach by means of a higher-level dense matrix operation: the LU factorization with partial pivoting and look-ahead.	es_ES
dc.description.sponsorship	The researchers from Universidad Complutense de Madrid were supported by the EU (FEDER) and Spanish MINECO (TIN2015-65277-R, RTI2018-093684-B-I00), and by Spanish CM (S2018/TCS-4423). The researcher from Universitat Politècnica de València was supported by the Spanish MINECO (TIN2017-82972-R).	es_ES
dc.language	English	es_ES
dc.publisher	Springer-Verlag	es_ES
dc.relation.ispartof	The Journal of Supercomputing (Online)	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Malleability	es_ES
dc.subject	Linear algebra	es_ES
dc.subject	BLAS	es_ES
dc.subject	Multicore architectures	es_ES
dc.subject.classification	Computer Architecture and Technology	es_ES
dc.title	Integration and exploitation of intra-routine malleability in BLIS	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1007/s11227-019-03078-z	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/CAM//S2018%2FTCS-4423 /	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-093684-B-I00/ES/HETEROGENEIDAD Y ESPECIALIZACION EN LA ERA POST-MOORE/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2015-65277-R/ES/COMPPUTACION HETEROGENEA EFICIENTE: DEL PROCESADOR AL DATACENTER/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Rodríguez-Sánchez, R.; Igual, FD.; Quintana-Ortí, ES. (2020). Integration and exploitation of intra-routine malleability in BLIS. The Journal of Supercomputing (Online). 76(4):2860-2875. https://doi.org/10.1007/s11227-019-03078-z	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1007/s11227-019-03078-z	es_ES
dc.description.upvformatpinicio	2860	es_ES
dc.description.upvformatpfin	2875	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	76	es_ES
dc.description.issue	4	es_ES
dc.identifier.eissn	1573-0484	es_ES
dc.relation.pasarela	S\417896	es_ES
dc.contributor.funder	Comunidad de Madrid	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
dc.contributor.funder	European Regional Development Fund	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.description.references	Augonnet C, Thibault S, Namyst R, Wacrenier PA (2011) StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exp Spec Issue Euro Par 2009(23):187–198	es_ES
dc.description.references	Catalán S, Castelló A, Igual FD, Rodríguez-Sánchez R, Quintana-Ortí ES (2019) Programming parallel dense matrix factorizations with look-ahead and OpenMP. Cluster Comput. https://doi.org/10.1007/s10586-019-02927-z	es_ES
dc.description.references	Catalán S, Herrero JR, Quintana-Ortí ES, Rodríguez-Sánchez R, Van De Geijn R (2019) A case for malleable thread-level linear algebra libraries: the LU factorization with partial pivoting. IEEE Access 7:17617–17633	es_ES
dc.description.references	Catalán S, Igual FD, Mayo R, Rodríguez-Sánchez R, Quintana-Ortí ES (2016) Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors. Cluster Comput 19(3):1037–1051	es_ES
dc.description.references	Chan E, Van Zee FG, Bientinesi P, Quintana-Ortí ES, Quintana-Ortí G, van de Geijn R (2008) Supermatrix: A multithreaded runtime scheduling system for algorithms-by-blocks. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, New York, pp 123–132	es_ES
dc.description.references	Intel Corporation (2019) Intel® Math Kernel Library developer reference. Tech rep, Intel Corporation. https://software.intel.com/sites/default/files/mkl-2019-developer-reference-c_2.pdf. Accessed 13 Nov 2019	es_ES
dc.description.references	Dolz MF, Igual FD, Ludwig T, Piñuel L, Quintana-Ortí ES (2015) Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi. Comput Electr Eng 46:95–111	es_ES
dc.description.references	Dongarra JJ, Du Croz J, Hammarling S, Duff IS (1990) A set of level 3 basic linear algebra subprograms. ACM Trans Math Softw 16(1):1–17	es_ES
dc.description.references	Duran A, Ayguadé E, Badia RM, Labarta J, Martinell L, Martorell X, Planas J (2011) OmpSs: a proposal for programming heterogeneous multi-core architectures. Parallel Process Lett 21(2):173–193	es_ES
dc.description.references	Gates M, Luszczek P, Abdelfattah A, Kurzak J, Dongarra J, Arturov K, Cecka C, Freitag C (2018) C++ API for BLAS and LAPACK. Tech Rep 2, ICL-UT-17-03 (2017). Revision 21 Feb 2018	es_ES
dc.description.references	Guennebaud G, Jacob B et al (2019) Eigen v3. http://eigen.tuxfamily.org. Accessed 13 Nov 2019	es_ES
dc.description.references	LAPACK project home page. http://www.netlib.org/lapack. Accessed 13 Nov 2019	es_ES
dc.description.references	Leung J, Kelly L, Anderson JH (2004) Handbook of scheduling: algorithms, models, and performance analysis. CRC Press Inc, Boca Raton, FL	es_ES
dc.description.references	Smith TM, van de Geijn RA, Smelyanskiy M, Hammond JR, Van Zee FG (2014) Anatomy of high-performance many-threaded matrix multiplication. In: 28th IEEE International Parallel & Distributed Processing Symposium	es_ES
dc.description.references	Strazdins P (1998) A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization. Tech Rep TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia	es_ES
dc.description.references	Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimization of software and the ATLAS project. Parallel Comput 27(1–2):3–35	es_ES
dc.description.references	Van Zee FG, Implementing high-performance complex matrix multiplication via the 1m method. ACM Trans Math Softw (submitted)	es_ES
dc.description.references	Van Zee FG, van de Geijn RA (2015) BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans Math Softw 41(3):14:1–14:33	es_ES
dc.description.references	Van Zee FG, Parikh DN, van de Geijn RA, Supporting mixed-domain mixed-precision matrix multiplication within the BLIS framework. ACM Trans Math Softw (submitted)	es_ES
dc.description.references	Van Zee FG, Smith T (2017) Implementing high-performance complex matrix multiplication via the 3m and 4m methods. ACM Trans Math Softw 44(1):7:1–7:36	es_ES
dc.description.references	Van Zee FG, Smith T, Igual FD, Smelyanskiy M, Zhang X, Kistler M, Austel V, Gunnels J, Low TM, Marker B, Killough L, van de Geijn RA (2016) The BLIS framework: experiments in portability. ACM Trans Math Softw 42(2):12:1–12:19	es_ES

RiuNet: Institutional Repository of the Universitat Politècnica de València