
Integration and exploitation of intra-routine malleability in BLIS

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


Rodríguez-Sánchez, R.; Igual, FD.; Quintana-Ortí, ES. (2020). Integration and exploitation of intra-routine malleability in BLIS. The Journal of Supercomputing (Online). 76(4):2860-2875. https://doi.org/10.1007/s11227-019-03078-z

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/176790


Item metadata

Title: Integration and exploitation of intra-routine malleability in BLIS
Authors: Rodríguez-Sánchez, Rafael; Igual, Francisco D.; Quintana-Ortí, Enrique S.
UPV entity: Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors
Abstract:
Malleability is a property of certain applications (or tasks) that, given an external request or autonomously, can accommodate a dynamic modification of the degree of parallelism being exploited at runtime. Malleability ...
Keywords: Malleability, Linear algebra, BLAS, Multicore architectures
Rights: All rights reserved
Source: The Journal of Supercomputing (Online) (eISSN: 1573-0484)
DOI: 10.1007/s11227-019-03078-z
Publisher: Springer-Verlag
Publisher's version: https://doi.org/10.1007/s11227-019-03078-z
Project codes:
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/
info:eu-repo/grantAgreement/CAM//S2018%2FTCS-4423 /
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-093684-B-I00/ES/HETEROGENEIDAD Y ESPECIALIZACION EN LA ERA POST-MOORE/
info:eu-repo/grantAgreement/MINECO//TIN2015-65277-R/ES/COMPPUTACION HETEROGENEA EFICIENTE: DEL PROCESADOR AL DATACENTER/
Acknowledgements:
The researchers from Universidad Complutense de Madrid were supported by the EU (FEDER) and Spanish MINECO (TIN2015-65277-R, RTI2018-093684-B-I00), and by Spanish CM (S2018/TCS-4423). The researcher from Universitat ...
Type: Article
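The abstract defines malleability as the ability of an application or task to change its degree of parallelism at runtime, either on an external request or autonomously. As a rough illustration only, the Python sketch below splits a naive matrix product into row panels and re-queries a requested thread count before each round of panels, so the number of workers can grow or shrink while the computation is in progress. The names `malleable_matmul`, `panel_matmul` and `get_nthreads` are hypothetical; this is not BLIS, its API, or the authors' mechanism. Because Python threads are limited by the GIL, the sketch shows the control structure, not real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def panel_matmul(A: Matrix, B: Matrix, rows: range) -> Matrix:
    """Naively compute the rows `rows` of C = A * B (hypothetical helper)."""
    k, n = len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in rows]


def malleable_matmul(A: Matrix, B: Matrix, panel_rows: int,
                     get_nthreads: Callable[[], int]) -> Tuple[Matrix, List[int]]:
    """Compute C = A * B panel by panel; the thread count may change between rounds.

    `get_nthreads` is an assumed stand-in for an external request (or an
    autonomous policy). It is queried before every round, and that round
    processes up to that many panels with that many worker threads.
    Returns C and the thread count used in each round.
    """
    m = len(A)
    panels = [range(i, min(i + panel_rows, m)) for i in range(0, m, panel_rows)]
    C: Matrix = [[] for _ in range(m)]
    history: List[int] = []
    done = 0
    while done < len(panels):
        # Safe point: no panel is in flight, so parallelism can be adjusted here.
        nthreads = max(1, get_nthreads())
        batch = panels[done:done + nthreads]
        history.append(nthreads)
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            blocks = list(pool.map(lambda rows: panel_matmul(A, B, rows), batch))
        # Scatter each computed panel back into its rows of C.
        for rows, block in zip(batch, blocks):
            for i, row in zip(rows, block):
                C[i] = row
        done += len(batch)
    return C, history


if __name__ == "__main__":
    A = [[float(i), float(i + 1)] for i in range(8)]
    B = [[1.0, 2.0], [3.0, 4.0]]
    requests = [2, 4, 1, 1]  # simulated external requests, one per round
    C, used = malleable_matmul(A, B, 1, lambda: requests.pop(0) if requests else 1)
    print(used)  # [2, 4, 1, 1]
```

Adjusting the thread count only at round boundaries is a simplifying assumption of this sketch: it keeps every change safe because no panel is being computed at that moment, at the cost of reacting to a request only once per round.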

