- -

Programming parallel dense matrix factorizations with look-ahead and OpenMP

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Compartir/Enviar a

Citas

Estadísticas

  • Estadisticas de Uso

Programming parallel dense matrix factorizations with look-ahead and OpenMP

Mostrar el registro sencillo del ítem

Ficheros en el ítem

dc.contributor.author Catalán, Sandra es_ES
dc.contributor.author Castelló, Adrián es_ES
dc.contributor.author Igual, Francisco D. es_ES
dc.contributor.author Rodríguez-Sánchez, Rafael es_ES
dc.contributor.author Quintana Ortí, Enrique Salvador es_ES
dc.date.accessioned 2021-05-14T03:31:47Z
dc.date.available 2021-05-14T03:31:47Z
dc.date.issued 2020-03 es_ES
dc.identifier.issn 1386-7857 es_ES
dc.identifier.uri http://hdl.handle.net/10251/166343
dc.description.abstract [EN] We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded version of basic linear algebra subroutines (BLAS). The proposed approach is also different from the more sophisticated runtime-based implementations, which decompose the operation into tasks and identify dependencies via directives and runtime support. Instead, our strategy attains high performance by explicitly embedding a static look-ahead technique into the DMF code, in order to overcome the performance bottleneck of the panel factorization, and realizing the trailing update via a cache-aware multi-threaded implementation of the BLAS. Although the parallel algorithms are specified with a high level of abstraction, the actual implementation can be easily derived from them, paving the road to deriving a high performance implementation of a considerable fraction of linear algebra package (LAPACK) functionality on any multicore platform with an OpenMP-like runtime. es_ES
dc.description.sponsorship The researchers from Universidad Jaume I were supported by the CICYT Projects TIN2014-53495-R and TIN2017-82972-R of the MINECO and FEDER, and the H2020 EU FETHPC Project 671602 "INTERTWinE". The researchers from Universidad Complutense de Madrid were supported by the CICYT Project TIN2015-65277-R of the MINECO and FEDER. Sandra Catalan was supported during part of this time by the FPU program of the Ministerio de Educacion, Cultura y Deporte. Adrian Castello was supported by the ValI+D 2015 FPI program of the Generalitat Valenciana. es_ES
dc.language Inglés es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof Cluster Computing es_ES
dc.rights Reserva de todos los derechos es_ES
dc.subject Matrix factorizations es_ES
dc.subject Look-ahead es_ES
dc.subject Multi-threading es_ES
dc.subject OpenMP es_ES
dc.subject Lightweight threads es_ES
dc.subject High performance computing es_ES
dc.subject.classification ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES es_ES
dc.title Programming parallel dense matrix factorizations with look-ahead and OpenMP es_ES
dc.type Artículo es_ES
dc.identifier.doi 10.1007/s10586-019-02927-z es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/671602/EU/Programming Model INTERoperability ToWards Exascale (INTERTWinE)/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2015-65277-R/ES/COMPPUTACION HETEROGENEA EFICIENTE: DEL PROCESADOR AL DATACENTER/ es_ES
dc.rights.accessRights Abierto es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Catalán, S.; Castelló, A.; Igual, FD.; Rodríguez-Sánchez, R.; Quintana Ortí, ES. (2020). Programming parallel dense matrix factorizations with look-ahead and OpenMP. Cluster Computing. 23(1):359-375. https://doi.org/10.1007/s10586-019-02927-z es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s10586-019-02927-z es_ES
dc.description.upvformatpinicio 359 es_ES
dc.description.upvformatpfin 375 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 23 es_ES
dc.description.issue 1 es_ES
dc.relation.pasarela S\402140 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Ministerio de Educación, Cultura y Deporte es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.description.references Anderson, E., Bai, Z., Susan Blackford, L., Demmel, J., Dongarra, J.J., Croz, J.D., Hammarling, S., Greenbaum, A., McKenney, A., Sorensen, D.C.: LAPACK Users’ guide. SIAM, 3rd edition (1999) es_ES
dc.description.references Badia, R.M., Herrero, J.R., Labarta, J., Pérez, J.M., Quintana-Ortí, E.S., Quintana-Ortí, G.: Parallelizing dense and banded linear algebra libraries using SMPSs. Conc. Comp. 21, 2438–2456 (2009) es_ES
dc.description.references Bientinesi, P., Gunnels, J.A., Myers, M.E., Quintana-Ortí, E.S., van de Geijn, R.A.: The science of deriving dense linear algebra algorithms. ACM Trans. Math. Softw. 31(1), 1–26 (2005) es_ES
dc.description.references Bischof, C.H., Lang, B., Sun, X.: Algorithm 807: the SBR toolbox–software for successive band reduction. ACM Trans. Math. Softw. 26(4), 602–616 (2000) es_ES
dc.description.references Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35(1), 38–53 (2009) es_ES
dc.description.references Castelló, A., Mayo, R., Sala, K., Beltran, V., Balaji, P., Peña, A.J.: On the adequacy of lightweight thread approaches for high-level parallel programming models. Future Gener. Comput. Syst. 84, 22–31 (2018) es_ES
dc.description.references Castelló, A., Peña, A.J., Seo, S., Mayo, R., Balaji, P., Quintana-Ortí, E.S.: A review of lightweight thread approaches for high performance computing. In: Proceedings of the IEEE International Conference on Cluster Computing, Taipei, Taiwan (September 2016) es_ES
dc.description.references Castelló, A., Seo, S., Mayo, R., Balaji, P., Quintana-Ortí, E.S., Peña, A.J.: GLT: a unified API for lightweight thread libraries. In: Proceedings of the IEEE International European Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain (August 2017) es_ES
dc.description.references Castelló, A., Seo, S., Mayo, R., Balaji, P., Quintana-Ortí, E.S., Peña, A.J.: GLTO: on the adequacy of lightweight thread approaches for OpenMP implementations. In: Proceedings of the International Conference on Parallel Processing, Bristol, UK (August 2017) es_ES
dc.description.references Catalán, S, Herrero, JR., Quintana-Ortí, E.S., Rodríguez-Sánchez, R., van de Geijn, R.A.: A case for malleable thread-level linear algebra libraries: The LU factorization with partial pivoting. CoRR (2016) arXiv:1611.06365 es_ES
dc.description.references Catalán, S., Igual, F.D., Mayo, R., Rguez-Sánchez, R.: Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors. Clust. Comput. 19(3), 1037–1051 (2016) es_ES
dc.description.references Chameleon project. http://project.inria.fr/chameleon/ es_ES
dc.description.references Demmel, J.: Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Paris (1997) es_ES
dc.description.references Dongarra, J.J., Croz, J.D., Hammarling, S., Duff, I.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16(1), 1–17 (1990) es_ES
dc.description.references FLAME project home page. http://www.cs.utexas.edu/users/flame/ es_ES
dc.description.references Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996) es_ES
dc.description.references Goto, K., van de Geijn, R.A.: Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34(3), 12:1–12:25 (2008) es_ES
dc.description.references Goto, K., van de Geijn, R.: High performance implementation of the level-3 BLAS. ACM Trans. Math. Softw. 35(1), 4:1–4:14 (2008) es_ES
dc.description.references Grosser, B., Lang, B.: Efficient parallel reduction to bidiagonal form. Parallel Comput. 25(8), 969–986 (1999) es_ES
dc.description.references Gunter, B.C., van de Geijn, R.A.: Parallel out-of-core computation and updating the QR factorization. ACM Trans. Math. Soft. 31(1), 60–78 (2005) es_ES
dc.description.references IBM. Engineering and Scientific Subroutine Library. http://www-03.ibm.com/systems/power/software/essl/ (2015) es_ES
dc.description.references Intel. Math Kernel Library. https://software.intel.com/en-us/intel-mkl (2015) es_ES
dc.description.references OmpSs project home page. http://pm.bsc.es/ompss es_ES
dc.description.references http://www.openblas.net (2015) es_ES
dc.description.references OpenMP API specification for parallel programming. http://www.openmp.org (2017) es_ES
dc.description.references PLASMA project home page. http://icl.cs.utk.edu/plasma es_ES
dc.description.references Quintana-Ortí, E.S., van de Geijn, R.A.: Updating an LU factorization with pivoting. ACM Trans. Math. Softw. 35(2), 11:1–11:16 (2008) es_ES
dc.description.references Quintana-Ortí, G., Quintana-Ortí, E.S., van de Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw. 36(3), 14:1–14:26 (2009) es_ES
dc.description.references Rodríguez-Sánchez, R., Catalán, Sandra, H., José, R., Quintana-Ortí, E.S., Tomás, A.E.: Two-sided reduction to compact band forms with look-ahead (2017) CoRR, arXiv:1709.00302 es_ES
dc.description.references Seo, S., Amer, A., Balaji, P., Bordage, C., Bosilca, G., Brooks, A., Carns, P., Castelló, A., Genet, D., Herault, T., Iwasaki, S., Jindal, P., Kale, S., Krishnamoorthy, S., Lifflander, J., Lu, H., Meneses, E., Snir, M., Sun, Y., Taura, K., Beckman, P.: Argobots: a lightweight low-level threading and tasking framework. IEEE Trans. Parallel Distrib. Syst. PP(99), 1–1 (2017) es_ES
dc.description.references Smith, T.M., van de Geijn, R., Smelyanskiy, M., Hammond, J.R., Van Zee, F.G.: Anatomy of high-performance many-threaded matrix multiplication. In: Proceedings of IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS’14, pp. 1049–1059 (2014) es_ES
dc.description.references StarPU project. http://runtime.bordeaux.inria.fr/StarPU/ es_ES
dc.description.references Stein, D., Shah, D.: Implementing lightweight threads. In: USENIX Summer (1992) es_ES
dc.description.references Strazdins, P.: A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization. Technical Report TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia (1998) es_ES
dc.description.references Van Zee, F.G., van de Geijn, R.A.: BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans. Math. Softw. 41(3), 14:1–14:33 (2015) es_ES
dc.description.references Whaley, C.R., Dongarra, J.J.: Automatically tuned linear algebra software. In: Proceedings of SC’98 (1998) es_ES
dc.description.references Van Zee, F.G., Smith, T.M., Marker, B., Low, T., Van De Geijn, R.A., Igual, F.D., Smelyanskiy, M., Zhang, X., Kistler, M., Austel, V., Gunnels, J.A., Killough, L.: The BLIS framework: experiments in portability. ACM Trans. Math. Softw. 42(2), 12:1–12:19 (2016) es_ES


Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro sencillo del ítem