Mostrar el registro sencillo del ítem
dc.contributor.author | Catalán, Sandra | es_ES |
dc.contributor.author | Castelló, Adrián | es_ES |
dc.contributor.author | Igual, Francisco D. | es_ES |
dc.contributor.author | Rodríguez-Sánchez, Rafael | es_ES |
dc.contributor.author | Quintana Ortí, Enrique Salvador | es_ES |
dc.date.accessioned | 2021-05-14T03:31:47Z | |
dc.date.available | 2021-05-14T03:31:47Z | |
dc.date.issued | 2020-03 | es_ES |
dc.identifier.issn | 1386-7857 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/166343 | |
dc.description.abstract | [EN] We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded version of basic linear algebra subroutines (BLAS). The proposed approach is also different from the more sophisticated runtime-based implementations, which decompose the operation into tasks and identify dependencies via directives and runtime support. Instead, our strategy attains high performance by explicitly embedding a static look-ahead technique into the DMF code, in order to overcome the performance bottleneck of the panel factorization, and realizing the trailing update via a cache-aware multi-threaded implementation of the BLAS. Although the parallel algorithms are specified with a high level of abstraction, the actual implementation can be easily derived from them, paving the road to deriving a high performance implementation of a considerable fraction of linear algebra package (LAPACK) functionality on any multicore platform with an OpenMP-like runtime. | es_ES |
dc.description.sponsorship | The researchers from Universidad Jaume I were supported by the CICYT Projects TIN2014-53495-R and TIN2017-82972-R of the MINECO and FEDER, and the H2020 EU FETHPC Project 671602 "INTERTWinE". The researchers from Universidad Complutense de Madrid were supported by the CICYT Project TIN2015-65277-R of the MINECO and FEDER. Sandra Catalan was supported during part of this time by the FPU program of the Ministerio de Educacion, Cultura y Deporte. Adrian Castello was supported by the ValI+D 2015 FPI program of the Generalitat Valenciana. | es_ES |
dc.language | Inglés | es_ES |
dc.publisher | Springer-Verlag | es_ES |
dc.relation.ispartof | Cluster Computing | es_ES |
dc.rights | Reserva de todos los derechos | es_ES |
dc.subject | Matrix factorizations | es_ES |
dc.subject | Look-ahead | es_ES |
dc.subject | Multi-threading | es_ES |
dc.subject | OpenMP | es_ES |
dc.subject | Lightweight threads | es_ES |
dc.subject | High performance computing | es_ES |
dc.subject.classification | ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES | es_ES |
dc.title | Programming parallel dense matrix factorizations with look-ahead and OpenMP | es_ES |
dc.type | Artículo | es_ES |
dc.identifier.doi | 10.1007/s10586-019-02927-z | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/671602/EU/Programming Model INTERoperability ToWards Exascale (INTERTWinE)/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2015-65277-R/ES/COMPPUTACION HETEROGENEA EFICIENTE: DEL PROCESADOR AL DATACENTER/ | es_ES |
dc.rights.accessRights | Abierto | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Catalán, S.; Castelló, A.; Igual, FD.; Rodríguez-Sánchez, R.; Quintana Ortí, ES. (2020). Programming parallel dense matrix factorizations with look-ahead and OpenMP. Cluster Computing. 23(1):359-375. https://doi.org/10.1007/s10586-019-02927-z | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1007/s10586-019-02927-z | es_ES |
dc.description.upvformatpinicio | 359 | es_ES |
dc.description.upvformatpfin | 375 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 23 | es_ES |
dc.description.issue | 1 | es_ES |
dc.relation.pasarela | S\402140 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | Generalitat Valenciana | es_ES |
dc.contributor.funder | European Regional Development Fund | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.contributor.funder | Ministerio de Educación, Cultura y Deporte | es_ES |
dc.contributor.funder | Agencia Estatal de Investigación | es_ES |
dc.description.references | Anderson, E., Bai, Z., Susan Blackford, L., Demmel, J., Dongarra, J.J., Croz, J.D., Hammarling, S., Greenbaum, A., McKenney, A., Sorensen, D.C.: LAPACK Users’ guide. SIAM, 3rd edition (1999) | es_ES |
dc.description.references | Badia, R.M., Herrero, J.R., Labarta, J., Pérez, J.M., Quintana-Ortí, E.S., Quintana-Ortí, G.: Parallelizing dense and banded linear algebra libraries using SMPSs. Conc. Comp. 21, 2438–2456 (2009) | es_ES |
dc.description.references | Bientinesi, P., Gunnels, J.A., Myers, M.E., Quintana-Ortí, E.S., van de Geijn, R.A.: The science of deriving dense linear algebra algorithms. ACM Trans. Math. Softw. 31(1), 1–26 (2005) | es_ES |
dc.description.references | Bischof, C.H., Lang, B., Sun, X.: Algorithm 807: the SBR toolbox–software for successive band reduction. ACM Trans. Math. Softw. 26(4), 602–616 (2000) | es_ES |
dc.description.references | Buttari, A., Langou, J., Kurzak, J., Dongarra, J.: A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35(1), 38–53 (2009) | es_ES |
dc.description.references | Castelló, A., Mayo, R., Sala, K., Beltran, V., Balaji, P., Peña, A.J.: On the adequacy of lightweight thread approaches for high-level parallel programming models. Future Gener. Comput. Syst. 84, 22–31 (2018) | es_ES |
dc.description.references | Castelló, A., Peña, A.J., Seo, S., Mayo, R., Balaji, P., Quintana-Ortí, E.S.: A review of lightweight thread approaches for high performance computing. In: Proceedings of the IEEE International Conference on Cluster Computing, Taipei, Taiwan (September 2016) | es_ES |
dc.description.references | Castelló, A., Seo, S., Mayo, R., Balaji, P., Quintana-Ortí, E.S., Peña, A.J.: GLT: a unified API for lightweight thread libraries. In: Proceedings of the IEEE International European Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain (August 2017) | es_ES |
dc.description.references | Castelló, A., Seo, S., Mayo, R., Balaji, P., Quintana-Ortí, E.S., Peña, A.J.: GLTO: on the adequacy of lightweight thread approaches for OpenMP implementations. In: Proceedings of the International Conference on Parallel Processing, Bristol, UK (August 2017) | es_ES |
dc.description.references | Catalán, S, Herrero, JR., Quintana-Ortí, E.S., Rodríguez-Sánchez, R., van de Geijn, R.A.: A case for malleable thread-level linear algebra libraries: The LU factorization with partial pivoting. CoRR (2016) arXiv:1611.06365 | es_ES |
dc.description.references | Catalán, S., Igual, F.D., Mayo, R., Rguez-Sánchez, R.: Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors. Clust. Comput. 19(3), 1037–1051 (2016) | es_ES |
dc.description.references | Chameleon project. http://project.inria.fr/chameleon/ | es_ES |
dc.description.references | Demmel, J.: Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Paris (1997) | es_ES |
dc.description.references | Dongarra, J.J., Croz, J.D., Hammarling, S., Duff, I.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16(1), 1–17 (1990) | es_ES |
dc.description.references | FLAME project home page. http://www.cs.utexas.edu/users/flame/ | es_ES |
dc.description.references | Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996) | es_ES |
dc.description.references | Goto, K., van de Geijn, R.A.: Anatomy of high-performance matrix multiplication. ACM Trans. Math. Softw. 34(3), 12:1–12:25 (2008) | es_ES |
dc.description.references | Goto, K., van de Geijn, R.: High performance implementation of the level-3 BLAS. ACM Trans. Math. Softw. 35(1), 4:1–4:14 (2008) | es_ES |
dc.description.references | Grosser, B., Lang, B.: Efficient parallel reduction to bidiagonal form. Parallel Comput. 25(8), 969–986 (1999) | es_ES |
dc.description.references | Gunter, B.C., van de Geijn, R.A.: Parallel out-of-core computation and updating the QR factorization. ACM Trans. Math. Soft. 31(1), 60–78 (2005) | es_ES |
dc.description.references | IBM. Engineering and Scientific Subroutine Library. http://www-03.ibm.com/systems/power/software/essl/ (2015) | es_ES |
dc.description.references | Intel. Math Kernel Library. https://software.intel.com/en-us/intel-mkl (2015) | es_ES |
dc.description.references | OmpSs project home page. http://pm.bsc.es/ompss | es_ES |
dc.description.references | http://www.openblas.net (2015) | es_ES |
dc.description.references | OpenMP API specification for parallel programming. http://www.openmp.org (2017) | es_ES |
dc.description.references | PLASMA project home page. http://icl.cs.utk.edu/plasma | es_ES |
dc.description.references | Quintana-Ortí, E.S., van de Geijn, R.A.: Updating an LU factorization with pivoting. ACM Trans. Math. Softw. 35(2), 11:1–11:16 (2008) | es_ES |
dc.description.references | Quintana-Ortí, G., Quintana-Ortí, E.S., van de Geijn, R.A., Van Zee, F.G., Chan, E.: Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw. 36(3), 14:1–14:26 (2009) | es_ES |
dc.description.references | Rodríguez-Sánchez, R., Catalán, Sandra, H., José, R., Quintana-Ortí, E.S., Tomás, A.E.: Two-sided reduction to compact band forms with look-ahead (2017) CoRR, arXiv:1709.00302 | es_ES |
dc.description.references | Seo, S., Amer, A., Balaji, P., Bordage, C., Bosilca, G., Brooks, A., Carns, P., Castelló, A., Genet, D., Herault, T., Iwasaki, S., Jindal, P., Kale, S., Krishnamoorthy, S., Lifflander, J., Lu, H., Meneses, E., Snir, M., Sun, Y., Taura, K., Beckman, P.: Argobots: a lightweight low-level threading and tasking framework. IEEE Trans. Parallel Distrib. Syst. PP(99), 1–1 (2017) | es_ES |
dc.description.references | Smith, T.M., van de Geijn, R., Smelyanskiy, M., Hammond, J.R., Van Zee, F.G.: Anatomy of high-performance many-threaded matrix multiplication. In: Proceedings of IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS’14, pp. 1049–1059 (2014) | es_ES |
dc.description.references | StarPU project. http://runtime.bordeaux.inria.fr/StarPU/ | es_ES |
dc.description.references | Stein, D., Shah, D.: Implementing lightweight threads. In: USENIX Summer (1992) | es_ES |
dc.description.references | Strazdins, P.: A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization. Technical Report TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia (1998) | es_ES |
dc.description.references | Van Zee, F.G., van de Geijn, R.A.: BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans. Math. Softw. 41(3), 14:1–14:33 (2015) | es_ES |
dc.description.references | Whaley, C.R., Dongarra, J.J.: Automatically tuned linear algebra software. In: Proceedings of SC’98 (1998) | es_ES |
dc.description.references | Van Zee, F.G., Smith, T.M., Marker, B., Low, T., Van De Geijn, R.A., Igual, F.D., Smelyanskiy, M., Zhang, X., Kistler, M., Austel, V., Gunnels, J.A., Killough, L.: The BLIS framework: experiments in portability. ACM Trans. Math. Softw. 42(2), 12:1–12:19 (2016) | es_ES |