Mostrar el registro sencillo del ítem
dc.contributor.author | Tomás Domínguez, Andrés Enrique | es_ES |
dc.contributor.author | Quintana-Ortí, Enrique S. | es_ES |
dc.date.accessioned | 2021-11-10T19:05:53Z | |
dc.date.available | 2021-11-10T19:05:53Z | |
dc.date.issued | 2020-11 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/176801 | |
dc.description.abstract | [EN] We present a novel method for the QR factorization of large tall-and-skinny matrices that introduces an approximation technique for computing the Householder vectors. This approach is very competitive on a hybrid platform equipped with a graphics processor, with a performance advantage over the conventional factorization due to the reduced amount of data transfers between the graphics accelerator and the main memory of the host. Our experiments show that, for tall¿skinny matrices, the new approach outperforms the code in MAGMA by a large margin, while it is very competitive for square matrices when the memory transfers and CPU computations are the bottleneck of the Householder QR factorization | es_ES |
dc.description.sponsorship | This research was supported by the Project TIN2017-82972-R from the MINECO (Spain) and the EU H2020 Project 732631 "OPRECOMP. Open Transprecision Computing". | es_ES |
dc.language | Inglés | es_ES |
dc.publisher | Springer-Verlag | es_ES |
dc.relation.ispartof | The Journal of Supercomputing (Online) | es_ES |
dc.rights | Reserva de todos los derechos | es_ES |
dc.subject | QR factorization | es_ES |
dc.subject | Tall-and-skinny matrices | es_ES |
dc.subject | GPU | es_ES |
dc.subject | Householder vector | es_ES |
dc.subject | Look-ahead | es_ES |
dc.subject | High performance | es_ES |
dc.subject.classification | LENGUAJES Y SISTEMAS INFORMATICOS | es_ES |
dc.subject.classification | ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES | es_ES |
dc.title | Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors | es_ES |
dc.type | Artículo | es_ES |
dc.identifier.doi | 10.1007/s11227-020-03176-3 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/732631/EU/Open transPREcision COMPuting/ | es_ES |
dc.rights.accessRights | Abierto | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Instituto Universitario de Telecomunicación y Aplicaciones Multimedia - Institut Universitari de Telecomunicacions i Aplicacions Multimèdia | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Tomás Domínguez, AE.; Quintana-Ortí, ES. (2020). Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors. The Journal of Supercomputing (Online). 76(11):8771-8786. https://doi.org/10.1007/s11227-020-03176-3 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1007/s11227-020-03176-3 | es_ES |
dc.description.upvformatpinicio | 8771 | es_ES |
dc.description.upvformatpfin | 8786 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 76 | es_ES |
dc.description.issue | 11 | es_ES |
dc.identifier.eissn | 1573-0484 | es_ES |
dc.relation.pasarela | S\417897 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | Agencia Estatal de Investigación | es_ES |
dc.description.references | Abdelfattah A, Haidar A, Tomov S, Dongarra J (2018) Analysis and design techniques towards high-performance and energy-efficient dense linear solvers on GPUs. IEEE Trans Parallel Distrib Syst 29(12):2700–2712. https://doi.org/10.1109/TPDS.2018.2842785 | es_ES |
dc.description.references | Ballard G, Demmel J, Grigori L, Jacquelin M, Knight N, Nguyen H (2015) Reconstructing Householder vectors from tall-skinny QR. J Parallel Distrib Comput 85:3–31. https://doi.org/10.1016/j.jpdc.2015.06.003 | es_ES |
dc.description.references | Barrachina S, Castillo M, Igual FD, Mayo R, Quintana-Ortí ES (2008) Solving dense linear systems on graphics processors. In: Luque E, Margalef T, Benítez D (eds) Euro-Par 2008—parallel processing. Springer, Heidelberg, pp 739–748 | es_ES |
dc.description.references | Benson AR, Gleich DF, Demmel J (2013) Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures. In: 2013 IEEE International Conference on Big Data, pp 264–272. https://doi.org/10.1109/BigData.2013.6691583 | es_ES |
dc.description.references | Businger P, Golub GH (1965) Linear least squares solutions by householder transformations. Numer Math 7(3):269–276. https://doi.org/10.1007/BF01436084 | es_ES |
dc.description.references | Demmel J, Grigori L, Hoemmen M, Langou J (2012) Communication-optimal parallel and sequential QR and LU factorizations. SIAM J Sci Comput 34(1):206–239. https://doi.org/10.1137/080731992 | es_ES |
dc.description.references | Dongarra J, Du Croz J, Hammarling S, Duff IS (1990) A set of level 3 basic linear algebra subprograms. ACM Trans Math Softw 16(1):1–17. https://doi.org/10.1145/77626.79170 | es_ES |
dc.description.references | Drmač Z, Bujanović Z (2008) On the failure of rank-revealing qr factorization software—a case study. ACM Trans Math Softw 35(2):12:1–12:28. https://doi.org/10.1145/1377612.1377616 | es_ES |
dc.description.references | Fukaya T, Nakatsukasa Y, Yanagisawa Y, Yamamoto Y (2014) CholeskyQR2: A simple and communication-avoiding algorithm for computing a tall-skinny QR factorization on a large-scale parallel system. In: 2014 5th workshop on latest advances in scalable algorithms for large-scale systems, pp 31–38. https://doi.org/10.1109/ScalA.2014.11 | es_ES |
dc.description.references | Fukaya T, Kannan R, Nakatsukasa Y, Yamamoto Y, Yanagisawa Y (2018) Shifted CholeskyQR for computing the QR factorization of ill-conditioned matrices, arXiv:1809.11085 | es_ES |
dc.description.references | Golub G, Van Loan C (2013) Matrix computations. Johns Hopkins studies in the mathematical sciences. Johns Hopkins University Press, Baltimore | es_ES |
dc.description.references | Gunter BC, van de Geijn RA (2005) Parallel out-of-core computation and updating the QR factorization. ACM Trans Math Softw 31(1):60–78. https://doi.org/10.1145/1055531.1055534 | es_ES |
dc.description.references | Joffrain T, Low TM, Quintana-Ortí ES, Rvd Geijn, Zee FGV (2006) Accumulating householder transformations, revisited. ACM Trans Math Softw 32(2):169–179. https://doi.org/10.1145/1141885.1141886 | es_ES |
dc.description.references | Puglisi C (1992) Modification of the householder method based on the compact WY representation. SIAM J Sci Stat Comput 13(3):723–726. https://doi.org/10.1137/0913042 | es_ES |
dc.description.references | Saad Y (2003) Iterative methods for sparse linear systems, 3rd edn. Society for Industrial and Applied Mathematics, Philadelphia | es_ES |
dc.description.references | Schreiber R, Van Loan C (1989) A storage-efficient WY representation for products of householder transformations. SIAM J Sci Comput 10(1):53–57. https://doi.org/10.1137/0910005 | es_ES |
dc.description.references | Stathopoulos A, Wu K (2001) A block orthogonalization procedure with constant synchronization requirements. SIAM J Sci Comput 23(6):2165–2182. https://doi.org/10.1137/S1064827500370883 | es_ES |
dc.description.references | Strazdins P (1998) A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization. Tech. Rep. TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia | es_ES |
dc.description.references | Tomás Dominguez AE, Quintana Orti ES (2018) Fast blocking of householder reflectors on graphics processors. In: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 385–393. https://doi.org/10.1109/PDP2018.2018.00068 | es_ES |
dc.description.references | Volkov V, Demmel JW (2008) LU, QR and Cholesky factorizations using vector capabilities of GPUs. Tech. Rep. 202, LAPACK Working Note. http://www.netlib.org/lapack/lawnspdf/lawn202.pdf | es_ES |
dc.description.references | Yamamoto Y, Nakatsukasa Y, Yanagisawa Y, Fukaya T (2015) Roundoff error analysis of the Cholesky QR2 algorithm. Electron Trans Numer Anal 44:306–326 | es_ES |
dc.description.references | Yamazaki I, Tomov S, Dongarra J (2015) Mixed-precision Cholesky QR factorization and its case studies on multicore CPU with multiple GPUs. SIAM J Sci Comput 37(3):C307–C330. https://doi.org/10.1137/14M0973773 | es_ES |