- -

Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Compartir/Enviar a

Citas

Estadísticas

  • Estadisticas de Uso

Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors

Mostrar el registro sencillo del ítem

Ficheros en el ítem

dc.contributor.author Tomás Domínguez, Andrés Enrique es_ES
dc.contributor.author Quintana-Ortí, Enrique S. es_ES
dc.date.accessioned 2021-11-10T19:05:53Z
dc.date.available 2021-11-10T19:05:53Z
dc.date.issued 2020-11 es_ES
dc.identifier.uri http://hdl.handle.net/10251/176801
dc.description.abstract [EN] We present a novel method for the QR factorization of large tall-and-skinny matrices that introduces an approximation technique for computing the Householder vectors. This approach is very competitive on a hybrid platform equipped with a graphics processor, with a performance advantage over the conventional factorization due to the reduced amount of data transfers between the graphics accelerator and the main memory of the host. Our experiments show that, for tall¿skinny matrices, the new approach outperforms the code in MAGMA by a large margin, while it is very competitive for square matrices when the memory transfers and CPU computations are the bottleneck of the Householder QR factorization es_ES
dc.description.sponsorship This research was supported by the Project TIN2017-82972-R from the MINECO (Spain) and the EU H2020 Project 732631 "OPRECOMP. Open Transprecision Computing". es_ES
dc.language Inglés es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof The Journal of Supercomputing (Online) es_ES
dc.rights Reserva de todos los derechos es_ES
dc.subject QR factorization es_ES
dc.subject Tall-and-skinny matrices es_ES
dc.subject GPU es_ES
dc.subject Householder vector es_ES
dc.subject Look-ahead es_ES
dc.subject High performance es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.subject.classification ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES es_ES
dc.title Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors es_ES
dc.type Artículo es_ES
dc.identifier.doi 10.1007/s11227-020-03176-3 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/732631/EU/Open transPREcision COMPuting/ es_ES
dc.rights.accessRights Abierto es_ES
dc.contributor.affiliation Universitat Politècnica de València. Instituto Universitario de Telecomunicación y Aplicaciones Multimedia - Institut Universitari de Telecomunicacions i Aplicacions Multimèdia es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Tomás Domínguez, AE.; Quintana-Ortí, ES. (2020). Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors. The Journal of Supercomputing (Online). 76(11):8771-8786. https://doi.org/10.1007/s11227-020-03176-3 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s11227-020-03176-3 es_ES
dc.description.upvformatpinicio 8771 es_ES
dc.description.upvformatpfin 8786 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 76 es_ES
dc.description.issue 11 es_ES
dc.identifier.eissn 1573-0484 es_ES
dc.relation.pasarela S\417897 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.description.references Abdelfattah A, Haidar A, Tomov S, Dongarra J (2018) Analysis and design techniques towards high-performance and energy-efficient dense linear solvers on GPUs. IEEE Trans Parallel Distrib Syst 29(12):2700–2712. https://doi.org/10.1109/TPDS.2018.2842785 es_ES
dc.description.references Ballard G, Demmel J, Grigori L, Jacquelin M, Knight N, Nguyen H (2015) Reconstructing Householder vectors from tall-skinny QR. J Parallel Distrib Comput 85:3–31. https://doi.org/10.1016/j.jpdc.2015.06.003 es_ES
dc.description.references Barrachina S, Castillo M, Igual FD, Mayo R, Quintana-Ortí ES (2008) Solving dense linear systems on graphics processors. In: Luque E, Margalef T, Benítez D (eds) Euro-Par 2008—parallel processing. Springer, Heidelberg, pp 739–748 es_ES
dc.description.references Benson AR, Gleich DF, Demmel J (2013) Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures. In: 2013 IEEE International Conference on Big Data, pp 264–272. https://doi.org/10.1109/BigData.2013.6691583 es_ES
dc.description.references Businger P, Golub GH (1965) Linear least squares solutions by householder transformations. Numer Math 7(3):269–276. https://doi.org/10.1007/BF01436084 es_ES
dc.description.references Demmel J, Grigori L, Hoemmen M, Langou J (2012) Communication-optimal parallel and sequential QR and LU factorizations. SIAM J Sci Comput 34(1):206–239. https://doi.org/10.1137/080731992 es_ES
dc.description.references Dongarra J, Du Croz J, Hammarling S, Duff IS (1990) A set of level 3 basic linear algebra subprograms. ACM Trans Math Softw 16(1):1–17. https://doi.org/10.1145/77626.79170 es_ES
dc.description.references Drmač Z, Bujanović Z (2008) On the failure of rank-revealing qr factorization software—a case study. ACM Trans Math Softw 35(2):12:1–12:28. https://doi.org/10.1145/1377612.1377616 es_ES
dc.description.references Fukaya T, Nakatsukasa Y, Yanagisawa Y, Yamamoto Y (2014) CholeskyQR2: A simple and communication-avoiding algorithm for computing a tall-skinny QR factorization on a large-scale parallel system. In: 2014 5th workshop on latest advances in scalable algorithms for large-scale systems, pp 31–38. https://doi.org/10.1109/ScalA.2014.11 es_ES
dc.description.references Fukaya T, Kannan R, Nakatsukasa Y, Yamamoto Y, Yanagisawa Y (2018) Shifted CholeskyQR for computing the QR factorization of ill-conditioned matrices, arXiv:1809.11085 es_ES
dc.description.references Golub G, Van Loan C (2013) Matrix computations. Johns Hopkins studies in the mathematical sciences. Johns Hopkins University Press, Baltimore es_ES
dc.description.references Gunter BC, van de Geijn RA (2005) Parallel out-of-core computation and updating the QR factorization. ACM Trans Math Softw 31(1):60–78. https://doi.org/10.1145/1055531.1055534 es_ES
dc.description.references Joffrain T, Low TM, Quintana-Ortí ES, Rvd Geijn, Zee FGV (2006) Accumulating householder transformations, revisited. ACM Trans Math Softw 32(2):169–179. https://doi.org/10.1145/1141885.1141886 es_ES
dc.description.references Puglisi C (1992) Modification of the householder method based on the compact WY representation. SIAM J Sci Stat Comput 13(3):723–726. https://doi.org/10.1137/0913042 es_ES
dc.description.references Saad Y (2003) Iterative methods for sparse linear systems, 3rd edn. Society for Industrial and Applied Mathematics, Philadelphia es_ES
dc.description.references Schreiber R, Van Loan C (1989) A storage-efficient WY representation for products of householder transformations. SIAM J Sci Comput 10(1):53–57. https://doi.org/10.1137/0910005 es_ES
dc.description.references Stathopoulos A, Wu K (2001) A block orthogonalization procedure with constant synchronization requirements. SIAM J Sci Comput 23(6):2165–2182. https://doi.org/10.1137/S1064827500370883 es_ES
dc.description.references Strazdins P (1998) A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization. Tech. Rep. TR-CS-98-07, Department of Computer Science, The Australian National University, Canberra 0200 ACT, Australia es_ES
dc.description.references Tomás Dominguez AE, Quintana Orti ES (2018) Fast blocking of householder reflectors on graphics processors. In: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 385–393. https://doi.org/10.1109/PDP2018.2018.00068 es_ES
dc.description.references Volkov V, Demmel JW (2008) LU, QR and Cholesky factorizations using vector capabilities of GPUs. Tech. Rep. 202, LAPACK Working Note. http://www.netlib.org/lapack/lawnspdf/lawn202.pdf es_ES
dc.description.references Yamamoto Y, Nakatsukasa Y, Yanagisawa Y, Fukaya T (2015) Roundoff error analysis of the Cholesky QR2 algorithm. Electron Trans Numer Anal 44:306–326 es_ES
dc.description.references Yamazaki I, Tomov S, Dongarra J (2015) Mixed-precision Cholesky QR factorization and its case studies on multicore CPU with multiple GPUs. SIAM J Sci Comput 37(3):C307–C330. https://doi.org/10.1137/14M0973773 es_ES


Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro sencillo del ítem