Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

dc.contributor.author Tomás Domínguez, Andrés Enrique es_ES
dc.contributor.author Bai, Zhaojun es_ES
dc.contributor.author Hernández García, Vicente es_ES
dc.contributor.editor Dayde, M. es_ES
dc.contributor.editor Marques, O. es_ES
dc.contributor.editor Nakajima, K. es_ES
dc.date.accessioned 2016-11-18T14:53:50Z
dc.date.available 2016-11-18T14:53:50Z
dc.date.issued 2013
dc.identifier.isbn 978-3-642-38717-3
dc.identifier.issn 0302-9743
dc.identifier.uri http://hdl.handle.net/10251/74384
dc.description.abstract The QR decomposition with column pivoting (QRP) of a matrix is widely used for rank revealing. The performance of the LAPACK implementation (DGEQP3) of the Householder QRP algorithm is limited by the Level 2 BLAS operations required for updating the column norms. In this paper, we propose an implementation of the QRP algorithm using a distribution of the matrix columns in a round-robin fashion for better data locality and parallel memory bus utilization on multicore architectures. Our performance results show a 60% improvement over the routine DGEQP3 of Intel MKL (version 10.3) on a 12-core Intel Xeon X5670 machine. In addition, we show that the same data distribution is also suitable for general purpose GPU processors, where our implementation obtains up to 90 GFlops on an NVIDIA GeForce GTX480. This is about 2 times faster than the QRP implementation of MAGMA (version 1.2.1). es_ES
dc.description.sponsorship Tomás and Bai were supported in part by the U.S. DOE SciDAC grant DOE-DE-FC0206ER25793 and NSF grant PHY1005502. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231. es_ES
dc.format.extent 8 es_ES
dc.language English es_ES
dc.publisher Springer Verlag (Germany): Series es_ES
dc.relation.ispartof High Performance Computing for Computational Science - VECPAR 2012 es_ES
dc.relation.ispartofseries Lecture Notes in Computer Science;7851
dc.rights All rights reserved es_ES
dc.subject Data distribution es_ES
dc.subject General purpose GPU es_ES
dc.subject Multicore architectures es_ES
dc.subject Parallel memory es_ES
dc.subject Parallelizations es_ES
dc.subject QR decomposition es_ES
dc.subject Rank-revealing es_ES
dc.subject Round-robin fashions es_ES
dc.subject.classification Computer Science and Artificial Intelligence es_ES
dc.subject.classification Computer Languages and Systems es_ES
dc.title Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors es_ES
dc.type Book chapter es_ES
dc.type Conference paper es_ES
dc.identifier.doi 10.1007/978-3-642-38718-0_8
dc.relation.projectID info:eu-repo/grantAgreement/DOE//DE-FC0206ER25793/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/NSF//1005502/US/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/DOE//DE-AC02-05CH11231/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escuela Politécnica Superior de Gandia - Escola Politècnica Superior de Gandia es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Tomás Domínguez, AE.; Bai, Z.; Hernández García, V. (2013). Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors. In High Performance Computing for Computational Science - VECPAR 2012. Springer Verlag (Germany): Series. 50-58. https://doi.org/10.1007/978-3-642-38718-0_8 es_ES
dc.description.accrualMethod S es_ES
dc.relation.conferencename 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012 es_ES
dc.relation.conferencedate July 17-20, 2012 es_ES
dc.relation.conferenceplace Kobe, Japan es_ES
dc.relation.publisherversion http://dx.doi.org/10.1007/978-3-642-38718-0_8 es_ES
dc.description.upvformatpinicio 50 es_ES
dc.description.upvformatpfin 58 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.senia 240485 es_ES
dc.contributor.funder U.S. Department of Energy es_ES
dc.contributor.funder National Science Foundation, USA es_ES
dc.description.references Bischof, C.H.: A parallel QR factorization algorithm with controlled local pivoting. SIAM J. Sci. Stat. Comput. 12, 36–57 (1991) es_ES
dc.description.references Chandrasekaran, S., Ipsen, I.C.F.: On rank-revealing factorisations. SIAM J. Matrix Anal. Appl. 15, 592–622 (1994) es_ES
dc.description.references Castaldo, A.M., Whaley, R.C.: Scaling LAPACK panel operations using parallel cache assignment. In: 15th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 223–231 (2010) es_ES
dc.description.references Drmač, Z., Bujanović, Z.: On the failure of rank-revealing QR factorization software – a case study. ACM Trans. Math. Softw. 35, 12:1–12:28 (2008) es_ES
dc.description.references Drmač, Z., Veselić, K.: New fast and accurate Jacobi SVD algorithm I. SIAM J. Matrix Anal. Appl. 29, 1322–1342 (2008) es_ES
dc.description.references Drmač, Z., Veselić, K.: New fast and accurate Jacobi SVD algorithm II. SIAM J. Matrix Anal. Appl. 29, 1343–1362 (2008) es_ES
dc.description.references Golub, G.H.: Numerical methods for solving linear least squares problems. Numer. Math. 7, 206–216 (1965) es_ES
dc.description.references Gu, M., Eisenstat, S.: Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput. 17, 848–869 (1996) es_ES
dc.description.references Quintana-Orti, G., Sun, X., Bischof, C.H.: A BLAS-3 version of the QR factorization with column pivoting. SIAM J. Sci. Comput. 19, 1486–1494 (1998) es_ES
dc.description.references Schreiber, R., van Loan, C.: A storage-efficient WY representation for products of Householder transformations. SIAM J. Sci. Stat. Comput. 10, 53–57 (1989) es_ES

