Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors

Tomás Domínguez, Andrés Enrique; Bai, Zhaojun; Hernández García, Vicente

doi:10.1007/978-3-642-38718-0_8


Files in this item

Name: v1a-03-Tomas.pdf

Size: 419.2 KB

Format: PDF

Description: Author's version.

Name: vecpar2012-final.pdf

Size: 425.1 KB

Format: PDF

Description: Publisher's version.

dc.contributor.author	Tomás Domínguez, Andrés Enrique	es_ES
dc.contributor.author	Bai, Zhaojun	es_ES
dc.contributor.author	Hernández García, Vicente	es_ES
dc.contributor.editor	Dayde, M.	es_ES
dc.contributor.editor	Marques, O.	es_ES
dc.contributor.editor	Nakajima, K.	es_ES
dc.date.accessioned	2016-11-18T14:53:50Z
dc.date.available	2016-11-18T14:53:50Z
dc.date.issued	2013
dc.identifier.isbn	978-3-642-38717-3
dc.identifier.issn	0302-9743
dc.identifier.uri	http://hdl.handle.net/10251/74384
dc.description.abstract	The QR decomposition with column pivoting (QRP) of a matrix is widely used for rank revealing. The performance of the LAPACK implementation (DGEQP3) of the Householder QRP algorithm is limited by the Level 2 BLAS operations required for updating the column norms. In this paper, we propose an implementation of the QRP algorithm that distributes the matrix columns in a round-robin fashion for better data locality and parallel memory bus utilization on multicore architectures. Our performance results show a 60% improvement over the routine DGEQP3 of Intel MKL (version 10.3) on a 12-core Intel Xeon X5670 machine. In addition, we show that the same data distribution is also suitable for general-purpose GPU processors, where our implementation obtains up to 90 GFlops on an NVIDIA GeForce GTX480. This is about 2 times faster than the QRP implementation of MAGMA (version 1.2.1).	es_ES
dc.description.sponsorship	Tomás and Bai were supported in part by the U.S. DOE SciDAC grant DOE-DE-FC0206ER25793 and NSF grant PHY1005502. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231.	es_ES
dc.format.extent	8	es_ES
dc.language	English	es_ES
dc.publisher	Springer Verlag (Germany): Series	es_ES
dc.relation.ispartof	High Performance Computing for Computational Science - VECPAR 2012	es_ES
dc.relation.ispartofseries	Lecture Notes in Computer Science;7851
dc.rights	All rights reserved	es_ES
dc.subject	Data distribution	es_ES
dc.subject	General purpose GPU	es_ES
dc.subject	Multicore architectures	es_ES
dc.subject	Parallel memory	es_ES
dc.subject	Parallelizations	es_ES
dc.subject	QR decomposition	es_ES
dc.subject	Rank-revealing	es_ES
dc.subject	Round-robin fashions	es_ES
dc.subject.classification	COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors	es_ES
dc.type	Book chapter	es_ES
dc.type	Conference paper	es_ES
dc.identifier.doi	10.1007/978-3-642-38718-0_8
dc.relation.projectID	info:eu-repo/grantAgreement/DOE//DE-FC0206ER25793/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/NSF//1005502/US/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/DOE//DE-AC02-05CH11231/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escuela Politécnica Superior de Gandia - Escola Politècnica Superior de Gandia	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Tomás Domínguez, AE.; Bai, Z.; Hernández García, V. (2013). Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors. In High Performance Computing for Computational Science - VECPAR 2012. Springer Verlag (Germany): Series. 50-58. https://doi.org/10.1007/978-3-642-38718-0_8	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.conferencename	10th International Conference on High Performance Computing for Computational Science, VECPAR 2012	es_ES
dc.relation.conferencedate	July 17-20, 2012	es_ES
dc.relation.conferenceplace	Kobe, Japan	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.1007/978-3-642-38718-0_8	es_ES
dc.description.upvformatpinicio	50	es_ES
dc.description.upvformatpfin	58	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.relation.senia	240485	es_ES
dc.contributor.funder	U.S. Department of Energy	es_ES
dc.contributor.funder	National Science Foundation, USA	es_ES
dc.description.references	Bischof, C.H.: A parallel QR factorization algorithm with controlled local pivoting. SIAM J. Sci. Stat. Comput. 12, 36–57 (1991)	es_ES
dc.description.references	Chandrasekaran, S., Ipsen, I.C.F.: On rank-revealing factorisations. SIAM J. Matrix Anal. Appl. 15, 592–622 (1994)	es_ES
dc.description.references	Castaldo, A.M., Whaley, R.C.: Scaling LAPACK panel operations using parallel cache assignment. In: 15th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 223–231 (2010)	es_ES
dc.description.references	Drmač, Z., Bujanović, Z.: On the failure of rank-revealing QR factorization software – a case study. ACM Trans. Math. Softw. 35, 12:1–12:28 (2008)	es_ES
dc.description.references	Drmač, Z., Veselić, K.: New fast and accurate Jacobi SVD algorithm I. SIAM J. Matrix Anal. Appl. 29, 1322–1342 (2008)	es_ES
dc.description.references	Drmač, Z., Veselić, K.: New fast and accurate Jacobi SVD algorithm II. SIAM J. Matrix Anal. Appl. 29, 1343–1362 (2008)	es_ES
dc.description.references	Golub, G.H.: Numerical methods for solving linear least squares problems. Numer. Math. 7, 206–216 (1965)	es_ES
dc.description.references	Gu, M., Eisenstat, S.: Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput. 17, 848–869 (1996)	es_ES
dc.description.references	Quintana-Orti, G., Sun, X., Bischof, C.H.: A BLAS-3 version of the QR factorization with column pivoting. SIAM J. Sci. Comput. 19, 1486–1494 (1998)	es_ES
dc.description.references	Schreiber, R., van Loan, C.: A storage-efficient WY representation for products of Householder transformations. SIAM J. Sci. Stat. Comput. 10, 53–57 (1989)	es_ES
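
The abstract describes distributing the matrix columns in a round-robin (column cyclic) fashion so that each core works on its own columns during QR with column pivoting. The following is a minimal sequential sketch of that idea, not the authors' implementation: the function names (`cyclic_owner`, `distribute_columns`, `qrp_cyclic`) are hypothetical, the "workers" are simulated by plain loops, the pivot column is simply taken out of its owner's list rather than swapped, and column norms are recomputed at every step instead of being updated as DGEQP3 does.

```python
import math


def cyclic_owner(j, p):
    """Worker that owns global column j under a round-robin layout (assumed rule: j mod p)."""
    return j % p


def distribute_columns(A, p):
    """Split an m x n matrix (list of rows) into p lists of (global_index, column)."""
    m, n = len(A), len(A[0])
    parts = [[] for _ in range(p)]
    for j in range(n):
        parts[cyclic_owner(j, p)].append((j, [float(A[i][j]) for i in range(m)]))
    return parts


def qrp_cyclic(A, p):
    """Householder QR with column pivoting over a column cyclic layout.

    Returns (perm, rdiag): the pivot order and the diagonal of R.
    Illustrative only: norms are recomputed each step, workers run sequentially.
    """
    m, n = len(A), len(A[0])
    parts = distribute_columns(A, p)
    perm, rdiag = [], []
    for k in range(min(m, n)):
        # Each worker scans only the columns it owns; the global pivot is the
        # candidate with the largest trailing norm (ties go to the smallest index).
        candidates = []
        for w, cols in enumerate(parts):
            for pos, (j, col) in enumerate(cols):
                candidates.append((sum(x * x for x in col[k:]), -j, w, pos))
        norm2, _, w, pos = max(candidates)
        j, piv = parts[w].pop(pos)
        perm.append(j)

        # Build the Householder vector v that maps piv[k:] onto alpha * e_1.
        norm = math.sqrt(norm2)
        alpha = -norm if piv[k] >= 0 else norm
        v = piv[k:]
        v[0] -= alpha
        vv = sum(x * x for x in v)
        rdiag.append(alpha)
        if vv == 0.0:
            continue  # trailing part is already zero, nothing to reflect

        # Every worker applies the reflector to its own remaining columns.
        for cols in parts:
            for _, col in cols:
                s = 2.0 * sum(a * b for a, b in zip(v, col[k:])) / vv
                for i, vi in enumerate(v):
                    col[k + i] -= s * vi
    return perm, rdiag


if __name__ == "__main__":
    A = [[3.0, 0.0, 0.0], [4.0, 0.0, 5.0], [0.0, 1.0, 0.0]]
    perm, rdiag = qrp_cyclic(A, p=2)
    print("pivot order:", perm)
    print("diag(R):", rdiag)
```

In this sketch the only data shared between workers at each step is the pivot choice and the Householder vector; the norm scan and the column update touch only locally owned columns, which is the locality property the abstract attributes to the round-robin layout. A small trailing entry in `rdiag` relative to the first one indicates numerical rank deficiency.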

This item appears in the following collection(s)

Articles, conferences, monographs [48360]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia