dc.contributor.author | Tomás Domínguez, Andrés Enrique | es_ES |
dc.contributor.author | Bai, Zhaojun | es_ES |
dc.contributor.author | Hernández García, Vicente | es_ES |
dc.contributor.editor | Dayde, M. | es_ES |
dc.contributor.editor | Marques, O. | es_ES |
dc.contributor.editor | Nakajima, K. | es_ES |
dc.date.accessioned | 2016-11-18T14:53:50Z | |
dc.date.available | 2016-11-18T14:53:50Z | |
dc.date.issued | 2013 | |
dc.identifier.isbn | 978-3-642-38717-3 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.uri | http://hdl.handle.net/10251/74384 | |
dc.description.abstract | The QR decomposition with column pivoting (QRP) of a matrix is widely used for rank revealing. The performance of the LAPACK implementation (DGEQP3) of the Householder QRP algorithm is limited by the Level 2 BLAS operations required to update the column norms. In this paper, we propose an implementation of the QRP algorithm that distributes the matrix columns in a round-robin fashion for better data locality and parallel memory bus utilization on multicore architectures. Our performance results show a 60% improvement over the routine DGEQP3 of Intel MKL (version 10.3) on a 12-core Intel Xeon X5670 machine. In addition, we show that the same data distribution is also suitable for general-purpose GPUs, where our implementation obtains up to 90 GFlops on an NVIDIA GeForce GTX480, about twice as fast as the QRP implementation of MAGMA (version 1.2.1). | es_ES |
dc.description.sponsorship | Tomás and Bai were supported in part by the U.S. DOE SciDAC grant DOE-DE-FC0206ER25793 and NSF grant PHY1005502. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-05CH11231. | es_ES |
dc.format.extent | 8 | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer Verlag (Germany): Series | es_ES |
dc.relation.ispartof | High Performance Computing for Computational Science - VECPAR 2012 | es_ES |
dc.relation.ispartofseries | Lecture Notes in Computer Science;7851 | |
dc.rights | All rights reserved | es_ES |
dc.subject | Data distribution | es_ES |
dc.subject | General purpose GPU | es_ES |
dc.subject | Multicore architectures | es_ES |
dc.subject | Parallel memory | es_ES |
dc.subject | Parallelizations | es_ES |
dc.subject | QR decomposition | es_ES |
dc.subject | Rank-revealing | es_ES |
dc.subject | Round-robin fashions | es_ES |
dc.subject.classification | COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE | es_ES |
dc.subject.classification | COMPUTER LANGUAGES AND SYSTEMS | es_ES |
dc.title | Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors | es_ES |
dc.type | Book chapter | es_ES |
dc.type | Conference paper | es_ES |
dc.identifier.doi | 10.1007/978-3-642-38718-0_8 | |
dc.relation.projectID | info:eu-repo/grantAgreement/DOE//DE-FC0206ER25793/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/NSF//1005502/US/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/DOE//DE-AC02-05CH11231/ | es_ES |
dc.rights.accessRights | Open access | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escuela Politécnica Superior de Gandia - Escola Politècnica Superior de Gandia | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Tomás Domínguez, AE.; Bai, Z.; Hernández García, V. (2013). Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors. In High Performance Computing for Computational Science - VECPAR 2012. Springer Verlag (Germany): Series. 50-58. https://doi.org/10.1007/978-3-642-38718-0_8 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.conferencename | 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012 | es_ES |
dc.relation.conferencedate | July 17-20, 2012 | es_ES |
dc.relation.conferenceplace | Kobe, Japan | es_ES |
dc.relation.publisherversion | http://dx.doi.org/10.1007/978-3-642-38718-0_8 | es_ES |
dc.description.upvformatpinicio | 50 | es_ES |
dc.description.upvformatpfin | 58 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.senia | 240485 | es_ES |
dc.contributor.funder | U.S. Department of Energy | es_ES |
dc.contributor.funder | National Science Foundation, USA | es_ES |
dc.description.references | Bischof, C.H.: A parallel QR factorization algorithm with controlled local pivoting. SIAM J. Sci. Stat. Comput. 12, 36–57 (1991) | es_ES |
dc.description.references | Chandrasekaran, S., Ipsen, I.C.F.: On rank-revealing factorisations. SIAM J. Matrix Anal. Appl. 15, 592–622 (1994) | es_ES |
dc.description.references | Castaldo, A.M., Whaley, R.C.: Scaling LAPACK panel operations using parallel cache assignment. In: 15th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 223–231 (2010) | es_ES |
dc.description.references | Drmač, Z., Bujanović, Z.: On the failure of rank-revealing QR factorization software – a case study. ACM Trans. Math. Softw. 35, 12:1–12:28 (2008) | es_ES |
dc.description.references | Drmač, Z., Veselić, K.: New fast and accurate Jacobi SVD algorithm I. SIAM J. Matrix Anal. Appl. 29, 1322–1342 (2008) | es_ES |
dc.description.references | Drmač, Z., Veselić, K.: New fast and accurate Jacobi SVD algorithm II. SIAM J. Matrix Anal. Appl. 29, 1343–1362 (2008) | es_ES |
dc.description.references | Golub, G.H.: Numerical methods for solving linear least squares problems. Numer. Math. 7, 206–216 (1965) | es_ES |
dc.description.references | Gu, M., Eisenstat, S.: Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput. 17, 848–869 (1996) | es_ES |
dc.description.references | Quintana-Orti, G., Sun, X., Bischof, C.H.: A BLAS-3 version of the QR factorization with column pivoting. SIAM J. Sci. Comput. 19, 1486–1494 (1998) | es_ES |
dc.description.references | Schreiber, R., van Loan, C.: A storage-efficient WY representation for products of Householder transformations. SIAM J. Sci. Stat. Comput. 10, 53–57 (1989) | es_ES |
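The abstract contrasts the column-norm updates of DGEQP3 with a round-robin (column cyclic) distribution of the columns. The following is a minimal, sequential Python sketch of that idea under stated assumptions, not the authors' implementation: it performs unblocked Householder QR with column pivoting (as in Golub, 1965), downdates the trailing column norms after each step with a recompute-on-cancellation test modeled on the one in LAPACK's xLAQP2, and deals the trailing columns to `p` hypothetical workers by `j mod p`. The names `qrp`, `owner`, `p` and the simulated worker loop are illustrative assumptions; no blocking (Level 3 BLAS), threading or GPU code is modeled.

```python
import math
import sys


def owner(j: int, p: int) -> int:
    """Hypothetical worker index that owns column j in a round-robin layout."""
    return j % p


def qrp(A, p=4):
    """Unblocked Householder QR with column pivoting (sequential sketch).

    A is an m x n list of rows and is not modified. Returns (R, perm), where R
    is the m x n upper-trapezoidal factor and perm[k] is the original index of
    the k-th pivot column, so that A[:, perm] = Q R for some orthogonal Q.
    The p "workers" are simulated one after another; worker w only touches
    the trailing columns j with owner(j, p) == w.
    """
    m, n = len(A), len(A[0])
    R = [[float(a) for a in row] for row in A]
    perm = list(range(n))
    norms = [math.sqrt(sum(R[i][j] ** 2 for i in range(m))) for j in range(n)]
    exact = norms[:]  # norms last computed from scratch, used by the recompute test
    tol = math.sqrt(sys.float_info.epsilon)
    for k in range(min(m, n)):
        # Pivot: bring the trailing column with the largest norm to position k.
        piv = max(range(k, n), key=lambda j: norms[j])
        if piv != k:
            for row in R:
                row[k], row[piv] = row[piv], row[k]
            for lst in (norms, exact, perm):
                lst[k], lst[piv] = lst[piv], lst[k]
        # Reflector H = I - 2 v v^T / (v^T v) maps R[k:, k] onto alpha * e_1.
        x = [R[i][k] for i in range(k, m)]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            continue  # trailing block is zero: nothing left to eliminate
        if x[0] > 0:
            alpha = -alpha  # sign choice avoids cancellation in v[0]
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        R[k][k] = alpha
        for i in range(k + 1, m):
            R[i][k] = 0.0
        # Each worker applies H to its own trailing columns, then downdates
        # their norms by removing the contribution of the new row-k entry.
        for w in range(p):
            for j in (c for c in range(k + 1, n) if owner(c, p) == w):
                s = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, m)) / vtv
                for i in range(k, m):
                    R[i][j] -= s * v[i - k]
                if norms[j] == 0.0:
                    continue
                t = max(0.0, 1.0 - (abs(R[k][j]) / norms[j]) ** 2)
                if t * (norms[j] / exact[j]) ** 2 <= tol:
                    # Heavy cancellation: recompute the norm of R[k+1:, j].
                    norms[j] = math.sqrt(sum(R[i][j] ** 2 for i in range(k + 1, m)))
                    exact[j] = norms[j]
                else:
                    norms[j] *= math.sqrt(t)
    return R, perm


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 9.0], [7.0, 8.0, 15.0], [2.0, 1.0, 3.0]]
R, perm = qrp(A, p=2)
print(perm, [abs(R[i][i]) for i in range(3)])
```

In this small example the third column equals the sum of the first two, so `abs(R[2][2])` should come out at rounding-error level relative to `abs(R[0][0])`, which is the rank-revealing behaviour the abstract refers to. Because every column is updated using only the shared vector `v` and its own entries, the computed `R` does not depend on `p`; in this sketch the distribution only decides which (simulated) worker touches which column.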