Hierarchical approach for deriving a reproducible unblocked LU factorization

Iakymchuk, Roman; Graillat, Stef; Defour, David; Quintana-Orti, Enrique S.

doi:10.1177/1094342019832968

Files in this item

Name: Iakymchuk;Grailla ...

Size: 399.9 KB

Format: PDF

Description: Author's version.

Name: RI_J110.pdf

Size: 824.4 KB

Format: PDF

Description: Publisher's version.

dc.contributor.author	Iakymchuk, Roman	es_ES
dc.contributor.author	Graillat, Stef	es_ES
dc.contributor.author	Defour, David	es_ES
dc.contributor.author	Quintana-Orti, Enrique S.	es_ES
dc.date.accessioned	2020-09-24T12:30:16Z
dc.date.available	2020-09-24T12:30:16Z
dc.date.issued	2019-09	es_ES
dc.identifier.issn	1094-3420	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/150674
dc.description.abstract	[EN] We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization.	es_ES
dc.description.sponsorship	The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC Centre for High Performance Computing (PDC-HPC). This work was also granted access to the HPC resources of The Institute for Scientific Computing and Simulation financed by Region Ile-de-France and the project Equip@Meso (reference ANR-10-EQPX-29-01) overseen by the French National Agency for Research (ANR) as part of the Investissements d'Avenir program. This work was also partly supported by the FastRelax (ANR-14-CE25-0018-01) project of ANR.	es_ES
dc.language	English	es_ES
dc.publisher	SAGE Publications	es_ES
dc.relation.ispartof	International Journal of High Performance Computing Applications	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	LU factorization	es_ES
dc.subject	BLAS	es_ES
dc.subject	Reproducibility	es_ES
dc.subject	Accuracy	es_ES
dc.subject	Long accumulator	es_ES
dc.subject	Error-free transformation	es_ES
dc.subject	GPUs	es_ES
dc.subject.classification	COMPUTER ARCHITECTURE AND TECHNOLOGY	es_ES
dc.title	Hierarchical approach for deriving a reproducible unblocked LU factorization	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1177/1094342019832968	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/ANR//ANR-10-EQPX-0029/FR/Equipement d'excellence de calcul intensif de Mesocentres coordonnés - Tremplin vers le calcul petaflopique et l'exascale/EQUIP@MESO/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/ANR//ANR-14-CE25-0018/FR/Fast Reliable Approximation/Fast Relax/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Iakymchuk, R.; Graillat, S.; Defour, D.; Quintana-Orti, ES. (2019). Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications. 33(5):791-803. https://doi.org/10.1177/1094342019832968	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1177/1094342019832968	es_ES
dc.description.upvformatpinicio	791	es_ES
dc.description.upvformatpfin	803	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	33	es_ES
dc.description.issue	5	es_ES
dc.relation.pasarela	S\392811	es_ES
dc.contributor.funder	Region Ile-de-France	es_ES
dc.contributor.funder	Agence Nationale de la Recherche, France	es_ES
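The abstract's central idea, that a dot product accumulated exactly and rounded only once yields the same bits however its terms are ordered or split across threads, can be illustrated with a small sketch. What follows is a hypothetical Python illustration, not the authors' GPU implementation: it assumes finite IEEE-754 binary64 inputs, emulates a long accumulator (in the style of Kulisch) with Python's arbitrary-precision integers rather than a hardware-friendly fixed-size accumulator, and builds a toy left-looking unblocked LU with partial pivoting on top of it. The function names (`exact_dot`, `exact_update`, `lu_left_looking`) are illustrative only, and the iterative refinement of the triangular solve mentioned in the abstract is not shown.

```python
# Hypothetical sketch, not the authors' GPU code. Assumes IEEE-754 binary64
# inputs that are all finite (no inf/NaN). A Kulisch-style long accumulator
# is emulated with Python's arbitrary-precision int.

EMIN_BITS = 1074                 # every finite double is k * 2**-1074, k an int
SCALE = 1 << (2 * EMIN_BITS)     # every product of two doubles is k * 2**-2148


def _fixed(x: float) -> int:
    """Exact integer image of x * 2**1074."""
    num, den = x.as_integer_ratio()          # den is a power of two <= 2**1074
    return num * ((1 << EMIN_BITS) // den)


def _products(x: list[float], y: list[float]) -> int:
    """Exact sum of x[i] * y[i], scaled by 2**2148; order does not matter."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(_fixed(a) * _fixed(b) for a, b in zip(x, y))


def _round(acc: int) -> float:
    """Single rounding of acc * 2**-2148 to the nearest double.

    CPython's int / int true division is correctly rounded."""
    return acc / SCALE


def exact_dot(x: list[float], y: list[float]) -> float:
    """Correctly rounded dot product sum(x[i] * y[i])."""
    return _round(_products(x, y))


def exact_update(c: float, x: list[float], y: list[float]) -> float:
    """Correctly rounded c - sum(x[i] * y[i]), with one rounding in total."""
    return _round((_fixed(c) << EMIN_BITS) - _products(x, y))


def lu_left_looking(a: list[list[float]]) -> tuple[list[list[float]], list[int]]:
    """Unblocked left-looking LU with partial pivoting, so that P*A = L*U.

    Returns the packed factors (unit lower L strictly below the diagonal,
    U on and above it) and perm, where row i of P*A is row perm[i] of A."""
    n = len(a)
    m = [[float(v) for v in row] for row in a]    # working copy
    perm = list(range(n))
    for j in range(n):
        # U(0:j, j): forward substitution with the unit lower L(0:j, 0:j).
        for i in range(j):
            m[i][j] = exact_update(m[i][j], m[i][:i], [m[k][j] for k in range(i)])
        # Rows j..n-1 of column j: matrix-vector update with L(j:n, 0:j).
        u_col = [m[k][j] for k in range(j)]
        for i in range(j, n):
            m[i][j] = exact_update(m[i][j], m[i][:j], u_col)
        # Partial pivoting; max() returns the first maximal index, so ties
        # are broken deterministically.
        p = max(range(j, n), key=lambda i: abs(m[i][j]))
        if p != j:
            m[j], m[p] = m[p], m[j]
            perm[j], perm[p] = perm[p], perm[j]
        # Scale the subdiagonal part of column j: one IEEE division each.
        if m[j][j] != 0.0:
            for i in range(j + 1, n):
                m[i][j] = m[i][j] / m[j][j]
    return m, perm
```

Because integer addition is exact and associative, `exact_dot` returns the same double for every permutation of its terms. A plain left-to-right floating-point loop over `x = [1e16, 1.0, -1e16, 3.14]` with unit weights instead returns `3.14`, because `1e16 + 1.0` rounds back to `1e16`. In the LU sketch, every entry of L and U is produced by one final rounding (or one IEEE division), and pivot ties go to the first index, so the factors do not depend on reduction order. A practical GPU kernel would need a bounded accumulator combined with error-free transformations instead of unbounded integers; the sketch only shows why a single final rounding makes the result order-independent.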

This item appears in the following collection(s)

Articles, conference papers, monographs [48344]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia