
Hierarchical approach for deriving a reproducible unblocked LU factorization

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia


dc.contributor.author Iakymchuk, Roman es_ES
dc.contributor.author Graillat, Stef es_ES
dc.contributor.author Defour, David es_ES
dc.contributor.author Quintana-Orti, Enrique S. es_ES
dc.date.accessioned 2020-09-24T12:30:16Z
dc.date.available 2020-09-24T12:30:16Z
dc.date.issued 2019-09 es_ES
dc.identifier.issn 1094-3420 es_ES
dc.identifier.uri http://hdl.handle.net/10251/150674
dc.description.abstract [EN] We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). To this end, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we then construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization. es_ES
dc.description.sponsorship The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC Centre for High Performance Computing (PDC-HPC). This work was also granted access to the HPC resources of The Institute for Scientific Computing and Simulation financed by Region Ile-de-France and the project Equip@Meso (reference ANR-10-EQPX-29-01) overseen by the French National Agency for Research (ANR) as part of the Investissements d'Avenir program. This work was also partly supported by the FastRelax (ANR-14-CE25-0018-01) project of ANR. es_ES
dc.language English es_ES
dc.publisher SAGE Publications es_ES
dc.relation.ispartof International Journal of High Performance Computing Applications es_ES
dc.rights All rights reserved es_ES
dc.subject LU factorization es_ES
dc.subject BLAS es_ES
dc.subject Reproducibility es_ES
dc.subject Accuracy es_ES
dc.subject Long accumulator es_ES
dc.subject Error-free transformation es_ES
dc.subject GPUs es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Hierarchical approach for deriving a reproducible unblocked LU factorization es_ES
dc.type Article es_ES
dc.identifier.doi 10.1177/1094342019832968 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/ANR//ANR-10-EQPX-0029/FR/Equipement d'excellence de calcul intensif de Mesocentres coordonnés - Tremplin vers le calcul petaflopique et l'exascale/EQUIP@MESO/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/ANR//ANR-14-CE25-0018/FR/Fast Reliable Approximation/Fast Relax/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Iakymchuk, R.; Graillat, S.; Defour, D.; Quintana-Orti, ES. (2019). Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications. 33(5):791-803. https://doi.org/10.1177/1094342019832968 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1177/1094342019832968 es_ES
dc.description.upvformatpinicio 791 es_ES
dc.description.upvformatpfin 803 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 33 es_ES
dc.description.issue 5 es_ES
dc.relation.pasarela S\392811 es_ES
dc.contributor.funder Region Ile-de-France es_ES
dc.contributor.funder Agence Nationale de la Recherche, France es_ES
dc.description.references Arteaga, A., Fuhrer, O., & Hoefler, T. (2014). Designing Bit-Reproducible Portable High-Performance Applications. 2014 IEEE 28th International Parallel and Distributed Processing Symposium. doi:10.1109/ipdps.2014.127 es_ES
dc.description.references Bientinesi, P., Quintana-Ortí, E. S., & Geijn, R. A. van de. (2005). Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Transactions on Mathematical Software, 31(1), 27-59. doi:10.1145/1055531.1055533 es_ES
dc.description.references Chohra, C., Langlois, P., & Parello, D. (2016). Efficiency of Reproducible Level 1 BLAS. Lecture Notes in Computer Science, 99-108. doi:10.1007/978-3-319-31769-4_8 es_ES
dc.description.references Collange, S., Defour, D., Graillat, S., & Iakymchuk, R. (2015). Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Computing, 49, 83-97. doi:10.1016/j.parco.2015.09.001 es_ES
dc.description.references Demmel, J., & Hong Diep Nguyen. (2013). Fast Reproducible Floating-Point Summation. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.9 es_ES
dc.description.references Demmel, J., & Nguyen, H. D. (2015). Parallel Reproducible Summation. IEEE Transactions on Computers, 64(7), 2060-2070. doi:10.1109/tc.2014.2345391 es_ES
dc.description.references Dongarra, J. J., Du Croz, J., Hammarling, S., & Duff, I. S. (1990). A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1), 1-17. doi:10.1145/77626.79170 es_ES
dc.description.references Dongarra, J., Hittinger, J., Bell, J., Chacon, L., Falgout, R., Heroux, M., … Wild, S. (2014). Applied Mathematics Research for Exascale Computing. doi:10.2172/1149042 es_ES
dc.description.references Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., & Zimmermann, P. (2007). MPFR. ACM Transactions on Mathematical Software, 33(2), 13. doi:10.1145/1236463.1236468 es_ES
dc.description.references Haidar, A., Dong, T., Luszczek, P., Tomov, S., & Dongarra, J. (2015). Batched matrix computations on hardware accelerators based on GPUs. The International Journal of High Performance Computing Applications, 29(2), 193-208. doi:10.1177/1094342014567546 es_ES
dc.description.references Hida, Y., Li, X. S., & Bailey, D. H. (n.d.). Algorithms for quad-double precision floating point arithmetic. Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001. doi:10.1109/arith.2001.930115 es_ES
dc.description.references Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms. doi:10.1137/1.9780898718027 es_ES
dc.description.references Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2015). Reproducible Triangular Solvers for High-Performance Computing. 2015 12th International Conference on Information Technology - New Generations. doi:10.1109/itng.2015.63 es_ES
dc.description.references Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2016). Reproducible and Accurate Matrix Multiplication. Lecture Notes in Computer Science, 126-137. doi:10.1007/978-3-319-31769-4_11 es_ES
dc.description.references Kulisch, U., & Snyder, V. (2010). The exact dot product as basic tool for long interval arithmetic. Computing, 91(3), 307-313. doi:10.1007/s00607-010-0127-7 es_ES
dc.description.references Li, X. S., Demmel, J. W., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., … Yoo, D. J. (2002). Design, implementation and testing of extended and mixed precision BLAS. ACM Transactions on Mathematical Software, 28(2), 152-205. doi:10.1145/567806.567808 es_ES
dc.description.references Muller, J.-M., Brisebarre, N., de Dinechin, F., Jeannerod, C.-P., Lefèvre, V., Melquiond, G., … Torres, S. (2010). Handbook of Floating-Point Arithmetic. doi:10.1007/978-0-8176-4705-6 es_ES
dc.description.references Ogita, T., Rump, S. M., & Oishi, S. (2005). Accurate Sum and Dot Product. SIAM Journal on Scientific Computing, 26(6), 1955-1988. doi:10.1137/030601818 es_ES
dc.description.references Ortega, J. (1988). The ijk forms of factorization methods I. Vector computers. Parallel Computing, 7(2), 135-147. doi:10.1016/0167-8191(88)90035-x es_ES
dc.description.references Rump, S. M. (2009). Ultimately Fast Accurate Summation. SIAM Journal on Scientific Computing, 31(5), 3466-3502. doi:10.1137/080738490 es_ES
dc.description.references Skeel, R. D. (1979). Scaling for Numerical Stability in Gaussian Elimination. Journal of the ACM, 26(3), 494-526. doi:10.1145/322139.322148 es_ES
dc.description.references Zhu, Y.-K., & Hayes, W. B. (2010). Algorithm 908. ACM Transactions on Mathematical Software, 37(3), 1-13. doi:10.1145/1824801.1824815 es_ES
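
The abstract describes a bottom-up construction: a dot-product-style kernel that is correctly rounded and reproducible, on top of which an unblocked LU factorization with partial pivoting is assembled. The following is a minimal CPU sketch of that idea in Python, not the authors' GPU implementation. It assumes exact rational accumulation (Python's `fractions.Fraction`) as a stand-in for the long accumulator and error-free transformations listed among the subjects, and the names `rounded_once_update` and `lu_unblocked` are hypothetical.

```python
from fractions import Fraction


def rounded_once_update(a, x, y):
    """Return a - dot(x, y), accumulated exactly and rounded a single time.

    Assumption: exact rational arithmetic replaces the long accumulator.
    Because the sum is exact, the result does not depend on summation order.
    """
    acc = Fraction(a)
    for xi, yi in zip(x, y):
        acc -= Fraction(xi) * Fraction(yi)
    return float(acc)  # one correctly rounded conversion to double


def lu_unblocked(A, update=rounded_once_update):
    """Unblocked (Crout-ordered) LU with partial pivoting.

    Returns (LU, perm): LU stores the unit lower factor L below the diagonal
    and U on and above it; perm[i] is the original row now stored in row i.
    """
    n = len(A)
    LU = [[float(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Update column k (diagonal and below) with dot-product-style kernels.
        col_k = [LU[m][k] for m in range(k)]
        for i in range(k, n):
            LU[i][k] = update(LU[i][k], LU[i][:k], col_k)
        # Partial pivoting: first row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ZeroDivisionError("zero pivot: matrix is singular")
        if p != k:
            LU[k], LU[p] = LU[p], LU[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Update row k of U to the right of the diagonal.
        for j in range(k + 1, n):
            LU[k][j] = update(LU[k][j], LU[k][:k], [LU[m][j] for m in range(k)])
        # Scale the subdiagonal of column k (a single rounded division each).
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
    return LU, perm


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
    LU, perm = lu_unblocked(A)
    print(perm)
    for row in LU:
        print(row)
```

Every floating-point value in this sketch comes from either an exactly accumulated update rounded once or a single division, so repeated runs, or runs with the terms of each dot product reordered, produce bitwise-identical factors. Exact rational arithmetic is far slower than the techniques the abstract names; it is used here only to make the reproducibility property easy to verify.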

