dc.contributor.author | Iakymchuk, Roman | es_ES |
dc.contributor.author | Graillat, Stef | es_ES |
dc.contributor.author | Defour, David | es_ES |
dc.contributor.author | Quintana-Orti, Enrique S. | es_ES |
dc.date.accessioned | 2020-09-24T12:30:16Z | |
dc.date.available | 2020-09-24T12:30:16Z | |
dc.date.issued | 2019-09 | es_ES |
dc.identifier.issn | 1094-3420 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/150674 | |
dc.description.abstract | [EN] We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization. | es_ES |
dc.description.sponsorship | The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC Centre for High Performance Computing (PDC-HPC). This work was also granted access to the HPC resources of The Institute for Scientific Computing and Simulation financed by Region Ile-de-France and the project Equip@Meso (reference ANR-10-EQPX-29-01) overseen by the French National Agency for Research (ANR) as part of the Investissements d'Avenir program. This work was also partly supported by the FastRelax (ANR-14-CE25-0018-01) project of ANR. | es_ES |
dc.language | English | es_ES |
dc.publisher | SAGE Publications | es_ES |
dc.relation.ispartof | International Journal of High Performance Computing Applications | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | LU factorization | es_ES |
dc.subject | BLAS | es_ES |
dc.subject | Reproducibility | es_ES |
dc.subject | Accuracy | es_ES |
dc.subject | Long accumulator | es_ES |
dc.subject | Error-free transformation | es_ES |
dc.subject | GPUs | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | Hierarchical approach for deriving a reproducible unblocked LU factorization | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1177/1094342019832968 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/ANR//ANR-10-EQPX-0029/FR/Equipement d'excellence de calcul intensif de Mesocentres coordonnés - Tremplin vers le calcul petaflopique et l'exascale/EQUIP@MESO/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/ANR//ANR-14-CE25-0018/FR/Fast Reliable Approximation/Fast Relax/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Iakymchuk, R.; Graillat, S.; Defour, D.; Quintana-Orti, ES. (2019). Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications. 33(5):791-803. https://doi.org/10.1177/1094342019832968 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1177/1094342019832968 | es_ES |
dc.description.upvformatpinicio | 791 | es_ES |
dc.description.upvformatpfin | 803 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 33 | es_ES |
dc.description.issue | 5 | es_ES |
dc.relation.pasarela | S\392811 | es_ES |
dc.contributor.funder | Region Ile-de-France | es_ES |
dc.contributor.funder | Agence Nationale de la Recherche, France | es_ES |
dc.description.references | Arteaga, A., Fuhrer, O., & Hoefler, T. (2014). Designing Bit-Reproducible Portable High-Performance Applications. 2014 IEEE 28th International Parallel and Distributed Processing Symposium. doi:10.1109/ipdps.2014.127 | es_ES |
dc.description.references | Bientinesi, P., Quintana-Ortí, E. S., & Geijn, R. A. van de. (2005). Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Transactions on Mathematical Software, 31(1), 27-59. doi:10.1145/1055531.1055533 | es_ES |
dc.description.references | Chohra, C., Langlois, P., & Parello, D. (2016). Efficiency of Reproducible Level 1 BLAS. Lecture Notes in Computer Science, 99-108. doi:10.1007/978-3-319-31769-4_8 | es_ES |
dc.description.references | Collange, S., Defour, D., Graillat, S., & Iakymchuk, R. (2015). Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Computing, 49, 83-97. doi:10.1016/j.parco.2015.09.001 | es_ES |
dc.description.references | Demmel, J., & Hong Diep Nguyen. (2013). Fast Reproducible Floating-Point Summation. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.9 | es_ES |
dc.description.references | Demmel, J., & Nguyen, H. D. (2015). Parallel Reproducible Summation. IEEE Transactions on Computers, 64(7), 2060-2070. doi:10.1109/tc.2014.2345391 | es_ES |
dc.description.references | Dongarra, J. J., Du Croz, J., Hammarling, S., & Duff, I. S. (1990). A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1), 1-17. doi:10.1145/77626.79170 | es_ES |
dc.description.references | Dongarra, J., Hittinger, J., Bell, J., Chacon, L., Falgout, R., Heroux, M., … Wild, S. (2014). Applied Mathematics Research for Exascale Computing. doi:10.2172/1149042 | es_ES |
dc.description.references | Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., & Zimmermann, P. (2007). MPFR. ACM Transactions on Mathematical Software, 33(2), 13. doi:10.1145/1236463.1236468 | es_ES |
dc.description.references | Haidar, A., Dong, T., Luszczek, P., Tomov, S., & Dongarra, J. (2015). Batched matrix computations on hardware accelerators based on GPUs. The International Journal of High Performance Computing Applications, 29(2), 193-208. doi:10.1177/1094342014567546 | es_ES |
dc.description.references | Hida, Y., Li, X. S., & Bailey, D. H. (n.d.). Algorithms for quad-double precision floating point arithmetic. Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001. doi:10.1109/arith.2001.930115 | es_ES |
dc.description.references | Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms. doi:10.1137/1.9780898718027 | es_ES |
dc.description.references | Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2015). Reproducible Triangular Solvers for High-Performance Computing. 2015 12th International Conference on Information Technology - New Generations. doi:10.1109/itng.2015.63 | es_ES |
dc.description.references | Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2016). Reproducible and Accurate Matrix Multiplication. Lecture Notes in Computer Science, 126-137. doi:10.1007/978-3-319-31769-4_11 | es_ES |
dc.description.references | Kulisch, U., & Snyder, V. (2010). The exact dot product as basic tool for long interval arithmetic. Computing, 91(3), 307-313. doi:10.1007/s00607-010-0127-7 | es_ES |
dc.description.references | Li, X. S., Demmel, J. W., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., … Yoo, D. J. (2002). Design, implementation and testing of extended and mixed precision BLAS. ACM Transactions on Mathematical Software, 28(2), 152-205. doi:10.1145/567806.567808 | es_ES |
dc.description.references | Muller, J.-M., Brisebarre, N., de Dinechin, F., Jeannerod, C.-P., Lefèvre, V., Melquiond, G., … Torres, S. (2010). Handbook of Floating-Point Arithmetic. doi:10.1007/978-0-8176-4705-6 | es_ES |
dc.description.references | Ogita, T., Rump, S. M., & Oishi, S. (2005). Accurate Sum and Dot Product. SIAM Journal on Scientific Computing, 26(6), 1955-1988. doi:10.1137/030601818 | es_ES |
dc.description.references | Ortega, J. (1988). The ijk forms of factorization methods I. Vector computers. Parallel Computing, 7(2), 135-147. doi:10.1016/0167-8191(88)90035-x | es_ES |
dc.description.references | Rump, S. M. (2009). Ultimately Fast Accurate Summation. SIAM Journal on Scientific Computing, 31(5), 3466-3502. doi:10.1137/080738490 | es_ES |
dc.description.references | Skeel, R. D. (1979). Scaling for Numerical Stability in Gaussian Elimination. Journal of the ACM, 26(3), 494-526. doi:10.1145/322139.322148 | es_ES |
dc.description.references | Zhu, Y.-K., & Hayes, W. B. (2010). Algorithm 908. ACM Transactions on Mathematical Software, 37(3), 1-13. doi:10.1145/1824801.1824815 | es_ES |