Reproducibility strategies for parallel preconditioned Conjugate Gradient

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Iakymchuk, Roman es_ES
dc.contributor.author Barreda, María es_ES
dc.contributor.author Wiesenberger, Matthias es_ES
dc.contributor.author Aliaga, José I. es_ES
dc.contributor.author Quintana Ortí, Enrique Salvador es_ES
dc.date.accessioned 2021-07-24T03:33:36Z
dc.date.available 2021-07-24T03:33:36Z
dc.date.issued 2020-06 es_ES
dc.identifier.issn 0377-0427 es_ES
dc.identifier.uri http://hdl.handle.net/10251/170081
dc.description.abstract [EN] The Preconditioned Conjugate Gradient method is often used in numerical simulations. While widely used, the solver is also known for its lack of accuracy when computing the residual. In this article, we pursue a twofold goal: to enhance the accuracy of the solver and to ensure its reproducibility in a message-passing implementation. We design and employ strategies ranging from the ExBLAS approach (which preserves every bit of information until the final rounding) to its lighter, performance-oriented variant (which expands the intermediate precision). These algorithmic strategies are reinforced with programmability suggestions to ensure deterministic executions. Finally, we verify these strategies on modern HPC systems: both versions deliver a reproducible number of iterations, residuals, direct errors, and solution vectors at an overhead of only 29% (ExBLAS) and 4% (lightweight) on 768 processes. es_ES
dc.description.sponsorship To begin with, we would like to thank the reviewers for their thorough reading of the article as well as their valuable comments and suggestions. This research was partially supported by the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement via the Robust project No. 842528, as well as by the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the H2020 EC RIA Programme; in particular, the author gratefully acknowledges the support of Vicenc Beltran and the computer resources and technical support provided by BSC. The researchers from Universitat Jaume I (UJI) and Universidad Politecnica de Valencia (UPV) were supported by the MINECO (Spain) project TIN2017-82972-R. Maria Barreda was also supported by the POSDOC-A/2017/11 project from the Universitat Jaume I, Spain. es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Journal of Computational and Applied Mathematics es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Reproducibility es_ES
dc.subject Accuracy es_ES
dc.subject Floating-point expansion es_ES
dc.subject Long accumulator es_ES
dc.subject Preconditioned Conjugate Gradient es_ES
dc.subject High-Performance Computing es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Reproducibility strategies for parallel preconditioned Conjugate Gradient es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.cam.2019.112697 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/730897/EU/Transnational Access Programme for a Pan-European Network of HPC Research Infrastructures and Laboratories for scientific computing/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/UJI//POSDOC-A%2F2017%2F11/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/842528/EU/Robust and Energy-Efficient Numerical Solvers Towards Reliable and Sustainable Scientific Computations/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Iakymchuk, R.; Barreda, M.; Wiesenberger, M.; Aliaga, JI.; Quintana Ortí, ES. (2020). Reproducibility strategies for parallel preconditioned Conjugate Gradient. Journal of Computational and Applied Mathematics. 371:1-13. https://doi.org/10.1016/j.cam.2019.112697 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.cam.2019.112697 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 13 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 371 es_ES
dc.relation.pasarela S\401226 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Universitat Jaume I es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.description.references Lawson, C. L., Hanson, R. J., Kincaid, D. R., & Krogh, F. T. (1979). Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software, 5(3), 308-323. doi:10.1145/355841.355847 es_ES
dc.description.references Dongarra, J. J., Du Croz, J., Hammarling, S., & Duff, I. S. (1990). A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1), 1-17. doi:10.1145/77626.79170 es_ES
dc.description.references Demmel, J., & Nguyen, H. D. (2015). Parallel Reproducible Summation. IEEE Transactions on Computers, 64(7), 2060-2070. doi:10.1109/tc.2014.2345391 es_ES
dc.description.references Iakymchuk, R., Graillat, S., Defour, D., & Quintana-Ortí, E. S. (2019). Hierarchical approach for deriving a reproducible unblocked LU factorization. The International Journal of High Performance Computing Applications, 33(5), 791-803. doi:10.1177/1094342019832968 es_ES
dc.description.references Iakymchuk, R., Defour, D., Collange, S., & Graillat, S. (2016). Reproducible and Accurate Matrix Multiplication. Lecture Notes in Computer Science, 126-137. doi:10.1007/978-3-319-31769-4_11 es_ES
dc.description.references Rump, S. M., Ogita, T., & Oishi, S. (2009). Accurate Floating-Point Summation Part II: Sign, K-Fold Faithful and Rounding to Nearest. SIAM Journal on Scientific Computing, 31(2), 1269-1302. doi:10.1137/07068816x es_ES
dc.description.references Burgess, N., Goodyer, C., Hinds, C. N., & Lutz, D. R. (2019). High-Precision Anchored Accumulators for Reproducible Floating-Point Summation. IEEE Transactions on Computers, 68(7), 967-978. doi:10.1109/tc.2018.2855729 es_ES
dc.description.references Mukunoki, D., Ogita, T., & Ozaki, K. (2019). Accurate and reproducible BLAS routines with Ozaki scheme for many-core architectures. In Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM 2019), accepted. es_ES
dc.description.references Ogita, T., Rump, S. M., & Oishi, S. (2005). Accurate Sum and Dot Product. SIAM Journal on Scientific Computing, 26(6), 1955-1988. doi:10.1137/030601818 es_ES
dc.description.references Kulisch, U., & Snyder, V. (2010). The exact dot product as basic tool for long interval arithmetic. Computing, 91(3), 307-313. doi:10.1007/s00607-010-0127-7 es_ES
dc.description.references Boldo, S., & Melquiond, G. (2008). Emulation of a FMA and Correctly Rounded Sums: Proved Algorithms Using Rounding to Odd. IEEE Transactions on Computers, 57(4), 462-471. doi:10.1109/tc.2007.70819 es_ES
dc.description.references Wiesenberger, M., Einkemmer, L., Held, M., Gutierrez-Milla, A., Sáez, X., & Iakymchuk, R. (2019). Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures. Computer Physics Communications, 238, 145-156. doi:10.1016/j.cpc.2018.12.006 es_ES
dc.description.references Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., & Zimmermann, P. (2007). MPFR. ACM Transactions on Mathematical Software, 33(2), 13. doi:10.1145/1236463.1236468 es_ES
dc.description.references Demmel, J., & Nguyen, H. D. (2013). Fast reproducible floating-point summation. In Proceedings of ARITH-21 (pp. 163–172). es_ES
dc.description.references Ozaki, K., Ogita, T., Oishi, S., & Rump, S. M. (2011). Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numerical Algorithms, 59(1), 95-118. doi:10.1007/s11075-011-9478-1 es_ES
dc.description.references Carson, E., & Higham, N. J. (2018). Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions. SIAM Journal on Scientific Computing, 40(2), A817-A847. doi:10.1137/17m1140819 es_ES
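The abstract contrasts ExBLAS-style accumulation, which keeps every bit until a single final rounding, with a lighter expansion-based variant. The sketch below illustrates only the first idea, in simplified form and assuming finite inputs: Python's arbitrary-precision `int` stands in for the fixed-size long accumulator that ExBLAS actually uses, splitting the vectors into chunks merely mimics their distribution over MPI processes, and the names `partial_accumulate`, `reproducible_dot`, and `naive_dot` are hypothetical, not part of the ExBLAS API.

```python
from fractions import Fraction
import random

# Every finite IEEE-754 double is an integer multiple of 2**-1074 (the smallest
# positive subnormal), so the exact product of two doubles is an integer
# multiple of 2**-2148. Scaling by SCALE turns each exact product into an int.
SCALE = 2 ** 2148


def partial_accumulate(x, y):
    """Exact local accumulator for one simulated process (hypothetical helper).
    A Python int stands in for a fixed-size long (Kulisch-style) accumulator."""
    acc = 0
    for a, b in zip(x, y):
        # Fraction(a) is the exact value of the finite double a, so the
        # scaled product has denominator 1 and converts to int without loss.
        acc += int(Fraction(a) * Fraction(b) * SCALE)
    return acc


def reproducible_dot(x, y, nparts=1):
    """Split the vectors into nparts chunks, reduce the exact partial
    accumulators, and round once at the end (hypothetical helper)."""
    n = len(x)
    bounds = [n * p // nparts for p in range(nparts + 1)]
    partials = [partial_accumulate(x[lo:hi], y[lo:hi])
                for lo, hi in zip(bounds, bounds[1:])]
    total = sum(partials)  # integer addition: exact, so order cannot matter
    # Single rounding step: Fraction -> float rounds to nearest.
    return float(Fraction(total, SCALE))


def naive_dot(x, y):
    """Ordinary left-to-right floating-point dot product, for comparison."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s


if __name__ == "__main__":
    ones = [1.0, 1.0, 1.0]
    print(naive_dot([1e16, 1.0, -1e16], ones))  # 0.0: the 1.0 is absorbed
    print(naive_dot([1e16, -1e16, 1.0], ones))  # 1.0: same data, other order
    print(reproducible_dot([1e16, 1.0, -1e16], ones, nparts=2))  # 1.0

    # Shuffling the data and changing the number of chunks leaves the
    # exactly accumulated result bit-for-bit unchanged.
    rng = random.Random(0)
    x = [rng.uniform(-1, 1) * 10.0 ** rng.randint(-8, 8) for _ in range(1000)]
    y = [rng.uniform(-1, 1) for _ in range(1000)]
    ref = reproducible_dot(x, y)
    for trial in range(5):
        idx = list(range(len(x)))
        rng.shuffle(idx)
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        assert reproducible_dot(xs, ys, nparts=trial + 1) == ref
```

Because every partial accumulator is exact, neither the number of chunks nor the order of the operands can change the final double, whereas the naive loop returns 0.0 or 1.0 for the same three numbers depending on their order. In a PCG iteration this is what would make the dot products and residual norms, and hence the iteration count, independent of the process layout. The lightweight variant described in the abstract instead keeps a few extra words of intermediate precision and is not sketched here.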

