dc.contributor.author | Iakymchuk, Roman | es_ES |
dc.contributor.author | Barreda Vayá, Maria | es_ES |
dc.contributor.author | Graillat, Stef | es_ES |
dc.contributor.author | Aliaga, José I. | es_ES |
dc.contributor.author | Quintana Ortí, Enrique Salvador | es_ES |
dc.date.accessioned | 2021-07-17T03:34:38Z | |
dc.date.available | 2021-07-17T03:34:38Z | |
dc.date.issued | 2020-09 | es_ES |
dc.identifier.issn | 1094-3420 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/169416 | |
dc.description.abstract | [EN] The Preconditioned Conjugate Gradient method is often employed for the solution of linear systems of equations arising in numerical simulations of physical phenomena. While being widely used, the solver is also known for its lack of accuracy while computing the residual. In this article, we propose two algorithmic solutions that originate from the ExBLAS project to enhance the accuracy of the solver as well as to ensure its reproducibility in a hybrid MPI + OpenMP tasks programming environment. One is based on ExBLAS and preserves every bit of information until the final rounding, while the other relies upon floating-point expansions and, hence, expands the intermediate precision. Instead of converting the entire solver into its ExBLAS-related implementation, we identify those parts that violate reproducibility/non-associativity, secure them, and combine this with the sequential executions. These algorithmic strategies are reinforced with programmability suggestions to assure deterministic executions. Finally, we verify these approaches on two modern HPC systems: both versions deliver reproducible number of iterations, residuals, direct errors, and vector-solutions for the overhead of less than 37.7% on 768 cores. | es_ES |
dc.description.sponsorship | The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by the European Union's Horizon 2020 research, innovation program under the Marie Sklodowska-Curie grant agreement via the Robust project No. 842528 as well as the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the H2020 EC RIA Programme; in particular, the author gratefully acknowledges the support of Vicenç Beltran and the computer resources and technical support provided by BSC. The researchers from Universitat Jaume I (UJI) and Universitat Politècnica de València (UPV) were supported by MINECO project TIN2017-82972-R. Maria Barreda was also supported by the POSDOC-A/2017/11 project from the Universitat Jaume I. | es_ES |
dc.language | English | es_ES |
dc.publisher | SAGE Publications | es_ES |
dc.relation.ispartof | International Journal of High Performance Computing Applications | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | Preconditioned conjugate gradient | es_ES |
dc.subject | MPI | es_ES |
dc.subject | OpenMP tasks | es_ES |
dc.subject | Reproducibility | es_ES |
dc.subject | Accuracy | es_ES |
dc.subject | Floating-point expansion | es_ES |
dc.subject | Long accumulator | es_ES |
dc.subject | Fused multiply-add | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1177/1094342020932650 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/730897/EU/Transnational Access Programme for a Pan-European Network of HPC Research Infrastructures and Laboratories for scientific computing/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/UJI//POSDOC-A%2F2017%2F11/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/842528/EU/Robust and Energy-Efficient Numerical Solvers Towards Reliable and Sustainable Scientific Computations/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Iakymchuk, R.; Barreda Vayá, M.; Graillat, S.; Aliaga, JI.; Quintana Ortí, ES. (2020). Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments. International Journal of High Performance Computing Applications. 34(5):502-518. https://doi.org/10.1177/1094342020932650 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1177/1094342020932650 | es_ES |
dc.description.upvformatpinicio | 502 | es_ES |
dc.description.upvformatpfin | 518 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 34 | es_ES |
dc.description.issue | 5 | es_ES |
dc.relation.pasarela | S\417240 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | Universitat Jaume I | es_ES |
dc.contributor.funder | Agencia Estatal de Investigación | es_ES |
dc.description.references | Aliaga, J. I., Barreda, M., Flegar, G., Bollhöfer, M., & Quintana-Ortí, E. S. (2017). Communication in task-parallel ILU-preconditioned CG solvers using MPI + OmpSs. Concurrency and Computation: Practice and Experience, 29(21), e4280. doi:10.1002/cpe.4280 | es_ES |
dc.description.references | Bailey, D. H. (2013). High-precision computation: Applications and challenges [Keynote I]. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.39 | es_ES |
dc.description.references | Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., … van der Vorst, H. (1994). Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. doi:10.1137/1.9781611971538 | es_ES |
dc.description.references | Burgess, N., Goodyer, C., Hinds, C. N., & Lutz, D. R. (2019). High-Precision Anchored Accumulators for Reproducible Floating-Point Summation. IEEE Transactions on Computers, 68(7), 967-978. doi:10.1109/tc.2018.2855729 | es_ES |
dc.description.references | Carson, E., & Higham, N. J. (2018). Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions. SIAM Journal on Scientific Computing, 40(2), A817-A847. doi:10.1137/17m1140819 | es_ES |
dc.description.references | Collange, S., Defour, D., Graillat, S., & Iakymchuk, R. (2015). Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Computing, 49, 83-97. doi:10.1016/j.parco.2015.09.001 | es_ES |
dc.description.references | Dekker, T. J. (1971). A floating-point technique for extending the available precision. Numerische Mathematik, 18(3), 224-242. doi:10.1007/bf01397083 | es_ES |
dc.description.references | Demmel, J., & Nguyen, H. D. (2013). Fast Reproducible Floating-Point Summation. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.9 | es_ES |
dc.description.references | Demmel, J., & Nguyen, H. D. (2015). Parallel Reproducible Summation. IEEE Transactions on Computers, 64(7), 2060-2070. doi:10.1109/tc.2014.2345391 | es_ES |
dc.description.references | Dongarra, J. J., Du Croz, J., Hammarling, S., & Duff, I. S. (1990). A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1), 1-17. doi:10.1145/77626.79170 | es_ES |
dc.description.references | Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., & Zimmermann, P. (2007). MPFR. ACM Transactions on Mathematical Software, 33(2), 13. doi:10.1145/1236463.1236468 | es_ES |
dc.description.references | Hida, Y., Li, X. S., & Bailey, D. H. (2001). Algorithms for quad-double precision floating point arithmetic. Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001. doi:10.1109/arith.2001.930115 | es_ES |
dc.description.references | Hunold, S., & Carpen-Amarie, A. (2016). Reproducible MPI Benchmarking is Still Not as Easy as You Think. IEEE Transactions on Parallel and Distributed Systems, 27(12), 3617-3630. doi:10.1109/tpds.2016.2539167 | es_ES |
dc.description.references | IEEE Computer Society. (2008). IEEE Standard for Floating-Point Arithmetic (IEEE Std 754-2008). Piscataway: IEEE. | es_ES |
dc.description.references | Kulisch, U., & Snyder, V. (2010). The exact dot product as basic tool for long interval arithmetic. Computing, 91(3), 307-313. doi:10.1007/s00607-010-0127-7 | es_ES |
dc.description.references | Kulisch, U. (2013). Computer Arithmetic and Validity. doi:10.1515/9783110301793 | es_ES |
dc.description.references | Lawson, C. L., Hanson, R. J., Kincaid, D. R., & Krogh, F. T. (1979). Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software, 5(3), 308-323. doi:10.1145/355841.355847 | es_ES |
dc.description.references | Lutz, D. R., & Hinds, C. N. (2017). High-Precision Anchored Accumulators for Reproducible Floating-Point Summation. 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH). doi:10.1109/arith.2017.20 | es_ES |
dc.description.references | Mukunoki, D., Ogita, T., & Ozaki, K. (2020). Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-Core Architectures. Lecture Notes in Computer Science, 516-527. doi:10.1007/978-3-030-43229-4_44 | es_ES |
dc.description.references | Nguyen, H. D., & Demmel, J. (2015). Reproducible Tall-Skinny QR. 2015 IEEE 22nd Symposium on Computer Arithmetic. doi:10.1109/arith.2015.28 | es_ES |
dc.description.references | Ogita, T., Rump, S. M., & Oishi, S. (2005). Accurate Sum and Dot Product. SIAM Journal on Scientific Computing, 26(6), 1955-1988. doi:10.1137/030601818 | es_ES |
dc.description.references | Ozaki, K., Ogita, T., Oishi, S., & Rump, S. M. (2011). Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numerical Algorithms, 59(1), 95-118. doi:10.1007/s11075-011-9478-1 | es_ES |
dc.description.references | Priest, D. M. (1991). Algorithms for arbitrary precision floating point arithmetic. [1991] Proceedings 10th IEEE Symposium on Computer Arithmetic. doi:10.1109/arith.1991.145549 | es_ES |
dc.description.references | Rump, S. M., Ogita, T., & Oishi, S. (2008). Accurate Floating-Point Summation Part I: Faithful Rounding. SIAM Journal on Scientific Computing, 31(1), 189-224. doi:10.1137/050645671 | es_ES |
dc.description.references | Rump, S. M., Ogita, T., & Oishi, S. (2009). Accurate Floating-Point Summation Part II: Sign, K-Fold Faithful and Rounding to Nearest. SIAM Journal on Scientific Computing, 31(2), 1269-1302. doi:10.1137/07068816x | es_ES |
dc.description.references | Rump, S. M., Ogita, T., & Oishi, S. (2010). Fast high precision summation. Nonlinear Theory and Its Applications, IEICE, 1(1), 2-24. doi:10.1587/nolta.1.2 | es_ES |
dc.description.references | Saad, Y. (2003). Iterative Methods for Sparse Linear Systems. doi:10.1137/1.9780898718003 | es_ES |
dc.description.references | Wiesenberger, M., Einkemmer, L., Held, M., Gutierrez-Milla, A., Sáez, X., & Iakymchuk, R. (2019). Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures. Computer Physics Communications, 238, 145-156. doi:10.1016/j.cpc.2018.12.006 | es_ES |