Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments

Iakymchuk, Roman; Barreda Vayá, Maria; Graillat, Stef; Aliaga, José I.; Quintana Ortí, Enrique Salvador

doi:10.1177/1094342020932650

Identificarse

Buscar en RiuNet

Listar

Todo RiuNet
Esta colección

Mi cuenta

Acceder

Estadísticas

Ver Estadísticas de uso

Ayuda RiuNet

Admin. UPV

Compartir/Enviar a

Citas

Estadísticas

Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments

Mostrar el registro sencillo del ítem

Ficheros en el ítem

Nombre: IakymchukBarredaG ...

Tamaño: 558.8Kb

Formato: PDF

Descripción: Versión del Autor.

Abrir

Nombre: 1094342020932650.pdf

Tamaño: 1.050Mb

Formato: PDF

Descripción: Versión editorial

Solicitar una copia al autor

dc.contributor.author	Iakymchuk, Roman	es_ES
dc.contributor.author	Barreda Vayá, Maria	es_ES
dc.contributor.author	Graillat, Stef	es_ES
dc.contributor.author	Aliaga, José I.	es_ES
dc.contributor.author	Quintana Ortí, Enrique Salvador	es_ES
dc.date.accessioned	2021-07-17T03:34:38Z
dc.date.available	2021-07-17T03:34:38Z
dc.date.issued	2020-09	es_ES
dc.identifier.issn	1094-3420	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/169416
dc.description.abstract	[EN] The Preconditioned Conjugate Gradient method is often employed for the solution of linear systems of equations arising in numerical simulations of physical phenomena. While being widely used, the solver is also known for its lack of accuracy while computing the residual. In this article, we propose two algorithmic solutions that originate from the ExBLAS project to enhance the accuracy of the solver as well as to ensure its reproducibility in a hybrid MPI + OpenMP tasks programming environment. One is based on ExBLAS and preserves every bit of information until the final rounding, while the other relies upon floating-point expansions and, hence, expands the intermediate precision. Instead of converting the entire solver into its ExBLAS-related implementation, we identify those parts that violate reproducibility/non-associativity, secure them, and combine this with the sequential executions. These algorithmic strategies are reinforced with programmability suggestions to assure deterministic executions. Finally, we verify these approaches on two modern HPC systems: both versions deliver reproducible number of iterations, residuals, direct errors, and vector-solutions for the overhead of less than 37.7% on 768 cores.	es_ES
dc.description.sponsorship	The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by the European Union's Horizon 2020 research, innovation program under the Marie Sklodowska-Curie grant agreement via the Robust project No. 842528 as well as the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the H2020 EC RIA Programme; in particular, the author gratefully acknowledges the support of Vicenc comma Beltran and the computer resources and technical support provided by BSC. The researchers from Universitat Jaume I (UJI) and Universitat Polit ' ecnica de Valencia (UPV) were supported by MINECO project TIN2017-82972-R. Maria Barreda was also supported by the POSDOC-A/2017/11 project from the Universitat Jaume I.	es_ES
dc.language	Inglés	es_ES
dc.publisher	SAGE Publications	es_ES
dc.relation.ispartof	International Journal of High Performance Computing Applications	es_ES
dc.rights	Reserva de todos los derechos	es_ES
dc.subject	Preconditioned conjugate gradient	es_ES
dc.subject	MPI	es_ES
dc.subject	OpenMP tasks	es_ES
dc.subject	Reproducibility	es_ES
dc.subject	Accuracy	es_ES
dc.subject	Floating-point expansion	es_ES
dc.subject	Long accumulator	es_ES
dc.subject	Fused multiply-add	es_ES
dc.subject.classification	ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES	es_ES
dc.title	Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments	es_ES
dc.type	Artículo	es_ES
dc.identifier.doi	10.1177/1094342020932650	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/730897/EU/Transnational Access Programme for a Pan-European Network of HPC Research Infrastructures and Laboratories for scientific computing/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/UJI//POSDOC-A%2F2017%2F11/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/842528/EU/Robust and Energy-Efficient Numerical Solvers Towards Reliable and Sustainable Scientific Computations/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/	es_ES
dc.rights.accessRights	Abierto	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Iakymchuk, R.; Barreda Vayá, M.; Graillat, S.; Aliaga, JI.; Quintana Ortí, ES. (2020). Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments. International Journal of High Performance Computing Applications. 34(5):502-518. https://doi.org/10.1177/1094342020932650	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1177/1094342020932650	es_ES
dc.description.upvformatpinicio	502	es_ES
dc.description.upvformatpfin	518	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	34	es_ES
dc.description.issue	5	es_ES
dc.relation.pasarela	S\417240	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	Universitat Jaume I	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
dc.description.references	Aliaga, J. I., Barreda, M., Flegar, G., Bollhöfer, M., & Quintana-Ortí, E. S. (2017). Communication in task-parallel ILU-preconditioned CG solvers using MPI + OmpSs. Concurrency and Computation: Practice and Experience, 29(21), e4280. doi:10.1002/cpe.4280	es_ES
dc.description.references	Bailey, D. H. (2013). High-precision computation: Applications and challenges [Keynote I]. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.39	es_ES
dc.description.references	Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., … van der Vorst, H. (1994). Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. doi:10.1137/1.9781611971538	es_ES
dc.description.references	Burgess, N., Goodyer, C., Hinds, C. N., & Lutz, D. R. (2019). High-Precision Anchored Accumulators for Reproducible Floating-Point Summation. IEEE Transactions on Computers, 68(7), 967-978. doi:10.1109/tc.2018.2855729	es_ES
dc.description.references	Carson, E., & Higham, N. J. (2018). Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions. SIAM Journal on Scientific Computing, 40(2), A817-A847. doi:10.1137/17m1140819	es_ES
dc.description.references	Collange, S., Defour, D., Graillat, S., & Iakymchuk, R. (2015). Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Computing, 49, 83-97. doi:10.1016/j.parco.2015.09.001	es_ES
dc.description.references	Dekker, T. J. (1971). A floating-point technique for extending the available precision. Numerische Mathematik, 18(3), 224-242. doi:10.1007/bf01397083	es_ES
dc.description.references	Demmel, J., & Hong Diep Nguyen. (2013). Fast Reproducible Floating-Point Summation. 2013 IEEE 21st Symposium on Computer Arithmetic. doi:10.1109/arith.2013.9	es_ES
dc.description.references	Demmel, J., & Nguyen, H. D. (2015). Parallel Reproducible Summation. IEEE Transactions on Computers, 64(7), 2060-2070. doi:10.1109/tc.2014.2345391	es_ES
dc.description.references	Dongarra, J. J., Du Croz, J., Hammarling, S., & Duff, I. S. (1990). A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1), 1-17. doi:10.1145/77626.79170	es_ES
dc.description.references	Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., & Zimmermann, P. (2007). MPFR. ACM Transactions on Mathematical Software, 33(2), 13. doi:10.1145/1236463.1236468	es_ES
dc.description.references	Hida, Y., Li, X. S., & Bailey, D. H. (s. f.). Algorithms for quad-double precision floating point arithmetic. Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001. doi:10.1109/arith.2001.930115	es_ES
dc.description.references	Hunold, S., & Carpen-Amarie, A. (2016). Reproducible MPI Benchmarking is Still Not as Easy as You Think. IEEE Transactions on Parallel and Distributed Systems, 27(12), 3617-3630. doi:10.1109/tpds.2016.2539167	es_ES
dc.description.references	IEEE Computer Society (2008) IEEE Standard for Floating-Point Arithmetic. Piscataway: IEEE Standard, pp. 754–2008.	es_ES
dc.description.references	Kulisch, U., & Snyder, V. (2010). The exact dot product as basic tool for long interval arithmetic. Computing, 91(3), 307-313. doi:10.1007/s00607-010-0127-7	es_ES
dc.description.references	Kulisch, U. (2013). Computer Arithmetic and Validity. doi:10.1515/9783110301793	es_ES
dc.description.references	Lawson, C. L., Hanson, R. J., Kincaid, D. R., & Krogh, F. T. (1979). Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software, 5(3), 308-323. doi:10.1145/355841.355847	es_ES
dc.description.references	Lutz, D. R., & Hinds, C. N. (2017). High-Precision Anchored Accumulators for Reproducible Floating-Point Summation. 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH). doi:10.1109/arith.2017.20	es_ES
dc.description.references	Mukunoki, D., Ogita, T., & Ozaki, K. (2020). Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-Core Architectures. Lecture Notes in Computer Science, 516-527. doi:10.1007/978-3-030-43229-4_44	es_ES
dc.description.references	Nguyen, H. D., & Demmel, J. (2015). Reproducible Tall-Skinny QR. 2015 IEEE 22nd Symposium on Computer Arithmetic. doi:10.1109/arith.2015.28	es_ES
dc.description.references	Ogita, T., Rump, S. M., & Oishi, S. (2005). Accurate Sum and Dot Product. SIAM Journal on Scientific Computing, 26(6), 1955-1988. doi:10.1137/030601818	es_ES
dc.description.references	Ozaki, K., Ogita, T., Oishi, S., & Rump, S. M. (2011). Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numerical Algorithms, 59(1), 95-118. doi:10.1007/s11075-011-9478-1	es_ES
dc.description.references	Priest, D. M. (s. f.). Algorithms for arbitrary precision floating point arithmetic. [1991] Proceedings 10th IEEE Symposium on Computer Arithmetic. doi:10.1109/arith.1991.145549	es_ES
dc.description.references	Rump, S. M., Ogita, T., & Oishi, S. (2008). Accurate Floating-Point Summation Part I: Faithful Rounding. SIAM Journal on Scientific Computing, 31(1), 189-224. doi:10.1137/050645671	es_ES
dc.description.references	Rump, S. M., Ogita, T., & Oishi, S. (2009). Accurate Floating-Point Summation Part II: Sign, K-Fold Faithful and Rounding to Nearest. SIAM Journal on Scientific Computing, 31(2), 1269-1302. doi:10.1137/07068816x	es_ES
dc.description.references	Rump, S. M., Ogita, T., & Oishi, S. (2010). Fast high precision summation. Nonlinear Theory and Its Applications, IEICE, 1(1), 2-24. doi:10.1587/nolta.1.2	es_ES
dc.description.references	Saad, Y. (2003). Iterative Methods for Sparse Linear Systems. doi:10.1137/1.9780898718003	es_ES
dc.description.references	Wiesenberger, M., Einkemmer, L., Held, M., Gutierrez-Milla, A., Sáez, X., & Iakymchuk, R. (2019). Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures. Computer Physics Communications, 238, 145-156. doi:10.1016/j.cpc.2018.12.006	es_ES

Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro sencillo del ítem

Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Buscar en RiuNet

Listar

Todo RiuNet

Esta colección

Mi cuenta

Estadísticas

Ayuda RiuNet

Admin. UPV

Compartir/Enviar a

Citas

Estadísticas

Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments

Ficheros en el ítem

Este ítem aparece en la(s) siguiente(s) colección(ones)