Mostrar el registro sencillo del ítem
dc.contributor.author | Anzt, Hartwig | es_ES |
dc.contributor.author | Dongarra, Jack | es_ES |
dc.contributor.author | Quintana Ortí, Enrique Salvador | es_ES |
dc.date.accessioned | 2021-02-03T04:32:20Z | |
dc.date.available | 2021-02-03T04:32:20Z | |
dc.date.issued | 2019-09 | es_ES |
dc.identifier.issn | 1877-7503 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/160574 | |
dc.description.abstract | [EN] Resilience is considered a challenging under-addressed issue that the high performance computing community (HPC) will have to face in order to produce reliable Exascale systems by the beginning of the next decade. As part of a push toward a resilient HPC ecosystem, in this paper we propose an error-resilient iterative solver for sparse linear systems based on stationary component-wise relaxation methods. Starting from a plain implementation of the Jacobi iteration, our approach introduces a low-cost component-wise technique that detects bit-flips, rejecting some component updates, and turning the initial synchronized solver into an asynchronous iteration. Our experimental study with sparse incomplete factorizations from a collection of real-world applications, and a practical GPU implementation, exposes the convergence delay incurred by the fault-tolerant implementation and its practical performance. | es_ES |
dc.description.sponsorship | This material is based upon work supported in part by the U.S. Department of Energy (Award Number DE-SC-0010042) and NVIDIA. E. S. Quintana-Orti was supported by project CICYT TIN2014-53495-R of MINECO and FEDER. | es_ES |
dc.language | Inglés | es_ES |
dc.publisher | Elsevier | es_ES |
dc.relation.ispartof | Journal of Computational Science | es_ES |
dc.rights | Reconocimiento - No comercial - Sin obra derivada (by-nc-nd) | es_ES |
dc.subject | Sparse linear systems | es_ES |
dc.subject | Iterative solvers | es_ES |
dc.subject | Jacobi method | es_ES |
dc.subject | Fault tolerance | es_ES |
dc.subject | Bit flips | es_ES |
dc.subject | High performance computing | es_ES |
dc.subject.classification | ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES | es_ES |
dc.title | Fine-grained bit-flip protection for relaxation methods | es_ES |
dc.type | Artículo | es_ES |
dc.identifier.doi | 10.1016/j.jocs.2016.11.013 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/DOE//DE-SC-0010042/ | es_ES |
dc.rights.accessRights | Abierto | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Anzt, H.; Dongarra, J.; Quintana Ortí, ES. (2019). Fine-grained bit-flip protection for relaxation methods. Journal of Computational Science. 36:1-11. https://doi.org/10.1016/j.jocs.2016.11.013 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1016/j.jocs.2016.11.013 | es_ES |
dc.description.upvformatpinicio | 1 | es_ES |
dc.description.upvformatpfin | 11 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 36 | es_ES |
dc.relation.pasarela | S\395107 | es_ES |
dc.contributor.funder | U.S. Department of Energy | es_ES |
dc.contributor.funder | European Regional Development Fund | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.description.references | Chow, E., & Patel, A. (2015). Fine-Grained Parallel Incomplete LU Factorization. SIAM Journal on Scientific Computing, 37(2), C169-C193. doi:10.1137/140968896 | es_ES |
dc.description.references | Karpuzcu, U. R., Kim, N. S., & Torrellas, J. (2013). Coping with Parametric Variation at Near-Threshold Voltages. IEEE Micro, 33(4), 6-14. doi:10.1109/mm.2013.71 | es_ES |
dc.description.references | Bronevetsky, G., & de Supinski, B. (2008). Soft error vulnerability of iterative linear algebra methods. Proceedings of the 22nd annual international conference on Supercomputing - ICS ’08. doi:10.1145/1375527.1375552 | es_ES |
dc.description.references | Sao, P., & Vuduc, R. (2013). Self-stabilizing iterative solvers. Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - ScalA ’13. doi:10.1145/2530268.2530272 | es_ES |
dc.description.references | Calhoun, J., Snir, M., Olson, L., & Garzaran, M. (2015). Understanding the Propagation of Error Due to a Silent Data Corruption in a Sparse Matrix Vector Multiply. 2015 IEEE International Conference on Cluster Computing. doi:10.1109/cluster.2015.101 | es_ES |
dc.description.references | Chazan, D., & Miranker, W. (1969). Chaotic relaxation. Linear Algebra and its Applications, 2(2), 199-222. doi:10.1016/0024-3795(69)90028-7 | es_ES |
dc.description.references | Frommer, A., & Szyld, D. B. (2000). On asynchronous iterations. Journal of Computational and Applied Mathematics, 123(1-2), 201-216. doi:10.1016/s0377-0427(00)00409-x | es_ES |
dc.description.references | Duff, I. S., & Meurant, G. A. (1989). The effect of ordering on preconditioned conjugate gradients. BIT, 29(4), 635-657. doi:10.1007/bf01932738 | es_ES |
dc.description.references | Aliaga, J. I., Barreda, M., Dolz, M. F., Martín, A. F., Mayo, R., & Quintana-Ortí, E. S. (2014). Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems. Cluster Computing, 17(4), 1335-1348. doi:10.1007/s10586-014-0402-z | es_ES |