Fine-grained bit-flip protection for relaxation methods

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Anzt, Hartwig es_ES
dc.contributor.author Dongarra, Jack es_ES
dc.contributor.author Quintana Ortí, Enrique Salvador es_ES
dc.date.accessioned 2021-02-03T04:32:20Z
dc.date.available 2021-02-03T04:32:20Z
dc.date.issued 2019-09 es_ES
dc.identifier.issn 1877-7503 es_ES
dc.identifier.uri http://hdl.handle.net/10251/160574
dc.description.abstract [EN] Resilience is considered a challenging, under-addressed issue that the high-performance computing (HPC) community will have to face in order to produce reliable Exascale systems by the beginning of the next decade. As part of a push toward a resilient HPC ecosystem, in this paper we propose an error-resilient iterative solver for sparse linear systems based on stationary component-wise relaxation methods. Starting from a plain implementation of the Jacobi iteration, our approach introduces a low-cost component-wise technique that detects bit-flips, rejects some component updates, and turns the initially synchronized solver into an asynchronous iteration. Our experimental study, using sparse incomplete factorizations from a collection of real-world applications and a practical GPU implementation, exposes the convergence delay incurred by the fault-tolerant implementation and its practical performance. es_ES
dc.description.sponsorship This material is based upon work supported in part by the U.S. Department of Energy (Award Number DE-SC-0010042) and NVIDIA. E. S. Quintana-Orti was supported by project CICYT TIN2014-53495-R of MINECO and FEDER. es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Journal of Computational Science es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Sparse linear systems es_ES
dc.subject Iterative solvers es_ES
dc.subject Jacobi method es_ES
dc.subject Fault tolerance es_ES
dc.subject Bit flips es_ES
dc.subject High performance computing es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Fine-grained bit-flip protection for relaxation methods es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.jocs.2016.11.013 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2014-53495-R/ES/COMPUTACION HETEROGENEA DE BAJO CONSUMO/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/DOE//DE-SC-0010042/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Anzt, H.; Dongarra, J.; Quintana Ortí, ES. (2019). Fine-grained bit-flip protection for relaxation methods. Journal of Computational Science. 36:1-11. https://doi.org/10.1016/j.jocs.2016.11.013 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.jocs.2016.11.013 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 11 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 36 es_ES
dc.relation.pasarela S\395107 es_ES
dc.contributor.funder U.S. Department of Energy es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
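
The abstract describes a Jacobi iteration in which each component update is checked and, if it looks corrupted, rejected. The following is a minimal Python sketch of that general idea, not the authors' implementation. The abstract does not state the detection criterion, so the check used here (reject a candidate that is non-finite or that moves more than a caller-chosen `max_step` from the previous value) is an assumption, as are the names `flip_bit`, `protected_jacobi`, `max_step`, `fault`, and `protect`. A rejected component simply keeps its old value for that sweep, so components may advance at different rates, which is the sense in which the iteration becomes asynchronous.

```python
import math
import struct


def flip_bit(value, bit):
    """Flip one bit (0..63) of a float64. Used only to simulate a fault."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def protected_jacobi(A, b, sweeps, max_step, fault=None, protect=True):
    """Jacobi sweeps on a dense list-of-lists matrix A.

    fault   -- optional (sweep, component, bit) at which one bit-flip is injected
    protect -- if True, suspicious component updates are rejected
    Returns (x, number_of_rejected_updates).
    """
    n = len(b)
    x = [0.0] * n
    rejected = 0
    for k in range(sweeps):
        x_new = list(x)
        for i in range(n):
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            candidate = (b[i] - off_diag) / A[i][i]
            if fault is not None and fault[:2] == (k, i):
                candidate = flip_bit(candidate, fault[2])  # simulated soft error
            # Hypothetical check: non-finite values or implausibly large jumps.
            suspicious = (not math.isfinite(candidate)
                          or abs(candidate - x[i]) > max_step)
            if protect and suspicious:
                rejected += 1  # keep the old value of component i this sweep
                continue
            x_new[i] = candidate
        x = x_new
    return x, rejected


if __name__ == "__main__":
    # Small diagonally dominant system whose exact solution is [1, 2, 3].
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    print(protected_jacobi(A, b, 60, 10.0, fault=(3, 1, 62)))
    print(protected_jacobi(A, b, 60, 10.0, fault=(3, 1, 62), protect=False))
```

The threshold in this sketch is deliberately crude: if `max_step` is smaller than a legitimate update, that update is rejected on every sweep and the component never moves, so a practical criterion would need to adapt to the iteration's progress.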

