dc.contributor.author | Grützmacher, Thomas | es_ES |
dc.contributor.author | Anzt, Hartwig | es_ES |
dc.contributor.author | Quintana-Ortí, Enrique S. | es_ES |
dc.date.accessioned | 2023-07-21T18:06:10Z | |
dc.date.available | 2023-07-21T18:06:10Z | |
dc.date.issued | 2023-01 | es_ES |
dc.identifier.issn | 0038-0644 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/195332 | |
dc.description.abstract | [EN] The roofline model not only provides a powerful tool to relate an application's performance to the specific constraints imposed by the target hardware but also offers a graphic representation of the balance between memory access cost and compute throughput. In this work, we present a strategy to break up the tight coupling between the precision format used for arithmetic operations and the storage format employed for memory operations. (At a high level, this idea is equivalent to compressing/decompressing the data in registers before/after invoking store/load memory operations.) In practice, we demonstrate that a "memory accessor" that hides the data compression behind the memory access can virtually push the bandwidth-induced roofline, yielding higher performance for memory-bound applications that use high-precision arithmetic and can handle the numerical effects associated with lossy compression. We also demonstrate that memory-bound applications operating on low-precision data can increase their accuracy by relying on the memory accessor to perform all arithmetic operations in high precision. In particular, we demonstrate that memory-bound BLAS operations (including the sparse matrix-vector product) can be re-engineered with the memory accessor and that the resulting accessor-enabled BLAS routines achieve lower rounding errors while delivering the same performance as the fast low-precision BLAS. | es_ES |
dc.description.sponsorship | Helmholtz-Gemeinschaft, Grant/Award Number: VH-NG-1241; US Exascale Computing Project, Grant/Award Number: 17-SC-20-SC | es_ES |
dc.language | English | es_ES |
dc.publisher | John Wiley & Sons | es_ES |
dc.relation.ispartof | Software: Practice and Experience | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | Accessor | es_ES |
dc.subject | Floating-point formats | es_ES |
dc.subject | High performance | es_ES |
dc.subject | Memory-bound algorithms | es_ES |
dc.subject | Mixed precision | es_ES |
dc.subject | Roofline model | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | Using Ginkgo's memory accessor for improving the accuracy of memory-bound low precision BLAS | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1002/spe.3041 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/DOE//17-SC-20-SC//Exascale Computing Project/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/Helmholtz Association of German Research Centers//VH-NG-1241/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | Grützmacher, T.; Anzt, H.; Quintana-Ortí, ES. (2023). Using Ginkgo's memory accessor for improving the accuracy of memory-bound low precision BLAS. Software: Practice and Experience. 53(1):81-98. https://doi.org/10.1002/spe.3041 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1002/spe.3041 | es_ES |
dc.description.upvformatpinicio | 81 | es_ES |
dc.description.upvformatpfin | 98 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 53 | es_ES |
dc.description.issue | 1 | es_ES |
dc.relation.pasarela | S\479300 | es_ES |
dc.contributor.funder | U.S. Department of Energy | es_ES |
dc.contributor.funder | Helmholtz Association of German Research Centers | es_ES |
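
The idea summarized in the abstract, storing data in a compact format while performing all arithmetic in a wider one, can be illustrated with a minimal sketch. This is not Ginkgo's accessor API (Ginkgo is a C++ library); the class `Float32Accessor` and the functions `to_f32`, `dot_accessor`, `dot_low_precision`, and `exact_dot` are hypothetical names introduced here only for illustration. The sketch assumes float32 as the storage format and float64 as the arithmetic format, and compares a dot product accumulated in float64 against one in which every intermediate result is rounded back to float32.

```python
from array import array
from fractions import Fraction
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


class Float32Accessor:
    """Hypothetical accessor: float32 in memory, float64 in arithmetic."""

    def __init__(self, values):
        # array("f") keeps 4 bytes per element, i.e. the storage format.
        self._data = array("f", values)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        # Load: the stored float32 is widened to a Python float (double).
        return self._data[i]

    def __setitem__(self, i, value):
        # Store: the double is rounded to float32 (lossy compression).
        self._data[i] = value


def dot_accessor(x, y):
    """Dot product with float32 storage and float64 arithmetic."""
    acc = 0.0
    for i in range(len(x)):
        acc += x[i] * y[i]
    return acc


def dot_low_precision(x, y):
    """Dot product emulating float32 arithmetic for every operation."""
    acc = 0.0
    for i in range(len(x)):
        acc = to_f32(acc + to_f32(x[i] * y[i]))
    return acc


def exact_dot(x, y):
    """Exact rational dot product of the stored values, used as reference."""
    return sum(Fraction(x[i]) * Fraction(y[i]) for i in range(len(x)))
```

Both dot products read the same float32 data, so they move the same number of bytes from memory; only the precision of the arithmetic differs. Comparing each result against `exact_dot` for a long vector shows how much rounding error the low precision accumulation adds.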