
Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Candel-Margaix, Francisco es_ES
dc.contributor.author Valero Bresó, Alejandro es_ES
dc.contributor.author Petit Martí, Salvador Vicente es_ES
dc.contributor.author Sahuquillo Borrás, Julio es_ES
dc.date.accessioned 2020-11-14T04:31:46Z
dc.date.available 2020-11-14T04:31:46Z
dc.date.issued 2019-10-01 es_ES
dc.identifier.issn 0018-9340 es_ES
dc.identifier.uri http://hdl.handle.net/10251/155061
dc.description "© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works." es_ES
dc.description.abstract [EN] To support the massive amount of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming more and more complex, and the Last Level Cache (LLC) size considerably increases with each GPU generation. This paper shows that, counter-intuitively, enlarging the LLC brings marginal performance gains in most applications. In other words, increasing the LLC size scales neither in performance nor in energy consumption. We examine how LLC misses are managed in typical GPUs, and we find that in most cases the way LLC misses are managed is precisely the main performance limiter. This paper proposes a novel approach that addresses this shortcoming by leveraging a tiny additional Fetch and Replacement Cache-like structure (FRC) that stores control and coherence information of the incoming blocks until they are fetched from main memory. Then, the fetched blocks are swapped with the victim blocks (i.e., selected to be replaced) in the LLC, and the eviction of such victim blocks is performed from the FRC. This approach improves performance for three main reasons: i) the lifetime of blocks being replaced is enlarged, ii) the main memory path is unclogged on long bursts of LLC misses, and iii) the average LLC miss latency is reduced. The proposal improves the LLC hit ratio and memory-level parallelism, and reduces the miss latency compared to much larger conventional caches. Moreover, this is achieved with reduced energy consumption and with much lower area requirements. Experimental results show that the proposed FRC cache scales in performance with the number of GPU compute units and the LLC size, since, depending on the FRC size, performance improves ranging from 30 to 67 percent for a modern baseline GPU card, and from 32 to 118 percent for a larger GPU. In addition, energy consumption is reduced on average from 49 to 57 percent for the larger GPU. These benefits come with a small area increase (by 7.3 percent) over the LLC baseline. es_ES
dc.description.sponsorship This work has been supported by the Spanish Ministerio de Ciencia, Innovacion y Universidades and the European ERDF under Grants T-PARCCA (RTI2018-098156-B-C51), and TIN2016-76635-C2-1-R (AEI/ERDF, EU), by the Universitat Politecnica de Valencia under Grant SP20190169, and by the gaZ: T58_17R research group (Aragon Gov. and European ESF). es_ES
dc.language English es_ES
dc.publisher Institute of Electrical and Electronics Engineers es_ES
dc.relation.ispartof IEEE Transactions on Computers es_ES
dc.rights All rights reserved es_ES
dc.subject GPU es_ES
dc.subject Memory hierarchy es_ES
dc.subject Miss management es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance es_ES
dc.type Article es_ES
dc.identifier.doi 10.1109/TC.2019.2907591 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/Gobierno de Aragón//T58_17R/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-098156-B-C51/ES/TECNOLOGIAS INNOVADORAS DE PROCESADORES, ACELERADORES Y REDES, PARA CENTROS DE DATOS Y COMPUTACION DE ALTAS PRESTACIONES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2016-76635-C2-1-R/ES/ARQUITECTURA Y PROGRAMACION DE COMPUTADORES ESCALABLES DE ALTO RENDIMIENTO Y BAJO CONSUMO/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/UPV//SP20190169/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Candel-Margaix, F.; Valero Bresó, A.; Petit Martí, SV.; Sahuquillo Borrás, J. (2019). Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance. IEEE Transactions on Computers. 68(10):1442-1454. https://doi.org/10.1109/TC.2019.2907591 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1109/TC.2019.2907591 es_ES
dc.description.upvformatpinicio 1442 es_ES
dc.description.upvformatpfin 1454 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 68 es_ES
dc.description.issue 10 es_ES
dc.relation.pasarela S\384158 es_ES
dc.contributor.funder Gobierno de Aragón es_ES
dc.contributor.funder European Social Fund es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
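
The abstract describes the FRC idea only at a high level: an incoming block is tracked in a small side structure while it is being fetched, and the victim block stays in the LLC until the fetched block is ready to be swapped in. The following is a minimal toy sketch of that contrast, not the authors' design or simulator. Everything in it is an assumption made for illustration: the function name `count_hits`, a fully associative LRU cache, one access per tick, a fixed memory latency, and the choice to skip allocation when the side structure is full. Coherence information and energy are not modeled.

```python
from collections import OrderedDict


def count_hits(trace, llc_lines, mem_latency, frc_entries=0):
    """Count hits in a toy fully associative LRU cache (hypothetical model).

    frc_entries == 0: conventional policy, the victim is evicted when the
    miss is issued, so its line sits reserved during the whole fetch.
    frc_entries > 0: deferred policy, pending blocks are tracked in a side
    structure and the victim is evicted only when the fill arrives.
    """
    llc = OrderedDict()   # resident blocks, ordered from LRU to MRU
    in_flight = {}        # pending block -> tick at which its fill arrives
    hits = 0
    for tick, block in enumerate(trace):
        # Install every fill that has arrived by this tick.
        for b in [b for b, t in in_flight.items() if t <= tick]:
            del in_flight[b]
            if frc_entries and len(llc) >= llc_lines:
                llc.popitem(last=False)      # deferred: victim leaves only now
            llc[b] = None
        if block in llc:
            llc.move_to_end(block)           # refresh LRU position
            hits += 1
        elif block not in in_flight:
            if frc_entries:
                if len(in_flight) >= frc_entries:
                    continue                 # side structure full: no allocation
            else:
                if len(in_flight) >= llc_lines:
                    continue                 # every line is already reserved
                if len(llc) + len(in_flight) >= llc_lines:
                    llc.popitem(last=False)  # conventional: evict at miss time
            in_flight[block] = tick + mem_latency
    return hits


trace = ["A", "B", "A", "B", "C", "A", "B", "C"]
print(count_hits(trace, llc_lines=2, mem_latency=2))                 # conventional
print(count_hits(trace, llc_lines=2, mem_latency=2, frc_entries=2))  # deferred
```

On this short trace the deferred variant records more hits, because block A is still resident when it is reused while C is being fetched. That mirrors reason i) in the abstract (a longer lifetime for blocks being replaced), but the numbers themselves say nothing about the results reported in the paper.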

