The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Lakshminarasimhan, Kartik es_ES
dc.contributor.author Naithani, Ajeya es_ES
dc.contributor.author Feliu-Pérez, Josué es_ES
dc.contributor.author Eeckhout, Lieven es_ES
dc.date.accessioned 2023-05-23T18:01:56Z
dc.date.available 2023-05-23T18:01:56Z
dc.date.issued 2022-06 es_ES
dc.identifier.issn 1544-3566 es_ES
dc.identifier.uri http://hdl.handle.net/10251/193541
dc.description.abstract [EN] Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity. es_ES
dc.description.sponsorship This work is supported by the European Research Council (ERC) Advanced Grant agreement no. 741097, and FWO project G.0144.17N. Josué Feliu is supported by a Juan de la Cierva Formación Contract (FJC2018-036021-I). es_ES
dc.language English es_ES
dc.publisher Association for Computing Machinery es_ES
dc.relation.ispartof ACM Transactions on Architecture and Code Optimization es_ES
dc.rights All rights reserved es_ES
dc.subject Superscalar microarchitecture es_ES
dc.subject Slice-out-of-order es_ES
dc.subject Dynamic instruction scheduling es_ES
dc.title The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture es_ES
dc.type Article es_ES
dc.identifier.doi 10.1145/3499424 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/741097/EU es_ES
dc.relation.projectID info:eu-repo/grantAgreement/ERC//741097//Load Slice Core: A Power and Cost-Efficient Microarchitecture for the Future/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/FWO//G.0144.17N/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MCIU//FJC2018-036021-I//Ayudas Juan de la Cierva - Formación/ es_ES
dc.rights.accessRights Closed es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Lakshminarasimhan, K.; Naithani, A.; Feliu-Pérez, J.; Eeckhout, L. (2022). The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture. ACM Transactions on Architecture and Code Optimization. 19(2):1-25. https://doi.org/10.1145/3499424 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1145/3499424 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 25 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 19 es_ES
dc.description.issue 2 es_ES
dc.relation.pasarela S\488536 es_ES
dc.contributor.funder European Research Council es_ES
dc.contributor.funder Research Foundation Flanders es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Ministerio de Ciencia, Innovación y Universidades es_ES
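The abstract describes FSC as steering forward slices (the direct and transitive consumers of a load) into dedicated in-order FIFO queues, rather than identifying backward slices as sOoO cores do. The Python sketch below is a toy software model of that idea only, not the authors' hardware design. The `Insn` record, the `steer_forward_slices` function, the round-robin assignment of loads to slice queues, and the rule of following the first slice-dependent source are all illustrative assumptions. It does not model timing, the side handling of consumers of L1 D-cache misses, or store-address replication.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Insn:
    """Hypothetical instruction record (illustrative, not from the paper)."""
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()
    is_load: bool = False


def steer_forward_slices(trace: List[Insn], num_slice_queues: int = 2) -> List[List[str]]:
    """Toy model: steer each load's forward slice into its own in-order FIFO queue.

    Queue 0 is the main queue: it receives instructions (including loads) whose
    sources do not come from any load's forward slice. Queues 1..num_slice_queues
    receive direct and transitive consumers of a load; each new load is given a
    slice queue round-robin (an assumption made for this sketch).
    """
    queues = [deque() for _ in range(num_slice_queues + 1)]
    reg_queue = {}  # register name -> slice queue of the instruction producing it
    last_slice = 0
    for insn in trace:
        # Follow the first source register that belongs to a forward slice;
        # how hardware would handle sources from two different slices is not modeled.
        target = next((reg_queue[s] for s in insn.srcs if s in reg_queue), 0)
        queues[target].append(insn.name)
        if insn.dest is None:
            continue
        if insn.is_load:
            # The load's result starts a new forward slice in the next slice queue.
            last_slice = last_slice % num_slice_queues + 1
            reg_queue[insn.dest] = last_slice
        elif target != 0:
            # A consumer of a slice value extends that slice (transitive dependence).
            reg_queue[insn.dest] = target
        else:
            # A value computed in the main queue no longer depends on any load.
            reg_queue.pop(insn.dest, None)
    return [list(q) for q in queues]


trace = [
    Insn("ld r1", "r1", ("r10",), is_load=True),
    Insn("add r2", "r2", ("r1", "r3")),  # direct consumer of the first load
    Insn("mul r4", "r4", ("r5", "r6")),  # independent of every load
    Insn("ld r7", "r7", ("r11",), is_load=True),
    Insn("sub r8", "r8", ("r7",)),       # direct consumer of the second load
    Insn("xor r9", "r9", ("r2",)),       # transitive consumer of the first load
]

for i, q in enumerate(steer_forward_slices(trace)):
    print(f"queue {i}: {q}")
# queue 0: ['ld r1', 'mul r4', 'ld r7']
# queue 1: ['add r2', 'xor r9']
# queue 2: ['sub r8']
```

In this toy model the load-independent `mul r4` stays in the main queue next to the two loads, while each load's consumers, including the transitive consumer `xor r9`, follow that load into a separate queue, so a stall-on-use in one slice need not hold back independent instructions behind it.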

