The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Lakshminarasimhan, Kartik; Naithani, Ajeya; Feliu-Pérez, Josué; Eeckhout, Lieven

doi:10.1145/3499424

Files in this item

Name: Lakshminarasimhan ...

Size: 1.519Mb

Format: PDF

Description: Publisher's version


dc.contributor.author	Lakshminarasimhan, Kartik	es_ES
dc.contributor.author	Naithani, Ajeya	es_ES
dc.contributor.author	Feliu-Pérez, Josué	es_ES
dc.contributor.author	Eeckhout, Lieven	es_ES
dc.date.accessioned	2023-05-23T18:01:56Z
dc.date.available	2023-05-23T18:01:56Z
dc.date.issued	2022-06	es_ES
dc.identifier.issn	1544-3566	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/193541
dc.description.abstract	[EN] Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.	es_ES
dc.description.sponsorship	This work is supported by the European Research Council (ERC) Advanced Grant agreement no. 741097, and FWO project G.0144.17N. Josue Feliu is supported by a Juan de la Cierva Formacion Contract (FJC2018-036021-I).	es_ES
dc.language	English	es_ES
dc.publisher	Association for Computing Machinery	es_ES
dc.relation.ispartof	ACM Transactions on Architecture and Code Optimization	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Superscalar microarchitecture	es_ES
dc.subject	Slice-out-of-order	es_ES
dc.subject	Dynamic instruction scheduling	es_ES
dc.title	The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1145/3499424	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/741097/EU	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/ERC//741097//Load Slice Core: A Power and Cost-Efficient Microarchitecture for the Future/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/FWO//G.0144.17N/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MCIU//FJC2018-036021-I//Ayudas Juan de la Cierva - Formación/	es_ES
dc.rights.accessRights	Closed	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Lakshminarasimhan, K.; Naithani, A.; Feliu-Pérez, J.; Eeckhout, L. (2022). The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture. ACM Transactions on Architecture and Code Optimization. 19(2):1-25. https://doi.org/10.1145/3499424	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1145/3499424	es_ES
dc.description.upvformatpinicio	1	es_ES
dc.description.upvformatpfin	25	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	19	es_ES
dc.description.issue	2	es_ES
dc.relation.pasarela	S\488536	es_ES
dc.contributor.funder	European Research Council	es_ES
dc.contributor.funder	Research Foundation Flanders	es_ES
dc.contributor.funder	European Regional Development Fund	es_ES
dc.contributor.funder	Ministerio de Ciencia, Innovación y Universidades	es_ES
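
As an informal illustration of the forward-slice idea described in the abstract above, the following Python sketch marks the instructions in a straight-line sequence that directly or transitively consume a load result. It is a toy software model under stated assumptions, not the FSC hardware mechanism: the `Instr` record, the register names, and the `depends_on_load` helper are hypothetical, and queue steering, L1 D-cache miss handling, and store-address replication are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    op: str                 # e.g. "load", "add", "mul"
    dst: Optional[str]      # destination register, or None
    srcs: Tuple[str, ...]   # source registers


def depends_on_load(program: List[Instr]) -> List[bool]:
    """Return one flag per instruction: True if any source register holds a
    value produced by a load, directly or through earlier consumers."""
    tainted = set()  # registers whose current value derives from a load
    flags = []
    for ins in program:
        dep = any(s in tainted for s in ins.srcs)
        flags.append(dep)
        if ins.dst is not None:
            if dep or ins.op == "load":
                tainted.add(ins.dst)      # result derives from a load
            else:
                tainted.discard(ins.dst)  # overwritten by an independent value
    return flags


program = [
    Instr("load", "r1", ("r0",)),       # load itself has no load-dependent source
    Instr("add", "r2", ("r1", "r3")),   # consumes the load result
    Instr("add", "r4", ("r5", "r6")),   # independent
    Instr("mul", "r7", ("r2", "r4")),   # transitive consumer via r2
    Instr("add", "r1", ("r5", "r6")),   # overwrites r1 with an independent value
    Instr("sub", "r8", ("r1", "r4")),   # independent after the overwrite
]

flags = depends_on_load(program)
print(flags)  # [False, True, False, True, False, False]
```

The pass moves forward from producers to consumers in program order, which is the direction the abstract contrasts with the backward slices of memory operations built by earlier slice-out-of-order cores.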

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia
