Ubal Tena, R.; Sahuquillo Borrás, J.; Petit Martí, SV.; López Rodríguez, PJ.; Duato Marín, JF. (2013). Hardware-based generation of independent subtraces of instructions in clustered processors. IEEE Transactions on Computers. 62(5):944-955. https://doi.org/10.1109/TC.2012.42
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/38251
Title:
|
Hardware-based generation of independent subtraces of instructions in clustered processors
|
Authors:
|
Ubal Tena, Rafael
Sahuquillo Borrás, Julio
Petit Martí, Salvador Vicente
López Rodríguez, Pedro Juan
Duato Marín, José Francisco
|
UPV unit:
|
Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors
|
Release date:
|
|
Abstract:
|
Multicore chips currently dominate the microprocessor market as designs that improve performance while keeping power consumption sustainable. However, complex core features must still be considered to provide good performance for existing sequential applications. An effective approach to reducing core complexity without dramatically sacrificing performance is to distribute critical processor structures by using clustered microarchitectures. In these designs, communication latency among clusters is a critical performance bottleneck, and a good steering algorithm is required to reduce intercluster communication. In this paper, we propose a new energy-efficient microarchitectural approach that reduces intercluster communication by detecting and generating independent chains of instructions, referred to as subtraces, from the execution of sequential programs. The devised mechanism has been modeled on an x86-based trace-cache processor, where subtraces are built in the fill unit, stored in a trace cache, and individually steered to different clusters. Experimental results show that the proposal reaches performance speedups of around 7 and 15 percent for point-to-point and bus-based interconnects, respectively, while achieving energy savings of up to 12 percent.
|
Keywords:
|
Clustered Processors, Trace Caches, Hardware extraction of parallelism
|
Usage rights:
|
All rights reserved
|
Source:
|
IEEE Transactions on Computers (ISSN: 0018-9340)
|
DOI:
|
10.1109/TC.2012.42
|
Publisher:
|
Institute of Electrical and Electronics Engineers (IEEE)
|
Publisher's version:
|
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6148217
|
Project code:
|
info:eu-repo/grantAgreement/MICINN//TIN2009-14475-C04-01/ES/Arquitecturas De Servidores, Aplicaciones Y Servicios/
info:eu-repo/grantAgreement/EC/FP7/287759/EU/High Performance and Embedded Architecture and Compilation/
info:eu-repo/grantAgreement/MEC//CSD2006-00046/ES/High-performance, reliable architectures for data centers and Internet servers/
|
Description:
|
© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be
obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new
collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted
component of this work in other works.
|
Acknowledgments:
|
This work was supported by the Spanish MICINN, Consolider Programme, and Plan E funds, as well as European Commission FEDER funds, under Grants CSD2006-00046 and TIN2009-14475-C04-01.
|
Type:
|
Article
|