Reaño González, Carlos; Silla Jiménez, Federico; Peña Monferrer, Antonio José; Shainer, Gilad; Schultz, Scot; Castello Gimeno, Adrián; Quintana Ortí, Enrique Salvador; Duato Marín, José Francisco(IEEE, 2014-09-22)
[EN] A clear trend has emerged involving the acceleration of scientific applications by using GPUs. However, the capabilities of these devices are still generally underutilized. Remote GPU virtualization techniques can ...
Feliu Pérez, Josué; Petit Martí, Salvador Vicente; Sahuquillo Borrás, Julio; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers (IEEE), 2014-03)
To improve chip multiprocessor (CMP) performance, recent research has focused on scheduling strategies to mitigate main memory bandwidth contention. Nowadays, commercial CMPs implement multilevel cache hierarchies that are ...
Hernández Luz, Carles; Roca Pérez, Antoni; Flich Cardo, José; Silla Jiménez, Federico; Duato Marín, José Francisco(Elsevier, 2011-05)
[EN] Current integration scales make it possible to design chip multiprocessors with a large number of cores interconnected by a NoC. Unfortunately, they also bring process variation, posing a new burden to processor ...
Lorente Garcés, Vicente Jesús; Valero Bresó, Alejandro; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; Canal, Ramón; López Rodríguez, Pedro Juan; Duato Marín, José Francisco(IEEE, ACM, 2013-03-18)
Low-power modes in modern microprocessors rely on low frequencies and low voltages to reduce the energy budget. Nevertheless, manufacturing induced parameter variations can make SRAM cells unreliable producing hard ...
Valero Bresó, Alejandro; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; López Rodríguez, Pedro Juan; Duato Marín, José Francisco(Association for Computing Machinery (ACM), 2012)
Memory latency has become an important performance bottleneck in current microprocessors. This problem worsens as the number of cores sharing the same memory controller increases. To alleviate this problem, a common ...
Andújar-Muñoz, Francisco José; Villar, Juan Antonio; Sanchez Garcia, Jose Luis; Alfaro Cortes, Francisco Jose; Duato Marín, José Francisco; Fröning, Holger(John Wiley & Sons, 2017)
[EN] In the Top500 and Graph500 lists of the last years, some of the most powerful systems implement a torus topology to interconnect the millions of computing nodes they include. Some of these torus networks are of five ...
Escudero, Jesús; García García, Pedro Javier; Quiles Flor, Francisco Jose; Flich Cardo, José; Duato Marín, José Francisco(Wiley-Blackwell, 2011)
The fat-tree is one of the most common topologies among the interconnection networks of the systems currently used for high-performance parallel computing. Among other advantages, fat-trees allow the use of simple but very ...
Rodrigo Mocholí, Samuel; Flich Cardo, José; Roca Pérez, Antoni; Medardoni, Simone; Bertozzi, Davide; Camacho Villanueva, Jesús; Silla Jiménez, Federico; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers (IEEE), 2011-04)
[EN] The high-performance computing domain is being enriched by the inclusion of networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while ...
Reaño González, Carlos; Peña Monferrer, Antonio José; Silla Jiménez, Federico; Duato Marín, José Francisco; Mayo Gual, Rafael; Quintana Ortí, Enrique Salvador(IEEE, 2012)
GPUs are being increasingly embraced by the high performance computing and computational communities as an effective way of considerably reducing execution time by accelerating significant parts of their application codes. ...
Valero Bresó, Alejandro; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; López Rodríguez, Pedro Juan; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers (IEEE), 2015-07)
In recent years, embedded dynamic random-access memory (eDRAM) technology has been implemented in last-level caches due to its low leakage energy consumption and high density. However, the fact that eDRAM presents slower ...
Valero Bresó, Alejandro; Petit Martí, Salvador Vicente; Sahuquillo Borrás, Julio; López Rodríguez, Pedro Juan; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers (IEEE), 2012-09)
SRAM and DRAM have been the predominant technologies used to implement memory cells in computer systems, each one having its advantages and shortcomings. SRAM cells are faster and require no refresh since reads are not ...
Yuste Romero, David(Universitat Politècnica de València, 2011-11-17)
Development of a technique for automatically parallelizing sequential code, based on the concurrent execution of independent function calls. This thesis also covers the implementation of that technique ...
Escudero Sahuquillo, Jesús; Gran, Ernst Gunnar; Garcia Garcia, Pedro-Javier; Flich Cardo, José; Skeie, Tor; Lysne, Olav; Quiles Flor, Francisco Jose; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers (IEEE), 2015-01)
Interconnection networks are key components in high-performance computing (HPC) systems, and their performance strongly influences that of the overall system. However, at high load, congestion and its negative effects ...
Cuesta Sáez, Blas Antonio; Robles Martínez, Antonio; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers (IEEE), 2011-10)
[EN] Token Coherence is a cache coherence protocol that simultaneously captures the best attributes of the traditional approximations to coherence: direct communication between processors (like snooping-based protocols) ...
Cano Reyes, José; Flich Cardo, José; Roca Pérez, Antoni; Duato Marín, José Francisco; Coppola, Marcello; Locatelli, Riccardo(Institute of Electrical and Electronics Engineers (IEEE), 2014-03)
In application-specific SoCs, the irregularity of the topology results in a complex and customized implementation of the routing algorithm, usually relying on routing tables implemented with memory structures at source end ...
Cuesta Sáez, Blas Antonio(Universitat Politècnica de València, 2009-07-17)
Cache coherence protocols based on tokens can provide low latency without relying on non-scalable interconnects thanks to the use of efficient requests that are unordered. However, when these unordered requests contend for ...
Esteve García, Albert; Ros Bardisa, Alberto; Gómez Requena, María Engracia; Robles Martínez, Antonio; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers (IEEE), 2016-03)
Most of the data referenced by sequential and parallel applications running in current chip multiprocessors are referenced by a single thread, i.e., private. Recent proposals leverage this observation to improve many aspects ...
Picornell-Sanjuan, Tomás; Flich Cardo, José; Hernández Luz, Carles; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers, 2021-02-01)
[EN] The ever-growing need for higher performance forces industry to include technology based on multiprocessor systems-on-chip (MPSoCs) in their safety-critical embedded systems. MPSoCs include a network-on-chip (NoC) to ...
Valero Bresó, Alejandro; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; Duato Marín, José Francisco(ACM, 2013-06)
This work introduces a novel refresh mechanism that leverages reuse information to decide which blocks should be refreshed in an energy-aware eDRAM last-level cache. Experimental results show that, compared to a ...
Ros Bardisa, Alberto; Cuesta Sáez, Blas Antonio; Fernández-Pascual, Ricardo; Gómez Requena, María Engracia; Acacio Sánchez, Manuel E.; Robles Martínez, Antonio; García Carrasco, José Manuel; Duato Marín, José Francisco(Institute of Electrical and Electronics Engineers (IEEE), 2012-05)
One cost-effective way to meet the increasing demand for larger high-performance shared-memory servers is to build clusters with off-the-shelf processors connected with low-latency point-to-point interconnections like ...