Serrano Gómez, Mónica; Sahuquillo Borrás, Julio; Hassan Mohamed, Houcine; Petit Martí, Salvador Vicente; Duato Marín, José Francisco (Springer Verlag (Germany), 2011)
Remote Memory Access (RMA) hardware allows a given motherboard in a cluster to directly access the memory installed in a remote motherboard of the same cluster. In recent works, this characteristic has been used to extend ...
Tornero, Rafael; Orduña Huertas, Juan Manuel; Mejia, Andres; Flich Cardo, José; Duato Marín, José Francisco (Springer Verlag (Germany), 2011-06)
Networks on Chip (NoCs) have been shown to be an efficient solution to the complex on-chip communication problems derived from the increasing number of processor cores. One of the key issues in the design of NoCs is the ...
Peña Monferrer, Antonio José; Reaño González, Carlos; Silla Jiménez, Federico; Mayo Gual, Rafael; Quintana-Ortí, Enrique S.; Duato Marín, José Francisco (Elsevier, 2014-12)
In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from ...
Serrano Gómez, Mónica; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; Hassan Mohamed, Houcine; Duato Marín, José Francisco (Springer Verlag (Germany), 2012-03)
Cluster computers represent a cost-effective alternative solution to supercomputers. In these systems, it is common to constrain the memory address space of a given processor to the local motherboard. Constraining the ...
Sahuquillo Borrás, Julio; Hassan Mohamed, Houcine; Petit Martí, Salvador Vicente; March Cabrelles, José Luis; Duato Marín, José Francisco (Elsevier, 2016-03)
Nowadays, real-time embedded applications have to cope with an increasing demand for functionality, which requires increasing processing capabilities. With this aim, real-time systems are being implemented on top of ...
March Cabrelles, José Luis; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; Hassan Mohamed, Houcine; Duato Marín, José Francisco (Springer Verlag (Germany), 2011)
Nowadays, a key design issue in embedded systems is how to reduce power consumption, since batteries have a limited energy budget. For this purpose, several techniques such as Dynamic Voltage Scaling (DVS) or task ...
Bermúdez Garzón, Diego Fernando; Gómez Requena, Crispín; Gómez Requena, María Engracia; López Rodríguez, Pedro Juan; Duato Marín, José Francisco (Institute of Electrical and Electronics Engineers (IEEE), 2016-04)
On the one hand, performance and fault-tolerance of interconnection networks are key design issues for high performance computing (HPC) systems. On the other hand, cost should also be considered. Indirect topologies are ...
Gómez Requena, Crispín; Gilabert Villamón, Francisco; Gómez Requena, María Engracia; López Rodríguez, Pedro Juan; Duato Marín, José Francisco (Springer Verlag (Germany), 2015-07)
Large cluster-based machines require efficient high-performance interconnection networks. Routing is a key design issue of interconnection networks. Adaptive routing usually outperforms deterministic routing at the expense ...
Roca Pérez, Antoni; Flich Cardo, José; Silla Jiménez, Federico; Duato Marín, José Francisco (Elsevier, 2011-11)
As technology advances, the number of cores in Chip MultiProcessor systems and MultiProcessor Systems-on-Chips keeps increasing. The network must provide sustained throughput and ultra-low latencies. In this paper we ...
Montaner Mas, Héctor; Silla Jiménez, Federico; Fröning, Holger; Duato Marín, José Francisco (Springer Verlag (Germany), 2012-06)
Improvements in parallel computing hardware usually involve increments in the number of available resources for a given application such as the number of computing cores and the amount of memory. In the case of shared-memory ...
March Cabrelles, José Luis; Sahuquillo Borrás, Julio; Hassan Mohamed, Houcine; Petit Martí, Salvador Vicente; Duato Marín, José Francisco (Oxford University Press (OUP): Policy A - Oxford Open Option A, 2011)
Power consumption is a major design concern in current embedded systems. To deal with consumption, many systems apply dynamic voltage scaling (DVS) techniques which dynamically change the system speed depending on the ...
Valero Bresó, Alejandro; Petit Martí, Salvador Vicente; Sahuquillo Borrás, Julio; Kaeli, David R.; Duato Marín, José Francisco (Elsevier, 2015-02)
DRAM technology requires refresh operations to be performed in order to avoid data loss due to capacitance leakage. Refresh operations consume a significant amount of dynamic energy, which increases with the storage ...
Flich Cardo, José; Skeie, Tor; Mejia, Andres; Lysne, Olav; López Rodríguez, Pedro Juan; Robles Martínez, Antonio; Duato Marín, José Francisco; Koibuchi, Michihiro; Rokicki, Tomas; Sancho, Jose Carlos (Institute of Electrical and Electronics Engineers (IEEE), 2012)
Most standard cluster interconnect technologies are flexible with respect to network topology. This has spawned a substantial amount of research on topology-agnostic routing algorithms, which make no assumption about the ...
Castelló, Adrián; Quintana-Ortí, Enrique S.; Duato Marín, José Francisco (Springer-Verlag, 2021-12)
TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool to train deep neural networks on clusters of computers. HVD in turn utilizes a blocking Allreduce ...
Candel-Margaix, Francisco; Petit Martí, Salvador Vicente; Sahuquillo Borrás, Julio; Duato Marín, José Francisco (Elsevier, 2018-05)
Research on GPU architecture is becoming pervasive in both academia and industry because these architectures offer much more performance per watt than typical CPU architectures. This is the main reason why ...
Feliu-Pérez, Josué; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; Duato Marín, José Francisco (ACM, 2014-06)
To mitigate the impact of bandwidth contention, which in some processes can lead to performance degradations of up to 40%, we devise a scheduling algorithm that tackles main memory and L1 bandwidth contention. Experimental ...
Feliu Pérez, Josué; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; Duato Marín, José Francisco (IEEE, 2015-05-25)
Current SMT (simultaneous multithreading) processors co-schedule jobs on the same core, thus sharing core resources like L1 caches. In SMT multicores, threads also compete among themselves for uncore resources like the LLC ...
Escudero-Sahuquillo, Jesús; Garcia Garcia, Pedro-Javier; Quiles Flor, Francisco Jose; Flich Cardo, José; Duato Marín, José Francisco (Institute of Electrical and Electronics Engineers (IEEE), 2013-10)
As parallel computing systems increase in size, the interconnection network is becoming a critical subsystem. The current trend in network design is to use as few components as possible to interconnect the end nodes, thereby ...
For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance ...