[EN] We address the efficient realization of matrix multiplication (gemm), with application in the convolution operator for machine learning, for the RISC-V core present in the GreenWaves GAP8 processor. Our approach ...
Catalán, Sandra; Herrero, José R.; Quintana Ortí, Enrique Salvador; Rodríguez-Sánchez, Rafael; van de Geijn, Robert(Institute of Electrical and Electronics Engineers, 2019-01-31)
[EN] We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques ...
[EN] The advances in genomic sequencing during the past few years have motivated the development of fast and reliable software for DNA/RNA sequencing on current high performance architectures. Most of these efforts target ...
Castelló, Adrián; Quintana-Ortí, Enrique S.; Duato Marín, José Francisco(Springer-Verlag, 2021-12)
[EN] TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool to train deep neural networks on clusters of computers. HVD in turn utilizes a blocking Allreduce ...
Belloch Rodríguez, José Antonio; Alventosa, Fran J.; Alonso-Jordá, Pedro; Quintana Ortí, Enrique Salvador; Vidal Maciá, Antonio Manuel(Springer Verlag (Germany), 2017-01)
Tablets and smart phones are nowadays equipped with low-power processor architectures such as the ARMv7 and the ARMv8 series. These processors integrate powerful SIMD units to exploit the intrinsic data-parallelism of ...
Badía Contelles, José Manuel; Belloch Rodríguez, José Antonio; Cobos Serrano, Máximo; Igual Peña, Francisco Daniel; Quintana-Ortí, Enrique S.(Springer-Verlag, 2019-03)
[EN] The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in ...
[EN] ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov subspace-based methods. Its relevance for the solution of real problems has motivated several efforts to enhance its performance ...
[EN] This paper presents a hybrid methodology for accelerating Computational Fluid Dynamics (CFD) simulations intertwining inferences from deep neural networks (DNN). The strategy leverages the local spatial data of the ...
[EN] We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration ...
[EN] The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic ...
[EN] We propose an adaptive scheme to reduce communication overhead caused by data movement by selectively storing the diagonal blocks of a block-Jacobi preconditioner in different precision formats (half, single, or ...
Claver Iborra, José Manuel(Universitat Politècnica de València, 2009-05-21)
Model reduction for large-scale control problems is currently one of the fundamental topics in systems and control theory. Among the various existing techniques, state truncation methods are ...
[EN] In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several ...
Castelló, Adrián; Mayo Gual, Rafael; Seo, Sangmin; Balaji, Pavan; Quintana Ortí, Enrique Salvador; Peña, Antonio J.(Institute of Electrical and Electronics Engineers, 2020-09-01)
[EN] With the appearance of multi-/many-core machines, applications and runtime systems have evolved in order to exploit the new on-node concurrency brought by new software paradigms. POSIX threads (Pthreads) was widely adopted ...
[EN] For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, in general the global performance ...
Diouri, Mohammed El Mehdi; Dolz Zaragozá, Manuel Francisco; Glück, Olivier; Lefèvre, Laurent; Alonso-Jordá, Pedro; Catalán, Sandra; Mayo, Rafael; Quintana Ortí, Enrique Salvador(Elsevier, 2014-06)
Large-scale distributed systems (e.g., datacenters, HPC systems, clouds, large-scale networks) consume, and will continue to consume, enormous amounts of energy. Therefore, accurately monitoring the power dissipation and energy ...
Meliá Sevilla, Javier(Universitat Politècnica de València, 2024-02-14)
[EN] The MVS vision/ejection software from MultiScan Technologies controls the operation of the machines that the company produces. This software has been developed and improved over the years by ...
Reaño González, Carlos; Silla Jiménez, Federico; Peña Monferrer, Antonio José; Shainer, Gilad; Schultz, Scot; Castello Gimeno, Adrián; Quintana Ortí, Enrique Salvador; Duato Marín, José Francisco(IEEE, 2014-09-22)
[EN] A clear trend has emerged involving the acceleration of scientific applications by using GPUs. However, the capabilities of these devices are still generally underutilized. Remote GPU virtualization techniques can ...
[EN] We contribute to the optimization of the sparse matrix-vector product by introducing a variant of the coordinate sparse matrix format that balances the workload distribution and compresses both the indexing arrays and ...