Catalán, Sandra; Herrero, José R.; Quintana Ortí, Enrique Salvador; Rodríguez-Sánchez, Rafael; van de Geijn, Robert (Institute of Electrical and Electronics Engineers, 2019-01-31)
[EN] We propose two novel techniques for overcoming the load imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques ...
[EN] We address the parallelization of the LU factorization of hierarchical matrices (H-matrices) arising from boundary element methods. Our approach exploits task-parallelism via the OmpSs programming model and runtime, ...
Catalán, Sandra; Herrero, José R.; Igual Peña, Francisco Daniel; Rodríguez-Sánchez, Rafael; Quintana Ortí, Enrique Salvador; Adeniyi-Jones, Chris (Elsevier, 2018-03)
[EN] Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high-performance implementations of the ...
[EN] We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded ...
[EN] We analyze the benefits of look-ahead in the parallel execution of the LU factorization with partial pivoting (LUpp) in two distinct "asymmetric" multicore scenarios. The first one corresponds to an actual hardware-asymmetric ...
[EN] We investigate how to leverage the heterogeneous resources of an Asymmetric Multicore Processor (AMP) in order to deliver high performance in the reduction to condensed forms for the solution of dense eigenvalue and ...
Feliu-Pérez, Josué; Naithani, Ajeya; Sahuquillo Borrás, Julio; Petit Martí, Salvador Vicente; Qureshi, Moinuddin; Eeckhout, Lieven (Institute of Electrical and Electronics Engineers, 2022-06-01)
[EN] Modern-day graph workloads operate on huge graphs through pointer chasing, which leads to high last-level cache (LLC) miss rates and limited memory-level parallelism (MLP). Simultaneous Multi-Threading (SMT) effectively ...