[EN] We address the efficient realization of matrix multiplication (gemm), with application in the convolution operator for machine learning, for the RISC-V core present in the GreenWaves GAP8 processor. Our approach ...
Del Campo Calvo, Francisco Javier (Universitat Politècnica de València, 2023-09-25)
[ES] The adoption of neural networks in practically every scientific field is driving their use in a wide variety of devices. These devices can be very diverse in nature: from large ...
San Juan-Sebastian, Pablo; Rodríguez-Sánchez, Rafael; Igual, Francisco D.; Alonso-Jordá, Pedro; Quintana-Ortí, Enrique S. (Springer-Verlag, 2021-10)
[EN] We introduce a high performance, multi-threaded realization of the gemm kernel for the ARMv8.2 architecture that operates with 16-bit (half precision) ...
[EN] We provide a practical demonstration that it is possible to systematically generate a variety of high-performance micro-kernels for the general matrix multiplication (gemm) via generic templates which can be easily ...
[EN] We present accurate piece-wise models for the time and energy costs of high performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) ...