San Juan-Sebastian, P.; Rodríguez-Sánchez, R.; Igual, FD.; Alonso-Jordá, P.; Quintana-Ortí, ES. (2021). Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors. The Journal of Supercomputing. 77(10):11257-11269. https://doi.org/10.1007/s11227-021-03636-4
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/189610
Title:
|
Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors
|
Author:
|
San Juan-Sebastian, Pablo
Rodríguez-Sánchez, Rafael
Igual, Francisco D.
Alonso-Jordá, Pedro
Quintana-Ortí, Enrique S.
|
UPV Unit:
|
Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica
|
Issued date:
|
|
Abstract:
|
[EN] We introduce a high-performance, multi-threaded realization of the gemm kernel for the ARMv8.2 architecture that operates with 16-bit (half-precision) floating-point operands. Our code is especially designed for efficient machine learning inference (and, to a certain extent, also training) with deep neural networks. The results on the NVIDIA Carmel multicore processor, which implements the ARMv8.2 architecture, show considerable performance gains for the gemm kernel, close to the theoretical peak acceleration that could be expected when moving from 32-bit arithmetic/data to 16-bit. When the kernel is combined with the type of convolution operator arising in convolutional neural networks, the speed-ups are more modest, though still relevant.
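For illustration only, the C sketch below shows one common way a convolution is lowered to a gemm call: the im2col transform unfolds each input window into a column, so the convolution becomes a single matrix product of the filter matrix with the unfolded input. The record does not state which lowering the paper uses, so treat im2col as an assumption. The names `elem_t`, `gemm_ref`, `im2col` and `conv_gemm` are hypothetical, and the simple loop nest stands in for, and does not reproduce, the paper's optimized, multi-threaded FP16 kernel. The sketch uses `float` so it builds on any host; on an ARMv8.2-A target with the FP16 arithmetic extension, `elem_t` could be switched to a half-precision type such as `_Float16` where the compiler supports it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type: float so the sketch builds anywhere; an
   ARMv8.2-A FP16 build could use a half-precision type instead. */
typedef float elem_t;

/* Reference gemm: C[m x n] += A[m x k] * B[k x n], all row-major.
   A tuned kernel would block for the caches and use a SIMD micro-kernel. */
static void gemm_ref(int m, int n, int k,
                     const elem_t *A, const elem_t *B, elem_t *C) {
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p) {
            elem_t a = A[i * k + p];
            for (int j = 0; j < n; ++j)
                C[i * n + j] += a * B[p * n + j];
        }
}

/* im2col for one channels-first image of size c x h x w and a kh x kw
   filter, stride 1, no padding. The output has (c*kh*kw) rows and
   (oh*ow) columns; column y*ow + x holds the window anchored at (y, x). */
static void im2col(int c, int h, int w, int kh, int kw,
                   const elem_t *in, elem_t *cols) {
    int oh = h - kh + 1, ow = w - kw + 1;
    for (int ch = 0; ch < c; ++ch)
        for (int r = 0; r < kh; ++r)
            for (int s = 0; s < kw; ++s) {
                int row = (ch * kh + r) * kw + s;
                for (int y = 0; y < oh; ++y)
                    for (int x = 0; x < ow; ++x)
                        cols[row * (oh * ow) + y * ow + x] =
                            in[(ch * h + y + r) * w + x + s];
            }
}

/* Convolution as a single gemm:
   out[kout x (oh*ow)] = weights[kout x (c*kh*kw)] * cols[(c*kh*kw) x (oh*ow)].
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int conv_gemm(int kout, int c, int h, int w, int kh, int kw,
                     const elem_t *in, const elem_t *weights, elem_t *out) {
    assert(kh <= h && kw <= w);
    int oh = h - kh + 1, ow = w - kw + 1;
    int k = c * kh * kw, n = oh * ow;
    elem_t *cols = malloc(sizeof(elem_t) * (size_t)k * (size_t)n);
    if (cols == NULL)
        return -1;
    im2col(c, h, w, kh, kw, in, cols);
    for (int i = 0; i < kout * n; ++i)
        out[i] = 0;
    gemm_ref(kout, n, k, weights, cols, out);
    free(cols);
    return 0;
}
```

For example, a single-channel 3x3 input holding 1 to 9, convolved with one 2x2 filter of ones, produces the 2x2 output 12, 16, 24, 28 (the sums of the four windows). The unfolded matrix repeats each input element up to kh*kw times, which is the memory cost this lowering trades for being able to reuse a fast gemm.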
|
Subjects:
|
Deep learning, Matrix multiplication, High performance, NVIDIA Carmel system-on-chip (SoC)
|
Copyrights:
|
All rights reserved
|
Source:
|
The Journal of Supercomputing (ISSN: 0920-8542)
|
DOI:
|
10.1007/s11227-021-03636-4
|
Publisher:
|
Springer-Verlag
|
Publisher version:
|
https://doi.org/10.1007/s11227-021-03636-4
|
Project ID:
|
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-093684-B-I00/ES/HETEROGENEIDAD Y ESPECIALIZACION EN LA ERA POST-MOORE/
info:eu-repo/grantAgreement/CAM//S2018%2FTCS-4423 /
info:eu-repo/grantAgreement/CAM//PR65%2F19-22445/
info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F109//COMUNICACION Y COMPUTACION INTELIGENTES Y SOCIALES/
|
Thanks:
|
This work was supported by projects TIN2017-82972-R and RTI2018-093684-B-I00 from the Ministerio de Ciencia, Innovación y Universidades, project S2018/TCS-4423 of the Comunidad de Madrid, project PR65/19-22445 of the UCM, and project Prometeo/2019/109 of the Generalitat Valenciana.
|
Type:
|
Article
|