Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors

Barrachina, Sergio; Dolz, Manuel F.; San Juan, Pablo; Quintana-Ortí, Enrique S.

doi:10.1016/j.jpdc.2022.05.009


Files in this item

Name: BarrachinaDolzSan ...

Size: 1.412Mb

Format: PDF

Description: Publisher's version

dc.contributor.author	Barrachina, Sergio	es_ES
dc.contributor.author	Dolz, Manuel F.	es_ES
dc.contributor.author	San Juan, Pablo	es_ES
dc.contributor.author	Quintana-Ortí, Enrique S.	es_ES
dc.date.accessioned	2023-07-10T18:02:58Z
dc.date.available	2023-07-10T18:02:58Z
dc.date.issued	2022-09	es_ES
dc.identifier.issn	0743-7315	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/194790
dc.description.abstract	[EN] Convolutional Neural Networks (CNNs) play a crucial role in many image recognition and classification tasks, recommender systems, brain-computer interfaces, etc. As a consequence, there is a notable interest in developing high performance realizations of the convolution operators, which concentrate a significant portion of the computational cost of this type of neural network. In a previous work, we introduced a portable, high performance convolution algorithm, based on the BLIS realization of matrix multiplication, which eliminates most of the runtime and memory overheads that impair the performance of the convolution operators appearing in the forward training pass, when performed via an explicit im2col transform. In this paper, we extend our ideas to the full training process of CNNs on multicore processors, proposing new high performance strategies to tackle the convolution operators that are present in the more complex backward pass of the training process, while maintaining the portability of the realizations. In addition, we conduct a full integration of these algorithms into a framework for distributed training of CNNs on clusters of computers, providing a complete experimental evaluation of the actual benefits in terms of both performance and memory consumption. Compared with the baseline implementation, the new convolution operators using pre-allocated memory can accelerate the training by about 6%-25%, provided there is sufficient memory available. In comparison, the operator variants that do not rely on persistent memory can save up to 70% of memory.	es_ES
dc.description.sponsorship	This research was funded by Project PID2020-113656RB-C21/C22 supported by MCIN/AEI/10.13039/501100011033 and Prometeo/2019/109 of the Generalitat Valenciana. Manuel F. Dolz was also supported by the Plan Gen-T grant CDEIGENT/2018/014 of the Generalitat Valenciana.	es_ES
dc.language	English	es_ES
dc.publisher	Elsevier	es_ES
dc.relation.ispartof	Journal of Parallel and Distributed Computing	es_ES
dc.rights	Attribution - NonCommercial - NoDerivatives (by-nc-nd)	es_ES
dc.subject	Convolutional neural networks	es_ES
dc.subject	Distributed training	es_ES
dc.subject	High performance	es_ES
dc.subject	Python	es_ES
dc.subject	Clusters of multicore processors	es_ES
dc.subject.classification	COMPUTER ARCHITECTURE AND TECHNOLOGY	es_ES
dc.title	Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1016/j.jpdc.2022.05.009	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C21/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTE DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UJI/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//CDEIGENT%2F2018%2F014//Plan GenT/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C22/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTES DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UPV/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F109//COMUNICACION Y COMPUTACION INTELIGENTES Y SOCIALES/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.description.bibliographicCitation	Barrachina, S.; Dolz, MF.; San Juan, P.; Quintana-Ortí, ES. (2022). Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors. Journal of Parallel and Distributed Computing. 167:240-254. https://doi.org/10.1016/j.jpdc.2022.05.009	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.jpdc.2022.05.009	es_ES
dc.description.upvformatpinicio	240	es_ES
dc.description.upvformatpfin	254	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	167	es_ES
dc.relation.pasarela	S\466577	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES


RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia
