High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS

Castelló, Adrián; Barrachina, Sergio; Dolz Zaragozá, Manuel Francisco; Quintana-Ortí, Enrique S.; San Juan-Sebastian, Pablo; Tomás Domínguez, Andrés Enrique

doi:10.1016/j.sysarc.2022.102459

Files in this item

Name: CastelloSERGIO ...

Size: 732.2Kb

Format: PDF

Description: Publisher's version

dc.contributor.author	Castelló, Adrián	es_ES
dc.contributor.author	Barrachina, Sergio	es_ES
dc.contributor.author	Dolz Zaragozá, Manuel Francisco	es_ES
dc.contributor.author	Quintana-Ortí, Enrique S.	es_ES
dc.contributor.author	San Juan-Sebastian, Pablo	es_ES
dc.contributor.author	Tomás Domínguez, Andrés Enrique	es_ES
dc.date.accessioned	2023-10-04T18:01:39Z
dc.date.available	2023-10-04T18:01:39Z
dc.date.issued	2022-04	es_ES
dc.identifier.issn	1383-7621	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/197569
dc.description.abstract	[EN] We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors involves several high-level transformations of the original framework: the development and integration of Cython routines to exploit thread-level parallelism; the design and development of micro-kernels for matrix multiplication, vectorized with ARM's NEON intrinsics, that can accommodate layer fusion; and the selection of several cache configuration parameters tailored to the memory hierarchy of the target ARM processors. Our experiments evaluate inference throughput (measured in processed images/s), inference latency (i.e., time-to-response), and energy consumption per image while varying the level of thread parallelism and the processor power modes. Results for the new inference engine are reported for the ResNet50 v1.5 model on the ImageNet dataset from the MLPerf suite, using the ARM v8.2 cores of the NVIDIA Jetson AGX Xavier board. These results show superior performance compared with Google's widely used TFLite and slightly inferior performance compared with ArmNN, ARM's native library for DNN inference.	es_ES
dc.description.sponsorship	This research was partially sponsored by projects TIN2017-82972-R of the Ministerio de Ciencia, Innovación y Universidades, Spain, and Prometeo/2019/109 of the Generalitat Valenciana, Spain. Adrián Castelló was supported by the Juan de la Cierva-Formación project FJC2019-039222-I of the Ministerio de Ciencia, Innovación y Universidades, Spain. Manuel F. Dolz was also supported by the Plan GenT project CDEIGENT/2018/014 of the Generalitat Valenciana, Spain.	es_ES
dc.language	English	es_ES
dc.publisher	Elsevier	es_ES
dc.relation.ispartof	Journal of Systems Architecture	es_ES
dc.rights	Attribution - NonCommercial - NoDerivatives (by-nc-nd)	es_ES
dc.subject	Convolutional neural network	es_ES
dc.subject	Inference	es_ES
dc.subject	Multicore low-power processors	es_ES
dc.subject.classification	COMPUTER ARCHITECTURE AND TECHNOLOGY	es_ES
dc.title	High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1016/j.sysarc.2022.102459	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F109//COMUNICACION Y COMPUTACION INTELIGENTES Y SOCIALES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//CDEIGENT%2F2018%2F014//Plan GenT/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.description.bibliographicCitation	Castelló, A.; Barrachina, S.; Dolz Zaragozá, MF.; Quintana-Ortí, ES.; San Juan-Sebastian, P.; Tomás Domínguez, AE. (2022). High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS. Journal of Systems Architecture. 125:1-9. https://doi.org/10.1016/j.sysarc.2022.102459	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.sysarc.2022.102459	es_ES
dc.description.upvformatpinicio	1	es_ES
dc.description.upvformatpfin	9	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	125	es_ES
dc.relation.pasarela	S\466672	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
dc.contributor.funder	Universitat Politècnica de València	es_ES
dc.contributor.funder	Ministerio de Ciencia, Innovación y Universidades	es_ES

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia