
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Castelló, Adrián es_ES
dc.contributor.author Barrachina, Sergio es_ES
dc.contributor.author Dolz Zaragozá, Manuel Francisco es_ES
dc.contributor.author Quintana-Ortí, Enrique S. es_ES
dc.contributor.author San Juan-Sebastian, Pablo es_ES
dc.contributor.author Tomás Domínguez, Andrés Enrique es_ES
dc.date.accessioned 2023-10-04T18:01:39Z
dc.date.available 2023-10-04T18:01:39Z
dc.date.issued 2022-04 es_ES
dc.identifier.issn 1383-7621 es_ES
dc.identifier.uri http://hdl.handle.net/10251/197569
dc.description.abstract [EN] We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors involves several high-level transformations of the original framework, such as the development and integration of Cython routines to exploit thread-level parallelism; the design and development of micro-kernels for the matrix multiplication, vectorized with ARM's NEON intrinsics, that can accommodate layer fusion; and the appropriate selection of several cache configuration parameters tailored to the memory hierarchy of the target ARM processors. Our experiments evaluate both inference throughput (measured in processed images/s) and inference latency (i.e., time-to-response), as well as energy consumption per image, when varying the level of thread parallelism and the processor power modes. The experiments with the new inference engine are reported for the ResNet50 v1.5 model on the ImageNet dataset from the MLPerf suite using the ARM v8.2 cores in the NVIDIA Jetson AGX Xavier board. These results show superior performance compared with the widely used TFLite from Google and slightly inferior results compared with ArmNN, the native library from ARM for DNN inference. es_ES
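
The abstract refers to matrix-multiplication micro-kernels vectorized with ARM's NEON intrinsics and embedded in a BLIS-like blocked algorithm. The following is a minimal sketch of what such a micro-kernel can look like, not the paper's actual code: the 4x4 block size, the function name, the packing layout of the A and B micro-panels, and the row-major storage of C are assumptions made for illustration.

```c
#include <arm_neon.h>

/* Hypothetical BLIS-style 4x4 single-precision micro-kernel (AArch64).
 * A is a packed 4 x kc micro-panel (one 4-element column per step p);
 * B is a packed kc x 4 micro-panel (one 4-element row per step p);
 * C is a 4x4 block stored row-major with row stride ldc.             */
void sgemm_ukernel_4x4(int kc, const float *A, const float *B,
                       float *C, int ldc)
{
    /* Keep the 4x4 block of C in NEON registers during the k loop. */
    float32x4_t c0 = vld1q_f32(&C[0 * ldc]);
    float32x4_t c1 = vld1q_f32(&C[1 * ldc]);
    float32x4_t c2 = vld1q_f32(&C[2 * ldc]);
    float32x4_t c3 = vld1q_f32(&C[3 * ldc]);

    for (int p = 0; p < kc; ++p) {
        float32x4_t a = vld1q_f32(&A[4 * p]);   /* column p of A */
        float32x4_t b = vld1q_f32(&B[4 * p]);   /* row p of B    */

        /* Rank-1 update: row i of C accumulates A(i,p) * B(p,:). */
        c0 = vfmaq_laneq_f32(c0, b, a, 0);
        c1 = vfmaq_laneq_f32(c1, b, a, 1);
        c2 = vfmaq_laneq_f32(c2, b, a, 2);
        c3 = vfmaq_laneq_f32(c3, b, a, 3);
    }

    vst1q_f32(&C[0 * ldc], c0);
    vst1q_f32(&C[1 * ldc], c1);
    vst1q_f32(&C[2 * ldc], c2);
    vst1q_f32(&C[3 * ldc], c3);
}
```

In a BLIS-style GEMM, an outer blocked loop nest packs A and B into micro-panels sized to the cache levels and invokes a micro-kernel of this kind for each small block of C; the cache configuration parameters mentioned in the abstract correspond to those packing and blocking sizes.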
dc.description.sponsorship This research was partially sponsored by projects TIN2017-82972-R of Ministerio de Ciencia, Innovación y Universidades, Spain and Prometeo/2019/109 of the Generalitat Valenciana, Spain. Adrián Castelló was supported by the Juan de la Cierva-Formación project FJC2019-039222-I of the Ministerio de Ciencia, Innovación y Universidades, Spain. Manuel F. Dolz was also supported by the Plan GenT project CDEIGENT/2018/014 of the Generalitat Valenciana, Spain. es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Journal of Systems Architecture es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Convolutional neural network es_ES
dc.subject Inference es_ES
dc.subject Multicore low-power processors es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.sysarc.2022.102459 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-82972-R/ES/TECNICAS ALGORITMICAS PARA COMPUTACION DE ALTO RENDIMIENTO CONSCIENTE DEL CONSUMO ENERGETICO Y RESISTENTE A ERRORES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F109//COMUNICACION Y COMPUTACION INTELIGENTES Y SOCIALES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//CDEIGENT%2F2018%2F014//Plan GenT/ es_ES
dc.rights.accessRights Open access es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Castelló, A.; Barrachina, S.; Dolz Zaragozá, MF.; Quintana-Ortí, ES.; San Juan-Sebastian, P.; Tomás Domínguez, AE. (2022). High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS. Journal of Systems Architecture. 125:1-9. https://doi.org/10.1016/j.sysarc.2022.102459 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.sysarc.2022.102459 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 9 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 125 es_ES
dc.relation.pasarela S\466672 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
dc.contributor.funder Ministerio de Ciencia, Innovación y Universidades es_ES

