dc.contributor.author | Barrachina, Sergio | es_ES |
dc.contributor.author | Dolz, Manuel F. | es_ES |
dc.contributor.author | San Juan, Pablo | es_ES |
dc.contributor.author | Quintana-Ortí, Enrique S. | es_ES |
dc.date.accessioned | 2023-07-10T18:02:58Z | |
dc.date.available | 2023-07-10T18:02:58Z | |
dc.date.issued | 2022-09 | es_ES |
dc.identifier.issn | 0743-7315 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/194790 | |
dc.description.abstract | [EN] Convolutional Neural Networks (CNNs) play a crucial role in many image recognition and classification tasks, recommender systems, brain-computer interfaces, etc. As a consequence, there is a notable interest in developing high performance realizations of the convolution operators, which concentrate a significant portion of the computational cost of this type of neural network. In a previous work, we introduced a portable, high performance convolution algorithm, based on the BLIS realization of matrix multiplication, which eliminates most of the runtime and memory overheads that impair the performance of the convolution operators appearing in the forward training pass, when performed via explicit im2col transform. In this paper, we extend our ideas to the full training process of CNNs on multicore processors, proposing new high performance strategies to tackle the convolution operators that are present in the more complex backward pass of the training process, while maintaining the portability of the realizations. In addition, we conduct a full integration of these algorithms into a framework for distributed training of CNNs on clusters of computers, providing a complete experimental evaluation of the actual benefits in terms of both performance and memory consumption. Compared with the baseline implementation, the use of the new convolution operators using pre-allocated memory can accelerate the training by about 6%-25%, provided there is sufficient memory available. In comparison, the operator variants that do not rely on persistent memory can save up to 70% of memory. | es_ES |
dc.description.sponsorship | This research was funded by Project PID2020-113656RB-C21/C22 supported by MCIN/AEI/10.13039/501100011033 and Prometeo/2019/109 of the Generalitat Valenciana. Manuel F. Dolz was also supported by the Plan Gen-T grant CDEIGENT/2018/014 of the Generalitat Valenciana. | es_ES |
dc.language | English | es_ES |
dc.publisher | Elsevier | es_ES |
dc.relation.ispartof | Journal of Parallel and Distributed Computing | es_ES |
dc.rights | Attribution - NonCommercial - NoDerivatives (by-nc-nd) | es_ES |
dc.subject | Convolutional neural networks | es_ES |
dc.subject | Distributed training | es_ES |
dc.subject | High performance | es_ES |
dc.subject | Python | es_ES |
dc.subject | Clusters of multicore processors | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1016/j.jpdc.2022.05.009 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C21/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTE DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UJI/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GVA//CDEIGENT%2F2018%2F014//Plan GenT/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C22/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTES DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UPV/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F109//COMUNICACION Y COMPUTACION INTELIGENTES Y SOCIALES/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | Barrachina, S.; Dolz, MF.; San Juan, P.; Quintana-Ortí, ES. (2022). Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors. Journal of Parallel and Distributed Computing. 167:240-254. https://doi.org/10.1016/j.jpdc.2022.05.009 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1016/j.jpdc.2022.05.009 | es_ES |
dc.description.upvformatpinicio | 240 | es_ES |
dc.description.upvformatpfin | 254 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 167 | es_ES |
dc.relation.pasarela | S\466577 | es_ES |
dc.contributor.funder | Generalitat Valenciana | es_ES |
dc.contributor.funder | Agencia Estatal de Investigación | es_ES |