Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks

Castelló, Adrián; Catalán, Mar; Dolz, Manuel F.; Quintana-Ortí, Enrique S.; Duato, José

doi:10.1007/s00607-021-01029-2

Files in this item

Name: CastelloCatalanDolz ...
Size: 693.4 KB
Format: PDF
Description: Author's version.

Name: s00607-021-01029-2.pdf
Size: 1.052 MB
Format: PDF
Description: Publisher's version.

dc.contributor.author	Castelló, Adrián	es_ES
dc.contributor.author	Catalán, Mar	es_ES
dc.contributor.author	Dolz, Manuel F.	es_ES
dc.contributor.author	Quintana-Ortí, Enrique S.	es_ES
dc.contributor.author	Duato, José	es_ES
dc.date.accessioned	2024-10-11T18:03:40Z
dc.date.available	2024-10-11T18:03:40Z
dc.date.issued	2023-05	es_ES
dc.identifier.issn	0010-485X	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/209937
dc.description.abstract	[EN] For many distributed applications, data communication poses an important bottleneck from the points of view of performance and energy consumption. As more cores are integrated per node, the global performance of the system generally increases, yet it eventually becomes limited by the interconnection network. This is the case for distributed data-parallel training of convolutional neural networks (CNNs), which usually proceeds on a cluster with a small to moderate number of nodes. In this paper, we analyze the performance of the Allreduce collective communication primitive, which is key to efficient data-parallel distributed training of CNNs. Our study targets the distinct realizations of this primitive in three high-performance implementations of the Message Passing Interface (MPI), namely MPICH, OpenMPI, and IntelMPI, and employs a cluster equipped with state-of-the-art processor and network technologies. In addition, we apply the insights gained from the experimental analysis to the optimization of the TensorFlow framework when running on top of Horovod. Our study reveals that a careful selection of the most convenient MPI library and Allreduce (ARD) realization accelerates the training throughput by a factor of 1.2x compared with the default algorithm in the same MPI library, and by up to 2.8x when comparing distinct MPI libraries, for a number of relevant CNN model and dataset combinations.	es_ES
dc.description.sponsorship	Project TIN2017-82972-R of the Spanish Ministerio de Ciencia, Innovación y Universidades. Agencia Valenciana de la Innovación. This research was partially sponsored by projects TIN2017-82972-R of Ministerio de Ciencia, Innovación y Universidades and PROMETEO/2019/109 of the Generalitat Valenciana. Adrián Castelló was supported by the Juan de la Cierva-Formación project FJC2019-039222-I of the Ministerio de Ciencia, Innovación y Universidades. Manuel F. Dolz was also supported by the Plan GenT project CDEIGENT/2018/014 of the Generalitat Valenciana.	es_ES
dc.language	English	es_ES
dc.publisher	Springer-Verlag	es_ES
dc.relation.ispartof	Computing	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Message passing interface (MPI)	es_ES
dc.subject	Collective communication primitives	es_ES
dc.subject	Allreduce	es_ES
dc.subject	Deep learning	es_ES
dc.subject	Distributed training	es_ES
dc.subject.classification	COMPUTER ARCHITECTURE AND TECHNOLOGY	es_ES
dc.title	Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1007/s00607-021-01029-2	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F109//COMUNICACION Y COMPUTACION INTELIGENTES Y SOCIALES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//CDEIGENT%2F2018%2F014//Plan GenT/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MCIU//TIN2017-82972-R/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MCIU//FJC2019-039222-I/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Castelló, A.; Catalán, M.; Dolz, MF.; Quintana-Ortí, ES.; Duato, J. (2023). Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks. Computing. 105(5):1101-1119. https://doi.org/10.1007/s00607-021-01029-2	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1007/s00607-021-01029-2	es_ES
dc.description.upvformatpinicio	1101	es_ES
dc.description.upvformatpfin	1119	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	105	es_ES
dc.description.issue	5	es_ES
dc.relation.pasarela	S\495738	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Ministerio de Ciencia, Innovación y Universidades	es_ES

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia