Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Castelló, Adrián es_ES
dc.contributor.author Catalán, Mar es_ES
dc.contributor.author Dolz, Manuel F. es_ES
dc.contributor.author Quintana-Ortí, Enrique S. es_ES
dc.contributor.author Duato, José es_ES
dc.date.accessioned 2024-10-11T18:03:40Z
dc.date.available 2024-10-11T18:03:40Z
dc.date.issued 2023-05 es_ES
dc.identifier.issn 0010-485X es_ES
dc.identifier.uri http://hdl.handle.net/10251/209937
dc.description.abstract [EN] For many distributed applications, data communication poses a major bottleneck in terms of both performance and energy consumption. As more cores are integrated per node, the overall performance of the system generally increases, yet it eventually becomes limited by the interconnection network. This is the case for distributed data-parallel training of convolutional neural networks (CNNs), which usually proceeds on a cluster with a small to moderate number of nodes. In this paper, we analyze the performance of the Allreduce collective communication primitive, which is key to efficient data-parallel distributed training of CNNs (a minimal sketch of this exchange is given after the record below). Our study targets the distinct realizations of this primitive in three high-performance instances of the Message Passing Interface (MPI), namely MPICH, OpenMPI, and IntelMPI, and employs a cluster equipped with state-of-the-art processor and network technologies. In addition, we apply the insights gained from the experimental analysis to the optimization of the TensorFlow framework when running on top of Horovod. Our study reveals that a careful selection of the most convenient MPI library and Allreduce (ARD) realization accelerates training throughput by a factor of 1.2x compared with the default algorithm in the same MPI library, and by up to 2.8x when comparing distinct MPI libraries, for a number of relevant CNN model+dataset combinations. es_ES
dc.description.sponsorship This research was partially sponsored by project TIN2017-82972-R of the Spanish Ministerio de Ciencia, Innovación y Universidades, project PROMETEO/2019/109 of the Generalitat Valenciana, and the Agencia Valenciana de la Innovación. Adrián Castelló was supported by the Juan de la Cierva-Formación project FJC2019-039222-I of the Ministerio de Ciencia, Innovación y Universidades. Manuel F. Dolz was also supported by the Plan GenT project CDEIGENT/2018/014 of the Generalitat Valenciana. es_ES
dc.language English es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof Computing es_ES
dc.rights All rights reserved es_ES
dc.subject Message passing interface (MPI) es_ES
dc.subject Collective communication primitives es_ES
dc.subject Allreduce es_ES
dc.subject Deep learning es_ES
dc.subject Distributed training es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s00607-021-01029-2 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F109//COMUNICACION Y COMPUTACION INTELIGENTES Y SOCIALES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//CDEIGENT%2F2018%2F014//Plan GenT/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MCIU//TIN2017-82972-R/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MCIU//FJC2019-039222-I/ es_ES
dc.rights.accessRights Open access es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Castelló, A.; Catalán, M.; Dolz, M. F.; Quintana-Ortí, E. S.; Duato, J. (2023). Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks. Computing. 105(5):1101-1119. https://doi.org/10.1007/s00607-021-01029-2 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s00607-021-01029-2 es_ES
dc.description.upvformatpinicio 1101 es_ES
dc.description.upvformatpfin 1119 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 105 es_ES
dc.description.issue 5 es_ES
dc.relation.pasarela S\495738 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Ministerio de Ciencia, Innovación y Universidades es_ES
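
The abstract above centers on the Allreduce primitive that data-parallel training uses to exchange gradients at every step. The following is a minimal sketch of that exchange in C with MPI, assuming a flat float gradient buffer that is summed across all ranks and then scaled to an average; the buffer length n, the variable names, and the averaging scheme are illustrative choices, not details taken from the paper.

    /* Gradient averaging via MPI_Allreduce, as in data-parallel training:
     * every rank contributes its local gradients, all ranks receive the
     * element-wise global sum, and each rank scales the result by 1/P.
     * Buffer size and contents are illustrative placeholders. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int n = 1 << 20;                    /* stand-in gradient length */
        float *grad = malloc(n * sizeof *grad);
        for (int i = 0; i < n; i++)
            grad[i] = (float)rank;                /* stand-in for local gradients */

        /* In place: on return, every rank holds the global element-wise sum. */
        MPI_Allreduce(MPI_IN_PLACE, grad, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
        for (int i = 0; i < n; i++)
            grad[i] /= (float)nprocs;             /* convert sum to average */

        free(grad);
        MPI_Finalize();
        return 0;
    }

As a usage note, the "default algorithm" comparison in the abstract relates to the standard knobs each of the three libraries exposes for overriding its internal Allreduce algorithm selection: the I_MPI_ADJUST_ALLREDUCE environment variable in Intel MPI, the coll_tuned_allreduce_algorithm MCA parameter in OpenMPI (with coll_tuned_use_dynamic_rules enabled), and the MPIR_CVAR_ALLREDUCE_INTRA_ALGORITHM environment variable in MPICH. Whether the authors used exactly these mechanisms is not stated in the record; in a Horovod+TensorFlow run launched through mpirun, such variables can be set on the launch command line.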

