Time and energy modeling of high-performance Level-3 BLAS on x86 architectures

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Alonso-Jordá, Pedro es_ES
dc.contributor.author Catalán, Sandra es_ES
dc.contributor.author Igual, Francisco D es_ES
dc.contributor.author Mayo, Rafael es_ES
dc.contributor.author Rodríguez-Sánchez, Rafael es_ES
dc.contributor.author Quintana Ortí, Enrique Salvador es_ES
dc.date.accessioned 2016-06-21T15:56:28Z
dc.date.available 2016-06-21T15:56:28Z
dc.date.issued 2015-06
dc.identifier.issn 1569-190X
dc.identifier.uri http://hdl.handle.net/10251/66259
dc.description This is the author’s version of a work that was accepted for publication in Simulation Modelling Practice and Theory. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Simulation Modelling Practice and Theory, [Volume 55, June 2015, Pages 77–94] DOI 10.1016/j.simpat.2015.04.003 es_ES
dc.description.abstract [EN] We present accurate piece-wise models for the time and energy costs of high-performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 architectures. Our methodology decouples the costs due to the floating-point arithmetic and data movement occurring in the higher levels of the cache hierarchy from those due to packing and data transfers between the main memory and the L2/L3 cache. A careful analytical study of the data transfers, combined with an architecture-specific calibration of the costs per operation, then provides the components needed to assemble piece-wise models that accurately estimate the performance of gemm and trsm on x86 processors. Our experimental results on an Intel Xeon E5-2620 processor confirm the accuracy of this approach: for different shapes of gemm and trsm, the average relative errors are around 1.5% and 4.5%, respectively, for both time and energy. es_ES
dc.description.sponsorship This work was supported by the CICYT projects TIN2011-23283 and TIN2012-32180 of the MINECO and FEDER, and the EU FET Project FP7 318793 "EXA2GREEN". en_EN
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Simulation Modelling Practice and Theory es_ES
dc.rights All rights reserved es_ES
dc.subject Modeling es_ES
dc.subject High performance es_ES
dc.subject Energy consumption es_ES
dc.subject Matrix multiplication es_ES
dc.subject Linear algebra es_ES
dc.subject.classification COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Time and energy modeling of high-performance Level-3 BLAS on x86 architectures es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.simpat.2015.04.003
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/318793/EU/Energy-Aware Sustainable Computing on Future Technology – Paving the Road to Exascale Computing/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MICINN//TIN2011-23283/ES/POWER-AWARE HIGH PERFORMANCE COMPUTING/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2012-32180/ES/ARQUITECTURAS Y TECNOLOGIAS EMERGENTES. EFICIENCIA ENERGETICA MEDIANTE HETEROGENEIDAD/ es_ES
dc.rights.accessRights Closed es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Alonso-Jordá, P.; Catalán, S.; Igual, FD.; Mayo, R.; Rodríguez-Sánchez, R.; Quintana Ortí, ES. (2015). Time and energy modeling of high-performance Level-3 BLAS on x86 architectures. Simulation Modelling Practice and Theory. 55:77-94. https://doi.org/10.1016/j.simpat.2015.04.003 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.1016/j.simpat.2015.04.003 es_ES
dc.description.upvformatpinicio 77 es_ES
dc.description.upvformatpfin 94 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 55 es_ES
dc.relation.senia 290481 es_ES
dc.contributor.funder Ministerio de Economía y Competitividad
dc.contributor.funder European Commission
dc.contributor.funder Ministerio de Ciencia e Innovación es_ES

