Alonso-Jordá, P.; Catalán, S.; Igual, FD.; Mayo, R.; Rodríguez-Sánchez, R.; Quintana Ortí, ES. (2015). Time and energy modeling of high-performance Level-3 BLAS on x86 architectures. Simulation Modelling Practice and Theory. 55:77-94. https://doi.org/10.1016/j.simpat.2015.04.003
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/66259
Title:
|
Time and energy modeling of high-performance Level-3 BLAS on x86 architectures
|
Author:
|
Alonso-Jordá, Pedro
Catalán, Sandra
Igual, Francisco D
Mayo, Rafael
Rodríguez-Sánchez, Rafael
Quintana Ortí, Enrique Salvador
|
UPV Unit:
|
Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
|
Issued date:
|
|
Abstract:
|
[EN] We present accurate piece-wise models for the time and energy costs of high-performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 architectures. Our methodology decouples the costs due to the floating-point arithmetic/data movement occurring in the higher levels of the cache hierarchy from those of packing/data transfers between the main memory and the L2/L3 cache. A careful analytical study of the data transfers, in combination with an architecture-specific calibration of the costs per operation, then yields the components needed to assemble piece-wise models for the accurate estimation of gemm and trsm's performance on x86 processors.
Our experimental results on an Intel Xeon E5-2620 processor confirm the accuracy of this approach, which reports relative errors for different shapes of gemm and trsm that are, respectively, around 1.5% and 4.5% on average for both time and energy.
|
Subjects:
|
Modeling, High performance, Energy consumption, Matrix multiplication, Linear algebra
|
Copyright:
|
Closed |
Source:
|
Simulation Modelling Practice and Theory (ISSN: 1569-190X)
|
DOI:
|
10.1016/j.simpat.2015.04.003
|
Publisher:
|
Elsevier
|
Publisher version:
|
http://dx.doi.org/10.1016/j.simpat.2015.04.003
|
Project ID:
|
info:eu-repo/grantAgreement/EC/FP7/318793/EU/Energy-Aware Sustainable Computing on Future Technology – Paving the Road to Exascale Computing/
info:eu-repo/grantAgreement/MICINN//TIN2011-23283/ES/POWER-AWARE HIGH PERFORMANCE COMPUTING/
info:eu-repo/grantAgreement/MINECO//TIN2012-32180/ES/ARQUITECTURAS Y TECNOLOGIAS EMERGENTES. EFICIENCIA ENERGETICA MEDIANTE HETEROGENEIDAD/
|
Description:
|
This is the author’s version of a work that was accepted for publication in Simulation Modelling Practice and Theory. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Simulation Modelling Practice and Theory, [Volume 55, June 2015, Pages 77–94] DOI 10.1016/j.simpat.2015.04.003
|
Thanks:
|
This work was supported by the CICYT Projects TIN2011-23283 and CICYT-TIN 2012-32180 of the MINECO and FEDER, and the EU FET Project FP7 318793 "EXA2GREEN".
|
Type:
|
Article
|