
Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia



dc.contributor.author Hernández, Mario es_ES
dc.contributor.author Cebrián, Juan M. es_ES
dc.contributor.author Cecilia-Canales, José María es_ES
dc.contributor.author García, José M. es_ES
dc.date.accessioned 2021-07-17T03:34:50Z
dc.date.available 2021-07-17T03:34:50Z
dc.date.issued 2020-03 es_ES
dc.identifier.issn 1094-3420 es_ES
dc.identifier.uri http://hdl.handle.net/10251/169425
dc.description.abstract [EN] The ever-increasing computational requirements of HPC and service provider applications are becoming a great challenge for hardware and software designers. These requirements are reaching levels where isolated development in either field is no longer enough to meet the challenge. A holistic view of computational thinking is therefore the only way to succeed in real scenarios. However, this is not a trivial task, as it requires, among other things, hardware-software codesign. On the hardware side, most high-throughput computers are designed for heterogeneity, where accelerators (e.g. Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), etc.) are connected to the host CPUs through a high-bandwidth bus, such as PCI-Express. Applications, whether via programmers, compilers, or runtimes, should orchestrate data movement, synchronization, and so on among devices with different compute and memory capabilities. This increases programming complexity and may reduce overall application performance. This article evaluates different offloading strategies to leverage heterogeneous systems, based on several cards with first-generation Xeon Phi coprocessors (Knights Corner). We use an 11-point 3-D Stencil kernel that models heat dissipation as a case study. Our results reveal substantial performance improvements when using several accelerator cards. Additionally, we show that computing an approximate result by reducing the communication overhead can yield 23% performance gains for double-precision data sets. es_ES
dc.description.sponsorship The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13 and by the Spanish MINECO, as well as European Commission FEDER funds, under grants TIN2015-66972-C5-3-R and TIN2016-78799-P (AEI/FEDER, UE). MH was supported by a research grant from PRODEP under the Professional Development Program for Teachers (UAGro-197), México. es_ES
dc.language English es_ES
dc.publisher SAGE Publications es_ES
dc.relation.ispartof International Journal of High Performance Computing Applications es_ES
dc.rights All rights reserved es_ES
dc.subject Offloading computation es_ES
dc.subject Stencil codes es_ES
dc.subject Approximate computing es_ES
dc.subject Heterogeneous computing es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance es_ES
dc.type Article es_ES
dc.identifier.doi 10.1177/1094342017738352 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/f SéNeCa//15290%2FPI%2F10/ES/Diseño, evaluación y explotación de aplicaciones biomédicas para arquitecturas paralelas de altas prestaciones y bajo coste/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/UAGro//UAGro-197/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/f SéNeCa//18946%2FJLI%2F13/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2016-78799-P/ES/DESARROLLO HOLISTICO DE APLICACIONES EMERGENTES EN SISTEMAS HETEROGENEOS/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2015-66972-C5-3-R/ES/TECNICAS PARA LA MEJORA DE LAS PRESTACIONES, FIABILIDAD Y CONSUMO DE ENERGIA DE LOS SERVIDORES. OPTIMIZACION DE APLICACIONES CIENTIFICAS, MEDICAS Y DE VISION ARTIFICIAL/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI//RYC-2018-025580-I/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Hernández, M.; Cebrián, JM.; Cecilia-Canales, JM.; García, JM. (2020). Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance. International Journal of High Performance Computing Applications. 34(2):199-297. https://doi.org/10.1177/1094342017738352 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1177/1094342017738352 es_ES
dc.description.upvformatpinicio 199 es_ES
dc.description.upvformatpfin 297 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 34 es_ES
dc.description.issue 2 es_ES
dc.relation.pasarela S\428415 es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Universidad Autónoma de Guerrero, México es_ES
dc.contributor.funder Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia es_ES
dc.description.references Brown, W. M., Carrillo, J.-M. Y., Gavhane, N., Thakkar, F. M., & Plimpton, S. J. (2015). Optimizing legacy molecular dynamics software with directive-based offload. Computer Physics Communications, 195, 95-101. doi:10.1016/j.cpc.2015.05.004 es_ES
dc.description.references Esmaeilzadeh, H., Blem, E., St. Amant, R., Sankaralingam, K., & Burger, D. (2012). Power Limitations and Dark Silicon Challenge the Future of Multicore. ACM Transactions on Computer Systems, 30(3), 1-27. doi:10.1145/2324876.2324879 es_ES
dc.description.references Feng, L. (2015). Data Transfer Using the Intel COI Library. High Performance Parallelism Pearls, 341-348. doi:10.1016/b978-0-12-802118-7.00020-0 es_ES
dc.description.references Jeffers, J., & Reinders, J. (2013). Offload. Intel Xeon Phi Coprocessor High Performance Programming, 189-241. doi:10.1016/b978-0-12-410414-3.00007-4 es_ES
dc.description.references Rahman, R. (2013). Intel® Xeon Phi™ Coprocessor Architecture and Tools. doi:10.1007/978-1-4302-5927-5 es_ES
dc.description.references Reinders, J., & Jeffers, J. (2014). High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches (Characterization and Auto-tuning of 3DFD). Morgan Kaufmann, 377-396. es_ES
dc.description.references Shareef, B., de Doncker, E., & Kapenga, J. (2015). Monte Carlo simulations on Intel Xeon Phi: Offload and native mode. 2015 IEEE High Performance Extreme Computing Conference (HPEC). doi:10.1109/hpec.2015.7322456 es_ES
dc.description.references Ujaldón, M. (2016). CUDA Achievements and GPU Challenges Ahead. Lecture Notes in Computer Science, 207-217. doi:10.1007/978-3-319-41778-3_20 es_ES
dc.description.references Wang, E., Zhang, Q., Shen, B., Zhang, G., Lu, X., Wu, Q., & Wang, Y. (2014). High-Performance Computing on the Intel® Xeon Phi™. doi:10.1007/978-3-319-06486-4 es_ES
dc.description.references Wende, F., Klemm, M., Steinke, T., & Reinefeld, A. (2015). Concurrent Kernel Offloading. High Performance Parallelism Pearls, 201-223. doi:10.1016/b978-0-12-802118-7.00012-1 es_ES

