Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance

Hernández, Mario; Cebrián, Juan M.; Cecilia-Canales, José María; García, José M.

doi:10.1177/1094342017738352

Files in this item

Name: HernandezCebrianC ...
Size: 423.4Kb
Format: PDF
Description: Author's version.

Name: paperpublished.pdf
Size: 405.5Kb
Format: PDF
Description: Publisher's version

dc.contributor.author	Hernández, Mario	es_ES
dc.contributor.author	Cebrián, Juan M.	es_ES
dc.contributor.author	Cecilia-Canales, José María	es_ES
dc.contributor.author	García, José M.	es_ES
dc.date.accessioned	2021-07-17T03:34:50Z
dc.date.available	2021-07-17T03:34:50Z
dc.date.issued	2020-03	es_ES
dc.identifier.issn	1094-3420	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/169425
dc.description.abstract	[EN] The ever-increasing computational requirements of HPC and service provider applications are becoming a great challenge for hardware and software designers. These requirements are reaching levels where isolated development in either field is not enough to meet the challenge. A holistic view of computational thinking is therefore the only way to succeed in real scenarios. However, this is not a trivial task, as it requires, among other things, hardware-software codesign. On the hardware side, most high-throughput computers are designed for heterogeneity, where accelerators (e.g. Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), etc.) are connected through a high-bandwidth bus, such as PCI-Express, to the host CPUs. Applications, whether via programmers, compilers, or the runtime, should orchestrate data movement, synchronization, and so on among devices with different compute and memory capabilities. This increases programming complexity and may reduce overall application performance. This article evaluates different offloading strategies to leverage heterogeneous systems, based on several cards with first-generation Xeon Phi coprocessors (Knights Corner). We use an 11-point 3-D Stencil kernel that models heat dissipation as a case study. Our results reveal substantial performance improvements when using several accelerator cards. Additionally, we show that computing an approximate result by reducing the communication overhead can yield 23% performance gains for double-precision data sets.	es_ES
dc.description.sponsorship	The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13 and by the Spanish MINECO, as well as European Commission FEDER funds, under grants TIN2015-66972-C5-3-R and TIN2016-78799-P (AEI/FEDER, UE). MH was supported by a research grant from the PRODEP under the Professional Development Program for Teachers (UAGro-197), México.	es_ES
dc.language	English	es_ES
dc.publisher	SAGE Publications	es_ES
dc.relation.ispartof	International Journal of High Performance Computing Applications	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Offloading computation	es_ES
dc.subject	Stencil codes	es_ES
dc.subject	Approximate computing	es_ES
dc.subject	Heterogeneous computing	es_ES
dc.subject.classification	COMPUTER ARCHITECTURE AND TECHNOLOGY	es_ES
dc.title	Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1177/1094342017738352	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/f SéNeCa//15290%2FPI%2F10/ES/Diseño, evaluación y explotación de aplicaciones biomédicas para arquitecturas paralelas de altas prestaciones y bajo coste/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/UAGro//UAGro-197/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/f SéNeCa//18946%2FJLI%2F13/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2016-78799-P/ES/DESARROLLO HOLISTICO DE APLICACIONES EMERGENTES EN SISTEMAS HETEROGENEOS/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2015-66972-C5-3-R/ES/TECNICAS PARA LA MEJORA DE LAS PRESTACIONES, FIABILIDAD Y CONSUMO DE ENERGIA DE LOS SERVIDORES. OPTIMIZACION DE APLICACIONES CIENTIFICAS, MEDICAS Y DE VISION ARTIFICIAL/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI//RYC-2018-025580-I/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Hernández, M.; Cebrián, JM.; Cecilia-Canales, JM.; García, JM. (2020). Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance. International Journal of High Performance Computing Applications. 34(2):199-297. https://doi.org/10.1177/1094342017738352	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1177/1094342017738352	es_ES
dc.description.upvformatpinicio	199	es_ES
dc.description.upvformatpfin	297	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	34	es_ES
dc.description.issue	2	es_ES
dc.relation.pasarela	S\428415	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
dc.contributor.funder	European Regional Development Fund	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.contributor.funder	Universidad Autónoma de Guerrero, México	es_ES
dc.contributor.funder	Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia	es_ES
dc.description.references	Brown, W. M., Carrillo, J.-M. Y., Gavhane, N., Thakkar, F. M., & Plimpton, S. J. (2015). Optimizing legacy molecular dynamics software with directive-based offload. Computer Physics Communications, 195, 95-101. doi:10.1016/j.cpc.2015.05.004	es_ES
dc.description.references	Esmaeilzadeh, H., Blem, E., St. Amant, R., Sankaralingam, K., & Burger, D. (2012). Power Limitations and Dark Silicon Challenge the Future of Multicore. ACM Transactions on Computer Systems, 30(3), 1-27. doi:10.1145/2324876.2324879	es_ES
dc.description.references	Feng, L. (2015). Data Transfer Using the Intel COI Library. High Performance Parallelism Pearls, 341-348. doi:10.1016/b978-0-12-802118-7.00020-0	es_ES
dc.description.references	Jeffers, J., & Reinders, J. (2013). Offload. Intel Xeon Phi Coprocessor High Performance Programming, 189-241. doi:10.1016/b978-0-12-410414-3.00007-4	es_ES
dc.description.references	Rahman, R. (2013). Intel® Xeon Phi™ Coprocessor Architecture and Tools. doi:10.1007/978-1-4302-5927-5	es_ES
dc.description.references	Reinders, J., & Jeffers, J. (2014). High Performance Parallelism Pearls, Multicore and Many-core Programming Approaches (Characterization and Auto-tuning of 3DFD). Morgan Kaufmann, 377-396.	es_ES
dc.description.references	Shareef, B., de Doncker, E., & Kapenga, J. (2015). Monte Carlo simulations on Intel Xeon Phi: Offload and native mode. 2015 IEEE High Performance Extreme Computing Conference (HPEC). doi:10.1109/hpec.2015.7322456	es_ES
dc.description.references	Ujaldón, M. (2016). CUDA Achievements and GPU Challenges Ahead. Lecture Notes in Computer Science, 207-217. doi:10.1007/978-3-319-41778-3_20	es_ES
dc.description.references	Wang, E., Zhang, Q., Shen, B., Zhang, G., Lu, X., Wu, Q., & Wang, Y. (2014). High-Performance Computing on the Intel® Xeon Phi™. doi:10.1007/978-3-319-06486-4	es_ES
dc.description.references	Wende, F., Klemm, M., Steinke, T., & Reinefeld, A. (2015). Concurrent Kernel Offloading. High Performance Parallelism Pearls, 201-223. doi:10.1016/b978-0-12-802118-7.00012-1	es_ES

This item appears in the following collection(s)

Articles, conferences, monographs [47484]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
