dc.contributor.author | Hernández, Mario | es_ES |
dc.contributor.author | Cebrián, Juan M. | es_ES |
dc.contributor.author | Cecilia-Canales, José María | es_ES |
dc.contributor.author | García, José M. | es_ES |
dc.date.accessioned | 2021-07-17T03:34:50Z | |
dc.date.available | 2021-07-17T03:34:50Z | |
dc.date.issued | 2020-03 | es_ES |
dc.identifier.issn | 1094-3420 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/169425 | |
dc.description.abstract | [EN] The ever-increasing computational requirements of HPC and service provider applications are becoming a great challenge for hardware and software designers. These requirements are reaching levels where isolated development in either field is not enough to meet the challenge. A holistic view of computational thinking is therefore the only way to succeed in real scenarios. However, this is not a trivial task, as it requires, among other things, hardware-software codesign. On the hardware side, most high-throughput computers are designed for heterogeneity, where accelerators (e.g. Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), etc.) are connected through a high-bandwidth bus, such as PCI-Express, to the host CPUs. Applications, whether via programmers, compilers, or the runtime, should orchestrate data movement, synchronization, and so on among devices with different compute and memory capabilities. This increases programming complexity and may reduce overall application performance. This article evaluates different offloading strategies to leverage heterogeneous systems based on several cards with first-generation Xeon Phi coprocessors (Knights Corner). We use an 11-point 3-D Stencil kernel that models heat dissipation as a case study. Our results reveal substantial performance improvements when using several accelerator cards. Additionally, we show that computing an approximate result by reducing the communication overhead can yield 23% performance gains for double-precision data sets. | es_ES |
dc.description.sponsorship | The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13 and by the Spanish MINECO, as well as European Commission FEDER funds, under grants TIN2015-66972-C5-3-R and TIN2016-78799-P (AEI/FEDER, UE). MH was supported by a research grant from the PRODEP under the Professional Development Program for Teachers (UAGro-197), México. | es_ES |
dc.language | English | es_ES |
dc.publisher | SAGE Publications | es_ES |
dc.relation.ispartof | International Journal of High Performance Computing Applications | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | Offloading computation | es_ES |
dc.subject | Stencil codes | es_ES |
dc.subject | Approximate computing | es_ES |
dc.subject | Heterogeneous computing | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1177/1094342017738352 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/f SéNeCa//15290%2FPI%2F10/ES/Diseño, evaluación y explotación de aplicaciones biomédicas para arquitecturas paralelas de altas prestaciones y bajo coste/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/UAGro//UAGro-197/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/f SéNeCa//18946%2FJLI%2F13/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2016-78799-P/ES/DESARROLLO HOLISTICO DE APLICACIONES EMERGENTES EN SISTEMAS HETEROGENEOS/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2015-66972-C5-3-R/ES/TECNICAS PARA LA MEJORA DE LAS PRESTACIONES, FIABILIDAD Y CONSUMO DE ENERGIA DE LOS SERVIDORES. OPTIMIZACION DE APLICACIONES CIENTIFICAS, MEDICAS Y DE VISION ARTIFICIAL/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI//RYC-2018-025580-I/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Hernández, M.; Cebrián, JM.; Cecilia-Canales, JM.; García, JM. (2020). Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance. International Journal of High Performance Computing Applications. 34(2):199-297. https://doi.org/10.1177/1094342017738352 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1177/1094342017738352 | es_ES |
dc.description.upvformatpinicio | 199 | es_ES |
dc.description.upvformatpfin | 297 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 34 | es_ES |
dc.description.issue | 2 | es_ES |
dc.relation.pasarela | S\428415 | es_ES |
dc.contributor.funder | Agencia Estatal de Investigación | es_ES |
dc.contributor.funder | European Regional Development Fund | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.contributor.funder | Universidad Autónoma de Guerrero, México | es_ES |
dc.contributor.funder | Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia | es_ES |
dc.description.references | Brown, W. M., Carrillo, J.-M. Y., Gavhane, N., Thakkar, F. M., & Plimpton, S. J. (2015). Optimizing legacy molecular dynamics software with directive-based offload. Computer Physics Communications, 195, 95-101. doi:10.1016/j.cpc.2015.05.004 | es_ES |
dc.description.references | Esmaeilzadeh, H., Blem, E., St. Amant, R., Sankaralingam, K., & Burger, D. (2012). Power Limitations and Dark Silicon Challenge the Future of Multicore. ACM Transactions on Computer Systems, 30(3), 1-27. doi:10.1145/2324876.2324879 | es_ES |
dc.description.references | Feng, L. (2015). Data Transfer Using the Intel COI Library. High Performance Parallelism Pearls, 341-348. doi:10.1016/b978-0-12-802118-7.00020-0 | es_ES |
dc.description.references | Jeffers, J., & Reinders, J. (2013). Offload. Intel Xeon Phi Coprocessor High Performance Programming, 189-241. doi:10.1016/b978-0-12-410414-3.00007-4 | es_ES |
dc.description.references | Rahman, R. (2013). Intel® Xeon Phi™ Coprocessor Architecture and Tools. doi:10.1007/978-1-4302-5927-5 | es_ES |
dc.description.references | Reinders, J., & Jeffers, J. (2014). High Performance Parallelism Pearls, Multicore and Many-core Programming Approaches (Characterization and Auto-tuning of 3DFD). Morgan Kaufmann, 377-396. | es_ES |
dc.description.references | Shareef, B., de Doncker, E., & Kapenga, J. (2015). Monte Carlo simulations on Intel Xeon Phi: Offload and native mode. 2015 IEEE High Performance Extreme Computing Conference (HPEC). doi:10.1109/hpec.2015.7322456 | es_ES |
dc.description.references | Ujaldón, M. (2016). CUDA Achievements and GPU Challenges Ahead. Lecture Notes in Computer Science, 207-217. doi:10.1007/978-3-319-41778-3_20 | es_ES |
dc.description.references | Wang, E., Zhang, Q., Shen, B., Zhang, G., Lu, X., Wu, Q., & Wang, Y. (2014). High-Performance Computing on the Intel® Xeon Phi™. doi:10.1007/978-3-319-06486-4 | es_ES |
dc.description.references | Wende, F., Klemm, M., Steinke, T., & Reinefeld, A. (2015). Concurrent Kernel Offloading. High Performance Parallelism Pearls, 201-223. doi:10.1016/b978-0-12-802118-7.00012-1 | es_ES |