dc.contributor.author | Agosta, Giovanni | es_ES |
dc.contributor.author | Fornaciari, William | es_ES |
dc.contributor.author | Atienza, David | es_ES |
dc.contributor.author | Canal, Ramon | es_ES |
dc.contributor.author | Cilardo, Alessandro | es_ES |
dc.contributor.author | Flich Cardo, José | es_ES |
dc.contributor.author | Hernández Luz, Carles | es_ES |
dc.contributor.author | Kulczewski, Michal | es_ES |
dc.contributor.author | Massari, Giuseppe | es_ES |
dc.contributor.author | Tornero-Gavilá, Rafael | es_ES |
dc.contributor.author | Zapater, Marina | es_ES |
dc.date.accessioned | 2021-05-28T03:34:27Z | |
dc.date.available | 2021-05-28T03:34:27Z | |
dc.date.issued | 2020-09 | es_ES |
dc.identifier.issn | 0141-9331 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/166916 | |
dc.description.abstract | [EN] RECIPE (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems) is a recently started project funded within the H2020 FETHPC programme, which is expressly targeted at exploring new High-Performance Computing (HPC) technologies. RECIPE aims at introducing a hierarchical runtime resource management infrastructure to optimize energy efficiency and minimize the occurrence of thermal hotspots, while enforcing the time constraints imposed by the applications and ensuring reliability for both time-critical and throughput-oriented computations that run on deeply heterogeneous accelerator-based systems. This paper presents a detailed overview of RECIPE, identifying the fundamental challenges as well as the key innovations addressed by the project. In particular, the need for predictive reliability approaches that maximize hardware lifetime and guarantee application performance is identified as the key concern for RECIPE. We address it through hierarchical resource management of the heterogeneous architectural components of the system, driven by estimates of the application latency and hardware reliability obtained, respectively, through timing analysis and through modeling of the thermal properties and mean-time-to-failure of subsystems. We show the impact of prediction accuracy on the overheads imposed by the checkpointing policy, as well as a possible application to a weather forecasting use case. | es_ES |
dc.description.sponsorship | The activities described in this article received funding from the European Union's Horizon 2020 research and innovation programme under the FETHPC grant agreement no. 801137 RECIPE: REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems. | es_ES |
dc.language | English | es_ES |
dc.publisher | Elsevier | es_ES |
dc.relation.ispartof | Microprocessors and Microsystems | es_ES |
dc.rights | Attribution - NonCommercial - NoDerivatives (by-nc-nd) | es_ES |
dc.subject | HPC | es_ES |
dc.subject | Heterogeneous computing | es_ES |
dc.subject | Run-time management | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | The RECIPE approach to challenges in deeply heterogeneous high performance systems | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1016/j.micpro.2020.103185 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/801137/EU/REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Agosta, G.; Fornaciari, W.; Atienza, D.; Canal, R.; Cilardo, A.; Flich Cardo, J.; Hernández Luz, C.... (2020). The RECIPE approach to challenges in deeply heterogeneous high performance systems. Microprocessors and Microsystems. 77:1-13. https://doi.org/10.1016/j.micpro.2020.103185 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1016/j.micpro.2020.103185 | es_ES |
dc.description.upvformatpinicio | 1 | es_ES |
dc.description.upvformatpfin | 13 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 77 | es_ES |
dc.relation.pasarela | S\431096 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.description.references | Flich, J., Agosta, G., Ampletzer, P., Alonso, D. A., Brandolese, C., Cappe, E., … Zoni, D. (2018). Exploring manycore architectures for next-generation HPC systems through the MANGO approach. Microprocessors and Microsystems, 61, 154-170. doi:10.1016/j.micpro.2018.05.011 | es_ES |
dc.description.references | https://euroexa.eu. | es_ES |
dc.description.references | https://www.altera.com/products/sip/memory/stratix-10-mx/overview.html. | es_ES |
dc.description.references | http://www.mango-project.eu. | es_ES |
dc.description.references | https://www.infinibandta.org/infiniband-roadmap/. | es_ES |
dc.description.references | Reghenzani, F., Massari, G., & Fornaciari, W. (2018). chronovise: Measurement-Based Probabilistic Timing Analysis framework. Journal of Open Source Software, 3(28), 711. doi:10.21105/joss.00711 | es_ES |
dc.description.references | Abella, J., Padilla, M., Castillo, J. D., & Cazorla, F. J. (2017). Measurement-Based Worst-Case Execution Time Estimation Using the Coefficient of Variation. ACM Transactions on Design Automation of Electronic Systems, 22(4), 1-29. doi:10.1145/3065924 | es_ES |
dc.description.references | https://lanl.gov/projects/trinity/specifications.php. | es_ES |
dc.description.references | https://www.bsc.es/marenostrum/marenostrum/technical-information. | es_ES |
dc.description.references | https://www.olcf.ornl.gov/olcf-resources/compute-systems/titan/. | es_ES |
dc.description.references | Bellasi, P., Massari, G., & Fornaciari, W. (2015). Effective Runtime Resource Management Using Linux Control Groups with the BarbequeRTRM Framework. ACM Transactions on Embedded Computing Systems, 14(2), 1-17. doi:10.1145/2658990 | es_ES |
dc.description.references | Egwutuoha, I. P., Levy, D., Selic, B., & Chen, S. (2013). A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems. The Journal of Supercomputing, 65(3), 1302-1326. doi:10.1007/s11227-013-0884-0 | es_ES |
dc.description.references | Lee, K., & Wong, S. S. (2017). Fault-Tolerant FPGA with Column-Based Redundancy and Power Gating Using RRAM. IEEE Transactions on Computers, 66(6), 946-956. doi:10.1109/tc.2016.2634533 | es_ES |
dc.description.references | Cheatham, J. A., Emmert, J. M., & Baumgart, S. (2006). A survey of fault tolerant methodologies for FPGAs. ACM Transactions on Design Automation of Electronic Systems, 11(2), 501-533. doi:10.1145/1142155.1142167 | es_ES |
dc.description.references | Parris, M. G., Sharma, C. A., & Demara, R. F. (2011). Progress in autonomous fault recovery of field programmable gate arrays. ACM Computing Surveys, 43(4), 1-30. doi:10.1145/1978802.1978810 | es_ES |
dc.description.references | Iranfar, A., Terraneo, F., Simon, W. A., Dragic, L., Pilji, I., Zapater Sancho, M., Fornaciari, W., Kovac, M., & Atienza Alonso, D. (2017). Thermal characterization of next-generation workloads on heterogeneous MPSoCs. | es_ES |
dc.description.references | Zoni, D., & Fornaciari, W. (2015). Modeling DVFS and Power-Gating Actuators for Cycle-Accurate NoC-Based Simulators. ACM Journal on Emerging Technologies in Computing Systems, 12(3), 1-24. doi:10.1145/2751561 | es_ES |
dc.description.references | Curtsinger, C., & Berger, E. D. (2013). STABILIZER. ACM SIGARCH Computer Architecture News, 41(1), 219-228. doi:10.1145/2490301.2451141 | es_ES |
dc.description.references | Kormann, J., Rodríguez, J. E., Gutierrez, N., Ferrer, M., Rojas, O., de la Puente, J., … Cela, J. M. (2016). Toward an automatic full-wave inversion: Synthetic study cases. The Leading Edge, 35(12), 1047-1052. doi:10.1190/tle35121047.1 | es_ES |
dc.description.references | Fusi, M., Mazzocchetti, F., Farres, A., Kosmidis, L., Canal, R., Cazorla, F. J., & Abella, J. (2020). On the Use of Probabilistic Worst-Case Execution Time Estimation for Parallel Applications in High Performance Systems. Mathematics, 8(3), 314. doi:10.3390/math8030314 | es_ES |
dc.description.references | Wright, D. W., Richardson, R. A., Edeling, W., Lakhlili, J., Sinclair, R. C., Jacauskas, V., Suleimenova, D., Bosak, B., Kulczewski, M., Piontek, T., Kopta, P., Chirca, I., Arabnejad, H., Luk, O. O., Hoenen, O., Weglarz, J., Crommelin, D., & Groen, D. Building confidence in simulation: Application of easyvvuq. Submitted to Journal of Advanced Theory and Simulations on 12/12/2019. | es_ES |