An efficient cache flat storage organization for multithreaded workloads for low power processors

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Puche, José es_ES
dc.contributor.author Petit Martí, Salvador Vicente es_ES
dc.contributor.author Gómez Requena, María Engracia es_ES
dc.contributor.author Sahuquillo Borrás, Julio es_ES
dc.date.accessioned 2021-06-30T03:31:04Z
dc.date.available 2021-06-30T03:31:04Z
dc.date.issued 2020-09 es_ES
dc.identifier.issn 0167-739X es_ES
dc.identifier.uri http://hdl.handle.net/10251/168546
dc.description.abstract [EN] The cache hierarchy of current multicores typically consists of three levels, ranging from the faster and smaller L1 level to the slower and larger L3 level. This approach has proven effective in high-performance processors, since it reduces the average memory access time. However, in devices where energy efficiency is critical, such as low-power or embedded processors, conventional cache hierarchies raise several concerns that waste area and energy: multiple cache lookups, block replication, block migration, and overprovisioning of private cache space. To address these issues, this work proposes FOS-Mt, a new cache organization aimed at saving energy in current multicores running multithreaded applications. FOS-Mt's cache hierarchy consists of only two levels: the L1 cache level, located in the core pipeline, and a single, flattened second level that forms an aggregated cache space accessible by all the execution cores. This level is sliced into multiple small buffers, which are dynamically assigned to any of the running threads when they are expected to improve system performance. Buffers that are not allocated to any core are powered off to save energy. Experimental results show that FOS-Mt significantly reduces both static and dynamic energy consumption compared with conventional cache organizations of the same storage capacity, such as NUCA or shared caches. Compared to the widely known cache decay approach, FOS-Mt improves the energy-delay product by 19.3% on average. Moreover, although FOS-Mt is an energy-aware architecture, performance is scarcely affected: it remains similar to that achieved by conventional and cache decay approaches. es_ES
dc.description.sponsorship This work has been supported by the Spanish Ministerio de Economia y Competitividad under grant RTI2018-098156-B-C51, and by the Generalitat Valenciana, Spain under grant AICO/2019/317. es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Future Generation Computer Systems es_ES
dc.rights All rights reserved es_ES
dc.subject Cache hierarchy es_ES
dc.subject Multicores es_ES
dc.subject Energy efficiency es_ES
dc.subject On-chip photonic network es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title An efficient cache flat storage organization for multithreaded workloads for low power processors es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.future.2019.11.024 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-098156-B-C51/ES/TECNOLOGIAS INNOVADORAS DE PROCESADORES, ACELERADORES Y REDES, PARA CENTROS DE DATOS Y COMPUTACION DE ALTAS PRESTACIONES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//AICO%2F2019%2F317/ es_ES
dc.rights.accessRights Closed es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Puche, J.; Petit Martí, SV.; Gómez Requena, ME.; Sahuquillo Borrás, J. (2020). An efficient cache flat storage organization for multithreaded workloads for low power processors. Future Generation Computer Systems. 110:1037-1054. https://doi.org/10.1016/j.future.2019.11.024 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.future.2019.11.024 es_ES
dc.description.upvformatpinicio 1037 es_ES
dc.description.upvformatpfin 1054 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 110 es_ES
dc.relation.pasarela S\398066 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
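
The buffer management described in the abstract (a flat second level sliced into small buffers, assigned to threads only when a benefit is expected, with unallocated buffers powered off) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the allocation policy, so the class names, the `expected_gain` input, and the `gain_threshold` parameter are all hypothetical. The `energy_delay_product` helper uses only the standard definition of the metric (energy multiplied by execution time).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Buffer:
    """One small slice of the flat second level (hypothetical model)."""
    owner: Optional[int] = None  # owning thread id, or None when unallocated

    @property
    def powered_on(self) -> bool:
        # Per the abstract, buffers not allocated to any core are powered off.
        return self.owner is not None


class FlatSecondLevel:
    """Toy model of a shared pool of buffers assigned on demand."""

    def __init__(self, num_buffers: int, gain_threshold: float = 0.05):
        self.buffers = [Buffer() for _ in range(num_buffers)]
        # Assumption: a buffer is granted only if the predicted benefit
        # reaches this threshold. The real criterion is not in the abstract.
        self.gain_threshold = gain_threshold

    def request(self, thread_id: int, expected_gain: float) -> bool:
        """Grant one free buffer to thread_id if a benefit is expected."""
        if expected_gain < self.gain_threshold:
            return False
        for buf in self.buffers:
            if buf.owner is None:
                buf.owner = thread_id  # allocating implicitly powers it on
                return True
        return False  # no free buffer left in the pool

    def release(self, thread_id: int) -> int:
        """Free every buffer owned by thread_id; return how many were freed."""
        freed = 0
        for buf in self.buffers:
            if buf.owner == thread_id:
                buf.owner = None  # released buffers are powered off again
                freed += 1
        return freed

    def powered_on_count(self) -> int:
        return sum(1 for buf in self.buffers if buf.powered_on)


def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Standard energy-delay product: lower is better."""
    return energy_joules * delay_seconds
```

In this toy model, a thread with a low predicted benefit never causes a buffer to be powered on, which is the intuition behind the static energy savings the abstract reports.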