An efficient cache flat storage organization for multithreaded workloads for low power processors

Puche, José; Petit Martí, Salvador Vicente; Gómez Requena, María Engracia; Sahuquillo Borrás, Julio

doi:10.1016/j.future.2019.11.024

Files in this item

Name: Puche;Pett;Gómez - ...

Size: 2.332 MB

Format: PDF

Description: Publisher's version

dc.contributor.author	Puche, José	es_ES
dc.contributor.author	Petit Martí, Salvador Vicente	es_ES
dc.contributor.author	Gómez Requena, María Engracia	es_ES
dc.contributor.author	Sahuquillo Borrás, Julio	es_ES
dc.date.accessioned	2021-06-30T03:31:04Z
dc.date.available	2021-06-30T03:31:04Z
dc.date.issued	2020-09	es_ES
dc.identifier.issn	0167-739X	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/168546
dc.description.abstract	[EN] The cache hierarchy of current multicores typically consists of three levels, ranging from the small, fast L1 to the large, slow L3. This organization has proven effective in high-performance processors because it reduces the average memory access time. However, in devices where energy efficiency is critical, such as low-power or embedded processors, conventional cache hierarchies raise several concerns that waste area and energy: multiple cache lookups, block replication, block migration, and overprovisioning of private cache space. To address these issues, this work proposes FOS-Mt, a new cache organization aimed at saving energy in current multicores running multithreaded applications. FOS-Mt's cache hierarchy has only two levels: the L1 cache, located in the core pipeline, and a single flattened second level that forms an aggregated cache space accessible by all execution cores. This second level is sliced into multiple small buffers, which are dynamically assigned to any running thread when they are expected to improve system performance. Buffers not allocated to any core are powered off to save energy. Experimental results show that FOS-Mt significantly reduces both static and dynamic energy consumption compared with conventional cache organizations, such as NUCA or shared caches, of the same storage capacity. Compared with the widely known cache decay approach, FOS-Mt improves the energy-delay product by 19.3% on average. Moreover, although FOS-Mt is an energy-aware architecture, its performance remains similar to that of conventional and cache decay approaches.	es_ES
dc.description.sponsorship	This work has been supported by the Spanish Ministerio de Economia y Competitividad under grant RTI2018-098156-B-C51, and by the Generalitat Valenciana, Spain under grant AICO/2019/317.	es_ES
dc.language	English	es_ES
dc.publisher	Elsevier	es_ES
dc.relation.ispartof	Future Generation Computer Systems	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Cache hierarchy	es_ES
dc.subject	Multicores	es_ES
dc.subject	Energy efficiency	es_ES
dc.subject	On-chip photonic network	es_ES
dc.subject.classification	COMPUTER ARCHITECTURE AND TECHNOLOGY	es_ES
dc.title	An efficient cache flat storage organization for multithreaded workloads for low power processors	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1016/j.future.2019.11.024	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-098156-B-C51/ES/TECNOLOGIAS INNOVADORAS DE PROCESADORES, ACELERADORES Y REDES, PARA CENTROS DE DATOS Y COMPUTACION DE ALTAS PRESTACIONES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//AICO%2F2019%2F317/	es_ES
dc.rights.accessRights	Closed	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Puche, J.; Petit Martí, SV.; Gómez Requena, ME.; Sahuquillo Borrás, J. (2020). An efficient cache flat storage organization for multithreaded workloads for low power processors. Future Generation Computer Systems. 110:1037-1054. https://doi.org/10.1016/j.future.2019.11.024	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.future.2019.11.024	es_ES
dc.description.upvformatpinicio	1037	es_ES
dc.description.upvformatpfin	1054	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	110	es_ES
dc.relation.pasarela	S\398066	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
dc.description.references	S. Kaxiras, Z. Hu, M. Martonosi, Cache decay: exploiting generational behavior to reduce cache leakage power, in: Procs. of the 28th Annual International Symposium on Computer Architecture, ISCA’01, 2001, pp. 240–251.	es_ES
dc.description.references	Sinharoy, B., Kalla, R. N., Tendler, J. M., Eickemeyer, R. J., & Joyner, J. B. (2005). POWER5 system microarchitecture. IBM Journal of Research and Development, 49(4.5), 505-521. doi:10.1147/rd.494.0505	es_ES
dc.description.references	M. Qureshi, Y. Patt, Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches, in: MICRO, 2006, pp. 423–432.	es_ES
dc.description.references	Selfa, V., Sahuquillo, J., Gómez, M. E., & Gómez, C. (2018). Efficient selective multicore prefetching under limited memory bandwidth. Journal of Parallel and Distributed Computing, 120, 32-43. doi:10.1016/j.jpdc.2018.05.002	es_ES
dc.description.references	A. Shacham, K. Bergman, L. Carloni, On the design of a photonic network-on-chip, in: Networks-on-Chip, NOCS 2007, pp. 53–64.	es_ES
dc.description.references	G. Chen, H. Chen, M. Haurylau, N. Nelson, P.M. Fauchet, E.G. Friedman, D. Albonesi, Predictions of CMOS compatible on-chip optical interconnect, in: Procs. of Int. Workshop on System Level Interconnect Prediction, SLIP ’05, 2005, pp. 13–20.	es_ES
dc.description.references	J. Pang, C. Dwyer, A.R. Lebeck, Exploiting emerging technologies for nanoscale photonic networks-on-chip, in: Procs. of 6th Int. Workshop on NoC Architectures, NoCArc ’13, pp. 53–58.	es_ES
dc.description.references	Soref, R., & Bennett, B. (1987). Electrooptical effects in silicon. IEEE Journal of Quantum Electronics, 23(1), 123-129. doi:10.1109/jqe.1987.1073206	es_ES
dc.description.references	García-Guirado, A., Fernández-Pascual, R., García, J. M., & Bartolini, S. (2014). Managing resources dynamically in hybrid photonic-electronic networks-on-chip. Concurrency and Computation: Practice and Experience, 26(15), 2530-2550. doi:10.1002/cpe.3332	es_ES
dc.description.references	D. Vantrease, N. Binkert, R. Schreiber, M. Lipasti, Light speed arbitration and flow control for nanophotonic interconnects, in: Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium, pp. 304–315.	es_ES
dc.description.references	S. Werner, J. Navaridas, M. Lujan, Designing low-power, low-latency networks-on-chip by optimally combining electrical and optical Links, in: 2017 IEEE Int. Symp. of High Performance Computer Architecture, IEEE, Manchester, UK.	es_ES
dc.description.references	Bahirat, S., & Pasricha, S. (2014). METEOR. ACM Transactions on Embedded Computing Systems, 13(3s), 1-33. doi:10.1145/2567940	es_ES
dc.description.references	R. Morris, A.K. Kodi, A. Louri, Dynamic reconfiguration of 3D photonic networks-on-chip for maximizing performance and improving fault tolerance, in: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 282–293. http://dx.doi.org/10.1109/MICRO.2012.34.	es_ES
dc.description.references	R. Ubal, J. Sahuquillo, S. Petit, P. Lopez, Multi2Sim: A simulation framework to evaluate multicore-multithreaded processors, in: Int. Symp. on Computer Architecture and High Performance Computing, pp. 62–68. http://dx.doi.org/10.1109/SBAC-PAD.2007.17.	es_ES
dc.description.references	Rosenfeld, P., Cooper-Balis, E., & Jacob, B. (2011). DRAMSim2: A Cycle Accurate Memory System Simulator. IEEE Computer Architecture Letters, 10(1), 16-19. doi:10.1109/l-ca.2011.4	es_ES
dc.description.references	N. Muralimanohar, R. Balasubramonian, N.P. Jouppi, CACTI 6.0: A tool to model large caches, in: HP Laboratories, 2009.	es_ES
dc.description.references	M.-L. Li, R. Sasanka, S.V. Adve, Y.-K. Chen, E. Debes, The ALPBench benchmark suite for complex multimedia applications, in: Proceedings of the IEEE International Workload Characterization Symposium, IISWC’05, 2005.	es_ES
dc.description.references	Valero, A., Petit, S., Sahuquillo, J., Kaeli, D. R., & Duato, J. (2015). A reuse-based refresh policy for energy-aware eDRAM caches. Microprocessors and Microsystems, 39(1), 37-48. doi:10.1016/j.micpro.2014.12.001	es_ES
dc.description.references	Valero, A., Sahuquillo, J., Petit, S., López, P., & Duato, J. (2012). Combining recency of information with selective random and a victim cache in last-level caches. ACM Transactions on Architecture and Code Optimization, 9(3), 1-20. doi:10.1145/2355585.2355589	es_ES
dc.description.references	S. Kim, D. Chandra, Y. Solihin, Fair cache sharing and partitioning in a chip multiprocessor architecture, in: PACT, 2004, pp. 111–122.	es_ES
dc.description.references	Sahuquillo, J., & Pont, A. (2000). Splitting the data cache: a survey. IEEE Concurrency, 8(3), 30-35. doi:10.1109/4434.865890	es_ES
dc.description.references	J.A. Rivers, E.S. Tam, G.S. Tyson, E.S. Davidson, M.K. Farrens, Utilizing reuse information in data cache management, in: Proceedings of the 12th International Conference on Supercomputing, ICS 1998, Melbourne, Australia, July 13–17, 1998, pp. 449–456. http://dx.doi.org/10.1145/277830.277941.	es_ES
dc.description.references	J. Sahuquillo, A. Pont, The filter cache: A run-time cache management approach, in: 25th EUROMICRO ’99 Conference, Informatics: Theory and Practice for the New Millennium, 8–10 September 1999, Milan, Italy, 1999, pp. 1424–1431. http://dx.doi.org/10.1109/EURMIC.1999.794504.	es_ES
dc.description.references	Chishti, Z., Powell, M. D., & Vijaykumar, T. N. (2005). Optimizing Replication, Communication, and Capacity Allocation in CMPs. ACM SIGARCH Computer Architecture News, 33(2), 357-368. doi:10.1145/1080695.1070001	es_ES
dc.description.references	Hardavellas, N., Ferdman, M., Falsafi, B., & Ailamaki, A. (2009). Reactive NUCA. ACM SIGARCH Computer Architecture News, 37(3), 184-195. doi:10.1145/1555815.1555779	es_ES
dc.description.references	Tsai, P.-A., Beckmann, N., & Sanchez, D. (2017). Jenga. ACM SIGARCH Computer Architecture News, 45(2), 652-665. doi:10.1145/3140659.3080214	es_ES
dc.description.references	D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. Jouppi, M. Fiorentino, A. Davis, N. Binkert, R. Beausoleil, J. Ahn, Corona: System implications of emerging nanophotonic technology, in: Computer Architecture, 2008. ISCA ’08. 35th International Symposium on, pp. 153–164. http://dx.doi.org/10.1109/ISCA.2008.35.	es_ES
dc.description.references	Y. Pan, J. Kim, G. Memik, FlexiShare: Channel sharing for an energy-efficient nanophotonic crossbar, in: High Performance Computer Architecture, 2010 IEEE 16th International Symposium, pp. 1–12. http://dx.doi.org/10.1109/HPCA.2010.5416626.	es_ES
dc.description.references	Pan, Y., Kumar, P., Kim, J., Memik, G., Zhang, Y., & Choudhary, A. (2009). Firefly. ACM SIGARCH Computer Architecture News, 37(3), 429-440. doi:10.1145/1555815.1555808	es_ES
dc.description.references	Li, C., Browning, M., Gratz, P. V., & Palermo, S. (2014). LumiNOC: A Power-Efficient, High-Performance, Photonic Network-on-Chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 33(6), 826-838. doi:10.1109/tcad.2014.2320510	es_ES
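The abstract describes FOS-Mt's flattened second level only at a high level. The Python sketch below is illustrative and is not the authors' implementation; it models the idea under stated assumptions: a fixed pool of small buffers, a hypothetical per-thread `expected_gain` estimate standing in for the actual allocation criterion (which the abstract does not specify), and static energy charged only for powered-on (allocated) buffers. The names `FlatL2`, `request`, `release`, `leakage_per_buffer`, and `threshold` are hypothetical. It also shows the energy-delay product (EDP), energy multiplied by execution time, which is the metric the abstract reports.

```python
from dataclasses import dataclass, field


@dataclass
class FlatL2:
    """Toy model of a flattened second-level cache sliced into small buffers.

    Hypothetical sketch: the buffer count, the per-buffer leakage value and
    the expected-gain threshold are illustrative assumptions, not paper data.
    """
    num_buffers: int
    leakage_per_buffer: float                   # static energy per buffer per cycle (arbitrary units)
    owner: dict = field(default_factory=dict)   # buffer id -> thread id (allocated = powered on)

    def free_buffers(self) -> list:
        """Buffers not owned by any thread; these are kept powered off."""
        return [b for b in range(self.num_buffers) if b not in self.owner]

    def request(self, thread: int, expected_gain: float, threshold: float = 0.0) -> bool:
        """Power on and assign a free buffer only if it is expected to help.

        `expected_gain` stands in for whatever benefit estimate the real
        allocation policy uses (assumption).
        """
        free = self.free_buffers()
        if expected_gain <= threshold or not free:
            return False
        self.owner[free[0]] = thread
        return True

    def release(self, thread: int) -> None:
        """Return all buffers held by `thread` to the pool, powering them off."""
        self.owner = {b: t for b, t in self.owner.items() if t != thread}

    def powered_on(self) -> int:
        return len(self.owner)

    def static_energy(self, cycles: int) -> float:
        """Leakage energy over `cycles`; powered-off buffers contribute nothing."""
        return self.powered_on() * self.leakage_per_buffer * cycles


def energy_delay_product(energy: float, delay: float) -> float:
    """EDP = energy x delay (lower is better)."""
    return energy * delay


if __name__ == "__main__":
    l2 = FlatL2(num_buffers=8, leakage_per_buffer=0.5)
    l2.request(thread=0, expected_gain=0.2)   # granted: positive expected gain
    l2.request(thread=1, expected_gain=0.0)   # refused: no expected benefit
    print(l2.powered_on())                    # 1
    print(l2.static_energy(cycles=100))       # 50.0
```

Granting the first free buffer is a simplifying assumption; a realistic policy would also have to decide which thread yields a buffer when the pool is exhausted and how the benefit estimate is obtained at run time.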

This item appears in the following collection(s)

Articles, conferences, monographs [48344]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
