PS directory: a scalable multilevel directory cache for CMPs

Valls, Joan J; Ros Bardisa, Alberto; Sahuquillo Borrás, Julio; Gómez Requena, María Engracia

doi:10.1007/s11227-014-1332-5

Identificarse

Buscar en RiuNet

Listar

Todo RiuNet
Esta colección

Mi cuenta

Acceder

Estadísticas

Ver Estadísticas de uso

Ayuda RiuNet

Admin. UPV

Compartir/Enviar a

Citas

Estadísticas

PS directory: a scalable multilevel directory cache for CMPs

Mostrar el registro sencillo del ítem

Ficheros en el ítem

Nombre: art%3A10.1007%2Fs ...

Tamaño: 1.022Mb

Formato: PDF

Descripción: Versión editorial

Solicitar una copia al autor

Nombre: jjvalls-jsc15b.pdf

Tamaño: 438.8Kb

Formato: PDF

Descripción: Versión del autor

Abrir

dc.contributor.author	Valls, Joan J	es_ES
dc.contributor.author	Ros Bardisa, Alberto	es_ES
dc.contributor.author	Sahuquillo Borrás, Julio	es_ES
dc.contributor.author	Gómez Requena, María Engracia	es_ES
dc.date.accessioned	2016-05-17T14:08:48Z
dc.date.available	2016-05-17T14:08:48Z
dc.date.issued	2015-08
dc.identifier.issn	1573-0484
dc.identifier.uri	http://hdl.handle.net/10251/64269
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-014-1332-5	es_ES
dc.description.abstract	As the number of cores increases in current and future chip-multiprocessor (CMP) generations, coherence protocols must rely on novel hardware structures to scale in terms of performance, power, and area. Systems that use directory information for coherence purposes are currently the most scalable alternative. This paper studies the important differences between the directory behavior of private and shared blocks, which claim for a separate management of both types of blocks at the directory. We propose the PS directory, a two-level directory cache that keeps the reduced number of frequently accessed shared entries in a small and fast first-level cache, namely Shared cache, and uses a larger and slower second-level Private cache to track the large amount of private blocks. Entries in the Private cache do not implement the sharer vector, which allows important silicon area savings. Speed and area reasons suggest the use of eDRAM technology, much denser but slower than SRAM technology, for the Private cache, which in turn brings energy savings. Experimental results for a 16-core CMP show that, compared to a conventional directory, the PS directory improves performance by 14 % while reducing silicon area and energy consumption by 34 and 27 %, respectively. Also, compared to the state-of-the-art Multi-Grain Directory, the PS directory apart from increasing performance, it reduces power by 18.7 %, and provides more scalability in terms of area.	es_ES
dc.description.sponsorship	This work has been jointly supported by the MINECO and European Commission (FEDER funds) under the project TIN2012-38341-C04-01 and the Fundacion Seneca-Agencia de Ciencia y Tecnologia de la Region de Murcia under the project Jovenes Lideres en Investigacion 18956/JLI/13.	en_EN
dc.language	Inglés	es_ES
dc.publisher	Springer Verlag (Germany)	es_ES
dc.relation.ispartof	Journal of Supercomputing	es_ES
dc.rights	Reserva de todos los derechos	es_ES
dc.subject	Chip-multiprocessors	es_ES
dc.subject	Coherence	es_ES
dc.subject	Sparse directories	es_ES
dc.subject	Energy-aware	es_ES
dc.subject	Area reduction	es_ES
dc.subject	EDRAM	es_ES
dc.subject.classification	ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES	es_ES
dc.title	PS directory: a scalable multilevel directory cache for CMPs	es_ES
dc.type	Artículo	es_ES
dc.identifier.doi	10.1007/s11227-014-1332-5
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-38341-C04-01/ES/MEJORA DE LA ARQUITECTURA DE SERVIDORES, SERVICIOS Y APLICACIONES/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/f SéNeCa//18956%2FJLI%2F13/	es_ES
dc.rights.accessRights	Abierto	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors	es_ES
dc.description.bibliographicCitation	Valls, JJ.; Ros Bardisa, A.; Sahuquillo Borrás, J.; Gómez Requena, ME. (2015). PS directory: a scalable multilevel directory cache for CMPs. Journal of Supercomputing. 71(8):2847-2876. https://doi.org/10.1007/s11227-014-1332-5	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://link.springer.com/article/10.1007%2Fs11227-014-1332-5	es_ES
dc.description.upvformatpinicio	2847	es_ES
dc.description.upvformatpfin	2876	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	71	es_ES
dc.description.issue	8	es_ES
dc.relation.senia	292477	es_ES
dc.identifier.eissn	1573-0484
dc.contributor.funder	Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.description.references	Acacio ME, González J, García JM, Duato J (2001) A new scalable directory architecture for large-scale multiprocessors. In: Proceedings of 7th international symposium on high-performance computer architecture (HPCA), pp 97–106	es_ES
dc.description.references	Acacio ME, González J, García JM, Duato J (2005) A two-level directory architecture for highly scalable cc-NUMA multiprocessors. IEEE Trans Parallel Distrib Syst (TPDS) 16(1):67–79	es_ES
dc.description.references	Agarwal N, Krishna T, Peh L-S, Jha NK (2009) GARNET: a detailed on-chip network model inside a full-system simulator. In: Proceedings of IEEE international symposium on performance analysis of systems and software (ISPASS), pp 33–42	es_ES
dc.description.references	Barroso LA, Gharachorloo K, McNamara R et al (2000) Piranha: a scalable architecture based on single-chip multiprocessing. In: Proceedings of 27th international symposium on computer architecture (ISCA), pp 12–14	es_ES
dc.description.references	Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of 17th international conference on parallel architectures and compilation techniques (PACT), pp 72–81	es_ES
dc.description.references	Chaiken D, Kubiatowicz J, Agarwal A (1991) LimitLESS directories: a scalable cache coherence scheme. In: 4th international conference on architectural support for programming language and operating systems (ASPLOS), pp 224–234	es_ES
dc.description.references	Chen G (1993) Slid: a cost-effective and scalable limited-directory scheme for cache coherence. In: 5th international conference on parallel architectures and languages Europe (PARLE), pp 341–352	es_ES
dc.description.references	Conway P, Kalyanasundharam N, Donley G, Lepak K, Hughes B (2010) Cache hierarchy and memory subsystem of the AMD opteron processor. IEEE Micro 30(2):16–29	es_ES
dc.description.references	Cuesta B, Ros A, Gómez ME, Robles A, Duato J (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. In: Proceedings of 38th international symposium on computer architecture (ISCA), pp 93–103	es_ES
dc.description.references	Ferdman M, Lotfi-Kamran P, Balet K, Falsafi B (2011) Cuckoo directory: a scalable directory for many-core systems. In: 17th international symposium on high-performance computer architecture (HPCA), pp 169–180	es_ES
dc.description.references	Guo S-L, Wang H-X, Xue Y-B, Li C-M, Wang D-S (2010) Hierarchical cache directory for cmp. J Comput Sci Technol 25(2):246–256	es_ES
dc.description.references	Gupta A, Weber W-D, Mowry TC (1990) Reducing memory traffic requirements for scalable directory-based cache coherence schemes. In: Proceedings of international conference on parallel processing (ICPP), pp 312–321	es_ES
dc.description.references	Kalla R, Sinharoy B, Starke WJ, Floyd M (2010) POWER7: IBMs next-generation server processor. IEEE Micro 30(2):7–15	es_ES
dc.description.references	Kim C, Burger D, Keckler SW (2002) An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In: Proceedings of 10th international conference on architectural support for programming language and operating systems (ASPLOS), pp 211–222	es_ES
dc.description.references	Luk C-K, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of ACM SIGPLAN conference on programming language design and implementation (PLDI), June 2005, pp 190–200	es_ES
dc.description.references	Magnusson PS, Christensson M, Eskilson J et al (2002) Simics: a full system simulation platform. IEEE Comput 35(2):50–58	es_ES
dc.description.references	Martin MM, Sorin DJ, Beckmann BM et al (2005) Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. Comput Archit News 33(4):92–99	es_ES
dc.description.references	Marty MR, Hill MD (2007) Virtual hierarchies to support server consolidation. In: Proceedings of 34th international symposium on computer architecture (ISCA), pp 46–56	es_ES
dc.description.references	Marty MR, Hill MD (2008) Virtual hierarchies. IEEE Micro 28(1):99–109	es_ES
dc.description.references	Matick RE, Schuster SE (2005) Logic-based eDRAM: origins and rationale for use. IBM J Res Dev 49(1):145–165	es_ES
dc.description.references	Muralimanohar N, Balasubramonian R, Jouppi NP (2009) Cacti 6.0, HP Labs, technical report HPL-2009-85	es_ES
dc.description.references	O’Krafka BW, Newton AR (1990) An empirical evaluation of two memory-efficient directory methods. In: Proceedings of 17th international symposium on computer architecture (ISCA), pp 138–147	es_ES
dc.description.references	Ros A, Acacio ME, García JM (2010) A scalable organization for distributed directories. J Syst Archit (JSA) 56(2–3):77–87	es_ES
dc.description.references	Ros A, Cuesta B, Fernández-Pascual R, Gómez ME, Acacio ME, Robles A, García JM, Duato J (2012) Extending magny-cours cache coherence. IEEE Trans Comput (TC) 61(5):593–606	es_ES
dc.description.references	Sanchez D, Kozyrakis C (2012) SCD: a scalable coherence directory with flexible sharer set encoding. In: Proceedings of 18th international sympoium on high-performance computer architecture (HPCA), pp 129–140	es_ES
dc.description.references	Shah M, Barreh J, Brooks J et al (2007) UltraSPARC T2: a highly-threaded, power-efficient, SPARC SoC. In: Proceedings of IEEE Asian solid-state circuits conference, pp 22–25	es_ES
dc.description.references	Sinharoy B, Kalla RN, Tendler JM, Eickemeyer RJ, Joyner JB (2005) Power5 system microarchitecture. IBM J Res Dev 49(4/5):505–521	es_ES
dc.description.references	Tendler JM, Dodson JS, Fields JS, Le H, Sinharoy B (2002) POWER4 system microarchitecture. IBM J Res Dev 46(1):5–25	es_ES
dc.description.references	Valero A, Sahuquillo J, Petit S, Lorente V, Canal R, López P, Duato J (2009) An hybrid eDRAM/SRAM macrocell to implement first-level data caches. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 213–221	es_ES
dc.description.references	Valls JJ, Ros A, Sahuquillo J, Gómez ME, Duato J (2012) PS-Dir: a scalable two-level directory cache. In: Proceedings of 21st international conference on parallel architectures and compilation techniques (PACT), pp 451–452	es_ES
dc.description.references	Woo SC, Ohara M, Torrie E, Singh JP, Gupta A (1995) The SPLASH-2 programs: characterization and methodological considerations. In: Proceedings of 22nd international symposium on computer architecture (ISCA), pp 24–36	es_ES
dc.description.references	Wu X, Li J, Zhang L, Speight E, Rajamony R, Xie Y (2009) Hybrid cache architecture with disparate memory technologies. In: Proceedings of 36th international symposium on computer architecture (ISCA), pp 34–45	es_ES
dc.description.references	Zebchuk J, Falsafi B, Moshovos A (2013) Multi-grain coherence directories. In: Proceedings of 46th IEEE/ACM international symposium on microarchitecture (MICRO), pp 359–370	es_ES
dc.description.references	Zebchuk J, Srinivasan V, Qureshi MK, Moshovos A (2009) A tagless coherence directory. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 423–434	es_ES
dc.description.references	Zhao H, Shriraman A, Dwarkadas S, Srinivasan V (2011) SPATL: Honey, I shrunk the coherence directory. In: Proceedings of 20th international conference on parallel architectures and compilation techniques (PACT), pp 148–157	es_ES

Este ítem aparece en la(s) siguiente(s) colección(ones)

Artículos, conferencias, monografías [48344]

Mostrar el registro sencillo del ítem

PS directory: a scalable multilevel directory cache for CMPs

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Buscar en RiuNet

Listar

Todo RiuNet

Esta colección

Mi cuenta

Estadísticas

Ayuda RiuNet

Admin. UPV

Compartir/Enviar a

Citas

Estadísticas

PS directory: a scalable multilevel directory cache for CMPs

Ficheros en el ítem

Este ítem aparece en la(s) siguiente(s) colección(ones)