- -

PS directory: a scalable multilevel directory cache for CMPs

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Compartir/Enviar a

Citas

Estadísticas

  • Estadisticas de Uso

PS directory: a scalable multilevel directory cache for CMPs

Mostrar el registro sencillo del ítem

Ficheros en el ítem

dc.contributor.author Valls, Joan J es_ES
dc.contributor.author Ros Bardisa, Alberto es_ES
dc.contributor.author Sahuquillo Borrás, Julio es_ES
dc.contributor.author Gómez Requena, María Engracia es_ES
dc.date.accessioned 2016-05-17T14:08:48Z
dc.date.available 2016-05-17T14:08:48Z
dc.date.issued 2015-08
dc.identifier.issn 1573-0484
dc.identifier.uri http://hdl.handle.net/10251/64269
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-014-1332-5 es_ES
dc.description.abstract As the number of cores increases in current and future chip-multiprocessor (CMP) generations, coherence protocols must rely on novel hardware structures to scale in terms of performance, power, and area. Systems that use directory information for coherence purposes are currently the most scalable alternative. This paper studies the important differences between the directory behavior of private and shared blocks, which claim for a separate management of both types of blocks at the directory. We propose the PS directory, a two-level directory cache that keeps the reduced number of frequently accessed shared entries in a small and fast first-level cache, namely Shared cache, and uses a larger and slower second-level Private cache to track the large amount of private blocks. Entries in the Private cache do not implement the sharer vector, which allows important silicon area savings. Speed and area reasons suggest the use of eDRAM technology, much denser but slower than SRAM technology, for the Private cache, which in turn brings energy savings. Experimental results for a 16-core CMP show that, compared to a conventional directory, the PS directory improves performance by 14 % while reducing silicon area and energy consumption by 34 and 27 %, respectively. Also, compared to the state-of-the-art Multi-Grain Directory, the PS directory apart from increasing performance, it reduces power by 18.7 %, and provides more scalability in terms of area. es_ES
dc.description.sponsorship This work has been jointly supported by the MINECO and European Commission (FEDER funds) under the project TIN2012-38341-C04-01 and the Fundacion Seneca-Agencia de Ciencia y Tecnologia de la Region de Murcia under the project Jovenes Lideres en Investigacion 18956/JLI/13. en_EN
dc.language Inglés es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation.ispartof Journal of Supercomputing es_ES
dc.rights Reserva de todos los derechos es_ES
dc.subject Chip-multiprocessors es_ES
dc.subject Coherence es_ES
dc.subject Sparse directories es_ES
dc.subject Energy-aware es_ES
dc.subject Area reduction es_ES
dc.subject EDRAM es_ES
dc.subject.classification ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES es_ES
dc.title PS directory: a scalable multilevel directory cache for CMPs es_ES
dc.type Artículo es_ES
dc.identifier.doi 10.1007/s11227-014-1332-5
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2012-38341-C04-01/ES/MEJORA DE LA ARQUITECTURA DE SERVIDORES, SERVICIOS Y APLICACIONES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/f SéNeCa//18956%2FJLI%2F13/ es_ES
dc.rights.accessRights Abierto es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors es_ES
dc.description.bibliographicCitation Valls, JJ.; Ros Bardisa, A.; Sahuquillo Borrás, J.; Gómez Requena, ME. (2015). PS directory: a scalable multilevel directory cache for CMPs. Journal of Supercomputing. 71(8):2847-2876. https://doi.org/10.1007/s11227-014-1332-5 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://link.springer.com/article/10.1007%2Fs11227-014-1332-5 es_ES
dc.description.upvformatpinicio 2847 es_ES
dc.description.upvformatpfin 2876 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 71 es_ES
dc.description.issue 8 es_ES
dc.relation.senia 292477 es_ES
dc.identifier.eissn 1573-0484
dc.contributor.funder Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.description.references Acacio ME, González J, García JM, Duato J (2001) A new scalable directory architecture for large-scale multiprocessors. In: Proceedings of 7th international symposium on high-performance computer architecture (HPCA), pp 97–106 es_ES
dc.description.references Acacio ME, González J, García JM, Duato J (2005) A two-level directory architecture for highly scalable cc-NUMA multiprocessors. IEEE Trans Parallel Distrib Syst (TPDS) 16(1):67–79 es_ES
dc.description.references Agarwal N, Krishna T, Peh L-S, Jha NK (2009) GARNET: a detailed on-chip network model inside a full-system simulator. In: Proceedings of IEEE international symposium on performance analysis of systems and software (ISPASS), pp 33–42 es_ES
dc.description.references Barroso LA, Gharachorloo K, McNamara R et al (2000) Piranha: a scalable architecture based on single-chip multiprocessing. In: Proceedings of 27th international symposium on computer architecture (ISCA), pp 12–14 es_ES
dc.description.references Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of 17th international conference on parallel architectures and compilation techniques (PACT), pp 72–81 es_ES
dc.description.references Chaiken D, Kubiatowicz J, Agarwal A (1991) LimitLESS directories: a scalable cache coherence scheme. In: 4th international conference on architectural support for programming language and operating systems (ASPLOS), pp 224–234 es_ES
dc.description.references Chen G (1993) Slid: a cost-effective and scalable limited-directory scheme for cache coherence. In: 5th international conference on parallel architectures and languages Europe (PARLE), pp 341–352 es_ES
dc.description.references Conway P, Kalyanasundharam N, Donley G, Lepak K, Hughes B (2010) Cache hierarchy and memory subsystem of the AMD opteron processor. IEEE Micro 30(2):16–29 es_ES
dc.description.references Cuesta B, Ros A, Gómez ME, Robles A, Duato J (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. In: Proceedings of 38th international symposium on computer architecture (ISCA), pp 93–103 es_ES
dc.description.references Ferdman M, Lotfi-Kamran P, Balet K, Falsafi B (2011) Cuckoo directory: a scalable directory for many-core systems. In: 17th international symposium on high-performance computer architecture (HPCA), pp 169–180 es_ES
dc.description.references Guo S-L, Wang H-X, Xue Y-B, Li C-M, Wang D-S (2010) Hierarchical cache directory for cmp. J Comput Sci Technol 25(2):246–256 es_ES
dc.description.references Gupta A, Weber W-D, Mowry TC (1990) Reducing memory traffic requirements for scalable directory-based cache coherence schemes. In: Proceedings of international conference on parallel processing (ICPP), pp 312–321 es_ES
dc.description.references Kalla R, Sinharoy B, Starke WJ, Floyd M (2010) POWER7: IBMs next-generation server processor. IEEE Micro 30(2):7–15 es_ES
dc.description.references Kim C, Burger D, Keckler SW (2002) An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In: Proceedings of 10th international conference on architectural support for programming language and operating systems (ASPLOS), pp 211–222 es_ES
dc.description.references Luk C-K, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of ACM SIGPLAN conference on programming language design and implementation (PLDI), June 2005, pp 190–200 es_ES
dc.description.references Magnusson PS, Christensson M, Eskilson J et al (2002) Simics: a full system simulation platform. IEEE Comput 35(2):50–58 es_ES
dc.description.references Martin MM, Sorin DJ, Beckmann BM et al (2005) Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. Comput Archit News 33(4):92–99 es_ES
dc.description.references Marty MR, Hill MD (2007) Virtual hierarchies to support server consolidation. In: Proceedings of 34th international symposium on computer architecture (ISCA), pp 46–56 es_ES
dc.description.references Marty MR, Hill MD (2008) Virtual hierarchies. IEEE Micro 28(1):99–109 es_ES
dc.description.references Matick RE, Schuster SE (2005) Logic-based eDRAM: origins and rationale for use. IBM J Res Dev 49(1):145–165 es_ES
dc.description.references Muralimanohar N, Balasubramonian R, Jouppi NP (2009) Cacti 6.0, HP Labs, technical report HPL-2009-85 es_ES
dc.description.references O’Krafka BW, Newton AR (1990) An empirical evaluation of two memory-efficient directory methods. In: Proceedings of 17th international symposium on computer architecture (ISCA), pp 138–147 es_ES
dc.description.references Ros A, Acacio ME, García JM (2010) A scalable organization for distributed directories. J Syst Archit (JSA) 56(2–3):77–87 es_ES
dc.description.references Ros A, Cuesta B, Fernández-Pascual R, Gómez ME, Acacio ME, Robles A, García JM, Duato J (2012) Extending magny-cours cache coherence. IEEE Trans Comput (TC) 61(5):593–606 es_ES
dc.description.references Sanchez D, Kozyrakis C (2012) SCD: a scalable coherence directory with flexible sharer set encoding. In: Proceedings of 18th international sympoium on high-performance computer architecture (HPCA), pp 129–140 es_ES
dc.description.references Shah M, Barreh J, Brooks J et al (2007) UltraSPARC T2: a highly-threaded, power-efficient, SPARC SoC. In: Proceedings of IEEE Asian solid-state circuits conference, pp 22–25 es_ES
dc.description.references Sinharoy B, Kalla RN, Tendler JM, Eickemeyer RJ, Joyner JB (2005) Power5 system microarchitecture. IBM J Res Dev 49(4/5):505–521 es_ES
dc.description.references Tendler JM, Dodson JS, Fields JS, Le H, Sinharoy B (2002) POWER4 system microarchitecture. IBM J Res Dev 46(1):5–25 es_ES
dc.description.references Valero A, Sahuquillo J, Petit S, Lorente V, Canal R, López P, Duato J (2009) An hybrid eDRAM/SRAM macrocell to implement first-level data caches. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 213–221 es_ES
dc.description.references Valls JJ, Ros A, Sahuquillo J, Gómez ME, Duato J (2012) PS-Dir: a scalable two-level directory cache. In: Proceedings of 21st international conference on parallel architectures and compilation techniques (PACT), pp 451–452 es_ES
dc.description.references Woo SC, Ohara M, Torrie E, Singh JP, Gupta A (1995) The SPLASH-2 programs: characterization and methodological considerations. In: Proceedings of 22nd international symposium on computer architecture (ISCA), pp 24–36 es_ES
dc.description.references Wu X, Li J, Zhang L, Speight E, Rajamony R, Xie Y (2009) Hybrid cache architecture with disparate memory technologies. In: Proceedings of 36th international symposium on computer architecture (ISCA), pp 34–45 es_ES
dc.description.references Zebchuk J, Falsafi B, Moshovos A (2013) Multi-grain coherence directories. In: Proceedings of 46th IEEE/ACM international symposium on microarchitecture (MICRO), pp 359–370 es_ES
dc.description.references Zebchuk J, Srinivasan V, Qureshi MK, Moshovos A (2009) A tagless coherence directory. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 423–434 es_ES
dc.description.references Zhao H, Shriraman A, Dwarkadas S, Srinivasan V (2011) SPATL: Honey, I shrunk the coherence directory. In: Proceedings of 20th international conference on parallel architectures and compilation techniques (PACT), pp 148–157 es_ES


Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro sencillo del ítem