Mostrar el registro sencillo del ítem
dc.contributor.author | Valls, Joan J | es_ES |
dc.contributor.author | Ros Bardisa, Alberto | es_ES |
dc.contributor.author | Sahuquillo Borrás, Julio | es_ES |
dc.contributor.author | Gómez Requena, María Engracia | es_ES |
dc.date.accessioned | 2016-05-17T14:08:48Z | |
dc.date.available | 2016-05-17T14:08:48Z | |
dc.date.issued | 2015-08 | |
dc.identifier.issn | 1573-0484 | |
dc.identifier.uri | http://hdl.handle.net/10251/64269 | |
dc.description | The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-014-1332-5 | es_ES |
dc.description.abstract | As the number of cores increases in current and future chip-multiprocessor (CMP) generations, coherence protocols must rely on novel hardware structures to scale in terms of performance, power, and area. Systems that use directory information for coherence purposes are currently the most scalable alternative. This paper studies the important differences between the directory behavior of private and shared blocks, which claim for a separate management of both types of blocks at the directory. We propose the PS directory, a two-level directory cache that keeps the reduced number of frequently accessed shared entries in a small and fast first-level cache, namely Shared cache, and uses a larger and slower second-level Private cache to track the large amount of private blocks. Entries in the Private cache do not implement the sharer vector, which allows important silicon area savings. Speed and area reasons suggest the use of eDRAM technology, much denser but slower than SRAM technology, for the Private cache, which in turn brings energy savings. Experimental results for a 16-core CMP show that, compared to a conventional directory, the PS directory improves performance by 14 % while reducing silicon area and energy consumption by 34 and 27 %, respectively. Also, compared to the state-of-the-art Multi-Grain Directory, the PS directory apart from increasing performance, it reduces power by 18.7 %, and provides more scalability in terms of area. | es_ES |
dc.description.sponsorship | This work has been jointly supported by the MINECO and European Commission (FEDER funds) under the project TIN2012-38341-C04-01 and the Fundacion Seneca-Agencia de Ciencia y Tecnologia de la Region de Murcia under the project Jovenes Lideres en Investigacion 18956/JLI/13. | en_EN |
dc.language | Inglés | es_ES |
dc.publisher | Springer Verlag (Germany) | es_ES |
dc.relation.ispartof | Journal of Supercomputing | es_ES |
dc.rights | Reserva de todos los derechos | es_ES |
dc.subject | Chip-multiprocessors | es_ES |
dc.subject | Coherence | es_ES |
dc.subject | Sparse directories | es_ES |
dc.subject | Energy-aware | es_ES |
dc.subject | Area reduction | es_ES |
dc.subject | EDRAM | es_ES |
dc.subject.classification | ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES | es_ES |
dc.title | PS directory: a scalable multilevel directory cache for CMPs | es_ES |
dc.type | Artículo | es_ES |
dc.identifier.doi | 10.1007/s11227-014-1332-5 | |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2012-38341-C04-01/ES/MEJORA DE LA ARQUITECTURA DE SERVIDORES, SERVICIOS Y APLICACIONES/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/f SéNeCa//18956%2FJLI%2F13/ | es_ES |
dc.rights.accessRights | Abierto | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Informática de Sistemas y Computadores - Departament d'Informàtica de Sistemes i Computadors | es_ES |
dc.description.bibliographicCitation | Valls, JJ.; Ros Bardisa, A.; Sahuquillo Borrás, J.; Gómez Requena, ME. (2015). PS directory: a scalable multilevel directory cache for CMPs. Journal of Supercomputing. 71(8):2847-2876. https://doi.org/10.1007/s11227-014-1332-5 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | http://link.springer.com/article/10.1007%2Fs11227-014-1332-5 | es_ES |
dc.description.upvformatpinicio | 2847 | es_ES |
dc.description.upvformatpfin | 2876 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 71 | es_ES |
dc.description.issue | 8 | es_ES |
dc.relation.senia | 292477 | es_ES |
dc.identifier.eissn | 1573-0484 | |
dc.contributor.funder | Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.description.references | Acacio ME, González J, García JM, Duato J (2001) A new scalable directory architecture for large-scale multiprocessors. In: Proceedings of 7th international symposium on high-performance computer architecture (HPCA), pp 97–106 | es_ES |
dc.description.references | Acacio ME, González J, García JM, Duato J (2005) A two-level directory architecture for highly scalable cc-NUMA multiprocessors. IEEE Trans Parallel Distrib Syst (TPDS) 16(1):67–79 | es_ES |
dc.description.references | Agarwal N, Krishna T, Peh L-S, Jha NK (2009) GARNET: a detailed on-chip network model inside a full-system simulator. In: Proceedings of IEEE international symposium on performance analysis of systems and software (ISPASS), pp 33–42 | es_ES |
dc.description.references | Barroso LA, Gharachorloo K, McNamara R et al (2000) Piranha: a scalable architecture based on single-chip multiprocessing. In: Proceedings of 27th international symposium on computer architecture (ISCA), pp 12–14 | es_ES |
dc.description.references | Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of 17th international conference on parallel architectures and compilation techniques (PACT), pp 72–81 | es_ES |
dc.description.references | Chaiken D, Kubiatowicz J, Agarwal A (1991) LimitLESS directories: a scalable cache coherence scheme. In: 4th international conference on architectural support for programming language and operating systems (ASPLOS), pp 224–234 | es_ES |
dc.description.references | Chen G (1993) Slid: a cost-effective and scalable limited-directory scheme for cache coherence. In: 5th international conference on parallel architectures and languages Europe (PARLE), pp 341–352 | es_ES |
dc.description.references | Conway P, Kalyanasundharam N, Donley G, Lepak K, Hughes B (2010) Cache hierarchy and memory subsystem of the AMD opteron processor. IEEE Micro 30(2):16–29 | es_ES |
dc.description.references | Cuesta B, Ros A, Gómez ME, Robles A, Duato J (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. In: Proceedings of 38th international symposium on computer architecture (ISCA), pp 93–103 | es_ES |
dc.description.references | Ferdman M, Lotfi-Kamran P, Balet K, Falsafi B (2011) Cuckoo directory: a scalable directory for many-core systems. In: 17th international symposium on high-performance computer architecture (HPCA), pp 169–180 | es_ES |
dc.description.references | Guo S-L, Wang H-X, Xue Y-B, Li C-M, Wang D-S (2010) Hierarchical cache directory for cmp. J Comput Sci Technol 25(2):246–256 | es_ES |
dc.description.references | Gupta A, Weber W-D, Mowry TC (1990) Reducing memory traffic requirements for scalable directory-based cache coherence schemes. In: Proceedings of international conference on parallel processing (ICPP), pp 312–321 | es_ES |
dc.description.references | Kalla R, Sinharoy B, Starke WJ, Floyd M (2010) POWER7: IBMs next-generation server processor. IEEE Micro 30(2):7–15 | es_ES |
dc.description.references | Kim C, Burger D, Keckler SW (2002) An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In: Proceedings of 10th international conference on architectural support for programming language and operating systems (ASPLOS), pp 211–222 | es_ES |
dc.description.references | Luk C-K, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of ACM SIGPLAN conference on programming language design and implementation (PLDI), June 2005, pp 190–200 | es_ES |
dc.description.references | Magnusson PS, Christensson M, Eskilson J et al (2002) Simics: a full system simulation platform. IEEE Comput 35(2):50–58 | es_ES |
dc.description.references | Martin MM, Sorin DJ, Beckmann BM et al (2005) Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. Comput Archit News 33(4):92–99 | es_ES |
dc.description.references | Marty MR, Hill MD (2007) Virtual hierarchies to support server consolidation. In: Proceedings of 34th international symposium on computer architecture (ISCA), pp 46–56 | es_ES |
dc.description.references | Marty MR, Hill MD (2008) Virtual hierarchies. IEEE Micro 28(1):99–109 | es_ES |
dc.description.references | Matick RE, Schuster SE (2005) Logic-based eDRAM: origins and rationale for use. IBM J Res Dev 49(1):145–165 | es_ES |
dc.description.references | Muralimanohar N, Balasubramonian R, Jouppi NP (2009) Cacti 6.0, HP Labs, technical report HPL-2009-85 | es_ES |
dc.description.references | O’Krafka BW, Newton AR (1990) An empirical evaluation of two memory-efficient directory methods. In: Proceedings of 17th international symposium on computer architecture (ISCA), pp 138–147 | es_ES |
dc.description.references | Ros A, Acacio ME, García JM (2010) A scalable organization for distributed directories. J Syst Archit (JSA) 56(2–3):77–87 | es_ES |
dc.description.references | Ros A, Cuesta B, Fernández-Pascual R, Gómez ME, Acacio ME, Robles A, García JM, Duato J (2012) Extending magny-cours cache coherence. IEEE Trans Comput (TC) 61(5):593–606 | es_ES |
dc.description.references | Sanchez D, Kozyrakis C (2012) SCD: a scalable coherence directory with flexible sharer set encoding. In: Proceedings of 18th international sympoium on high-performance computer architecture (HPCA), pp 129–140 | es_ES |
dc.description.references | Shah M, Barreh J, Brooks J et al (2007) UltraSPARC T2: a highly-threaded, power-efficient, SPARC SoC. In: Proceedings of IEEE Asian solid-state circuits conference, pp 22–25 | es_ES |
dc.description.references | Sinharoy B, Kalla RN, Tendler JM, Eickemeyer RJ, Joyner JB (2005) Power5 system microarchitecture. IBM J Res Dev 49(4/5):505–521 | es_ES |
dc.description.references | Tendler JM, Dodson JS, Fields JS, Le H, Sinharoy B (2002) POWER4 system microarchitecture. IBM J Res Dev 46(1):5–25 | es_ES |
dc.description.references | Valero A, Sahuquillo J, Petit S, Lorente V, Canal R, López P, Duato J (2009) An hybrid eDRAM/SRAM macrocell to implement first-level data caches. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 213–221 | es_ES |
dc.description.references | Valls JJ, Ros A, Sahuquillo J, Gómez ME, Duato J (2012) PS-Dir: a scalable two-level directory cache. In: Proceedings of 21st international conference on parallel architectures and compilation techniques (PACT), pp 451–452 | es_ES |
dc.description.references | Woo SC, Ohara M, Torrie E, Singh JP, Gupta A (1995) The SPLASH-2 programs: characterization and methodological considerations. In: Proceedings of 22nd international symposium on computer architecture (ISCA), pp 24–36 | es_ES |
dc.description.references | Wu X, Li J, Zhang L, Speight E, Rajamony R, Xie Y (2009) Hybrid cache architecture with disparate memory technologies. In: Proceedings of 36th international symposium on computer architecture (ISCA), pp 34–45 | es_ES |
dc.description.references | Zebchuk J, Falsafi B, Moshovos A (2013) Multi-grain coherence directories. In: Proceedings of 46th IEEE/ACM international symposium on microarchitecture (MICRO), pp 359–370 | es_ES |
dc.description.references | Zebchuk J, Srinivasan V, Qureshi MK, Moshovos A (2009) A tagless coherence directory. In: Proceedings of 42nd IEEE/ACM international symposium on microarchitecture (MICRO), pp 423–434 | es_ES |
dc.description.references | Zhao H, Shriraman A, Dwarkadas S, Srinivasan V (2011) SPATL: Honey, I shrunk the coherence directory. In: Proceedings of 20th international conference on parallel architectures and compilation techniques (PACT), pp 148–157 | es_ES |