dc.contributor.author | Padulano, Vincenzo Eduardo | es_ES |
dc.contributor.author | Tejedor Saavedra, Enric | es_ES |
dc.contributor.author | Alonso-Jordá, Pedro | es_ES |
dc.contributor.author | López Gómez, Javier | es_ES |
dc.contributor.author | Blomer, Jakob | es_ES |
dc.date.accessioned | 2023-01-05T19:01:16Z | |
dc.date.available | 2023-01-05T19:01:16Z | |
dc.date.issued | 2022-10-14 | es_ES |
dc.identifier.issn | 1386-7857 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/191061 | |
dc.description.abstract | [EN] Data analysis workflows in High Energy Physics (HEP) read data written in the ROOT columnar format. Such data has traditionally been stored in files that are often read via the network from remote storage facilities, which represents a performance penalty especially for data processing workflows that are I/O bound. To address that issue, this paper presents a new caching mechanism, implemented in the I/O subsystem of ROOT, which is independent of the storage backend used to write the dataset. Notably, it can be used to leverage the speed of high-bandwidth, low-latency object stores. The performance of this caching approach is evaluated by running a real physics analysis on an Intel DAOS cluster, both on a single node and distributed on multiple nodes. | es_ES |
dc.description.sponsorship | This work benefited from the support of the CERN Strategic R&D Programme on Technologies for Future Experiments [1] and from grant PID2020-113656RB-C22 funded by Ministerio de Ciencia e Innovacion MCIN/AEI/10.13039/501100011033. The hardware used to perform the experimental evaluation involving DAOS (HPE Delphi cluster described in Sect. 5.2) was made available thanks to a collaboration agreement with Hewlett-Packard Enterprise (HPE) and Intel. User access to the Virgo cluster at the GSI institute was given for the purpose of running the benchmarks using the Lustre filesystem. | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer-Verlag | es_ES |
dc.relation.ispartof | Cluster Computing | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | ROOT | es_ES |
dc.subject | High Energy Physics | es_ES |
dc.subject | Caching | es_ES |
dc.subject | Object store | es_ES |
dc.subject | DAOS | es_ES |
dc.subject.classification | COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE | es_ES |
dc.title | A caching mechanism to exploit object store speed in High Energy Physics analysis | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1007/s10586-022-03757-2 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C22/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTES DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UPV/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | Padulano, VE.; Tejedor Saavedra, E.; Alonso-Jordá, P.; López Gómez, J.; Blomer, J. (2022). A caching mechanism to exploit object store speed in High Energy Physics analysis. Cluster Computing. 1-16. https://doi.org/10.1007/s10586-022-03757-2 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1007/s10586-022-03757-2 | es_ES |
dc.description.upvformatpinicio | 1 | es_ES |
dc.description.upvformatpfin | 16 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.pasarela | S\472704 | es_ES |
dc.contributor.funder | AGENCIA ESTATAL DE INVESTIGACION | es_ES |
dc.contributor.funder | Universitat Politècnica de València | es_ES |
dc.description.references | Aleksa, M., Blomer, J., Cure, B., et al.: Strategic R&D Programme on Technologies for Future Experiments. Tech. rep., CERN, Geneva (2018) | es_ES |
dc.description.references | Altenmüller, K., Cebrián, S., Dafni, T., et al.: REST-for-Physics, a ROOT-based framework for event oriented data analysis and combined Monte Carlo response. Comput. Phys. Commun. 273, 108281 (2022). https://doi.org/10.1016/j.cpc.2021.108281 | es_ES |
dc.description.references | Amazon: Amazon Simple Storage Service Documentation. https://docs.aws.amazon.com/s3/. Accessed 1 Feb 2022 (2021) | es_ES |
dc.description.references | Andreozzi, S., Magnoni, L., Zappi, R.: Towards the integration of StoRM on Amazon Simple Storage Service (S3). J. Phys. 119(6), 062011 (2008). https://doi.org/10.1088/1742-6596/119/6/062011 | es_ES |
dc.description.references | Apollinari, G., Béjar Alonso, I., Brüning, O., et al.: High-luminosity large Hadron Collider (HL-LHC): Technical Design Report V. 0.1. Tech. rep., CERN (2017). https://doi.org/10.23731/CYRM-2017-004 | es_ES |
dc.description.references | Arsuaga-Ríos, M., Heikkilä, S.S., Duellmann, D., et al.: Using S3 cloud storage with ROOT and CvmFS. J. Phys. 664(2), 022001 (2015). https://doi.org/10.1088/1742-6596/664/2/022001 | es_ES |
dc.description.references | Badino, P., Barring, O., Baud, J.P., et al.: The Storage Resource Manager Interface Specification (v2.2). (2009) https://sdm.lbl.gov/srm-wg/doc/SRM.v2.2.html | es_ES |
dc.description.references | Bevilacqua, G., Bi, H.Y., Hartanto, H.B., et al.: $t\bar{t}b\bar{b}$ at the LHC: on the size of corrections and b-jet definitions. J. High Energy Phys. 8, 1–37 (2021). https://doi.org/10.1007/JHEP08(2021)008 | es_ES |
dc.description.references | Bird, I.: Computing for the Large Hadron Collider. Annu. Rev. Nucl. Particle Sci. 61(1), 99–118 (2011). https://doi.org/10.1146/annurev-nucl-102010-130059 | es_ES |
dc.description.references | Birrittella, M.S., Debbage, M., Huggahalli, R., et al.: Intel Omni-Path Architecture: enabling scalable, high performance fabrics. In: 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, pp. 1–9 (2015). https://doi.org/10.1109/HOTI.2015.22 | es_ES |
dc.description.references | Blomer, J., Canal, P., Naumann, A., et al.: Evolution of the ROOT Tree I/O. In: 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019) (2020). https://doi.org/10.1051/epjconf/202024502030 | es_ES |
dc.description.references | Braam, P.: The Lustre Storage Architecture. (2019) https://arxiv.org/abs/1903.01955 | es_ES |
dc.description.references | Brun, R., Rademakers, F.: ROOT—an object oriented data analysis framework. Nucl. Instrum. Methods Phys. Res. Sect. A 389(1), 81–86 (1997). https://doi.org/10.1016/S0168-9002(97)00048-X | es_ES |
dc.description.references | Calder, B., Wang, J., Ogus, A. et al.: Windows Azure Storage: a highly available cloud storage service with strong consistency. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. Association for Computing Machinery, New York, NY, USA, SOSP ’11, pp. 143–157, (2011) https://doi.org/10.1145/2043556.2043571 | es_ES |
dc.description.references | Carrier, J.: Disrupting high performance storage with Intel DC persistent memory & DAOS. In: IXPUG 2019 Annual Conference at CERN (2019). https://cds.cern.ch/record/2691951 | es_ES |
dc.description.references | Charbonneau, A., Agarwal, A., Anderson, M., et al.: Data intensive high energy physics analysis in a distributed cloud. J. Phys. 341(1), 012003 (2012). https://doi.org/10.1088/1742-6596/341/1/012003 | es_ES |
dc.description.references | Dai, D., Chen, Y., Kimpe, D., et al.: Provenance-based prediction scheme for object storage system in HPC. In: 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 550–551, (2014) https://doi.org/10.1109/CCGrid.2014.27 | es_ES |
dc.description.references | DAOS developers (2022) Caching. https://docs.daos.io/v2.0/user/filesystem/#caching. Accessed 30 July 2022 | es_ES |
dc.description.references | Din, I.U., Hassan, S., Almogren, A., et al.: PUC: packet update caching for energy efficient IoT-based information-centric networking. Future Gener. Comput. Syst. 111, 634–643 (2020). https://doi.org/10.1016/j.future.2019.11.022 | es_ES |
dc.description.references | Dorigo, A., Elmer, P., Furano, F., et al.: XROOTD—a highly scalable architecture for data access. WSEAS Trans. Comput. 4, 348–353 (2005) | es_ES |
dc.description.references | Elsen, E.: A roadmap for HEP software and computing R&D for the 2020s. Comput. Softw. Big Sci. (2019). https://doi.org/10.1007/s41781-019-0031-6 | es_ES |
dc.description.references | Hanushevsky, A., Ito, H., Lassnig, M., et al.: XCache in the ATLAS distributed computing environment. EPJ Web Conf. 214, 04008 (2019). https://doi.org/10.1051/epjconf/201921404008 | es_ES |
dc.description.references | ISO Central Secretary (2014) Information technology—Procedures for the operation of object identifier registration authorities—Part 8: Generation of universally unique identifiers (UUIDs) and their use in object identifiers. Standard ISO/IEC 9834-8:2014, International Organization for Standardization, Geneva, CH, https://www.iso.org/standard/62795.html | es_ES |
dc.description.references | Jette, M., Dunlap, C., Garlick, J., et al.: Slurm: simple Linux utility for resource management. Tech. rep., LLNL (2002). https://www.osti.gov/biblio/15002962 | es_ES |
dc.description.references | Kang, G., Kong, D., Wang, L., et al.: OStoreBench: benchmarking distributed object storage systems using real-world application scenarios. In: Wolf, F., Gao, W. (eds.) Benchmarking, Measuring, and Optimizing, pp. 90–105. Springer International Publishing, Cham (2021) | es_ES |
dc.description.references | LHCb Collaboration (2017) Matter antimatter differences (b meson decays to three hadrons)—project notebook. http://opendata.cern.ch/record/4902. Accessed 1 Feb 2022 | es_ES |
dc.description.references | Liang, Z., Lombardi, J., Chaarawi, M., et al.: DAOS: a scale-out high performance storage stack for storage class memory. In: Panda, D.K. (ed.) Supercomputing Frontiers, pp. 40–54. Springer International Publishing, Cham (2020) | es_ES |
dc.description.references | Liu, J., Koziol, Q., Butler, G.F. et al.: Evaluation of HPC application I/O on object storage systems. In: 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage Data Intensive Scalable Computing Systems (PDSW-DISCS), pp. 24–34 (2018) https://doi.org/10.1109/PDSW-DISCS.2018.00005 | es_ES |
dc.description.references | Lombardi, J.: DAOS: Nextgen Storage Stack for AI, Big Data and Exascale HPC. CERN openlab Technical Workshop. (2021) https://cds.cern.ch/record/2754116 | es_ES |
dc.description.references | López-Gómez, J., Blomer, J.: Exploring object stores for high-energy physics data storage. EPJ Web Conf. 251, 02066 (2021). https://doi.org/10.1051/epjconf/202125102066 | es_ES |
dc.description.references | Matri, P., Alforov, Y., Brandon, A. et al.: Could blobs fuel storage-based convergence between HPC and big data? In: 2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 81–86, (2017) https://doi.org/10.1109/CLUSTER.2017.63 | es_ES |
dc.description.references | Mu, J., Soumagne, J., Tang, H. et al.: A transparent server-managed object storage system for HPC. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 477–481, (2018) https://doi.org/10.1109/CLUSTER.2018.00063 | es_ES |
dc.description.references | Padulano, V.E., Cervantes Villanueva, J., Guiraud, E., et al.: Distributed data analysis with ROOT RDataFrame. EPJ Web Conf. 245, 03009 (2020). https://doi.org/10.1051/epjconf/202024503009 | es_ES |
dc.description.references | Padulano, V.E., Tejedor Saavedra, E., Alonso-Jordá, P.: Fine-grained data caching approaches to speedup a distributed RDataFrame analysis. EPJ Web Conf. 251, 02027 (2021). https://doi.org/10.1051/epjconf/202125102027 | es_ES |
dc.description.references | Panda, D.K., Sur, S.: InfiniBand. Springer, Boston, pp. 927–935. (2011) https://doi.org/10.1007/978-0-387-09766-4_21 | es_ES |
dc.description.references | Piparo, D., Canal, P., Guiraud, E., et al.: RDataFrame: easy parallel ROOT analysis at 100 threads. EPJ Web Conf. 214, 06029 (2019). https://doi.org/10.1051/epjconf/201921406029 | es_ES |
dc.description.references | Plechschmidt, U.: Lustre expands its lead in the Top 100 supercomputers. https://community.hpe.com/t5/Advantage-EX/Lustre-expands-its-lead-in-the-Top-100-supercomputers/ba-p/7141807#.YukqZUhByXJ. Accessed 2 August 2022 (2021) | es_ES |
dc.description.references | ROOT team (2021) RNTuple class reference guide. https://root.cern.ch/doc/master/structROOT_1_1Experimental_1_1RNTuple.html. Accessed 1 Feb 2022 | es_ES |
dc.description.references | ROOT team (2021) TTree class reference guide. https://root.cern.ch/doc/master/classTTree.html. Accessed 1 Feb 2022 | es_ES |
dc.description.references | Rupprecht, L., Zhang, R., Hildebrand, D.: Big data analytics on object stores: a performance study. In: The International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14) (2014) | es_ES |
dc.description.references | Rupprecht, L., Zhang, R., Owen, B., et al.: SwiftAnalytics: optimizing object storage for big data analytics. In: 2017 IEEE International Conference on Cloud Engineering (IC2E), pp. 245–251 (2017). https://doi.org/10.1109/IC2E.2017.19 | es_ES |
dc.description.references | Seiz, M., Offenhäuser, P., Andersson, S., et al.: Lustre I/O performance investigations on Hazel Hen: experiments and heuristics. J. Supercomput. 77, 12508–12536 (2021). https://doi.org/10.1007/s11227-021-03730-7 | es_ES |
dc.description.references | Shin, H., Lee, K., Kwon, H.: A comparative experimental study of distributed storage engines for big spatial data processing using GeoSpark. J. Supercomput. 78, 2556–2579 (2022). https://doi.org/10.1007/s11227-021-03946-7 | es_ES |
dc.description.references | Soumagne, J., Henderson, J., Chaarawi, M., et al.: Accelerating HDF5 I/O for exascale using DAOS. IEEE Trans. Parallel Distrib. Syst. 33(4), 903–914 (2022). https://doi.org/10.1109/TPDS.2021.3097884 | es_ES |
dc.description.references | Spiga, D., Ciangottini, D., Tracolli, M., et al.: Smart caching at CMS: applying AI to XCache edge services. EPJ Web Conf. 245, 04024 (2020). https://doi.org/10.1051/epjconf/202024504024 | es_ES |
dc.description.references | Tang, H., Byna, S., Tessier, F. et al.: Toward scalable and asynchronous object-centric data management for HPC. In: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 113–122 (2018) https://doi.org/10.1109/CCGRID.2018.00026 | es_ES |
dc.description.references | Tannenbaum, T., Wright, D., Miller, K., et al.: Condor—a distributed job scheduler. In: Sterling, T. (ed.) Beowulf Cluster Computing with Linux. MIT Press, New York (2001) | es_ES |
dc.description.references | The ATLAS Collaboration, Aad, G., Abat, E., et al.: The ATLAS experiment at the CERN large Hadron Collider. J. Instrum. 3(08), S08003 (2008). https://doi.org/10.1088/1748-0221/3/08/s08003 | es_ES |
dc.description.references | The LHCb Collaboration: Angular analysis of the rare decay $B_s^0 \rightarrow \phi \mu^+ \mu^-$. J. High Energy Phys. (2021). https://doi.org/10.1007/JHEP11(2021)043 | es_ES |
dc.description.references | The LHCb Collaboration, Alves, A.A., Andrade, L.M., et al.: The LHCb Detector at the LHC. JINST 3, S08005 (2008). https://doi.org/10.1088/1748-0221/3/08/S08005, also published by CERN Geneva in 2010 | es_ES |
dc.description.references | Vernik, G., Factor, M., Kolodner, E.K. et al.: Stocator: a high performance object store connector for spark. In: Proceedings of the 10th ACM International Systems and Storage Conference. Association for Computing Machinery, New York, NY, USA, SYSTOR ’17, (2017) https://doi.org/10.1145/3078468.3078496 | es_ES |
dc.description.references | Padulano, V.E.: Test suite repository (2021). https://github.com/vepadulano/rdf-rntuple-daos-tests. Accessed 1 Feb 2022 | es_ES |
dc.description.references | Virgo Cluster: User Manual. (2022) https://hpc.gsi.de/virgo/preface.html. Accessed 2 Aug 2022 | es_ES |
dc.description.references | Vohra, D.: Apache Parquet, Apress, Berkeley, CA, pp. 325–335. (2016) https://doi.org/10.1007/978-1-4842-2199-0_8 | es_ES |
dc.description.references | Walker, C.J., Traynor, D.P., Martin, A.J.: Scalable Petascale storage for HEP using Lustre. J. Phys. 396(4), 042063 (2012). https://doi.org/10.1088/1742-6596/396/4/042063 | es_ES |
dc.description.references | Zhong, J., Huang, R.S., Lee, S.C.: A program for the Bayesian Neural Network in the ROOT framework. Comput. Phys. Commun. 182(12), 2655–2660 (2011). https://doi.org/10.1016/j.cpc.2011.07.019 | es_ES |