A caching mechanism to exploit object store speed in High Energy Physics analysis

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Padulano, Vincenzo Eduardo es_ES
dc.contributor.author Tejedor Saavedra, Enric es_ES
dc.contributor.author Alonso-Jordá, Pedro es_ES
dc.contributor.author López Gómez, Javier es_ES
dc.contributor.author Blomer, Jakob es_ES
dc.date.accessioned 2023-01-05T19:01:16Z
dc.date.available 2023-01-05T19:01:16Z
dc.date.issued 2022-10-14 es_ES
dc.identifier.issn 1386-7857 es_ES
dc.identifier.uri http://hdl.handle.net/10251/191061
dc.description.abstract [EN] Data analysis workflows in High Energy Physics (HEP) read data written in the ROOT columnar format. Such data has traditionally been stored in files that are often read via the network from remote storage facilities, which represents a performance penalty especially for data processing workflows that are I/O bound. To address that issue, this paper presents a new caching mechanism, implemented in the I/O subsystem of ROOT, which is independent of the storage backend used to write the dataset. Notably, it can be used to leverage the speed of high-bandwidth, low-latency object stores. The performance of this caching approach is evaluated by running a real physics analysis on an Intel DAOS cluster, both on a single node and distributed on multiple nodes. es_ES
dc.description.sponsorship This work benefited from the support of the CERN Strategic R&D Programme on Technologies for Future Experiments [1] and from grant PID2020-113656RB-C22 funded by Ministerio de Ciencia e Innovacion MCIN/AEI/10.13039/501100011033. The hardware used to perform the experimental evaluation involving DAOS (HPE Delphi cluster described in Sect. 5.2) was made available thanks to a collaboration agreement with Hewlett-Packard Enterprise (HPE) and Intel. User access to the Virgo cluster at the GSI institute was given for the purpose of running the benchmarks using the Lustre filesystem. es_ES
dc.language English es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof Cluster Computing es_ES
dc.rights All rights reserved es_ES
dc.subject ROOT es_ES
dc.subject High Energy Physics es_ES
dc.subject Caching es_ES
dc.subject Object store es_ES
dc.subject DAOS es_ES
dc.subject.classification COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE es_ES
dc.title A caching mechanism to exploit object store speed in High Energy Physics analysis es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s10586-022-03757-2 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C22/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTES DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UPV/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Padulano, VE.; Tejedor Saavedra, E.; Alonso-Jordá, P.; López Gómez, J.; Blomer, J. (2022). A caching mechanism to exploit object store speed in High Energy Physics analysis. Cluster Computing. 1-16. https://doi.org/10.1007/s10586-022-03757-2 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s10586-022-03757-2 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 16 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela S\472704 es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
dc.description.references Aleksa, M., Blomer, J., Cure, B., et al.: Strategic R&D Programme on Technologies for Future Experiments. Tech. rep., CERN, Geneva (2018) es_ES
dc.description.references Altenmüller, K., Cebrián, S., Dafni, T., et al.: REST-for-Physics, a ROOT-based framework for event oriented data analysis and combined Monte Carlo response. Comput. Phys. Commun. 273, 108281 (2022). https://doi.org/10.1016/j.cpc.2021.108281 es_ES
dc.description.references Amazon: Amazon Simple Storage Service Documentation. (2021) https://docs.aws.amazon.com/s3/. Accessed 1 Feb 2022 es_ES
dc.description.references Andreozzi, S., Magnoni, L., Zappi, R.: Towards the integration of StoRM on Amazon Simple Storage Service (S3). J. Phys.: Conf. Ser. 119(6), 062011 (2008). https://doi.org/10.1088/1742-6596/119/6/062011 es_ES
dc.description.references Apollinari, G., Béjar Alonso, I., Brüning, O., et al.: High-Luminosity Large Hadron Collider (HL-LHC): Technical Design Report V. 0.1. Tech. rep., CERN (2017) https://doi.org/10.23731/CYRM-2017-004 es_ES
dc.description.references Arsuaga-Ríos, M., Heikkilä, S.S., Duellmann, D., et al.: Using S3 cloud storage with ROOT and CvmFS. J. Phys.: Conf. Ser. 664(2), 022001 (2015). https://doi.org/10.1088/1742-6596/664/2/022001 es_ES
dc.description.references Badino, P., Barring, O., Baud, J.P., et al: The Storage Resource Manager Interface Specification (v2.2). (2009) https://sdm.lbl.gov/srm-wg/doc/SRM.v2.2.html es_ES
dc.description.references Bevilacqua, G., Bi, H.Y., Hartanto, H.B., et al.: $$t\bar{t}b\bar{b}$$ at the LHC: on the size of corrections and b-jet definitions. J. High Energy Phys. 8, 1–37 (2021). https://doi.org/10.1007/JHEP08(2021)008 es_ES
dc.description.references Bird, I.: Computing for the Large Hadron Collider. Annu. Rev. Nucl. Particle Sci. 61(1), 99–118 (2011). https://doi.org/10.1146/annurev-nucl-102010-130059 es_ES
dc.description.references Birrittella, M.S., Debbage, M., Huggahalli, R., et al: Intel omni-path architecture: enabling scalable, high performance fabrics. In: 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, pp 1–9 (2015) https://doi.org/10.1109/HOTI.2015.22 es_ES
dc.description.references Blomer, J., Canal, P., Naumann, A., et al: Evolution of the ROOT Tree I/O. In: 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019), (2020) https://doi.org/10.1051/epjconf/202024502030 es_ES
dc.description.references Braam, P.: The Lustre Storage Architecture. (2019) https://arxiv.org/abs/1903.01955 es_ES
dc.description.references Brun, R., Rademakers, F.: ROOT—an object oriented data analysis framework. Nucl. Instrum. Methods Phys. Res. Sect. A 389(1), 81–86 (1997). https://doi.org/10.1016/S0168-9002(97)00048-X es_ES
dc.description.references Calder, B., Wang, J., Ogus, A. et al.: Windows Azure Storage: a highly available cloud storage service with strong consistency. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. Association for Computing Machinery, New York, NY, USA, SOSP ’11, pp. 143–157, (2011) https://doi.org/10.1145/2043556.2043571 es_ES
dc.description.references Carrier, J.: Disrupting high performance storage with Intel DC persistent memory & DAOS. In: IXPUG 2019 Annual Conference at CERN. (2019) https://cds.cern.ch/record/2691951 es_ES
dc.description.references Charbonneau, A., Agarwal, A., Anderson, M., et al.: Data intensive high energy physics analysis in a distributed cloud. J. Phys.: Conf. Ser. 341(1), 012003 (2012). https://doi.org/10.1088/1742-6596/341/1/012003 es_ES
dc.description.references Dai, D., Chen, Y., Kimpe, D., et al.: Provenance-based prediction scheme for object storage system in HPC. In: 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 550–551, (2014) https://doi.org/10.1109/CCGrid.2014.27 es_ES
dc.description.references DAOS developers (2022) Caching. https://docs.daos.io/v2.0/user/filesystem/#caching. Accessed 30 July 2022 es_ES
dc.description.references Din, I.U., Hassan, S., Almogren, A., et al.: PUC: packet update caching for energy efficient IoT-based information-centric networking. Future Gener. Comput. Syst. 111, 634–643 (2020). https://doi.org/10.1016/j.future.2019.11.022 es_ES
dc.description.references Dorigo, A., Elmer, P., Furano, F., et al.: XROOTD—a highly scalable architecture for data access. WSEAS Trans. Comput. 4, 348–353 (2005) es_ES
dc.description.references Elsen, E.: A roadmap for HEP software and computing R&D for the 2020s. Comput. Softw. Big Sci. (2019). https://doi.org/10.1007/s41781-019-0031-6 es_ES
dc.description.references Hanushevsky, A., Ito, H., Lassnig, M., et al.: XCache in the ATLAS distributed computing environment. EPJ Web Conf. 214, 04008 (2019). https://doi.org/10.1051/epjconf/201921404008 es_ES
dc.description.references ISO Central Secretary (2014) Information technology—Procedures for the operation of object identifier registration authorities—Part 8: Generation of universally unique identifiers (UUIDs) and their use in object identifiers. Standard ISO/IEC 9834-8:2014, International Organization for Standardization, Geneva, CH, https://www.iso.org/standard/62795.html es_ES
dc.description.references Jette, M., Dunlap, C., Garlick, J., et al.: SLURM: Simple Linux Utility for Resource Management. Tech. rep., LLNL (2002) https://www.osti.gov/biblio/15002962 es_ES
dc.description.references Kang, G., Kong, D., Wang, L., et al.: OStoreBench: benchmarking distributed object storage systems using real-world application scenarios. In: Wolf, F., Gao, W. (eds.) Benchmarking, Measuring, and Optimizing, pp. 90–105. Springer International Publishing, Cham (2021) es_ES
dc.description.references LHCb Collaboration (2017) Matter antimatter differences (b meson decays to three hadrons)—project notebook. http://opendata.cern.ch/record/4902. Accessed 1 Feb 2022 es_ES
dc.description.references Liang, Z., Lombardi, J., Chaarawi, M., et al.: DAOS: a scale-out high performance storage stack for storage class memory. In: Panda, D.K. (ed.) Supercomputing Frontiers, pp. 40–54. Springer International Publishing, Cham (2020) es_ES
dc.description.references Liu, J., Koziol, Q., Butler, G.F. et al.: Evaluation of HPC application I/O on object storage systems. In: 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage Data Intensive Scalable Computing Systems (PDSW-DISCS), pp. 24–34 (2018) https://doi.org/10.1109/PDSW-DISCS.2018.00005 es_ES
dc.description.references Lombardi, J.: DAOS: Nextgen Storage Stack for AI, Big Data and Exascale HPC. CERN openlab Technical Workshop. (2021) https://cds.cern.ch/record/2754116 es_ES
dc.description.references López-Gómez, J., Blomer, J.: Exploring object stores for high-energy physics data storage. EPJ Web Conf. 251, 02066 (2021). https://doi.org/10.1051/epjconf/202125102066 es_ES
dc.description.references Matri, P., Alforov, Y., Brandon, A. et al.: Could blobs fuel storage-based convergence between HPC and big data? In: 2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 81–86, (2017) https://doi.org/10.1109/CLUSTER.2017.63 es_ES
dc.description.references Mu, J., Soumagne, J., Tang, H. et al.: A transparent server-managed object storage system for HPC. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 477–481, (2018) https://doi.org/10.1109/CLUSTER.2018.00063 es_ES
dc.description.references Padulano, V.E., Cervantes Villanueva, J., Guiraud, E., et al.: Distributed data analysis with ROOT RDataFrame. EPJ Web Conf. 245, 03009 (2020). https://doi.org/10.1051/epjconf/202024503009 es_ES
dc.description.references Padulano, V.E., Tejedor Saavedra, E., Alonso-Jordá, P.: Fine-grained data caching approaches to speedup a distributed RDataFrame analysis. EPJ Web Conf. 251, 02027 (2021). https://doi.org/10.1051/epjconf/202125102027 es_ES
dc.description.references Panda, D.K., Sur, S.: InfiniBand. Springer, Boston, pp. 927–935. (2011) https://doi.org/10.1007/978-0-387-09766-4_21 es_ES
dc.description.references Piparo, D., Canal, P., Guiraud, E., et al.: RDataFrame: easy parallel ROOT analysis at 100 threads. EPJ Web Conf. 214, 06029 (2019). https://doi.org/10.1051/epjconf/201921406029 es_ES
dc.description.references Plechschmidt, U.: Lustre expands its lead in the Top 100 supercomputers. https://community.hpe.com/t5/Advantage-EX/Lustre-expands-its-lead-in-the-Top-100-supercomputers/ba-p/7141807#.YukqZUhByXJ. Accessed 2 August 2022 (2021) es_ES
dc.description.references ROOT team (2021) RNTuple class reference guide. https://root.cern.ch/doc/master/structROOT_1_1Experimental_1_1RNTuple.html. Accessed 1 Feb 2022 es_ES
dc.description.references ROOT team (2021) TTree class reference guide. https://root.cern.ch/doc/master/classTTree.html. Accessed 1 Feb 2022 es_ES
dc.description.references Rupprecht, L., Zhang, R., Hildebrand, D.: Big data analytics on object stores: a performance study. In: The International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14) (2014) es_ES
dc.description.references Rupprecht, L., Zhang, R., Owen, B. et al.: SwiftAnalytics: optimizing object storage for big data analytics. In: 2017 IEEE International Conference on Cloud Engineering (IC2E), pp. 245–251. https://doi.org/10.1109/IC2E.2017.19 (2017) es_ES
dc.description.references Seiz, M., Offenhäuser, P., Andersson, S., et al.: Lustre I/O performance investigations on Hazel Hen: experiments and heuristics. J. Supercomput. 77, 12508–12536 (2021). https://doi.org/10.1007/s11227-021-03730-7 es_ES
dc.description.references Shin, H., Lee, K., Kwon, H.: A comparative experimental study of distributed storage engines for big spatial data processing using GeoSpark. J. Supercomput. 78, 2556–2579 (2022). https://doi.org/10.1007/s11227-021-03946-7 es_ES
dc.description.references Soumagne, J., Henderson, J., Chaarawi, M., et al.: Accelerating HDF5 I/O for exascale using DAOS. IEEE Trans. Parallel Distrib. Syst. 33(4), 903–914 (2022). https://doi.org/10.1109/TPDS.2021.3097884 es_ES
dc.description.references Spiga, D., Ciangottini, D., Tracolli, M., et al.: Smart caching at CMS: applying AI to XCache edge services. EPJ Web Conf. 245, 04024 (2020). https://doi.org/10.1051/epjconf/202024504024 es_ES
dc.description.references Tang, H., Byna, S., Tessier, F. et al.: Toward scalable and asynchronous object-centric data management for HPC. In: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 113–122 (2018) https://doi.org/10.1109/CCGRID.2018.00026 es_ES
dc.description.references Tannenbaum, T., Wright, D., Miller, K., et al.: Condor—a distributed job scheduler. In: Sterling, T. (ed.) Beowulf Cluster Computing with Linux. MIT Press, New York (2001) es_ES
dc.description.references The ATLAS Collaboration, Aad, G., Abat, E., et al.: The ATLAS experiment at the CERN Large Hadron Collider. J. Instrum. 3(08), S08003 (2008). https://doi.org/10.1088/1748-0221/3/08/s08003 es_ES
dc.description.references The LHCb Collaboration: Angular analysis of the rare decay $$B_s^0 \rightarrow \phi \mu ^+ \mu ^-$$. J. High Energy Phys. (2021). https://doi.org/10.1007/JHEP11(2021)043 es_ES
dc.description.references The LHCb Collaboration, Alves, A.A., Andrade, L.M., et al.: The LHCb Detector at the LHC. JINST 3, S08005 (2008). https://doi.org/10.1088/1748-0221/3/08/S08005, also published by CERN Geneva in 2010 es_ES
dc.description.references Vernik, G., Factor, M., Kolodner, E.K. et al.: Stocator: a high performance object store connector for spark. In: Proceedings of the 10th ACM International Systems and Storage Conference. Association for Computing Machinery, New York, NY, USA, SYSTOR ’17, (2017) https://doi.org/10.1145/3078468.3078496 es_ES
dc.description.references Vincenzo Eduardo Padulano: Test suite repository. (2021) https://github.com/vepadulano/rdf-rntuple-daos-tests. Accessed 1 Feb 2022 es_ES
dc.description.references Virgo Cluster: User Manual. (2022) https://hpc.gsi.de/virgo/preface.html. Accessed 2 Aug 2022 es_ES
dc.description.references Vohra, D.: Apache Parquet, Apress, Berkeley, CA, pp. 325–335. (2016) https://doi.org/10.1007/978-1-4842-2199-0_8 es_ES
dc.description.references Walker, C.J., Traynor, D.P., Martin, A.J.: Scalable Petascale storage for HEP using Lustre. J. Phys.: Conf. Ser. 396(4), 042063 (2012). https://doi.org/10.1088/1742-6596/396/4/042063 es_ES
dc.description.references Zhong, J., Huang, R.S., Lee, S.C.: A program for the Bayesian Neural Network in the ROOT framework. Comput. Phys. Commun. 182(12), 2655–2660 (2011). https://doi.org/10.1016/j.cpc.2011.07.019 es_ES

