
Leveraging an open source serverless framework for high energy physics computing

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Padulano, Vincenzo Eduardo es_ES
dc.contributor.author Oliver Cortés, Pablo es_ES
dc.contributor.author Alonso-Jordá, Pedro es_ES
dc.contributor.author Tejedor Saavedra, Enric es_ES
dc.contributor.author Risco, Sebastián es_ES
dc.contributor.author Moltó, Germán es_ES
dc.date.accessioned 2023-05-04T18:01:48Z
dc.date.available 2023-05-04T18:01:48Z
dc.date.issued 2023-05 es_ES
dc.identifier.issn 0920-8542 es_ES
dc.identifier.uri http://hdl.handle.net/10251/193126
dc.description.abstract [EN] CERN (Centre Europeen pour la Recherche Nucleaire) is the largest research centre for high energy physics (HEP). It poses unique computational challenges as a result of the large amount of data generated by the Large Hadron Collider. CERN has developed and supports a software framework called ROOT, which is the de facto standard for HEP data analysis. This framework offers a high-level and easy-to-use interface called RDataFrame, which allows managing and processing large data sets. In recent years, its functionality has been extended to take advantage of distributed computing capabilities. Thanks to its declarative programming model, the user-facing API can be decoupled from the actual execution backend. This decoupling allows physics analysis to scale automatically to thousands of computational cores over various types of distributed resources. In fact, the distributed RDataFrame module already supports the use of established general industry engines such as Apache Spark or Dask. Nevertheless, these current solutions will not be sufficient to meet future requirements in terms of the amount of data that the new projected accelerators will generate. For this reason, it is of interest to investigate a different approach, the one offered by serverless computing. Based on a first prototype using AWS Lambda, this work presents the creation of a new backend for RDataFrame distributed over the OSCAR tool, an open source framework that supports serverless computing. The implementation introduces new ways, relative to the AWS Lambda-based prototype, to synchronize the work of functions. es_ES
dc.description.sponsorship Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the research project PID2020-113656RB-C22 (MCIN/AEI/10.13039/501100011033). Also, Grant PID2020-113126RB-I00, funded by MCIN/AEI/10.13039/501100011033, and Project PDC2021-120844-I00, funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR, both support the research on the OSCAR software tool. es_ES
dc.language English es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof The Journal of Supercomputing es_ES
dc.rights Attribution (by) es_ES
dc.subject CERN es_ES
dc.subject ROOT es_ES
dc.subject OSCAR es_ES
dc.subject Serverless computing es_ES
dc.subject AWS Lambda es_ES
dc.subject.classification COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE es_ES
dc.title Leveraging an open source serverless framework for high energy physics computing es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s11227-022-05016-y es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113126RB-I00/ES/COMPUTACION CIENTIFICA SERVERLESS A TRAVES DEL HIBRIDO CONTINUO CLOUD/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AGENCIA ESTATAL DE INVESTIGACION//PDC2021-120844-I00//COMPUTACION ABIERTA SIN SERVIDOR PARA LA ADOPCION DE INNOVACION RAPIDA EN RECURSOS SEGUROS PREPARADOS PARA LA EMPRESA/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C22/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTES DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UPV/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Padulano, VE.; Oliver Cortés, P.; Alonso-Jordá, P.; Tejedor Saavedra, E.; Risco, S.; Moltó, G. (2023). Leveraging an open source serverless framework for high energy physics computing. The Journal of Supercomputing. 79:8940-8965. https://doi.org/10.1007/s11227-022-05016-y es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s11227-022-05016-y es_ES
dc.description.upvformatpinicio 8940 es_ES
dc.description.upvformatpfin 8965 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 79 es_ES
dc.relation.pasarela S\480228 es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder Consejo Superior de Investigaciones Científicas es_ES
dc.contributor.funder Universitat Politècnica de València es_ES

