dc.contributor.author | Padulano, Vincenzo Eduardo | es_ES |
dc.contributor.author | Oliver Cortés, Pablo | es_ES |
dc.contributor.author | Alonso-Jordá, Pedro | es_ES |
dc.contributor.author | Tejedor Saavedra, Enric | es_ES |
dc.contributor.author | Risco, Sebastián | es_ES |
dc.contributor.author | Moltó, Germán | es_ES |
dc.date.accessioned | 2023-05-04T18:01:48Z | |
dc.date.available | 2023-05-04T18:01:48Z | |
dc.date.issued | 2023-05 | es_ES |
dc.identifier.issn | 0920-8542 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/193126 | |
dc.description.abstract | [EN] CERN (Conseil Européen pour la Recherche Nucléaire) is the largest research centre for high energy physics (HEP). It poses unique computational challenges because of the large amount of data generated by the Large Hadron Collider. CERN develops and supports ROOT, a software framework that is the de facto standard for HEP data analysis. ROOT offers a high-level, easy-to-use interface called RDataFrame for managing and processing large data sets. In recent years, RDataFrame has been extended to exploit distributed computing resources. Thanks to its declarative programming model, the user-facing API is decoupled from the actual execution backend, which allows physics analyses to scale automatically to thousands of computing cores across various types of distributed resources. The distributed RDataFrame module already supports established general-purpose engines such as Apache Spark and Dask. Nevertheless, these solutions will not be sufficient to meet future requirements, given the amount of data that the new planned accelerators will generate. It is therefore worth investigating a different approach: serverless computing. Building on a first prototype based on AWS Lambda, this work presents a new distributed backend for RDataFrame built on OSCAR, an open-source framework that supports serverless computing. Compared with the AWS Lambda-based prototype, the implementation introduces new ways of synchronizing the work of the functions. | es_ES |
dc.description.sponsorship | Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was supported by the research project PID2020-113656RB-C22 (MCIN/AEI/10.13039/501100011033). In addition, research on the OSCAR software tool is supported by Grant PID2020-113126RB-I00, funded by MCIN/AEI/10.13039/501100011033, and by Project PDC2021-120844-I00, funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer-Verlag | es_ES |
dc.relation.ispartof | The Journal of Supercomputing | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | CERN | es_ES |
dc.subject | ROOT | es_ES |
dc.subject | OSCAR | es_ES |
dc.subject | Serverless computing | es_ES |
dc.subject | AWS Lambda | es_ES |
dc.subject.classification | COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE | es_ES |
dc.title | Leveraging an open source serverless framework for high energy physics computing | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1007/s11227-022-05016-y | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113126RB-I00/ES/COMPUTACION CIENTIFICA SERVERLESS A TRAVES DEL HIBRIDO CONTINUO CLOUD/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AGENCIA ESTATAL DE INVESTIGACION//PDC2021-120844-I00//COMPUTACION ABIERTA SIN SERVIDOR PARA LA ADOPCION DE INNOVACION RAPIDA EN RECURSOS SEGUROS PREPARADOS PARA LA EMPRESA/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113656RB-C22/ES/COMPUTACION Y COMUNICACIONES DE ALTAS PRESTACIONES CONSCIENTES DEL CONSUMO ENERGETICO. APLICACIONES AL APRENDIZAJE PROFUNDO COMPUTACIONAL - UPV/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | Padulano, VE.; Oliver Cortés, P.; Alonso-Jordá, P.; Tejedor Saavedra, E.; Risco, S.; Moltó, G. (2023). Leveraging an open source serverless framework for high energy physics computing. The Journal of Supercomputing. 79:8940-8965. https://doi.org/10.1007/s11227-022-05016-y | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1007/s11227-022-05016-y | es_ES |
dc.description.upvformatpinicio | 8940 | es_ES |
dc.description.upvformatpfin | 8965 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 79 | es_ES |
dc.relation.pasarela | S\480228 | es_ES |
dc.contributor.funder | Agencia Estatal de Investigación | es_ES |
dc.contributor.funder | Consejo Superior de Investigaciones Científicas | es_ES |
dc.contributor.funder | Universitat Politècnica de València | es_ES |
dc.description.references | Albrecht J, Alves AA, Amadio G et al (2019) A roadmap for HEP software and computing R&D for the 2020s. Comput Softw Big Sci 3(1):7. https://doi.org/10.1007/s41781-018-0018-8 | es_ES |
dc.description.references | Alvarruiz F, de Alfonso C, Caballer M, et al (2012) An energy manager for high performance computer clusters. In: 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, p 231–238. https://doi.org/10.1109/ISPA.2012.38 | es_ES |
dc.description.references | Amazon Web Services (2022a) Lambda. https://aws.amazon.com/releasenotes/release-aws-lambda-on-2014-11-13. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Amazon Web Services (2022b) Organizing objects in the Amazon S3 console using folders. https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Amazon Web Services (2022c) S3: Simple Storage Service. https://aws.amazon.com/s3. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Apache Software Foundation (2022) OpenWhisk. https://openwhisk.apache.org/. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Apollinari G, Béjar Alonso I, Brüning O et al (2017) High-luminosity large hadron collider (HL-LHC): technical design report V.0.1. Tech Rep CERN. https://doi.org/10.23731/CYRM-2017-004 | es_ES |
dc.description.references | Beswick J (2022) Using Amazon EFS for AWS Lambda in your serverless applications. https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Bila N, Dettori P, Kanso A, et al (2017) Leveraging the serverless architecture for securing linux containers. In: 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW), p 401–404. https://doi.org/10.1109/ICDCSW.2017.66 | es_ES |
dc.description.references | Bird I, Buncic P, Carminati F, et al (2014) Update of the computing models of the WLCG and the LHC experiments. Tech Rep CERN. https://cds.cern.ch/record/1695401 | es_ES |
dc.description.references | Blomer J, Buncic P, Fuhrmann T (2011) CernVM-FS: delivering scientific software to globally distributed computing resources. In: Proceedings of the First International Workshop on Network-aware Data Management. Association for Computing Machinery, New York, p 49–56. https://doi.org/10.1145/2110217.2110225 | es_ES |
dc.description.references | Blomer J, Ganis G, Mosciatti S et al (2019) Towards a serverless CernVM-FS. EPJ Web Conf 214:09007. https://doi.org/10.1051/epjconf/201921409007 | es_ES |
dc.description.references | Brun R, Rademakers F (1997) ROOT: an object oriented data analysis framework. In: New Computing Techniques in Physics Research V. Nucl Instrum Methods Phys Res A 389(1):81–86. https://doi.org/10.1016/S0168-9002(97)00048-X | es_ES |
dc.description.references | Caballer M, de Alfonso C, Alvarruiz F et al (2013) EC3: elastic cloud computing cluster. J Comput Syst Sci 79(8):1341–1351. https://doi.org/10.1016/j.jcss.2013.06.005 | es_ES |
dc.description.references | Caballer M, Blanquer I, Moltó G et al (2015) Dynamic management of virtual infrastructures. J Grid Comput 13(1):53–70. https://doi.org/10.1007/s10723-014-9296-5 | es_ES |
dc.description.references | Carver B, Zhang J, Wang A, et al (2020) Wukong: a scalable and locality-enhanced framework for serverless parallel computing. In: Proceedings of the 11th ACM Symposium on Cloud Computing. Association for Computing Machinery, New York, p 1–15. https://doi.org/10.1145/3419111.3421286 | es_ES |
dc.description.references | Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: OSDI’04: Sixth Symposium on Operating System Design and Implementation. San Francisco, CA, p 137–150 | es_ES |
dc.description.references | Dorigo A, Elmer P, Furano F et al (2005) XROOTD—a highly scalable architecture for data access. WSEAS Trans Comput 4:348–353 | es_ES |
dc.description.references | Giménez-Alventosa V, Moltó G, Caballer M (2019) A framework and a performance assessment for serverless MapReduce on AWS Lambda. Future Gener Comput Syst 97:259–274. https://doi.org/10.1016/j.future.2019.02.057 | es_ES |
dc.description.references | Google (2022) Cloud Functions. https://cloud.google.com/functions. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Grzesik P, Augustyn DR, Wyciślik L et al (2021) Serverless computing in omics data analysis and integration. Brief Bioinform. https://doi.org/10.1093/bib/bbab349 | es_ES |
dc.description.references | Harris CR, Millman KJ, van der Walt SJ et al (2020) Array programming with NumPy. Nature 585(7825):357–362. https://doi.org/10.1038/s41586-020-2649-2 | es_ES |
dc.description.references | HEPiX (2017) HEPiX benchmarking working group. https://w3.hepix.org/benchmarking.html. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Jonas E, Pu Q, Venkataraman S, et al (2017) Occupy the cloud: distributed computing for the 99%. In: Proceedings of the 2017 Symposium on Cloud Computing. Association for Computing Machinery, New York, p 445–451. https://doi.org/10.1145/3127479.3128601 | es_ES |
dc.description.references | Kuśnierz J, Padulano VE, Malawski M, et al (2022) A serverless engine for high energy physics distributed analysis. In: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), p 575–584. https://doi.org/10.1109/CCGrid54584.2022.00067 | es_ES |
dc.description.references | Lavrijsen WTLP, Dutta A (2016) High-performance Python-C++ bindings with PyPy and Cling. In: PyHPC ’16. IEEE Press, p 27–35. http://wlav.web.cern.ch/wlav/Cppyy_LavrijsenDutta_PyHPC16.pdf | es_ES |
dc.description.references | Le DN, Pal S, Pattnaik PK (2022) OpenFaaS. John Wiley & Sons, p 287–303. https://doi.org/10.1002/9781119682318.ch17 | es_ES |
dc.description.references | Li Z, Guo L, Chen Q, et al (2022) Help rather than recycle: alleviating cold startup in serverless computing through inter-function container sharing. In: 2022 USENIX Annual Technical Conference (USENIX ATC 22). USENIX Association, Carlsbad, p 69–84. https://www.usenix.org/conference/atc22/presentation/li-zijun-help | es_ES |
dc.description.references | McKinney W (2010) Data structures for statistical computing in python. In: Stéfan van der Walt, Jarrod Millman (eds) Proceedings of the 9th Python in Science Conference, p 56–61. https://doi.org/10.25080/Majora-92bf1922-00a | es_ES |
dc.description.references | Merkel D (2014) Docker: lightweight linux containers for consistent development and deployment. Linux J 2014(239):2 | es_ES |
dc.description.references | MinIO (2022) White paper: high performance multi-cloud object storage. Tech Rep MinIO Inc., Palo Alto, CA. https://min.io/resources/docs/MinIO-High-Performance-Multi-Cloud-Object-Storage.pdf | es_ES |
dc.description.references | Müller I, Marroquín R, Alonso G (2020) Lambada: interactive data analytics on cold data using serverless cloud infrastructure. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery, New York, p 115–130. https://doi.org/10.1145/3318464.3389758 | es_ES |
dc.description.references | Nguyen HD, Yang Z, Chien AA (2021) Motivating high performance serverless workloads. In: Proceedings of the 1st Workshop on High Performance Serverless Computing. Association for Computing Machinery, New York, p 25–32. https://doi.org/10.1145/3452413.3464786 | es_ES |
dc.description.references | Oakes E, Yang L, Zhou D, et al (2018) SOCK: rapid task provisioning with serverless-optimized containers. In: 2018 USENIX Annual Technical Conference (USENIX ATC 18). USENIX Association, Boston, p 57–70. https://www.usenix.org/conference/atc18/presentation/oakes | es_ES |
dc.description.references | ONEDATA (2022) https://onedata.org. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Padulano VE, Villanueva JC, Guiraud E et al (2020) Distributed data analysis with ROOT RDataFrame. EPJ Web Conf 245:03009. https://doi.org/10.1051/epjconf/202024503009 | es_ES |
dc.description.references | Pheatt C (2008) Intel® Threading Building Blocks. J Comput Sci Coll 23(4):298 | es_ES |
dc.description.references | Piparo D, Canal P, Guiraud E et al (2019) RDataFrame: easy parallel ROOT analysis at 100 threads. EPJ Web Conf 214:06029. https://doi.org/10.1051/epjconf/201921406029 | es_ES |
dc.description.references | Pérez A, Moltó G, Caballer M et al (2018) Serverless computing for container-based architectures. Future Gener Comput Syst 83:50–59. https://doi.org/10.1016/j.future.2018.01.022 | es_ES |
dc.description.references | Pérez A, Risco S, Naranjo DM, et al (2019) On-premises serverless computing for event-driven data processing applications. In: 2019 IEEE 12th International Conference on Cloud Computing (CLOUD). https://doi.org/10.1109/CLOUD.2019.00073 | es_ES |
dc.description.references | Rocklin M (2015) Dask: parallel computation with blocked algorithms and task scheduling. In: Huff K, Bergstra J (eds) Proceedings of the 14th Python in Science Conference. SciPy, online, p 130–136 | es_ES |
dc.description.references | Chatrchyan S et al (2008) The CMS experiment at the CERN LHC. JINST 3:S08004. https://doi.org/10.1088/1748-0221/3/08/S08004 | es_ES |
dc.description.references | Sexton-Kennedy E (2018) HEP software development in the next decade; the views of the HSF community. J Phys Conf Series 1085(2):022006. https://doi.org/10.1088/1742-6596/1085/2/022006 | es_ES |
dc.description.references | Shankar V, Krauth K, Vodrahalli K, et al (2020) Serverless linear algebra. In: Proceedings of the 11th ACM Symposium on Cloud Computing. Association for Computing Machinery, New York, p 281–295. https://doi.org/10.1145/3419111.3421287 | es_ES |
dc.description.references | The Knative Authors (2022) Knative. https://knative.dev. Accessed 4 Dec 2022 | es_ES |
dc.description.references | The Kubernetes Authors (2022) Kubernetes. https://kubernetes.io/. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Vassilev V, Canal P, Naumann A et al (2012) Cling–the new interactive interpreter for ROOT 6. J Phys Conf Series 396(5):052071. https://doi.org/10.1088/1742-6596/396/5/052071 | es_ES |
dc.description.references | WLCG (2022) Homepage. http://wlcg.web.cern.ch/. Accessed 4 Dec 2022 | es_ES |
dc.description.references | Wunsch S (2019) Analysis of the di-muon spectrum using data from the CMS detector taken in 2012. https://doi.org/10.7483/OPENDATA.CMS.AAR1.4NZQ | es_ES |
dc.description.references | Zaharia M, Chowdhury M, Franklin MJ, et al (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. USENIX Association, Boston, p 10. https://www.usenix.org/conference/hotcloud-10/spark-cluster-computing-working-sets | es_ES |