
Automating Data Integration in Adaptive and Data-Intensive Information Systems

RiuNet: Institutional repository of the Polytechnic University of Valencia

dc.contributor.author Galvão, João es_ES
dc.contributor.author León-Palacio, Ana es_ES
dc.contributor.author Costa, Carlos es_ES
dc.contributor.author Santos, Maribel Yasmina es_ES
dc.contributor.author Pastor López, Oscar es_ES
dc.date.accessioned 2022-01-07T07:40:23Z
dc.date.available 2022-01-07T07:40:23Z
dc.date.issued 2020-11-26 es_ES
dc.identifier.isbn 978-3-030-63395-0 es_ES
dc.identifier.issn 1865-1348 es_ES
dc.identifier.uri http://hdl.handle.net/10251/179348
dc.description.abstract [EN] Data acquisition is no longer a problem for organizations, as much effort has been devoted to automating data collection and storage, providing access to a wide range of heterogeneous data sources that can be used to support the decision-making process. Nevertheless, this effort has not been extended to data integration, where many data transformation and integration tasks, such as entity and attribute matching, remain highly manual. This is not suitable for complex and dynamic contexts where Information Systems must be adaptive enough to mitigate the difficulties arising from the frequent addition and removal of sources. This work proposes a method for the automatic inference of the appropriate data mapping of heterogeneous sources, supporting the data integration process by providing a semantic overview of the data sources, with quantitative measures of the confidence level. The proposed method includes both technical and domain knowledge and has been evaluated through the implementation of a prototype and its application in a particularly dynamic and complex domain where data integration remains an open problem, i.e., genomics. es_ES
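The abstract does not spell out how the confidence level of a mapping is computed. As a rough illustration only, and not the authors' method, the Python sketch below scores candidate attribute mappings between two sources by combining a normalized Levenshtein similarity on attribute names with a Jaccard similarity on sampled attribute values, two measures that appear in the references of this record. The function names (`name_similarity`, `match_attributes`), the weight `w_name`, the `threshold`, and the genomics-flavored sample data are all hypothetical choices made for the example.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Levenshtein distance normalized to [0, 1] (1.0 = identical, case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard(x, y) -> float:
    """Jaccard similarity of two collections, treated as sets."""
    x, y = set(x), set(y)
    union = x | y
    return 1.0 if not union else len(x & y) / len(union)


def match_attributes(source, target, w_name=0.5, threshold=0.5):
    """Hypothetical matcher: source/target map attribute name -> sample values.

    Each pair gets confidence = w_name * name similarity
    + (1 - w_name) * value overlap; pairs below `threshold` are dropped.
    Returns (source_attr, target_attr, confidence) sorted by confidence.
    """
    candidates = []
    for (s, s_vals), (t, t_vals) in product(source.items(), target.items()):
        score = w_name * name_similarity(s, t) + (1 - w_name) * jaccard(s_vals, t_vals)
        if score >= threshold:
            candidates.append((s, t, round(score, 3)))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical samples from two genomic sources with different naming conventions.
    source = {"gene_symbol": ["BRCA1", "TP53", "EGFR"], "chrom": ["17", "17", "7"]}
    target = {"GeneSymbol": ["BRCA1", "TP53", "KRAS"], "chromosome": ["17", "7", "12"]}
    for s, t, conf in match_attributes(source, target):
        print(f"{s} -> {t}: confidence {conf}")
    # gene_symbol -> GeneSymbol: confidence 0.705
    # chrom -> chromosome: confidence 0.583
```

Weighting name and instance evidence separately keeps the score interpretable: a pair whose names differ but whose values overlap strongly can still surface, and the threshold trades recall for precision.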
dc.description.sponsorship This work has been supported by FCT Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2019, the Doctoral scholarship PD/BDE/135100/2017 and European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 039479; Funding Reference: POCI-01-0247-FEDER-039479]. We also thank both the Spanish State Research Agency and the Generalitat Valenciana under the projects DataME TIN2016-80811-P, ACIF/2018/171, and PROMETEO/2018/176. Icons made by Freepik, from www.flaticon.com. es_ES
dc.language English es_ES
dc.publisher Springer Nature es_ES
dc.relation.ispartof Information Systems. 17th European, Mediterranean, and Middle Eastern Conference, EMCIS 2020, Dubai, United Arab Emirates, November 25-26, 2020, Proceedings es_ES
dc.relation.ispartofseries Lecture Notes in Business Information Processing;402 es_ES
dc.rights All rights reserved es_ES
dc.subject Big Data es_ES
dc.subject Data integration es_ES
dc.subject Schema matching es_ES
dc.subject Similarity measures es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Automating Data Integration in Adaptive and Data-Intensive Information Systems es_ES
dc.type Conference paper es_ES
dc.type Article es_ES
dc.type Book chapter es_ES
dc.identifier.doi 10.1007/978-3-030-63396-7_2 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/FCT//UID%2FCEC%2F00319%2F2019/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/FCT//PD%2FBDE%2F135100%2F2017/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/FEDER//POCI-01-0247-FEDER-039479/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//ACIF%2F2018%2F171//SOPORTE ONTOLOGICO Y TECNOLOGICO PARA EL DESARROLLO DE APLICACIONES BIG DATA/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement///TIN2016-80811-P//UN METODO DE PRODUCCION DE SOFTWARE DIRIGIDO POR MODELOS PARA EL DESARROLLO DE APLICACIONES BIG DATA/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement///PROMETEO%2F2018%2F176//GISPRO-GENOMIC INFORMATION SYSTEMS PRODUCTION/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Galvão, J.; León-Palacio, A.; Costa, C.; Santos, MY.; Pastor López, O. (2020). Automating Data Integration in Adaptive and Data-Intensive Information Systems. Springer Nature. 20-34. https://doi.org/10.1007/978-3-030-63396-7_2 es_ES
dc.description.accrualMethod S es_ES
dc.relation.conferencename 17th European, Mediterranean and Middle Eastern Conference on Information Systems (EMCIS 2020) es_ES
dc.relation.conferencedate November 25-26, 2020 es_ES
dc.relation.conferenceplace Online es_ES
dc.relation.publisherversion https://doi.org/10.1007/978-3-030-63396-7_2 es_ES
dc.description.upvformatpinicio 20 es_ES
dc.description.upvformatpfin 34 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela S\423311 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Fundação para a Ciência e a Tecnologia, Portugal es_ES
dc.description.references Krishnan, K.: Data Warehousing in the Age of Big Data. Newnes (2013) es_ES
dc.description.references Vaisman, A., Zimányi, E.: Data warehouses: next challenges. In: Aufaure, M.-A., Zimányi, E. (eds.) eBISS 2011. LNBIP, vol. 96, pp. 1–26. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27358-2_1 es_ES
dc.description.references Costa, C., Santos, M.Y.: Evaluating several design patterns and trends in big data warehousing systems. In: Krogstie, J., Reijers, H.A. (eds.) CAiSE 2018. LNCS, vol. 10816, pp. 459–473. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91563-0_28 es_ES
dc.description.references Bellahsene, Z., Bonifati, A., Duchateau, F., Velegrakis, Y.: On evaluating schema matching and mapping. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds.) Schema Matching and Mapping, pp. 253–291. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-16518-4_9 es_ES
dc.description.references Santos, M.Y., Costa, C., Galvão, J., Andrade, C., Pastor, O., Marcén, A.C.: Enhancing big data warehousing for efficient, integrated and advanced analytics - visionary paper. In: Cappiello, C., Ruiz, M. (eds.) CAiSE Forum 2019. LNBIP, vol. 350, pp. 215–226. Springer, Heidelberg (2019). https://doi.org/10.1007/978-3-030-21297-1_19 es_ES
dc.description.references Bernstein, P.A., Madhavan, J., Rahm, E.: Generic schema matching, ten years later. PVLDB 4, 695–701 (2011) es_ES
dc.description.references Madhavan, J., Bernstein, P.A., Rahm, E.: Generic schema matching with cupid. In: Proceedings of the 27th International Conference on Very Large Data Bases, pp. 49–58. Morgan Kaufmann Publishers Inc., San Francisco (2001) es_ES
dc.description.references Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y.: A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS ONE 10, e0144059 (2015). https://doi.org/10.1371/journal.pone.0144059 es_ES
dc.description.references Xiao, C., Wang, W., Lin, X., Shang, H.: Top-k set similarity joins. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, pp. 916–927. IEEE Computer Society, Washington, DC (2009). https://doi.org/10.1109/ICDE.2009.111 es_ES
dc.description.references Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Soviet Phys. Doklady 10, 707 (1966) es_ES
dc.description.references Jaccard, P.: Etude comparative de la distribution florale dans une portion des Alpes et du Jura. Impr. Corbaz, Lausanne (1901) es_ES
dc.description.references Winkler, W.E.: String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. Distributed by ERIC Clearinghouse, Washington, D.C. (1990) es_ES
dc.description.references Zhu, E., Nargesian, F., Pu, K.Q., Miller, R.J.: LSH ensemble: internet-scale domain search. Proc. VLDB Endow. 9, 1185–1196 (2016). https://doi.org/10.14778/2994509.2994534 es_ES
dc.description.references Banek, M., Vrdoljak, B., Tjoa, A.M.: Using ontologies for measuring semantic similarity in data warehouse schema matching process. In: 2007 9th International Conference on Telecommunications, pp. 227–234 (2007). https://doi.org/10.1109/CONTEL.2007.381876 es_ES
dc.description.references Deb Nath, R.P., Hose, K., Pedersen, T.B.: Towards a programmable semantic extract-transform-load framework for semantic data warehouses. In: Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP, pp. 15–24. ACM, New York (2015). https://doi.org/10.1145/2811222.2811229 es_ES
dc.description.references Abdellaoui, S., Nader, F.: Semantic data warehouse at the heart of competitive intelligence systems: design approach. In: 2015 6th International Conference on Information Systems and Economic Intelligence (SIIE), pp. 141–145 (2015). https://doi.org/10.1109/ISEI.2015.7358736 es_ES
dc.description.references El Hajjamy, O., Alaoui, L., Bahaj, M.: Semantic integration of heterogeneous classical data sources in ontological data warehouse. In: Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications, pp. 36:1–36:8. ACM, New York (2018). https://doi.org/10.1145/3230905.3230929 es_ES
dc.description.references Maccioni, A., Torlone, R.: KAYAK: a framework for just-in-time data preparation in a data lake. In: Krogstie, J., Reijers, H.A. (eds.) CAiSE 2018. LNCS, vol. 10816, pp. 474–489. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91563-0_29 es_ES
dc.description.references Hai, R., Geisler, S., Quix, C.: Constance: an intelligent data lake system. In: Proceedings of the 2016 International Conference on Management of Data, pp. 2097–2100. ACM, New York (2016). https://doi.org/10.1145/2882903.2899389 es_ES

