dc.contributor.author |
Galvão, João
|
es_ES |
dc.contributor.author |
León-Palacio, Ana
|
es_ES |
dc.contributor.author |
Costa, Carlos
|
es_ES |
dc.contributor.author |
Santos, Maribel Yasmina
|
es_ES |
dc.contributor.author |
Pastor López, Oscar
|
es_ES |
dc.date.accessioned |
2022-01-07T07:40:23Z |
|
dc.date.available |
2022-01-07T07:40:23Z |
|
dc.date.issued |
2020-11-26 |
es_ES |
dc.identifier.isbn |
978-3-030-63395-0 |
es_ES |
dc.identifier.issn |
1865-1348 |
es_ES |
dc.identifier.uri |
http://hdl.handle.net/10251/179348 |
|
dc.description.abstract |
[EN] Data acquisition is no longer a problem for organizations: considerable effort has gone into automating data collection and storage, providing access to a wide range of heterogeneous data sources that can support the decision-making process. Nevertheless, those efforts have not been extended to data integration, as many data transformation and integration tasks, such as entity and attribute matching, remain highly manual. This is not suitable for complex and dynamic contexts where Information Systems must be adaptive enough to mitigate the difficulties derived from the frequent addition and removal of sources. This work proposes a method for the automatic inference of the appropriate data mapping of heterogeneous sources, supporting the data integration process by providing a semantic overview of the data sources, with quantitative measures of the confidence level. The proposed method includes both technical and domain knowledge and has been evaluated through the implementation of a prototype and its application in a particularly dynamic and complex domain where data integration remains an open problem, i.e., genomics. |
es_ES |
dc.description.sponsorship |
This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the Project Scope: UID/CEC/00319/2019, the Doctoral scholarship PD/BDE/135100/2017, and European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 039479; Funding Reference: POCI-01-0247-FEDER-039479]. We also thank both the Spanish State Research Agency and the Generalitat Valenciana under the projects DataME TIN2016-80811-P, ACIF/2018/171, and PROMETEO/2018/176. Icons made by Freepik, from www.flaticon.com. |
es_ES |
dc.language |
English |
es_ES |
dc.publisher |
Springer Nature |
es_ES |
dc.relation.ispartof |
Information Systems. 17th European, Mediterranean, and Middle Eastern Conference, EMCIS 2020, Dubai, United Arab Emirates, November 25-26, 2020, Proceedings |
es_ES |
dc.relation.ispartofseries |
Lecture Notes in Business Information Processing;402 |
es_ES |
dc.rights |
All rights reserved |
es_ES |
dc.subject |
Big Data |
es_ES |
dc.subject |
Data integration |
es_ES |
dc.subject |
Schema matching |
es_ES |
dc.subject |
Similarity measures |
es_ES |
dc.subject.classification |
Computer Languages and Systems |
es_ES |
dc.title |
Automating Data Integration in Adaptive and Data-Intensive Information Systems |
es_ES |
dc.type |
Conference paper |
es_ES |
dc.type |
Article |
es_ES |
dc.type |
Book chapter |
es_ES |
dc.identifier.doi |
10.1007/978-3-030-63396-7_2 |
es_ES |
dc.relation.projectID |
info:eu-repo/grantAgreement/FCT//UID%2FCEC%2F00319%2F2019/ |
es_ES |
dc.relation.projectID |
info:eu-repo/grantAgreement/FCT//PD%2FBDE%2F135100%2F2017/ |
es_ES |
dc.relation.projectID |
info:eu-repo/grantAgreement/FEDER//POCI-01-0247-FEDER-039479/ |
es_ES |
dc.relation.projectID |
info:eu-repo/grantAgreement/GVA//ACIF%2F2018%2F171//SOPORTE ONTOLOGICO Y TECNOLOGICO PARA EL DESARROLLO DE APLICACIONES BIG DATA/ |
es_ES |
dc.relation.projectID |
info:eu-repo/grantAgreement///TIN2016-80811-P//UN METODO DE PRODUCCION DE SOFTWARE DIRIGIDO POR MODELOS PARA EL DESARROLLO DE APLICACIONES BIG DATA/ |
es_ES |
dc.relation.projectID |
info:eu-repo/grantAgreement///PROMETEO%2F2018%2F176//GISPRO-GENOMIC INFORMATION SYSTEMS PRODUCTION/ |
es_ES |
dc.rights.accessRights |
Open access |
es_ES |
dc.contributor.affiliation |
Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació |
es_ES |
dc.description.bibliographicCitation |
Galvão, J.; León-Palacio, A.; Costa, C.; Santos, MY.; Pastor López, O. (2020). Automating Data Integration in Adaptive and Data-Intensive Information Systems. Springer Nature. 20-34. https://doi.org/10.1007/978-3-030-63396-7_2 |
es_ES |
dc.description.accrualMethod |
S |
es_ES |
dc.relation.conferencename |
17th European, Mediterranean and Middle Eastern Conference on Information Systems (EMCIS 2020) |
es_ES |
dc.relation.conferencedate |
November 25-26, 2020 |
es_ES |
dc.relation.conferenceplace |
Online |
es_ES |
dc.relation.publisherversion |
https://doi.org/10.1007/978-3-030-63396-7_2 |
es_ES |
dc.description.upvformatpinicio |
20 |
es_ES |
dc.description.upvformatpfin |
34 |
es_ES |
dc.type.version |
info:eu-repo/semantics/publishedVersion |
es_ES |
dc.relation.pasarela |
S\423311 |
es_ES |
dc.contributor.funder |
Generalitat Valenciana |
es_ES |
dc.contributor.funder |
European Regional Development Fund |
es_ES |
dc.contributor.funder |
Fundação para a Ciência e a Tecnologia, Portugal |
es_ES |
dc.description.references |
Krishnan, K.: Data Warehousing in the Age of Big Data. Newnes (2013) |
es_ES |
dc.description.references |
Vaisman, A., Zimányi, E.: Data warehouses: next challenges. In: Aufaure, M.-A., Zimányi, E. (eds.) eBISS 2011. LNBIP, vol. 96, pp. 1–26. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27358-2_1 |
es_ES |
dc.description.references |
Costa, C., Santos, M.Y.: Evaluating several design patterns and trends in big data warehousing systems. In: Krogstie, J., Reijers, H.A. (eds.) CAiSE 2018. LNCS, vol. 10816, pp. 459–473. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91563-0_28 |
es_ES |
dc.description.references |
Bellahsene, Z., Bonifati, A., Duchateau, F., Velegrakis, Y.: On evaluating schema matching and mapping. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds.) Schema Matching and Mapping, pp. 253–291. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-16518-4_9 |
es_ES |
dc.description.references |
Santos, M.Y., Costa, C., Galvão, J., Andrade, C., Pastor, O., Marcén, A.C.: Enhancing big data warehousing for efficient, integrated and advanced analytics - visionary paper. In: Cappiello, C., Ruiz, M. (eds.) CAiSE Forum 2019. LNBIP, vol. 350, pp. 215–226. Springer, Heidelberg (2019). https://doi.org/10.1007/978-3-030-21297-1_19 |
es_ES |
dc.description.references |
Bernstein, P.A., Madhavan, J., Rahm, E.: Generic schema matching, ten years later. PVLDB 4, 695–701 (2011) |
es_ES |
dc.description.references |
Madhavan, J., Bernstein, P.A., Rahm, E.: Generic schema matching with cupid. In: Proceedings of the 27th International Conference on Very Large Data Bases, pp. 49–58. Morgan Kaufmann Publishers Inc., San Francisco (2001) |
es_ES |
dc.description.references |
Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y.: A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS ONE 10, e0144059 (2015). https://doi.org/10.1371/journal.pone.0144059 |
es_ES |
dc.description.references |
Xiao, C., Wang, W., Lin, X., Shang, H.: Top-k set similarity joins. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, pp. 916–927. IEEE Computer Society, Washington, DC (2009). https://doi.org/10.1109/ICDE.2009.111 |
es_ES |
dc.description.references |
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Soviet Phys. Doklady 10, 707 (1966) |
es_ES |
dc.description.references |
Jaccard, P.: Etude comparative de la distribution florale dans une portion des Alpes et du Jura. Impr. Corbaz, Lausanne (1901) |
es_ES |
dc.description.references |
Winkler, W.E.: String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. Distributed by ERIC Clearinghouse, Washington, D.C. (1990) |
es_ES |
dc.description.references |
Zhu, E., Nargesian, F., Pu, K.Q., Miller, R.J.: LSH ensemble: internet-scale domain search. Proc. VLDB Endow. 9, 1185–1196 (2016). https://doi.org/10.14778/2994509.2994534 |
es_ES |
dc.description.references |
Banek, M., Vrdoljak, B., Tjoa, A.M.: Using ontologies for measuring semantic similarity in data warehouse schema matching process. In: 2007 9th International Conference on Telecommunications, pp. 227–234 (2007). https://doi.org/10.1109/CONTEL.2007.381876 |
es_ES |
dc.description.references |
Deb Nath, R.P., Hose, K., Pedersen, T.B.: Towards a programmable semantic extract-transform-load framework for semantic data warehouses. In: Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP, pp. 15–24. ACM, New York (2015). https://doi.org/10.1145/2811222.2811229 |
es_ES |
dc.description.references |
Abdellaoui, S., Nader, F.: Semantic data warehouse at the heart of competitive intelligence systems: design approach. In: 2015 6th International Conference on Information Systems and Economic Intelligence (SIIE), pp. 141–145 (2015). https://doi.org/10.1109/ISEI.2015.7358736 |
es_ES |
dc.description.references |
El Hajjamy, O., Alaoui, L., Bahaj, M.: Semantic integration of heterogeneous classical data sources in ontological data warehouse. In: Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications, pp. 36:1–36:8. ACM, New York (2018). https://doi.org/10.1145/3230905.3230929 |
es_ES |
dc.description.references |
Maccioni, A., Torlone, R.: KAYAK: a framework for just-in-time data preparation in a data lake. In: Krogstie, J., Reijers, H.A. (eds.) CAiSE 2018. LNCS, vol. 10816, pp. 474–489. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91563-0_29 |
es_ES |
dc.description.references |
Hai, R., Geisler, S., Quix, C.: Constance: an intelligent data lake system. In: Proceedings of the 2016 International Conference on Management of Data, pp. 2097–2100. ACM, New York (2016). https://doi.org/10.1145/2882903.2899389 |
es_ES |