Martínez-Plumed, F.; Contreras-Ochando, L.; Ferri Ramírez, C.; Hernández-Orallo, J.; Kull, M.; Lachiche, N.; Ramírez Quintana, MJ.... (2021). CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories. IEEE Transactions on Knowledge and Data Engineering. 33(8):3048-3061. https://doi.org/10.1109/TKDE.2019.2962680
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/184751
Title:
|
CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories
|
Authors:
|
Martínez-Plumed, Fernando
Contreras-Ochando, Lidia
Ferri Ramírez, César
Hernández-Orallo, José
Kull, Meelis
Lachiche, Nicolas
Ramírez Quintana, María José
Flach, Peter
|
UPV entity:
|
Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
|
Release date:
|
|
Abstract:
|
[EN] CRISP-DM (CRoss-Industry Standard Process for Data Mining) has its origins in the second half of the nineties and is thus about two decades old. According to many surveys and user polls it is still the de facto standard for developing data mining and knowledge discovery projects. However, undoubtedly the field has moved on considerably in twenty years, with data science now the leading term being favoured over data mining. In this paper we investigate whether, and in what contexts, CRISP-DM is still fit for purpose for data science projects. We argue that if the project is goal-directed and process-driven the process model view still largely holds. On the other hand, when data science projects become more exploratory the paths that the project can take become more varied, and a more flexible model is called for. We suggest what the outlines of such a trajectory-based model might look like and how it can be used to categorise data science projects (goal-directed, exploratory or data management). We examine seven real-life exemplars where exploratory activities play an important role and compare them against 51 use cases extracted from the NIST Big Data Public Working Group. We anticipate this categorisation can help project planning in terms of time and cost characteristics.
|
Keywords:
|
Data science trajectories, Data mining, Knowledge discovery process, Data-driven methodologies
|
Usage rights:
|
All rights reserved
|
Source:
|
IEEE Transactions on Knowledge and Data Engineering (ISSN: 1041-4347)
|
DOI:
|
10.1109/TKDE.2019.2962680
|
Publisher:
|
Institute of Electrical and Electronics Engineers
|
Publisher's version:
|
https://doi.org/10.1109/TKDE.2019.2962680
|
Project code:
|
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094403-B-C32/ES/RAZONAMIENTO FORMAL PARA TECNOLOGIAS FACILITADORAS Y EMERGENTES/
info:eu-repo/grantAgreement/FLI//RFP2-152/
info:eu-repo/grantAgreement/EC//CT-EX2018D335821-101/
info:eu-repo/grantAgreement/UPV//PAID-06-18/
info:eu-repo/grantAgreement/INST NAL DE CIBERSEGURIDAD DE ESPAÑA, S.A. //INCIBEI-2015-27345//AYUDAS PARA LA EXCELENCIA DE EQUIPOS DE INVESTIGACION AVANZADA EN CIBERSEGURIDAD-CONTRATACION POSTDOCTORAL-MARTINEZ PLUMED/
info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//PROMETEO%2F2019%2F098//DEEPTRUST/
|
Acknowledgements:
|
We thank the anonymous reviewers for their comments, which motivated the analysis in Section 5. This material is based upon work supported by the EU (FEDER), the Spanish MINECO under Grant RTI2018-094403-B-C3, and the Generalitat Valenciana under PROMETEO/2019/098. F. Martínez-Plumed was also supported by INCIBE (Ayudas para la excelencia de los equipos de investigación avanzada en ciberseguridad), the European Commission (JRC) HUMAINT project (CT-EX2018D335821-101), and UPV (PAID-06-18). J. Hernández-Orallo is also funded by an FLI grant RFP2-152.
|
Type:
|
Article
|