
CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories

RiuNet: Institutional Repository of the Universitat Politècnica de València



dc.contributor.author Martínez-Plumed, Fernando es_ES
dc.contributor.author Contreras-Ochando, Lidia es_ES
dc.contributor.author Ferri Ramírez, César es_ES
dc.contributor.author Hernández-Orallo, José es_ES
dc.contributor.author Kull, Meelis es_ES
dc.contributor.author Lachiche, Nicolas es_ES
dc.contributor.author Ramírez Quintana, María José es_ES
dc.contributor.author Flach, Peter es_ES
dc.date.accessioned 2022-07-25T18:06:31Z
dc.date.available 2022-07-25T18:06:31Z
dc.date.issued 2021-08-01 es_ES
dc.identifier.issn 1041-4347 es_ES
dc.identifier.uri http://hdl.handle.net/10251/184751
dc.description.abstract [EN] CRISP-DM (CRoss-Industry Standard Process for Data Mining) has its origins in the second half of the nineties and is thus about two decades old. According to many surveys and user polls it is still the de facto standard for developing data mining and knowledge discovery projects. However, undoubtedly the field has moved on considerably in twenty years, with data science now the leading term being favoured over data mining. In this paper we investigate whether, and in what contexts, CRISP-DM is still fit for purpose for data science projects. We argue that if the project is goal-directed and process-driven the process model view still largely holds. On the other hand, when data science projects become more exploratory the paths that the project can take become more varied, and a more flexible model is called for. We suggest what the outlines of such a trajectory-based model might look like and how it can be used to categorise data science projects (goal-directed, exploratory or data management). We examine seven real-life exemplars where exploratory activities play an important role and compare them against 51 use cases extracted from the NIST Big Data Public Working Group. We anticipate this categorisation can help project planning in terms of time and cost characteristics. es_ES
dc.description.sponsorship We thank the anonymous reviewers for their comments, which motivated the analysis in Section 5. This material is based upon work supported by the EU (FEDER), the Spanish MINECO under Grant RTI2018-094403-B-C3, and the Generalitat Valenciana under PROMETEO/2019/098. F. Martínez-Plumed was also supported by INCIBE (Ayudas para la excelencia de los equipos de investigacion avanzada en ciberseguridad), the European Commission (JRC) HUMAINT project (CT-EX2018D335821-101), and UPV (PAID-06-18). J. Hernández-Orallo is also funded by an FLI grant RFP2-152. es_ES
dc.language English es_ES
dc.publisher Institute of Electrical and Electronics Engineers es_ES
dc.relation.ispartof IEEE Transactions on Knowledge and Data Engineering es_ES
dc.rights All rights reserved es_ES
dc.subject Data science trajectories es_ES
dc.subject Data mining es_ES
dc.subject Knowledge discovery process es_ES
dc.subject Data-driven methodologies es_ES
dc.subject.classification COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories es_ES
dc.type Article es_ES
dc.identifier.doi 10.1109/TKDE.2019.2962680 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094403-B-C32/ES/RAZONAMIENTO FORMAL PARA TECNOLOGIAS FACILITADORAS Y EMERGENTES/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/FLI//RFP2-152/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC//CT-EX2018D335821-101/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/UPV//PAID-06-18/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/INST NAL DE CIBERSEGURIDAD DE ESPAÑA, S.A. //INCIBEI-2015-27345//AYUDAS PARA LA EXCELENCIA DE EQUIPOS DE INVESTIGACION AVANZADA EN CIBERSEGURIDAD-CONTRATACION POSTDOCTORAL-MARTINEZ PLUMED/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//PROMETEO%2F2019%2F098//DEEPTRUST/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Martínez-Plumed, F.; Contreras-Ochando, L.; Ferri Ramírez, C.; Hernández-Orallo, J.; Kull, M.; Lachiche, N.; Ramírez Quintana, MJ.... (2021). CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories. IEEE Transactions on Knowledge and Data Engineering. 33(8):3048-3061. https://doi.org/10.1109/TKDE.2019.2962680 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1109/TKDE.2019.2962680 es_ES
dc.description.upvformatpinicio 3048 es_ES
dc.description.upvformatpfin 3061 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 33 es_ES
dc.description.issue 8 es_ES
dc.relation.pasarela S\401451 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder GENERALITAT VALENCIANA es_ES
dc.contributor.funder Future of Life Institute es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
dc.contributor.funder INST NAL DE CIBERSEGURIDAD DE ESPAÑA, S.A. es_ES

