
Setting Crunchbase for Data Science: Preprocessing, Data Integration and Feature Engineering

RiuNet: Institutional repository of the Polytechnic University of Valencia

dc.contributor.author Ferrati, Francesco es_ES
dc.contributor.author Muffatto, Moreno es_ES
dc.date.accessioned 2020-07-30T10:44:33Z
dc.date.available 2020-07-30T10:44:33Z
dc.date.issued 2020-07-02
dc.identifier.isbn 9788490488324
dc.identifier.uri http://hdl.handle.net/10251/148975
dc.description.abstract [EN] In order to support equity investors in their decision-making process, researchers are exploring the potential of machine learning algorithms to predict the financial success of startup ventures. In this context, a key role is played by the significance of the data used, which should reflect most of the variables considered by investors in their screening and evaluation activity. This paper provides a detailed description of the data management process that can be followed to obtain such a dataset. Using Crunchbase as the main data source, other databases have been integrated to enrich the information content and support the feature engineering process. Specifically, the following sources have been considered: USPTO PatentsView, Kauffman Indicators of Entrepreneurship, Academic Ranking of World Universities, and the CB Insights ranking of top investors. The final dataset contains the profiles of 138,637 US-based ventures founded between 2000 and 2019. For each company, the elements assessed by equity investors have been analyzed. Among others, the following specific areas were considered for each company: location, industry, founding team, intellectual property and funding round history. Data related to each area have been formalized in a series of features ready to be used in a machine learning context. es_ES
dc.language English es_ES
dc.publisher Editorial Universitat Politècnica de València es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Web data es_ES
dc.subject Internet data es_ES
dc.subject Big data es_ES
dc.subject QCA es_ES
dc.subject PLS es_ES
dc.subject SEM es_ES
dc.subject Conference es_ES
dc.subject Crunchbase es_ES
dc.subject Startup es_ES
dc.subject Investments es_ES
dc.subject Feature engineering es_ES
dc.subject Data mining es_ES
dc.subject Machine learning es_ES
dc.title Setting Crunchbase for Data Science: Preprocessing, Data Integration and Feature Engineering es_ES
dc.type Book chapter es_ES
dc.type Conference paper es_ES
dc.identifier.doi 10.4995/CARMA2020.2020.11633
dc.rights.accessRights Open es_ES
dc.description.bibliographicCitation Ferrati, F.; Muffatto, M. (2020). Setting Crunchbase for Data Science: Preprocessing, Data Integration and Feature Engineering. Editorial Universitat Politècnica de València. 221-229. https://doi.org/10.4995/CARMA2020.2020.11633 es_ES
dc.description.accrualMethod OCS es_ES
dc.relation.conferencename CARMA 2020 - 3rd International Conference on Advanced Research Methods and Analytics es_ES
dc.relation.conferencedate July 08-09, 2020 es_ES
dc.relation.conferenceplace Valencia, Spain es_ES
dc.relation.publisherversion http://ocs.editorial.upv.es/index.php/CARMA/CARMA2020/paper/view/11633 es_ES
dc.description.upvformatpinicio 221 es_ES
dc.description.upvformatpfin 229 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela OCS\11633 es_ES

