dc.contributor.author | Daas, Piet | es_ES |
dc.contributor.author | Jansen, Jelmer | es_ES |
dc.date.accessioned | 2020-07-28T11:25:24Z | |
dc.date.available | 2020-07-28T11:25:24Z | |
dc.date.issued | 2020-05-08 | |
dc.identifier.isbn | 9788490488324 | |
dc.identifier.uri | http://hdl.handle.net/10251/148779 | |
dc.description.abstract | [EN] Getting an overview of the innovative companies in a country is a challenging task. Traditionally, this is done by sending a questionnaire to a sample of large companies. An alternative approach has been developed for this purpose: determining if a company is innovative by studying the text on the main page of its website. The text-based model created is able to reproduce the results from the survey and is also able to detect small innovative companies, such as startups. However, model stability was found to be a serious problem. It suffered from model degradation which resulted in a gradual decrease in the detection of innovative companies. The accuracy of the model dropped from 93% to 63% over a period of one year. In this paper this phenomenon is described and the data underlying it is studied in great detail. It was found that the combination of the inactivity of a subset of websites and changes in the composition of the words on company websites over time produced this effect. A solution for dealing with this phenomenon is presented and future research is discussed. | es_ES |
dc.language | English | es_ES |
dc.publisher | Editorial Universitat Politècnica de València | es_ES |
dc.rights | Attribution - NonCommercial - NoDerivatives (by-nc-nd) | es_ES |
dc.subject | Web data | es_ES |
dc.subject | Internet data | es_ES |
dc.subject | Big data | es_ES |
dc.subject | QCA | es_ES |
dc.subject | PLS | es_ES |
dc.subject | SEM | es_ES |
dc.subject | Conference | es_ES |
dc.subject | Innovation | es_ES |
dc.subject | Text analysis | es_ES |
dc.subject | Webscraping | es_ES |
dc.title | Model degradation in web derived text-based models | es_ES |
dc.type | Book chapter | es_ES |
dc.type | Conference paper | es_ES |
dc.identifier.doi | 10.4995/CARMA2020.2020.11560 | |
dc.rights.accessRights | Open | es_ES |
dc.description.bibliographicCitation | Daas, P.; Jansen, J. (2020). Model degradation in web derived text-based models. Editorial Universitat Politècnica de València. 77-84. https://doi.org/10.4995/CARMA2020.2020.11560 | es_ES |
dc.description.accrualMethod | OCS | es_ES |
dc.relation.conferencename | CARMA 2020 - 3rd International Conference on Advanced Research Methods and Analytics | es_ES |
dc.relation.conferencedate | July 08-09, 2020 | es_ES |
dc.relation.conferenceplace | Valencia, Spain | es_ES |
dc.relation.publisherversion | http://ocs.editorial.upv.es/index.php/CARMA/CARMA2020/paper/view/11560 | es_ES |
dc.description.upvformatpinicio | 77 | |
dc.description.upvformatpfin | 84 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.pasarela | OCS\11560 | es_ES |