
Model degradation in web derived text-based models

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Daas, Piet es_ES
dc.contributor.author Jansen, Jelmer es_ES
dc.date.accessioned 2020-07-28T11:25:24Z
dc.date.available 2020-07-28T11:25:24Z
dc.date.issued 2020-05-08
dc.identifier.isbn 9788490488324
dc.identifier.uri http://hdl.handle.net/10251/148779
dc.description.abstract [EN] Getting an overview of the innovative companies in a country is a challenging task. Traditionally, this is done by sending a questionnaire to a sample of large companies. An alternative approach has been developed: determining whether a company is innovative by studying the text on the main page of its website. The text-based model created is able to reproduce the results from the survey and is also able to detect small innovative companies, such as startups. However, model stability was found to be a serious problem. The model suffered from degradation, which resulted in a gradual decrease in the detection of innovative companies: its accuracy dropped from 93% to 63% over a period of one year. In this paper, this phenomenon is described and the underlying data are studied in great detail. The effect was found to be produced by the combination of the inactivity of a subset of websites and changes over time in the composition of the words on company websites. A solution for dealing with this phenomenon is presented and future research is discussed. es_ES
dc.language English es_ES
dc.publisher Editorial Universitat Politècnica de València es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Web data es_ES
dc.subject Internet data es_ES
dc.subject Big data es_ES
dc.subject Qca es_ES
dc.subject Pls es_ES
dc.subject Sem es_ES
dc.subject Conference es_ES
dc.subject Innovation es_ES
dc.subject Text analysis es_ES
dc.subject Webscraping es_ES
dc.title Model degradation in web derived text-based models es_ES
dc.type Book chapter es_ES
dc.type Conference paper es_ES
dc.identifier.doi 10.4995/CARMA2020.2020.11560
dc.rights.accessRights Open es_ES
dc.description.bibliographicCitation Daas, P.; Jansen, J. (2020). Model degradation in web derived text-based models. Editorial Universitat Politècnica de València. 77-84. https://doi.org/10.4995/CARMA2020.2020.11560 es_ES
dc.description.accrualMethod OCS es_ES
dc.relation.conferencename CARMA 2020 - 3rd International Conference on Advanced Research Methods and Analytics es_ES
dc.relation.conferencedate July 08-09, 2020 es_ES
dc.relation.conferenceplace Valencia, Spain es_ES
dc.relation.publisherversion http://ocs.editorial.upv.es/index.php/CARMA/CARMA2020/paper/view/11560 es_ES
dc.description.upvformatpinicio 77
dc.description.upvformatpfin 84 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela OCS\11560 es_ES

