
Identifying Drone Web Sites in Multiple Countries and Languages with a Single Model

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Daas, Piet es_ES
dc.contributor.author de-Miguel-Molina, Blanca es_ES
dc.contributor.author de-Miguel-Molina, María es_ES
dc.date.accessioned 2023-02-28T19:00:55Z
dc.date.available 2023-02-28T19:00:55Z
dc.date.issued 2023-01-26 es_ES
dc.identifier.uri http://hdl.handle.net/10251/192168
dc.description.abstract [EN] A text-based bag-of-words model was developed to identify drone company websites in multiple European countries and languages. A collection of Spanish drone and non-drone websites was used for initial model development, and various classification methods were compared. Supervised logistic regression (L2-norm) performed best, with an accuracy of 87% on the unseen test set. The accuracy of this model improved to 88% when it was trained on texts in which all Spanish words had been translated into English. Retraining the model on texts from which all typically Spanish words (such as names of cities and regions) and words indicative of specific periods of time (such as the months of the year and the days of the week) had been removed did not affect the model's overall performance and made it more generally applicable. Applying the cleaned, completely English word-based model to collections of Irish and Italian drone and non-drone websites revealed, after manual inspection, that it detected drone websites in those countries with accuracies of 82% and 86%, respectively. Classifying the Italian texts required a translation list in which all 1560 English word-based features of the model were translated into their Italian equivalents. Because the model had a very high recall on drone websites (93%, 100%, and 97% for Spanish, Irish, and Italian websites, respectively), it was particularly well suited to selecting potential drone websites from large collections of websites. es_ES
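The workflow summarised in the abstract (a bag-of-words representation, removal of country- and time-specific words, an L2-regularised logistic regression classifier, and a feature translation list that lets the English word-based model score Italian text) can be illustrated with the minimal Python sketch below. It is not the authors' code: it uses only the standard library, fits the logistic regression with plain batch gradient descent, and its toy training texts, removal list (`REMOVE`), translation list (`EN_TO_IT`), hyperparameters and helper names are all hypothetical.

```python
import math
import re

# HYPOTHETICAL removal list: the abstract mentions dropping place names, months
# and days of the week; these few entries are illustrative only.
REMOVE = {"madrid", "valencia", "monday", "tuesday", "january", "february"}

# HYPOTHETICAL English -> Italian translation list for a few model features.
EN_TO_IT = {"drone": "drone", "aerial": "aerea", "photography": "fotografia",
            "inspection": "ispezione", "restaurant": "ristorante", "menu": "menu"}

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, including accented ones


def tokenize(text):
    """Lower-cased word tokens with country/time-specific words removed."""
    return [w for w in WORD.findall(text.lower()) if w not in REMOVE]


def build_vocab(docs):
    """Map each distinct remaining word to a column index."""
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}


def vectorize(text, vocab):
    """Binary bag-of-words vector; words outside the vocabulary are ignored."""
    x = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            x[vocab[w]] = 1.0
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg_l2(X, y, lam=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss + (lam/2)*||w||^2; bias unpenalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / n + lam * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def to_english_features(text, en_to_it):
    """Invert the translation list so Italian words map back to English features."""
    it_to_en = {it: en for en, it in en_to_it.items()}
    return " ".join(it_to_en.get(w, w) for w in WORD.findall(text.lower()))


def drone_probability(text, vocab, w, b):
    """Predicted probability that a text comes from a drone website."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# HYPOTHETICAL toy English training texts (1 = drone website, 0 = other website).
TRAIN = [
    ("Professional drone services for aerial photography and inspection in Madrid", 1),
    ("Drone pilot training and aerial mapping with certified drones", 1),
    ("Aerial inspection of solar panels by drone every Monday", 1),
    ("Traditional restaurant with a seasonal menu in Valencia", 0),
    ("Book a table at our restaurant, open Tuesday to Sunday", 0),
    ("Family hotel with breakfast menu and free parking", 0),
]

vocab = build_vocab([t for t, _ in TRAIN])
w, b = train_logreg_l2([vectorize(t, vocab) for t, _ in TRAIN], [y for _, y in TRAIN])

# Score Italian texts with the same English word-based model.
it_drone = to_english_features("Servizi di ispezione aerea con drone e fotografia", EN_TO_IT)
it_other = to_english_features("Ristorante tradizionale con menu stagionale", EN_TO_IT)
p_drone = drone_probability(it_drone, vocab, w, b)
p_other = drone_probability(it_other, vocab, w, b)
print(f"Italian drone text: {p_drone:.2f}, Italian restaurant text: {p_other:.2f}")
```

Mapping Italian words back onto the English features, rather than retraining, mirrors the abstract's idea of keeping a single English word-based model; a library implementation such as scikit-learn's `LogisticRegression` could replace the hand-written trainer.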
dc.description.sponsorship This research was performed as part of the study "Web Intelligence for Measuring Emerging Economic Trends: The Drone Industry", led by GOPA under the framework contract on Methodological Support (Ref. 2018.0086) for Eurostat. es_ES
dc.language English es_ES
dc.publisher Renmin University of China es_ES
dc.relation.ispartof Journal of Data Science es_ES
dc.rights Attribution (by) es_ES
dc.subject Bag of words es_ES
dc.subject Classification model es_ES
dc.subject Multiple languages es_ES
dc.subject Text es_ES
dc.subject.classification BUSINESS ORGANIZATION es_ES
dc.title Identifying Drone Web Sites in Multiple Countries and Languages with a Single Model es_ES
dc.type Article es_ES
dc.identifier.doi 10.6339/23-JDS1087 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/Eurostat//2018.0086//Web Intelligence for Measuring Emerging Economic Trends: the Drone Industry/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Facultad de Administración y Dirección de Empresas - Facultat d'Administració i Direcció d'Empreses es_ES
dc.description.bibliographicCitation Daas, P.; De-Miguel-Molina, B.; De-Miguel-Molina, M. (2023). Identifying Drone Web Sites in Multiple Countries and Languages with a Single Model. Journal of Data Science. 1-14. https://doi.org/10.6339/23-JDS1087 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.6339/23-JDS1087 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 14 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.identifier.eissn 1683-8602 es_ES
dc.relation.pasarela S\482679 es_ES
dc.contributor.funder Eurostat es_ES