
Suggested Framework for Big Data Analysis of Enterprise Websites. A Case Study for Web Intelligence Network

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia



dc.contributor.author Maślankowski, Jacek es_ES
dc.contributor.author Nowak, Dominika es_ES
dc.date.accessioned 2022-11-14T13:23:28Z
dc.date.available 2022-11-14T13:23:28Z
dc.date.issued 2022-09-20
dc.identifier.isbn 9788413960180
dc.identifier.uri http://hdl.handle.net/10251/189710
dc.description.abstract [EN] Big Data gives researchers and scholars an opportunity to conduct surveys in various domains. In this paper we concentrate on websites as a data source that can provide valuable information for enterprise statistics. In this field, Big Data makes it possible to obtain various kinds of information, including the type of the enterprise (e-commerce etc.), whether the enterprise is present in social media, and how frequently the website is updated. The main goal of the paper is to present which Big Data methods are the most efficient in acquiring and processing information from websites. The discussion shows different variants of conducting the work, based on case studies conducted as experimental statistics at European Union level over the last 6 years. This paper is based on experience in processing data from websites in ESSnet grants on Big Data I (2016-2018), Big Data II (2018-2020) and Web Intelligence Network (2021-2025). The process of getting enterprise data from websites can be divided into the following steps: (1) Defining the population of enterprise websites; (2) Web scraping; (3) Data processing (extracting); (4) Data validation (de-duplication, quality indicators); (5) Data analysis; (6) Data dissemination. Each step needs additional validation; the first step in particular has an impact on the final results, which may not be comparable to official statistical data. The way the data are extracted to find the information of interest is also essential. Here we need to choose between text mining methods, e.g. machine learning and regular expressions, which give different results depending on the information to be provided. The paper shows how the use of appropriate methods can increase the overall value of the analysis. es_ES
dc.format.extent 1 es_ES
dc.language English es_ES
dc.publisher Editorial Universitat Politècnica de València es_ES
dc.relation.ispartof 4th International Conference on Advanced Research Methods and Analytics (CARMA 2022)
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.title Suggested Framework for Big Data Analysis of Enterprise Websites. A Case Study for Web Intelligence Network es_ES
dc.type Book chapter es_ES
dc.type Conference paper es_ES
dc.rights.accessRights Open es_ES
dc.description.bibliographicCitation Maślankowski, J.; Nowak, D. (2022). Suggested Framework for Big Data Analysis of Enterprise Websites. A Case Study for Web Intelligence Network. In 4th International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. 270-270. http://hdl.handle.net/10251/189710 es_ES
dc.description.accrualMethod OCS es_ES
dc.relation.conferencename CARMA 2022 - 4th International Conference on Advanced Research Methods and Analytics es_ES
dc.relation.conferencedate June 29-July 01, 2022 es_ES
dc.relation.conferenceplace Valencia, Spain
dc.relation.publisherversion http://ocs.editorial.upv.es/index.php/CARMA/CARMA2022/paper/view/15777 es_ES
dc.description.upvformatpinicio 270 es_ES
dc.description.upvformatpfin 270 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela OCS\15777 es_ES
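
Steps (3) and (4) named in the abstract (data processing and de-duplication) can be illustrated with a small regular-expression sketch. This is not the authors' implementation: the platform list, the e-commerce keyword cues, and the function names `normalize_url`, `extract_indicators` and `deduplicate` are all assumptions made for illustration, and the abstract does not say which patterns or rules were actually used.

```python
import re
from urllib.parse import urlparse

# Assumed list of platforms; the abstract does not name the ones that were checked.
SOCIAL_MEDIA_PATTERN = re.compile(
    r'href=["\']https?://(?:www\.)?'
    r'(facebook|twitter|linkedin|instagram|youtube)\.com/[^"\']*["\']',
    re.IGNORECASE,
)

# Assumed textual cues for an online shop; purely illustrative.
ECOMMERCE_PATTERN = re.compile(
    r"\b(add to cart|shopping cart|checkout|basket)\b", re.IGNORECASE
)


def normalize_url(url: str) -> str:
    """Reduce a URL to its lowercase host without a leading 'www.'."""
    host = urlparse(url if "//" in url else "//" + url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def extract_indicators(html: str) -> dict:
    """Step (3): derive simple enterprise indicators from raw HTML."""
    platforms = sorted(
        {m.group(1).lower() for m in SOCIAL_MEDIA_PATTERN.finditer(html)}
    )
    return {
        "social_media": platforms,
        "has_social_media": bool(platforms),
        "is_ecommerce": bool(ECOMMERCE_PATTERN.search(html)),
    }


def deduplicate(pages: dict) -> dict:
    """Step (4): keep the first scraped page for each normalized host."""
    unique = {}
    for url, html in pages.items():
        unique.setdefault(normalize_url(url), html)
    return unique


# Hypothetical scraped pages standing in for the output of step (2).
pages = {
    "http://www.example.com/": (
        '<a href="https://www.facebook.com/example">FB</a>'
        "<button>Add to cart</button>"
    ),
    "https://example.com/index.html": "<p>duplicate</p>",
    "http://shop.test": "<p>About us</p>",
}

unique_pages = deduplicate(pages)
indicators = {host: extract_indicators(html) for host, html in unique_pages.items()}
```

Keyword rules of this kind are transparent and cheap to run, but they miss wording they were not written for. That is one reason the choice between regular expressions and machine learning can change the results, as the abstract notes.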

