
Prototype/topic based Clustering Method for Weblogs

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Perez-Tellez, Fernando es_ES
dc.contributor.author Cardiff, John es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.contributor.author Pinto Avendaño, David Eduardo es_ES
dc.date.accessioned 2017-06-07T08:13:30Z
dc.date.available 2017-06-07T08:13:30Z
dc.date.issued 2016
dc.identifier.issn 1088-467X
dc.identifier.uri http://hdl.handle.net/10251/82492
dc.description.abstract [EN] In the last 10 years, the information generated on weblog sites has increased exponentially, resulting in a clear need for intelligent approaches to analyse and organise this massive amount of information. In this work, we present a methodology to cluster weblog posts according to the topics discussed therein, which we derive by text analysis. We call the methodology Prototype/Topic Based Clustering; it combines a generative probabilistic model with a Self-Term Expansion methodology. The Self-Term Expansion methodology is used to improve the representation of the data, and the generative probabilistic model is employed to identify relevant topics discussed in the weblogs. We have modified the generative probabilistic model so that it can exploit predefined initialisations, and have performed our experiments on narrow-domain and wide-domain subsets. Our approach shows a considerable improvement over the predefined baseline and alternative state-of-the-art approaches, achieving an improvement of up to 20% in many cases. The wide-domain datasets showed the larger improvement; however, in both cases, our results outperformed the baseline and the state-of-the-art algorithms. es_ES
dc.description.sponsorship The work of the third author was carried out in the framework of the WIQ-EI IRSES project (Grant No. 269180) within FP7 Marie Curie, the DIANA-APPLICATIONS project "Finding Hidden Knowledge in Texts: Applications" (TIN2012-38603-C02-01), and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. en_EN
dc.language English es_ES
dc.publisher IOS Press es_ES
dc.relation.ispartof Intelligent Data Analysis es_ES
dc.rights All rights reserved es_ES
dc.subject Short text analysis es_ES
dc.subject Weblog clustering es_ES
dc.subject Topic Identification es_ES
dc.subject.classification Computer Languages and Systems es_ES
dc.title Prototype/topic based Clustering Method for Weblogs es_ES
dc.type Article es_ES
dc.identifier.doi 10.3233/IDA-150793
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Perez-Tellez, F.; Cardiff, J.; Rosso, P.; Pinto Avendaño, DE. (2016). Prototype/topic based Clustering Method for Weblogs. Intelligent Data Analysis. 20(1):47-65. https://doi.org/10.3233/IDA-150793 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.3233/IDA-150793 es_ES
dc.description.upvformatpinicio 47 es_ES
dc.description.upvformatpfin 65 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 20 es_ES
dc.description.issue 1 es_ES
dc.relation.senia 326674 es_ES
dc.contributor.funder European Commission
dc.contributor.funder Ministerio de Economía y Competitividad es_ES

