Prototype/topic based Clustering Method for Weblogs

Perez-Tellez, Fernando; Cardiff, John; Rosso, Paolo; Pinto Avendaño, David Eduardo

doi:10.3233/IDA-150793

Files in this item

Name: IDA-autor.pdf

Size: 1.169 MB

Format: PDF

Description: Author's version.

Name: IDA-editor-autor.pdf

Size: 1.363 MB

Format: PDF

Description: Publisher's version.

dc.contributor.author	Perez-Tellez, Fernando	es_ES
dc.contributor.author	Cardiff, John	es_ES
dc.contributor.author	Rosso, Paolo	es_ES
dc.contributor.author	Pinto Avendaño, David Eduardo	es_ES
dc.date.accessioned	2017-06-07T08:13:30Z
dc.date.available	2017-06-07T08:13:30Z
dc.date.issued	2016
dc.identifier.issn	1088-467X
dc.identifier.uri	http://hdl.handle.net/10251/82492
dc.description.abstract	[EN] Over the last 10 years, the information generated on weblog sites has increased exponentially, creating a clear need for intelligent approaches to analyse and organise this massive amount of information. In this work, we present a methodology, which we call Prototype/Topic Based Clustering, that clusters weblog posts according to the topics discussed in them, as derived by text analysis. The approach combines a generative probabilistic model with a Self-Term Expansion methodology: Self-Term Expansion is used to improve the representation of the data, and the generative probabilistic model is employed to identify the relevant topics discussed in the weblogs. We have modified the generative probabilistic model to exploit predefined initialisations of the model. The experiments were performed on both narrow and wide domain datasets. Our approach achieved a considerable improvement over the predefined baseline and alternative state-of-the-art approaches, reaching up to 20% in many cases, with the wide domain datasets showing the larger improvement. Nevertheless, on both types of dataset our results outperformed the baseline and the state-of-the-art algorithms.	es_ES
dc.description.sponsorship	The work of the third author was carried out within the framework of the WIQ-EI IRSES project (Grant No. 269180) of the FP7 Marie Curie Actions, the DIANA-APPLICATIONS project "Finding Hidden Knowledge in Texts: Applications" (TIN2012-38603-C02-01), and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.	en_EN
dc.language	English	es_ES
dc.publisher	IOS Press	es_ES
dc.relation.ispartof	Intelligent Data Analysis	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Short text analysis	es_ES
dc.subject	Weblog clustering	es_ES
dc.subject	Topic Identification	es_ES
dc.subject.classification	LENGUAJES Y SISTEMAS INFORMATICOS (Computer Languages and Systems)	es_ES
dc.title	Prototype/topic based Clustering Method for Weblogs	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.3233/IDA-150793
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.description.bibliographicCitation	Perez-Tellez, F.; Cardiff, J.; Rosso, P.; Pinto Avendaño, DE. (2016). Prototype/topic based Clustering Method for Weblogs. Intelligent Data Analysis. 20(1):47-65. https://doi.org/10.3233/IDA-150793	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.3233/IDA-150793	es_ES
dc.description.upvformatpinicio	47	es_ES
dc.description.upvformatpfin	65	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	20	es_ES
dc.description.issue	1	es_ES
dc.relation.senia	326674	es_ES
dc.contributor.funder	European Commission
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
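
The abstract describes the pipeline only at a high level: term expansion, a generative topic model with predefined initialisation, and clustering by topic. The Python sketch below illustrates one way such a pipeline could fit together; it is not the authors' implementation. It assumes that (1) Self-Term Expansion appends to each post the terms that co-occur with its own terms in at least `min_cooc` posts of the same corpus, (2) the generative probabilistic model is LDA fitted by collapsed Gibbs sampling, (3) the "predefined initialisation" is one seed topic per post, and (4) each post joins the cluster of its dominant topic. All function and parameter names are hypothetical.

```python
# Hypothetical sketch, not the authors' implementation: expansion criterion,
# topic model (LDA) and seeding scheme are all assumptions.
import random
from collections import Counter, defaultdict
from itertools import combinations


def self_term_expansion(docs, min_cooc=2):
    """Append to each post the terms that co-occur with its own terms in at
    least `min_cooc` posts of the same corpus (assumed expansion criterion)."""
    cooc = defaultdict(Counter)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    expanded = []
    for doc in docs:
        present = set(doc)
        extra = sorted({other for term in present
                        for other, n in cooc[term].items()
                        if n >= min_cooc and other not in present})
        expanded.append(list(doc) + extra)
    return expanded


def lda_gibbs(docs, k, init=None, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns document-topic counts.
    `init` optionally gives one seed topic per document (hypothetical
    stand-in for the paper's predefined initialisation)."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})   # vocabulary size
    ndk = [[0] * k for _ in docs]               # document-topic counts
    nkw = [defaultdict(int) for _ in range(k)]  # topic-word counts
    nk = [0] * k                                # tokens per topic
    z = []                                      # topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = init[d] if init is not None else rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                      # remove token from counts
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + v * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t                      # add it back under new topic
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return ndk


def prototype_topic_clustering(docs, k, init=None, iters=200, min_cooc=2, seed=0):
    """Expand the posts, fit the topic model, and assign each post to its
    dominant topic (one cluster per topic)."""
    ndk = lda_gibbs(self_term_expansion(docs, min_cooc), k, init, iters, seed=seed)
    return [max(range(k), key=row.__getitem__) for row in ndk]


if __name__ == "__main__":
    posts = [["goal", "match", "team"], ["goal", "match", "league"],
             ["vote", "party", "election"], ["vote", "party", "minister"]]
    print(prototype_topic_clustering(posts, k=2, init=[0, 0, 1, 1]))
```

In this sketch, seeding every token of a post with the same topic lets prior knowledge (for example, a prototype post per expected topic) steer the sampler, while the Gibbs updates can still move tokens away from a poor seed.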

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
