Weblog and short text feature extraction and impact on categorisation

Perez Tellez, Fernando; Cardiff, John; Rosso, Paolo; Pinto Avendaño, David Eduardo

doi:10.3233/IFS-141227

Files in this item

Name: JIFS-version-autor.pdf
Size: 343.6Kb
Format: PDF
Description: Author's version.

Name: fulltext.pdf
Size: 184.7Kb
Format: PDF
Description: Publisher's version.

dc.contributor.author	Perez Tellez, Fernando	es_ES
dc.contributor.author	Cardiff, John	es_ES
dc.contributor.author	Rosso, Paolo	es_ES
dc.contributor.author	Pinto Avendaño, David Eduardo	es_ES
dc.date.accessioned	2015-04-28T14:19:48Z
dc.date.available	2015-04-28T14:19:48Z
dc.date.issued	2014
dc.identifier.issn	1064-1246
dc.identifier.uri	http://hdl.handle.net/10251/49400
dc.description.abstract	The characterisation and categorisation of weblogs and other short texts have become important research themes in areas such as topic/trend detection and pattern recognition. The value of analysing and characterising short texts lies in identifying the features that distinguish them, thereby improving the input to the classification process. In this research work, we analyse a large number of text features and establish which combinations are useful for discriminating between different genres of short text. Having identified the most promising features, we confirm our findings by performing the categorisation task with three approaches: the Gaussian and SVM classifiers and the K-means clustering algorithm. Several hundred feature combinations were analysed to identify the best ones, and the results confirmed our observations. The novel aspect of our work is the detection of the best combination of individual metrics, which are identified as potential features for the categorisation process.	es_ES
dc.description.sponsorship	The research work of the third author is partially funded by the WIQ-EI project (IRSES grant no. 269180) and DIANA APPLICATIONS (TIN2012-38603-C02-01), and was carried out within the framework of the VLC/Campus Microcluster on Multimodal Interaction in Intelligent Systems.	en_EN
dc.language	English	es_ES
dc.publisher	IOS Press	es_ES
dc.relation.ispartof	Journal of Intelligent and Fuzzy Systems	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Short text characterisation	es_ES
dc.subject	Feature extraction	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Weblog and short text feature extraction and impact on categorisation	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.3233/IFS-141227
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Centro de Investigación Pattern Recognition and Human Language Technology	es_ES
dc.description.bibliographicCitation	Perez Tellez, F.; Cardiff, J.; Rosso, P.; Pinto Avendaño, DE. (2014). Weblog and short text feature extraction and impact on categorisation. Journal of Intelligent and Fuzzy Systems. 27(5):2529-2544. https://doi.org/10.3233/IFS-141227	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.3233/IFS-141227	es_ES
dc.description.upvformatpinicio	2529	es_ES
dc.description.upvformatpfin	2544	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	27	es_ES
dc.description.issue	5	es_ES
dc.relation.senia	285923
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.contributor.funder	European Commission

This item appears in the following collection(s)

Articles, conferences, monographs [48360]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia