
Weblog and short text feature extraction and impact on categorisation

RiuNet: Institutional repository of the Polytechnic University of Valencia


dc.contributor.author Perez Tellez, Fernando es_ES
dc.contributor.author Cardiff, John es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.contributor.author Pinto Avendaño, David Eduardo es_ES
dc.date.accessioned 2015-04-28T14:19:48Z
dc.date.available 2015-04-28T14:19:48Z
dc.date.issued 2014
dc.identifier.issn 1064-1246
dc.identifier.uri http://hdl.handle.net/10251/49400
dc.description.abstract The characterisation and categorisation of weblogs and other short texts have become an important research theme in areas such as topic/trend detection and pattern recognition. The value of analysing and characterising short texts lies in identifying the features that distinguish them, thereby improving the input to the classification process. In this research work, we analyse a large number of text features and establish which combinations are useful for discriminating between the different genres of short text. Having identified the most promising features, we confirm our findings by performing the categorisation task using three approaches: the Gaussian and SVM classifiers and the K-means clustering algorithm. Several hundred combinations of features were analysed in order to identify the best ones, and the results confirmed the observations made. The novel aspect of our work is the detection of the best combination of individual metrics, which are identified as potential features for the categorisation process. es_ES
dc.description.sponsorship The research work of the third author is partially funded by the WIQ-EI (IRSES grant n. 269180) and DIANA APPLICATIONS (TIN2012-38603-C02-01), and done in the framework of the VLC/Campus Microcluster on Multimodal Interaction in Intelligent Systems. en_EN
dc.language English es_ES
dc.publisher IOS Press es_ES
dc.relation.ispartof Journal of Intelligent and Fuzzy Systems es_ES
dc.rights All rights reserved es_ES
dc.subject Short text characterisation es_ES
dc.subject Feature extraction es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Weblog and short text feature extraction and impact on categorisation es_ES
dc.type Article es_ES
dc.identifier.doi 10.3233/IFS-141227
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.contributor.affiliation Universitat Politècnica de València. Centro de Investigación Pattern Recognition and Human Language Technology es_ES
dc.description.bibliographicCitation Perez Tellez, F.; Cardiff, J.; Rosso, P.; Pinto Avendaño, DE. (2014). Weblog and short text feature extraction and impact on categorisation. Journal of Intelligent and Fuzzy Systems. 27(5):2529-2544. https://doi.org/10.3233/IFS-141227 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.3233/IFS-141227 es_ES
dc.description.upvformatpinicio 2529 es_ES
dc.description.upvformatpfin 2544 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 27 es_ES
dc.description.issue 5 es_ES
dc.relation.senia 285923
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder European Commission

