dc.contributor.author |
Perez Tellez, Fernando
|
es_ES |
dc.contributor.author |
Cardiff, John
|
es_ES |
dc.contributor.author |
Rosso, Paolo
|
es_ES |
dc.contributor.author |
Pinto Avendaño, David Eduardo
|
es_ES |
dc.date.accessioned |
2015-04-28T14:19:48Z |
|
dc.date.available |
2015-04-28T14:19:48Z |
|
dc.date.issued |
2014 |
|
dc.identifier.issn |
1064-1246 |
|
dc.identifier.uri |
http://hdl.handle.net/10251/49400 |
|
dc.description.abstract |
The characterisation and categorisation of weblogs and other short texts have become an important research theme in
areas such as topic/trend detection and pattern recognition. The value of analysing and characterising short texts lies in
understanding which features identify and distinguish them, thereby improving the input to the classification process.
In this research work, we analyse a large number of text features and establish which combinations are useful for discriminating
between the different genres of short text. Having identified the most promising features, we then confirm our findings by
performing the categorisation task using three approaches: the Gaussian and SVM classifiers and the K-means clustering algorithm.
Several hundred combinations of features were analysed in order to identify the best ones, and the results confirmed the
observations made. The novel aspect of our work is the detection of the best combination of individual metrics, which are identified
as potential features to be used in the categorisation process. |
es_ES |
dc.description.sponsorship |
The research work of the third author is partially funded by WIQ-EI (IRSES grant no. 269180) and DIANA-APPLICATIONS (TIN2012-38603-C02-01), and was carried out within the framework of the VLC/Campus Microcluster on Multimodal Interaction in Intelligent Systems. |
en_EN |
dc.language |
English |
es_ES |
dc.publisher |
IOS Press |
es_ES |
dc.relation.ispartof |
Journal of Intelligent and Fuzzy Systems |
es_ES |
dc.rights |
All rights reserved |
es_ES |
dc.subject |
Short text characterisation |
es_ES |
dc.subject |
Feature extraction |
es_ES |
dc.subject.classification |
COMPUTER LANGUAGES AND SYSTEMS |
es_ES |
dc.title |
Weblog and short text feature extraction and impact on categorisation |
es_ES |
dc.type |
Article |
es_ES |
dc.identifier.doi |
10.3233/IFS-141227 |
|
dc.relation.projectID |
info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/ |
es_ES |
dc.relation.projectID |
info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/ |
|
dc.rights.accessRights |
Open |
es_ES |
dc.contributor.affiliation |
Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació |
es_ES |
dc.contributor.affiliation |
Universitat Politècnica de València. Centro de Investigación Pattern Recognition and Human Language Technology |
es_ES |
dc.description.bibliographicCitation |
Perez Tellez, F.; Cardiff, J.; Rosso, P.; Pinto Avendaño, DE. (2014). Weblog and short text feature extraction and impact on categorisation. Journal of Intelligent and Fuzzy Systems. 27(5):2529-2544. https://doi.org/10.3233/IFS-141227 |
es_ES |
dc.description.accrualMethod |
S |
es_ES |
dc.relation.publisherversion |
http://dx.doi.org/10.3233/IFS-141227 |
es_ES |
dc.description.upvformatpinicio |
2529 |
es_ES |
dc.description.upvformatpfin |
2544 |
es_ES |
dc.type.version |
info:eu-repo/semantics/publishedVersion |
es_ES |
dc.description.volume |
27 |
es_ES |
dc.description.issue |
5 |
es_ES |
dc.relation.senia |
285923 |
|
dc.contributor.funder |
Ministerio de Economía y Competitividad |
es_ES |
dc.contributor.funder |
European Commission |
|