A Self-Enriching Methodology for Clustering Narrow Domain Short Texts

Pinto, David; Rosso, Paolo; Jiménez-Salazar, Héctor

doi:10.1093/comjnl/bxq069

Files in this item

Name: artComputerJourna ...

Size: 382.9Kb

Format: PDF

Description: Publisher's version

dc.contributor.author	Pinto, David	es_ES
dc.contributor.author	Rosso, Paolo	es_ES
dc.contributor.author	Jiménez-Salazar, Héctor	es_ES
dc.date.accessioned	2013-07-03T09:26:42Z
dc.date.issued	2011
dc.identifier.issn	0010-4620
dc.identifier.uri	http://hdl.handle.net/10251/30421
dc.description.abstract	Clustering narrow domain short texts is considered to be a complex task because of the intrinsic features of the corpus to be clustered: (i) the low frequencies of vocabulary terms in short texts, and (ii) the high vocabulary overlap associated with narrow domains. The aim of this paper is to introduce a self-term expansion methodology for improving the performance of clustering methods when dealing with corpora of this kind. This methodology allows raw textual data to be enriched by adding co-related terms from an automatically constructed lexical knowledge resource obtained from the same target data set (and not from an external resource). We also propose a set of supervised and unsupervised text assessment measures for evaluating different corpus features, such as shortness, stylometry and domain broadness. With the help of these measures, we may determine beforehand whether or not to use the methodology proposed in this paper. Finally, we integrate all these assessment measures in a freely available web-based system named Watermarking Corpora On-line System, which computer scientists may use to evaluate the different features associated with a given textual corpus.	es_ES
dc.description.sponsorship	This work was supported by MICINN project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i) and the CONACYT research project number 106625.	en_EN
dc.language	English	es_ES
dc.publisher	Oxford University Press (OUP): Policy A - Oxford Open Option A	es_ES
dc.relation.ispartof	Computer Journal	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Clustering and analysis of textual data	es_ES
dc.subject	Narrow domain short texts	es_ES
dc.subject	Natural language processing	es_ES
dc.subject	Internet tools	es_ES
dc.subject.classification	LENGUAJES Y SISTEMAS INFORMATICOS (Computer Languages and Systems)	es_ES
dc.title	A Self-Enriching Methodology for Clustering Narrow Domain Short Texts	es_ES
dc.type	Article	es_ES
dc.embargo.lift	10000-01-01
dc.embargo.terms	forever	es_ES
dc.identifier.doi	10.1093/comjnl/bxq069
dc.relation.projectID	info:eu-repo/grantAgreement/MICINN//TIN2009-13391-C04-03/ES/Text-Enterprise 2.0: Tecnicas De Comprension De Textos Aplicadas A Las Necesidades De La Empresa 2.0/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/CONACYT//106625/	es_ES
dc.rights.accessRights	Closed	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Pinto, D.; Rosso, P.; Jiménez-Salazar, H. (2011). A Self-Enriching Methodology for Clustering Narrow Domain Short Texts. Computer Journal. 54(7):1148-1165. https://doi.org/10.1093/comjnl/bxq069	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.1093/comjnl/bxq069	es_ES
dc.description.upvformatpinicio	1148	es_ES
dc.description.upvformatpfin	1165	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	54	es_ES
dc.description.issue	7	es_ES
dc.relation.senia	215390
dc.identifier.eissn	1460-2067
dc.contributor.funder	Ministerio de Ciencia e Innovación	es_ES
dc.contributor.funder	Consejo Nacional de Ciencia y Tecnología, México	es_ES
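
The abstract above describes self-term expansion only at a high level: a lexical resource is built from the target corpus itself (not from an external resource) and then used to add co-related terms to each short text. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes documents are already tokenized, counts co-occurrence at document level, and treats two terms as related when they co-occur in at least `min_count` documents with positive pointwise mutual information (PMI); the paper's exact construction may differ. The function names `build_cooccurrence_lexicon` and `self_expand`, the thresholds `min_count` and `min_pmi`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_cooccurrence_lexicon(docs, min_count=2, min_pmi=0.0):
    """Map each term to the terms it is strongly associated with in `docs`.

    Assumption: co-occurrence is counted per document and scored with
    pointwise mutual information (PMI); the paper's procedure may differ.
    """
    n_docs = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing each term pair
    for doc in docs:
        terms = sorted(set(doc))  # sort so each pair has one canonical order
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    lexicon = {}
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_count:
            continue  # skip pairs seen too rarely to trust
        # PMI = log(P(a, b) / (P(a) * P(b))) with document-level probabilities
        pmi = math.log(n_ab * n_docs / (term_df[a] * term_df[b]))
        if pmi > min_pmi:
            lexicon.setdefault(a, set()).add(b)
            lexicon.setdefault(b, set()).add(a)
    return lexicon


def self_expand(doc, lexicon):
    """Append related terms from the corpus-derived lexicon to one document."""
    expanded = list(doc)
    for term in doc:
        expanded.extend(sorted(lexicon.get(term, ())))
    return expanded


if __name__ == "__main__":
    # Toy, already-tokenized corpus (hypothetical data for illustration only).
    corpus = [
        ["protein", "kinase", "cell"],
        ["protein", "kinase", "signal"],
        ["galaxy", "star", "orbit"],
        ["galaxy", "star", "dust"],
    ]
    lexicon = build_cooccurrence_lexicon(corpus)
    print(self_expand(["protein", "cell"], lexicon))
    # prints ['protein', 'cell', 'kinase']
```

On this toy corpus, `protein` pulls in `kinase` because the two always appear together, while `cell` gains nothing because it occurs only once. Requiring both a minimum count and a positive PMI is a design choice in this sketch: in short texts most term pairs are seen only once, and expanding with such pairs would add noise rather than signal.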

This item appears in the following collection(s)

Articles, conference papers, monographs [48344]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
