Silhouette + Attraction: A Simple and Effective Method for Text Clustering

Errecalde, Marcelo; Cagnina, Leticia; Rosso, Paolo

doi:10.1017/S1351324915000273

Identificarse

Buscar en RiuNet

Listar

Todo RiuNet
Esta colección

Mi cuenta

Acceder

Estadísticas

Ver Estadísticas de uso

Ayuda RiuNet

Admin. UPV

Compartir/Enviar a

Citas

Estadísticas

Silhouette + Attraction: A Simple and Effective Method for Text Clustering

Mostrar el registro sencillo del ítem

Ficheros en el ítem

Nombre: NLE-autor.pdf

Tamaño: 940.9Kb

Formato: PDF

Descripción: Versión del Autor.

Abrir

Nombre: Errecalde, Cagnina, ...

Tamaño: 7.592Mb

Formato: PDF

Solicitar una copia al autor

dc.contributor.author	Errecalde, Marcelo	es_ES
dc.contributor.author	Cagnina, Leticia	es_ES
dc.contributor.author	Rosso, Paolo	es_ES
dc.date.accessioned	2016-05-17T10:36:50Z
dc.date.available	2016-05-17T10:36:50Z
dc.date.issued	2015-08
dc.identifier.issn	1351-3249
dc.identifier.uri	http://hdl.handle.net/10251/64228
dc.description.abstract	[EN] This article presents silhouette attraction (Sil Att), a simple and effective method for text clustering, which is based on two main concepts: the silhouette coefficient and the idea of attraction. The combination of both principles allows us to obtain a general technique that can be used either as a boosting method, which improves results of other clustering algorithms, or as an independent clustering algorithm. The experimental work shows that Sil Att is able to obtain high-quality results on text corpora with very different characteristics. Furthermore, its stable performance on all the considered corpora is indicative that it is a very robust method. This is a very interesting positive aspect of Sil Att with respect to the other algorithms used in the experiments, whose performances heavily depend on specific characteristics of the corpora being considered.	es_ES
dc.description.sponsorship	This research work has been partially funded by UNSL, CONICET (Argentina), DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) research project, and the WIQ-EI IRSES project (grant no. 269180) within the FP 7 Marie Curie People Framework on Web Information Quality Evaluation Initiative. The work of the third author was done also in the framework of the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.	en_EN
dc.language	Inglés	es_ES
dc.publisher	Cambridge University Press (CUP): STM Journals	es_ES
dc.relation.ispartof	Natural Language Engineering	es_ES
dc.rights	Reserva de todos los derechos	es_ES
dc.subject.classification	LENGUAJES Y SISTEMAS INFORMATICOS	es_ES
dc.title	Silhouette + Attraction: A Simple and Effective Method for Text Clustering	es_ES
dc.type	Artículo	es_ES
dc.identifier.doi	10.1017/S1351324915000273
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/	es_ES
dc.rights.accessRights	Abierto	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Errecalde, M.; Cagnina, L.; Rosso, P. (2015). Silhouette + Attraction: A Simple and Effective Method for Text Clustering. Natural Language Engineering. 1-40. https://doi.org/10.1017/S1351324915000273	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://dx.doi.org/10.1017/S1351324915000273	es_ES
dc.description.upvformatpinicio	1	es_ES
dc.description.upvformatpfin	40	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.relation.senia	306218	es_ES
dc.contributor.funder	European Commission
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.description.references	Zhao, Y., & Karypis, G. (2004). Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering. Machine Learning, 55(3), 311-331. doi:10.1023/b:mach.0000027785.44527.d6	es_ES
dc.description.references	Tu, L., & Chen, Y. (2009). Stream data clustering based on grid density and attraction. ACM Transactions on Knowledge Discovery from Data, 3(3), 1-27. doi:10.1145/1552303.1552305	es_ES
dc.description.references	Yang, T., Jin, R., Chi, Y., & Zhu, S. (2009). Combining link and content for community detection. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’09. doi:10.1145/1557019.1557120	es_ES
dc.description.references	Zhao, Y., Karypis, G., & Fayyad, U. (2005). Hierarchical Clustering Algorithms for Document Datasets. Data Mining and Knowledge Discovery, 10(2), 141-168. doi:10.1007/s10618-005-0361-3	es_ES
dc.description.references	Kaufman, L., & Rousseeuw, P. J. (Eds.). (1990). Finding Groups in Data. Wiley Series in Probability and Statistics. doi:10.1002/9780470316801	es_ES
dc.description.references	Karypis, G., Eui-Hong Han, & Kumar, V. (1999). Chameleon: hierarchical clustering using dynamic modeling. Computer, 32(8), 68-75. doi:10.1109/2.781637	es_ES
dc.description.references	Cagnina, L., Errecalde, M., Ingaramo, D., & Rosso, P. (2014). An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences, 265, 36-49. doi:10.1016/j.ins.2013.12.010	es_ES
dc.description.references	He, H., Chen, B., Xu, W., & Guo, J. (2007). Short Text Feature Extraction and Clustering for Web Topic Mining. Third International Conference on Semantics, Knowledge and Grid (SKG 2007). doi:10.1109/skg.2007.76	es_ES
dc.description.references	Spearman, C. (1904). The Proof and Measurement of Association between Two Things. The American Journal of Psychology, 15(1), 72. doi:10.2307/1412159	es_ES
dc.description.references	Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65. doi:10.1016/0377-0427(87)90125-7	es_ES
dc.description.references	Manning, C. D., Raghavan, P., & Schutze, H. (2008). Introduction to Information Retrieval. doi:10.1017/cbo9780511809071	es_ES
dc.description.references	Qi, G.-J., Aggarwal, C. C., & Huang, T. (2012). Community Detection with Edge Content in Social Media Networks. 2012 IEEE 28th International Conference on Data Engineering. doi:10.1109/icde.2012.77	es_ES
dc.description.references	Daxin Jiang, Jian Pei, & Aidong Zhang. (s. f.). DHC: a density-based hierarchical clustering method for time series gene expression data. Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings. doi:10.1109/bibe.2003.1188978	es_ES
dc.description.references	Charikar, M., Chekuri, C., Feder, T., & Motwani, R. (2004). Incremental Clustering and Dynamic Information Retrieval. SIAM Journal on Computing, 33(6), 1417-1440. doi:10.1137/s0097539702418498	es_ES
dc.description.references	Selim, S. Z., & Alsultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10), 1003-1008. doi:10.1016/0031-3203(91)90097-o	es_ES
dc.description.references	Aranganayagi, S., & Thangavel, K. (2007). Clustering Categorical Data Using Silhouette Coefficient as a Relocating Measure. International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007). doi:10.1109/iccima.2007.328	es_ES
dc.description.references	Makagonov, P., Alexandrov, M., & Gelbukh, A. (2004). Clustering Abstracts Instead of Full Texts. Lecture Notes in Computer Science, 129-135. doi:10.1007/978-3-540-30120-2_17	es_ES
dc.description.references	Jing L. 2005. Survey of text clustering. Technical report. Department of Mathematics. The University of Hong Kong, Hong Kong, China.	es_ES
dc.description.references	Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379-423. doi:10.1002/j.1538-7305.1948.tb01338.x	es_ES
dc.description.references	Hearst, M. A. (2006). Clustering versus faceted categories for information exploration. Communications of the ACM, 49(4), 59. doi:10.1145/1121949.1121983	es_ES
dc.description.references	Alexandrov, M., Gelbukh, A., & Rosso, P. (2005). An Approach to Clustering Abstracts. Lecture Notes in Computer Science, 275-285. doi:10.1007/11428817_25	es_ES
dc.description.references	Dos Santos, J. B., Heuser, C. A., Moreira, V. P., & Wives, L. K. (2011). Automatic threshold estimation for data matching applications. Information Sciences, 181(13), 2685-2699. doi:10.1016/j.ins.2010.05.029	es_ES
dc.description.references	Hasan, M. A., Chaoji, V., Salem, S., & Zaki, M. J. (2009). Robust partitional clustering by outlier and density insensitive seeding. Pattern Recognition Letters, 30(11), 994-1002. doi:10.1016/j.patrec.2009.04.013	es_ES
dc.description.references	Dunn†, J. C. (1974). Well-Separated Clusters and Optimal Fuzzy Partitions. Journal of Cybernetics, 4(1), 95-104. doi:10.1080/01969727408546059	es_ES
dc.description.references	Carullo, M., Binaghi, E., & Gallo, I. (2009). An online document clustering technique for short web contents. Pattern Recognition Letters, 30(10), 870-876. doi:10.1016/j.patrec.2009.04.001	es_ES
dc.description.references	Kruskal, W. H., & Wallis, W. A. (1952). Use of Ranks in One-Criterion Variance Analysis. Journal of the American Statistical Association, 47(260), 583-621. doi:10.1080/01621459.1952.10483441	es_ES
dc.description.references	Bezdek, J. C., & Pal, N. R. (s. f.). Cluster validation with generalized Dunn’s indices. Proceedings 1995 Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems. doi:10.1109/annes.1995.499469	es_ES
dc.description.references	Brun, M., Sima, C., Hua, J., Lowey, J., Carroll, B., Suh, E., & Dougherty, E. R. (2007). Model-based evaluation of clustering validation measures. Pattern Recognition, 40(3), 807-824. doi:10.1016/j.patcog.2006.06.026	es_ES
dc.description.references	Davies, D. L., & Bouldin, D. W. (1979). A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 224-227. doi:10.1109/tpami.1979.4766909	es_ES
dc.description.references	Pinto, D., & Rosso, P. (s. f.). On the Relative Hardness of Clustering Corpora. Lecture Notes in Computer Science, 155-161. doi:10.1007/978-3-540-74628-7_22	es_ES
dc.description.references	Pons-Porrata, A., Berlanga-Llavori, R., & Ruiz-Shulcloper, J. (2007). Topic discovery based on text mining techniques. Information Processing & Management, 43(3), 752-768. doi:10.1016/j.ipm.2006.06.001	es_ES
dc.description.references	Pinto, D., Benedí, J.-M., & Rosso, P. (2007). Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance. Lecture Notes in Computer Science, 611-622. doi:10.1007/978-3-540-70939-8_54	es_ES

Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro sencillo del ítem

Silhouette + Attraction: A Simple and Effective Method for Text Clustering

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Buscar en RiuNet

Listar

Todo RiuNet

Esta colección

Mi cuenta

Estadísticas

Ayuda RiuNet

Admin. UPV

Compartir/Enviar a

Citas

Estadísticas

Silhouette + Attraction: A Simple and Effective Method for Text Clustering

Ficheros en el ítem

Este ítem aparece en la(s) siguiente(s) colección(ones)