Mostrar el registro sencillo del ítem
dc.contributor.author | Errecalde, Marcelo | es_ES |
dc.contributor.author | Cagnina, Leticia | es_ES |
dc.contributor.author | Rosso, Paolo | es_ES |
dc.date.accessioned | 2016-05-17T10:36:50Z | |
dc.date.available | 2016-05-17T10:36:50Z | |
dc.date.issued | 2015-08 | |
dc.identifier.issn | 1351-3249 | |
dc.identifier.uri | http://hdl.handle.net/10251/64228 | |
dc.description.abstract | [EN] This article presents silhouette attraction (Sil Att), a simple and effective method for text clustering, which is based on two main concepts: the silhouette coefficient and the idea of attraction. The combination of both principles allows us to obtain a general technique that can be used either as a boosting method, which improves results of other clustering algorithms, or as an independent clustering algorithm. The experimental work shows that Sil Att is able to obtain high-quality results on text corpora with very different characteristics. Furthermore, its stable performance on all the considered corpora is indicative that it is a very robust method. This is a very interesting positive aspect of Sil Att with respect to the other algorithms used in the experiments, whose performances heavily depend on specific characteristics of the corpora being considered. | es_ES |
dc.description.sponsorship | This research work has been partially funded by UNSL, CONICET (Argentina), DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) research project, and the WIQ-EI IRSES project (grant no. 269180) within the FP 7 Marie Curie People Framework on Web Information Quality Evaluation Initiative. The work of the third author was done also in the framework of the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. | en_EN |
dc.language | Inglés | es_ES |
dc.publisher | Cambridge University Press (CUP): STM Journals | es_ES |
dc.relation.ispartof | Natural Language Engineering | es_ES |
dc.rights | Reserva de todos los derechos | es_ES |
dc.subject.classification | LENGUAJES Y SISTEMAS INFORMATICOS | es_ES |
dc.title | Silhouette + Attraction: A Simple and Effective Method for Text Clustering | es_ES |
dc.type | Artículo | es_ES |
dc.identifier.doi | 10.1017/S1351324915000273 | |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/ | es_ES |
dc.rights.accessRights | Abierto | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Errecalde, M.; Cagnina, L.; Rosso, P. (2015). Silhouette + Attraction: A Simple and Effective Method for Text Clustering. Natural Language Engineering. 1-40. https://doi.org/10.1017/S1351324915000273 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | http://dx.doi.org/10.1017/S1351324915000273 | es_ES |
dc.description.upvformatpinicio | 1 | es_ES |
dc.description.upvformatpfin | 40 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.senia | 306218 | es_ES |
dc.contributor.funder | European Commission | |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.description.references | Zhao, Y., & Karypis, G. (2004). Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering. Machine Learning, 55(3), 311-331. doi:10.1023/b:mach.0000027785.44527.d6 | es_ES |
dc.description.references | Tu, L., & Chen, Y. (2009). Stream data clustering based on grid density and attraction. ACM Transactions on Knowledge Discovery from Data, 3(3), 1-27. doi:10.1145/1552303.1552305 | es_ES |
dc.description.references | Yang, T., Jin, R., Chi, Y., & Zhu, S. (2009). Combining link and content for community detection. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’09. doi:10.1145/1557019.1557120 | es_ES |
dc.description.references | Zhao, Y., Karypis, G., & Fayyad, U. (2005). Hierarchical Clustering Algorithms for Document Datasets. Data Mining and Knowledge Discovery, 10(2), 141-168. doi:10.1007/s10618-005-0361-3 | es_ES |
dc.description.references | Kaufman, L., & Rousseeuw, P. J. (Eds.). (1990). Finding Groups in Data. Wiley Series in Probability and Statistics. doi:10.1002/9780470316801 | es_ES |
dc.description.references | Karypis, G., Eui-Hong Han, & Kumar, V. (1999). Chameleon: hierarchical clustering using dynamic modeling. Computer, 32(8), 68-75. doi:10.1109/2.781637 | es_ES |
dc.description.references | Cagnina, L., Errecalde, M., Ingaramo, D., & Rosso, P. (2014). An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences, 265, 36-49. doi:10.1016/j.ins.2013.12.010 | es_ES |
dc.description.references | He, H., Chen, B., Xu, W., & Guo, J. (2007). Short Text Feature Extraction and Clustering for Web Topic Mining. Third International Conference on Semantics, Knowledge and Grid (SKG 2007). doi:10.1109/skg.2007.76 | es_ES |
dc.description.references | Spearman, C. (1904). The Proof and Measurement of Association between Two Things. The American Journal of Psychology, 15(1), 72. doi:10.2307/1412159 | es_ES |
dc.description.references | Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65. doi:10.1016/0377-0427(87)90125-7 | es_ES |
dc.description.references | Manning, C. D., Raghavan, P., & Schutze, H. (2008). Introduction to Information Retrieval. doi:10.1017/cbo9780511809071 | es_ES |
dc.description.references | Qi, G.-J., Aggarwal, C. C., & Huang, T. (2012). Community Detection with Edge Content in Social Media Networks. 2012 IEEE 28th International Conference on Data Engineering. doi:10.1109/icde.2012.77 | es_ES |
dc.description.references | Daxin Jiang, Jian Pei, & Aidong Zhang. (s. f.). DHC: a density-based hierarchical clustering method for time series gene expression data. Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings. doi:10.1109/bibe.2003.1188978 | es_ES |
dc.description.references | Charikar, M., Chekuri, C., Feder, T., & Motwani, R. (2004). Incremental Clustering and Dynamic Information Retrieval. SIAM Journal on Computing, 33(6), 1417-1440. doi:10.1137/s0097539702418498 | es_ES |
dc.description.references | Selim, S. Z., & Alsultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10), 1003-1008. doi:10.1016/0031-3203(91)90097-o | es_ES |
dc.description.references | Aranganayagi, S., & Thangavel, K. (2007). Clustering Categorical Data Using Silhouette Coefficient as a Relocating Measure. International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007). doi:10.1109/iccima.2007.328 | es_ES |
dc.description.references | Makagonov, P., Alexandrov, M., & Gelbukh, A. (2004). Clustering Abstracts Instead of Full Texts. Lecture Notes in Computer Science, 129-135. doi:10.1007/978-3-540-30120-2_17 | es_ES |
dc.description.references | Jing L. 2005. Survey of text clustering. Technical report. Department of Mathematics. The University of Hong Kong, Hong Kong, China. | es_ES |
dc.description.references | Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379-423. doi:10.1002/j.1538-7305.1948.tb01338.x | es_ES |
dc.description.references | Hearst, M. A. (2006). Clustering versus faceted categories for information exploration. Communications of the ACM, 49(4), 59. doi:10.1145/1121949.1121983 | es_ES |
dc.description.references | Alexandrov, M., Gelbukh, A., & Rosso, P. (2005). An Approach to Clustering Abstracts. Lecture Notes in Computer Science, 275-285. doi:10.1007/11428817_25 | es_ES |
dc.description.references | Dos Santos, J. B., Heuser, C. A., Moreira, V. P., & Wives, L. K. (2011). Automatic threshold estimation for data matching applications. Information Sciences, 181(13), 2685-2699. doi:10.1016/j.ins.2010.05.029 | es_ES |
dc.description.references | Hasan, M. A., Chaoji, V., Salem, S., & Zaki, M. J. (2009). Robust partitional clustering by outlier and density insensitive seeding. Pattern Recognition Letters, 30(11), 994-1002. doi:10.1016/j.patrec.2009.04.013 | es_ES |
dc.description.references | Dunn†, J. C. (1974). Well-Separated Clusters and Optimal Fuzzy Partitions. Journal of Cybernetics, 4(1), 95-104. doi:10.1080/01969727408546059 | es_ES |
dc.description.references | Carullo, M., Binaghi, E., & Gallo, I. (2009). An online document clustering technique for short web contents. Pattern Recognition Letters, 30(10), 870-876. doi:10.1016/j.patrec.2009.04.001 | es_ES |
dc.description.references | Kruskal, W. H., & Wallis, W. A. (1952). Use of Ranks in One-Criterion Variance Analysis. Journal of the American Statistical Association, 47(260), 583-621. doi:10.1080/01621459.1952.10483441 | es_ES |
dc.description.references | Bezdek, J. C., & Pal, N. R. (s. f.). Cluster validation with generalized Dunn’s indices. Proceedings 1995 Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems. doi:10.1109/annes.1995.499469 | es_ES |
dc.description.references | Brun, M., Sima, C., Hua, J., Lowey, J., Carroll, B., Suh, E., & Dougherty, E. R. (2007). Model-based evaluation of clustering validation measures. Pattern Recognition, 40(3), 807-824. doi:10.1016/j.patcog.2006.06.026 | es_ES |
dc.description.references | Davies, D. L., & Bouldin, D. W. (1979). A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 224-227. doi:10.1109/tpami.1979.4766909 | es_ES |
dc.description.references | Pinto, D., & Rosso, P. (s. f.). On the Relative Hardness of Clustering Corpora. Lecture Notes in Computer Science, 155-161. doi:10.1007/978-3-540-74628-7_22 | es_ES |
dc.description.references | Pons-Porrata, A., Berlanga-Llavori, R., & Ruiz-Shulcloper, J. (2007). Topic discovery based on text mining techniques. Information Processing & Management, 43(3), 752-768. doi:10.1016/j.ipm.2006.06.001 | es_ES |
dc.description.references | Pinto, D., Benedí, J.-M., & Rosso, P. (2007). Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance. Lecture Notes in Computer Science, 611-622. doi:10.1007/978-3-540-70939-8_54 | es_ES |