Detection of opinion spam with character n-grams

Hernández Fusilier, Donato; Montes Gomez, Manuel; Rosso, Paolo; Guzmán Cabrera, Rafael

doi:10.1007/978-3-319-18117-2_21

Files in this item

Name: CICLing2015-author.pdf

Size: 302.7 KB

Format: PDF

Description: Author's version.

Name: CICLING2015-editor.pdf

Size: 128.5 KB

Format: PDF

Description: Publisher's version

dc.contributor.author	Hernández Fusilier, Donato	es_ES
dc.contributor.author	Montes Gomez, Manuel	es_ES
dc.contributor.author	Rosso, Paolo	es_ES
dc.contributor.author	Guzmán Cabrera, Rafael	es_ES
dc.date.accessioned	2016-05-19T08:04:51Z
dc.date.available	2016-05-19T08:04:51Z
dc.date.issued	2015
dc.identifier.isbn	978-3-319-18116-5
dc.identifier.issn	0302-9743
dc.identifier.uri	http://hdl.handle.net/10251/64360
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-18117-2_21	es_ES
dc.description.abstract	In this paper we consider the detection of opinion spam as a stylistic classification task because, given a particular domain, the deceptive and truthful opinions are similar in content but differ in the way opinions are written (style). Particularly, we propose using character n-grams as features since they have shown to capture lexical content as well as stylistic information. We evaluated our approach on a standard corpus composed of 1600 hotel reviews, considering positive and negative reviews. We compared the results obtained with character n-grams against the ones with word n-grams. Moreover, we evaluated the effectiveness of character n-grams decreasing the training set size in order to simulate real training conditions. The results obtained show that character n-grams are good features for the detection of opinion spam; they seem to be able to capture better than word n-grams the content of deceptive opinions and the writing style of the deceiver. In particular, results show an improvement of 2.3% and 2.1% over the word-based representations in the detection of positive and negative deceptive opinions respectively. Furthermore, character n-grams allow to obtain a good performance also with a very small training corpus. Using only 25% of the training set, a Naïve Bayes classifier showed F1 values up to 0.80 for both opinion polarities.	es_ES
dc.description.sponsorship	This work is the result of the collaboration in the framework of the WIQEI IRSES project (Grant No. 269180) within the FP7 Marie Curie. The second author was partially supported by the LACCIR programme under project ID R1212LAC006. Accordingly, the work of the third author was in the framework of the DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.	es_ES
dc.language	English	es_ES
dc.publisher	Springer International Publishing	es_ES
dc.relation.ispartof	Computational Linguistics and Intelligent Text Processing: 16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part II	es_ES
dc.relation.ispartofseries	Lecture Notes in Computer Science;9042
dc.rights	All rights reserved	es_ES
dc.subject	Opinion spam	es_ES
dc.subject	Deceptive detection	es_ES
dc.subject	Character n-grams	es_ES
dc.subject	Word n-grams	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Detection of opinion spam with character n-grams	es_ES
dc.type	Book chapter	es_ES
dc.identifier.doi	10.1007/978-3-319-18117-2_21
dc.relation.projectID	info:eu-repo/grantAgreement/LACCIR//R1212LAC006/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Hernández Fusilier, D.; Montes Gomez, M.; Rosso, P.; Guzmán Cabrera, R. (2015). Detection of opinion spam with character n-grams. In Computational Linguistics and Intelligent Text Processing: 16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part II. Springer International Publishing. 285-294. https://doi.org/10.1007/978-3-319-18117-2_21	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://link.springer.com/chapter/10.1007/978-3-319-18117-2_21	es_ES
dc.description.upvformatpinicio	285	es_ES
dc.description.upvformatpfin	294	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.relation.senia	306266	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	Latin American and Caribbean Collaborative ICT Research Federation	es_ES
dc.contributor.funder	Universitat de València	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.description.references	Blamey, B., Crick, T., Oatley, G.: RU:-) or:-(? character-vs. word-gram feature selection for sentiment classification of OSN corpora. Research and Development in Intelligent Systems XXIX, 207–212 (2012)	es_ES
dc.description.references	Drucker, H., Wu, D., Vapnik, V.N.: Support Vector Machines for Spam Categorization. IEEE Transactions on Neural Networks 10(5), 1048–1054 (2002)	es_ES
dc.description.references	Feng, S., Banerjee, R., Choi, Y.: Syntactic Stylometry for Deception Detection. Association for Computational Linguistics, short paper. ACL (2012)	es_ES
dc.description.references	Feng, S., Xing, L., Gogar, A., Choi, Y.: Distributional Footprints of Deceptive Product Reviews. In: Proceedings of the 2012 International AAAI Conference on WebBlogs and Social Media (June 2012)	es_ES
dc.description.references	Gyongyi, Z., Garcia-Molina, H., Pedersen, J.: Combating Web Spam with Trust Rank. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, vol. 30, pp. 576–587. VLDB Endowment (2004)	es_ES
dc.description.references	Hall, M., Eibe, F., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA Data Mining Software: an Update. SIGKDD Explor. Newsl. 10–18 (2009)	es_ES
dc.description.references	Hernández-Fusilier, D., Guzmán-Cabrera, R., Montes-y-Gómez, M., Rosso, P.: Using PU-learning to Detect Deceptive Opinion Spam. In: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, USA, pp. 38–45 (2013)	es_ES
dc.description.references	Hernández-Fusilier, D., Montes-y-Gómez, M., Rosso, P., Guzmán-Cabrera, R.: Detecting Positive and Negative Deceptive Opinions using PU-learning. Information Processing & Management (2014), doi:10.1016/j.ipm.2014.11.001	es_ES
dc.description.references	Jindal, N., Liu, B.: Opinion Spam and Analysis. In: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 219–230 (2008)	es_ES
dc.description.references	Jindal, N., Liu, B., Lim, E.: Finding Unusual Review Patterns Using Unexpected Rules. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM 2010, pp. 210–220 (October 2010)	es_ES
dc.description.references	Kanaris, I., Kanaris, K., Houvardas, I., Stamatatos, E.: Word versus character n-grams for anti-spam filtering. International Journal on Artificial Intelligence Tools 16(6), 1047–1067 (2007)	es_ES
dc.description.references	Lim, E.P., Nguyen, V.A., Jindal, N., Liu, B., Lauw, H.W.: Detecting Product Review Spammers Using Rating Behaviours. In: CIKM, pp. 939–948 (2010)	es_ES
dc.description.references	Liu, B.: Sentiment Analysis and Opinion Mining. Synthesis Lecture on Human Language Technologies. Morgan & Claypool Publishers (2012)	es_ES
dc.description.references	Mukherjee, A., Liu, B., Wang, J., Glance, N., Jindal, N.: Detecting Group Review Spam. In: Proceedings of the 20th International Conference Companion on World Wide Web, pp. 93–94 (2011)	es_ES
dc.description.references	Ntoulas, A., Najork, M., Manasse, M., Fetterly, D.: Detecting Spam Web Pages through Content Analysis. Transactions on Management Information Systems (TMIS), 83–92 (2006)	es_ES
dc.description.references	Ott, M., Choi, Y., Cardie, C., Hancock, J.T.: Finding Deceptive Opinion Spam by any Stretch of the Imagination. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, pp. 309–319 (2011)	es_ES
dc.description.references	Ott, M., Cardie, C., Hancock, J.T.: Negative Deceptive Opinion Spam. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, USA, pp. 309–319 (2013)	es_ES
dc.description.references	Raymond, Y.K., Lau, S.Y., Liao, R., Chi-Wai, K., Kaiquan, X., Yunqing, X., Yuefeng, L.: Text Mining and Probabilistic Modeling for Online Review Spam Detection. ACM Transactions on Management Information Systems 2(4), Article: 25, 1–30 (2011)	es_ES
dc.description.references	Stamatatos, E.: On the robustness of authorship attribution based on character n-gram features. Journal of Law & Policy 21(2) (2013)	es_ES
dc.description.references	Wu, G., Greene, D., Cunningham, P.: Merging Multiple Criteria to Identify Suspicious Reviews. In: RecSys 2010, pp. 241–244 (2010)	es_ES
dc.description.references	Xie, S., Wang, G., Lin, S., Yu, P.S.: Review Spam Detection via Time Series Pattern Discovery. In: Proceedings of the 21st International Conference Companion on World Wide Web, pp. 635–636 (2012)	es_ES
dc.description.references	Zhou, L., Shi, Y., Zhang, D.: A Statistical Language Modeling Approach to Online Deception Detection. IEEE Transactions on Knowledge and Data Engineering 20(8), 1077–1081 (2008)	es_ES
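
The abstract in this record describes representing each review by its character n-grams and classifying it with Naïve Bayes. A minimal sketch of that idea in Python follows. It is an illustration only, not the authors' implementation: the choice of n = 4, the add-one smoothing, the function names (char_ngrams, train_nb, predict_nb) and the toy reviews are all assumptions introduced here, and the record does not state which settings the paper used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=4):
    """Count overlapping character n-grams; spaces and punctuation are kept."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def train_nb(docs, labels, n=4):
    """Fit a multinomial Naive Bayes model on character n-gram counts."""
    class_docs = Counter(labels)          # number of documents per class
    gram_counts = defaultdict(Counter)    # n-gram counts per class
    for text, label in zip(docs, labels):
        gram_counts[label].update(char_ngrams(text, n))
    vocab = set()
    for counts in gram_counts.values():
        vocab.update(counts)
    return {
        "n": n,
        "class_docs": class_docs,
        "gram_counts": gram_counts,
        "vocab_size": len(vocab),
        "total_docs": len(docs),
    }


def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    grams = char_ngrams(text, model["n"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["gram_counts"][label]
        total = sum(counts.values())
        score = math.log(n_docs / model["total_docs"])  # class prior
        for gram, freq in grams.items():
            likelihood = (counts[gram] + 1) / (total + model["vocab_size"])
            score += freq * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy reviews invented for illustration; not taken from the hotel corpus.
    reviews = [
        "My stay was absolutely amazing, the most luxurious hotel ever!",
        "Room was fine, check-in took 20 minutes, parking cost $30.",
    ]
    labels = ["deceptive", "truthful"]
    model = train_nb(reviews, labels)
    print(predict_nb(model, "An amazing and luxurious experience!"))
```

Because the n-grams span word boundaries and punctuation, the same features carry both lexical fragments and stylistic cues such as spacing or exclamation marks, which is the property the abstract attributes to them.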


RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
