Sampling Techniques to Overcome Class Imbalance in a Cyberbullying Context

Colton, David; Hofmann, Markus

doi:10.4995/jclr.2019.11112


Files in this item

Name: 11112-48066-1-PB.pdf

Size: 3.614Mb

Format: PDF

dc.contributor.author	Colton, David	es_ES
dc.contributor.author	Hofmann, Markus	es_ES
dc.date.accessioned	2019-07-19T10:40:02Z
dc.date.available	2019-07-19T10:40:02Z
dc.date.issued	2019-07-16
dc.identifier.uri	http://hdl.handle.net/10251/123823
dc.description.abstract	[EN] The majority of datasets suffer from class imbalance, where samples of a dominant class significantly outnumber the samples available for the minority class that is to be detected. Prediction and classification machine learning models work best when there are roughly equal numbers of samples of each class. This paper explores sampling techniques that can be used to overcome this class imbalance problem in a cyberbullying context. A newly classified cyberbullying dataset, including detailed descriptions of the criteria used in its classification, was used to examine the feasibility of applying text mining techniques to automate the detection of cyberbullying text when the dataset shows a significant imbalance between the positive (cyberbullying) samples and the negative (not cyberbullying) samples. We investigate whether oversampling the minority positive class or undersampling the majority negative class affects the performance of a prediction model. A compromise solution, in which the positive class is partially oversampled and the negative class is partially undersampled, is also examined. Although not strictly a class imbalance solution, sampling using the most frequently observed features was also explored.	es_ES
dc.language	English	es_ES
dc.publisher	Universitat Politècnica de València
dc.relation.ispartof	Journal of Computer-Assisted Linguistic Research
dc.rights	Attribution - NonCommercial - NoDerivatives (by-nc-nd)	es_ES
dc.subject	Text mining	es_ES
dc.subject	Class imbalance	es_ES
dc.subject	Cyberbullying	es_ES
dc.subject	Sampling	es_ES
dc.subject	Classification	es_ES
dc.title	Sampling Techniques to Overcome Class Imbalance in a Cyberbullying Context	es_ES
dc.type	Article	es_ES
dc.date.updated	2019-07-19T10:31:04Z
dc.identifier.doi	10.4995/jclr.2019.11112
dc.rights.accessRights	Open	es_ES
dc.description.bibliographicCitation	Colton, D.; Hofmann, M. (2019). Sampling Techniques to Overcome Class Imbalance in a Cyberbullying Context. Journal of Computer-Assisted Linguistic Research. 3(3):21-40. https://doi.org/10.4995/jclr.2019.11112	es_ES
dc.description.accrualMethod	SWORD	es_ES
dc.relation.publisherversion	https://doi.org/10.4995/jclr.2019.11112	es_ES
dc.description.upvformatpinicio	21	es_ES
dc.description.upvformatpfin	40	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	3
dc.description.issue	3
dc.identifier.eissn	2530-9455
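
The three resampling strategies named in the abstract (oversampling the positive class, undersampling the negative class, and a partial mix of both) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `resample`, its parameters, the toy data, and the use of plain random duplication and random removal are all assumptions made for illustration. The abstract does not say which oversampling or undersampling variant the paper uses.

```python
import random
from collections import Counter


def resample(samples, labels, pos_target=None, neg_target=None, seed=0):
    """Randomly resample a binary dataset (hypothetical helper, label 1 = positive).

    pos_target: grow the positive class to this size by duplicating
                randomly chosen positive samples (oversampling).
    neg_target: shrink the negative class to this size by keeping a
                random subset without replacement (undersampling).
    Passing both gives the partial oversample / partial undersample mix.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y != 1]

    if pos_target is not None and pos_target > len(pos):
        # Oversample: add duplicates drawn with replacement.
        pos = pos + rng.choices(pos, k=pos_target - len(pos))
    if neg_target is not None and neg_target < len(neg):
        # Undersample: keep a random subset of the majority class.
        neg = rng.sample(neg, neg_target)

    data = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    rng.shuffle(data)
    return data


# Toy imbalanced dataset (invented): 10 positive and 90 negative texts.
texts = [f"post {i}" for i in range(100)]
labels = [1] * 10 + [0] * 90

oversampled = resample(texts, labels, pos_target=90)
undersampled = resample(texts, labels, neg_target=10)
hybrid = resample(texts, labels, pos_target=50, neg_target=50)

for name, data in [("over", oversampled), ("under", undersampled), ("hybrid", hybrid)]:
    print(name, Counter(y for _, y in data))
```

A common precaution, not confirmed by this record, is to resample only the training portion of the data so that evaluation still reflects the original class distribution.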
