dc.contributor.author | Pronoza, Ekaterina | es_ES |
dc.contributor.author | Panicheva, Polina | es_ES |
dc.contributor.author | Koltsova, Olessia | es_ES |
dc.contributor.author | Rosso, Paolo | es_ES |
dc.date.accessioned | 2022-06-03T18:02:18Z | |
dc.date.available | 2022-06-03T18:02:18Z | |
dc.date.issued | 2021-11 | es_ES |
dc.identifier.issn | 0306-4573 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/183078 | |
dc.description.abstract | [EN] Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user texts (such as irony, false generalization and attribution of unfavored actions to targeted groups), users' inclination to express opposite attitudes to different ethnic groups in the same text and, finally, lack of research on languages other than English. In this work we address several of these problems in the task of ethnicity-targeted hate speech detection in Russian-language social media texts. This approach allows us to differentiate between attitudes towards different ethnic groups mentioned in the same text, a task that has never been addressed before. We use a dataset of over 2.6M user messages mentioning ethnic groups to construct a representative sample of 12K instances (ethnic group, text) that are further thoroughly annotated via a special procedure. In contrast to many previous collections that usually comprise extreme cases of toxic speech, representativity of our sample secures a realistic and, therefore, much higher proportion of subtle negativity which additionally complicates its automatic detection. We then experiment with four types of machine learning models, from traditional classifiers such as SVM to deep learning approaches, notably the recently introduced BERT architecture, and interpret their predictions in terms of various linguistic phenomena. In addition to hate speech detection with a text-level two-class approach (hate, no hate), we also justify and implement a unique instance-based three-class approach (positive, neutral, negative attitude, the latter implying hate speech). 
Our best results are achieved by using fine-tuned and pre-trained RuBERT combined with linguistic features, with F1-hate=0.760, F1-macro=0.833 on the text-level two-class problem comparable to previous studies, and F1-hate=0.813, F1-macro=0.824 on our unique instance-based three-class hate speech detection task. Finally, we perform error analysis, and it reveals that further improvement could be achieved by accounting for complex and creative language issues more accurately, i.e., by detecting irony and unconventional forms of obscene lexicon. | es_ES |
dc.language | English | es_ES |
dc.publisher | Elsevier | es_ES |
dc.relation.ispartof | Information Processing & Management | es_ES |
dc.rights | Attribution - NonCommercial - NoDerivatives (by-nc-nd) | es_ES |
dc.subject | Hate speech detection | es_ES |
dc.subject | Ethnic hate | es_ES |
dc.subject | Russian language | es_ES |
dc.subject | Deep learning | es_ES |
dc.subject.classification | COMPUTER LANGUAGES AND SYSTEMS | es_ES |
dc.title | Detecting Ethnicity-targeted Hate Speech in Russian Social Media Texts | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1016/j.ipm.2021.102674 | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Pronoza, E.; Panicheva, P.; Koltsova, O.; Rosso, P. (2021). Detecting Ethnicity-targeted Hate Speech in Russian Social Media Texts. Information Processing & Management. 58(6):1-24. https://doi.org/10.1016/j.ipm.2021.102674 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1016/j.ipm.2021.102674 | es_ES |
dc.description.upvformatpinicio | 1 | es_ES |
dc.description.upvformatpfin | 24 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 58 | es_ES |
dc.description.issue | 6 | es_ES |
dc.relation.pasarela | S\463418 | es_ES |
dc.subject.ods | 04.- Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all | es_ES |