
Detecting Ethnicity-targeted Hate Speech in Russian Social Media Texts

RiuNet: Institutional Repository of the Universitat Politècnica de València


dc.contributor.author Pronoza, Ekaterina es_ES
dc.contributor.author Panicheva, Polina es_ES
dc.contributor.author Koltsova, Olessia es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.date.accessioned 2022-06-03T18:02:18Z
dc.date.available 2022-06-03T18:02:18Z
dc.date.issued 2021-11 es_ES
dc.identifier.issn 0306-4573 es_ES
dc.identifier.uri http://hdl.handle.net/10251/183078
dc.description.abstract [EN] Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user texts (such as irony, false generalization and attribution of unfavored actions to targeted groups), users' inclination to express opposite attitudes to different ethnic groups in the same text and, finally, lack of research on languages other than English. In this work we address several of these problems in the task of ethnicity-targeted hate speech detection in Russian-language social media texts. This approach allows us to differentiate between attitudes towards different ethnic groups mentioned in the same text, a task that has never been addressed before. We use a dataset of over 2.6M user messages mentioning ethnic groups to construct a representative sample of 12K instances (ethnic group, text) that are further thoroughly annotated via a special procedure. In contrast to many previous collections that usually comprise extreme cases of toxic speech, representativity of our sample secures a realistic and, therefore, much higher proportion of subtle negativity which additionally complicates its automatic detection. We then experiment with four types of machine learning models, from traditional classifiers such as SVM to deep learning approaches, notably the recently introduced BERT architecture, and interpret their predictions in terms of various linguistic phenomena. In addition to hate speech detection with a text-level two-class approach (hate, no hate), we also justify and implement a unique instance-based three-class approach (positive, neutral, negative attitude, the latter implying hate speech).
Our best results are achieved by using fine-tuned and pre-trained RuBERT combined with linguistic features, with F1-hate=0.760, F1-macro=0.833 on the text-level two-class problem comparable to previous studies, and F1-hate=0.813, F1-macro=0.824 on our unique instance-based three-class hate speech detection task. Finally, we perform error analysis, and it reveals that further improvement could be achieved by accounting for complex and creative language issues more accurately, i.e., by detecting irony and unconventional forms of obscene lexicon. es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Information Processing & Management es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Hate speech detection es_ES
dc.subject Ethnic hate es_ES
dc.subject Russian language es_ES
dc.subject Deep learning es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Detecting Ethnicity-targeted Hate Speech in Russian Social Media Texts es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.ipm.2021.102674 es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Pronoza, E.; Panicheva, P.; Koltsova, O.; Rosso, P. (2021). Detecting Ethnicity-targeted Hate Speech in Russian Social Media Texts. Information Processing & Management. 58(6):1-24. https://doi.org/10.1016/j.ipm.2021.102674 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.ipm.2021.102674 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 24 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 58 es_ES
dc.description.issue 6 es_ES
dc.relation.pasarela S\463418 es_ES
dc.subject.ods 04.- Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all es_ES

