Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Peña-Sarracén, Gretel Liz de la es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.date.accessioned 2022-11-07T19:01:36Z
dc.date.available 2022-11-07T19:01:36Z
dc.date.issued 2021-08-27 es_ES
dc.identifier.issn 1617-4909 es_ES
dc.identifier.uri http://hdl.handle.net/10251/189377
dc.description.abstract [EN] The proliferation of harmful content on social media affects a large part of the user community. Therefore, several approaches have emerged to control this phenomenon automatically. However, this remains a challenging task. In this paper, we explore offensive language as a particular case of harmful content and focus our study on the analysis of keywords in available datasets composed of offensive tweets. Thus, we aim to identify relevant words in those datasets and analyze how they can affect model learning. For keyword extraction, we propose an unsupervised hybrid approach that combines the multi-head self-attention of BERT with reasoning over a word graph. The attention mechanism captures relationships among words in context while a language model is learned. These relationships are then used to build a graph, from which we identify the most relevant words using eigenvector centrality. Experiments were performed in two ways. On the one hand, we used an information retrieval system to evaluate the impact of the keywords on retrieving offensive tweets from a dataset. On the other hand, we evaluated a keyword-based model for offensive language detection. The results highlight several points to consider when training models with available datasets. es_ES
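The abstract outlines an unsupervised pipeline: attention weights relate words within each tweet, those relations define a word graph, and eigenvector centrality ranks the words. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes each tweet's attention has already been reduced to one token-by-token matrix (for example by averaging over heads and layers), and that the edge between two word types is weighted by the attention in both directions, summed over all tweets. The function names (`build_word_graph`, `eigenvector_centrality`, `top_keywords`) and the toy attention values are hypothetical.

```python
import math
from collections import defaultdict


def build_word_graph(tweets):
    """Accumulate an undirected, weighted word graph over all tweets.

    Each tweet is a (tokens, attention) pair, where attention[i][j] is an
    assumed, pre-aggregated attention weight from token i to token j.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens, attention in tweets:
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                u, v = tokens[i], tokens[j]
                if u == v:
                    continue  # ignore self-loops from repeated words
                # Assumption: edge weight = attention in both directions.
                w = attention[i][j] + attention[j][i]
                graph[u][v] += w
                graph[v][u] += w
    return graph


def eigenvector_centrality(graph, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I), normalized to unit L2 norm.

    A + I has the same leading eigenvector as the symmetric matrix A; the
    shift avoids the sign oscillation plain power iteration shows on
    bipartite graphs.
    """
    nodes = sorted(graph)
    if not nodes:
        return {}
    x = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(max_iter):
        new = {u: x[u] + sum(w * x[v] for v, w in graph[u].items()) for u in nodes}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {u: val / norm for u, val in new.items()}
        converged = sum(abs(new[u] - x[u]) for u in nodes) < len(nodes) * tol
        x = new
        if converged:
            break
    return x


def top_keywords(tweets, k=10):
    """Rank word types by eigenvector centrality; ties broken alphabetically."""
    scores = eigenvector_centrality(build_word_graph(tweets))
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]


# Hypothetical toy input: two tokenized tweets with made-up attention rows.
toy_tweets = [
    (["you", "are", "idiot"], [[0.2, 0.2, 0.6], [0.1, 0.3, 0.6], [0.3, 0.3, 0.4]]),
    (["idiot", "go", "away"], [[0.4, 0.3, 0.3], [0.6, 0.2, 0.2], [0.5, 0.2, 0.3]]),
]

if __name__ == "__main__":
    print(top_keywords(toy_tweets, k=1))  # prints ['idiot']
```

Symmetrizing the attention weights makes the adjacency matrix symmetric, so eigenvector centrality is computed as for an ordinary undirected graph; a word scores highly when it is strongly attended together with other high-scoring words. Feeding real attention weights would require running a BERT model with attention outputs enabled; how the authors aggregate heads and layers, and how they build the graph in detail, is not specified in this record.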
dc.language English es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof Personal and Ubiquitous Computing es_ES
dc.rights All rights reserved es_ES
dc.subject Unsupervised keyword extraction es_ES
dc.subject Offensive language detection es_ES
dc.subject Attention mechanism es_ES
dc.subject Graph representation es_ES
dc.title Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s00779-021-01605-5 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PGC2018-096212-B-C31/ES/DESINFORMACION Y AGRESIVIDAD EN SOCIAL MEDIA: AGREGANDO INFORMACION Y ANALIZANDO EL LENGUAJE/ es_ES
dc.rights.accessRights Open es_ES
dc.description.bibliographicCitation Peña-Sarracén, GLDL.; Rosso, P. (2021). Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation. Personal and Ubiquitous Computing. 1-13. https://doi.org/10.1007/s00779-021-01605-5 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s00779-021-01605-5 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 13 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela S\450464 es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES

