dc.contributor.author | Peña-Sarracén, Gretel Liz de la | es_ES |
dc.contributor.author | Rosso, Paolo | es_ES |
dc.date.accessioned | 2022-11-07T19:01:36Z | |
dc.date.available | 2022-11-07T19:01:36Z | |
dc.date.issued | 2021-08-27 | es_ES |
dc.identifier.issn | 1617-4909 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/189377 | |
dc.description.abstract | [EN] The proliferation of harmful content on social media affects a large part of the user community, and several approaches have emerged to control this phenomenon automatically. However, this remains a challenging task. In this paper, we explore offensive language as a particular case of harmful content and focus our study on the analysis of keywords in available datasets composed of offensive tweets. We aim to identify relevant words in those datasets and analyze how they can affect model learning. For keyword extraction, we propose an unsupervised hybrid approach that combines the multi-head self-attention of BERT with reasoning on a word graph. The attention mechanism captures relationships among words in context while a language model is learned. These relationships are then used to generate a graph from which we identify the most relevant words by using eigenvector centrality. Experiments were performed by means of two mechanisms. On the one hand, we used an information retrieval system to evaluate the impact of the keywords in retrieving offensive tweets from a dataset. On the other hand, we evaluated a keyword-based model for offensive language detection. The results highlight some points to consider when training models with available datasets. | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer-Verlag | es_ES |
dc.relation.ispartof | Personal and Ubiquitous Computing | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | Unsupervised keyword extraction | es_ES |
dc.subject | Offensive language detection | es_ES |
dc.subject | Attention mechanism | es_ES |
dc.subject | Graph representation | es_ES |
dc.title | Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1007/s00779-021-01605-5 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PGC2018-096212-B-C31/ES/DESINFORMACION Y AGRESIVIDAD EN SOCIAL MEDIA: AGREGANDO INFORMACION Y ANALIZANDO EL LENGUAJE/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.description.bibliographicCitation | Peña-Sarracén, GLDL.; Rosso, P. (2021). Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation. Personal and Ubiquitous Computing. 1-13. https://doi.org/10.1007/s00779-021-01605-5 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1007/s00779-021-01605-5 | es_ES |
dc.description.upvformatpinicio | 1 | es_ES |
dc.description.upvformatpfin | 13 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.pasarela | S\450464 | es_ES |
dc.contributor.funder | AGENCIA ESTATAL DE INVESTIGACION | es_ES |
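The abstract in this record describes building a word graph from BERT self-attention and ranking words by eigenvector centrality. The following is a minimal sketch of that ranking step only, not the authors' implementation. It assumes the attention weights are already available as an array of shape (heads, tokens, tokens); averaging the heads, symmetrizing the result, and dropping self-loops are illustrative choices made here, and the function names and toy values are hypothetical.

```python
import numpy as np


def eigenvector_centrality(adjacency, iterations=200, tol=1e-10):
    """Approximate the dominant eigenvector of a non-negative matrix by power iteration."""
    n = adjacency.shape[0]
    scores = np.full(n, 1.0 / np.sqrt(n))  # unit-norm uniform start
    for _ in range(iterations):
        updated = adjacency @ scores
        norm = np.linalg.norm(updated)
        if norm == 0.0:  # graph without edges: keep the previous scores
            return scores
        updated = updated / norm
        if np.linalg.norm(updated - scores) < tol:
            return updated
        scores = updated
    return scores


def rank_keywords(tokens, attention_heads):
    """Rank tokens by centrality in a graph whose edge weights come from attention.

    Assumption: heads are averaged and the matrix is symmetrized so the graph
    is undirected; the record does not say how the graph is actually built.
    """
    mean_attention = attention_heads.mean(axis=0)
    weights = (mean_attention + mean_attention.T) / 2.0
    np.fill_diagonal(weights, 0.0)  # ignore self-attention loops
    scores = eigenvector_centrality(weights)
    order = np.argsort(-scores)
    return [(tokens[i], float(scores[i])) for i in order]


# Toy input: two made-up attention heads over four tokens (each row sums to 1).
tokens = ["you", "are", "an", "idiot"]
attention_heads = np.array([
    [[0.1, 0.2, 0.1, 0.6],
     [0.2, 0.1, 0.1, 0.6],
     [0.1, 0.1, 0.1, 0.7],
     [0.3, 0.2, 0.2, 0.3]],
    [[0.2, 0.2, 0.1, 0.5],
     [0.1, 0.2, 0.2, 0.5],
     [0.2, 0.1, 0.2, 0.5],
     [0.3, 0.3, 0.2, 0.2]],
])

ranking = rank_keywords(tokens, attention_heads)
for token, score in ranking:
    print(f"{token}\t{score:.3f}")
```

In these toy matrices the last token receives most of the attention, so it ends up with the highest centrality. With a real model, the attention array would have to be extracted from BERT itself, and subword tokens would need to be merged back into words before building the graph.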