
Transformer-based models for multimodal irony detection

RiuNet: Institutional Repository of the Universitat Politècnica de València


dc.contributor.author Tomás, David es_ES
dc.contributor.author Ortega-Bueno, Reynier es_ES
dc.contributor.author Zhang, Guobiao es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.contributor.author Schifanella, Rossano es_ES
dc.date.accessioned 2023-08-29T18:01:09Z
dc.date.available 2023-08-29T18:01:09Z
dc.date.issued 2023-06 es_ES
dc.identifier.uri http://hdl.handle.net/10251/195757
dc.description.abstract [EN] Irony is nowadays a pervasive phenomenon in social networks. The multimodal functionalities of these platforms (i.e., the possibility to attach audio, video, and images to textual information) increasingly lead their users to combine information in different formats to express their ironic thoughts. The present work focuses on the study of irony detection in social media posts involving image and text. To this end, a transformer architecture for the fusion of textual and image information is proposed. The model leverages disentangled text attention with visual transformers, improving the F1-score by up to 9% over previous works in the field and current state-of-the-art visio-linguistic transformers. The proposed architecture was evaluated on three multimodal datasets gathered from Twitter and Tumblr. The results revealed that, in many situations, the text-only version of the architecture was able to capture the ironic nature of the message without using visual information. This phenomenon was further analysed, leading to the identification of linguistic patterns that could provide the context necessary for irony detection without the need for additional visual information. es_ES
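For readers who want a concrete picture of the text-image fusion described in the abstract, the following is a minimal sketch in Python (PyTorch with Hugging Face transformers). It is an illustration only, not the authors' published implementation: the checkpoint names (microsoft/deberta-base, google/vit-base-patch16-224), the concatenation-based late fusion, and the classifier head are all assumptions made for the example.

import torch
import torch.nn as nn
from transformers import AutoModel, ViTModel

class MultimodalIronyClassifier(nn.Module):
    """Illustrative late-fusion model: DeBERTa text encoder + ViT image encoder."""

    def __init__(self, num_labels: int = 2):
        super().__init__()
        # Text branch: DeBERTa supplies the disentangled attention mechanism
        # mentioned in the abstract (He et al. 2020).
        self.text_encoder = AutoModel.from_pretrained("microsoft/deberta-base")
        # Image branch: a Vision Transformer over 16x16 patches
        # (Dosovitskiy et al. 2021).
        self.image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224")
        fused_size = (self.text_encoder.config.hidden_size
                      + self.image_encoder.config.hidden_size)
        # Fusion head (assumed): concatenate the two pooled representations
        # and classify ironic vs. non-ironic.
        self.classifier = nn.Sequential(nn.Dropout(0.1),
                                        nn.Linear(fused_size, num_labels))

    def forward(self, input_ids, attention_mask, pixel_values):
        # Use the first ([CLS]) token as the pooled text representation.
        text_repr = self.text_encoder(
            input_ids=input_ids,
            attention_mask=attention_mask).last_hidden_state[:, 0]
        # ViT likewise prepends a [CLS] token to the patch sequence.
        image_repr = self.image_encoder(
            pixel_values=pixel_values).last_hidden_state[:, 0]
        fused = torch.cat([text_repr, image_repr], dim=-1)
        return self.classifier(fused)  # logits over {ironic, non-ironic}

Dropping the image branch and classifying on text_repr alone yields the kind of text-only variant that, as the abstract notes, was often sufficient to capture the ironic nature of the message.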
dc.description.sponsorship Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was partially supported by the Spanish Ministry of Science and Innovation and Fondo Europeo de Desarrollo Regional (FEDER) in the framework of project "Technological Resources for Intelligent VIral AnaLysis through NLP (TRIVIAL)" (PID2021-122263OB-C22). es_ES
dc.language English es_ES
dc.publisher Springer es_ES
dc.relation.ispartof Journal of Ambient Intelligence and Humanized Computing es_ES
dc.rights Attribution (by) es_ES
dc.subject Irony detection es_ES
dc.subject Transformer es_ES
dc.subject Multimodality es_ES
dc.subject Image text fusion es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title Transformer-based models for multimodal irony detection es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s12652-022-04447-y es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MICINN//PID2021-122263OB-C22/ es_ES
dc.rights.accessRights Open access es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Tomás, D.; Ortega-Bueno, R.; Zhang, G.; Rosso, P.; Schifanella, R. (2023). Transformer-based models for multimodal irony detection. Journal of Ambient Intelligence and Humanized Computing. 14:7399-7410. https://doi.org/10.1007/s12652-022-04447-y es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s12652-022-04447-y es_ES
dc.description.upvformatpinicio 7399 es_ES
dc.description.upvformatpfin 7410 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 14 es_ES
dc.identifier.eissn 1868-5145 es_ES
dc.relation.pasarela S\482281 es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Ministerio de Ciencia e Innovación es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
dc.description.references Agarap AF (2018) Deep learning using rectified linear units (ReLU). arXiv:1803.08375 es_ES
dc.description.references Alam F, Cresci S, Chakraborty T, et al (2021) A survey on multimodal disinformation detection. arXiv:2103.12541 es_ES
dc.description.references Cai Y, Cai H, Wan X (2019) Multi-modal sarcasm detection in Twitter with hierarchical fusion model. In: Proceedings of the 57th annual meeting of the ACL. Association for Computational Linguistics, pp 2506–2515. https://doi.org/10.18653/v1/P19-1239 es_ES
dc.description.references Cignarella AT, Basile V, Sanguinetti M, et al (2020a) Multilingual irony detection with dependency syntax and neural models. In: Proceedings of the 28th international conference on computational linguistics. International Committee on Computational Linguistics, Barcelona, Spain (Online), pp 1346–1358. https://doi.org/10.18653/v1/2020.coling-main.116 es_ES
dc.description.references Cignarella AT, Sanguinetti M, Bosco C, et al (2020b) Marking irony activators in a Universal Dependencies treebank: the case of an Italian Twitter corpus. In: Proceedings of the 12th language resources and evaluation conference. European Language Resources Association, Marseille, France, pp 5098–5105. https://aclanthology.org/2020.lrec-1.627 es_ES
dc.description.references Conneau A, Khandelwal K, Goyal N, et al (2020) Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics, Online, pp 8440–8451. https://doi.org/10.18653/v1/2020.acl-main.747 es_ES
dc.description.references Devlin J, Chang MW, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics. Association for Computational Linguistics, pp 4171–4186. https://doi.org/10.18653/v1/N19-1423 es_ES
dc.description.references Dosovitskiy A, Beyer L, Kolesnikov A, et al (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: International conference on learning representations, pp 1–21. https://openreview.net/forum?id=YicbFdNTTy es_ES
dc.description.references Gadzicki K, Khamsehashari R, Zetzsche C (2020) Early vs late fusion in multimodal convolutional neural networks. In: 2020 IEEE 23rd international conference on information fusion (FUSION), pp 1–6. https://doi.org/10.23919/FUSION45008.2020.9190246 es_ES
dc.description.references Giachanou A, Zhang G, Rosso P (2020) Multimodal fake news detection with textual, visual and semantic information. Text, speech, and dialogue. Springer, Cham, pp 30–38 es_ES
dc.description.references Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge, MA es_ES
dc.description.references He P, Liu X, Gao J, et al (2020) DeBERTa: decoding-enhanced BERT with disentangled attention. arXiv:2006.03654 es_ES
dc.description.references Howard J, Ruder S (2018) Universal language model fine-tuning for text classification. In: Proceedings of the 56th annual meeting of the ACL. Association for Computational Linguistics, pp 328–339. https://doi.org/10.18653/v1/P18-1031 es_ES
dc.description.references Hutto C, Gilbert E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. Proc Int AAAI Conf Web Soc Media 8(1):216–225 es_ES
dc.description.references Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, PMLR, pp 448–456 es_ES
dc.description.references Joshi A, Bhattacharyya P, Carman MJ (2017) Automatic sarcasm detection: a survey. ACM Comput Surv 50(5):1–22. https://doi.org/10.1145/3124420 es_ES
dc.description.references Kiela D, Firooz H, Mohan A, et al (2021) The hateful memes challenge: competition report. In: Escalante HJ, Hofmann K (eds) Proceedings of the NeurIPS 2020 competition and demonstration track, proceedings of machine learning research, vol 133. PMLR, pp 344–360 es_ES
dc.description.references Li LH, Yatskar M, Yin D, et al (2019) VisualBERT: a simple and performant baseline for vision and language. arXiv:1908.03557 es_ES
dc.description.references Liu Y, Ott M, Goyal N, et al (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 es_ES
dc.description.references Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML workshop on deep learning for audio, speech and language processing, Atlanta, Georgia, USA, pp 1–6 es_ES
dc.description.references Mikolov T, Sutskever I, Chen K, et al (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th international conference on neural information processing systems, vol 2. Curran Associates Inc., NIPS’13, pp 3111–3119 es_ES
dc.description.references Naseer M, Ranasinghe K, Khan S, et al (2021) Intriguing properties of vision transformers. arXiv:2105.10497 es_ES
dc.description.references Nguyen DQ, Vu T, Tuan Nguyen A (2020) BERTweet: a pre-trained language model for English tweets. In: Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations. Association for Computational Linguistics, Online, pp 9–14. https://doi.org/10.18653/v1/2020.emnlp-demos.2, https://aclanthology.org/2020.emnlp-demos.2 es_ES
dc.description.references Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987. https://doi.org/10.1109/TPAMI.2002.1017623 es_ES
dc.description.references Pan H, Lin Z, Fu P, et al (2020) Modeling intra and inter-modality incongruity for multi-modal sarcasm detection. In: Findings of the association for computational linguistics: EMNLP 2020. Association for Computational Linguistics, pp 1383–1392. https://doi.org/10.18653/v1/2020.findings-emnlp.124 es_ES
dc.description.references Schifanella R, de Juan P, Tetreault J, et al (2016) Detecting sarcasm in multimodal social platforms. In: Proceedings of the 24th ACM international conference on multimedia. Association for Computing Machinery, New York, NY, USA, MM ’16, pp 1136–1145. https://doi.org/10.1145/2964284.2964321 es_ES
dc.description.references Srivastava N, Hinton G, Krizhevsky A et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958 es_ES
dc.description.references Tan H, Bansal M (2019) LXMERT: learning cross-modality encoder representations from transformers. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp 5100–5111. https://doi.org/10.18653/v1/D19-1514 es_ES
dc.description.references Van Hee C, Lefever E, Hoste V (2018) SemEval-2018 task 3: irony detection in English tweets. In: Proceedings of The 12th international workshop on semantic evaluation. Association for Computational Linguistics, New Orleans, Louisiana, pp 39–50. https://doi.org/10.18653/v1/S18-1005, https://aclanthology.org/S18-1005 es_ES
dc.description.references Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30. Curran Associates, Inc., pp 5998–6008 es_ES
dc.description.references Wang X, Sun X, Yang T, et al (2020) Building a bridge: a method for image-text sarcasm detection without pretraining on image-text data. In: Proceedings of the first international workshop on natural language processing beyond text. Association for Computational Linguistics, pp 19–29. https://doi.org/10.18653/v1/2020.nlpbt-1.3 es_ES
dc.description.references Xu N, Zeng Z, Mao W (2020) Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics, pp 3777–3786. https://doi.org/10.18653/v1/2020.acl-main.349 es_ES

