Show simple item record
dc.contributor.author | Tomás, David | es_ES |
dc.contributor.author | Ortega-Bueno, Reynier | es_ES |
dc.contributor.author | Zhang, Guobiao | es_ES |
dc.contributor.author | Rosso, Paolo | es_ES |
dc.contributor.author | Schifanella, Rossano | es_ES |
dc.date.accessioned | 2023-08-29T18:01:09Z | |
dc.date.available | 2023-08-29T18:01:09Z | |
dc.date.issued | 2023-06 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/195757 | |
dc.description.abstract | [EN] Irony is nowadays a pervasive phenomenon in social networks. The multimodal functionalities of these platforms (i.e., the possibility to attach audio, video, and images to textual information) are increasingly leading their users to employ combinations of information in different formats to express their ironic thoughts. The present work focuses on the study of irony detection in social media posts involving image and text. To this end, a transformer architecture for the fusion of textual and image information is proposed. The model leverages disentangled text attention with visual transformers, improving F1-score up to 9% over previous existing works in the field and current state-of-the-art visio-linguistic transformers. The proposed architecture was evaluated in three different multimodal datasets gathered from Twitter and Tumblr. The results revealed that, in many situations, the text-only version of the architecture was able to capture the ironic nature of the message without using visual information. This phenomenon was further analysed, leading to the identification of linguistic patterns that could provide the context necessary for irony detection without the need for additional visual information. | es_ES |
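The abstract above describes a transformer architecture that fuses textual token representations with image representations for irony classification. As a rough illustration only — not the authors' model; the embedding dimension, the early-fusion choice, the single attention head, and the random weights are all assumptions for the sketch — fusing text-token and image-patch embeddings into one jointly attended sequence might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the fused sequence
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

d = 16  # assumed embedding dimension (real models use e.g. 768)
text_tokens = rng.normal(size=(12, d))   # stand-in for BERT-style token embeddings
image_patches = rng.normal(size=(9, d))  # stand-in for ViT-style patch embeddings

# Early fusion: both modalities form a single sequence, so attention
# can relate any text token to any image patch
fused = np.concatenate([text_tokens, image_patches], axis=0)

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
attended = self_attention(fused, Wq, Wk, Wv)

# Mean-pool the attended sequence and score with a binary
# (ironic / not ironic) classification head
pooled = attended.mean(axis=0)
w, b = rng.normal(size=d), 0.0
p_ironic = 1 / (1 + np.exp(-(pooled @ w + b)))
print(attended.shape, 0.0 < p_ironic < 1.0)
```

The early-fusion choice here is one of several options discussed in the literature the record cites (cf. Gadzicki et al. 2020 on early vs. late fusion); a late-fusion variant would instead encode each modality separately and combine pooled vectors at the classifier.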
dc.description.sponsorship | Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was partially supported by the Spanish Ministry of Science and Innovation and Fondo Europeo de Desarrollo Regional (FEDER) in the framework of project "Technological Resources for Intelligent VIral AnaLysis through NLP (TRIVIAL)" (PID2021-122263OB-C22). | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer | es_ES |
dc.relation.ispartof | Journal of Ambient Intelligence and Humanized Computing | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | Irony detection | es_ES |
dc.subject | Transformer | es_ES |
dc.subject | Multimodality | es_ES |
dc.subject | Image text fusion | es_ES |
dc.subject.classification | LENGUAJES Y SISTEMAS INFORMATICOS | es_ES |
dc.title | Transformer-based models for multimodal irony detection | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1007/s12652-022-04447-y | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MICINN//PID2021-122263OB-C22/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | Tomás, D.; Ortega-Bueno, R.; Zhang, G.; Rosso, P.; Schifanella, R. (2023). Transformer-based models for multimodal irony detection. Journal of Ambient Intelligence and Humanized Computing. 14:7399-7410. https://doi.org/10.1007/s12652-022-04447-y | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1007/s12652-022-04447-y | es_ES |
dc.description.upvformatpinicio | 7399 | es_ES |
dc.description.upvformatpfin | 7410 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 14 | es_ES |
dc.identifier.eissn | 1868-5145 | es_ES |
dc.relation.pasarela | S\482281 | es_ES |
dc.contributor.funder | European Regional Development Fund | es_ES |
dc.contributor.funder | Ministerio de Ciencia e Innovación | es_ES |
dc.contributor.funder | Universitat Politècnica de València | es_ES |
dc.description.references | Agarap AF (2018) Deep learning using rectified linear units (ReLU). arXiv:1803.08375 | es_ES |
dc.description.references | Alam F, Cresci S, Chakraborty T, et al (2021) A survey on multimodal disinformation detection. arXiv:2103.12541 | es_ES |
dc.description.references | Cai Y, Cai H, Wan X (2019) Multi-modal sarcasm detection in Twitter with hierarchical fusion model. In: Proceedings of the 57th annual meeting of the ACL. Association for Computational Linguistics, pp 2506–2515. https://doi.org/10.18653/v1/P19-1239 | es_ES |
dc.description.references | Cignarella AT, Basile V, Sanguinetti M, et al (2020a) Multilingual irony detection with dependency syntax and neural models. In: Proceedings of the 28th international conference on computational linguistics. International Committee on Computational Linguistics, Barcelona, Spain (Online), pp 1346–1358. https://doi.org/10.18653/v1/2020.coling-main.116 | es_ES |
dc.description.references | Cignarella AT, Sanguinetti M, Bosco C, et al (2020b) Marking irony activators in a Universal Dependencies treebank: the case of an Italian Twitter corpus. In: Proceedings of the 12th language resources and evaluation conference. European Language Resources Association, Marseille, France, pp 5098–5105. https://aclanthology.org/2020.lrec-1.627 | es_ES |
dc.description.references | Conneau A, Khandelwal K, Goyal N, et al (2020) Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics, Online, pp 8440–8451. https://doi.org/10.18653/v1/2020.acl-main.747 | es_ES |
dc.description.references | Devlin J, Chang MW, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics. Association for Computational Linguistics, pp 4171–4186. https://doi.org/10.18653/v1/N19-1423 | es_ES |
dc.description.references | Dosovitskiy A, Beyer L, Kolesnikov A, et al (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: International conference on learning representations, pp 1–21. https://openreview.net/forum?id=YicbFdNTTy | es_ES |
dc.description.references | Gadzicki K, Khamsehashari R, Zetzsche C (2020) Early vs late fusion in multimodal convolutional neural networks. In: 2020 IEEE 23rd international conference on information fusion (FUSION), pp 1–6. https://doi.org/10.23919/FUSION45008.2020.9190246 | es_ES |
dc.description.references | Giachanou A, Zhang G, Rosso P (2020) Multimodal fake news detection with textual, visual and semantic information. Text, speech, and dialogue. Springer, Cham, pp 30–38 | es_ES |
dc.description.references | Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge, MA | es_ES |
dc.description.references | He P, Liu X, Gao J, et al (2020) DeBERTa: decoding-enhanced BERT with disentangled attention. arXiv:2006.03654 | es_ES |
dc.description.references | Howard J, Ruder S (2018) Universal language model fine-tuning for text classification. In: Proceedings of the 56th annual meeting of the ACL. Association for Computational Linguistics, pp 328–339. https://doi.org/10.18653/v1/P18-1031 | es_ES |
dc.description.references | Hutto C, Gilbert E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. Proc Int AAAI Conf Web Soc Media 8(1):216–225 | es_ES |
dc.description.references | Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, PMLR, pp 448–456 | es_ES |
dc.description.references | Joshi A, Bhattacharyya P, Carman MJ (2017) Automatic sarcasm detection: a survey. ACM Comput Surv 50(5):1–22. https://doi.org/10.1145/3124420 | es_ES |
dc.description.references | Kiela D, Firooz H, Mohan A, et al (2021) The hateful memes challenge: competition report. In: Escalante HJ, Hofmann K (eds) Proceedings of the NeurIPS 2020 competition and demonstration track, proceedings of machine learning research, vol 133. PMLR, pp 344–360 | es_ES |
dc.description.references | Li LH, Yatskar M, Yin D, et al (2019) VisualBERT: a simple and performant baseline for vision and language. arXiv:1908.03557 | es_ES |
dc.description.references | Liu Y, Ott M, Goyal N, et al (2019) RoBERTa: a robustly optimized BERT pretraining approach, pp 1–13. arXiv preprint arXiv:1907.11692 | es_ES |
dc.description.references | Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML workshop on deep learning for audio, speech and language processing, Atlanta, Georgia, USA, pp 1–6 | es_ES |
dc.description.references | Mikolov T, Sutskever I, Chen K, et al (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th international conference on neural information processing systems, vol 2. Curran Associates Inc., NIPS’13, pp 3111–3119 | es_ES |
dc.description.references | Naseer M, Ranasinghe K, Khan S, et al (2021) Intriguing properties of vision transformers. arXiv:2105.10497 | es_ES |
dc.description.references | Nguyen DQ, Vu T, Tuan Nguyen A (2020) BERTweet: a pre-trained language model for English tweets. In: Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations. Association for Computational Linguistics, Online, pp 9–14. https://doi.org/10.18653/v1/2020.emnlp-demos.2, https://aclanthology.org/2020.emnlp-demos.2 | es_ES |
dc.description.references | Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987. https://doi.org/10.1109/TPAMI.2002.1017623 | es_ES |
dc.description.references | Pan H, Lin Z, Fu P, et al (2020) Modeling intra and inter-modality incongruity for multi-modal sarcasm detection. In: Findings of the association for computational linguistics: EMNLP 2020. Association for Computational Linguistics, pp 1383–1392. https://doi.org/10.18653/v1/2020.findings-emnlp.124 | es_ES |
dc.description.references | Schifanella R, de Juan P, Tetreault J, et al (2016) Detecting sarcasm in multimodal social platforms. In: Proceedings of the 24th ACM international conference on multimedia. Association for Computing Machinery, New York, NY, USA, MM ’16, pp 1136–1145. https://doi.org/10.1145/2964284.2964321 | es_ES |
dc.description.references | Srivastava N, Hinton G, Krizhevsky A et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958 | es_ES |
dc.description.references | Tan H, Bansal M (2019) LXMERT: learning cross-modality encoder representations from transformers. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp 5100–5111. https://doi.org/10.18653/v1/D19-1514 | es_ES |
dc.description.references | Van Hee C, Lefever E, Hoste V (2018) SemEval-2018 task 3: irony detection in English tweets. In: Proceedings of The 12th international workshop on semantic evaluation. Association for Computational Linguistics, New Orleans, Louisiana, pp 39–50. https://doi.org/10.18653/v1/S18-1005, https://aclanthology.org/S18-1005 | es_ES |
dc.description.references | Vaswani A, Shazeer N, Parmar N, et al (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30. Curran Associates, Inc., pp 5998–6008 | es_ES |
dc.description.references | Wang X, Sun X, Yang T, et al (2020) Building a bridge: a method for image-text sarcasm detection without pretraining on image-text data. In: Proceedings of the first international workshop on natural language processing beyond text. Association for Computational Linguistics, pp 19–29. https://doi.org/10.18653/v1/2020.nlpbt-1.3 | es_ES |
dc.description.references | Xu N, Zeng Z, Mao W (2020) Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics, pp 3777–3786. https://doi.org/10.18653/v1/2020.acl-main.349 | es_ES |