TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter

González-Barba, José Ángel; Hurtado Oliver, Lluis Felip; Pla Santamaría, Ferran

doi:10.1016/j.neucom.2020.09.078



dc.contributor.author	González-Barba, José Ángel	es_ES
dc.contributor.author	Hurtado Oliver, Lluis Felip	es_ES
dc.contributor.author	Pla Santamaría, Ferran	es_ES
dc.date.accessioned	2022-10-13T18:07:08Z
dc.date.available	2022-10-13T18:07:08Z
dc.date.issued	2021-02-22	es_ES
dc.identifier.issn	0925-2312	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/187684
dc.description.abstract	[EN] In recent years, the Natural Language Processing community has been moving from uncontextualized word embeddings towards contextualized word embeddings. Among these contextualized architectures, BERT stands out due to its capacity to compute bidirectional contextualized word representations. However, its competitive performance in English downstream tasks is not obtained by its multilingual version when it is applied to other languages and domains. This is especially true in the case of the Spanish language used in Twitter. In this work, we propose TWilBERT, a specialization of the BERT architecture both for the Spanish language and the Twitter domain. Furthermore, we propose a Reply Order Prediction signal to learn inter-sentence coherence in Twitter conversations, which improves the performance of TWilBERT in text classification tasks that require reasoning on sequences of tweets. We perform an extensive evaluation of TWilBERT models on 14 different text classification tasks, such as irony detection, sentiment analysis, or emotion detection. The results obtained by TWilBERT outperform the state-of-the-art systems and Multilingual BERT. In addition, we carry out a thorough analysis of the TWilBERT models to study the reasons for their competitive behavior. We release the pre-trained TWilBERT models used in this paper, along with a framework for training, evaluating, and fine-tuning TWilBERT models.	es_ES
dc.description.sponsorship	This work has been partially supported by the Spanish Ministerio de Ciencia, Innovacion y Universidades and FEDER funds under project AMIC (TIN2017-85854-C4-2-R), and the Generalitat Valenciana under GiSPRO (PROMETEU/2018/176) and GUAITA (INNVA1/2020/61) projects. Work of Jose Angel Gonzalez is financed by Universitat Politecnica de Valencia under grant PAID-01-17.	es_ES
dc.language	English	es_ES
dc.publisher	Elsevier	es_ES
dc.relation.ispartof	Neurocomputing	es_ES
dc.rights	Attribution - NonCommercial - NoDerivatives (by-nc-nd)	es_ES
dc.subject	Contextualized Embeddings	es_ES
dc.subject	Spanish	es_ES
dc.subject	Twitter	es_ES
dc.subject	TWilBERT	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1016/j.neucom.2020.09.078	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-85854-C4-2-R/ES/AMIC-UPV: ANALISIS AFECTIVO DE INFORMACION MULTIMEDIA CON COMUNICACION INCLUSIVA Y NATURAL/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/UPV//PAID-01-17/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//INNVA1%2F2020%2F61//GUAITA/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//PROMETEO%2F2018%2F176//GISPRO-GENOMIC INFORMATION SYSTEMS PRODUCTION/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	González-Barba, JÁ.; Hurtado Oliver, LF.; Pla Santamaría, F. (2021). TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter. Neurocomputing. 426:58-69. https://doi.org/10.1016/j.neucom.2020.09.078	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.neucom.2020.09.078	es_ES
dc.description.upvformatpinicio	58	es_ES
dc.description.upvformatpfin	69	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	426	es_ES
dc.relation.pasarela	S\429113	es_ES
dc.contributor.funder	GENERALITAT VALENCIANA	es_ES
dc.contributor.funder	AGENCIA ESTATAL DE INVESTIGACION	es_ES
dc.contributor.funder	European Regional Development Fund	es_ES
dc.contributor.funder	Universitat Politècnica de València	es_ES


RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
