NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish

Ahuir-Esteve, Vicent; Hurtado Oliver, Lluis Felip; González-Barba, José Ángel; Segarra Soriano, Encarnación

doi:10.3390/app11219872


Files in this item

Name: Ahuir-EsteveHurta ...

Size: 442.6Kb

Format: PDF

Description: Publisher's version

dc.contributor.author	Ahuir-Esteve, Vicent	es_ES
dc.contributor.author	Hurtado Oliver, Lluis Felip	es_ES
dc.contributor.author	González-Barba, José Ángel	es_ES
dc.contributor.author	Segarra Soriano, Encarnación	es_ES
dc.date.accessioned	2022-05-24T18:05:05Z
dc.date.available	2022-05-24T18:05:05Z
dc.date.issued	2021-11	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/182874
dc.description.abstract	[EN] Most of the models proposed in the literature for abstractive summarization are suitable for the English language but not for other languages. Multilingual models were introduced to address that language constraint, but although their applicability is broader than that of monolingual models, their performance is typically lower, especially for minority languages like Catalan. In this paper, we present a monolingual model for abstractive summarization of textual content in the Catalan language. The model is a Transformer encoder-decoder that is pretrained and fine-tuned specifically for the Catalan language using a corpus of newspaper articles. In the pretraining phase, we introduced several self-supervised tasks to specialize the model on the summarization task and to increase the abstractivity of the generated summaries. To study the performance of our proposal in languages with more resources than Catalan, we replicate the model and the experimentation for the Spanish language. The usual evaluation metrics, not only the widely used ROUGE measure but also more semantic ones such as BertScore, do not allow the abstractivity of the generated summaries to be evaluated correctly. In this work, we also present a new metric, called content reordering, to evaluate one of the most common characteristics of abstractive summaries: the rearrangement of the original content. We carried out exhaustive experiments to compare the performance of the monolingual models proposed in this work with two of the most widely used multilingual models in text summarization, mBART and mT5. The experimental results support the quality of our monolingual models, especially considering that the multilingual models were pretrained with many more resources than those used in our models. Likewise, it is shown that the pretraining tasks helped to increase the degree of abstractivity of the generated summaries. To our knowledge, this is the first work that explores a monolingual approach for abstractive summarization in both Catalan and Spanish.	es_ES
dc.description.sponsorship	This work was partially supported by the Spanish Ministerio de Ciencia, Innovacion y Universidades and FEDER funds under the project AMIC (TIN2017-85854-C4-2-R), and by the Agencia Valenciana de la Innovacio (AVI) of the Generalitat Valenciana under the GUAITA (INNVA1/2020/61) project.	es_ES
dc.language	English	es_ES
dc.publisher	MDPI AG	es_ES
dc.relation.ispartof	Applied Sciences	es_ES
dc.rights	Attribution (by)	es_ES
dc.subject	Abstractive summarization	es_ES
dc.subject	Monolingual models	es_ES
dc.subject	Multilingual models	es_ES
dc.subject	Transformer models	es_ES
dc.subject	Transfer learning	es_ES
dc.subject.classification	Computer Languages and Systems	es_ES
dc.title	NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.3390/app11219872	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-85854-C4-2-R/ES/AMIC-UPV: ANALISIS AFECTIVO DE INFORMACION MULTIMEDIA CON COMUNICACION INCLUSIVA Y NATURAL/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AGENCIA VALENCIANA DE LA INNOVACION//INNVA1%2F2020%2F61//GUAITA: MONITORIZACION Y ANALISIS DE REDES SOCIALES PARA LA AYUDA A LA TOMA DE DECISIONES/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Ahuir-Esteve, V.; Hurtado Oliver, LF.; González-Barba, JÁ.; Segarra Soriano, E. (2021). NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish. Applied Sciences. 11(21):1-16. https://doi.org/10.3390/app11219872	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.3390/app11219872	es_ES
dc.description.upvformatpinicio	1	es_ES
dc.description.upvformatpfin	16	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	11	es_ES
dc.description.issue	21	es_ES
dc.identifier.eissn	2076-3417	es_ES
dc.relation.pasarela	S\453465	es_ES
dc.contributor.funder	AGENCIA ESTATAL DE INVESTIGACION	es_ES
dc.contributor.funder	European Regional Development Fund	es_ES
dc.contributor.funder	AGENCIA VALENCIANA DE LA INNOVACION	es_ES
upv.costeAPC	2000	es_ES


RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia
