
NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish

RiuNet: Institutional Repository of the Universitat Politècnica de València


dc.contributor.author Ahuir-Esteve, Vicent es_ES
dc.contributor.author Hurtado Oliver, Lluis Felip es_ES
dc.contributor.author González-Barba, José Ángel es_ES
dc.contributor.author Segarra Soriano, Encarnación es_ES
dc.date.accessioned 2022-05-24T18:05:05Z
dc.date.available 2022-05-24T18:05:05Z
dc.date.issued 2021-11 es_ES
dc.identifier.uri http://hdl.handle.net/10251/182874
dc.description.abstract [EN] Most of the models proposed in the literature for abstractive summarization are suitable for English but not for other languages. Multilingual models were introduced to address that language constraint, but although they are more broadly applicable than monolingual models, their performance is typically lower, especially for minority languages like Catalan. In this paper, we present a monolingual model for abstractive summarization of textual content in the Catalan language. The model is a Transformer encoder-decoder that is pretrained and fine-tuned specifically for Catalan using a corpus of newspaper articles. In the pretraining phase, we introduced several self-supervised tasks to specialize the model for the summarization task and to increase the abstractivity of the generated summaries. To study the performance of our proposal in languages with more resources than Catalan, we replicate the model and the experimentation for the Spanish language. The usual evaluation metrics, not only the widely used ROUGE measure but also more semantic ones such as BertScore, do not allow the abstractivity of the generated summaries to be evaluated correctly. In this work, we also present a new metric, called content reordering, to evaluate one of the most common characteristics of abstractive summaries: the rearrangement of the original content. We carried out exhaustive experiments to compare the performance of the monolingual models proposed in this work with two of the most widely used multilingual models in text summarization, mBART and mT5. The experimental results support the quality of our monolingual models, especially considering that the multilingual models were pretrained with many more resources than those used for our models. Likewise, the results show that the pretraining tasks helped to increase the degree of abstractivity of the generated summaries. To our knowledge, this is the first work that explores a monolingual approach for abstractive summarization in both Catalan and Spanish. es_ES
dc.description.sponsorship This work was partially supported by the Spanish Ministerio de Ciencia, Innovacion y Universidades and FEDER funds under the project AMIC (TIN2017-85854-C4-2-R), and by the Agencia Valenciana de la Innovacio (AVI) of the Generalitat Valenciana under the GUAITA (INNVA1/2020/61) project. es_ES
dc.language English es_ES
dc.publisher MDPI AG es_ES
dc.relation.ispartof Applied Sciences es_ES
dc.rights Attribution (by) es_ES
dc.subject Abstractive summarization es_ES
dc.subject Monolingual models es_ES
dc.subject Multilingual models es_ES
dc.subject Transformer models es_ES
dc.subject Transfer learning es_ES
dc.subject.classification Computer Languages and Systems es_ES
dc.title NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish es_ES
dc.type Article es_ES
dc.identifier.doi 10.3390/app11219872 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-85854-C4-2-R/ES/AMIC-UPV: ANALISIS AFECTIVO DE INFORMACION MULTIMEDIA CON COMUNICACION INCLUSIVA Y NATURAL/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AGENCIA VALENCIANA DE LA INNOVACION//INNVA1%2F2020%2F61//GUAITA: MONITORIZACION Y ANALISIS DE REDES SOCIALES PARA LA AYUDA A LA TOMA DE DECISIONES/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Ahuir-Esteve, V.; Hurtado Oliver, LF.; González-Barba, JÁ.; Segarra Soriano, E. (2021). NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish. Applied Sciences. 11(21):1-16. https://doi.org/10.3390/app11219872 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.3390/app11219872 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 16 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 11 es_ES
dc.description.issue 21 es_ES
dc.identifier.eissn 2076-3417 es_ES
dc.relation.pasarela S\453465 es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder AGENCIA VALENCIANA DE LA INNOVACION es_ES
upv.costeAPC 2000 es_ES

