MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge

Jorge-Cano, Javier; Giménez Pastor, Adrián; Baquero-Arnal, Pau; Iranzo-Sánchez, Javier; Pérez-González de Martos, Alejandro Manuel; Garcés Díaz-Munío, Gonçal; Silvestre Cerdà, Joan Albert; Civera Saiz, Jorge; Sanchis Navarro, José Alberto; Juan, Alfons

doi:10.21437/IberSPEECH.2021-25

Files in this item

Name: Jorge-CanoGimenez ...

Size: 198.7 KB

Format: PDF

Description: Publisher's version

dc.contributor.author	Jorge-Cano, Javier	es_ES
dc.contributor.author	Giménez Pastor, Adrián	es_ES
dc.contributor.author	Baquero-Arnal, Pau	es_ES
dc.contributor.author	Iranzo-Sánchez, Javier	es_ES
dc.contributor.author	Pérez-González de Martos, Alejandro Manuel	es_ES
dc.contributor.author	Garcés Díaz-Munío, Gonçal	es_ES
dc.contributor.author	Silvestre Cerdà, Joan Albert	es_ES
dc.contributor.author	Civera Saiz, Jorge	es_ES
dc.contributor.author	Sanchis Navarro, José Alberto	es_ES
dc.contributor.author	Juan, Alfons	es_ES
dc.date.accessioned	2023-03-08T06:48:22Z
dc.date.available	2023-03-08T06:48:22Z
dc.date.issued	2021-03-25	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/192413
dc.description.abstract	[EN] This paper describes the automatic speech recognition (ASR) systems built by the MLLP-VRAIN research group of the Universitat Politècnica de València for the Albayzin-RTVE 2020 Speech-to-Text Challenge. The primary system (p-streaming_1500ms_nlt) was a hybrid BLSTM-HMM ASR system using streaming one-pass decoding with a context window of 1.5 seconds and a linear combination of an n-gram, an LSTM, and a Transformer language model (LM). The acoustic model was trained on nearly 4,000 hours of speech data from different sources using the MLLP's transLectures-UPV toolkit (TLK) and TensorFlow, whilst the LMs were trained with SRILM (n-gram), CUED-RNNLM (LSTM) and Fairseq (Transformer) on up to 102G tokens. This system achieved 11.6% and 16.0% WER on the test-2018 and test-2020 sets, respectively. Because it is streaming-enabled, it could be deployed in production environments for automatic captioning of live media streams, with a theoretical delay of 1.5 seconds. Along with the primary system, we also submitted three contrastive systems. Of these, we highlight c2-streaming_600ms_t, which follows the same configuration as the primary system but uses a smaller context window of 0.6 seconds and a Transformer LM; it scored 12.3% and 16.9% WER on the same test sets, respectively, with a measured empirical latency of 0.81 ± 0.09 seconds (mean ± standard deviation). That is, we obtained state-of-the-art latencies for high-quality automatic live captioning at the cost of a small relative WER degradation of about 6%.	es_ES
dc.description.sponsorship	The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement no. 761758 (X5Gon); the Government of Spain's research project Multisub (ref. RTI2018-094879-B-I00, MCIU/AEI/FEDER,EU) and FPU scholarships FPU14/03981 and FPU18/04135; and the Generalitat Valenciana's research project Classroom Activity Recognition (ref. PROMETEO/2019/111) and predoctoral research scholarship ACIF/2017/055.	es_ES
dc.language	English	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Natural language processing	es_ES
dc.subject	Automatic speech recognition	es_ES
dc.subject	Streaming	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge	es_ES
dc.type	Conference paper	es_ES
dc.identifier.doi	10.21437/IberSPEECH.2021-25	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094879-B-I00/ES/SUBTITULACION MULTILINGUE DE CLASES DE AULA Y SESIONES PLENARIAS/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F111/ES/CLASSROOM ACTIVITY RECOGNITION/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/761758/EU/X5gon: Cross Modal, Cross Cultural, Cross Lingual, Cross Domain, and Cross Site Global OER Network/X5gon	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//ACIF%2F2017%2F055/ES/Subvenciones para la contratación de personal investigador de carácter predoctoral	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MECD/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016 en I+D+i/FPU14%2F03981/ES/Ayudas para la formación de profesorado universitario de los subprogramas de Formación y Movilidad	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MIU//FPU18%2F04135/ES/NOVEL CONTRIBUTIONS TO NEURAL SPEECH TRANSLATION/	es_ES
dc.rights.accessRights	Open access	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escuela Politécnica Superior de Alcoy - Escola Politècnica Superior d'Alcoi	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.description.bibliographicCitation	Jorge-Cano, J.; Giménez Pastor, A.; Baquero-Arnal, P.; Iranzo-Sánchez, J.; Pérez-González De Martos, AM.; Garcés Díaz-Munío, G.; Silvestre Cerdà, JA.... (2021). MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge. 118-122. https://doi.org/10.21437/IberSPEECH.2021-25	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.conferencename	XI Jornadas en Tecnologías del Habla and VII Iberian SLTech Workshop (iberSPEECH 2020)	es_ES
dc.relation.conferencedate	March 24-25, 2021	es_ES
dc.relation.conferenceplace	Online	es_ES
dc.relation.publisherversion	https://doi.org/10.21437/IberSPEECH.2021-25	es_ES
dc.description.upvformatpinicio	118	es_ES
dc.description.upvformatpfin	122	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.relation.pasarela	S\432398	es_ES
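The abstract relies on two quantitative ideas: a "linear combination" of an n-gram, an LSTM and a Transformer LM, and a "6% relative" WER degradation of c2-streaming_600ms_t with respect to the primary system. The sketch below illustrates both under stated assumptions. It assumes the linear combination is the usual linear interpolation of per-word probabilities with non-negative weights summing to 1; the interpolation weights and word probabilities used in the example are hypothetical, since this record does not report the authors' values. The helper names `interpolate_lm` and `relative_wer_change` are illustrative only and are not part of TLK, SRILM, CUED-RNNLM or Fairseq. The WER figures are the ones reported in the abstract.

```python
import math


def interpolate_lm(probs, weights):
    """Linearly interpolate P(w | h) from several language models.

    Hypothetical helper: probs[i] is the probability model i assigns to
    the next word; weights must be non-negative and sum to 1 so that the
    interpolated value is still a probability.
    """
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, probs))


def relative_wer_change(baseline_wer, new_wer):
    """Relative change of new_wer with respect to baseline_wer (as a fraction)."""
    return (new_wer - baseline_wer) / baseline_wer


# Hypothetical n-gram, LSTM and Transformer probabilities for one word,
# combined with hypothetical interpolation weights.
p_next = interpolate_lm([0.02, 0.05, 0.04], [0.2, 0.3, 0.5])
print(f"interpolated P(w | h) = {p_next:.3f}")

# WER (%) reported in the abstract for each test set.
primary = {"test-2018": 11.6, "test-2020": 16.0}  # p-streaming_1500ms_nlt
c2 = {"test-2018": 12.3, "test-2020": 16.9}       # c2-streaming_600ms_t

for test_set in primary:
    change = relative_wer_change(primary[test_set], c2[test_set])
    print(f"{test_set}: {change:.1%} relative WER degradation")
```

Run as-is, the loop prints 6.0% for test-2018 and 5.6% for test-2020, so the abstract's "6% relative" corresponds most closely to the test-2018 figure. Interpolating probabilities (rather than log-probabilities) keeps the combined model a proper distribution without renormalization, which is why the weights are checked to sum to 1.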

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

Buscar en RiuNet

Listar

Todo RiuNet

Esta colección

Mi cuenta

Estadísticas

Ayuda RiuNet

Admin. UPV

Compartir/Enviar a

Citas

Estadísticas

MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge

Ficheros en el ítem

Este ítem aparece en la(s) siguiente(s) colección(ones)