Baquero-Arnal, P.; Jorge-Cano, J.; Giménez Pastor, A.; Iranzo-Sánchez, J.; Pérez-González De Martos, AM.; Garcés Díaz-Munío, G.; Silvestre Cerdà, JA.... (2022). MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension. Applied Sciences. 12(2):1-14. https://doi.org/10.3390/app12020804
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/194315
Title:
|
MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension
|
Authors:
|
Baquero-Arnal, Pau
Jorge-Cano, Javier
Giménez Pastor, Adrián
Iranzo-Sánchez, Javier
Pérez-González de Martos, Alejandro Manuel
Garcés Díaz-Munío, Gonçal
Silvestre Cerdà, Joan Albert
Civera Saiz, Jorge
Sanchis Navarro, José Alberto
Juan, Alfons
|
UPV entity:
|
Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica
Universitat Politècnica de València. Escuela Politécnica Superior de Alcoy - Escola Politècnica Superior d'Alcoi
|
Release date:
|
|
Abstract:
|
[EN] This paper describes the automatic speech recognition (ASR) systems built by the MLLP-VRAIN research group of Universitat Politècnica de València for the Albayzín-RTVE 2020 Speech-to-Text Challenge, and includes an extension of the work consisting of building and evaluating equivalent systems under the closed data conditions from the 2018 challenge. The primary system (p-streaming_1500ms_nlt) was a hybrid ASR system using streaming one-pass decoding with a context window of 1.5 s. This system achieved 16.0% WER on the test-2020 set. We also submitted three contrastive systems. Of these, we highlight the system c2-streaming_600ms_t which, using a configuration similar to the primary system but with a smaller context window of 0.6 s, scored 16.9% WER on the same test set, with a measured empirical latency of 0.81 ± 0.09 s (mean ± stdev). That is, we obtained state-of-the-art latencies for high-quality automatic live captioning with a small relative WER degradation of 6%. As an extension, the equivalent closed-condition systems obtained 23.3% WER and 23.5% WER, respectively. When evaluated with an unconstrained language model, we obtained 19.9% WER and 20.4% WER; i.e., not far behind the top-performing systems with only 5% of the full acoustic data and with the extra ability of being streaming-capable. Indeed, all of these streaming systems could be put into production environments for automatic captioning of live media streams.
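The 6% relative degradation quoted in the abstract can be checked from the two WER figures it reports on test-2020. The following is only a minimal sketch of that arithmetic; the function and variable names are illustrative assumptions and do not come from the paper.

```python
def relative_wer_degradation(baseline_wer: float, contrast_wer: float) -> float:
    """Relative WER increase of the contrast system over the baseline, in percent."""
    return 100.0 * (contrast_wer - baseline_wer) / baseline_wer


# WER figures on test-2020 as reported in the abstract.
primary_wer = 16.0      # p-streaming_1500ms_nlt (1.5 s context window)
contrastive_wer = 16.9  # c2-streaming_600ms_t (0.6 s context window)

degradation = relative_wer_degradation(primary_wer, contrastive_wer)
# Roughly 5.6%, which the abstract rounds to 6% relative.
print(f"{degradation:.1f}% relative WER degradation")
```

The absolute gap is 0.9 WER points; dividing by the primary system's 16.0% gives the relative figure.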
|
Keywords:
|
Natural language processing, Automatic speech recognition, Streaming
|
Usage rights:
|
Attribution (by)
|
Source:
|
Applied Sciences. (eISSN: 2076-3417)
|
DOI:
|
10.3390/app12020804
|
Publisher:
|
MDPI AG
|
Publisher's version:
|
https://doi.org/10.3390/app12020804
|
APC cost:
|
1800 €
|
Project code:
|
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094879-B-I00/ES/SUBTITULACION MULTILINGUE DE CLASES DE AULA Y SESIONES PLENARIAS/
info:eu-repo/grantAgreement/EC/Erasmus+/2020-1-SI01-KA226-SCH-093604/EU/Educational eXplanations and Practices in Emergency Remote Teaching
info:eu-repo/grantAgreement/EC/H2020/761758/EU/X5gon: Cross Modal, Cross Cultural, Cross Lingual, Cross Domain, and Cross Site Global OER Network/X5gon
info:eu-repo/grantAgreement/MECYD//AP2014%2F03981//AYUDA CONTRATO FPU 2014-JORGE CANO/
info:eu-repo/grantAgreement/EC/H2020/952215/EU/Foundations of Trustworthy AI - Integrating Reasoning, Learning and Optimization/TAILOR
info:eu-repo/grantAgreement/MIU//FPU18%2F04135/ES/NOVEL CONTRIBUTIONS TO NEURAL SPEECH TRANSLATION/
info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F111//CLASSROOM ACTIVITY RECOGNITION/
info:eu-repo/grantAgreement/GVA//ACIF%2F2017%2F055/ES/Subvenciones para la contratación de personal investigador de carácter predoctoral
info:eu-repo/grantAgreement/UPV/Programas de Apoyo a la I+D+i/PAID-01-17/ES/Ayudas para Contratos de Acceso de personal investigador doctor en estructuras de investigación de la Universitat Politècnica de València 2017- Subprograma 1/
|
Acknowledgements:
|
The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements no. 761758 (X5Gon) and 952215 (TAILOR), and Erasmus+ Education programme under grant agreement no. 20-226-093604-SCH (EXPERT); the Government of Spain's grant RTI2018-094879-B-I00 (Multisub) funded by MCIN/AEI/10.13039/501100011033 & "ERDF A way of making Europe", and FPU scholarships FPU14/03981 and FPU18/04135; the Generalitat Valenciana's research project Classroom Activity Recognition (ref. PROMETEO/2019/111), and predoctoral research scholarship ACIF/2017/055; and the Universitat Politècnica de València's PAID-01-17 R&D support programme.
|
Type:
|
Article
|