Baquero-Arnal, P.; Jorge-Cano, J.; Giménez Pastor, A.; Silvestre Cerdà, J. A.; Iranzo-Sánchez, J.; Sanchis Navarro, J. A.; Civera Saiz, J.; Juan, A. (2020). Improved Hybrid Streaming ASR with Transformer Language Models. Proc. Interspeech 2020, 2127-2131. https://doi.org/10.21437/Interspeech.2020
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/213952
Title:
Improved Hybrid Streaming ASR with Transformer Language Models
Authors:
Baquero-Arnal, Pau
Jorge-Cano, Javier
Giménez Pastor, Adrián
Silvestre Cerdà, Joan Albert
Iranzo-Sánchez, Javier
Sanchis Navarro, José Alberto
Civera Saiz, Jorge
Juan, Alfons
UPV entity:
Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica
Universitat Politècnica de València. Escuela Politécnica Superior de Alcoy - Escola Politècnica Superior d'Alcoi
Abstract:
[EN] Streaming ASR is gaining momentum due to its wide applicability, though it is still unclear how best to come close to the accuracy of state-of-the-art off-line ASR systems when the output must come within a short delay after the incoming audio stream. Following our previous work on streaming one-pass decoding with hybrid ASR systems and LSTM language models, in this work we report further improvements by replacing LSTMs with Transformer models. First, two key ideas are discussed so as to run these models fast during inference. Then, empirical results on LibriSpeech and TED-LIUM are provided showing that Transformer language models lead to improved recognition rates on both tasks. ASR systems obtained in this work can be seamlessly transferred to a streaming setup with minimal quality losses. Indeed, to the best of our knowledge, no better results have been reported on these tasks when assessed under a streaming setup.
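The abstract notes that key ideas are needed to run Transformer language models fast enough during inference for one-pass decoding, where LM scores are requested for every hypothesis extension. The details are in the paper itself; a standard device for this kind of incremental scoring, shown in the minimal sketch below, is to cache each layer's key/value states so that extending a hypothesis by one token costs a single forward step over that token instead of re-encoding the whole prefix. The sketch uses an off-the-shelf GPT-2 model from Hugging Face transformers purely for illustration; the model choice and the `extend` helper are assumptions, not the authors' system.

```python
# Minimal sketch of incremental Transformer LM scoring with key/value caching.
# NOTE: illustration only, not the implementation described in the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def extend(cache, next_dist, token_id):
    """Score `token_id` under the previous step's distribution, then advance
    the model one position, reusing the cached key/value states."""
    score = next_dist[token_id].item()          # log P(token_id | prefix)
    out = model(input_ids=torch.tensor([[token_id]]),
                past_key_values=cache, use_cache=True)
    new_dist = torch.log_softmax(out.logits[0, -1], dim=-1)
    return score, out.past_key_values, new_dist

# Prime the cache and the first next-token distribution with a BOS token.
with torch.no_grad():
    out = model(input_ids=torch.tensor([[tokenizer.bos_token_id]]), use_cache=True)
dist, cache = torch.log_softmax(out.logits[0, -1], dim=-1), out.past_key_values

# Score a hypothesis token by token; each extension is one forward step.
total = 0.0
for tid in tokenizer.encode(" the meeting will start shortly"):
    s, cache, dist = extend(cache, dist, tid)
    total += s
print(f"hypothesis log-probability: {total:.2f}")
```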
Keywords:
Streaming, Hybrid ASR, Language models, Transformer
Usage rights:
All rights reserved
DOI:
10.21437/Interspeech.2020
Publisher's version:
https://doi.org/10.21437/Interspeech.2020
Conference title:
21st Annual Conference of the International Speech Communication Association (INTERSPEECH 2020)
Conference venue:
Online
Conference date:
October 25-29, 2020
Project codes:
info:eu-repo/grantAgreement/EC/H2020/761758/EU/X5gon: Cross Modal, Cross Cultural, Cross Lingual, Cross Domain, and Cross Site Global OER Network/
info:eu-repo/grantAgreement/GVA//ACIF%2F2017%2F055/ES/Subvenciones para la contratación de personal investigador de carácter predoctoral
info:eu-repo/grantAgreement/AEI//RTI2018-094879-B-I00-AR/ES/SUBTITULACIÓN MULTILINGÜE DE CLASES DE AULA Y SESIONES PLENARIAS/
Acknowledgements:
The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement no. 761758 (X5Gon); the Government of Spain's research project Multisub, ref. RTI2018-094879-B-I00 (MCIU/AEI/FEDER, EU); and the Generalitat Valenciana predoctoral research scholarship ACIF/2017/055.
Type:
Conference paper
Article