Jorge-Cano, J.; Giménez Pastor, A.; Silvestre Cerdà, J. A.; Civera Saiz, J.; Sanchis Navarro, J. A.; Juan, A. (2022). Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 30:148-161. https://doi.org/10.1109/TASLP.2021.3133216
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/182807
Title:
|
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models
|
Author:
|
Jorge-Cano, Javier
Giménez Pastor, Adrián
Silvestre Cerdà, Joan Albert
Civera Saiz, Jorge
Sanchis Navarro, José Alberto
Juan, Alfons
|
UPV Unit:
|
Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
|
Issued date:
|
|
Abstract:
|
[EN] Although Long Short-Term Memory (LSTM) networks and deep Transformers are now extensively used in offline ASR, it is unclear how offline systems can best be adapted to work with them under the streaming setup. After gaining considerable experience in this regard in recent years, in this paper we show how an optimized, low-latency streaming decoder can be built in which bidirectional LSTM acoustic models, together with general interpolated language models, can be nicely integrated with minimal performance degradation. In brief, our streaming decoder consists of a one-pass, real-time search engine relying on a limited-duration window sliding over time and a number of ad hoc acoustic and language model pruning techniques. Extensive empirical assessment is provided on truly streaming tasks derived from the well-known LibriSpeech and TED talks datasets, as well as from TV shows on a main Spanish broadcasting station.
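The limited-duration sliding window mentioned in the abstract can be illustrated with a minimal sketch. The window and stride sizes below are hypothetical, and the bidirectional LSTM scoring and interpolated-LM pruning that the paper actually integrates are omitted; this only shows how a stream is chunked so a bidirectional model sees bounded right context instead of the full utterance.

```python
import numpy as np

def sliding_windows(features, window, stride):
    """Yield (start, chunk) pairs over a stream of acoustic feature frames.

    `window` and `stride` are hypothetical illustration values; the paper
    tunes the actual limited-duration window for latency vs. accuracy.
    """
    last_start = max(len(features) - window, 0)
    for start in range(0, last_start + 1, stride):
        yield start, features[start:start + window]

# Toy stream: 100 frames of 40-dimensional acoustic features.
stream = np.zeros((100, 40), dtype=np.float32)
chunks = list(sliding_windows(stream, window=30, stride=10))
# 8 windows with starts 0, 10, ..., 70; each window spans 30 frames,
# bounding the right context available to a bidirectional acoustic model.
```

In a streaming decoder, each such window would be scored by the acoustic model and only a central portion of its outputs retained, so that every emitted frame has both left and right context within its window.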
|
Subjects:
|
Automatic speech recognition
,
Streaming
,
Decoding
,
Acoustic modeling
,
Language modeling
,
Neural networks
|
Copyrights:
|
Attribution (by)
|
Source:
|
IEEE/ACM Transactions on Audio, Speech, and Language Processing. (issn: 2329-9290)
|
DOI:
|
10.1109/TASLP.2021.3133216
|
Publisher:
|
Institute of Electrical and Electronics Engineers
|
Publisher version:
|
https://doi.org/10.1109/TASLP.2021.3133216
|
APC cost:
|
2500 €
|
Project ID:
|
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094879-B-I00/ES/SUBTITULACION MULTILINGUE DE CLASES DE AULA Y SESIONES PLENARIAS/
info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//PROMETEO%2F2019%2F111//CLASSROOM ACTIVITY RECOGNITION/
info:eu-repo/grantAgreement/EC/H2020/761758/EU
info:eu-repo/grantAgreement/COMISION DE LAS COMUNIDADES EUROPEA//2020-1-SI01-KA226-SCH-093604//EDUCATIONAL EXPLANATIONS AND PRACTICES IN EMERGENCY REMOTE TEACHING/
info:eu-repo/grantAgreement/EC/H2020/952215/EU
|
Thanks:
|
This work was supported in part by the European Union's Horizon 2020 Research and Innovation Programme under Grants 761758 (X5gon) and 952215 (TAILOR), in part by the Erasmus+ Education Program under Grant Agreement 20-226-093604-SCH, in part by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making Europe" under Grant RTI2018-094879-B-I00, and in part by Generalitat Valenciana's research project Classroom Activity Recognition under Grant PROMETEO/2019/111. Funding for open access charge: CRUE-Universitat Politècnica de València. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lei Xie.
|
Type:
|
Article
|