
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Jorge-Cano, Javier es_ES
dc.contributor.author Giménez Pastor, Adrián es_ES
dc.contributor.author Silvestre Cerdà, Joan Albert es_ES
dc.contributor.author Civera Saiz, Jorge es_ES
dc.contributor.author Sanchis Navarro, José Alberto es_ES
dc.contributor.author Juan, Alfons es_ES
dc.date.accessioned 2022-05-23T18:04:00Z
dc.date.available 2022-05-23T18:04:00Z
dc.date.issued 2022 es_ES
dc.identifier.issn 2329-9290 es_ES
dc.identifier.uri http://hdl.handle.net/10251/182807
dc.description.abstract [EN] Although Long-Short Term Memory (LSTM) networks and deep Transformers are now extensively used in offline ASR, it is unclear how best offline systems can be adapted to work with them under the streaming setup. After gaining considerable experience in this regard in recent years, in this paper we show how an optimized, low-latency streaming decoder can be built in which bidirectional LSTM acoustic models, together with general interpolated language models, can be nicely integrated with minimal performance degradation. In brief, our streaming decoder consists of a one-pass, real-time search engine relying on a limited-duration window sliding over time and a number of ad hoc acoustic and language model pruning techniques. Extensive empirical assessment is provided on truly streaming tasks derived from the well-known LibriSpeech and TED talks datasets, as well as from TV shows on a main Spanish broadcasting station. es_ES
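
The abstract describes a one-pass decoder that slides a limited-duration window over the audio stream, scores hypotheses with a bidirectional LSTM acoustic model and an interpolated language model, and prunes the search space. The following is a minimal, purely illustrative Python sketch of that idea under assumed interfaces; the function and parameter names (decode_stream, acoustic_logprob, lms, weights, window_size, beam_size) are hypothetical and do not reflect the authors' implementation or any real toolkit API.

    import math
    from collections import namedtuple

    Hypothesis = namedtuple("Hypothesis", ["tokens", "score"])

    def interpolated_lm_logprob(token, history, lms, weights):
        # Linear interpolation of several language models (e.g. an n-gram LM and
        # a neural LM): p(w|h) = sum_i weight_i * p_i(w|h). Each lm is a callable
        # (token, history) -> probability; the weights should sum to 1.
        p = sum(w * lm(token, history) for lm, w in zip(lms, weights))
        return math.log(max(p, 1e-12))

    def decode_stream(frames, acoustic_logprob, lms, weights,
                      vocab, window_size=30, beam_size=8):
        # One-pass search over an audio stream: frames enter a limited-duration
        # window that slides over time; at each step, hypotheses are extended,
        # scored with acoustic plus interpolated-LM log-probabilities, and pruned
        # to a fixed beam. acoustic_logprob(window, token) stands in for the
        # bidirectional LSTM acoustic model evaluated over the current window.
        beam = [Hypothesis(tokens=(), score=0.0)]
        window = []
        for frame in frames:
            window.append(frame)
            if len(window) > window_size:
                window.pop(0)                  # drop frames that left the window
            extended = []
            for hyp in beam:
                for token in vocab:
                    score = (hyp.score
                             + acoustic_logprob(window, token)
                             + interpolated_lm_logprob(token, hyp.tokens, lms, weights))
                    extended.append(Hypothesis(hyp.tokens + (token,), score))
            # beam pruning: keep only the best-scoring hypotheses
            beam = sorted(extended, key=lambda h: h.score, reverse=True)[:beam_size]
        return max(beam, key=lambda h: h.score)

In this sketch, the fixed-size window bounds latency (decoding never waits for the full utterance) and the beam bounds per-frame cost, which are the two properties the abstract attributes to the streaming decoder.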
dc.description.sponsorship This work was supported in part by the European Union's Horizon 2020 Research and Innovation Programme under Grants 761758 (X5gon) and 952215 (TAILOR) and by the Erasmus+ Education Program under Grant Agreement 20-226-093604-SCH, in part by MCIN/AEI/10.13039/501100011033 and ERDF A way of making Europe under Grant RTI2018-094879-B-I00, and in part by Generalitat Valenciana's research project Classroom Activity Recognition under Grant PROMETEO/2019/111. Funding for open access charge: CRUE-Universitat Politecnica de Valencia. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lei Xie. es_ES
dc.language English es_ES
dc.publisher Institute of Electrical and Electronics Engineers es_ES
dc.relation.ispartof IEEE/ACM Transactions on Audio, Speech, and Language Processing es_ES
dc.rights Attribution (by) es_ES
dc.subject Automatic speech recognition es_ES
dc.subject Streaming es_ES
dc.subject Decoding es_ES
dc.subject Acoustic modeling es_ES
dc.subject Language modeling es_ES
dc.subject Neural networks es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.subject.classification LIBRARY SCIENCE AND DOCUMENTATION es_ES
dc.title Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models es_ES
dc.type Article es_ES
dc.identifier.doi 10.1109/TASLP.2021.3133216 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094879-B-I00/ES/SUBTITULACION MULTILINGUE DE CLASES DE AULA Y SESIONES PLENARIAS/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//PROMETEO%2F2019%2F111//CLASSROOM ACTIVITY RECOGNITION/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/761758/EU es_ES
dc.relation.projectID info:eu-repo/grantAgreement/COMISION DE LAS COMUNIDADES EUROPEA//2020-1-SI01-KA226-SCH-093604//EDUCATIONAL EXPLANATIONS AND PRACTICES IN EMERGENCY REMOTE TEACHING/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/952215/EU es_ES
dc.rights.accessRights Open access es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Jorge-Cano, J.; Giménez Pastor, A.; Silvestre Cerdà, JA.; Civera Saiz, J.; Sanchis Navarro, JA.; Juan, A. (2022). Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 30:148-161. https://doi.org/10.1109/TASLP.2021.3133216 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1109/TASLP.2021.3133216 es_ES
dc.description.upvformatpinicio 148 es_ES
dc.description.upvformatpfin 161 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 30 es_ES
dc.relation.pasarela S\454972 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Universitat Politècnica de València es_ES

