
Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Jorge-Cano, Javier es_ES
dc.contributor.author Giménez Pastor, Adrián es_ES
dc.contributor.author Silvestre Cerdà, Joan Albert es_ES
dc.contributor.author Civera Saiz, Jorge es_ES
dc.contributor.author Sanchis Navarro, José Alberto es_ES
dc.contributor.author Juan, Alfons es_ES
dc.date.accessioned 2022-05-23T18:04:00Z
dc.date.available 2022-05-23T18:04:00Z
dc.date.issued 2022 es_ES
dc.identifier.issn 2329-9290 es_ES
dc.identifier.uri http://hdl.handle.net/10251/182807
dc.description.abstract [EN] Although Long-Short Term Memory (LSTM) networks and deep Transformers are now extensively used in offline ASR, it is unclear how best offline systems can be adapted to work with them under the streaming setup. After gaining considerable experience in this regard in recent years, in this paper we show how an optimized, low-latency streaming decoder can be built in which bidirectional LSTM acoustic models, together with general interpolated language models, can be nicely integrated with minimal performance degradation. In brief, our streaming decoder consists of a one-pass, real-time search engine relying on a limited-duration window sliding over time and a number of ad hoc acoustic and language model pruning techniques. Extensive empirical assessment is provided on truly streaming tasks derived from the well-known LibriSpeech and TED talks datasets, as well as from TV shows on a main Spanish broadcasting station. es_ES
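
The abstract describes a one-pass decoder that slides a limited-duration window over the audio stream, scores hypotheses with a bidirectional LSTM acoustic model and an interpolated language model, and prunes the search space. The following is a minimal, purely illustrative Python sketch of that idea under assumed interfaces; the function and parameter names (decode_stream, acoustic_logprob, lms, weights, window_size, beam_size) are hypothetical and do not reflect the authors' implementation or any real toolkit API.

    import math
    from collections import namedtuple

    Hypothesis = namedtuple("Hypothesis", ["tokens", "score"])

    def interpolated_lm_logprob(token, history, lms, weights):
        # Linear interpolation of several language models (e.g. an n-gram LM and
        # a neural LM): p(w|h) = sum_i weight_i * p_i(w|h). Each lm is a callable
        # (token, history) -> probability; the weights should sum to 1.
        p = sum(w * lm(token, history) for lm, w in zip(lms, weights))
        return math.log(max(p, 1e-12))

    def decode_stream(frames, acoustic_logprob, lms, weights,
                      vocab, window_size=30, beam_size=8):
        # One-pass search over an audio stream: frames enter a limited-duration
        # window that slides over time; at each step, hypotheses are extended,
        # scored with acoustic plus interpolated-LM log-probabilities, and pruned
        # to a fixed beam. acoustic_logprob(window, token) stands in for the
        # bidirectional LSTM acoustic model evaluated over the current window.
        beam = [Hypothesis(tokens=(), score=0.0)]
        window = []
        for frame in frames:
            window.append(frame)
            if len(window) > window_size:
                window.pop(0)                  # drop frames that left the window
            extended = []
            for hyp in beam:
                for token in vocab:
                    score = (hyp.score
                             + acoustic_logprob(window, token)
                             + interpolated_lm_logprob(token, hyp.tokens, lms, weights))
                    extended.append(Hypothesis(hyp.tokens + (token,), score))
            # beam pruning: keep only the best-scoring hypotheses
            beam = sorted(extended, key=lambda h: h.score, reverse=True)[:beam_size]
        return max(beam, key=lambda h: h.score)

In this sketch, the fixed-size window bounds latency (decoding never waits for the full utterance) and the beam bounds per-frame cost, which are the two properties the abstract attributes to the streaming decoder.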
dc.description.sponsorship This work was supported in part by the European Union's Horizon 2020 Research and Innovation Programme under Grants 761758 (X5gon) and 952215 (TAILOR) and by the Erasmus+ Education Program under Grant Agreement 20-226-093604-SCH, in part by MCIN/AEI/10.13039/501100011033 and ERDF A way of making Europe under Grant RTI2018-094879-B-I00, and in part by Generalitat Valenciana's research project Classroom Activity Recognition under Grant PROMETEO/2019/111. Funding for open access charge: CRUE-Universitat Politecnica de Valencia. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Lei Xie. es_ES
dc.language English es_ES
dc.publisher Institute of Electrical and Electronics Engineers es_ES
dc.relation.ispartof IEEE/ACM Transactions on Audio, Speech, and Language Processing es_ES
dc.rights Attribution (by) es_ES
dc.subject Automatic speech recognition es_ES
dc.subject Streaming es_ES
dc.subject Decoding es_ES
dc.subject Acoustic modeling es_ES
dc.subject Language modeling es_ES
dc.subject Neural networks es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.subject.classification LIBRARY SCIENCE AND DOCUMENTATION es_ES
dc.title Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models es_ES
dc.type Article es_ES
dc.identifier.doi 10.1109/TASLP.2021.3133216 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094879-B-I00/ES/SUBTITULACION MULTILINGUE DE CLASES DE AULA Y SESIONES PLENARIAS/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//PROMETEO%2F2019%2F111//CLASSROOM ACTIVITY RECOGNITION/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/761758/EU es_ES
dc.relation.projectID info:eu-repo/grantAgreement/COMISION DE LAS COMUNIDADES EUROPEA//2020-1-SI01-KA226-SCH-093604//EDUCATIONAL EXPLANATIONS AND PRACTICES IN EMERGENCY REMOTE TEACHING/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/952215/EU es_ES
dc.rights.accessRights Open access es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Jorge-Cano, J.; Giménez Pastor, A.; Silvestre Cerdà, JA.; Civera Saiz, J.; Sanchis Navarro, JA.; Juan, A. (2022). Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 30:148-161. https://doi.org/10.1109/TASLP.2021.3133216 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1109/TASLP.2021.3133216 es_ES
dc.description.upvformatpinicio 148 es_ES
dc.description.upvformatpfin 161 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 30 es_ES
dc.relation.pasarela S\454972 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Universitat Politècnica de València es_ES

