Del Agua Teba, MA.; Giménez Pastor, A.; Sanchis Navarro, JA.; Civera Saiz, J.; Juan, A. (2018). Speaker-Adapted Confidence Measures for ASR using Deep Bidirectional Recurrent Neural Networks. IEEE/ACM Transactions on Audio Speech and Language Processing. 26(7):1198-1206. https://doi.org/10.1109/TASLP.2018.2819900
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/121369
Title:
|
Speaker-Adapted Confidence Measures for ASR using Deep Bidirectional Recurrent Neural Networks
|
Author:
|
Del Agua Teba, Miguel Angel
Giménez Pastor, Adrián
Sanchis Navarro, José Alberto
Civera Saiz, Jorge
Juan, Alfons
|
UPV entity:
|
Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
|
Release date:
|
|
Abstract:
|
[EN] In recent years, Deep Bidirectional Recurrent Neural Networks (DBRNN) and DBRNN with Long Short-Term Memory cells (DBLSTM) have outperformed the most accurate classifiers for confidence estimation in automatic speech recognition. At the same time, we have recently shown that speaker adaptation of confidence measures using DBLSTM yields significant improvements over non-adapted confidence measures. In accordance with these two recent contributions to the state of the art in confidence estimation, this paper presents a comprehensive study of speaker-adapted confidence measures using DBRNN and DBLSTM models. Firstly, we present new empirical evidence of the superiority of RNN-based confidence classifiers evaluated over a large speech corpus consisting of the English LibriSpeech and the Spanish poliMedia tasks. Secondly, we show new results on speaker-adapted confidence measures considering a multi-task framework in which RNN-based confidence classifiers trained with LibriSpeech are adapted to speakers of the TED-LIUM corpus. These experiments confirm that speaker-adapted confidence measures outperform their non-adapted counterparts. Lastly, we describe an unsupervised adaptation method of the acoustic DBLSTM model based on confidence measures, which results in better automatic speech recognition performance.
|
Keywords:
|
Automatic speech recognition, Confidence estimation, Confidence measures, Deep bidirectional recurrent neural networks, Long short-term memory, Speaker adaptation, Speech, Adaptation models, Computer architecture, Training, Recurrent neural networks, Speech processing, Task analysis
|
Usage rights:
|
All rights reserved
|
Source:
|
IEEE/ACM Transactions on Audio Speech and Language Processing (ISSN: 2329-9290)
|
DOI:
|
10.1109/TASLP.2018.2819900
|
Publisher:
|
Institute of Electrical and Electronics Engineers
|
Publisher's version:
|
http://doi.org/10.1109/TASLP.2018.2819900
|
Project code:
|
info:eu-repo/grantAgreement/MINECO//TIN2015-68326-R/ES/RECURSOS MULTILINGUES ABIERTOS PARA EDUCACION/
info:eu-repo/grantAgreement/EC/FP7/287755/EU/Transcription and Translation of Video Lectures/
info:eu-repo/grantAgreement/EC/H2020/761758/EU/X5gon: Cross Modal, Cross Cultural, Cross Lingual, Cross Domain, and Cross Site Global OER Network/
|
Description:
|
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
|
Acknowledgements:
|
This work was supported in part by the European Union's Horizon 2020 research and innovation programme under Grant 761758 (X5gon), in part by the Seventh Framework Programme (FP7/2007-2013) under Grant 287755 (transLectures), in part by the ICT Policy Support Programme (ICT PSP/2007-2013) as part of the Competitiveness and Innovation Framework Programme under Grant 621030 (EMMA), and in part by the Spanish Government's TIN2015-68326-R (MINECO/FEDER) research project MORE.
|
Type:
|
Article
|