Martí Guerola, A.; Cobos Serrano, M.; López Monfort, JJ. (2012). Automatic speech recognition in cocktail-party situations : a specific training for separated speech. Journal of the Acoustical Society of America. 131(2):1529-1535. doi:10.1121/1.3675001
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/56829
Title:
|
Automatic speech recognition in cocktail-party situations : a specific training for separated speech
|
Author:
|
Martí Guerola, Amparo
Cobos Serrano, Máximo
López Monfort, José Javier
|
UPV Unit:
|
Universitat Politècnica de València. Departamento de Comunicaciones - Departament de Comunicacions
Universitat Politècnica de València. Instituto Universitario de Telecomunicación y Aplicaciones Multimedia - Institut Universitari de Telecomunicacions i Aplicacions Multimèdia
|
Issued date:
|
2012
|
Abstract:
|
Automatic speech recognition (ASR) is the task of automatically extracting a transcription of the linguistic content of an acoustic speech signal. Despite several decades of research in this important area of acoustic signal processing, the accuracy of ASR systems is still far behind human performance, especially in adverse acoustic scenarios. In this context, one of the most challenging situations is simultaneous speech in cocktail-party environments. Although source separation methods have already been investigated to deal with this problem, the separation process is not perfect, and the resulting artifacts pose an additional problem for ASR performance. In this paper, a specific training to improve the percentage of recognized words in real simultaneous speech cases is proposed. The combination of source separation and this specific training is explored and evaluated under different acoustical conditions, leading to improvements of up to 35% in ASR performance. © 2012 Acoustical Society of America. [DOI: 10.1121/1.3675001]
|
Subjects:
|
Automatic speech recognition, Human performance, Separation process, Source separation, Speech signals, Acoustics, Physics, Separation, Algorithm, Article, Human, Noise, Perception, Speech, Speech perception, Standard, Algorithms, Humans, Perceptual Masking
|
Copyrights:
|
Closed |
Source:
|
Journal of the Acoustical Society of America (ISSN: 0001-4966) (eISSN: 1520-8524)
|
DOI:
|
10.1121/1.3675001
|
Publisher:
|
Acoustical Society of America
|
Publisher version:
|
http://dx.doi.org/10.1121/1.3675001
|
Project ID:
|
info:eu-repo/grantAgreement/MICINN//TEC2009-14414-C03-01/ES/Procesado De Sonido Para Entornos Emergentes De Comunicacion/ /
|
Thanks:
|
The Spanish Ministry of Science and Innovation supported this work under Grant No. TEC2009-14414-C03-01.
|
Type:
|
Article
|