Show simple item record
dc.contributor.author | Gimeno-Gómez, David | es_ES |
dc.contributor.author | Martínez-Hinarejos, Carlos-D. | es_ES |
dc.date.accessioned | 2024-05-23T18:05:55Z | |
dc.date.available | 2024-05-23T18:05:55Z | |
dc.date.issued | 2023-05-26 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/204394 | |
dc.description.abstract | [EN] Visual speech recognition (VSR) is a challenging task that aims to interpret speech based solely on lip movements. However, although remarkable results have recently been reached in the field, this task remains an open research problem due to different challenges, such as visual ambiguities, the intra-personal variability among speakers, and the complex modeling of silence. Nonetheless, these challenges can be alleviated when the task is approached from a speaker-dependent perspective. Our work focuses on the adaptation of end-to-end VSR systems to a specific speaker. Hence, we propose two different adaptation methods based on the conventional fine-tuning technique or the so-called Adapters. We conduct a comparative study in terms of performance while considering different deployment aspects such as training time and storage cost. Results on the Spanish LIP-RTVE database show that both methods are able to obtain recognition rates comparable to the state of the art, even when only a limited amount of training data is available. Although it incurs a deterioration in performance, the Adapters-based method presents a more scalable and efficient solution, significantly reducing the training time and storage cost by up to 80%. | es_ES |
dc.description.sponsorship | This work was partially supported by Grant CIACIF/2021/295, funded by Generalitat Valenciana, and by Grant PID2021-124719OB-I00 under the LLEER (PID2021-124719OB-I00) project, funded by MCIN/AEI/10.13039/501100011033/ and by ERDF EU, "A way of making Europe". | es_ES |
dc.language | English | es_ES |
dc.publisher | MDPI AG | es_ES |
dc.relation.ispartof | Applied Sciences | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | Visual speech recognition | es_ES |
dc.subject | Speaker adaptation | es_ES |
dc.subject | Fine-tuning | es_ES |
dc.subject | Adapters | es_ES |
dc.subject | Spanish language | es_ES |
dc.subject | End-to-end architectures | es_ES |
dc.subject.classification | LENGUAJES Y SISTEMAS INFORMATICOS | es_ES |
dc.title | Comparing Speaker Adaptation Methods for Visual Speech Recognition for Continuous Spanish | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.3390/app13116521 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-124719OB-I00/ES/LECTURA DE LABIOS EN ESPAÑOL EN ESCENARIOS REALISTAS/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//CIACIF%2F2021%2F295//Contributions to Automatic Lipreading for Spanish/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/FEDER//C22%2FERDF/ | es_ES |
dc.rights.accessRights | Open access | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | Gimeno-Gómez, D.; Martínez-Hinarejos, C. (2023). Comparing Speaker Adaptation Methods for Visual Speech Recognition for Continuous Spanish. Applied Sciences. 13(11). https://doi.org/10.3390/app13116521 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.3390/app13116521 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 13 | es_ES |
dc.description.issue | 11 | es_ES |
dc.identifier.eissn | 2076-3417 | es_ES |
dc.relation.pasarela | S\494441 | es_ES |
dc.contributor.funder | GENERALITAT VALENCIANA | es_ES |
dc.contributor.funder | AGENCIA ESTATAL DE INVESTIGACION | es_ES |
dc.contributor.funder | European Regional Development Fund | es_ES |