Continuous lipreading based on acoustic temporal alignments

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Gimeno-Gómez, David es_ES
dc.contributor.author Martínez-Hinarejos, Carlos-D. es_ES
dc.date.accessioned 2024-11-14T19:13:46Z
dc.date.available 2024-11-14T19:13:46Z
dc.date.issued 2024-05-06 es_ES
dc.identifier.issn 1687-4722 es_ES
dc.identifier.uri http://hdl.handle.net/10251/211808
dc.description.abstract [EN] Visual speech recognition (VSR) is a challenging task that has received increasing interest during the last few decades. The current state of the art employs powerful end-to-end architectures based on deep learning, which depend on large amounts of data and high computational resources for their estimation. We address the task of VSR for data scarcity scenarios with limited computational resources by using traditional approaches based on hidden Markov models. We present a novel learning strategy that employs information obtained from previous acoustic temporal alignments to improve the visual system performance. Furthermore, we study multiple visual speech representations and how image resolution or frame rate affect their performance. All these experiments were conducted on the limited-data VLRF corpus, a database that offers audio-visual support for continuous speech recognition in Spanish. The results show that our approach significantly outperforms the best results achieved on the task to date. es_ES
dc.description.sponsorship This work was partially supported by Grant CIACIF/2021/295, funded by Generalitat Valenciana, and by Grant PID2021-124719OB-I00 under project LLEER (PID2021-124719OB-I00), funded by MCIN/AEI/10.13039/501100011033/ and by ERDF, EU "A way of making Europe". es_ES
dc.language English es_ES
dc.publisher Springer (Biomed Central Ltd.) es_ES
dc.relation.ispartof EURASIP Journal on Audio, Speech and Music Processing es_ES
dc.rights Attribution (by) es_ES
dc.subject Visual speech recognition es_ES
dc.subject Limited computation es_ES
dc.subject Data scarcity es_ES
dc.subject Speech processing es_ES
dc.subject Computer vision es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title Continuous lipreading based on acoustic temporal alignments es_ES
dc.type Article es_ES
dc.identifier.doi 10.1186/s13636-024-00345-7 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-124719OB-I00/ES/LECTURA DE LABIOS EN ESPAÑOL EN ESCENARIOS REALISTAS/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//CIACIF%2F2021%2F295//Contributions to Automatic Lipreading for Spanish/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Gimeno-Gómez, D.; Martínez-Hinarejos, C. (2024). Continuous lipreading based on acoustic temporal alignments. EURASIP Journal on Audio, Speech and Music Processing. 2024(1). https://doi.org/10.1186/s13636-024-00345-7 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1186/s13636-024-00345-7 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 2024 es_ES
dc.description.issue 1 es_ES
dc.relation.pasarela S\517636 es_ES
dc.contributor.funder GENERALITAT VALENCIANA es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
upv.costeAPC 1900 es_ES

