Show simple item record
dc.contributor.author | Gimeno-Gómez, David | es_ES |
dc.contributor.author | Martínez-Hinarejos, Carlos-D. | es_ES |
dc.date.accessioned | 2024-11-14T19:13:46Z | |
dc.date.available | 2024-11-14T19:13:46Z | |
dc.date.issued | 2024-05-06 | es_ES |
dc.identifier.issn | 1687-4722 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/211808 | |
dc.description.abstract | [EN] Visual speech recognition (VSR) is a challenging task that has received increasing interest during the last few decades. The current state of the art employs powerful end-to-end architectures based on deep learning, which depend on large amounts of data and high computational resources for their estimation. We address the task of VSR in data-scarcity scenarios with limited computational resources by using traditional approaches based on hidden Markov models. We present a novel learning strategy that employs information obtained from previous acoustic temporal alignments to improve the performance of the visual system. Furthermore, we studied multiple visual speech representations and how image resolution or frame rate affects their performance. All these experiments were conducted on the limited-data VLRF corpus, a database that offers audio-visual support for addressing continuous speech recognition in Spanish. The results show that our approach significantly outperforms the best results achieved on the task to date. | es_ES |
dc.description.sponsorship | This work was partially supported by Grant CIACIF/2021/295 funded by Generalitat Valenciana and by Grant PID2021-124719OB-I00 under project LLEER (PID2021-124719OB-I00) funded by MCIN/AEI/10.13039/501100011033/ and by ERDF, EU "A way of making Europe". | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer (BioMed Central Ltd.) | es_ES |
dc.relation.ispartof | EURASIP Journal on Audio, Speech and Music Processing | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | Visual speech recognition | es_ES |
dc.subject | Limited computation | es_ES |
dc.subject | Data scarcity | es_ES |
dc.subject | Speech processing | es_ES |
dc.subject | Computer vision | es_ES |
dc.subject.classification | LENGUAJES Y SISTEMAS INFORMATICOS | es_ES |
dc.title | Continuous lipreading based on acoustic temporal alignments | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1186/s13636-024-00345-7 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-124719OB-I00/ES/LECTURA DE LABIOS EN ESPAÑOL EN ESCENARIOS REALISTAS/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//CIACIF%2F2021%2F295//Contributions to Automatic Lipreading for Spanish/ | es_ES |
dc.rights.accessRights | Open access | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | Gimeno-Gómez, D.; Martínez-Hinarejos, C. (2024). Continuous lipreading based on acoustic temporal alignments. EURASIP Journal on Audio, Speech and Music Processing. 2024(1). https://doi.org/10.1186/s13636-024-00345-7 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1186/s13636-024-00345-7 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 2024 | es_ES |
dc.description.issue | 1 | es_ES |
dc.relation.pasarela | S\517636 | es_ES |
dc.contributor.funder | GENERALITAT VALENCIANA | es_ES |
dc.contributor.funder | AGENCIA ESTATAL DE INVESTIGACION | es_ES |
dc.contributor.funder | Universitat Politècnica de València | es_ES |
upv.costeAPC | 1900 | es_ES |
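
The abstract above describes reusing frame-level alignments from an acoustic model to supervise an HMM-based visual system. The sketch below is only an illustration of that general idea, not the authors' implementation: the function names, the 100 fps audio / 25 fps video frame rates, and the per-phone Gaussian initialization are all illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): bootstrap per-phone Gaussian
# statistics for visual features from an acoustic forced alignment.
# Assumed frame rates: audio alignment at 100 frames/s, video features at 25 frames/s.
import numpy as np

def map_alignment_to_video(audio_labels, audio_fps=100, video_fps=25):
    """Downsample a per-frame phone alignment to the video frame rate."""
    step = audio_fps // video_fps                # e.g. 4 audio frames per video frame
    n_video = len(audio_labels) // step
    return [audio_labels[i * step] for i in range(n_video)]

def init_visual_models(video_feats, video_labels):
    """Estimate a diagonal Gaussian per phone from the aligned visual frames."""
    models = {}
    for phone in set(video_labels):
        idx = [i for i, p in enumerate(video_labels) if p == phone]
        frames = video_feats[idx]
        models[phone] = (frames.mean(axis=0), frames.var(axis=0) + 1e-6)
    return models

# Toy example: 400 audio frames of alignment, 100 video frames of 32-dim features.
audio_alignment = ["sil"] * 80 + ["o"] * 160 + ["l"] * 80 + ["a"] * 80
video_features = np.random.default_rng(0).normal(size=(100, 32))
labels = map_alignment_to_video(audio_alignment)
models = init_visual_models(video_features, labels)
print({phone: mean.shape for phone, (mean, var) in models.items()})
```

In a full system, statistics bootstrapped this way would seed the state distributions of the visual HMMs, which are then re-estimated on the visual data itself.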