Show simple item record
dc.contributor.author | Gimeno-Gómez, David | es_ES |
dc.contributor.author | Martínez-Hinarejos, Carlos-D. | es_ES |
dc.date.accessioned | 2024-05-23T18:05:55Z | |
dc.date.available | 2024-05-23T18:05:55Z | |
dc.date.issued | 2023-05-26 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/204394 | |
dc.description.abstract | [EN] Visual speech recognition (VSR) is a challenging task that aims to interpret speech based solely on lip movements. However, although remarkable results have recently been reached in the field, this task remains an open research problem due to different challenges, such as visual ambiguities, the intra-personal variability among speakers, and the complex modeling of silence. Nonetheless, these challenges can be alleviated when the task is approached from a speaker-dependent perspective. Our work focuses on the adaptation of end-to-end VSR systems to a specific speaker. Hence, we propose two different adaptation methods based on the conventional fine-tuning technique or the so-called Adapters. We conduct a comparative study in terms of performance while considering different deployment aspects such as training time and storage cost. Results on the Spanish LIP-RTVE database show that both methods are able to obtain recognition rates comparable to the state of the art, even when only a limited amount of training data is available. Although it incurs a deterioration in performance, the Adapters-based method presents a more scalable and efficient solution, significantly reducing the training time and storage cost by up to 80%. | es_ES |
dc.description.sponsorship | This work was partially supported by Grant CIACIF/2021/295, funded by Generalitat Valenciana, and by Grant PID2021-124719OB-I00 under the LLEER (PID2021-124719OB-I00) project, funded by MCIN/AEI/10.13039/501100011033/ and by ERDF EU, "A way of making Europe". | es_ES |
dc.language | English | es_ES |
dc.publisher | MDPI AG | es_ES |
dc.relation.ispartof | Applied Sciences | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | Visual speech recognition | es_ES |
dc.subject | Speaker adaptation | es_ES |
dc.subject | Fine-tuning | es_ES |
dc.subject | Adapters | es_ES |
dc.subject | Spanish language | es_ES |
dc.subject | End-to-end architectures | es_ES |
dc.subject.classification | LENGUAJES Y SISTEMAS INFORMATICOS | es_ES |
dc.title | Comparing Speaker Adaptation Methods for Visual Speech Recognition for Continuous Spanish | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.3390/app13116521 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-124719OB-I00/ES/LECTURA DE LABIOS EN ESPAÑOL EN ESCENARIOS REALISTAS/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//CIACIF%2F2021%2F295//Contributions to Automatic Lipreading for Spanish/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/FEDER//C22%2FERDF/ | es_ES |
dc.rights.accessRights | Open access | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | Gimeno-Gómez, D.; Martínez-Hinarejos, C. (2023). Comparing Speaker Adaptation Methods for Visual Speech Recognition for Continuous Spanish. Applied Sciences. 13(11). https://doi.org/10.3390/app13116521 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.3390/app13116521 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 13 | es_ES |
dc.description.issue | 11 | es_ES |
dc.identifier.eissn | 2076-3417 | es_ES |
dc.relation.pasarela | S\494441 | es_ES |
dc.contributor.funder | GENERALITAT VALENCIANA | es_ES |
dc.contributor.funder | AGENCIA ESTATAL DE INVESTIGACION | es_ES |
dc.contributor.funder | European Regional Development Fund | es_ES |