- -

Multimodal interactive structured prediction

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Compartir/Enviar a

Citas

Estadísticas

  • Estadisticas de Uso

Multimodal interactive structured prediction

Mostrar el registro sencillo del ítem

Ficheros en el ítem

dc.contributor.advisor Casacuberta Nolla, Francisco es_ES
dc.contributor.advisor Sanchis Navarro, José Alberto es_ES
dc.contributor.author Alabau Gonzalvo, Vicente es_ES
dc.date.accessioned 2014-01-27T08:10:48Z
dc.date.available 2014-01-27T08:10:48Z
dc.date.created 2014-01-10T11:00:11Z es_ES
dc.date.issued 2014-01-27T08:10:45Z es_ES
dc.identifier.uri http://hdl.handle.net/10251/35135
dc.description.abstract This thesis presents scientific contributions to the field of multimodal interac- tive structured prediction (MISP). The aim of MISP is to reduce the human effort required to supervise an automatic output, in an efficient and ergonomic way. Hence, this thesis focuses on the two aspects of MISP systems. The first aspect, which refers to the interactive part of MISP, is the study of strate- gies for efficient human¿computer collaboration to produce error-free outputs. Multimodality, the second aspect, deals with other more ergonomic modalities of communication with the computer rather than keyboard and mouse. To begin with, in sequential interaction the user is assumed to supervise the output from left-to-right so that errors are corrected in sequential order. We study the problem under the decision theory framework and define an optimum decoding algorithm. The optimum algorithm is compared to the usually ap- plied, standard approach. Experimental results on several tasks suggests that the optimum algorithm is slightly better than the standard algorithm. In contrast to sequential interaction, in active interaction it is the system that decides what should be given to the user for supervision. On the one hand, user supervision can be reduced if the user is required to supervise only the outputs that the system expects to be erroneous. In this respect, we define a strategy that retrieves first the outputs with highest expected error first. Moreover, we prove that this strategy is optimum under certain conditions, which is validated by experimental results. On the other hand, if the goal is to reduce the number of corrections, active interaction works by selecting elements, one by one, e.g., words of a given output to be supervised by the user. For this case, several strategies are compared. Unlike the previous case, the strategy that performs better is to choose the element with highest confidence, which coincides with the findings of the optimum algorithm for sequential interaction. However, this also suggests that minimizing effort and supervision are contradictory goals. With respect to the multimodality aspect, this thesis delves into techniques to make multimodal systems more robust. To achieve that, multimodal systems are improved by providing contextual information of the application at hand. First, we study how to integrate e-pen interaction in a machine translation task. We contribute to the state-of-the-art by leveraging the information from the source sentence. Several strategies are compared basically grouped into two approaches: inspired by word-based translation models and n-grams generated from a phrase-based system. The experiments show that the former outper- forms the latter for this task. Furthermore, the results present remarkable improvements against not using contextual information. Second, similar ex- periments are conducted on a speech-enabled interface for interactive machine translation. The improvements over the baseline are also noticeable. How- ever, in this case, phrase-based models perform much better than word-based models. We attribute that to the fact that acoustic models are poorer estima- tions than morphologic models and, thus, they benefit more from the language model. Finally, similar techniques are proposed for dictation of handwritten documents. The results show that speech and handwritten recognition can be combined in an effective way. Finally, an evaluation with real users is carried out to compare an interactive machine translation prototype with a post-editing prototype. The results of the study reveal that users are very sensitive to the usability aspects of the user interface. Therefore, usability is a crucial aspect to consider in an human evaluation that can hinder the real benefits of the technology being evaluated. Hopefully, once usability problems are fixed, the evaluation indicates that users are more favorable to work with the interactive machine translation system than to the post-editing system. en_EN
dc.language Inglés es_ES
dc.publisher Universitat Politècnica de València es_ES
dc.rights Reserva de todos los derechos es_ES
dc.source Riunet es_ES
dc.subject Structured prediction es_ES
dc.subject Handwritten text transcription
dc.subject Automatic speech recognition
dc.subject Machine translation
dc.subject Human interaction
dc.subject Interactive pattern recognition
dc.subject Multimodal interaction
dc.subject Interactive machine translation
dc.subject Computer assisted transcription
dc.subject Active interaction
dc.subject Passive interaction
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title Multimodal interactive structured prediction
dc.type Tesis doctoral es_ES
dc.identifier.doi 10.4995/Thesis/10251/35135 es_ES
dc.rights.accessRights Abierto es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Alabau Gonzalvo, V. (2014). Multimodal interactive structured prediction [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/35135 es_ES
dc.description.accrualMethod TESIS es_ES
dc.type.version info:eu-repo/semantics/acceptedVersion es_ES
dc.relation.tesis 8383 es_ES
dc.description.award Premios Extraordinarios de tesis doctorales es_ES


Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro sencillo del ítem