Abstract:
|
This thesis presents different contributions in the fields of fully-automatic statistical machine
translation and interactive statistical machine translation.
Statistical machine translation poses three problems that must be addressed,
namely, the modelling problem, the training problem and the search problem. This
thesis presents contributions to each of these three problems.
Regarding the modelling problem, an alternative derivation of phrase-based statistical
translation models is proposed. This derivation introduces a set of statistical submodels governing
different aspects of the translation process. In addition, the resulting submodels
can be used as components of a log-linear model.
Regarding the training problem, we propose an alternative estimation technique for phrase-based
models that tries to reduce the strong heuristic component of the standard estimation
technique. The proposed estimation technique considers the phrase pairs that compose
the phrase model as part of complete bisegmentations of the source and target sentences.
We demonstrate, both theoretically and empirically, that the proposed estimation technique can be
executed efficiently. Experimental results obtained with the open-source THOT toolkit, which is also
presented in this thesis, show that the alternative estimation technique obtains phrase models
with lower perplexity than those obtained by means of the standard estimation technique.
However, the reduction in the perplexity of the model did not lead to improvements
in translation quality.
To deal with the search problem, we propose a search algorithm based on the
branch-and-bound search paradigm. The proposed algorithm generalises different search
strategies that can be accessed by modifying the input parameters. We carried out experiments
to evaluate the performance of the proposed search algorithm.
|