Wednesday, 14 July 2010

Proceedings of the Workshop on Statistical Machine Translation, pages 47–54,
New York City, June 2006. © 2006 Association for Computational Linguistics
Searching for alignments in SMT. A novel approach based on an Estimation
of Distribution Algorithm *
Luis Rodríguez, Ismael García-Varea, José A. Gámez
Departamento de Sistemas Informáticos
Universidad de Castilla-La Mancha
luisr@dsi.uclm.es, ivarea@dsi.uclm.es, jgamez@dsi.uclm.es
Abstract
In statistical machine translation, an alignment
defines a mapping between the
words in the source and in the target sentence.
Alignments are used, on the one
hand, to train the statistical models and, on
the other, during the decoding process to
link the words in the source sentence to the
words in the partial hypotheses generated.
In both cases, the quality of the alignments
is crucial for the success of the translation
process. In this paper, we propose an algorithm
based on an Estimation of Distribution
Algorithm for computing alignments
between two sentences in a parallel
corpus. This algorithm has been tested
on different tasks involving different pairs
of languages. In the experiments
presented here for the two word-alignment
shared tasks proposed at HLT-NAACL
2003 and ACL 2005, the EDA-based
algorithm outperforms the best participating
systems.
1 Introduction
Nowadays, the statistical approach to machine translation
constitutes one of the most promising approaches
in this field. The rationale behind this approach
is to learn a statistical model from a parallel
corpus. A parallel corpus can be defined as a set
of sentence pairs, each pair containing a sentence in
a source language and a translation of this sentence
in a target language. Word alignments are necessary
to link the words in the source and in the target
sentence. Statistical models for machine translation
depend heavily on the concept of alignment,
in particular the well-known IBM word-based models
(Brown et al., 1993). As a result, different
shared tasks on alignments in statistical machine translation
have been proposed in the last few years (HLT-NAACL
2003 (Mihalcea and Pedersen, 2003) and
ACL 2005 (Joel Martin, 2005)).
* This work has been supported by the Spanish Projects
JCCM (PBI-05-022) and HERMES 05/06 (Vic. Inv. UCLM)
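As an illustration that is not taken from the paper, a word alignment can be stored as a vector of positions. The sketch below assumes the usual IBM-style convention in which each word of one sentence (here the source) is linked to exactly one position of the other sentence, with position 0 reserved for an empty NULL word. The sentences and the names `source`, `target`, `alignment` and `links` are hypothetical.

```python
# Illustrative only: the sentence pair and its alignment are made up.
source = ["la", "casa", "verde"]
target = ["NULL", "the", "green", "house"]  # position 0 is the empty word

# alignment[j] is the target position linked to source word j.
alignment = [1, 3, 2]


def links(source, target, alignment):
    """Return the (source word, target word) pairs defined by the alignment."""
    return [(s, target[i]) for s, i in zip(source, alignment)]


pairs = links(source, target, alignment)
for s, t in pairs:
    print(f"{s} -> {t}")
```

Under this encoding, a source sentence of J words and a target sentence of I words admit (I + 1)^J candidate alignments, which is why an exhaustive search quickly becomes impractical.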
In this paper, we propose a novel approach to deal
with alignments. Specifically, we address the problem
of searching for the best word alignment between
a source and a target sentence. As there is
no efficient exact method to compute the optimal
alignment (known as the Viterbi alignment) in most
cases (specifically in IBM Models 3, 4 and 5),
we propose the use of a recently introduced
family of metaheuristic algorithms, Estimation
of Distribution Algorithms (EDAs). Clearly, by
using a heuristic-based method we cannot guarantee
that the optimal alignment is found. Nonetheless,
we expect that the global search carried out
by our algorithm will produce high-quality results
in most cases, as previous experiments with this
technique (Larrañaga and Lozano, 2001) on different
optimization tasks have demonstrated. In addition,
the results presented in section 5 support the
approach presented here.
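To give a feel for how an EDA searches the space of alignments, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes the simplest kind of EDA, a univariate model that keeps one independent distribution per source position, and a made-up lexical score (`TOY_TABLE`, `toy_score`) standing in for a real alignment model. All names and parameter values are hypothetical. A score this simple decomposes over positions and could be maximised exactly; it is used only to keep the example small.

```python
import random

# Hypothetical lexical probabilities, for illustration only.
TOY_TABLE = {("la", "the"): 0.9, ("casa", "house"): 0.9, ("verde", "green"): 0.9}


def toy_score(source, target, alignment):
    """Product of toy lexical probabilities; unseen pairs get a small value."""
    p = 1.0
    for j, i in enumerate(alignment):
        p *= TOY_TABLE.get((source[j], target[i]), 0.01)
    return p


def umda_align(source, target, score, pop_size=50, elite=10, generations=30, seed=0):
    """Univariate EDA: sample alignments, select the best, re-estimate marginals."""
    rng = random.Random(seed)
    n_src, n_tgt = len(source), len(target)
    # One categorical distribution per source position, initially uniform.
    probs = [[1.0 / n_tgt] * n_tgt for _ in range(n_src)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample a population of candidate alignments from the current model.
        population = [
            [rng.choices(range(n_tgt), weights=probs[j])[0] for j in range(n_src)]
            for _ in range(pop_size)
        ]
        # Keep the highest-scoring individuals.
        population.sort(key=lambda a: score(source, target, a), reverse=True)
        selected = population[:elite]
        top_score = score(source, target, selected[0])
        if top_score > best_score:
            best, best_score = selected[0], top_score
        # Re-estimate each marginal from the selected individuals,
        # with add-one smoothing so that no position is ruled out for good.
        for j in range(n_src):
            counts = [1.0] * n_tgt
            for a in selected:
                counts[a[j]] += 1.0
            total = sum(counts)
            probs[j] = [c / total for c in counts]
    return best, best_score


source = ["la", "casa", "verde"]
target = ["NULL", "the", "green", "house"]  # position 0 is the empty word
best, best_score = umda_align(source, target, toy_score)
print(best, best_score)
```

The same loop applies unchanged when `score` is replaced by a model whose value does not decompose over positions, which is the situation in which a heuristic search of this kind becomes useful.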
This paper is structured as follows. First, statistical
word alignments are described in section 2.
Estimation of Distribution Algorithms (EDAs) are