Lattice desegmentation for statistical machine translation

Proceedings title: 52nd Annual Meeting of the Association for Computational Linguistics
Conference: 52nd Annual Meeting of the Association for Computational Linguistics, June 23-25, 2014, Baltimore, Maryland
Pages: 100–110; # of pages: 11
Abstract: Morphological segmentation is an effective sparsity reduction strategy for statistical machine translation (SMT) involving morphologically complex languages. When translating into a segmented language, an extra step is required to desegment the output; previous studies have desegmented the 1-best output from the decoder. In this paper, we expand our translation options by desegmenting n-best lists or lattices. Our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs, which allows for inclusion of features related to the desegmentation process, as well as an unsegmented language model (LM). We investigate this technique in the context of English-to-Arabic and English-to-Finnish translation, showing significant improvements in translation quality over desegmentation of 1-best decoder outputs.
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: No
NPARC number: 21275904
Record identifier: 72236846-b40a-4563-94bd-5d6d0bb299aa
Record created: 2015-07-31
Record modified: 2016-05-09