Segment Choice Models: Feature-Rich Models for Global Distortion in Statistical Machine Translation

Download: PDF, 246 KB
Type: Article
Conference: Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT/NAACL 2006), June 5, 2006, New York City, New York, USA
Abstract: This paper presents a new approach to distortion (phrase reordering) in phrase-based machine translation (MT). Distortion is modeled as a sequence of choices during translation. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to each possible phrase reordering. These “segment choice” models (SCMs) can be trained on “segment-aligned” sentence pairs; they can be applied during decoding or rescoring. The approach yields a metric called “distortion perplexity” (“disperp”) for comparing SCMs offline on test data, analogous to perplexity for language models. A decision-tree-based SCM is tested on Chinese-to-English translation, and outperforms a baseline distortion penalty approach at the 99% confidence level. (A sketch of the disperp computation follows the record fields below.)
Publication date:
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 48752
NPARC number: 5763206
Record identifier: d91d8d1e-2ad7-4710-9e57-1b902811e2a1
Record created: 2009-03-29
Record modified: 2016-05-09
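
The abstract defines “disperp” only by analogy to language-model perplexity. Below is a minimal sketch of such a metric, assuming the SCM exposes the probability it assigned to each segment choice actually made on held-out, segment-aligned test data; the function name and interface are illustrative and not taken from the paper.

```python
import math
from typing import Iterable


def disperp(choice_probs: Iterable[float]) -> float:
    """Distortion perplexity over a sequence of segment-choice probabilities.

    choice_probs: the probability the segment choice model (SCM) assigned to
    each segment choice that was actually made on the test data, one value
    per choice. Computed like language-model perplexity:
    2 ** (-(1/N) * sum(log2 p)).
    """
    probs = list(choice_probs)
    total_log2 = sum(math.log2(p) for p in probs)
    return 2.0 ** (-total_log2 / len(probs))


# Example: a model that assigns probability 0.25 to every choice has a
# disperp of 4.0, equivalent to a uniform choice among four segments.
print(disperp([0.25] * 10))  # 4.0
```

As with language-model perplexity, a lower disperp indicates that the SCM assigns higher probability to the reorderings observed in the test data, which is how the paper compares SCMs offline.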