Vector space model for adaptation in statistical machine translation

Type: Article
Proceedings title: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics
Series title: Annual Meeting of the Association for Computational Linguistics
Conference: 2013 North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 9-15, 2013, Atlanta, GA
Article number: P13-1126
Abstract: This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). The general idea is first to create a vector profile for the in-domain development (“dev”) set. This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. Thus, we obtain a decoding feature whose value represents the phrase pair’s closeness to the dev set. This is a simple, computationally cheap form of instance weighting for phrase pairs. Experiments on large-scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. An informal analysis suggests that VSM adaptation may help in choosing among words with the same meaning on the basis of style and genre.
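The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the similarity function or the weighting of vector entries, so cosine similarity over normalized raw counts is used here as a stand-in. The function names, the two-subcorpus setup, and the toy counts are all hypothetical.

```python
import math
from collections import Counter

def subcorpus_vector(counts, num_subcorpora):
    """Normalize per-subcorpus counts into a vector that sums to 1."""
    total = sum(counts.get(i, 0) for i in range(num_subcorpora))
    if total == 0:
        return [0.0] * num_subcorpora
    return [counts.get(i, 0) / total for i in range(num_subcorpora)]

def cosine(u, v):
    """Cosine similarity (an assumed stand-in for the paper's similarity score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def dev_profile(dev_pairs, pair_counts, num_subcorpora):
    """Pool the subcorpus counts of every phrase pair extractable from the dev set."""
    totals = Counter()
    for pair in dev_pairs:
        totals.update(pair_counts.get(pair, {}))
    return subcorpus_vector(totals, num_subcorpora)

def vsm_feature(pair, pair_counts, profile, num_subcorpora):
    """Decoding feature: similarity between a phrase pair's vector and the dev profile."""
    vec = subcorpus_vector(pair_counts.get(pair, {}), num_subcorpora)
    return cosine(vec, profile)

# Hypothetical toy data: counts of each phrase pair in subcorpus 0 and subcorpus 1.
NUM_SUBCORPORA = 2
pair_counts = {
    ("src_a", "tgt_formal"): {0: 8, 1: 2},
    ("src_a", "tgt_casual"): {0: 1, 1: 9},
    ("src_b", "tgt_b"): {0: 5, 1: 5},
}
# Phrase pairs assumed to be extractable from the dev set.
dev_pairs = [("src_a", "tgt_formal"), ("src_b", "tgt_b")]

profile = dev_profile(dev_pairs, pair_counts, NUM_SUBCORPORA)
score_formal = vsm_feature(("src_a", "tgt_formal"), pair_counts, profile, NUM_SUBCORPORA)
score_casual = vsm_feature(("src_a", "tgt_casual"), pair_counts, profile, NUM_SUBCORPORA)
```

In this toy setup the dev profile leans toward subcorpus 0, so the translation option seen mostly in subcorpus 0 receives the higher feature value. That mirrors the abstract's suggestion that the feature can help choose among synonymous translations by style and genre.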
Publisher: ACL
Language: English
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: Yes
NPARC number: 21270511
Record identifier: d1f017c8-aab2-4739-88a6-53999a36d713
Record created: 2014-02-14
Record modified: 2016-05-09