Conditional significance pruning : discarding more of huge phrase tables

Conference: The Tenth Biennial Conference of the Association for Machine Translation in the Americas (AMTA), 28 October - 1 November 2012, San Diego, California, USA
Abstract: The technique of pruning phrase tables that are used for statistical machine translation (SMT) can achieve substantial reductions in bulk and improve translation quality, especially for very large corpora such as the Giga-FrEn. This can be further improved by conditioning each significance test on other phrase pair co-occurrence counts, resulting in an additional reduction in size and an increase in BLEU score. A series of experiments using Moses and the WMT11 corpora for French to English was performed to quantify the improvement. By adhering strictly to the recommendations for the WMT11 baseline system, a strong, reproducible research baseline was employed.
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: Yes
NPARC number: 21249500
Record identifier: bb26b75e-ff34-47e4-82db-2f71940cf9bf
Record created: 2013-02-20
Record modified: 2016-05-09