Manageable Phrase-based Statistical Machine Translation Models

Proceedings title: Computer Recognition Systems 2 (Advances in Intelligent and Soft Computing, vol. 45)
Conference: 5th International Conference on Computer Recognition Systems (CORES 07), Wroclaw, Poland, October 22-25, 2007
Abstract: Statistical Machine Translation (SMT) is an evolving field in which many techniques from Syntactic Pattern Recognition (SPR) are needed and applied. A typical phrase-based SMT system for translating from a source language S to a target language T contains one or more n-gram language models (LMs) and one or more phrase translation models (TMs). These LMs and TMs have a large memory footprint (up to several gigabytes). This paper describes novel techniques for filtering these models so that only the patterns relevant to the input are loaded during translation. In experiments on a large Chinese-English task, these techniques yielded significant reductions in the amount of information loaded during translation: up to 58% for LMs and up to 75% for TMs.
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: Yes
NRC number: 49891
NPARC number: 9183591
Record identifier: f2a4386f-564f-44d4-9c01-c437390b8bb3
Record created: 2009-06-30
Record modified: 2016-05-09