Searching for poor quality machine translated text: learning the difference between human writing and machine translations

DOI: http://doi.org/10.1007/978-3-642-30353-1
Type: Article
Proceedings title: Advances in Artificial Intelligence
Series title: Lecture Notes in Artificial Intelligence (LNAI); Volume 7310
Conference: 25th Canadian Conference on Artificial Intelligence, Canadian AI 2012, 28-30 May 2012, Toronto, Ontario, Canada
ISSN: 0302-9743
ISBN: 978-3-642-30352-4
Pages: 49-60; # of pages: 12
Abstract: As machine translation (MT) tools have become mainstream, machine translated text has increasingly appeared on multilingual websites. Trustworthy multilingual websites are used as training corpora for statistical machine translation tools; large amounts of MT text in training data may make such products less effective. We performed three experiments to determine whether a support vector machine (SVM) could distinguish machine translated text from human written text (both original text and human translations). Machine translated versions of the Canadian Hansard were detected with an F-measure of 0.999. Machine translated versions of six Government of Canada web sites were detected with an F-measure of 0.98. We validated these results with a decision tree classifier. An experiment to find MT text on Government of Ontario web sites using Government of Canada training data was unfruitful, with a high rate of false positives. Machine translated text appears to be learnable and detectable when using a similar training corpus.
Publication date
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: Yes
NPARC number: 20496817
Record identifier: cf9f7d1a-96a1-4b36-8355-6c808a7f3f4d
Record created: 2012-08-16
Record modified: 2016-05-09
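
Note on the F-measure figures quoted in the abstract: the record does not say which variant of the F-measure was used. The sketch below assumes the common balanced form (F1, the harmonic mean of precision and recall) computed for the machine-translated class. The function name, the label strings "mt" and "human", and the toy label lists are illustrative assumptions, not data or code from the paper.

```python
def f_measure(gold, predicted, positive="mt"):
    """Return (precision, recall, F1) for the `positive` class.

    Assumes the balanced F1 variant; the record does not state which
    variant the authors reported.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels for illustration only (not results from the paper).
gold = ["mt", "mt", "mt", "human", "human", "human"]
predicted = ["mt", "mt", "human", "mt", "human", "human"]

precision, recall, f1 = f_measure(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under this definition, a high rate of false positives, as reported for the Government of Ontario experiment, lowers precision and therefore the F-measure even when recall stays high.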