Real-Time Identification of Parallel Texts from Bilingual Newsfeed

Proceedings title: CLINE 2004, Computational Linguistics in the North East
Conference: Proceedings of the Computational Linguistics in the North-East (CLINE'2004), August 30, 2004, Montréal, Québec, Canada
Abstract: Parallel texts are documents that are translations of one another. This paper describes a simple method that can be deployed on a real-time news feed to create a continuously growing source of parallel texts in French and English. Our experiment was conducted on the Canada Newswire news feed. Given some of its intrinsic properties, it was possible to deploy relatively simple text-matching techniques that rely on language-independent cognates such as numbers, capitalized words, punctuation, and newline characters. On three weeks of press releases, our system correctly identified the vast majority of parallel press releases. It committed only minor errors on repeated news items.
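The cognate-based matching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenization rule, the Dice-style overlap score, the function names (`cognate_profile`, `cognate_similarity`, `best_match`), and the 0.5 threshold are all assumptions chosen for illustration. It only shows how numbers, capitalized words, punctuation, and newline characters might serve as language-independent features for pairing a release with its translation.

```python
import re
from collections import Counter
from typing import Optional, Sequence

# Split text into digit runs, alphabetic words, single punctuation marks,
# and newline characters. (Illustrative tokenization, not from the paper.)
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|[^\w\s]|\n")


def cognate_profile(text: str) -> Counter:
    """Count the language-independent tokens found in `text`."""
    kept = []
    for tok in TOKEN_RE.findall(text):
        if tok.isalpha():
            # Keep only capitalized words (names, places, acronyms).
            if tok[0].isupper():
                kept.append(tok)
        else:
            # Numbers, punctuation marks, and newlines are kept as-is.
            kept.append(tok)
    return Counter(kept)


def cognate_similarity(a: str, b: str) -> float:
    """Dice coefficient over the two multisets of cognate tokens."""
    pa, pb = cognate_profile(a), cognate_profile(b)
    total = sum(pa.values()) + sum(pb.values())
    if total == 0:
        return 0.0
    shared = sum((pa & pb).values())  # multiset intersection
    return 2.0 * shared / total


def best_match(release: str, candidates: Sequence[str],
               threshold: float = 0.5) -> Optional[int]:
    """Return the index of the most similar candidate, or None if no
    candidate reaches the (hypothetical) threshold."""
    best_idx, best_score = None, threshold
    for i, cand in enumerate(candidates):
        score = cognate_similarity(release, cand)
        if score >= best_score:
            best_idx, best_score = i, score
    return best_idx
```

In a real-time setting, each incoming release would be compared against recent releases in the other language, and a pair would be accepted only when the score clears the threshold. Because punctuation is shared by most texts, a practical system would probably need to weight rarer tokens such as numbers and names more heavily.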
Publication date
Affiliation: National Research Council Canada; NRC Institute for Information Technology
Peer reviewed: No
NRC number: 48081
NPARC number: 5764063
Record identifier: e6d2a7f8-a74d-406d-b1f9-7ca7d7d6720c
Record created: 2009-03-29
Record modified: 2016-05-09