Aligning and Using an English-Inuktitut Parallel Corpus

DOI: http://doi.org/10.3115/1118905.1118925
Type: Article
Proceedings title: HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond
Conference: HLT-NAACL-PARALLEL '03: Human Language Technology and North American Chapter of Association of Computational Linguistics 2003, May 27 - June 1, 2003.
Volume: 3
Pages: 115-118; # of pages: 4
Abstract: A parallel corpus of texts in English and in Inuktitut, an Inuit language, is presented. These texts are from the Nunavut Hansards. The parallel texts are processed in two phases, the sentence alignment phase and the word correspondence phase. Our sentence alignment technique achieves a precision of 91.4% and a recall of 92.3%. Our word correspondence technique is aimed at providing the broadest coverage collection of reliable pairs of Inuktitut and English morphemes for dictionary expansion. For an agglutinative language like Inuktitut, this entails considering substrings, not simply whole words. We employ a Pointwise Mutual Information method (PMI) and attain a coverage of 72.3% of English words and a precision of 87%.
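Illustrative note: the abstract names Pointwise Mutual Information over substrings but does not give the procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes sentence pairs are already aligned, counts co-occurrence once per aligned pair, and scores each (English word, Inuktitut substring) pair as log2(p(e,s) / (p(e) p(s))). The function names `substrings` and `pmi_table` and the `min_len` parameter are hypothetical.

```python
import math
from collections import Counter


def substrings(word, min_len=2):
    """Return every contiguous substring of `word` with at least `min_len` characters."""
    return {
        word[i:j]
        for i in range(len(word))
        for j in range(i + min_len, len(word) + 1)
    }


def pmi_table(aligned_pairs, min_len=2):
    """Score (English word, Inuktitut substring) pairs by PMI.

    Assumption: each item is counted at most once per aligned sentence pair,
    and probabilities are relative frequencies over the aligned pairs.
    """
    n = len(aligned_pairs)
    en_count, iu_count, joint = Counter(), Counter(), Counter()
    for en_sentence, iu_sentence in aligned_pairs:
        en_items = set(en_sentence.lower().split())
        iu_items = set()
        for word in iu_sentence.lower().split():
            iu_items |= substrings(word, min_len)
        en_count.update(en_items)
        iu_count.update(iu_items)
        joint.update((e, s) for e in en_items for s in iu_items)
    # PMI = log2( (c/n) / ((c_e/n) * (c_s/n)) ) = log2( c * n / (c_e * c_s) )
    return {
        (e, s): math.log2(c * n / (en_count[e] * iu_count[s]))
        for (e, s), c in joint.items()
    }
```

Called on a list of (English sentence, Inuktitut sentence) string pairs, `pmi_table` returns a dictionary keyed by (word, substring). Pairs that co-occur more often than chance get a positive score, and a threshold on that score would be one way to trade coverage against precision.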
Publication date
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 47119
NPARC number: 5765030
Record identifier: bce8df0d-20c8-4b42-a200-223ed4fb92b3
Record created: 2009-03-29
Record modified: 2016-05-09