Sensespotting: never let your parallel data tie you to an old domain

Type: Article
Proceedings title: ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Conference: 51st Annual Meeting of the Association for Computational Linguistics, August 4-9 2013, Sofia, Bulgaria
ISBN: 9781937284503
Volume: 1
Pages: 1435-1445
Abstract: Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. We define a task, SenseSpotting, in which we build systems to spot tokens that have new senses in new domain text. Instead of difficult and expensive annotation, we build a gold standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system is able to achieve F-measures of as much as 80% when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains.
Publication date
Publisher: Association for Computational Linguistics
Language: English
Affiliation: National Research Council Canada; Information and Communication Technologies
Peer reviewed: Yes
NPARC number: 23000603
Record identifier: 559b8e7b-80bf-4aec-a2a0-33ddb4572af4
Record created: 2016-08-04
Record modified: 2016-08-04