Corpus-based learning of analogies and semantic relations

DOI: http://doi.org/10.1007/s10994-005-0913-1
Type: Article
Journal title: Machine Learning
Volume: 60
Issue: 1-3
Pages: 251-278; # of pages: 28
Subject: analogy; metaphor; semantic relations; Vector Space Model; cosine similarity; noun-modifier pairs
Abstract: We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning “A is to B as C is to D”; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing: determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as “laser printer”, according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random guessing: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations.
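The two selection steps named in the abstract (picking the most analogous choice pair, and nearest-neighbour classification of a noun-modifier pair) can be illustrated with a minimal sketch. It assumes each word pair has already been turned into a numeric vector; this record does not say how those vectors are built, so the toy vectors, the function names (`cosine`, `most_analogous`, `nearest_neighbour_label`), and the class labels below are all hypothetical and are not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def most_analogous(stem_vec, choice_vecs):
    """Return the index of the choice pair most similar to the stem pair A:B."""
    return max(range(len(choice_vecs)),
               key=lambda i: cosine(stem_vec, choice_vecs[i]))


def nearest_neighbour_label(query_vec, training):
    """Return the label of the training pair most similar to the query pair.

    `training` is a list of (vector, label) tuples.
    """
    _, label = max(training, key=lambda item: cosine(query_vec, item[0]))
    return label


# Hypothetical toy vectors, for illustration only.
stem = [1, 0, 1]
choices = [[0, 1, 0], [2, 0, 2], [1, 1, 0]]
best = most_analogous(stem, choices)

# Hypothetical class labels, for illustration only.
training = [([1, 0], "agent"), ([0, 1], "material")]
predicted = nearest_neighbour_label([0.1, 0.9], training)
```

With five choices per question, picking uniformly at random succeeds with probability 1/5 = 20%, which matches the random-guessing baseline quoted in the abstract.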
Publication date
Language: English
Affiliation: National Research Council Canada; NRC Institute for Information Technology
Peer reviewed: No
NRC number: 48273
NPARC number: 5765715
Record identifier: c90dbad5-9d17-4f74-bb76-78403f44f94f
Record created: 2009-03-29
Record modified: 2016-05-09