Automatic classification and indexing: a supplement

Type: Technical Report
Series title: ERB; no. ERB-793
Physical description: 18 p.
Abstract: The occurrence of a word, one or more times, in a document is taken as an attribute of that document. Using a simple formula from Bayesian probability, a probability that the document belongs to a given category is derived from that word. The procedure is applied to all the words of a document, and the words are then ordered by probability to form a list. The same procedure is used to form category lists from existing categories, although original categories could also be formed. Document lists are compared with category lists, and probability sums are formed for indexing. Two sample category lists, derived from abstracts, are given. Simple modifications show how easily list characteristics can be changed: two occurrences of a word, or occurrence in two documents, can be substituted for a single occurrence.
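The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the report's own implementation: the abstract does not state the exact formula, so the code assumes the standard form of Bayes' rule, P(category | word) = P(word | category) P(category) / sum over all categories k of P(word | k) P(k), with probabilities estimated from document counts. The function names, the `min_occurrences` parameter (standing in for the "two occurrences of a word" modification), and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict


def word_category_probs(docs, min_occurrences=1):
    """Estimate P(category | word) from (category, text) pairs.

    A word counts as an attribute of a document when it occurs at least
    `min_occurrences` times in it (assumed reading of the abstract).
    """
    docs_per_cat = defaultdict(int)
    docs_with_word = defaultdict(lambda: defaultdict(int))
    for category, text in docs:
        docs_per_cat[category] += 1
        for word, n in Counter(text.lower().split()).items():
            if n >= min_occurrences:
                docs_with_word[word][category] += 1

    total_docs = sum(docs_per_cat.values())
    probs = {}
    for word, per_cat in docs_with_word.items():
        # Numerator of Bayes' rule: P(word | c) * P(c) for each category c.
        joint = {
            c: (k / docs_per_cat[c]) * (docs_per_cat[c] / total_docs)
            for c, k in per_cat.items()
        }
        evidence = sum(joint.values())  # P(word)
        probs[word] = {c: j / evidence for c, j in joint.items()}
    return probs


def category_list(probs, category):
    """Words ordered by decreasing P(category | word), ties alphabetical."""
    items = [(w, p[category]) for w, p in probs.items() if category in p]
    return sorted(items, key=lambda wp: (-wp[1], wp[0]))


def score_document(probs, text):
    """Sum P(category | word) over the distinct words of a document."""
    scores = defaultdict(float)
    for word in set(text.lower().split()):
        for category, p in probs.get(word, {}).items():
            scores[category] += p
    return dict(scores)


# Hypothetical toy corpus, not taken from the report.
toy_docs = [
    ("radar", "pulse radar antenna"),
    ("radar", "radar signal noise"),
    ("computing", "program memory signal"),
]
toy_probs = word_category_probs(toy_docs)
toy_scores = score_document(toy_probs, "radar signal")
best_category = max(toy_scores, key=toy_scores.get)
```

In this sketch, a new document is assigned to the category with the largest probability sum. Raising `min_occurrences` to 2 changes which words enter the lists without altering the rest of the procedure.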
Publication date
Publisher: National Research Council of Canada, Radio and Electrical Engineering Division
Affiliation: National Research Council Canada
Peer reviewed: No
NPARC number: 21277228
Record identifier: 8a11fe7a-f62e-4bad-9bcd-4b2ff50852e1
Record created: 2016-01-14
Record modified: 2016-10-03