A Probabilistic Model for Fast and Confident Categorisation of Textual Documents

Editor: Berry, Michael W.; Castellanos, Malu
Type: Book Chapter
Book title: Survey of Text Mining II: Clustering, Classification, and Retrieval
Abstract: We describe the National Research Council's (NRC) entry in the Anomaly Detection/Text Mining competition organized at the Text Mining Workshop 2007. This entry relies on a straightforward implementation of a probabilistic categorizer described earlier [GGPC02]. The categorizer is adapted to handle multiple labeling, and a piecewise-linear confidence estimation layer is added to provide an estimate of the labeling confidence. This technique achieves a score of 1.689 on the test data. The model has potentially useful features and extensions, such as the use of a category-specific decision layer or the extraction of descriptive category keywords from the probabilistic profile.
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 49829
NPARC number: 5764844
Record identifier: 05e3038a-f734-4b14-bcc4-d90f41df31e8
Record created: 2009-03-29
Record modified: 2016-05-09