Learning the Ontological Positions of Natural Language Objects

Type: Thesis
Abstract: This thesis addresses a text classification (TC) problem in a real-world system, the New Brunswick Opportunities Network (NBON), an online tendering system that helps vendors and purchasing agents provide and obtain information about business opportunities. The solution draws mainly on techniques from machine learning and natural language processing (NLP). We use a Naïve Bayes classifier, a simple and effective machine learning approach for TC tasks, to automatically classify the tenders of the NBON system. We implement three smoothing algorithms for the Naïve Bayes classifier, namely no-match, Laplace correction, and Lidstone's law of succession, and we show that the difference between the accuracies obtained with the three algorithms is negligible. We show that the Naïve Bayes classifier is more effective than three other, equally simple TC techniques: Strong Predictors (a modification of Term Frequency), TF-IDF (Term Frequency - Inverse Document Frequency), and WIDF (Weighted Inverse Document Frequency). NLP tools such as stop lists and stemmers are applied to the historical NBON data used to train the classifiers. We experiment with variations of these tools and show that such NLP techniques have little impact on the effectiveness of a classifier.
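For readers unfamiliar with the smoothing methods named in the abstract, the following is a minimal sketch of a multinomial Naïve Bayes classifier with additive smoothing. It is not the thesis's implementation: the function names `train` and `classify`, the dictionary layout of the model, and the choice to skip terms unseen in training are all assumptions made for illustration. Setting `alpha=1.0` gives the Laplace correction; other positive values give Lidstone smoothing. The "no-match" method is not shown because the abstract does not define it.

```python
import math
from collections import Counter, defaultdict


def train(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive smoothing.

    docs  -- list of (tokens, label) pairs (hypothetical input format)
    alpha -- smoothing constant: 1.0 is the Laplace correction,
             any other positive value is Lidstone smoothing
    """
    class_docs = Counter()               # number of documents per class
    term_counts = defaultdict(Counter)   # term frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_docs": class_docs,
        "term_counts": term_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(docs),
    }


def classify(model, tokens):
    """Return the class with the highest smoothed log-posterior."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n in model["class_docs"].items():
        counts = model["term_counts"][label]
        total = sum(counts.values())
        score = math.log(n / model["n_docs"])  # log prior
        for term in tokens:
            if term not in model["vocab"]:
                continue  # assumption: ignore terms never seen in training
            # Smoothed estimate of P(term | class)
            score += math.log((counts[term] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Calling `train` on a list of tokenized, labeled tenders and then `classify` on a new token list returns the predicted category. Comparing runs with different `alpha` values on held-out data is one way to check how much the choice of smoothing affects accuracy.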
Publisher: National Research Council. Institute for Information Technology
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 46523
NPARC number: 8913393
Record identifier: 59b6207b-f9c1-4e19-abc5-52b257859f47
Record created: 2009-04-22
Record modified: 2016-05-09