Apprentissage d'un espace de concepts de mots pour une nouvelle représentation des données textuelles

DOI: http://doi.org/10.3166/dn.13.1.63-82
Type: Article
Journal title: Document numérique
Volume: 13
Issue: 1
Pages: 63-82; # of pages: 20
Subject: Unsupervised learning; Term clustering; Document clustering
Abstract: In this paper, we present an unsupervised learning technique for dimensionality reduction of textual data. The approach is based on the assumption that terms co-occurring in the same contexts with the same frequencies are semantically related. We therefore find term clusters using a classification version of the EM algorithm (CEM), and documents are then represented in the space of these term clusters. We then generalize this approach by extending the PLSA model to cluster documents and terms simultaneously. We evaluate our techniques on the task of document clustering and show the effectiveness of our approach on three standard classification collections: Reuters, 20Newsgroups, and WebKB.
Publication date
Language: French
Peer reviewed: Yes
NRC publication: This is a non-NRC publication

"Non-NRC publications" are publications authored by NRC employees prior to their employment by NRC.

NPARC number: 16488511
Record identifier: 2bfacf2f-62e5-4b66-aef0-43cc7641b8d6
Record created: 2010-12-03
Record modified: 2016-07-14