Adapting LDA Model to Discover Author-Topic Relations for Email Analysis

DOI: http://doi.org/10.1007/978-3-540-85836-2_32
Type: Article
Conference: 10th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2008), September 1-5, 2008, Turin, Italy
Pages: 337-346; # of pages: 10
Abstract: Analyzing the author and topic relations in an email corpus is an important issue in both social network analysis and text mining. The Author-Topic model is a statistical method that identifies author-topic relations. However, its inference process ignores information at the document level, i.e., the co-occurrence of words within documents is not taken into account in deriving topics. This may not be suitable for email analysis. We propose adapting the Latent Dirichlet Allocation (LDA) model to analyze email corpora. This method takes into account both the author-document relations and the document-topic relations. We use the Author-Topic model as the baseline method and propose measures to compare our method against it. An empirical analysis of experimental results on both simulated data sets and the real Enron email data set shows that our method performs better than the Author-Topic model.
Publication date
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada; NRC Industrial Materials Institute
Peer reviewed: No
NRC number: 50384
NPARC number: 5765577
Record identifier: 9a8cac81-1dcf-4a5a-b905-7b002bb891e8
Record created: 2009-03-29
Record modified: 2016-05-09