Automatic detecting documents containing personal health information

DOI: http://doi.org/10.1007/978-3-642-02976-9_46
Type: Book Chapter
Proceedings title: Artificial Intelligence in Medicine: 12th Conference on Artificial Intelligence in Medicine, AIME 2009, Verona, Italy, July 18-22, 2009. Proceedings
Series title: Lecture Notes in Computer Science; Volume 5651
Conference: 12th Conference on Artificial Intelligence in Medicine (AIME 2009), July 18-22, 2009, Verona, Italy
ISSN: 0302-9743
ISBN: 978-3-642-02975-2; 978-3-642-02976-9
Pages: 335-344; # of pages: 10
Abstract: With the increasing use of computers and the Internet, personal health information (PHI) is distributed across multiple institutions, often scattered over multiple devices and stored in diverse formats. Non-traditional medical records that contain PHI, such as emails and other electronic documents, are at high risk of privacy leakage, which makes locating and managing PHI in this distributed environment a challenge. The goal of this study is to classify electronic documents as PHI or non-PHI. A supervised machine learning method was used for this text categorization task: three classifiers (SVM, decision tree, and naive Bayes) were trained and tested on three data sets. Lexical, semantic, and syntactic features, and their combinations, were compared in terms of their effectiveness for classifying PHI documents. The results show that combining semantic and/or syntactic features with lexical features is more effective for PHI classification than using lexical features alone, and that the supervised machine learning method is effective at classifying documents as PHI or non-PHI.
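The PHI/non-PHI text categorization task described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes lexical (bag-of-words) features only and a multinomial naive Bayes classifier with add-one smoothing, which is one of the three classifier types named above. The training documents, labels, and the names `tokenize`, `NaiveBayes`, and `model` are hypothetical, invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lexical features only: lowercase alphabetic word tokens (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Log prior: fraction of training documents in class c.
            self.log_priors[c] = math.log(len(class_docs) / len(docs))
            counter = Counter()
            for d in class_docs:
                counter.update(tokenize(d))
            self.counts[c] = counter
            self.totals[c] = sum(counter.values())
            self.vocab.update(counter)
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_priors[c]
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                # Smoothed log-likelihood of this token under class c.
                score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy training data, not taken from the study's data sets.
train_docs = [
    "Patient diagnosis: type 2 diabetes, prescribed metformin",
    "Blood pressure results from the clinic visit and medication dosage",
    "My doctor changed my insulin prescription after the lab test",
    "Meeting agenda for the quarterly budget review",
    "Lunch on Friday? The new cafe opened downtown",
    "Please review the attached slides for the conference talk",
]
train_labels = ["PHI", "PHI", "PHI", "non-PHI", "non-PHI", "non-PHI"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("The lab test shows high blood pressure; adjust the medication"))
```

The semantic and syntactic features compared in the study are not modelled here; one way to add them in a sketch like this would be to extend `tokenize` so that it also emits extra feature strings alongside the word tokens.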
Publisher: Springer Berlin Heidelberg
Language: English
Affiliation: National Research Council Canada; NRC Institute for Information Technology
Peer reviewed: Yes
NPARC number: 19291881
Record identifier: 41312d40-70a5-4b35-b3b1-9ecf45abf14a
Record created: 2012-01-24
Record modified: 2016-08-03