Identifying and Preventing Data Leakage in Multi-relational Classification

DOI: http://doi.org/10.1109/ICDMW.2010.33
Type: Article
Proceedings title: 2010 IEEE International Conference on Data Mining Workshops
Conference: IEEE International Conference on Data Mining Workshops, Dec 14-17, 2010, Sydney, Australia
Pages: 458-465; # of pages: 8
Abstract: Relational database mining, where data are mined across multiple relations, is increasingly commonplace. When considering a complex database schema, it becomes difficult to identify all possible relationships between attributes from the different relations. That is, seemingly harmless attributes may be linked to confidential information, leading to data leaks when building a model. In this way, we are at risk of disclosing unwanted knowledge when publishing the results of a data mining exercise. For instance, consider a classification task on a financial database that determines whether a loan is high risk. Suppose that we are aware that the database also contains a confidential attribute, such as income level, which should not be divulged. To prevent potential privacy leakage, one may thus choose to remove the income level from the database, or to distort it. However, even after distortion, a model learned from the modified database may still accurately determine the income level values. It follows that the database is still unsafe and may be compromised. This paper demonstrates this potential for privacy leakage in multi-relational classification and illustrates how such potential leaks may be detected. We propose a method to generate a ranked list of subschemas that maintain the predictive performance on the class attribute, while limiting the disclosure risk, and the predictive accuracy, of confidential attributes. We illustrate our method against a financial database.
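The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the relation names, the `tolerance` parameter, the `rank_subschemas` helper, and all accuracy values are hypothetical placeholders chosen for illustration. The sketch assumes that each candidate subschema has already been scored twice, once for accuracy on the class attribute and once for the accuracy with which the confidential attribute can be predicted from it.

```python
from typing import NamedTuple


class SubschemaScore(NamedTuple):
    tables: tuple            # relations kept in the subschema (hypothetical names)
    target_acc: float        # accuracy on the class attribute (e.g. loan risk)
    confidential_acc: float  # accuracy when predicting the confidential attribute


def rank_subschemas(scores, full_target_acc, tolerance=0.02):
    """Keep subschemas whose class-attribute accuracy is within `tolerance`
    of the full schema, then list the least-leaking ones first."""
    kept = [s for s in scores if s.target_acc >= full_target_acc - tolerance]
    # Lower confidential accuracy means lower disclosure risk; ties are
    # broken in favour of higher accuracy on the class attribute.
    return sorted(kept, key=lambda s: (s.confidential_acc, -s.target_acc))


# Placeholder scores for illustration only; these are not results from the paper.
scores = [
    SubschemaScore(("loan", "account", "client"), 0.90, 0.85),
    SubschemaScore(("loan", "account"), 0.89, 0.60),
    SubschemaScore(("loan",), 0.75, 0.50),
    SubschemaScore(("loan", "client"), 0.90, 0.70),
]

ranked = rank_subschemas(scores, full_target_acc=0.90)
for s in ranked:
    print(s.tables, s.target_acc, s.confidential_acc)
```

In this sketch the single-relation subschema is dropped because it loses too much accuracy on the class attribute, and the remaining candidates are ordered by how poorly they predict the confidential attribute.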
Publication date
Language: English
Affiliation: National Research Council Canada (NRC-CNRC); NRC Institute for Information Technology
Peer reviewed: Yes
NPARC number: 16285565
Record identifier: 886ff751-891b-46e4-aead-bd81dbf5df2e
Record created: 2010-11-03
Record modified: 2016-05-09