Privacy Disclosure and Preservation in Learning with Multi-Relational Databases

DOI: http://doi.org/10.5626/JCSE.2011.5.3.183
Type: Article
Journal title: Journal of Computing Science and Engineering (JCSE)
Volume: 5
Issue: 3
Pages: 183-196; number of pages: 14
Subject: Privacy preserving data mining; multi-relational mining; Relational database
Abstract: There has recently been a surge of interest in relational database mining that aims to discover useful patterns across multiple interlinked database relations. It is crucial for a learning algorithm to explore the multiple inter-connected relations so that important attributes are not excluded when mining such relational repositories. However, from a data privacy perspective, it becomes difficult to identify all possible relationships between attributes from the different relations, considering a complex database schema. That is, seemingly harmless attributes may be linked to confidential information, leading to data leaks when building a model. Thus, we are at risk of disclosing unwanted knowledge when publishing the results of a data mining exercise. For instance, consider a financial database classification task to determine whether a loan is considered high risk. Suppose that we are aware that the database contains another confidential attribute, such as income level, that should not be divulged. One may thus choose to eliminate, or distort, the income level from the database to prevent potential privacy leakage. However, even after distortion, a learning model against the modified database may accurately determine the income level values. It follows that the database is still unsafe and may be compromised. This paper demonstrates this potential for privacy leakage in multi-relational classification and illustrates how such potential leaks may be detected. We propose a method to generate a ranked list of subschemas that maintains the predictive performance on the class attribute, while limiting the disclosure risk, and predictive accuracy, of confidential attributes. We illustrate and demonstrate the effectiveness of our method against a financial database and an insurance database.
Publication date:
Publisher: Korean Institute of Information Scientists and Engineers
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: Yes
NPARC number: 19291031
Record identifier: 2bd0285f-8b27-4b50-bd9b-75a968f686dd
Record created: 2012-01-24
Record modified: 2016-05-09