Aggregation and privacy in multi-relational databases
Jafer, Yasser; Viktor, Herna L.
Information and Communication Technologies; National Research Council Canada
10th Annual Conference on Privacy, Security and Trust (PST) 2012, 16-18 July 2012, Paris, France
Conference Proceedings : 10th Annual Conference on Privacy, Security and Trust
3D Imaging, Modeling and Visualization
Visual Information Technology
The aim of privacy-preserving data mining is to construct highly accurate predictive models without disclosing private information. Aggregation functions, such as sum and count, are often used to pre-process data before data mining techniques are applied to relational databases. It is often implicitly assumed that aggregated (or summarized) data are less likely to lead to privacy violations during data mining. This paper investigates this claim within the relational database domain. We introduce the PBIRD (Privacy Breach Investigation in Relational Databases) methodology. Our experimental results show that aggregation potentially introduces new privacy violations: the potentially harmful attributes obtained with aggregation are often different from those obtained from non-aggregated databases. This indicates that, even when privacy is enforced on non-aggregated data, it is not automatically enforced on the corresponding aggregated data. Consequently, special care should be taken during model building to fully enforce privacy when the data are aggregated.
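
To make the abstract's central point concrete, here is a minimal sketch of the kind of pre-processing it refers to: a one-to-many relation is collapsed into one row per entity using count and sum. It is not the PBIRD methodology, which this record does not describe. The tables, the column names (`txn_count`, `txn_sum`, `sensitive`) and the toy values are all hypothetical, chosen only to show how an attribute that exists solely after aggregation could end up tracking a sensitive one.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): one row per customer,
# and many transaction rows per customer.
customers = {
    1: {"sensitive": "yes"},
    2: {"sensitive": "no"},
    3: {"sensitive": "yes"},
}
transactions = [(1, 120.0), (1, 80.0), (1, 50.0), (2, 20.0), (3, 200.0), (3, 40.0)]


def aggregate(txns):
    """Collapse the one-to-many relation into a count and a sum per customer."""
    agg = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for cust_id, amount in txns:
        agg[cust_id]["count"] += 1
        agg[cust_id]["sum"] += amount
    return dict(agg)


def flatten(custs, agg):
    """Join the aggregated attributes onto the customer table, one row per customer."""
    rows = []
    for cust_id, attrs in sorted(custs.items()):
        a = agg.get(cust_id, {"count": 0, "sum": 0.0})
        rows.append({
            "id": cust_id,
            "txn_count": a["count"],
            "txn_sum": a["sum"],
            "sensitive": attrs["sensitive"],
        })
    return rows


rows = flatten(customers, aggregate(transactions))

# In this toy data the aggregated attribute txn_count separates the sensitive
# classes exactly, although no single transaction row carries that information.
leaks = all((r["txn_count"] >= 2) == (r["sensitive"] == "yes") for r in rows)
```

In this contrived example `txn_count` exists only in the aggregated view, so a privacy check run on the raw tables alone would never examine it. This is the situation the abstract warns about.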