Reducing re-identification risk in anonymized big data analytics – an improved framework from ATMOSPHERE

ATMOSPHERE (Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring, Hybrid Ecosystem for REsilient Cloud Computing), a European and Brazilian Research and Innovation Action cloud computing project, presented a new framework that increases the privacy protection of anonymized data handled by data analytics platforms at EDCC 2018 (European Dependable Computing Conference), held last September in Romania.

The framework lowers the probability of data re-identification, reducing the potential harm of future privacy attacks.

Since 1994, EDCC has been a leading venue for presenting and discussing the latest research, industrial practice and innovations in dependable and secure computing. At the conference, ATMOSPHERE joined researchers and practitioners presenting and discussing the latest results on theory, techniques, systems and tools for the design, validation, operation and evaluation of dependable and secure computing systems.

Increasing privacy protection of anonymized data with low impact on data utility

Big data analytics platforms process large amounts of data about individuals. When these platforms are used, sensitive data may be released to malicious actors, which raises privacy concerns.

A widely used approach to privacy preservation in big data analytics is data anonymization: transforming data that will be used or published in a way that prevents key identifying information from being recovered.
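To make the idea concrete, here is a minimal sketch of one common anonymization technique, generalization of quasi-identifiers, paired with a simple per-record re-identification risk measure: 1 divided by the size of the record's equivalence class (often called the prosecutor model). The records, the choice of age and ZIP code as quasi-identifiers, and the functions `generalize` and `reidentification_risk` are hypothetical illustrations; the announcement does not say which techniques or risk model the ATMOSPHERE framework actually uses.

```python
from collections import Counter

# Hypothetical records: (age, zip_code, diagnosis). Age and ZIP code act as
# quasi-identifiers; diagnosis is the sensitive attribute.
RECORDS = [
    (34, "40210", "flu"),
    (36, "40215", "asthma"),
    (38, "40219", "flu"),
    (52, "40310", "diabetes"),
    (57, "40318", "flu"),
    (71, "40401", "asthma"),
]


def generalize(age, zip_code):
    """Coarsen quasi-identifiers: 10-year age band and 3-digit ZIP prefix."""
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", zip_code[:3] + "**")


def reidentification_risk(records):
    """Prosecutor-model risk per record: 1 / size of its equivalence class."""
    keys = [generalize(age, zip_code) for age, zip_code, _ in records]
    class_sizes = Counter(keys)
    return [1 / class_sizes[key] for key in keys]


risks = reidentification_risk(RECORDS)
print(max(risks))  # → 1.0
```

Here the 71-year-old's record remains unique after generalization (risk 1.0), so a risk-based approach would flag it for coarser generalization or suppression, at some cost in data utility.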

However, selecting and applying anonymization techniques is not straightforward: they can reduce data utility, which may lead to incorrect results and conclusions when the data is mined.

ATMOSPHERE's paper presents a new privacy protection framework for big data analytics platforms, based on anonymization and re-identification risk.

The paper “A Re-identification Risk-based Anonymization Framework for Data Analytics Platforms”, written by ATMOSPHERE researchers and presented at EDCC 2018, strikes a better balance between preserving privacy and preserving the usefulness of the data these platforms handle.

The solution proposed by ATMOSPHERE helps data scientists and privacy analysts assess the impact of data anonymization, so they can make an informed choice about applying these techniques at two stages: during the ETL (Extract, Transform, Load) process and before the statistical results of the data analytics are exported.
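The second stage, checking results before export, can be illustrated with small-cell suppression, a common statistical disclosure-control technique in which aggregate counts below a threshold are withheld because they may single out individuals. This is an assumed stand-in: the announcement does not describe what the framework does at this stage, and both the threshold `MIN_CELL_SIZE` and the function `suppress_small_cells` are hypothetical.

```python
from collections import Counter

# Hypothetical disclosure threshold: counts below this are not exported.
MIN_CELL_SIZE = 3


def suppress_small_cells(counts, min_size=MIN_CELL_SIZE):
    """Replace aggregate counts below min_size with None before export."""
    return {cell: (n if n >= min_size else None) for cell, n in counts.items()}


# Hypothetical analytics result: number of patient visits per region.
visits = ["north", "north", "north", "south", "south", "east", "north"]
table = suppress_small_cells(Counter(visits))
print(table)  # {'north': 4, 'south': None, 'east': None}
```

Raising the threshold lowers the chance that a published count points to a specific person, but hides more of the result, which mirrors the privacy versus utility trade-off discussed above.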

The new component was integrated into Ophidia, a data analytics platform for managing multidimensional, heterogeneous datasets. The researchers performed linkage attacks to evaluate the probability of de-anonymization, and the results showed that users could increase the privacy protection of individuals' data with less impact on data utility.
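A linkage attack joins a released dataset with publicly available auxiliary data on shared quasi-identifiers; any released record that matches exactly one named person is re-identified. The sketch below shows the general idea with hypothetical data and a hypothetical `linkage_attack` function; it is not the experimental setup used in the paper's evaluation on Ophidia.

```python
from collections import defaultdict

# Hypothetical released records: direct identifiers removed, quasi-identifiers kept.
released = [
    {"birth_year": 1984, "zip": "40210", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1984, "zip": "40210", "sex": "M", "diagnosis": "flu"},
    {"birth_year": 1990, "zip": "40318", "sex": "F", "diagnosis": "diabetes"},
]

# Hypothetical public auxiliary data that includes names.
auxiliary = [
    {"name": "Alice", "birth_year": 1984, "zip": "40210", "sex": "F"},
    {"name": "Bob", "birth_year": 1984, "zip": "40210", "sex": "M"},
    {"name": "Carol", "birth_year": 1990, "zip": "40318", "sex": "F"},
    {"name": "Dana", "birth_year": 1990, "zip": "40318", "sex": "F"},
]

QUASI_IDENTIFIERS = ("birth_year", "zip", "sex")


def linkage_attack(released, auxiliary, qids=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for released records matching exactly one person."""
    # Index auxiliary people by their quasi-identifier values.
    index = defaultdict(list)
    for person in auxiliary:
        index[tuple(person[q] for q in qids)].append(person["name"])
    matches = []
    for record in released:
        candidates = index.get(tuple(record[q] for q in qids), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], record["diagnosis"]))
    return matches


hits = linkage_attack(released, auxiliary)
print(hits)  # [('Alice', 'asthma'), ('Bob', 'flu')]
print(f"re-identification rate: {len(hits) / len(released):.2f}")  # 0.67
```

The third record survives because two auxiliary people share its quasi-identifiers; generalizing quasi-identifiers enlarges such groups, which is why the attack success rate is a natural way to measure the protection an anonymization setting provides.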

Read the full paper to learn more.