Preserving individual privacy is one of the major issues in the context of Big Data, since handling huge volumes of data may contribute to the disclosure of sensitive or personally identifiable information. In fact, even when data is anonymized, there is a risk of re-identification through privacy attacks. This paper presents a re-identification risk-based anonymization framework for big data analytics platforms. The framework is based on anonymization policies and applies anonymization techniques and models in two stages: during the ETL process and before exporting the statistical results of data analytics. The second stage evaluates the re-identification risk of the data and increases the anonymity level when necessary to reduce that risk. Although the framework is generic, the implementation reported in this work was integrated into Ophidia as a case study. Privacy attacks were performed to check the effectiveness of the re-identification process. Results are promising, showing a low probability of re-identification in two different scenarios.
Where: EDCC 2018, European Dependable Computing Conference, September 2018, Iasi (Romania)