Overview of Anonymization with Multi-Dimensional Sensitivity-based Anonymization

Keywords: Encryption; Big data; Cloud computing; Data privacy; Anonymization

Introduction:

Over the last decade, the growth of internet use and web-based applications has made data security more important than ever. Big data offers many opportunities to advance science by improving health care, promoting economic growth, and reforming the education system. The effectiveness of deep learning models has recently come to depend on model size and the training data set. Deep learning is also used extensively to learn non-linear features from complex data, and it is widely applied in image recognition, feature extraction, classification, and prediction. Collecting heterogeneous data is more convenient than ever [1]. These opportunities, however, come with challenges related to data security and privacy: vulnerabilities raise major concerns in security-sensitive environments, and private data can leak through network attacks or eavesdropping. Encryption algorithms play an essential role in information security systems, and many cryptosystems have been developed and are widely used. Even so, an adversary can still trigger attacks that interrupt the analytics process, cause endless loops, and affect the user end. To counter such adversarial attacks, several privacy-preserving techniques based on data anonymization have been proposed:

K-Anonymity

L-Diversity

T-Closeness

Randomization

Data Distribution

Multi-Dimensional Sensitivity-Based Anonymization (MDSBA)

In this report we cover the concept of MDSBA, its methodology, and related work, along with another popular anonymization method, k-anonymity [2]. To make the discussion easier to follow, we first review the related work briefly and then discuss the benefits and drawbacks of MDSBA. K-anonymity, the best-known data anonymization method, was proposed for conventional data. It is a one-dimensional method: it distorts the anonymized data and reduces the information gained. To address this, researchers proposed further anonymization methods such as ℓ-diversity [3] and (X, Y)-anonymity [4]. These techniques still do not resolve the underlying problem, which is that they are one-dimensional while data is arranged in a multi-dimensional pattern. Multi-dimensional LKC-privacy techniques were later proposed to overcome the one-dimensional limitation. This approach, also known as Multi-Dimensional Top-Down anonymization, specializes data based on the best scores [5]. Big data, however, follows different procedures and concepts: it operates in a parallel distributed environment, and MapReduce is one of the techniques that affects scalability and performance. Two-phase Multi-Dimensional Top-Down anonymization breaks the large data set into small chunks, which degrades the anonymized information and causes information loss. Each chunk also requires n iterations to find the best score, and the iterations lock nodes until the process ends, which violates the parallel computing principle. The Multi-Dimensional Sensitivity-Based Anonymization (MDSBA) method [6] instead provides bottom-up anonymization distributed as a parallel process, and it reduces the MapReduce operations required. According to the authors' experiments and comparison of results, the method gives a data owner multiple levels of anonymization for users with multiple access levels.
The method also merges anonymization into one domain. MDSBA enforces secure computation over users' access and running processes because it is embedded in a role-based access control (RBAC) model. RBAC is chosen over mandatory access control (MAC) because of its scalability and flexibility in managing user-level access and control. MDSBA uses four quasi-identifiers for data anonymization and splits the data vertically into different groups, which helps protect users from attacks that exploit contextual familiarity. The method achieves its best scores only on large-scale data, and it cannot be used for streaming data. MDSBA aims to define the privacy method and masking pattern for each access level.
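
For readers who have not met k-anonymity before, the following minimal Python sketch shows what the property means in practice: every combination of quasi-identifier values must be shared by at least k records. The function name is_k_anonymous, the sample records, and the attribute names (age, zip, disease) are assumptions made for illustration only; they do not come from [2] or [6].

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in >= k records.

    Hypothetical helper for illustration; not taken from the cited papers.
    """
    # Count how many records fall into each equivalence class,
    # i.e. each distinct tuple of quasi-identifier values.
    classes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    # An empty table trivially satisfies the property.
    return all(count >= k for count in classes.values())


# Illustrative, already-generalized records (assumed data).
records = [
    {"age": "30-39", "zip": "250**", "disease": "flu"},
    {"age": "30-39", "zip": "250**", "disease": "asthma"},
    {"age": "40-49", "zip": "251**", "disease": "flu"},
    {"age": "40-49", "zip": "251**", "disease": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # True: each class has 2 records
print(is_k_anonymous(records, ["age", "zip"], 3))  # False: no class has 3 records
```

The check only looks at the quasi-identifiers; the sensitive attribute (disease here) plays no role, which is one reason refinements such as ℓ-diversity were proposed.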

The MDSBA algorithm is as follows [6]:

1. Data owner determines the object Q-IDs, and the object sensitive attributes S.

2. Data owner determines the Obsolescence object value, aging participation percentage, and object age.

3. System determines the sensitivity level of S, based on the user access-level.

4. System groups the records by equivalent sensitive value.

5. System applies MDSBA to find the best-anonymised Q-IDs.

6. System applies MDSBA to determine the appropriate masking pattern.

7. The RBAC protects the anonymization processes.

The definitions used by MDSBA are given below:

Definition 1: Sensitivity level (ψ) implies a scale of data anonymization prominence, so the anonymized data T carries multiple levels of distortion.

Definition 2: k-anonymity is the maximum number of equivalent records of the ownership level k̄. Hence, k̄ = k - i, where i ∈ ℤ, i ∈ {0, …, k-1}, and k̄ ≤ k.
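
As a small worked example of Definition 2, the sketch below enumerates the values that k̄ = k - i can take as i runs from 0 to k-1. The function name ownership_levels is hypothetical, and reading the definition as "one k̄ value per i" is an assumption about how the formula is meant to be used.

```python
def ownership_levels(k):
    """List the values k_bar = k - i for i = 0, ..., k - 1.

    Hypothetical helper illustrating Definition 2; not code from [6].
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # i = 0 gives k_bar = k (the maximum); i = k - 1 gives k_bar = 1.
    return [k - i for i in range(k)]


print(ownership_levels(4))  # [4, 3, 2, 1]
```

Every value in the list satisfies k̄ ≤ k, matching the constraint stated in the definition.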

To make this easier to follow, we outline the algorithm. MDSBA involves three steps: grouping, calculating sensitivity, and applying the proper masking and specialization pattern. In Fig. 1, the authors explain these steps and their functionality. In the initial step, records are aggregated by equivalent sensitive values, and each group joins a separate domain in G, where G = {G0, G1, G2, …, Gn} and each domain Gi represents exactly one sensitive value.
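
The grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name group_by_sensitive_value, the record layout, and the sample data are hypothetical, and the sketch runs on a single machine rather than in the distributed MapReduce setting that [6] targets.

```python
from collections import defaultdict


def group_by_sensitive_value(records, sensitive_attr):
    """Split records into domains so that each domain holds one sensitive value.

    Hypothetical helper; a single-machine stand-in for the grouping step.
    """
    domains = defaultdict(list)
    for record in records:
        # Records sharing the same sensitive value land in the same domain Gi.
        domains[record[sensitive_attr]].append(record)
    return dict(domains)


# Illustrative records (assumed data); "disease" is the sensitive attribute S.
records = [
    {"age": 34, "zip": "25001", "disease": "flu"},
    {"age": 41, "zip": "25103", "disease": "asthma"},
    {"age": 37, "zip": "25010", "disease": "flu"},
]

G = group_by_sensitive_value(records, "disease")
for sensitive_value, domain in G.items():
    print(sensitive_value, len(domain))
```

Each key of G corresponds to one domain Gi, so later steps can compute sensitivity and choose a masking pattern per domain.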

Conclusion

In this article, we focused on privacy in big data and anonymization, and we briefly discussed k-anonymity. We also compared MDSBA with a popular multi-dimensional privacy method known as MDTDS. MDSBA was examined for scalable multi-access levels in conventional data.

References:

[1] https://mc.ai/overview-on-advanced-encryption-standard-aes/

[2] K. G. Shin, X. Ju, Z. Chen, and X. Hu, “Privacy protection for users of location-based services,” IEEE Wireless Communications, vol. 19, pp. 30–39, 2012.

[3] P. Russom, “Big data analytics,” TDWI Best Practices Report, Fourth Quarter, pp. 1–35, 2011.

[4] P. Samarati, “Protecting respondents identities in microdata release,” Knowledge and Data Engineering, IEEE Transactions on, vol. 13, pp. 1010–1027, 2001.

[5] L. T. Y. Xuyun Zhang, Chang Liu, Jinjun Chen, “A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud,” IEEE, 2014.

[6] Al-Zobbi, Mohammed, Seyed Shahrestani, and Chun Ruan. “Improving MapReduce privacy by implementing multi-dimensional sensitivity-based anonymization.” Journal of Big Data 4.1 (2017): 45.

Researcher at NBIC Shandong Provincial Key Lab, China