Secure Multi Party Computation Protocols for Collaborative Data Publishing

DOI: 10.17577/IJERTCONV2IS15014


K. Sekar

Associate Professor, Dept. of CSE

S. V. Engineering College for Women, Tirupathi, Chittoor, Andhra Pradesh, India

N. Harish

M. Tech Student in CSE

S. V. College of Engineering, Tirupathi, Chittoor, Andhra Pradesh, India

K. Renuka

M. Tech Student in CSE

S. V. College of Engineering, Tirupathi, Chittoor, Andhra Pradesh, India

Abstract: Cooperatively collecting scattered information from multiple servers while keeping sensitive information from attackers is becoming a challenging task in distributed information systems. We consider an insider attack in which data providers collude and combine their data subsets with those of other data providers. We address this new threat with m-privacy, which requires that a given privacy constraint be satisfied against any group of up to m colluding data providers, together with heuristic algorithms that efficiently check m-privacy and a provider-aware anonymization algorithm that achieves both high utility and m-privacy of the anonymized data. Our proposed secure multi-party computation protocols verify the data providers' privacy through the provider-aware anonymization algorithm.

Keywords: Privacy, security, integrity, protection, distributed databases

1. INTRODUCTION

Privacy preservation is mainly used to prevent information disclosure. There are two types of information disclosure: identity disclosure and attribute disclosure. Identity disclosure occurs when an individual is linked to a particular record in the released table, so that an attacker can easily identify that individual from the release. Attribute disclosure occurs when new information about some individuals is revealed. Privacy preservation is different from conventional data security: privacy-preserving techniques are mainly used to reduce the leakage of information about a particular individual while the data are shared and released to the public.

Anonymization is carried out to transform the data before they are published to the public. There are two ways to achieve privacy: the first is to release limited data, so that personal information cannot be identified, and the second is to pre-compute heuristics and release them instead of any data.

Various anonymization techniques are used to maintain privacy and high data utility: generalization, suppression, anatomization, permutation and perturbation [1]. Most privacy-preserving methods use generalization techniques. The methods used for privacy-preserving data mining are of two kinds: statistical methods, which include randomization, swapping, micro-aggregation and synthetic data generation, and group-based anonymization methods, which include k-anonymity, l-diversity and t-closeness.

The attributes in a table are classified as key attributes, quasi-identifiers (QI) and sensitive attributes. The key attributes are identifiers, which must be removed before publishing, since an attacker could easily identify a particular individual's details from them. For example, Table 2 is a student table released by a college after removing the identifiers.

2. K-ANONYMITY METHOD

The k-anonymity model requires that within any equivalence class of the microdata there are at least k records: each tuple in the published table must be indistinguishable from at least k-1 other tuples. The idea of k-anonymity is to reduce the granularity of representation of the data in such a way that a given record cannot be distinguished from at least (k-1) other records [2]. Table 1 lists students' details: Department, Age and Course.

Table 1. Students Micro Data

S.No  Department  Age  Course
1     ME          20   Mechanics
2     MME         21   Mechanics
3     ME          20   Mechanics
4     CHE         22   Algorithm
5     CHE         23   Psychology
6     CHM         22   Real Analysis
7     CSE         26   Algorithm
8     CSE         25   Algorithm
9     CSE         26   Mechanics

Table 2 provides three equivalence classes; 3-anonymity is achieved by generalization.

Table 2. 3-Anonymous Students Micro Data

S.No  Department  Age    Course
1     M*          20-21  Mechanics
2     M*          20-21  Mechanics
3     M*          20-21  Mechanics
4     CH*         22-23  Algorithm
5     CH*         22-23  Psychology
6     CH*         22-23  Real Analysis
7     CS*         25-26  Algorithm
8     CS*         25-26  Algorithm
9     CS*         25-26  Mechanics
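To make the k-anonymity check concrete, the following is a minimal sketch (not from the original paper; the dictionary encoding of Table 2 and the helper name is_k_anonymous are illustrative) of grouping records by their quasi-identifier values and verifying that each group has at least k members:

    from collections import Counter

    def is_k_anonymous(records, qi_attrs, k):
        # Count how often each combination of quasi-identifier
        # values occurs; every group must have at least k records.
        groups = Counter(tuple(r[a] for a in qi_attrs) for r in records)
        return all(count >= k for count in groups.values())

    # Table 2 from above: Department and Age are the quasi-identifiers,
    # Course is the sensitive attribute and is left untouched.
    table2 = [
        {"Department": "M*",  "Age": "20-21", "Course": "Mechanics"},
        {"Department": "M*",  "Age": "20-21", "Course": "Mechanics"},
        {"Department": "M*",  "Age": "20-21", "Course": "Mechanics"},
        {"Department": "CH*", "Age": "22-23", "Course": "Algorithm"},
        {"Department": "CH*", "Age": "22-23", "Course": "Psychology"},
        {"Department": "CH*", "Age": "22-23", "Course": "Real Analysis"},
        {"Department": "CS*", "Age": "25-26", "Course": "Algorithm"},
        {"Department": "CS*", "Age": "25-26", "Course": "Algorithm"},
        {"Department": "CS*", "Age": "25-26", "Course": "Mechanics"},
    ]
    print(is_k_anonymous(table2, ["Department", "Age"], k=3))  # True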

K-anonymity cannot provide a safeguard against attribute disclosure. Two types of attacks on k-anonymity are the homogeneity attack and the background-knowledge attack. In Table 2, the first equivalence class has the course Mechanics for every student within the age range 20-21, so the sensitive value is revealed even though the records are indistinguishable. This type of attack is said to be a homogeneity attack.

In the same way, if a student in Table 2 is known to be studying CSE and known not to be interested in mechanics, then it is easy to narrow down that student's course within the third equivalence class with the help of background knowledge about that person. This type of attack is considered a background-knowledge attack.

3. L-DIVERSITY METHOD

l-diversity is used to overcome the drawbacks of k-anonymity; it puts a constraint on the minimum number of distinct values seen within an equivalence class for any sensitive attribute.

Definition 1 (The l-diversity principle): An equivalence class is said to have l-diversity if there are at least l well-represented values for the sensitive attribute. A table is said to have l-diversity if every equivalence class of the table has l-diversity [4].

A given table is said to be l-diversified if every equivalence class in the table contains at least l well-represented sensitive attribute values. l-diversity guarantees that the sensitive attribute (SA) value of a particular person cannot be identified unless the adversary has enough background knowledge to eliminate l-1 SA values in the person's equivalence class (EC). Several measures have been proposed to quantify the meaning of "well-represented" in l-diversity, including entropy l-diversity [4], recursive (c, l)-diversity [4] and simple l-diversity.

Two types of attacks are faced in l-diversity: the skewness attack and the similarity attack. Attribute disclosure cannot be fully prevented by l-diversity, but identity disclosure is successfully handled.
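A corresponding sketch for simple (distinct) l-diversity, again illustrative only and reusing the table encoding from the k-anonymity example above:

    from collections import defaultdict

    def is_l_diverse(records, qi_attrs, sensitive_attr, l):
        # Collect the distinct sensitive values seen in each
        # equivalence class; every class needs at least l of them.
        classes = defaultdict(set)
        for r in records:
            classes[tuple(r[a] for a in qi_attrs)].add(r[sensitive_attr])
        return all(len(values) >= l for values in classes.values())

    # On Table 2, the first equivalence class (M*, 20-21) has only one
    # distinct Course value, so the table is not even 2-diverse:
    # is_l_diverse(table2, ["Department", "Age"], "Course", 2) -> False

This matches the homogeneity attack described above: the 3-anonymous table still fails the diversity check.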

4. DISTRIBUTED DATA PUBLISHING


The data are gathered from multiple providers and collaborated [4]. Two processes can be carried out: either the data are aggregated first and then anonymized, or the data are anonymized first and then aggregated.

Collaborative data publishing can be carried out with the help of a trusted third party (TTP) or secure multi-party computation (SMC) protocols, which guarantee that information about any particular individual is not disclosed anywhere; privacy is maintained with the help of SMC and data utility remains high. The data providers are assumed to be semi-honest, so a set of protocols is agreed upon and all data providers follow them.


          Figure 1(a). Anonymize-and-aggregate

In Figure 1(a), the data providers hold the data sets T1, T2, T3 and T4; each provider anonymizes its own data first, the anonymized parts are then aggregated into T*, and T* is provided to the final user.


          Figure 1(b). Aggregate-and-anonymize

In Figure 1(b), the whole data set is collected from the data providers, aggregated by the trusted third party or via SMC, and then anonymized. In both methods two kinds of attacks can occur: insider attacks and outsider attacks. If the attack is made by the data providers themselves, it is treated as an insider attack; if it is carried out by an outsider, it is an outsider attack. The main focus here is the insider attack.

Figure 2. Collaborating the databases of four different providers (T1 through T4 are anonymized into T*)

While collaborating data from different data providers, three kinds of algorithms are used to maintain privacy:

• the notion of m-privacy,

• heuristic m-privacy verification algorithms, and

• the data provider-aware anonymization algorithm.

1. Notion of m-privacy:

The main aim of the notion of m-privacy is to protect the data of individuals in the anonymized table from adversaries, where an m-adversary is a coalition of a data user with up to m data providers cooperating to breach the privacy of anonymized records. A privacy constraint C is set, and the anonymized data are verified against C; m-privacy is defined with respect to the privacy constraint C.

C preserves truthfulness at the record level, and privacy is maintained for duplicate records too. For example, if the same record is provided by two different hospitals, then the particular individual's details could be identified with the help of background knowledge, but this can be prevented with the help of the constraint C.
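A direct, brute-force reading of this definition can be sketched as follows (illustrative only: the provider field on each record and the satisfies_constraint callable standing in for C are assumptions, not the paper's API):

    from itertools import combinations

    def is_m_private(records, providers, m, satisfies_constraint):
        # For every coalition of up to m providers, remove the records
        # the coalition contributed itself (its insider background
        # knowledge); C must still hold on what remains visible to it.
        for size in range(m + 1):  # 0-adversary up to m-adversaries
            for coalition in combinations(providers, size):
                visible = [r for r in records
                           if r["provider"] not in coalition]
                if not satisfies_constraint(visible):
                    return False
        return True

The empty coalition (size 0) corresponds to an external attacker with no provider data, so ordinary anonymity against outsiders is checked as a special case.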

Monotonicity of privacy constraints is defined for a single equivalence group of records, i.e., a group of records whose QI attributes share the same generalized values.

Definition 2.2 (Generalization monotonicity of a privacy constraint [3], [6]): A privacy constraint C is generalization monotonic if and only if, for any two equivalence groups A1(T) and A1(T*) that satisfy C, their union satisfies C as well: if C(A1(T)) = true and C(A1(T*)) = true, then C(A1(T) ∪ A1(T*)) = true.

2. Heuristic m-privacy verification algorithms:

The heuristic algorithms check m-privacy efficiently with respect to an EG monotonic constraint; they can then be modified to check m-privacy with respect to a non-EG monotonic constraint. The main aim of the heuristics for EG monotonic privacy constraints is to search the space of adversaries with effective pruning, so that not all m-adversaries need to be checked.

Two pruning strategies are used: downward pruning and upward pruning. In downward pruning, if a coalition is unable to breach privacy, then none of its sub-coalitions needs to be checked, since a sub-coalition has less knowledge and cannot breach privacy either. In upward pruning, if a coalition breaches privacy, then every super-coalition of it breaches privacy as well, so the check can stop immediately.

The algorithms used here are the top-down algorithm and the bottom-up algorithm. The top-down algorithm uses downward pruning: it checks all (n-1)-adversaries first and then smaller coalitions down to the m-adversaries. The bottom-up algorithm uses upward pruning: it checks from the 0-adversary up to the m-adversaries. These algorithms reduce the number of m-adversary checks, so the verification is carried out faster.
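The top-down strategy can be sketched as below (a sketch under the EG monotonicity assumption; records carry an illustrative provider field and satisfies_constraint stands in for C, as in the previous sketch):

    from itertools import combinations

    def top_down_m_privacy(records, providers, m, satisfies_constraint):
        # Records still visible to a coalition are those it did not
        # contribute itself; C must hold on what the coalition sees.
        def private_against(coalition):
            visible = [r for r in records
                       if r["provider"] not in coalition]
            return satisfies_constraint(visible)

        safe = []  # coalitions already shown unable to breach privacy
        n = len(providers)
        for size in range(n - 1, -1, -1):  # largest coalitions first
            for coalition in map(set, combinations(providers, size)):
                if any(coalition <= s for s in safe):
                    continue  # downward pruning: a safe super-coalition exists
                if private_against(coalition):
                    safe.append(coalition)  # all its sub-coalitions are safe too
                elif size <= m:
                    return False  # an m-adversary breaches the constraint
        return True

Because a larger coalition always knows at least as much as any of its subsets, proving one large coalition harmless prunes the whole sub-lattice beneath it, which is where the speed-up comes from.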

        3. Data provider-aware anonymization algorithm:

The data provider-aware anonymization algorithm uses adaptive m-privacy checking strategies to ensure high utility and m-privacy of the anonymized data efficiently. The algorithm is applied under different conditions, depending on the data providers. The pruning strategies are selected according to the privacy and data utility requirements, and the most suitable algorithm is chosen; mostly the top-down algorithm with downward pruning is used, which reduces the number of m-adversary checks. These are the three algorithms used in the collaboration process to maintain privacy.

The algorithms above run with the help of a Trusted Third Party (TTP). The third party might only be semi-honest and cannot be fully trusted; to overcome this, SMC protocols are used. The secure protocols verify privacy with respect to the constraint C. The SMC protocols are based on Shamir's secret sharing [8], encryption, and other secure schemes, and they use the bottom-up approach.

A TTP can identify duplicate records coming from the data providers, but the SMC protocols cannot detect duplicate records. SMC [5] is mainly used to control the insider attacker. SMC uses two computation paradigms: the ideal model and the real model.
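As an illustration of the secret-sharing building block [8], the following minimal sketch splits a secret into n shares over a prime field so that any t of them reconstruct it (the prime and parameters are illustrative, not the protocol actually specified in the paper):

    import random

    PRIME = 2**61 - 1  # a Mersenne prime, large enough for small secrets

    def make_shares(secret, t, n):
        # A random degree-(t-1) polynomial with f(0) = secret;
        # the shares are its values at x = 1..n.
        coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
        def f(x):
            return sum(c * pow(x, i, PRIME)
                       for i, c in enumerate(coeffs)) % PRIME
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        # Lagrange interpolation at x = 0 over the prime field.
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % PRIME
                    den = den * (xi - xj) % PRIME
            secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
        return secret

    shares = make_shares(secret=42, t=3, n=5)
    print(reconstruct(shares[:3]))  # 42, recoverable from any 3 of the 5 shares

Fewer than t shares reveal nothing about the secret, which is what lets mutually distrusting providers compute on shared values without any one of them seeing the others' data.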

5. M-ANONYMIZER

The process carried out in the m-Anonymizer is explained with the help of a flow chart (Figure 3). The following steps are carried out:

• Data from m providers are collected and collaborated.

• The next step is to identify the split point; the data are split horizontally as long as privacy is maintained.

• After splitting, the privacy constraint C is selected in such a way that it ensures privacy for all individual data; m-privacy is checked with respect to the constraint C.

• The next step is to check whether the partition is splittable again; if it is, the score is computed again and the process is repeated from step 2 (see the sketch after this list).

• The next step is finding the privacy fitness score, which quantifies the level of privacy fulfillment of the group, and the most suitable algorithm is selected.

• If the partition is not splittable, the final anonymized table is produced, which maintains privacy and data utility.
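The recursive structure of this loop can be sketched as follows (a skeleton only: find_best_split, is_m_private and generalize are placeholder callables for the split-point scoring, the m-privacy verification and the generalization steps, not the authors' implementation):

    def m_anonymize(partition, is_m_private, find_best_split, generalize):
        # Try to split the partition; keep a split only if both halves
        # remain m-private, otherwise generalize the whole partition.
        split = find_best_split(partition)  # split-point scoring and selection
        if split is not None:
            left, right = split
            if is_m_private(left) and is_m_private(right):
                return (m_anonymize(left, is_m_private,
                                    find_best_split, generalize)
                        + m_anonymize(right, is_m_private,
                                      find_best_split, generalize))
        return [generalize(partition)]  # not splittable: generalize into T*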

Figure 3. m-Anonymizer flow chart (the data T1, T2, ..., Tn pass through split-point identification, split-point scoring and selection, and m-privacy verification against the privacy constraints C; splittable partitions are split again, and partitions that are no longer splittable are generalized into the final table T*)

1) m-Privacy verification runtime for different algorithms vs. m: In this experiment, we compare the m-privacy verification heuristics against different attack powers and different numbers of data providers. Fig. 4(a) shows the computation time with varying m and nG for all heuristics, and Fig. 4(c) shows the estimated computation time with varying m for both approaches; the approaches have comparable computation times with negligible differences.

Figure 4(c). Computation time vs. m and the number of providers

2) m-Anonymizer runtime and query error for different anonymizers vs. number of data records:

Figure 4(b). Average privacy fitness score per provider = 2.3

6. CONCLUSION

While gathering data from different data providers, a new type of attack was identified: the insider attack. It occurs between the data providers when they collude to compromise privacy. We use SMC (secure multi-party computation) protocols, and within them three algorithms to preserve privacy among providers: the notion of m-privacy, the heuristic verification algorithms and the adaptive provider-aware anonymization algorithm. Our experiments evaluated these three algorithms and showed that m-privacy can be verified and enforced efficiently.

REFERENCES

1. S. Goryczka, L. Xiong, and B. C. M. Fung, "m-Privacy for collaborative data publishing," IEEE Transactions on Knowledge and Data Engineering, 2013, vol. 18, no. 10.

  2. B. Karthikeyan, G. Manikandan, and V. Vaithiyanathan, "A fuzzy based approach for privacy preserving clustering," Journal of Theoretical and Applied Information Technology, 2011, vol. 32, no. 2.

  3. P. Samarati and L. Sweeney, "Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression," in Proc. IEEE Symposium on Security and Privacy, 1998.

  4. A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, "l-diversity: Privacy beyond k-anonymity," in Proc. IEEE International Conference on Data Engineering, 2006, p. 24.

  5. N. Mohammed, B. C. M. Fung, P. C. K. Hung, and C. Lee, "Centralized and distributed anonymization for high-dimensional healthcare data," ACM Transactions on Knowledge Discovery from Data, vol. 4, no. 4, pp. 18:1-18:33, October 2010.

  6. Y. Lindell and B. Pinkas, "Secure multiparty computation for privacy-preserving data mining," The Journal of Privacy and Confidentiality, vol. 1, no. 1, pp. 59-98, 2009.

  7. N. Li and T. Li, "t-Closeness: Privacy beyond k-anonymity and l-diversity," in Proc. ICDE, 2007.

  8. A. Shamir, "How to share a secret," Communications of the ACM, vol. 22, no. 11, pp. 612-613, November 1979.
