A Survival Study on Privacy Preservation of Data Sharing with Optimal Side Effects

P. Tamil Selvan; Dr. S. Veni

doi:10.17577/IJERTV4IS061036

Volume 04, Issue 06 (June 2015)

Open Access
Authors : P. Tamil Selvan, Dr. S. Veni
Paper ID : IJERTV4IS061036
Volume & Issue : Volume 04, Issue 06 (June 2015)
DOI : http://dx.doi.org/10.17577/IJERTV4IS061036
Published (First Online): 27-06-2015
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

P. Tamil Selvan,

Research Scholar, Department of Computer Science, Karpagam University, Coimbatore, India

Dr. S. Veni,

Research Supervisor, Department of Computer Science, Karpagam University, Coimbatore, India

Abstract – Preserving privacy is essential to the successful delivery of data to users through data mining techniques. Privacy preserving data mining (PPDM) protects sensitive individual data when the data are shared with public users. PPDM reduces privacy threats by hiding sensitive information while still allowing the required information to be mined from databases. Many existing PPDM techniques, such as data sanitization, do not hide the sensitive information, and sensitive item hiding in data sanitization also alters the original database and so affects its authenticity. In this work, association rule mining is used to share data with all users with optimal side effects. Our research work helps to maintain individual privacy for these sensitive attributes.

Keywords: Privacy Preserving Data Mining, Data Sanitization, Optimal Side Effects, Sensitive Attributes.

I. INTRODUCTION

Privacy preserving data mining (PPDM) addresses a primary issue: how to obtain mining results while keeping users' data private. PPDM techniques that hide sensitive items alter the originality of the records. Existing privacy-preservation methods therefore lose information that data mining functions need, and this loss of information counts as a loss of efficiency in privacy preserving data mining. Traditional data mining techniques examine a database to identify potential relations between items. Several applications require protection against the disclosure of private, confidential, or secure data. PPDM techniques minimize privacy threats by hiding sensitive information while still permitting the necessary information to be extracted from databases.

In PPDM, data sanitization is used to hide sensitive information with minimum side effects so that the original database remains reliable. The intuitive method of data sanitization is to remove the sensitive information directly from the data. The key goal in many distributed methods for privacy-preserving data mining is to allow the computation of useful aggregate statistics over the whole data set without compromising the privacy of the individual data sets of the various participants.
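The direct-removal form of sanitization described above can be illustrated with a minimal Python sketch. It is not the procedure of any surveyed paper: the greedy rule (delete the smallest supporting transaction first), the absolute support threshold `min_count`, and all function names are assumptions made for illustration. The sketch hides a sensitive itemset by deleting transactions and then counts how many sensitive itemsets remain frequent, how many non-sensitive frequent itemsets were lost, and how many new frequent itemsets appeared.

```python
from itertools import combinations

def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_count, max_size=2):
    """All itemsets of up to `max_size` items with support >= min_count."""
    items = sorted(set().union(*db))
    found = set()
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            if support(db, frozenset(combo)) >= min_count:
                found.add(frozenset(combo))
    return found

def sanitize_by_deletion(db, sensitive, min_count):
    """Delete supporting transactions until each sensitive itemset is infrequent.

    Assumed greedy rule: remove the smallest supporting transaction first.
    """
    db = list(db)
    for s in sensitive:
        while support(db, s) >= min_count:
            victims = [t for t in db if s <= t]
            db.remove(min(victims, key=len))
    return db

def side_effects(original, sanitized, sensitive, min_count):
    """Count sensitive itemsets still frequent, lost itemsets, and new itemsets."""
    before = frequent_itemsets(original, min_count)
    after = frequent_itemsets(sanitized, min_count)
    sensitive = set(sensitive)
    return {
        "hiding_failure": len(sensitive & after),
        "missing_cost": len((before - sensitive) - after),
        "artificial_cost": len(after - before),
    }

# Hypothetical toy database of six transactions.
db = [frozenset(t) for t in (
    ["a", "b", "c"], ["a", "b"], ["a", "b", "d"],
    ["b", "c"], ["a", "c"], ["c", "d"],
)]
sensitive = [frozenset(["a", "b"])]
clean = sanitize_by_deletion(db, sensitive, min_count=2)
print(side_effects(db, clean, sensitive, min_count=2))
```

With a fixed count threshold, deletion can only lower supports, so no new frequent itemset can appear in this sketch; the third counter is kept only so that all three quantities are reported together.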

Privacy preserving data mining is an active research area in data mining and in statistical databases. In PPDM, data mining algorithms are examined for the side effects they incur in attaining data privacy. Privacy preserving data mining techniques have two concerns. First, sensitive raw data such as identifiers, names, and addresses must be secured from unauthorized access by modifying or removing them from the original database, so that the receiver of the data cannot compromise another person's privacy. Second, sensitive knowledge that data mining algorithms could mine from a database must be eliminated.

Privacy preserving data mining using association rule mining aims to attain quality privacy preservation for distributed data mining with optimal side effects on the original database. It also aims to improve the efficiency of privacy preserving association rule mining through constraint minimization and to provide a privacy preserving mechanism with efficient data utility.

This paper is organized as follows: Section II discusses privacy preserving data mining, Section III presents the study and analysis of existing privacy preserving techniques in data mining, Section IV compares them, and Section V discusses the limitations of privacy preserving data mining techniques.

II. LITERATURE SURVEY

Data mining combines various fields, including machine learning, database systems, data visualization, statistics, and information theory. In [1], the Hiding-Missing-Artificial Utility (HMAU) algorithm is designed to hide sensitive information through itemset removal. However, the HMAU algorithm fails to control the highest frequency in sensitive rules regarding the current sensitive transaction. In [4], the Fast Distributed Mining algorithm provides secure mining of association rules in horizontally distributed databases, using a secure multi-party protocol for computing the union of private subsets together with two secure multi-party algorithms.

Perturbation-based PPDM with Multilevel Trust (MLT-PPDM) [2] offers flexibility by creating perturbed copies of the data for different trust levels; however, the data owner cannot foresee in advance every trust level that may be required. As described in [3], Exact Knowledge Hiding through Database Extension arrives at an optimal solution for hiding sensitive frequent itemsets. It protects sensitive knowledge with minimal effect and is tested on real-world data sets with a minimum threshold. However, exact border-based approaches are typically of lower quality, and an increase in the number of constraints produced is observed.

Privacy preserving mining of association rules from outsourced transaction databases [5] uses an encrypt/decrypt (E/D) module to transform client data before it is sent to the server and to recover the true patterns with their exact support. However, the E/D module assumes an attacker who has no knowledge of the hiding scheme; relaxing this assumption can break the encryption scheme and introduce privacy vulnerabilities. A logical framework [6] is designed to keep information secret in databases that contain nulls. Query answers are implicitly made less informative so that the original content is not exposed. However, the query rewriting method does not handle the combination of nulls and negation.

In [7], Anonymous Publication of Sensitive Transactional Data transforms the binary data into a band matrix by permuting the rows and columns of the original table. Anonymization methods such as generalization and bucketization are intended for privacy preservation; however, these methods fail to offer efficient data utility. Slicing [8] partitions the data both horizontally and vertically and preserves better data utility than generalization, although a trade-off arises in controlling the constant attributes while minimizing the dimensionality.

III. PRIVACY PRESERVATION DATA MINING USING ASSOCIATION RULE MINING

In privacy preserving data mining, association rules are used for examining and predicting customer behavior. With the growing demand for privacy, secure data mining develops techniques that combine privacy and security with effective data sharing. PPDM techniques address the complexity of running data mining algorithms on private data. The key goal in many distributed methods for privacy-preserving data mining is to allow the computation of useful aggregate statistics over the whole data set without compromising the privacy of the individual data sets of the various participants.
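As a minimal sketch of how an aggregate statistic can be computed over several parties' data without any party revealing its own value, the Python example below uses additive secret sharing to obtain a global support count. This is a textbook construction offered for illustration; it is not claimed to be the protocol of any surveyed paper. The modulus, the function names, and the assumption of honest-but-curious, non-colluding parties are all hypothetical.

```python
import secrets

# Assumed public modulus; it must exceed any possible true total.
MODULUS = 2**61 - 1

def make_shares(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(local_counts):
    """Total of all local counts, computed from random-looking shares only."""
    n = len(local_counts)
    # Row i holds the shares created by party i; column j is what party j receives.
    share_matrix = [make_shares(count, n) for count in local_counts]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS

# Hypothetical local support counts of one itemset at three sites.
local_support = [120, 45, 300]
print(secure_sum(local_support))
```

Each individual share is uniformly random, so a party that sees only the shares sent to it learns nothing about another site's count; only the total is revealed. Dividing that total by the combined number of transactions gives the global support of the itemset.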
1. Minimization of Side Effects on Hiding Sensitive Itemsets in Privacy Preserving Data Mining
  
  Data mining is widely used to retrieve and analyze knowledge from large amounts of data. Privacy preserving data mining (PPDM) reduces privacy threats by hiding sensitive information while still permitting the necessary information to be mined from databases. Private information includes personal or business-confidential details such as social security numbers, home addresses, credit card numbers, credit ratings, purchasing behavior, and best-selling commodities. In PPDM, data sanitization is employed to hide sensitive information with minimal side effects so that the original database is preserved.

  Figure 1 Goals and Side Effects of Privacy Preserving (diagram labels: sensitive itemsets, non-sensitive itemsets, goals, side effects)

  In Figure 1, the goal concerns the sensitive itemsets, which sanitization transforms from one state to another; side effects arise when non-sensitive information is changed as well. Hiding private, confidential, or secure information is a major issue in PPDM. The original database is sanitized to hide sensitive information. The straightforward technique of data sanitization deletes the sensitive information directly from the data. The data sanitization process has three side effects: hiding failure, missing cost, and artificial cost.

  The hiding-missing-artificial utility (HMAU) algorithm is introduced to determine which transactions must be removed to hide sensitive itemsets, considering three dimensions: the hiding failure dimension (HFD), the missing itemset dimension (MID), and the artificial itemset dimension (AID). Transactions containing sensitive itemsets are identified, and those with the minimum HMAU values are eliminated from the database. Data sanitization is the general technique for protecting sensitive knowledge from disclosure in PPDM.
2. Privacy-Preserving Mining of Association Rules

  An organization that lacks the resources or capability outsources its mining requirements to a third-party service provider. The association rule mining task is studied within this privacy preserving framework. Privacy preservation is carried out by privacy-preserving data mining (PPDM) techniques within a group of frameworks. The patterns mined from the data are planned to be shared with parties other than the data owner.

  Figure 2 Architecture of Mining-as-a-Service (diagram labels: client side, original TDB D, encrypt/decrypt module, mining query, server, encrypted patterns, patterns of true support, user)
  
  Figure 2 describes mining as a service. The client (owner) encrypts its data using an encrypt/decrypt (E/D) module, which is treated as a black box from its viewpoint. This module is responsible for transforming the input data into an encrypted database. The server performs data mining and sends the (encrypted) patterns to the owner. The encryption scheme has the property that the returned supports are not the true supports. The E/D module recovers the true identity of the returned patterns as well as their true supports.
  
  Slicing manages high-dimensional data without a clear separation of quasi-identifiers (QIs) and sensitive attributes (SAs). Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH) transform data into a band matrix by performing permutations of rows and columns. An efficient linear-time heuristic creates anonymized groups based on the data organization. First, an attack model is designed for the adversary, and the background knowledge the adversary may possess is made precise. The notion of privacy requires that, for each ciphertext item, at least k - 1 distinct cipher items are indistinguishable from it with respect to their supports. Next, an encryption scheme called RobFrugal is designed, which the E/D module uses to transform client data before it is shipped to the server.
  
  Table 4.1 Tabulation for Privacy Level on Privacy Preserving Data Mining with Optimal Side Effects
  
  Third, to allow the E/D module to recover the true patterns and their exact support, a compact structure called a synopsis is created and maintained. Privacy preserving mining also provides the E/D module with an efficient strategy for maintaining the synopsis against updates in the form of appends. Next, a formal analysis based on the attack model verifies that the probability that an individual item, a transaction, or a pattern is broken by the server can be controlled by the owner by setting the anonymity threshold.
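  The outsourcing workflow above (encrypt, mine on the server, recover true supports from a synopsis) can be sketched with a deliberately simplified toy in Python. This is not RobFrugal: the substitution cipher, the padding with single-item fake transactions, the dictionary used as the synopsis, and every function name are assumptions made for illustration, and `random.Random` stands in for a cryptographically secure source.

```python
from collections import Counter
from itertools import combinations
import random

def item_supports(db):
    """Support count of every single item in the database."""
    return Counter(item for t in db for item in t)

def encrypt_db(db, k, rng):
    """Toy E/D module: substitute item names and pad supports in groups of k."""
    supports = item_supports(db)
    items = sorted(supports, key=lambda i: (-supports[i], i))
    tokens = list(range(len(items)))
    rng.shuffle(tokens)
    cipher = {item: f"c{tok}" for item, tok in zip(items, tokens)}
    groups = [items[s:s + k] for s in range(0, len(items), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # avoid a final group smaller than k
    # Synopsis: number of fake occurrences added per item.
    noise = {i: supports[g[0]] - supports[i] for g in groups for i in g}
    encrypted = [frozenset(cipher[i] for i in t) for t in db]
    for item, extra in noise.items():
        encrypted.extend([frozenset([cipher[item]])] * extra)
    rng.shuffle(encrypted)
    return encrypted, cipher, noise

def server_mine(db, min_count, max_size=2):
    """Server side: count itemsets of up to max_size items, keep frequent ones."""
    counts = Counter()
    for t in db:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(t), size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_count}

def decrypt_patterns(patterns, cipher, noise, min_count):
    """Client side: restore item names and subtract the fake support."""
    plain = {tok: item for item, tok in cipher.items()}
    result = {}
    for pattern, count in patterns.items():
        items = frozenset(plain[tok] for tok in pattern)
        if len(items) == 1:  # fake transactions hold one item each
            (only,) = items
            count -= noise[only]
        if count >= min_count:
            result[items] = count
    return result

def satisfies_k_privacy(db, k):
    """True if every item shares its support with at least k - 1 other items."""
    by_support = Counter(item_supports(db).values())
    return all(n >= k for n in by_support.values())

# Hypothetical toy database.
db = [frozenset(t) for t in (
    ["a", "b", "c"], ["a", "b"], ["a", "b", "d"],
    ["b", "c"], ["a", "c"], ["c", "d"],
)]
encrypted, cipher, noise = encrypt_db(db, k=2, rng=random.Random(7))
returned = server_mine(encrypted, min_count=2)
recovered = decrypt_patterns(returned, cipher, noise, min_count=2)
```

  Because padding only raises supports, the server never misses a pattern that is truly frequent; the client filters the returned set against the true supports after subtracting the recorded noise.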
3. Anonymous Publication of Sensitive Transactional Data
Privacy preserving data mining is the study of the side effects of data mining on privacy, and it is receiving rising attention from the research community. Anonymity is the condition of having one's name or identity unknown or concealed. Anonymity serves valuable social purposes and protects individuals against institutions by limiting surveillance; however, it can also be used by wrongdoers to conceal their actions. Anonymous access to services avoids the tracking of a user's personal information and behavior, such as the user's location or the frequency of service usage.

Two reorganization methods for sensitive transactional data are designed. The first method transforms the data into a band matrix by permuting the rows and columns of the original table. The transformation takes advantage of data sparseness and places the nonzero entries near the main diagonal. The benefit is that neighboring rows are highly correlated. The second data transformation method is based on sorting with respect to Gray encoding: the QID items in each transaction t are interpreted as the Gray code of t, and the transaction set is sorted according to the rank in the Gray sequence. The result of either method is fed to an efficient linear-time heuristic that groups transactions together, which reduces the search space of the solution. Since both data transformations capture correlation well, groups contain transactions with similar QIDs, leading to increased data utility.
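The Gray-code ordering can be sketched in a few lines of Python. The sketch covers only the sorting step; the fixed-size chunking used for grouping is a simplifying assumption and not the linear-time heuristic of [7], and the function names and the item order in `qid_items` are hypothetical.

```python
def to_bits(transaction, qid_items):
    """Encode a transaction as an integer bit vector over the ordered QID items."""
    bits = 0
    for item in qid_items:
        bits = (bits << 1) | (item in transaction)
    return bits

def gray_rank(code):
    """Position of `code` in the reflected binary Gray sequence."""
    rank = 0
    while code:
        rank ^= code
        code >>= 1
    return rank

def gray_sorted_groups(transactions, qid_items, group_size):
    """Sort transactions by Gray rank, then cut the order into fixed-size groups."""
    ordered = sorted(
        transactions, key=lambda t: gray_rank(to_bits(t, qid_items))
    )
    return [
        ordered[i:i + group_size] for i in range(0, len(ordered), group_size)
    ]

# Hypothetical transactions over three QID items.
qid_items = ["a", "b", "c"]
transactions = [frozenset(t) for t in (
    ["a"], ["a", "b"], ["b"], ["b", "c"], ["c"], ["a", "b", "c"],
)]
groups = gray_sorted_groups(transactions, qid_items, group_size=2)
```

Consecutive codes in a Gray sequence differ in exactly one bit, so transactions that end up next to each other in the sorted order tend to differ in few QID items, which is what makes neighboring transactions good candidates for the same group.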

IV. COMPARISON OF PRIVACY PRESERVATION DATA MINING WITH OPTIMAL SIDE EFFECTS AND SUGGESTIONS

To compare privacy preservation in data mining using association rule mining, the experiment is run on a varying number of files. Several parameters are used to measure how well the data mining techniques preserve privacy.

A. Privacy Level

Privacy level is defined as the degree to which data are delivered privately to the intended user without being exposed to public users. A higher privacy level improves information delivery to private users. It is measured as a percentage (%).

No. of Files	HMAU Privacy Level (%)	PPDM Privacy Level (%)	NN and LSH Privacy Level (%)
25	52	56	60
50	55	59	63
75	59	62	65
100	63	65	68
125	65	68	72
150	69	72	75
175	71	76	79
200	75	80	85

The privacy level is compared across the existing Hiding-Missing-Artificial Utility (HMAU) algorithm, Privacy Preserving Data Mining (PPDM), and Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH).

Figure 4.1 Privacy Level on Privacy Preserving Data Mining with Optimal Side Effects

Figure 4.1 describes the privacy level of the privacy preserving data mining techniques. As the number of files increases, the privacy level increases for all three techniques. The experiment shows that Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH) raise the privacy level considerably compared with the Hiding-Missing-Artificial Utility (HMAU) algorithm and Privacy Preserving Data Mining (PPDM). NN and LSH achieve a 10-15 % higher privacy level than the HMAU algorithm and a 4-7 % higher privacy level than PPDM.
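The text does not state how the percentage ranges are derived from Table 4.1. One plausible reading, assumed here, is the relative difference 100 × (new − base) / base computed row by row. The short Python sketch below applies that formula to the Table 4.1 values, using HMAU and PPDM as the two baselines (also an assumption), so that the ranges can be checked against the table; the result need not coincide exactly with the rounded ranges quoted in the text.

```python
# Privacy level (%) per number of files, copied from Table 4.1.
FILES = [25, 50, 75, 100, 125, 150, 175, 200]
PRIVACY_LEVEL = {
    "HMAU": [52, 55, 59, 63, 65, 69, 71, 75],
    "PPDM": [56, 59, 62, 65, 68, 72, 76, 80],
    "NN_LSH": [60, 63, 65, 68, 72, 75, 79, 85],
}

def relative_gain(new, base):
    """Row-wise relative improvement of `new` over `base`, in percent."""
    return [100.0 * (n - b) / b for n, b in zip(new, base)]

def gain_range(new, base):
    """Smallest and largest row-wise relative improvement, in percent."""
    gains = relative_gain(new, base)
    return min(gains), max(gains)

for baseline in ("HMAU", "PPDM"):
    low, high = gain_range(PRIVACY_LEVEL["NN_LSH"], PRIVACY_LEVEL[baseline])
    print(f"NN and LSH vs {baseline}: {low:.2f} % to {high:.2f} %")
```

The same two functions can be applied to the throughput and efficiency tables by swapping in their columns.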

B. Throughput

Throughput is defined as the rate of successful data delivery over a communication channel. Higher throughput increases the overall efficiency of the system. Throughput values indicate how well data are delivered while being transferred over the various network channels. Throughput is measured as a percentage (%).

Table 4.2 Tabulation for Throughput on Privacy Preserving Data Mining with Optimal Side Effects

No. of Files	HMAU Throughput (%)	PPDM Throughput (%)	NN and LSH Throughput (%)
25	65	55	60
50	68	59	62
75	70	62	65
100	73	66	69
125	75	69	71
150	79	73	75
175	82	75	78
200	85	78	83

The throughput is compared across the existing Hiding-Missing-Artificial Utility (HMAU) algorithm, Privacy Preserving Data Mining (PPDM), and Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH).

Figure 4.2 Throughput on Privacy Preserving Data Mining with Optimal Side Effects

Figure 4.2 shows the throughput of the privacy preserving data mining techniques. The Hiding-Missing-Artificial Utility (HMAU) algorithm has higher throughput than Privacy Preserving Data Mining (PPDM) and Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH). The throughput of the HMAU algorithm is 8-15 % higher than that of PPDM and 2-8 % higher than that of NN and LSH.

C. Privacy Preserving Efficiency

Privacy preserving efficiency is defined as the rate at which data are effectively transferred to the correct user with high privacy. It plays an important part in delivering data to the private user without exposing the data to public users. Privacy preserving efficiency is measured as a percentage (%).

Table 4.3 Tabulation for Efficiency on Privacy Preserving Data Mining with Optimal Side Effects

No. of Files	HMAU Efficiency (%)	PPDM Efficiency (%)	NN and LSH Efficiency (%)
25	48	62	54
50	50	65	57
75	55	68	60
100	59	71	63
125	62	75	65
150	65	78	67
175	67	81	71
200	70	85	75

Privacy preserving efficiency comparison takes place on existing Hiding-Missing-Artificial Utility (HMAU) Algorithm, Privacy Preserving Data Mining (PPDM) and Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH).

Figure 4.3 Efficiency on Privacy Preserving Data Mining with Optimal Side Effects

Figure 4.3 shows the privacy preserving efficiency of the existing methods. The privacy preserving efficiency of Privacy Preserving Data Mining (PPDM) is higher than that of the HMAU algorithm and of Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH). PPDM is 17-22 % more efficient than the HMAU algorithm and 10-13 % more efficient than the NN and LSH technique.

V. DISCUSSION ON LIMITATION OF PRIVACY PRESERVING DATA MINING TECHNIQUES WITH OPTIMAL SIDE EFFECTS

In the Hiding-Missing-Artificial Utility (HMAU) algorithm, hiding private, confidential, or secure information remains a significant issue. The highest frequency in sensitive rules is linked to the current sensitive transaction. Noise addition and data modification, which are important for hiding sensitive information in PPDM, are not implemented. The secure multi-party protocol uses homomorphic encryption, for which computational costs are significantly higher. Logarithmic communication overhead occurs only when the size of the intersection of the two sets is bounded by a constant. The server obtains no information regarding other pairs in the server's database. The privacy-preserving data mining (PPDM) framework assumes an attacker who does not hold such knowledge. It is not appropriate for commercial privacy where the analytical properties are revealed.

In anti-discrimination techniques, indirect discrimination comprises rules or procedures that do not explicitly reveal discriminatory attributes. The post-processing approach does not permit the data set itself to be distributed. The technique is strongly affected by changes to the minimum support and minimum confidence. In the SQL standard, the query rewriting methodology does not handle the combination of nulls and negation. SQL nulls are more significant in the context of such queries.

(k, p)-anonymity imposes a severe restriction on PR equality and results in serious pattern loss. The k-anonymity model fails to deal with this issue, as it also experiences severe pattern loss. Micro-aggregation suffers pattern loss in an uncontrolled manner. The requirement is not flexible and is sometimes impossible to meet. Slicing breaks associations between column values of a bucket and is likely to lose data utility. Column generalization also leads to information loss. Connections among attributes in different columns are lost in marginal publication. Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH) dimensionality reduction techniques provide more efficient anonymization. However, dimensionality mapping is efficient only for low-dimensional QIDs and is not suited to transactional data.

The future directions for privacy preserving data mining techniques are attaining quality privacy preservation for distributed data mining with optimal side effects on the original database, improving the efficiency of privacy preserving association rule mining with constraint minimization, and improving the privacy preserving mechanism with efficient data utility.

VI. CONCLUSION

This paper examined existing privacy preserving data mining techniques such as the Hiding-Missing-Artificial Utility (HMAU) algorithm, Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH), and Privacy Preserving Data Mining (PPDM). The HMAU algorithm hides sensitive itemsets through transaction deletion, choosing for complete deletion the transaction with the maximal ratio of sensitive to non-sensitive content. However, noise addition and data modification, which are important for hiding sensitive information in PPDM, are not employed. Nearest-Neighbor (NN) and Locality-Sensitive Hashing (LSH) transform data into a band matrix by performing permutations of rows and columns. However, the dimensionality mapping is efficient only for low-dimensional QIDs and is not suited to transactional data.

In PPDM, the encrypt/decrypt (E/D) module transforms client data before it is shipped to the server and recovers the true patterns and their correct support. However, the framework assumes that the attacker does not possess such knowledge; relaxing this assumption can break the encryption scheme and introduce privacy vulnerabilities. PPDM is not suited to corporate privacy where the analytical properties are revealed. The observation was that association rule mining techniques increase privacy preserving data mining efficiency. The wide range of experiments on existing techniques evaluates the relative performance of the various privacy preserving techniques and their limitations. The results show that further research can be carried out on privacy preserving data mining techniques with minimal side effects, which increases the privacy preserving efficiency.

REFERENCES

[1] Chun-Wei Lin, Tzung-Pei Hong, and Hung-Chuan Hsu, Reducing Side Effects of Hiding Sensitive Itemsets in Privacy Preserving Data Mining, The Scientific World Journal, Hindawi Publishing Corporation, Volume 2014.
[2] Yaping Li, Minghua Chen, Qiwei Li, and Wei Zhang, Enabling Multilevel Trust in Privacy Preserving Data Mining, IEEE Transactions on Knowledge and Data Engineering, September 2012.
[3] Aris Gkoulalas-Divanis and Vassilios S. Verykios, Exact Knowledge Hiding through Database Extension, IEEE Transactions on Knowledge and Data Engineering, May 2009.
[4] Tamir Tassa, Secure Mining of Association Rules in Horizontally Distributed Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 4, April 2014.
[5] Fosca Giannotti, Laks V. S. Lakshmanan, Anna Monreale, Dino Pedreschi, and Hui (Wendy) Wang, Privacy-Preserving Mining of Association Rules From Outsourced Transaction Databases, IEEE Systems Journal, Vol. 7, No. 3, September 2013.
[6] Leopoldo Bertossi and Lechen Li, Achieving Data Privacy through Secrecy Views and Null-Based Virtual Updates, IEEE Transactions on Knowledge and Data Engineering, May 2013.
[7] Gabriel Ghinita, Panos Kalnis, and Yufei Tao, Anonymous Publication of Sensitive Transactional Data, IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 2, February 2011.
[8] Tiancheng Li, Ninghui Li, Jian Zhang, and Ian Molloy, Slicing: A New Approach to Privacy Preserving Data Publishing, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 3, 2012.

Tamil Selvan P. completed his M.Phil in Computer Science at Karpagam University in 2009. He is working as Assistant Professor in the Department of Computer Science, Karpagam University, Coimbatore, and has 7 years of experience. He has presented a paper at an International Conference. His research interests are data mining and warehousing.

Dr. S. Veni completed her Ph.D in Computer Science at Bharathiar University in 2014. She is working as Associate Professor in the Department of Computer Science, Karpagam University, Coimbatore, and has 12 years of experience. She has presented various papers at National and International Conferences. Her research interests are computer networks.
