A Frequent Data Mining Technique for Transactional Data

P Geetha

doi:10.17577/IJERTCONV3IS30027

RTPPTDM - 2015 (Volume 3 - Issue 30)

A Frequent Data Mining Technique for Transactional Data

DOI : 10.17577/IJERTCONV3IS30027

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 1,351
Total Downloads : 10
Authors : P Geetha
Paper ID : IJERTCONV3IS30027
Volume & Issue : RTPPTDM – 2015 (Volume 3 – Issue 30)
Published (First Online): 30-07-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

A Frequent Data Mining Technique for Transactional Data

P Geetha

Assistant Professor, Department of Computer Science

Dr.Umayal Ramanathan College for Women, Karaikudi-3.

Abstract–In Data Mining research extracting frequent itemset has been considered as an important task. Apriori is a classical algorithm for mining frequent itemset. But it is not efficient in number of datasets scanning. Based on this algorithm, this paper proposed a modified algorithm of Apriori that improves the efficiency by reducing the wasted time in a number of database scanning. The paper explained this concept with an example.

INTRODUCTION

Associative rules are one of the main techniques of Data mining. In Day to day activities the volume of data increased dramatically for many technologies to help in business such as cross marketing, inventory control, finding faults in telecom network, Basket data analysis promotion assortment. The aim of data mining process is to extract information from a dataset and transform it into an understandable structure.

Associative rule is mainly used to discover frequent itemsets. The data have traditionally focused on identifying the relationship between item telling some aspect of human behavior, usually buying behavior for determining items that customer buys together.

Apriori algorithm represents the candidate generative approach. It generates candidate (k+1) itemsets based on frequent k-itemsets.

MODIFIED ALGORITHM OF APRIORI

In Apriori algorithm we are getting a number of iterations. But using this algorithm the number of iterations are reduced and extract the frequent itemset from the largest database. Based on this, it is possible to reduce the time consumed in transaction scanning for candidate itemset. When the k-itemset increases, the transaction between modified Apriori and original Apriori increases from the view of time consuming.The following steps are needed to extract the frequent itemsets in given time.

Step 1: Scan all the transactions Step 2: Generate a table, L1

Step 3: calculate size of the transaction (SOT) for each transaction, count the support for each item and keep the transaction ID.

Step 4: Construct candidate item set of self going (C) Step 5: Get the desired item set based on SOT.

Step 6: Use L to identify the target transactions for C Step 7: Scan the target transactions to generate CK.

A. An example of modified Algorithm

Suppose we have id transaction and the minimum support the transaction set is shown in the following table1. Table:1

Tid	items
T1	I1,I2,I5
T2	I2,I4
T3	I2
T4	I2,I4
T5	I3
T6	I1,I2,I4
T7	I1,I3
T8	I2,I3
T9	I2
T10	I1,I2,I3,I5
T11	I1,I2,I3
T12	I1,I3

Initially then calculate SOT for each transaction that is as follows.

Table:2

Tid	I1	I2	I3	I4	I5	SOT
T1	1	1	0	0	1	3
T2	0	1	0	1	0	2
T3	0	1	0	0	0	1
T4	0	1	0	1	0	2
T5	0	0	1	0	0	1
T6	1	1	0	1	0	3
T7	1	0	1	0	0	2
T8	0	1	1	0	0	2
T9	0	1	0	0	0	1
T10	1	1	1	0	1	4
T11	1	1	1	0	0	3
T12	1	0	1	0	0	2

Then, scan all transactions to get frequent

1itemset which contains items support count and the transaction id (transit) and eliminate the in frequent itemset (support less than minimum support)

Item	Support	Min	T_IDs
I1,I2,I3	2	I1	T1,T6,T7,T10,T11,T12
I1,I2,I4	1	I4	T2,T4,T6
I1,I3,I4	–	I4	T2,T4,T6
I2,I3,I4	–	I4	T2,T4,T6

Item	Support	Min	T_IDs
I1,I2,I3	2	I1	T1,T6,T7,T10,T11,T12
I1,I2,I4	1	I4	T2,T4,T6
I1,I3,I4	–	I4	T2,T4,T6
I2,I3,I4	–	I4	T2,T4,T6

Items	Support	T IDS
I1	6	T1,T6,T7,T10, T11,T12
I2	9	T1,T2,T3,T4,T6,T8,T9,T10,T11
I3	6	T5,T7,T8,T10.T11,T12
I4	3	T2,T4,T6
I5	2	T1,T10(Deleted)

Items	Support	T IDS
I1	6	T1,T6,T7,T10, T11,T12
I2	9	T1,T2,T3,T4,T6,T8,T9,T10,T11
I3	6	T5,T7,T8,T10.T11,T12
I4	3	T2,T4,T6
I5	2	T1,T10(Deleted)

Table:3

The next step is to generate 2-itemset candidate set from L, Select the transaction based on SOT. Transaction size which is greater than or equal to 2 are considered in this time.

Here it removes all the transactions because support is less than equal to 2.

CONCLUSION

The typical Apriori algorithm has some bottleneck in performance for reduced the no of transaction to be scnned so it needs to optimize the algorithm. This paper proposed a new modified algorithm for overcoming this problem.

Tid	I1	I2	I3	I4	I5	SOT
T1	1	1	0	0	1	3
T2	0	1	0	1	0	2
T4	0	1	0	1	0	2
T6	1	1	0	1	0	3
T7	1	0	1	0	0	2
T8	0	1	1	0	0	2
T10	1	1	1	0	1	4
T11	1	1	1	0	0	3
T12	1	0	1	0	0	2

Tid	I1	I2	I3	I4	I5	SOT
T1	1	1	0	0	1	3
T2	0	1	0	1	0	2
T4	0	1	0	1	0	2
T6	1	1	0	1	0	3
T7	1	0	1	0	0	2
T8	0	1	1	0	0	2
T10	1	1	1	0	1	4
T11	1	1	1	0	0	3
T12	1	0	1	0	0	2

So if SOT <2 are eliminated from the list now totally 9 transactions are available the list reduces the no% of scans (transaction t3,t5 and t9 are removed)

Table:4

Items	Support	Min	T id
I1,I2	4	I1	T1,T6,T7,T10, T11,T12
I1,I3	4	I1	T1,T6,T7,T10, T11,T12
I1,I4	1	I4	T2,T4,T6 (Deleted)
I2,I3	3	I3	T5,T7,T8,T10,T11,T12
I2,I4	2	I4	T2,T4,T6 (Deleted)
I3,I4	–	I5	T1,T10 (Deleted)

Then generate the 2- item set based on the following table the removed the transaction id is available in IDs list there is no need to check that transaction it will reduce the number of scanning.

The same thing to generate 3- item set depending on the modified SOT table here it considered the transaction which is greater than or equal to 3(SOT>=3) Table:5

TID	I1	I2	I3	I4	I5	SOT
TI	1	1	0	0	1	3
T2	0	1	0	1	0	2 (deleted)
deT3	0	1	0	0	0	1 (deleted)
T4	0	1	0	1	0	2 (deleted)
T5	0	0	1	0	0	1 (deleted)
T6	1	1	0	1	0	3
T7	1	0	1	0	0	2 (deleted)
T8	0	1	1	0	0	2 (deleted)
T9	0	1	0	0	0	1(deleted)
T10	1	1	1	0	1	4
T11	1	1	1	0	0	3
T12	1	0	1	0	0	2 (deleted)

Then generate 3-itemset bend on the transaction IDs

The performance of modified algorithm is optimized and can be extracted the knowledge from large database faster. This paper proposed an idea for reducing the number of scanning. In future, it will be implemented and check the performance with typical algorithm.

REFERENCES

Rakesh Agarwal, Ramakrishnan srikant, Fast algorithms for mining Association Rules, proceeding of the 20th VLDB conference Santiago, chile, 1994.
Rakesh Agarwal, Tomas Zlmielinski, Arun Swami , Mining Association Rules between sets of items in large Databases, proceedings of the 1993 ACM SIGMOD conference Washington DC, USA, May1993.
Anurag Chouboy, Ravindra patel, J.L. Rana, A Survey of efficient algorithms and New approach for fast discovery of frequent item set for Association Rule Mining.
Dr.E.Ramaraj, K.Kavitha,Transaction Reduction in Actionable Pattern Mining for High voluminous datasets based on Bitmap and Class Labels International Journal on Computer Science and Engineering.
Bayardo, R.J., Efficiently mining long patterns fromdatabases. Proceedings of the1998 ACM SIGMODInternationalConferenceonManagement of Data, June 1-4, 1998. New York, USA., pp: 85-93.
Bucila, C., J. Gehrke, D. Kifer and W.White, DualMiner: A dual pruningalgorithmforitemsets with constraints.Data Mining andKnowledgeDiscovery, 72003. 241-272.
Calders, T. and B. Goethals, Depth-firstnon-derivable itemset mining. Proc. SIAMInt. Conf. Data Min., 119 2005. pp: 250-261.
Cheung, D.W., J. Han, V.T. Ng and C.Y.Wong, Maintenance of discoveredassociation rules in large databases: Anincremental updating technique.Proceedings of International Conference on Data Engineering, Feb. 26-Mar. 1, 1996.New Orleans, Louisiana, pp: 106-114.
Park, J.S., Chen M.S.and Yu, P.S. Efficientparallel mining for association rules.Proceedings of the 4th International Conference on Information and KnowledgeManagement, Nov. 29-Dec. 2, 1995.Baltimore, MD., pp: 31-36.

A Frequent Data Mining Technique for Transactional Data

Leave a Reply