- Open Access
- Authors : Ranjita Kumari Dash
- Paper ID : IJERTV2IS3226
- Volume & Issue : Volume 02, Issue 03 (March 2013)
- Published (First Online): 30-03-2013
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Selection Of The Best Classifier From Different Datasets Using WEKA
Ranjita Kumari Dash
Assistant Professor, Institute of Technical Education and Research, SOA University
Abstract
In today's world a large amount of data is available in science, industry, business and many other areas. These data can provide valuable information that management can use to make important decisions, and data mining is a popular means of extracting it. Data mining is a popular topic among researchers, and much in the field remains unexplored; this paper focuses on a fundamental data mining task, namely classification. Naive Bayes, Functions, Lazy, Meta, Nested dichotomies, Rules and Trees classifiers are used for the classification of datasets. The performance of these classifiers is analysed with the help of correctly classified instances, incorrectly classified instances and the time taken to build the model, and the results are presented statistically as well as graphically. The WEKA (Waikato Environment for Knowledge Analysis) data mining tool is used for this purpose. Three datasets are used, on which different classifiers are applied to check which classifier gives the best result under different measurements; 71 different classifiers are applied to each dataset. The datasets are in ARFF format, and 10-fold cross validation is used to provide better accuracy. Finally, the classification technique that provides the best result is suggested. The results show that no single algorithm always performs best for every dataset.
KEY TERMS
BayesNet, J48, Mean Absolute Error, Naive Bayes, Root Mean Squared Error
Introduction
Data mining is the process of extracting patterns from data [10, 11]. It is seen as an increasingly important tool by modern businesses for transforming data into information as technology advances and the need for efficient data analysis grows. Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large datasets. It is currently used in a wide range of areas such as marketing, surveillance, fraud detection and scientific discovery.
In this paper we process a cancer dataset and apply different classification methods to it.
Classification is a basic task in data analysis that requires the construction of a classifier, that is, a function that assigns a class label to instances described by a set of attributes. It is one of the important applications of data mining, and it predicts categorical class labels. In this paper we compare various classification techniques using WEKA; our aim is to investigate the performance of different classification methods. Classification is a typical task in data mining, and a large number of classifiers can be used, such as Bayes, function, lazy, Meta, rule-based and decision tree classifiers. The goal of classification is to correctly predict the class value.
For breast cancer there is a substantial amount of research with machine learning algorithms [1]. Machine learning covers such a broad range of processes that it is difficult to define precisely [6]. Young women are being diagnosed in their teens, twenties and thirties, even if the percentage is very low compared to that of older women aged 40 years and older [7, 8, 9], and 1% of all diagnosed breast cancers are in men. We report the case of a 34-year-old woman affected by breast cancer that had metastasized to the bone. Today, about one in eight women in the United States is affected by breast cancer over her lifetime, and in recent years the incidence rate has kept increasing. However, appropriate methods to predict breast cancer survival have not been established. In this study we use these models to evaluate the prediction rate for breast cancer patients from the perspective of accuracy.
WEKA
WEKA stands for Waikato Environment for Knowledge Analysis. It was created by researchers at the University of Waikato in New Zealand and was first implemented in its modern form in 1997. It is open source software issued under the GNU General Public License (GPL). The software is written in the Java language and contains a GUI for interacting with data files. Deep knowledge of data mining is not needed to work with WEKA, which is one reason it is a very popular data mining tool. WEKA provides a graphical user interface and many other facilities. In this paper we compare various classification techniques using WEKA. WEKA is a state-of-the-art facility for developing machine learning (ML) techniques and applying them to real-world data mining problems. The data file normally used by WEKA is in ARFF (Attribute Relation File Format), which uses special tags to mark the different elements of the data file. WEKA implements algorithms for data pre-processing, classification, regression, clustering and association rules, and it also includes visualization tools. It has a set of panels, each of which can be used to perform a certain task, and new machine learning schemes can also be developed with the package. The algorithms can be applied directly to a dataset. The main features of WEKA include:
- 49 data pre-processing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 15 attribute/subset evaluators + 10 search algorithms for feature selection
- 3 algorithms for finding association rules
- 3 graphical user interfaces
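As an illustration of the ARFF format mentioned above, a minimal dataset file might look like the following; the relation, attribute names and values are invented for this sketch and are not taken from the paper's datasets:

```
@relation breast-cancer-example
@attribute age {young, middle, old}
@attribute tumor-size numeric
@attribute class {benign, malignant}

@data
young,2.1,benign
old,5.4,malignant
middle,3.3,benign
```

WEKA's Java API can also be used programmatically instead of through the GUI. The following sketch, assuming a local file breast-cancer.arff (a placeholder path), loads a dataset and evaluates one classifier (J48) with the 10-fold cross validation used in this paper; it is a minimal example of the evaluation loop, not the paper's exact experimental setup:

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationDemo {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file (path is a placeholder)
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        // Assume the class attribute is the last one
        data.setClassIndex(data.numAttributes() - 1);

        Classifier classifier = new J48();
        Evaluation eval = new Evaluation(data);
        // 10-fold cross validation with a fixed random seed
        eval.crossValidateModel(classifier, data, 10, new Random(1));

        System.out.printf("Correctly classified:    %.4f %%%n", eval.pctCorrect());
        System.out.printf("Incorrectly classified:  %.4f %%%n", eval.pctIncorrect());
        System.out.printf("Mean absolute error:     %.4f%n", eval.meanAbsoluteError());
        System.out.printf("Root mean squared error: %.4f%n", eval.rootMeanSquaredError());
    }
}
```

Repeating the same loop over a list of classifier objects yields a comparison like the one reported in the tables below.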
METHODS
This section describes the classification methods used in this paper; we discuss each method and explain how it has been used in our experiment. For the breast cancer dataset we have taken eight method families, namely Bayes, Functions, Lazy, Meta, Misc, Nested dichotomies, Rules and Trees classifiers, for the classification of the dataset.
NAIVE BAYES CLASSIFIER
Bayes methods are also used as one of the classification solutions in data mining. In our work we use six main Bayesian methods implemented in the WEKA software: AODE, AODEsr, Naive Bayes, BayesNet, Naive Bayes Simple and Naive Bayes Updateable.
Naive Bayes is an application of Bayes' theorem that assumes independence of the attributes [3]. This assumption is not strictly correct when considering classification based on text extracted from a document, as there are relationships between the words that accumulate into concepts. Problems of this kind, called problems of supervised classification, are ubiquitous. Naive Bayes is sometimes also called idiot's Bayes, simple Bayes or independence Bayes. It is important for several reasons: it is easy to construct, without any need for complicated iterative parameter estimation schemes, so it may be readily applied to huge datasets; and it is robust, easy to interpret, and often does surprisingly well even though it may not be the best classifier in any particular application.
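A standard way to state the independence assumption (the usual textbook formulation, not quoted from this paper) is that the posterior probability of class $C$ given attribute values $x_1, \dots, x_n$ factorizes as

$$P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),$$

and the classifier predicts the class that maximizes this product.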
FUNCTION CLASSIFIER
Function classifiers use the concepts of neural networks and regression; here two examples from neural networks and regression are taken to discuss the scenario [2]. A multilayer perceptron is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. It is a modification of the standard linear perceptron in that it uses three or more layers of neurons with nonlinear activation functions, and it is more powerful than the perceptron in that it can distinguish data that are not linearly separable by a hyperplane [4]. A multilayer perceptron has distinctive characteristics. The model of each neuron in the network includes a nonlinear activation function. The network contains one or more layers of hidden neurons that are not part of the input or output of the network; these hidden neurons enable the network to learn complex tasks by extracting progressively more meaningful features from the input patterns. The network exhibits a high degree of connectivity, determined by its synapses, and a change in the connectivity of the network requires a change in the population of synaptic connections or their weights [5].
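In WEKA this classifier is available as weka.classifiers.functions.MultilayerPerceptron. A minimal configuration sketch is shown below; the dataset path is a placeholder and the parameter values are illustrative, default-style choices rather than the settings used in the paper:

```java
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MlpDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("a");   // "a" = (attributes + classes) / 2 hidden neurons
        mlp.setLearningRate(0.3);   // step size for the weight updates
        mlp.setMomentum(0.2);       // momentum applied to the updates
        mlp.setTrainingTime(500);   // number of training epochs
        mlp.buildClassifier(data);  // train with backpropagation
    }
}
```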
RULES CLASSIFIER
Association rules are used to find interesting relationships among the attributes, and they may predict more than one conclusion. The number of records an association rule predicts correctly is called its coverage, and support is defined as coverage divided by the total number of records [5]. Accuracy is the number of records predicted correctly, expressed as a percentage of all instances to which the rule applies. The methods in this category are ConjunctiveRule, DecisionTable, DTNB, JRip, NNge, OneR, Ridor and ZeroR. Rules are easier to understand than large trees. One rule is created for each path from the root to a leaf; each attribute-value pair along a path forms a conjunction, and the leaf holds the class prediction. Rules are mutually exclusive and are learned one at a time: each time a rule is learned, the tuples covered by the rule are removed.
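A small sketch of learning an interpretable rule set with JRip, one of the rule learners listed above (the dataset path is again a placeholder):

```java
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RuleDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        JRip ripper = new JRip();     // RIPPER-style sequential covering
        ripper.buildClassifier(data); // rules are learned one at a time
        System.out.println(ripper);   // prints the learned rule list
    }
}
```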
LAZY CLASSIFIER
When making a classification or prediction, lazy learners can be computationally expensive. They require efficient storage techniques and are well suited to implementation on parallel hardware. They offer little explanation or insight into the structure of the data, but they naturally support incremental learning, and they are able to model complex decision spaces with hyper-polygonal shapes that may not be as easily describable by other learning algorithms. The methods in this category are IB1, IBk, K-Star, LBR and LWL.
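Because a lazy learner defers its work to prediction time, "building" the model is cheap and classification is where the cost lies. A sketch with IBk (k-nearest neighbours), assuming the same placeholder dataset path as above:

```java
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LazyDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        IBk knn = new IBk();
        knn.setKNN(3);             // use the 3 nearest neighbours
        knn.buildClassifier(data); // essentially just stores the instances

        // The real work happens here, at prediction time
        double label = knn.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) label));
    }
}
```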
META CLASSIFIER
Meta classifiers include a wide range of classifiers. They are relevant when the attributes have a large number of values, because the time and space complexities of learning depend not only on the number of attributes, but also on the number of values for each attribute.
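One member of this family appears in the results below as the "Filtered classifier". The sketch below shows how such a meta classifier wraps a base learner with a pre-processing filter in WEKA; the choice of Discretize and J48 here is illustrative, not taken from the paper:

```java
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.Discretize;

public class MetaDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(new Discretize()); // filter applied to the data before learning
        fc.setClassifier(new J48());    // base learner wrapped by the meta classifier
        fc.buildClassifier(data);
        System.out.println(fc);
    }
}
```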
DECISION TREES
Decision tree induction has been studied in detail in both areas of pattern recognition and machine learning [13, 14]. Work in this area synthesizes the experience gained by people working in machine learning and is exemplified by the computer program ID3.
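ID3 chooses the attribute to split on at each node by information gain; its successor C4.5 (implemented in WEKA as J48, used below) refines this with the gain ratio. The standard ID3 criterion, stated here for context rather than quoted from this paper, is

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v),$$

where $S_v$ is the subset of the training set $S$ for which attribute $A$ takes value $v$; the attribute with the highest gain is placed at the node.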
DISCUSSION AND RESULTS
To investigate the performance of the selected classification methods (Bayes, Function, Lazy, Meta, Rules, Misc, Nested dichotomies and Trees classifiers), we use the same experimental procedure as suggested by WEKA. 75% of the data is used for training and the remainder for testing purposes.
In WEKA all data are considered as instances, and the features in the data are known as attributes. The simulation results are partitioned into several sub-items for easier analysis and evaluation: in the first part, correctly and incorrectly classified instances are given as numeric and percentage values, and subsequently the time taken to build the model is given in seconds. The results of the simulation are shown in the tables and their graphical representations below. On the basis of the comparison of accuracy and error rates, the classification techniques with the highest accuracy are obtained for this dataset. We can clearly see that the highest accuracy is 75.52% and the lowest is 51.74%; the highest accuracy belongs to the Meta classifier. The total time required to build the model is also a crucial parameter when comparing classification algorithms, and in this experiment the single conjunctive rule learner requires the shortest time, around 0.15 seconds, compared to the others.
The figures show the behaviour of the various algorithms used in WEKA, together with the advantages and disadvantages of each algorithm. Every algorithm has its own importance, and we choose among them based on the behaviour of the data. Deep knowledge of the algorithms is not required for working in WEKA, which is the main reason WEKA is such a suitable tool for data mining applications. This paper covers only the classification operations in WEKA; in future work we will try to make a complete reference paper on WEKA.
Table no-1 (best algorithms):
Name of algorithm   | Correctly classified instances (%) | Incorrectly classified instances (%) | Time taken to build the model (s)
BayesNet            | 72.028  | 27.972  | 0.03
Simple Logistic     | 75.1748 | 24.8252 | 1.44
K-Star              | 73.4266 | 26.5734 | 0
Filtered Classifier | 75.5245 | 24.4755 | 0
Ordinal Classifier  | 75.5245 | 24.4755 | 0.01
Misc (HyperPipes)   | 69.9301 | 30.0699 | 0
Decision Table      | 73.4266 | 26.5734 | 0.5
J48                 | 75.5245 | 24.4755 | 0.01
[Figure no-1: Correctly classified instances (%) for each algorithm]
[Figure no-2: Incorrectly classified instances (%) for each algorithm]
[Figure no-3: Correctly classified instances for each algorithm (BayesNet, Simple Logistic, K-Star, Filtered Classifier, Ordinal Classifier, HyperPipes, Decision Table, J48)]
4.2 Comparison between the LUNG, HEART and DIABETES datasets
Algorithm             | Correctly classified instances (%) | Incorrectly classified instances (%) | TP Rate | FP Rate | Time taken to build model (s)
MultilayerPerceptron  | 100     | 0       | 0.75  | 0.436 | 0.2
Multiclass Classifier | 77.2135 | 22.7865 | 0.772 | 0.321 | 0.02
SPegasos              | 77.7344 | 22.2656 | 0.777 | 0.327 | 0.19
Table no-2 (lung dataset, heart dataset, diabetes dataset)
[Figure no-4: MultilayerPerceptron vs. Multiclass Classifier]
[Figure no-5: MultilayerPerceptron vs. Multiclass Classifier]
[Figure no-6: MultilayerPerceptron vs. Multiclass Classifier]
[Figure no-7: MultilayerPerceptron, Multiclass Classifier and SPegasos]
[Figure no-8: MultilayerPerceptron, Multiclass Classifier and SPegasos]
References
- D. Lavanya and K. Usha Rani, "Analysis of feature selection with classification: Breast cancer datasets," Indian Journal of Computer Science and Engineering (IJCSE), October 2011.
- E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: Application to face detection," Proceedings of Computer Vision and Pattern Recognition, Puerto Rico, pp. 130-136, 1997.
- W. Buntine, "Theory refinement on Bayesian networks," in B. D. D'Ambrosio, P. Smets, & P. P. Bonissone (Eds.), Proceedings of the Seventh Annual Conference on Uncertainty in Artificial Intelligence, pp. 52-60, San Francisco, CA.
- S. V. Chakravarthy and J. Ghosh, "Scale-based clustering using radial basis function networks," Proceedings of the IEEE International Conference on Neural Networks, Orlando, Florida, pp. 897-902, 1994.
- M. D. Buhmann, Radial Basis Functions: Theory and Implementations, 2003.
- A. J. Howell and H. Buxton, "RBF network methods for face detection and attentional frames," Neural Processing Letters (15), pp. 197-211, 2002.
- Daniel Grossman and Pedro Domingos, "Learning Bayesian network classifiers by maximizing conditional likelihood," Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004.
- U.S. Cancer Statistics Working Group, United States Cancer Statistics: 1999-2008 Incidence and Mortality Web-based Report, Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute, 2012.
- International Agency for Research on Cancer, Lyon, World Cancer Report, IARC Press, 2003, pp. 188-193.
- Inas Elattar, "Breast cancer: Magnitude of the problem," Egyptian Society of Surgical Oncology Conference, Taba, Sinai, Egypt, 30 March - 1 April 2005.
- Daniel F. Roses, "Clinical assessment of breast cancer and benign breast disease," in Breast Cancer: Vol. 2, Ch. 14, M. N. Harris (Ed.), Churchill Livingstone, Philadelphia, 2005.
- S. Aruna et al., "Knowledge based analysis of various statistical tools in detecting breast cancer," 2011.
- J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
- William H. Wolberg, W. Nick Street, Dennis M. Heisey, and Olvi L. Mangasarian, "Computerized breast cancer diagnosis and prognosis from fine needle aspirates," Western Surgical Association meeting, Palm Desert, California, November 14, 1994.
- Y. Chen, A. Abraham, and B. Yang, "Feature selection and classification using flexible neural tree," Journal of Neurocomputing 70(1-3), pp. 305-313, 2006.
- K. Golnabi et al., "Analysis of firewall policy rules using data mining techniques," 2006, pp. 305-315.
- R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, New York, 1973.
- C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, 1999.
- V. N. Vapnik, The Nature of Statistical Learning Theory, 1st ed., Springer-Verlag, New York, 1995.
- Ross Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.
- P. Cabena, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi, Discovering Data Mining: From Concept to Implementation, Prentice Hall, Upper Saddle River, N.J., 1998.
- Vaibhav Narayan Chunekar and Hemant P. Ambulgekar, "Approach of neural network to diagnose breast cancer on three different data sets," 2009.
- D. Lavanya, "Ensemble decision tree classifier for breast cancer data," International Journal of Information Technology Convergence and Services, vol. 2, no. 1, pp. 17-24, Feb. 2012.
- Yoav Freund and Robert E. Schapire, "Large margin classification using the perceptron algorithm," Machine Learning, 37(3), 1999.
- J. D. M. Rennie, L. Shih, J. Teevan, and D. R. Karger, "Tackling the poor assumptions of naive Bayes text classification," in ICML 2003, pp. 616-623, 2003.
- Kanako Komiya, Naoto Sato et al., "Negation naive Bayes for categorization of product pages on the web," Proceedings of Recent Advances in Natural Language Processing, pp. 586-591, Hissar, Bulgaria, 12-14 September 2011.
- J. Cheng and R. Greiner, "Learning Bayesian belief network classifiers: Algorithms and systems," in E. Stroulia & S. Marwin (Eds.), AI 2001, pp. 141-151, LNAI 2056, 2001.