- Open Access
- Authors : Ranjita Kumari Dash
- Paper ID : IJERTV2IS3226
- Volume & Issue : Volume 02, Issue 03 (March 2013)
- Published (First Online): 30-03-2013
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Selection Of The Best Classifier From Different Datasets Using WEKA
Ranjita Kumari Dash
Assistant Professor, Institute of Technical Education and Research, SOA University
Abstract
In today's world a large amount of data is available in science, industry, business and many other areas. These data can provide valuable information that management can use to make important decisions, and data mining is a popular means of extracting it. Data mining is a popular topic among researchers, and much in the field remains unexplored; this paper focuses on a fundamental data mining task, namely classification. Naive Bayes, Functions, Lazy, Meta, Nested dichotomies, Rules and Trees classifiers are used for the classification of datasets. The performance of these classifiers is analysed with the help of correctly classified instances, incorrectly classified instances and the time taken to build the model, and the results are presented statistically as well as graphically. The WEKA (Waikato Environment for Knowledge Analysis) data mining tool is used for this purpose. Three datasets are used, on which different classifiers are applied to check which classifier gives the best result under different measurements; 71 different classifiers are applied to each dataset. The datasets are in ARFF format, and 10-fold cross validation is used to provide better accuracy. Finally, the classification technique that provides the best result is suggested. The results show that no single algorithm always performs best for every dataset.
KEY TERMS
BayesNet, J48, Mean Absolute Error, Naive Bayes, Root Mean Squared Error
Introduction
Data mining is the process of extracting patterns from data [10, 11]. It is seen as an increasingly important tool by modern businesses for transforming data into information as technology advances and the need for efficient data analysis grows. Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large datasets. It is currently used in a wide range of areas such as marketing, surveillance, fraud detection and scientific discovery.
In this paper we process a cancer dataset and apply different classification methods to it.
Classification is a basic task in data analysis that requires the construction of a classifier, that is, a function that assigns a class label to instances described by a set of attributes. It is one of the important applications of data mining, and it predicts categorical class labels. In this paper we compare various classification techniques using WEKA; our aim is to investigate the performance of different classification methods. Classification is a typical task in data mining, and a large number of classifiers can be used, such as Bayes, function, lazy, Meta, rule-based and decision tree classifiers. The goal of classification is to correctly predict the class value.
For breast cancer there is a substantial amount of research with machine learning algorithms [1]. Machine learning covers such a broad range of processes that it is difficult to define precisely [6]. Young women are being diagnosed in their teens, twenties and thirties, even if the percentage is very low compared to that of older women aged 40 years and older [7, 8, 9], and 1% of all diagnosed breast cancers are in men. We report the case of a 34-year-old woman affected by breast cancer that had metastasized to the bone. Today, about one in eight women in the United States is affected by breast cancer over her lifetime, and in recent years the incidence rate has kept increasing. However, appropriate methods to predict breast cancer survival have not been established. In this study we use these models to evaluate the prediction rate for breast cancer patients from the perspective of accuracy.
WEKA
WEKA stands for Waikato Environment for Knowledge Analysis. It was created by researchers at the University of Waikato in New Zealand and was first implemented in its modern form in 1997. It is open source software issued under the GNU General Public License (GPL). The software is written in the Java language and contains a GUI for interacting with data files. Deep knowledge of data mining is not needed to work with WEKA, which is one reason it is a very popular data mining tool. WEKA provides a graphical user interface and many other facilities. In this paper we compare various classification techniques using WEKA. WEKA is a state-of-the-art facility for developing machine learning (ML) techniques and applying them to real-world data mining problems. The data file normally used by WEKA is in ARFF (Attribute Relation File Format), which uses special tags to mark the different elements of the data file. WEKA implements algorithms for data pre-processing, classification, regression, clustering and association rules, and it also includes visualization tools. It has a set of panels, each of which can be used to perform a certain task, and new machine learning schemes can also be developed with the package. The algorithms can be applied directly to a dataset. The main features of WEKA include:
- 49 data pre-processing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 15 attribute/subset evaluators + 10 search algorithms for feature selection
- 3 algorithms for finding association rules
- 3 graphical user interfaces
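As an illustration of the ARFF format mentioned above, a minimal dataset file might look like the following; the relation, attribute names and values are invented for this sketch and are not taken from the paper's datasets:

```
@relation breast-cancer-example
@attribute age {young, middle, old}
@attribute tumor-size numeric
@attribute class {benign, malignant}

@data
young,2.1,benign
old,5.4,malignant
middle,3.3,benign
```

WEKA's Java API can also be used programmatically instead of through the GUI. The following sketch, assuming a local file breast-cancer.arff (a placeholder path), loads a dataset and evaluates one classifier (J48) with the 10-fold cross validation used in this paper; it is a minimal example of the evaluation loop, not the paper's exact experimental setup:

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationDemo {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file (path is a placeholder)
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        // Assume the class attribute is the last one
        data.setClassIndex(data.numAttributes() - 1);

        Classifier classifier = new J48();
        Evaluation eval = new Evaluation(data);
        // 10-fold cross validation with a fixed random seed
        eval.crossValidateModel(classifier, data, 10, new Random(1));

        System.out.printf("Correctly classified:    %.4f %%%n", eval.pctCorrect());
        System.out.printf("Incorrectly classified:  %.4f %%%n", eval.pctIncorrect());
        System.out.printf("Mean absolute error:     %.4f%n", eval.meanAbsoluteError());
        System.out.printf("Root mean squared error: %.4f%n", eval.rootMeanSquaredError());
    }
}
```

Repeating the same loop over a list of classifier objects yields a comparison like the one reported in the tables below.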
METHODS
This section describes the classification methods used in this paper; we discuss each method and explain how it has been used in our experiment. For the breast cancer dataset we have taken eight method families, namely Bayes, Functions, Lazy, Meta, Misc, Nested dichotomies, Rules and Trees classifiers, for the classification of the dataset.
NAIVE BAYES CLASSIFIER
Bayes methods are also used as one of the classification solutions in data mining. In our work we use six main Bayesian methods implemented in the WEKA software: AODE, AODEsr, Naive Bayes, BayesNet, Naive Bayes Simple and Naive Bayes Updateable.
Naive Bayes is an application of Bayes' theorem that assumes independence of the attributes [3]. This assumption is not strictly correct when considering classification based on text extracted from a document, as there are relationships between the words that accumulate into concepts. Problems of this kind, called problems of supervised classification, are ubiquitous. Naive Bayes is sometimes also called idiot's Bayes, simple Bayes or independence Bayes. It is important for several reasons: it is easy to construct, without any need for complicated iterative parameter estimation schemes, so it may be readily applied to huge datasets; and it is robust, easy to interpret, and often does surprisingly well even though it may not be the best classifier in any particular application.
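A standard way to state the independence assumption (the usual textbook formulation, not quoted from this paper) is that the posterior probability of class $C$ given attribute values $x_1, \dots, x_n$ factorizes as

$$P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),$$

and the classifier predicts the class that maximizes this product.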
FUNCTION CLASSIFIER
Function classifiers use the concepts of neural networks and regression; here two examples from neural networks and regression are taken to discuss the scenario [2]. A multilayer perceptron is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. It is a modification of the standard linear perceptron in that it uses three or more layers of neurons with nonlinear activation functions, and it is more powerful than the perceptron in that it can distinguish data that are not linearly separable by a hyperplane [4]. A multilayer perceptron has distinctive characteristics. The model of each neuron in the network includes a nonlinear activation function. The network contains one or more layers of hidden neurons that are not part of the input or output of the network; these hidden neurons enable the network to learn complex tasks by extracting progressively more meaningful features from the input patterns. The network exhibits a high degree of connectivity, determined by its synapses, and a change in the connectivity of the network requires a change in the population of synaptic connections or their weights [5].
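In WEKA this classifier is available as weka.classifiers.functions.MultilayerPerceptron. A minimal configuration sketch is shown below; the dataset path is a placeholder and the parameter values are illustrative, default-style choices rather than the settings used in the paper:

```java
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MlpDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("a");   // "a" = (attributes + classes) / 2 hidden neurons
        mlp.setLearningRate(0.3);   // step size for the weight updates
        mlp.setMomentum(0.2);       // momentum applied to the updates
        mlp.setTrainingTime(500);   // number of training epochs
        mlp.buildClassifier(data);  // train with backpropagation
    }
}
```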
RULES CLASSIFIER
Association rules are used to find interesting relationships among the attributes, and they may predict more than one conclusion. The number of records an association rule predicts correctly is called its coverage, and support is defined as coverage divided by the total number of records [5]. Accuracy is the number of records predicted correctly, expressed as a percentage of all instances to which the rule applies. The methods in this category are ConjunctiveRule, DecisionTable, DTNB, JRip, NNge, OneR, Ridor and ZeroR. Rules are easier to understand than large trees. One rule is created for each path from the root to a leaf; each attribute-value pair along a path forms a conjunction, and the leaf holds the class prediction. Rules are mutually exclusive and are learned one at a time: each time a rule is learned, the tuples covered by the rule are removed.
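A small sketch of learning an interpretable rule set with JRip, one of the rule learners listed above (the dataset path is again a placeholder):

```java
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RuleDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        JRip ripper = new JRip();     // RIPPER-style sequential covering
        ripper.buildClassifier(data); // rules are learned one at a time
        System.out.println(ripper);   // prints the learned rule list
    }
}
```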
LAZY CLASSIFIER
When making a classification or prediction, lazy learners can be computationally expensive. They require efficient storage techniques and are well suited to implementation on parallel hardware. They offer little explanation or insight into the structure of the data, but they naturally support incremental learning, and they are able to model complex decision spaces with hyper-polygonal shapes that may not be as easily describable by other learning algorithms. The methods in this category are IB1, IBk, K-Star, LBR and LWL.
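Because a lazy learner defers its work to prediction time, "building" the model is cheap and classification is where the cost lies. A sketch with IBk (k-nearest neighbours), assuming the same placeholder dataset path as above:

```java
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LazyDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        IBk knn = new IBk();
        knn.setKNN(3);             // use the 3 nearest neighbours
        knn.buildClassifier(data); // essentially just stores the instances

        // The real work happens here, at prediction time
        double label = knn.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) label));
    }
}
```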
META CLASSIFIER
Meta classifiers include a wide range of classifiers. They are relevant when the attributes have a large number of values, because the time and space complexities of learning depend not only on the number of attributes, but also on the number of values for each attribute.
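One member of this family appears in the results below as the "Filtered classifier". The sketch below shows how such a meta classifier wraps a base learner with a pre-processing filter in WEKA; the choice of Discretize and J48 here is illustrative, not taken from the paper:

```java
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.Discretize;

public class MetaDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("breast-cancer.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(new Discretize()); // filter applied to the data before learning
        fc.setClassifier(new J48());    // base learner wrapped by the meta classifier
        fc.buildClassifier(data);
        System.out.println(fc);
    }
}
```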
DECISION TREES
Decision tree induction has been studied in detail in both areas of pattern recognition and machine learning [13, 14]. Work in this area synthesizes the experience gained by people working in machine learning and is exemplified by the computer program ID3.
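ID3 chooses the attribute to split on at each node by information gain; its successor C4.5 (implemented in WEKA as J48, used below) refines this with the gain ratio. The standard ID3 criterion, stated here for context rather than quoted from this paper, is

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v),$$

where $S_v$ is the subset of the training set $S$ for which attribute $A$ takes value $v$; the attribute with the highest gain is placed at the node.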
DISCUSSION AND RESULTS
To investigate the performance of the selected classification methods (Bayes, Function, Lazy, Meta, Rules, Misc, Nested dichotomies and Trees classifiers), we use the same experimental procedure as suggested by WEKA. 75% of the data is used for training and the remainder for testing purposes.
In WEKA all data are considered as instances, and the features in the data are known as attributes. The simulation results are partitioned into several sub-items for easier analysis and evaluation: in the first part, correctly and incorrectly classified instances are given as numeric and percentage values, and subsequently the time taken to build the model is given in seconds. The results of the simulation are shown in the tables and their graphical representations below. On the basis of the comparison of accuracy and error rates, the classification techniques with the highest accuracy are obtained for this dataset. We can clearly see that the highest accuracy is 75.52% and the lowest is 51.74%; the highest accuracy belongs to the Meta classifier. The total time required to build the model is also a crucial parameter when comparing classification algorithms, and in this experiment the single conjunctive rule learner requires the shortest time, around 0.15 seconds, compared to the others.
The figures show the behaviour of the various algorithms used in WEKA, together with the advantages and disadvantages of each algorithm. Every algorithm has its own importance, and we choose among them based on the behaviour of the data. Deep knowledge of the algorithms is not required for working in WEKA, which is the main reason WEKA is such a suitable tool for data mining applications. This paper covers only the classification operations in WEKA; in future work we will try to make a complete reference paper on WEKA.
Table no-1 (best algorithms):
Name of algorithm   | Correctly classified instances (%) | Incorrectly classified instances (%) | Time taken to build the model (s)
BayesNet            | 72.028  | 27.972  | 0.03
Simple Logistic     | 75.1748 | 24.8252 | 1.44
K-Star              | 73.4266 | 26.5734 | 0
Filtered Classifier | 75.5245 | 24.4755 | 0
Ordinal Classifier  | 75.5245 | 24.4755 | 0.01
Misc (HyperPipes)   | 69.9301 | 30.0699 | 0
Decision Table      | 73.4266 | 26.5734 | 0.5
J48                 | 75.5245 | 24.4755 | 0.01
[Figure no-1: Correctly classified instances (%) for each algorithm]
[Figure no-2: Incorrectly classified instances (%) for each algorithm]
[Figure no-3: Correctly classified instances for each algorithm (BayesNet, Simple Logistic, K-Star, Filtered Classifier, Ordinal Classifier, HyperPipes, Decision Table, J48)]
4.2 Comparison between the LUNG, HEART and DIABETES datasets
Algorithm             | Correctly classified instances (%) | Incorrectly classified instances (%) | TP Rate | FP Rate | Time taken to build model (s)
MultilayerPerceptron  | 100     | 0       | 0.75  | 0.436 | 0.2
Multiclass Classifier | 77.2135 | 22.7865 | 0.772 | 0.321 | 0.02
SPegasos              | 77.7344 | 22.2656 | 0.777 | 0.327 | 0.19
Table no-2 (lung dataset, heart dataset, diabetes dataset)
[Figure no-4: MultilayerPerceptron vs. Multiclass Classifier]
[Figure no-5: MultilayerPerceptron vs. Multiclass Classifier]
[Figure no-6: MultilayerPerceptron vs. Multiclass Classifier]
[Figure no-7: MultilayerPerceptron, Multiclass Classifier and SPegasos]
[Figure no-8: MultilayerPerceptron, Multiclass Classifier and SPegasos]
References
- D. Lavanya and K. Usha Rani, "Analysis of feature selection with classification: Breast cancer datasets," Indian Journal of Computer Science and Engineering (IJCSE), October 2011.
- E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: Application to face detection," Proceedings of Computer Vision and Pattern Recognition, Puerto Rico, pp. 130-136, 1997.
- W. Buntine, "Theory refinement on Bayesian networks," in B. D. D'Ambrosio, P. Smets, & P. P. Bonissone (Eds.), Proceedings of the Seventh Annual Conference on Uncertainty in Artificial Intelligence, pp. 52-60, San Francisco, CA.
- S. V. Chakravarthy and J. Ghosh, "Scale-based clustering using radial basis function networks," Proceedings of the IEEE International Conference on Neural Networks, Orlando, Florida, pp. 897-902, 1994.
- M. D. Buhmann, Radial Basis Functions: Theory and Implementations, 2003.
- A. J. Howell and H. Buxton, "RBF network methods for face detection and attentional frames," Neural Processing Letters (15), pp. 197-211, 2002.
- Daniel Grossman and Pedro Domingos, "Learning Bayesian network classifiers by maximizing conditional likelihood," Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004.
- U.S. Cancer Statistics Working Group, United States Cancer Statistics: 1999-2008 Incidence and Mortality Web-based Report, Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute, 2012.
- International Agency for Research on Cancer, Lyon, World Cancer Report, IARC Press, 2003, pp. 188-193.
- Inas Elattar, "Breast cancer: Magnitude of the problem," Egyptian Society of Surgical Oncology Conference, Taba, Sinai, Egypt, 30 March - 1 April 2005.
- Daniel F. Roses, "Clinical assessment of breast cancer and benign breast disease," in Breast Cancer: Vol. 2, Ch. 14, M. N. Harris (Ed.), Churchill Livingstone, Philadelphia, 2005.
- S. Aruna et al., "Knowledge based analysis of various statistical tools in detecting breast cancer," 2011.
- J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
- William H. Wolberg, W. Nick Street, Dennis M. Heisey, and Olvi L. Mangasarian, "Computerized breast cancer diagnosis and prognosis from fine needle aspirates," Western Surgical Association meeting, Palm Desert, California, November 14, 1994.
- Y. Chen, A. Abraham, and B. Yang, "Feature selection and classification using flexible neural tree," Journal of Neurocomputing 70(1-3), pp. 305-313, 2006.
- K. Golnabi et al., "Analysis of firewall policy rules using data mining techniques," 2006, pp. 305-315.
- R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, New York, 1973.
- C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, 1999.
- V. N. Vapnik, The Nature of Statistical Learning Theory, 1st ed., Springer-Verlag, New York, 1995.
- Ross Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.
- P. Cabena, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi, Discovering Data Mining: From Concept to Implementation, Prentice Hall, Upper Saddle River, N.J., 1998.
- Vaibhav Narayan Chunekar and Hemant P. Ambulgekar, "Approach of neural network to diagnose breast cancer on three different data sets," 2009.
- D. Lavanya, "Ensemble decision tree classifier for breast cancer data," International Journal of Information Technology Convergence and Services, vol. 2, no. 1, pp. 17-24, Feb. 2012.
- Yoav Freund and Robert E. Schapire, "Large margin classification using the perceptron algorithm," Machine Learning, 37(3), 1999.
- J. D. M. Rennie, L. Shih, J. Teevan, and D. R. Karger, "Tackling the poor assumptions of naive Bayes text classification," in ICML 2003, pp. 616-623, 2003.
- Kanako Komiya, Naoto Sato et al., "Negation naive Bayes for categorization of product pages on the web," Proceedings of Recent Advances in Natural Language Processing, pp. 586-591, Hissar, Bulgaria, 12-14 September 2011.
- J. Cheng and R. Greiner, "Learning Bayesian belief network classifiers: Algorithms and systems," in E. Stroulia & S. Marwin (Eds.), AI 2001, pp. 141-151, LNAI 2056, 2001.