A Comparative Analysis of K-NN and ANN Techniques in Machine Learning

Igiri Chinwe Peace; Anyama Oscar Uzoma; Silas Abbasiama Ita; Sam Iibi

doi:10.17577/IJERTV4IS030625

Volume 04, Issue 03 (March 2015)


Open Access
Authors : Igiri Chinwe Peace, Anyama Oscar Uzoma, Silas Abbasiama Ita, Sam Iibi
Paper ID : IJERTV4IS030625
Volume & Issue : Volume 04, Issue 03 (March 2015)
DOI : http://dx.doi.org/10.17577/IJERTV4IS030625
Published (First Online): 20-03-2015
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Igiri Chinwe Peace
Department of Computer Science, University of Port Harcourt, Nigeria

Silas Abbasiama Ita
Department of Computer Science, University of Port Harcourt, Nigeria

Anyama Oscar Uzoma
Department of Computer Science, University of Port Harcourt, Nigeria

Sam Iibi
Department of Computer Science, University of Port Harcourt, Nigeria

Abstract: Different machine learning algorithms have been applied in various domains and have yielded good results. The choice of technique for a given field is determined by the type of dataset and the target goal. Some studies have shown different techniques producing the same prediction result. In this study, a critical analysis of the application of k-Nearest Neighbour (k-NN) and Artificial Neural Network (ANN) has been carried out. The comparison was done using the same dataset (English Premier League) on the same platform (Rapid Miner). k-NN classification showed a prediction success of 53.33%, while that of ANN was 70%. This indicates that, on this dataset, ANN is a better technique than k-NN for a polynominal (multi-class) label.

Keywords: ANN; K-NN; Machine Learning; Prediction

INTRODUCTION

The value of machine learning techniques in modelling and predictive analysis cannot be overstated. Their application in various domains has delivered practical benefits: today we can forecast weather, predict game outcomes, perform diagnosis and improve distance e-learning methods using machine learning techniques.
The choice and tuning of these techniques determine the quality of the classification and prediction results, so their application in these areas should not be underestimated.
1. What is Machine Learning?
  
  Machine learning is a scientific discipline that explores the design, study and application of algorithms that can learn from data. These algorithms operate by building a model from input datasets, called example sets or training sets, and then using the model to make forecasts, predictions or decisions in different application domains.
  
  Machine learning helps remove the static, fixed and strict approach of conventional explicit programming, which can lead to poor optimization and inefficient use of memory and time.
  
  In computer science, machine learning overlaps with mathematics, computational statistics and artificial intelligence. This overlap allows for a more thorough and rigorous application of machine learning. Machine learning can be divided into three forms of learning.
  - Supervised Learning
    
    In this form of learning, the computer is presented with example inputs and their desired outputs, given by a "trainer", and the goal is to learn a general rule that maps inputs to outputs.
  - Unsupervised learning
    
    In unsupervised learning, no labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end, such as prediction.
  - Reinforcement learning
    
    In reinforcement learning, a computer program interacts with a dynamic environment in which it must perform a certain goal without a trainer explicitly telling it whether it has come close to that goal. Applications of reinforcement learning include learning how to drive, e-learning, and team strategy, where managers use decision support systems to study an opponent's playing pattern.
2. Neural Network Learning
Learning is an important component of every intelligent system. In an artificial neural network, learning typically happens during a dedicated training phase. Once the network has been trained, it enters a production phase in which it produces results independently. Training can take diverse forms, using a mixture of learning paradigms, learning rules and learning algorithms. A network with distinct learning and production phases is referred to as a static network. Networks that are able to continue learning during the production phase are known as dynamical systems.
RELATED LITERATURE

K-Nearest Neighbor (K-NN) and Artificial Neural Network (ANN) are both machine learning techniques that have been used to implement prediction or forecasting systems. The choice of technique, however, depends on the aim and objectives of the project in view. According to [1], K-NN can be used as a classification or regression model depending on the input data. The k-nearest-neighbor classifier is commonly based on the Euclidean distance between a test sample and the specified training samples [2]. [3] pointed out that one of the difficulties that arises when utilizing this technique is that each of the labeled samples is given equal importance in deciding the class membership of the pattern to be classified, regardless of its 'typicalness'.

The application of MultiClass Classifier, LogitBoost, Rotation Forest, BayesNet and Naive Bayes techniques to home-win and away-win football data by [4] yielded 55% prediction accuracy. The highlight of [4]'s study is that the four different techniques yielded the same result on the same dataset. The author recommended that higher prediction accuracy could be obtained with a more relevant dataset. Based on [4]'s recommendations, [5] developed an improved system by applying ANN to eighteen (18) feature sets, among which are match streak, players' performance index and managers' performance index. [5]'s system yielded 70% prediction accuracy when applied to the 10th and 11th weeks of the 2014/2015 English Premier League season.

A Hidden Markov process model fitted with the Newton-Raphson method was applied by [6] to World Chess Federation rating data covering 2000 chess players. A prediction accuracy of 55.64% was obtained. [6] was not satisfied with the result and suggested that a different technique could improve it. Other studies have supported [6]'s recommendation. [7], for instance, applied K-NN to National Football League data with 80% prediction success. An improved prediction of 90.32% was obtained when ANN techniques were applied by [8] to the same dataset used in [7].

A study by [9] applied a Bayesian network model to psychological and non-psychological factors that affected the Barcelona team in the 2008-2009 Spanish League. Applying the Bayesian network model to the thirteen features achieved 92% success. This streamlined study, focused on the Barcelona team alone, yielded excellent prediction, suggesting that a Bayesian network is an appropriate model for the dataset used.
EXISTING SYSTEM

The existing system is the research work done by [5]. In that work, An Improved Prediction System for Football a Match Result was developed by applying logistic regression and Artificial Neural Network modelling to the prediction of English Premier League results of the 2014/2015 season, as shown in fig. 1.

The following feature sets were used:
- number of goals
- attack and defense strength
- players' performance index
- managers' index
- match streak, etc.
  
  Result shows a prediction accuracy of 85 percent.
  
  Fig. 1: Diagram showing results of existing system
- Advantages of existing system
  - Good prediction accuracy
  - Unique features
  - Large datasets
- Disadvantages of existing system
  - Limited capabilities
  - Cost intensive
COMPARATIVE ANALYSIS OF K-NN AND ANN TECHNIQUES
1. k-NN technique
  
  The k-Nearest Neighbor algorithm is based on learning by analogy, that is, by comparing a given test example with training examples that are similar to it. The training examples are described by n attributes. Each example represents a point in an n-dimensional space. In this way, all of the training examples are stored in an n-dimensional pattern space. When given an unknown example, a k-nearest neighbor algorithm searches the pattern space for the k training examples that are closest to the unknown example. These k training examples are the k "nearest neighbors" of the unknown example. "Closeness" is defined in terms of a distance metric, such as the Euclidean distance.
  
  The k-nearest neighbor algorithm is amongst the simplest of all machine learning algorithms: an example is classified by a majority vote of its neighbors, with the example being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the example is simply assigned to the class of its nearest neighbour. The same method can be used for regression, by simply assigning the label value for the example to be the average of the values of its k nearest neighbors. It can be useful to weight the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones.
  
  The neighbors are taken from a set of examples for which the correct classification (or, in the case of regression, the value of the label) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
  
  The basic k-Nearest Neighbor algorithm is composed of two steps: (1) find the k training examples that are closest to the unseen example; (2) take the most commonly occurring classification for these k examples (or, in the case of regression, take the average of these k label values).
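
The two steps can be illustrated with a minimal Python sketch. It is not the Rapid Miner operator used in this study; the function names and the toy data are hypothetical, plain Euclidean distance is assumed, and the regression variant uses an unweighted average.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(train, query, k):
    """Step 1: return the k training examples closest to the query."""
    return sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]


def knn_classify(train, query, k):
    """Step 2 (classification): majority vote among the k neighbours."""
    votes = Counter(label for _, label in nearest(train, query, k))
    # On a tied vote, the label that was encountered first
    # (i.e. contributed by the nearer neighbour) is returned.
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k):
    """Step 2 (regression): unweighted mean of the k neighbours' values."""
    neighbours = nearest(train, query, k)
    return sum(value for _, value in neighbours) / len(neighbours)


# Hypothetical toy data: (feature vector, class label)
toy_train = [
    ((1.0, 1.0), "WIN"),
    ((1.2, 0.8), "WIN"),
    ((0.9, 1.1), "WIN"),
    ((4.0, 4.0), "LOSS"),
    ((4.2, 3.9), "LOSS"),
    ((2.5, 2.5), "DRAW"),
]

if __name__ == "__main__":
    print(knn_classify(toy_train, (1.1, 1.0), k=3))
```

Sorting the whole training set keeps the sketch short; for large datasets a spatial index or a partial sort would normally be preferred.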
  
  Fig 2: Illustration of the kNN search problem for k = 4
  
  As seen in fig. 2, the white points correspond to the reference points and the black points correspond to the query point. The circle gives the distance between the query point and the closest reference point.
  
  In this study, the two steps were applied as follows: 1. the algorithm found the k training examples in the sports data that are closest to the unseen example; 2. it then took the most commonly occurring classification among these k examples. The k value used in this research is 5. This was determined by varying k between 1 and 15 and keeping the value that gave the best prediction accuracy, 53.33%.
  1. Input
    
    Sports data as training set
    
    This input was transformed by normalization and fed to the k-NN model process for training to find the k-value.
  2. Output
    
    Model
    
    The k-Nearest Neighbour model was delivered from the output port and applied to the prediction datasets to generate the label attribute.
  3. Parameters
    
    - K = 5
    - Measure type = mix measure
    - Mix measure = Mixed Euclidean Distance
    - Training set examples = 80
    - Prediction data sets = 30
    - Number of attributes = 9
    - Number of class label = 3
    - Prediction = 53.33% (see fig. 3)
    
    Fig. 3: Screenshot of simulation result of k-NN prediction result
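
The procedure described in this section (normalise the inputs, then vary k from 1 to 15 and keep the best value) can be sketched as follows. This is an illustration under stated assumptions, not the Rapid Miner process itself: min-max scaling is assumed because the paper does not say which normalization was used, plain Euclidean distance stands in for the Mixed Euclidean Distance measure, and all function names and data are hypothetical.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return a function that rescales each attribute to [0, 1],
    using the minimum and maximum seen in `rows` (assumed scheme)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )

    return scale


def predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training examples."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test examples whose label is predicted correctly."""
    hits = sum(predict(train_x, train_y, q, k) == y
               for q, y in zip(test_x, test_y))
    return hits / len(test_y)


def best_k(train_x, train_y, test_x, test_y, k_values=range(1, 16)):
    """Try each k and return (best k, {k: accuracy}).
    Ties are resolved in favour of the smallest k."""
    scores = {k: accuracy(train_x, train_y, test_x, test_y, k)
              for k in k_values if k <= len(train_x)}
    return max(scores, key=scores.get), scores


# Hypothetical toy data with attributes on very different scales
raw_train = [(0, 0), (1, 10), (2, 20), (10, 100), (9, 90), (8, 80)]
train_labels = ["A", "A", "A", "B", "B", "B"]
raw_test = [(1.5, 15), (8.5, 85)]
test_labels = ["A", "B"]

scale = fit_min_max(raw_train)
train_scaled = [scale(r) for r in raw_train]
test_scaled = [scale(r) for r in raw_test]
chosen_k, all_scores = best_k(train_scaled, train_labels,
                              test_scaled, test_labels)
```

Note that choosing k on the same examples used to report accuracy tends to give an optimistic figure; a separate validation split is a common safeguard.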
2. ANN (Artificial Neural Network)
  
  The term neural network can refer to both natural and artificial variants, though it is typically used for artificial systems only. Mathematically, neural nets are nonlinear: each layer represents a non-linear combination of non-linear functions from the prior layer. Each neuron in the network is a multiple-input, multiple-output (MIMO) system that receives signals from the inputs, produces a resultant signal, and transmits that signal to all of its outputs.
  
  Several training algorithms exist for ANNs, suited to the different problem domains that ANNs address. Some of these algorithms include:
  1. Back propagation with single modified neuron
  2. Back propagation with linear neuron
  3. Matrix approach
  4. Gradient Descent
    
    The back propagation algorithm has been the most popular approach for neural network training due to its flexibility and robustness. This method has been used to solve various real-life problems.
    
    Neurons in an Artificial Neural Network (ANN) are arranged into discrete layers. The first layer, which interacts with the surroundings to receive the various combinations of possible inputs, is known as the input layer.
    
    The last layer, which interacts with the output to present the final processed data, is known as the output layer.
    
    Fig. 4: Diagram showing Artificial Neural Network
    
    The layers between the input and output layers, which have no direct communication with the environment, are known as hidden layers. Increasing the complexity of an Artificial Neural Network (ANN), and hence its computational ability, requires additional hidden layers and more neurons per layer, as seen in fig. 4.
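
The layered arrangement (input layer, hidden layer, output layer) can be sketched as a forward pass through a small fully connected network. This is a minimal illustration, assuming sigmoid activations in every layer and a single hidden layer; the layer sizes, weights and function names are hypothetical and are not taken from the trained model.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron takes a weighted sum of all
    inputs, adds its bias, and applies the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)


# Hypothetical network: 2 inputs, 3 hidden neurons, 3 output classes
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[1.0, -1.0, 0.5], [-0.5, 0.2, 0.3], [0.2, 0.7, -0.9]]
output_b = [0.0, 0.0, 0.0]

scores = forward([0.6, 0.9], hidden_w, hidden_b, output_w, output_b)
classes = ["WIN", "DRAW", "LOSS"]
predicted = classes[scores.index(max(scores))]  # highest score wins
```

Adding hidden layers amounts to chaining further `layer_forward` calls between the input and the output.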
  5. Input
    
    Sports data as training set
    
    The input port of the Neural Net model process was fed with the transformed sports data.
  6. Output
Model

The Neural Net model was delivered from the model output port and applied to unseen datasets to predict the label attribute.

Node	Class 'WIN' (Sigmoid)	Class 'DRAW' (Sigmoid)	Class 'LOSS' (Sigmoid)
Node 1	1.535	-2.108	-1.505
Node 2	-0.250	-0.140	0.130
Node 3	-0.052	-1.121	0.759
Node 4	-0.466	-2.322	1.830
Node 5	-0.767	2.170	-2.479
Node 6	-1.914	2.250	-0.821
Node 7	-0.240	-0.11	-0.063
Node 8	1.115	-1.357	-1.572
Node 9	0.469	2.193	-3.037
Node 10	-0.402	-0.297	0.469
Node 11	-0.207	-0.145	-0.038
Node 12	1.784	-2.628	-1.223
Node 13	-0.146	-0.196	0.063
Node 14	-0.152	-1.305	1.190
Node 15	-0.366	0.307	-0.186
Node 16	2.435	-2.856	-1.636
Node 17	-1.110	-0.062	0.848
Node 18	-0.615	-2.079	1.930
Node 19	-0.204	0.592	-1.176
Node 20	-0.970	-1.482	1.778
Node 21	-0.516	-0.047	0.178
Node 22	-0.225	-1.598	1.441
Node 23	0.870	-1.320	-1.176
Node 24	-0.241	-0.049	-0.101
Node 25	-0.999	-0.934	1.337
Node 26	-0.492	0.077	-0.036
Node 27	1.680	-2.659	-0.980
Node 28	0.036	-0.310	-0.197
Node 29	-0.644	0.889	-0.458
Node 30	-0.436	0.304	-0.253
Node 31	-0.483	0.845	-0.886
Node 32	0.247	0.634	-1.857
Node 33	-0.643	-1.803	1.711
Threshold	-2.179	1.052	-0.125

Parameters

- Hidden layers = 33 nodes
- Training cycle = 500
- Learning rate = 0.9
- Momentum = 0.2
- Prediction accuracy = 70% (see fig. 5)
- Number of label = 3

Fig.5: Neural network prediction accuracy
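
To show how the learning rate and momentum parameters enter training, the sketch below applies the textbook gradient descent update with momentum, using the values reported above (learning rate 0.9, momentum 0.2, 500 cycles). The exact update rule used internally by Rapid Miner is not stated in this paper, so this form is an assumption; the objective function and all names are hypothetical.

```python
def momentum_step(weights, gradients, previous_updates,
                  learning_rate=0.9, momentum=0.2):
    """One update: step against the gradient, plus a fraction
    (the momentum) of the previous update."""
    updates = [
        -learning_rate * g + momentum * p
        for g, p in zip(gradients, previous_updates)
    ]
    new_weights = [w + u for w, u in zip(weights, updates)]
    return new_weights, updates


def train(gradient_fn, weights, cycles=500):
    """Repeat the update for a fixed number of training cycles."""
    updates = [0.0] * len(weights)
    for _ in range(cycles):
        weights, updates = momentum_step(
            weights, gradient_fn(weights), updates)
    return weights


# Hypothetical one-weight objective: error(w) = 0.5 * (w - 3)^2,
# whose gradient is (w - 3) and whose minimum is at w = 3.
def toy_gradient(weights):
    return [w - 3.0 for w in weights]


final_weights = train(toy_gradient, [0.0])
```

The momentum term reuses part of the previous step, which can smooth the path of the weights and speed up convergence; too large a learning rate can still make training oscillate or diverge on steeper error surfaces.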
RESULT DISCUSSION

Across the total games predicted, k-NN obtained a prediction accuracy of 53.3 percent using k = 5 with the Mixed Euclidean Distance measure, as shown in fig. 3. The predicted results for the 9th to 11th weeks of the 2014/2015 EPL season are shown in fig. 3. Most of the games incorrectly predicted by k-NN (see fig. 6) were predicted correctly by ANN, as shown in fig. 7; this confirms the better performance of ANN over k-NN on the given dataset. The actual and predicted results of the 30 matches played in the 9th to 11th weeks of the 2014/2015 EPL season are shown in fig. 6. Also, some of the games incorrectly predicted by the ANN can be regarded as upsets.
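
The reported percentages can be related to match counts with a short calculation. With 30 predicted matches, 53.33% corresponds to 16 correct predictions and 70% to 21; these counts are inferred from the percentages and are not stated explicitly in the paper. The helper names below are hypothetical.

```python
def accuracy_percent(predicted, actual):
    """Percentage of positions where the predicted label equals the actual one."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100 * hits / len(actual)


def implied_correct(percent, total):
    """Number of correct predictions implied by a rounded percentage."""
    return round(percent / 100 * total)


TOTAL_MATCHES = 30
knn_correct = implied_correct(53.33, TOTAL_MATCHES)  # inferred count
ann_correct = implied_correct(70.0, TOTAL_MATCHES)   # inferred count

if __name__ == "__main__":
    print(knn_correct, ann_correct)
    print(round(100 * knn_correct / TOTAL_MATCHES, 2))
```

On this reading, the gap between the two techniques is five matches out of thirty, which is worth keeping in mind given the small size of the prediction set.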

Fig. 6: Screenshots of 30 predicted sports result using k-NN

Fig. 7: Screenshots of 30 predicted sports result using ANN

Table 1: Summary of both techniques using sports data.

	KNN	ANN
PRED ACCURACY	53.3 PERCENT	70 PERCENT
OPTIMIZATION	K = 5	LEARNING RATE = 0.9
DATASETS	ENGLISH LEAGUE	ENGLISH LEAGUE
EXECUTION TIME	2 SECS	10 SECS
CONCLUSION

This comparative analysis has shown the effectiveness of machine learning techniques (K-NN and ANN) in developing models for sports prediction. At the end of the experiment, ANN proved to be the better technique on the same dataset and platform. This further shows the effectiveness of data mining techniques in sports mining.

A) Research Highlights

The research highlights of this paper are:

- This paper compares the performance of K-Nearest Neighbour (K-NN) and Artificial Neural Network (ANN) on sports data.
- The approach uses K-Nearest Neighbour for implementation.
- The results show prediction accuracies of 53.3 percent for K-NN and 70 percent for ANN.

REFERENCES

[1] Akhtar, F. and Hahne, C. Rapid Miner 5 Operator Reference. Rapid-I GmbH, 2012. Retrieved 13:15, February 13, 2015 from: http://rapidminer.com/wpcontent/uploads/2013/10/RapidMiner_OperatorReference_en.pdf
[2] Peterson, L. E. K-nearest neighbor. Scholarpedia, 4(2), 1883, 2009.
[3] Keller, J. M., Gray, M. R. and Givens, J. A. A fuzzy K-nearest neighbor algorithm. IEEE Transactions on Systems, Man and Cybernetics, 15(4), 580-585, 1985.
[4] Buursma, D. Predicting sports event from past result: Towards effective betting on football matches. Proceedings of the 14th Twente Student Conference on IT. University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science, Netherlands. Conference paper 7226, 2011.
[5] Igiri, C. P. and Nwachukwu, E. O. An Improved Prediction System for Football a Match Result. IOSR Journal of Engineering (IOSRJEN), 04(1), 12-20, 2014.
[6] Zheyuan, F., Yuming, K. and Xiaolin, L. Chess game results prediction system. Stanford University CS Machine Learning Project Report, 2013. Retrieved from: http://cs229.stanford.edu/proj2013/FanKuangLinChessGameResultPredictionSystem.pdf
[7] Anyama, O. U. and Nwachukwu, E. O. A Hybrid Prediction System for American NFL Results. International Journal of Computer Applications Technology and Research (IJCATR), 4(1), 42-47, 2015.
[8] Anyama, O. U. and Igiri, C. P. An Application of Linear Regression & Artificial Neural Network Model in the NFL Result Prediction. International Journal of Engineering Research & Technology (IJERT), 4(1), 457-461, 2015.
[9] Farzin, O., Parinaz, E. and Faezeh, S. M. Football result prediction with Bayesian network in Spanish league-Barcelona team. International Journal of Computer Theory and Engineering, 5(5), 812-815, 2013.
