A Novel Evolutionary Higher Order Neural Network for Pattern Classification

DOI: 10.17577/IJERTV2IS90958


Sibarama Panigrahi 1, Sreetoma Pandey 2, Ria Singh 3

Department of Computer Science and Engineering, MITS, Rayagada, Odisha, India

Abstract

Over the last few decades pattern classification, pattern matching and mathematical function approximation have predominantly been performed using various neural networks (NNs). Compared to traditional NNs, higher order neural networks (HONNs) have several unique characteristics, including: 1) stronger approximation capability; 2) faster convergence; 3) greater storage capacity; and 4) higher fault tolerance. In this paper, a novel evolutionary HONN, specifically the Pi-Sigma network (PSN), is used for pattern classification. The PSN is trained using a strictly greedy chemical reaction optimization (CRO) algorithm. In contrast to basic CRO algorithms, in this CRO algorithm the reactant (population) size remains fixed throughout all iterations, which makes it easier to implement. The performance of the proposed methodology for pattern classification is evaluated on three well-known real-world classification problems from the UCI machine learning repository. The results obtained from the proposed method are compared with those obtained by applying the two most popular variants of the differential evolution algorithm (DE/rand/1/bin and DE/best/1/bin). It is observed that the proposed method provides better classification accuracy than the other methods.

  1. Introduction

    Classification is the process of assigning an object to a predefined group or class based on a number of attributes of that object. It is one of the most frequently encountered decision-making tasks of human activity. Traditionally, statistical procedures such as discriminant analysis were widely used to perform pattern classification. However, the effectiveness of these methods depends to a large extent on the various assumptions or conditions under which the models are developed, and thus these procedures work well only when the underlying assumptions are satisfied. Users must have a good knowledge of both data properties and model capabilities before the models can be successfully applied. Considering the above pitfalls, several classifiers using various data mining and computational intelligence methods have been developed.

    Out of the various types of classifiers, neural network based classifiers are the most popular and predominant in the literature [1]. Despite the advantageous features of HONN models over traditional NN models, only a few papers on pattern classification using HONN models are found in the literature [2-4]. Therefore, in this paper the class of HONNs, and in particular Pi-Sigma Networks (PSNs), has been studied. PSNs were introduced by Shin and Ghosh [4]. They have addressed several difficult tasks, such as zeroing polynomials [5] and polynomial factorization [6], more effectively than traditional feed-forward neural networks (FFNNs). Moreover, PSNs employ fewer weights than other HONNs, yet still indirectly incorporate higher order capabilities. Therefore, in this paper the Pi-Sigma Network is considered for pattern classification.

    The rest of this paper is organized as follows. Section 2 briefly describes the architecture and mathematical model of the PSN. The proposed method for classification using an evolutionary PSN, including the chemical reaction optimization algorithm used for its training, is explained in Section 3. Experimental results are presented in Section 4, and finally conclusions are drawn in Section 5.

  2. Preliminaries

    1. Pi-Sigma Network

      The Pi-Sigma Network (PSN) is a special type of feed-forward higher order neural network that computes the product of sums of the input components and passes it through a nonlinear function. The network architecture of the PSN is shown in Figure 1. It consists of a single hidden layer of summing units and an output layer of product units. The weights connecting the input layer to the hidden layer are adapted during training, while those connecting the hidden layer to the output layer are fixed to one and are not trainable. Such a network topology, with only one layer of trainable weights, drastically reduces the training time [2, 7]. Moreover, the product units give the PSN higher order capabilities that increase its computational power: they expand the input space into a higher dimensional space, in which nonlinearly separable classes can become linearly separable. Thus, the PSN provides nonlinear decision boundaries, offering a better classification capability than the linear neuron (Guler and Sahin, 1994). In addition, Shin and Ghosh (1991) suggested that PSNs not only offer better classification over a broad class of problems but also require less memory and at least two orders of magnitude fewer computations than an MLP for a similar level of performance.

      Figure 1: Architecture of a Typical Pi-Sigma Network

      Consider a PSN with NOIN input neurons, NOHN hidden neurons and one output neuron. The number of hidden neurons in the hidden layer defines the order of the PSN. For a NOHN-th order PSN the number of trainable weights is NOIN × NOHN, since each summing unit is associated with NOIN weights. The output of the PSN is computed by taking the product of the outputs of the NOHN hidden units and passing it through a nonlinear function:

          Y = \sigma\Big( \prod_{j=1}^{NOHN} h_j \Big)

      where \sigma is a nonlinear transfer function and h_j is the output of the j-th hidden unit, computed as the sum of the products of each input x_i with the corresponding weight w_{ij} between the i-th input and the j-th hidden unit:

          h_j = \sum_{i=1}^{NOIN} w_{ij} x_i
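      To make the two formulas above concrete, the sketch below implements the forward pass in Python with NumPy. This is our illustration, not the paper's code (the experiments in Section 4 were implemented in SCILAB), and the choice of tanh as the transfer function \sigma is an assumption, since no particular \sigma is specified here.

          import numpy as np

          def psn_output(W, x):
              """Forward pass of a Pi-Sigma Network.

              W : (NOHN, NOIN) array, one row of weights per summing (hidden) unit.
              x : (NOIN,) input pattern.
              Returns Y = sigma(prod_j h_j) with h_j = sum_i w_ij * x_i.
              """
              h = W @ x                    # outputs of the NOHN summing units
              return np.tanh(np.prod(h))   # product unit followed by the nonlinearity

          # Example: a 2nd-order PSN (NOHN = 2) on a 3-dimensional input
          W = np.array([[0.1, -0.4, 0.3],
                        [0.2,  0.5, -0.1]])
          x = np.array([1.0, 0.5, -1.0])
          print(psn_output(W, x))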

  3. Proposed Method

    The efficiency of any supervised neural network depends on the algorithm used for its training. The objective of supervised Pi-Sigma network training is to minimize the error between the approximation produced by the HONN and the target output. For this, the optimal weight set of a HONN must be obtained, using either gradient-based or evolutionary learning algorithms. The objective function of HONN training is a multimodal search problem, since it depends on a number of parameters.


    Therefore, gradient-based training algorithms often suffer from several shortcomings, including: 1) easily getting trapped in local minima; 2) slow convergence; and 3) training performance that is sensitive to the initial values of its parameters. Due to these disadvantages, research on different optimization techniques dedicated to HONN training is still required. Many optimization techniques, such as differential evolution (DE) [8], genetic algorithms (GA) [9], particle swarm optimization (PSO) [10], ant colony optimization (ACO) [11], bee colony optimization (BCO) [12], evolution strategies (ES) [13], quantum-inspired evolutionary algorithms (QEA) [14] and chemical reaction optimization (CRO) [15-17], can be used for HONN training. Chemical reaction optimization is a new optimization technique inspired by the nature of chemical reactions. CRO has demonstrated excellent performance in solving many engineering problems, such as mining classification rules [18] and the quadratic assignment problem [15]. Therefore, in this paper a novel strictly greedy CRO algorithm is used to train the PSN, which is then applied to classification. The algorithm is described as follows.

    The proposed training algorithm operates in three phases: an initialization phase, an iteration phase and a final phase. The initialization phase assigns values to initial parameters such as the termination criterion, the length of the reactants/molecules and ReacNum, and generates the initial reactants. The iteration phase simulates the reaction processes. The reactions may be monomolecular or bimolecular. For monomolecular reactions, Decomposition and Redox1 are considered; for bimolecular reactions, three types of reactions are considered: Synthesis, Displacement and Redox2. The reaction types are chosen considering both intensification and diversification. Moreover, a strictly greedy reversible reaction is used to update the reactants. All the reactions are elaborated in the following subsections. In the final phase, the reactant having the best enthalpy is used as the optimal solution (i.e. the optimal weight set of a PSN). The pseudo-code of the proposed method is given in Algorithm 1.
    Algorithm 1 (CRO-HONNT)
    Set the iteration counter i = 0
    /* Randomly initialize the ReacNum reactants from a uniform distribution over [L, U]:
       P_i = {R_1^i, R_2^i, R_3^i, ..., R_ReacNum^i}, with R_j^i = {W_{j,1}^i, ..., W_{j,D}^i} for
       j = 1, 2, 3, ..., ReacNum, D = length of each reactant (NOIN × NOHN), and W_{j,k}^i = the
       k-th atom of the j-th reactant in the i-th iteration, representing a weight of the PSN. */
    for j = 1 to ReacNum
        Calculate the enthalpy e(R_j)
    end for
    While (termination criterion is not satisfied) do
    begin
        for j = 1 to ReacNum
            // perform a reaction over the reactants of P_i
            Get rand1 randomly in the interval [0, 1]
            if rand1 <= 0.5
                Get rand2 randomly in the interval [0, 1]
                if rand2 <= 0.5
                    Decomposition(R_j)
                else
                    Redox1(R_j)
                end if
            else
                Get rand3 randomly in the interval [0, 1]
                if rand3 <= 0.33
                    Select the best reactant R_k (R_k != R_j); Synthesis(R_j, R_k)
                else if rand3 <= 0.66
                    Select another reactant R_k (R_k != R_j) randomly; Displacement(R_j, R_k)
                else
                    Select another reactant R_k (R_k != R_j) randomly; Redox2(R_j, R_k)
                end if
            end if
            Apply the strictly greedy reversible reaction to update the reactants
        end for
        Set the iteration counter i = i + 1
    end while
    Use the reactant having the best enthalpy as the optimal weight set of the PSN and perform classification.
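    For concreteness, the control flow of Algorithm 1 can be sketched in Python as follows. This is an illustrative rendering rather than the authors' SCILAB implementation: the operator and enthalpy callables are assumed to follow the definitions given below, and selection details (e.g. ensuring R_k != R_j when the best reactant is chosen) are simplified.

        import numpy as np

        def cro_honnt(enthalpy, ops, D, reac_num=50, L=-1.0, U=1.0, iters=1000, rng=None):
            """Skeleton of the strictly greedy CRO loop of Algorithm 1.

            enthalpy : callable mapping a reactant (flat weight vector) to its MSE.
            ops      : dict with monomolecular operators 'decomposition', 'redox1'
                       and bimolecular operators 'synthesis', 'displacement', 'redox2'.
            """
            rng = rng or np.random.default_rng()
            P = rng.uniform(L, U, size=(reac_num, D))      # initial reactants
            e = np.array([enthalpy(R) for R in P])         # their enthalpies

            for _ in range(iters):
                for j in range(reac_num):
                    if rng.random() <= 0.5:                # monomolecular branch
                        op = ops['decomposition'] if rng.random() <= 0.5 else ops['redox1']
                        products = [op(P[j])]
                    else:                                  # bimolecular branch
                        r = rng.random()
                        if r <= 0.33:                      # react with the best reactant
                            k = int(np.argmin(e))
                            products = [ops['synthesis'](P[j], P[k])]
                        elif r <= 0.66:                    # react with a random partner
                            k = rng.choice([i for i in range(reac_num) if i != j])
                            products = list(ops['displacement'](P[j], P[k]))
                        else:
                            k = rng.choice([i for i in range(reac_num) if i != j])
                            products = [ops['redox2'](P[j], P[k])]
                    # strictly greedy reversible reaction (Algorithm 7): the best
                    # product replaces the worst reactant only if its enthalpy is lower
                    best = min(products, key=enthalpy)
                    worst = int(np.argmax(e))
                    eb = enthalpy(best)
                    if eb < e[worst]:
                        P[worst], e[worst] = best, eb
            return P[int(np.argmin(e))]                    # best weight set of the PSN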

      1. Reactant Encoding

        A set of real numbers is used to represent one reactant, with each reactant corresponding to a weight set of the PSN. The length of a reactant depends on the number of input and hidden neurons of the PSN (i.e., NOIN × NOHN).
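        For illustration, a reactant can be stored as a flat vector and decoded into the PSN weight matrix by a simple reshape (the variable names here are ours, not from the paper):

            import numpy as np

            NOIN, NOHN = 60, 1                                # e.g. the Sonar PSN used in Section 4
            reactant = np.random.uniform(-1, 1, NOIN * NOHN)  # one reactant = one flat weight set
            W = reactant.reshape(NOHN, NOIN)                  # row j holds the weights of hidden unit j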

      2. Enthalpy of a Reactant

        Each reactant is associated with some enthalpy. As each reactant represents a weight set of the PSN, the mean square error (MSE) on the training set is taken as the enthalpy; the lower the enthalpy, the better the reactant. The MSE is defined as follows:

            MSE = \frac{1}{NOP} \sum_{i=1}^{NOP} (Y_i - T_i)^2

        where Y_i and T_i are the output of the PSN and the target for the i-th training pattern, and NOP is the number of training patterns.
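        A minimal sketch of this enthalpy computation, reusing the (assumed) tanh forward pass from the earlier snippet:

            import numpy as np

            def psn_output(W, x):
                return np.tanh(np.prod(W @ x))   # forward pass from the earlier sketch

            def enthalpy(reactant, X, T, nohn=1):
                """MSE of the PSN encoded by `reactant` over the training set (X, T)."""
                W = reactant.reshape(nohn, -1)
                Y = np.array([psn_output(W, x) for x in X])
                return np.mean((Y - T) ** 2)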

      3. Chemical Reactions

        1. Monomolecular reactions

          In monomolecular reactions only one reactant takes part in the reaction, and one product is produced by modifying one atom of the reactant. These reactions assist in intensification of the solution by performing a local search. In our algorithm monomolecular reactions are performed with a probability of 50%, thereby increasing the chances of obtaining a better solution around the current one. Two monomolecular reactions are considered: Decomposition and Redox1.

          1. Decomposition Reaction

            In this reaction a randomly selected atom of the reactant takes a new value from the range [L, U]. Consider a reactant R_j = {W_{j,1}, W_{j,2}, ..., W_{j,D}}, with W_{j,x} (x ∈ [1, D]) being an atom of reactant j. The pseudo-code of the decomposition reaction is described in Algorithm 2.

            Algorithm 2 (Decomposition(R_j))
            Input: A reactant R_j
            Duplicate R_j to produce R1
            Select an atom x (x ∈ [1, D]) randomly
            W_{1,x} = L + rand() × (U − L)
            Where rand() is a random number drawn from the range [0, 1].
            Output: A new reactant R1
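            An illustrative Python rendering of Algorithm 2 (the default bounds L and U and the RNG handling are our assumptions):

                import numpy as np

                def decomposition(Rj, L=-1.0, U=1.0, rng=None):
                    """Algorithm 2: re-draw one randomly chosen atom from [L, U]."""
                    rng = rng or np.random.default_rng()
                    R1 = Rj.copy()                        # duplicate Rj to produce R1
                    a = rng.integers(len(R1))             # select an atom at random
                    R1[a] = L + rng.random() * (U - L)    # W_{1,x} = L + rand() * (U - L)
                    return R1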

          2. Redox1 Reaction

            It is similar to the decomposition reaction, except that the rate of reaction (λ) used in this algorithm is obtained in a self-adaptive manner. The pseudo-code is described in Algorithm 3.

            Algorithm 3 (Redox1(R_j))
            Input: A reactant R_j
            Duplicate R_j to produce R1
            Select an atom x (x ∈ [1, D]) randomly
            W_{1,x} = L + λ_t × (U − L)
            Where \lambda_{t+1} = 4 \lambda_t (1 - \lambda_t), with \lambda_0 ∈ [0, 1] − {0, 0.25, 0.5, 0.75, 1.0}.
            Output: A new reactant R1
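            A sketch of Algorithm 3; the rate λ follows the logistic map across calls and is kept in a module-level variable purely for brevity (λ_0 = 0.7 is an arbitrary valid seed):

                import numpy as np

                _lam = 0.7   # lambda_0: any seed in [0, 1] except {0, 0.25, 0.5, 0.75, 1.0}

                def redox1(Rj, L=-1.0, U=1.0, rng=None):
                    """Algorithm 3: like decomposition, but with a self-adaptive rate."""
                    global _lam
                    rng = rng or np.random.default_rng()
                    R1 = Rj.copy()
                    a = rng.integers(len(R1))
                    R1[a] = L + _lam * (U - L)            # W_{1,x} = L + lambda_t * (U - L)
                    _lam = 4.0 * _lam * (1.0 - _lam)      # logistic-map update of the rate
                    return R1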

        2. Bimolecular reactions

          Here two reactants R_j = {W_{j,1}, ..., W_{j,D}} and R_k = {W_{k,1}, W_{k,2}, ..., W_{k,D}} take part in the reaction. These reactions help in diversification of the solution by generating a new solution that is significantly different from the current one. These reactions occur with a probability of 50%. The following types of bimolecular reactions are used.

          1. Synthesis Reaction

            In this reaction one new reactant is produced by the reaction between a reactant and the best reactant of the iteration.

            Algorithm 4 (Synthesis(R_j, R_k))
            Input: Two reactants R_j, R_k
            R1 = R_j + λ × (R_k − R_j)
            Where λ is a random number in [−0.25, 1.25].
            Output: A new reactant R1
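            An illustrative rendering of Algorithm 4, applied elementwise to the whole reactant vector:

                import numpy as np

                def synthesis(Rj, Rk, rng=None):
                    """Algorithm 4: R1 = Rj + lam * (Rk - Rj), lam drawn from [-0.25, 1.25]."""
                    rng = rng or np.random.default_rng()
                    return Rj + rng.uniform(-0.25, 1.25) * (Rk - Rj)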

          2. Displacement Reaction

            Two solutions R1 and R2 are obtained from the reaction between two reactants R_j and R_k.

            Algorithm 5 (Displacement(R_j, R_k))
            Input: Two reactants R_j, R_k
            R1 = t × R_j + (1 − t) × R_k
            R2 = t × R_k + (1 − t) × R_j
            Where t is initialized to a random number in [0, 1] and is updated in the following manner every time this reaction reoccurs (n = number of times the reaction has occurred):

                t_{n+1} = 2.3 \, t_n^2 \sin(\pi t_n)

            Output: Two reactants R1 and R2
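            A sketch of Algorithm 5; the mixing coefficient t persists between calls and follows the chaotic update above (module-level state is used here purely for brevity):

                import numpy as np

                _t = None   # chaotic mixing coefficient, drawn from [0, 1] on first use

                def displacement(Rj, Rk, rng=None):
                    """Algorithm 5: arithmetic crossover with a chaotically updated t."""
                    global _t
                    rng = rng or np.random.default_rng()
                    if _t is None:
                        _t = rng.random()
                    R1 = _t * Rj + (1.0 - _t) * Rk
                    R2 = _t * Rk + (1.0 - _t) * Rj
                    _t = 2.3 * _t ** 2 * np.sin(np.pi * _t)   # t_{n+1} = 2.3 t_n^2 sin(pi t_n)
                    return R1, R2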

          3. Redox2 Reaction

            This reaction is similar to the synthesis reaction, but here the rate of reaction is obtained from the range [0, 1].

            Algorithm 6 (Redox2(R_j, R_k))
            Input: Two reactants R_j, R_k
            R1 = R_j + λ × (R_k − R_j)
            Where the rate of reaction λ = rand(0, 1) is a random number drawn from [0, 1].
            Output: A new reactant R1
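            Redox2 differs from synthesis only in the range of the reaction rate; a minimal sketch:

                import numpy as np

                def redox2(Rj, Rk, rng=None):
                    """Algorithm 6: like synthesis, but with the rate drawn from [0, 1]."""
                    rng = rng or np.random.default_rng()
                    return Rj + rng.random() * (Rk - Rj)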

      4. Reactant Update

        Every monomolecular or bimolecular reaction is followed by a strictly greedy reversible reaction to update the reactants. In the strictly greedy reversible reaction, for a monomolecular reaction the product replaces the worst reactant of the reactant set if it has better enthalpy; for a bimolecular reaction, where two products are produced, the best product replaces the worst reactant if it has better enthalpy. Thus the number of reactants in the population remains the same throughout the reaction process. The pseudo-code of the strictly greedy reversible reaction is elaborated in Algorithm 7.

        Algorithm 7 (Strictly Greedy Reversible Reaction())
        For monomolecular reactions:
            Let R_j undergo a monomolecular reaction to produce R1
            If enthalpy(R1) < enthalpy(R_worst)
                Replace R_worst by R1
            end if
        For bimolecular reactions:
            Let R_j and R_k undergo a reaction to produce R1 and R2, and let R1 denote the better of the two products
            If enthalpy(R1) < enthalpy(R_worst)
                Replace R_worst by R1
            end if

  4. Experimental Setup and Simulation Results

    The simulations in this paper were carried out on a system with an Intel® Core(TM) 2 Duo E7500 CPU (2.93 GHz) and 2 GB RAM, and were implemented using SCILAB 5.4.1. All networks are trained using the proposed CRO, DE/rand/1/bin and DE/best/1/bin with a population size (reactant size) of 50, and each chromosome (representing a network weight set) is initialized to uniformly distributed random values drawn from the range [−1, 1].

    4.1. Performance Measure

    The correct classification percentage is used as the performance measure, computed as follows:

        \text{Correct Classification}(\%) = \frac{\sum_{i=1}^{NOP} C_i}{NOP} \times 100

    where NOP is the number of test patterns (half of the total samples) and C_i is a coefficient representing the correctness of the classification of the i-th test pattern, determined as follows:

        C_i = \begin{cases} 1, & \text{when } Y_i \ge 0 \text{ and } T_i = 1 \\ 1, & \text{when } Y_i < 0 \text{ and } T_i = -1 \\ 0, & \text{otherwise} \end{cases}

    where Y_i and T_i are the output of the PSN and the target for the i-th test pattern.
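    A sketch of this measure, assuming (as above) that the PSN output is thresholded at zero against the ±1 targets:

        import numpy as np

        def correct_classification(Y, T):
            """Percentage of test patterns whose thresholded output matches the +/-1 target."""
            Y, T = np.asarray(Y), np.asarray(T)
            C = ((Y >= 0) & (T == 1)) | ((Y < 0) & (T == -1))
            return 100.0 * C.mean()

        # Example: three of four test patterns classified correctly -> 75.0
        print(correct_classification([0.8, -0.2, 0.1, -0.9], [1, -1, -1, -1]))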

    4.2. Data Sets and Simulation Results

    For experimental analysis, three binary classification problems (Sonar, Breast Cancer Wisconsin, Haberman's Survival) are considered from the well-known UCI machine learning repository.

    For the Sonar problem the task is to train a PSN to distinguish between sonar signals bounced off a metal cylinder (mine) and those bounced off a roughly cylindrical rock. The dataset contains 208 samples obtained by bouncing sonar signals off a metal cylinder and a rock at various angles and under various conditions: 111 samples obtained from mines and 97 samples obtained from rocks. Each pattern consists of 60 real numbers in the range [0.0, 1.0]; each number represents the energy within a particular frequency band, integrated over a certain period of time. The trained PSNs have one unit in the hidden layer, giving a 60-1-1 architecture. For comparative performance analysis, the results obtained from 100 independent simulations using the proposed method (with the proposed strictly greedy CRO algorithm), DE/rand/1/bin and DE/best/1/bin are shown in Table 1, where Mean, St.Dev., Min and Max represent the mean, standard deviation, minimum and maximum classification accuracy over the 100 independent simulations.

    Table 1 Classification Accuracy (%) for the Sonar problem.

    Method             Mean    St.Dev.   Min     Max
    Proposed Method    77.88   3.64      56.67   84.13
    DE/rand            73.81   4.24      52.88   81.73
    DE/best            73.35   4.34      53.36   80.77

    For the Breast Cancer Wisconsin problem the task is to train a PSN to discriminate between benign and malignant cases based on ten attributes. In the original dataset, 2 represents benign (367 patterns) and 4 represents malignant (332 patterns); these are converted to −1 and 1, respectively, for our simulations. 100 independent simulations were carried out, and the mean, standard deviation, minimum and maximum values of the obtained results are shown in Table 2. Note that PSNs with a 10-1-1 architecture are trained with 50% of the total samples and tested with the other 50%.

    Table 2 Classification Accuracy (%) for the Breast Cancer Wisconsin problem.

    Method             Mean    St.Dev.   Min     Max
    Proposed Method    75.15   6.75      52.44   85.67
    DE/rand            71.46   8.06      52.15   83.38
    DE/best            73.59   7.94      49.86   84.53

    The Haberman's Survival dataset contains cases from a study conducted on the survival of patients who had undergone surgery for breast cancer. The task is to train a PSN to predict whether a patient will survive more than 5 years, based on three attributes. The dataset contains 306 patterns with three attributes each and a class label of −1 for survival of less than 5 years and 1 for survival of more than 5 years. 100 independent simulations were carried out, and the mean, standard deviation, minimum and maximum values of the obtained results are shown in Table 3. Note that PSNs with a 3-1-1 architecture are trained with 50% of the total samples and tested with the other 50%.

    Table 3 Classification Accuracy (%) for the Haberman's Survival problem.

    Method             Mean    St.Dev.   Min     Max
    Proposed Method    70.66   4.51      43.14   73.20
    DE/rand            68.99   9.71      30.72   72.55
    DE/best            68.25   9.36      28.76   73.20

    It can be observed from the results that the proposed methodology provides better classification accuracy for the three problems considered.

  5. Conclusion

    In this paper, we have studied evolutionary HONN models, especially the Pi-Sigma network, for pattern classification. A novel strictly greedy chemical reaction optimization algorithm is developed for its training. For comparative performance analysis of the proposed method, three real-world binary classification problems are considered. It is found that the strictly greedy CRO algorithm along with the PSN provides better classification accuracy than the two most popular variants of the differential evolution algorithm, i.e. DE/rand/1/bin and DE/best/1/bin. In the future, more efficient training algorithms for the PSN can be developed to further improve classification accuracy.

  6. References

  1. G. P. Zhang, "Neural Networks for Classification: A Survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 3, pp. 451-462, 2000.

  2. Y. Shin and J. Ghosh, "Efficient higher-order neural networks for classification and function approximation," International Journal of Neural Systems, vol. 3, pp. 323-350, 1992.

  3. M. G. Epitropakis, V. P. Plagianakos and M. N. Vrahatis, "Hardware-friendly Higher-Order Neural Network Training using Distributed Evolutionary Algorithms," Applied Soft Computing, vol. 10, pp. 398-408, 2010.

  4. Y. Shin and J. Ghosh, "The pi-sigma network: An efficient higher-order neural network for pattern classification and function approximation," International Joint Conference on Neural Networks, 1991.

  5. D. S. Huang, H. H. S. Ip, K. C. K. Law and Z. Chi, "Zeroing polynomials using modified constrained neural network approach," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 721-732, 2005.

  6. S. Perantonis, N. Ampazis, S. Varoufakis and G. Antoniou, "Constrained learning in neural networks: Application to stable factorization of 2-d polynomials," Neural Processing Letters, vol. 7, no. 1, pp. 5-14, 1998.

  7. Y. Shin and J. Ghosh, "Realization of Boolean functions using binary pi-sigma networks," in: C. H. Dagli, S. R. T. Kumara and Y. C. Shin (Eds.), Intelligent Engineering Systems through Artificial Neural Networks, ASME Press, pp. 205-210, 1991.

  8. R. Storn and K. Price, "Differential evolution - A simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341-359, 1997.

  9. D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.

  10. J. Kennedy, R. C. Eberhart and Y. Shi, Swarm Intelligence, San Francisco, CA: Morgan Kaufmann, 2001.

  11. K. Socha and M. Dorigo, "Ant colony optimization for continuous domains," European Journal of Operational Research, vol. 185, no. 3, pp. 1155-1173, 2008.

  12. D. T. Pham, A. Ghanbarzadeh, E. Koc, S. Otri, S. Rahim and M. Zaidi, "The bees algorithm - A novel tool for complex optimization problems," in IPROMS, Oxford, U.K.: Elsevier, 2006.

  13. H.-G. Beyer and H.-P. Schwefel, "Evolution Strategies: A Comprehensive Introduction," Natural Computing, vol. 1, no. 1, pp. 3-52, 2002.

  14. K. H. Han and J. H. Kim, "Quantum-inspired evolutionary algorithm for a class of combinatorial optimization," IEEE Transactions on Evolutionary Computation, vol. 6, pp. 580-593, 2002.

  15. A. Y. S. Lam and V. O. K. Li, "Chemical-reaction-inspired metaheuristic for optimization," IEEE Transactions on Evolutionary Computation, vol. 14, no. 3, pp. 381-399, 2010.

  16. A. Y. S. Lam, "Real-Coded Chemical Reaction Optimization," IEEE Transactions on Evolutionary Computation, vol. 16, no. 3, pp. 339-353, 2012.

  17. B. Alatas, "ACROA: Artificial Chemical Reaction Optimization Algorithm for global optimization," Expert Systems with Applications, vol. 38, pp. 13170-13180, 2011.

  18. B. Alatas, "A novel chemistry based metaheuristic optimization method for mining of classification rules," Expert Systems with Applications, vol. 39, pp. 11080-11088, 2012.

  19. T. K. Truong, K. Li and Y. Xu, "Chemical reaction optimization with greedy strategy for the 0-1 knapsack problem," Applied Soft Computing, vol. 13, pp. 1774-1780, 2013.

  20. J. J. Q. Yu, A. Y. S. Lam and V. O. K. Li, "Evolutionary Artificial Neural Network based on chemical reaction optimization," in: IEEE Congress on Evolutionary Computation (CEC), pp. 2083-2090, 2011.
