Deployment of Neural Network on Multi-Core Architecture

DOI : 10.17577/IJERTV1IS3076


International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181

Vol. 1 Issue 3, May – 2012

Jigisha Gandhi

Assistant Professor, Information Technology Department, Sarvajanik College of Engineering and Technology, Surat, India

Shitanshu Parekh

Lecturer, MCA Department, Sarvajanik College of Engineering and Technology, Surat, India

Abstract

Traditional computational methods are highly structured and linear, properties they derive from the digital nature of computers. These methods are highly effective at solving certain classes of problems: physics simulations, mathematical models, or the analysis of proteins. Classical computational methods are not effective at solving other problems, such as pattern recognition, adaptive learning, and spam filtering. Some biological systems, however, excel at the latter class of problems. For example, the human mind can quickly identify a face, even if it has changed heavily since the last time it was seen, while traditional computational systems are unable to accomplish facial recognition efficiently and accurately even when minor facial or environmental alterations occur. Attempts to create facsimiles of these biological systems electronically have resulted in the creation of artificial neural networks.

Similar to their biological counterparts, artificial neural networks are massively parallel systems capable of learning and making generalizations. The inherent parallelism in the network allows for a distributed software implementation of the artificial neural network, letting the network learn and operate in parallel and, in theory, improving performance. This paper addresses a parallel neural network implementation, the network's relative strengths and weaknesses, and concludes by comparing performance using different Intel tools.

  1. Introduction

In recent years there has been a great rise of interest in a method of computing that was originally investigated in the 1940s. This method is modelled generally after biological nervous systems and is called neural networks (NN), artificial neural networks (ANN), parallel distributed processing (PDP), and perhaps other names.

A parallel implementation of neural computations is a possible solution for memory- and time-consuming neural network applications (for instance, real-time data processing). The two main ideas are to distribute the patterns that are used for training or to distribute the computation performed by the neural network. Pattern partitioning schemes require large pattern sets; network partitioning schemes require large neural networks. Due mostly to their learning capability, artificial neural networks are increasingly recognized in academic and engineering communities as powerful tools for complex problem-solving tasks. Unfortunately, their use in time-critical applications often demands high-performance, and therefore high-cost, hardware systems.

Obtaining an optimal solution for an engineering design problem is often expensive because the process typically requires numerous iterations involving analysis and optimization programs. Many researchers have shown that an optimum solution can be obtained in less time by simulating a slow, expensive analysis with a fast, inexpensive artificial neural network. From a hardware point of view, this has led to two major directions: accelerating the execution speed of the microprocessor, and applying more than one processor in parallel to the problem solution. The major reason for selecting ANNs for parallel programming is their inherently parallel topology, which lends itself naturally to parallel processing. The proposed approach exploits the parallelism in an ANN through decomposition of the network, weight initialization, instance presentation, and calculation of activation in an MC (Multi-Core) environment for better performance.

Neural computation means organizing processing into a number of processing elements that are massively interconnected and that exchange signals. Processing within elements usually involves adding weighted input values, applying a (non-)linear function to the input sum, and forwarding the result to other elements. Since the basic principle of neurocomputation is learning by example, such processing must be repeated again and again, with weights being changed until the network learns the problem. As matrix-vector operations are at the core of many neuroalgorithms, processing is often organized in such a way as to ensure their efficient implementation.

In this research we have developed some of the neural network models with the help of OpenMP and the C++ language. The evaluation results are compared using different Intel tools: the Intel VTune Performance Analyzer, the Intel Thread Checker, and the Intel Thread Profiler.

The main concentration here is on how the object-oriented programming style can be used in the context of OpenMP and how to exploit C++ language features to improve scalability. The beauty of OpenMP is that it provides an abstract model: users can develop an OpenMP program on any piece of hardware with an OpenMP-compliant compiler and then run it on any parallel system, possibly recompiling when the architecture changes.

  2. Deployment of Neural Network on Multi-core Architecture

To achieve this objective we use the parallel programming concepts implemented by OpenMP. We have chosen three neural network models that are widely used to solve complex problems in digital communications, owing to their nonlinear processing, parallel distributed architecture, self-organization, capacity for learning and generalization, and efficient hardware implementation. These are the single-layer feed-forward perceptron, the NN with the back-propagation algorithm, and the SOM (Self-Organizing Map). We gather statistical data for each model with the help of different Intel tools; these data help us compare the performance of each model.

With the help of parallel programming a problem can be solved in a reasonable time; situations arise when the same problem has to be evaluated multiple times with different input values. This situation is especially suited to parallel computers, since without any alteration to the program, multiple instances of the same program can be executed on different processors/computers simultaneously.

OpenMP programming helps in the following way: at run time, the application goes parallel at the point where the OpenMP part comes. The threads are created and the work is distributed over the threads; in this case, the work means the various loop iterations. Each thread is assigned a chunk out of the total number of iterations that need to be executed. At the end of the loop, the threads synchronize and one thread (the so-called master thread) resumes execution.
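As a minimal sketch of this mechanism (our illustration, not code from the paper), the loop below is parallelized with a single directive; the runtime hands each thread a chunk of the iterations, and only the master thread continues past the implicit barrier:

    #include <omp.h>
    #include <cstdio>

    int main() {
        const int n = 1000000;
        static double a[1000000], b[1000000];

        // A team of threads is created at this directive; each thread
        // is assigned a chunk of the n loop iterations.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            a[i] = 2.0 * b[i];

        // Implicit barrier at the end of the loop: the threads
        // synchronize, and the master thread alone resumes from here.
        std::printf("ran with up to %d threads\n", omp_get_max_threads());
        return 0;
    }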

In the proposed methodology we compare the performance of the neural network using OpenMP against sequential programming on a dual-core architecture. We also exploit the parallelism of the ANN in a Multi-Core (MC) environment at the following levels of implementation:

1. First-level parallelism can be achieved through the topology of the ANN by decomposing the ANN into sub-networks depending on the available cores.

2. Once the subnets have been defined, thread-level parallelism can be exploited at the following basic stages of the ANN:

1. Weight initialization

      2. Instance presentation to input layer

3. Calculation of activation on different layers according to the application and the ANN used.

To improve the computational capability of the neural network we implement it on a dual core by parallelizing the units of the program that are most easily parallelized: as matrix-vector operations are at the core of many neuroalgorithms, processing is often organized in such a way as to ensure their efficient (parallel) implementation [5].
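A hedged sketch of such a parallel matrix-vector implementation is given below; the function name layerActivations and the choice of a sigmoid are our assumptions, not the paper's code. One layer's activations are computed with the per-neuron row loop shared among the cores:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Hypothetical sketch of stage 3, calculation of activation: a
    // matrix-vector product followed by a sigmoid, with the outer
    // loop (one row of W per neuron) distributed over the cores.
    std::vector<double> layerActivations(
            const std::vector<std::vector<double>>& W,
            const std::vector<double>& x,
            const std::vector<double>& bias) {
        std::vector<double> out(W.size());
        #pragma omp parallel for
        for (int i = 0; i < static_cast<int>(W.size()); ++i) {
            double sum = bias[i];
            for (std::size_t j = 0; j < x.size(); ++j)
                sum += W[i][j] * x[j];              // weighted input sum
            out[i] = 1.0 / (1.0 + std::exp(-sum));  // squashing activation
        }
        return out;
    }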

    3. Neural Network

An artificial neural network is a massively parallel distributed processor made up of simple processing units (neurons), which has the ability to learn functional dependencies from data. It resembles the brain in two respects:

1. Knowledge is acquired by the network from its environment through a learning process.

  2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.

A typical feedforward neural network consists of a set of nodes. Some of these are designated input nodes, some output nodes, and those in between hidden nodes. There are also connections between the neurons, with a number referred to as a weight associated with each connection. When the network is in operation, a value is applied to each input node, the values being fed in by a human operator, from environmental sensors, or perhaps from some other program.

Each node then passes its given value to the connections leading out from it, and on each connection the value is multiplied by the weight associated with that connection. Each node in the next layer then receives a value which is the sum of the values produced by the connections leading into it, and in each node a simple computation is performed on that value; a sigmoid function is typical. This process is then repeated, with the results being passed through subsequent layers of nodes until the output nodes are reached.

Figure 1. Graphical representation of a neuron

Each neuron is a simple processing unit which receives some weighted data, sums them with a bias, and calculates an output to be passed on (Figure 1). The function that the neuron uses to calculate the output is called the activation function.

O = f(x_1 W_1 + x_2 W_2 + x_3 W_3 + \cdots + x_n W_n + b) = f\left( \sum_{j=1}^{n} x_j W_j + b \right)

where f is the activation function [1].

Typically, activation functions are non-linear, having a squashing effect. Linear functions are limited because the output is simply proportional to the input.
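The formula translates directly into code. The sketch below is ours, assuming the logistic sigmoid as f; the name neuronOutput is hypothetical:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // O = f(sum_j x_j W_j + b), with the logistic sigmoid as the
    // squashing activation f.
    double neuronOutput(const std::vector<double>& x,
                        const std::vector<double>& W, double b) {
        double sum = b;                        // bias term
        for (std::size_t j = 0; j < x.size(); ++j)
            sum += x[j] * W[j];                // weighted input sum
        return 1.0 / (1.0 + std::exp(-sum));   // f squashes into (0, 1)
    }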

    1. Types of Neural Networks

Neural networks can be viewed as weighted directed graphs in which artificial neurons are nodes and directed edges (with weights) are connections between neuron outputs and neuron inputs.

Based on the connection pattern (architecture), neural networks can be grouped into two categories (Figure 2):

1. Feed-forward networks: networks in which the graph has no loops from recurrent or feedback connections. Feed-forward networks are static; that is, they produce only one set of output values rather than a sequence of values from a given input. These networks are memory-less in the sense that their response to an input is independent of the previous network state.

        There are three types of networks in this category:

1. Single-layer perceptron

        2. Multilayer perceptron

3. Radial basis function nets

2. Recurrent or feedback networks: recurrent (or feedback) networks, on the other hand, are dynamic systems. When a new input pattern is presented, the neuron outputs are computed. Because of the feedback paths, the inputs to each neuron are then modified, which leads the network to enter a new state.

There are four types of networks in this category:

1. Competitive networks

2. Kohonen's SOM

3. Hopfield networks

4. ART models

    2. Learning

A learning process in the ANN context can be viewed as the problem of updating the network architecture and connection weights so that a network can efficiently perform a specific task.

There are three main learning paradigms: supervised, unsupervised, and hybrid.

1. In supervised learning, or learning with a teacher, the network is provided with a correct answer (output) for every input pattern. Weights are determined to allow the network to produce answers as close as possible to the known correct answers. Reinforcement learning is a variant of supervised learning in which the network is provided with only a critique of the correctness of network outputs, not the correct answers themselves.

Figure 2. A taxonomy of feed-forward and recurrent/feedback network architectures

2. Unsupervised learning, or learning without a teacher, does not require a correct answer associated with each input pattern in the training data set. It explores the underlying structure in the data, or correlations between patterns in the data, and organizes patterns into categories from these correlations.

3. Hybrid learning combines supervised and unsupervised learning. Part of the weights are usually determined through supervised learning, while the others are obtained through unsupervised learning.

In this research paper we implement three basic learning algorithms of neural networks: the perceptron learning algorithm, the back-propagation learning algorithm, and the SOM (self-organizing map) learning algorithm [2].
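For the first of these, the classical perceptron weight-update rule can be sketched as follows (our illustration with hypothetical names, not the paper's implementation):

    #include <cstddef>
    #include <vector>

    // One supervised update step of the perceptron rule: compute the
    // output with a step activation, then nudge the weights toward
    // the known correct answer t by w_j += eta * (t - o) * x_j.
    void perceptronUpdate(std::vector<double>& w, double& b,
                          const std::vector<double>& x, int t, double eta) {
        double sum = b;
        for (std::size_t j = 0; j < x.size(); ++j)
            sum += w[j] * x[j];
        int o = (sum >= 0.0) ? 1 : 0;          // step activation
        double err = t - o;                    // error signal
        for (std::size_t j = 0; j < x.size(); ++j)
            w[j] += eta * err * x[j];
        b += eta * err;
    }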

  4. OpenMP

What is OpenMP?

OpenMP is a shared-memory application programming interface (API) whose features are based on prior efforts to facilitate shared-memory parallel programming.

OpenMP uses a directive-based approach to parallelize an application. One limitation of OpenMP is that an application can only run within a single address space; in other words, we cannot run an OpenMP application on a cluster. This is a difference from MPI. OpenMP is built on top of a native threading model and therefore adds overhead, but the additional cost is fairly low unless OpenMP is used in the wrong way. One golden rule is to create large portions of parallel work to amortize the cost of the so-called parallel region in OpenMP.
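The following sketch (ours, not from the paper) illustrates that golden rule: a single parallel region encloses two work-sharing loops, so the fork/join cost is paid once rather than once per loop:

    // One parallel region amortized over two work-sharing loops.
    void update(double* a, double* b, int n) {
        #pragma omp parallel        // threads created once here
        {
            #pragma omp for
            for (int i = 0; i < n; ++i)
                a[i] += 1.0;

            #pragma omp for         // same team reused; no second fork
            for (int i = 0; i < n; ++i)
                b[i] = 2.0 * a[i];  // safe: implicit barrier above
        }                           // single join at the region's end
    }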

Creating an OpenMP Program

OpenMP's directives let the user tell the compiler which instructions to execute in parallel and how to distribute them among the threads that will run the code. An OpenMP directive is an instruction in a special format that is understood by OpenMP compilers only. In fact, it looks like a comment to a regular Fortran compiler or a pragma to a C/C++ compiler, so that the program may run just as it did beforehand if a compiler is not OpenMP-aware. The API does not have many different directives, but they are powerful enough to cover a variety of needs.

The first step in creating an OpenMP program from a sequential one is to identify the parallelism it contains. Basically, this means finding instructions, sequences of instructions, or even large regions of code that may be executed concurrently by different processors.

The second step in creating an OpenMP program is to express, using OpenMP, the parallelism that has been identified. A huge practical benefit of OpenMP is that it can be applied to incrementally create a parallel program from an existing sequential code: the developer inserts directives into a portion of the program, and once that version has been successfully compiled and tested, another portion of the code can be parallelized. The programmer can terminate this process once the desired speedup has been obtained [3].

OpenMP Language Features

OpenMP provides directives, library functions, and environment variables to create and control the execution of parallel programs.

OpenMP Directive: in C/C++, a #pragma, and in Fortran, a comment, that specifies OpenMP program behaviour.

Executable Directive: an OpenMP directive that is not declarative; that is, it may be placed in an executable context.

Construct: an OpenMP executable directive (and, for Fortran, the paired end directive, if any) and the associated statement, loop, or structured block, if any, not including the code in any called routines; that is, the lexical extent of an executable directive [6].

This set comprises the following constructs, some of the clauses that make them powerful, and (informally) a few of the OpenMP library routines [7]; a short sketch using a few of them follows the list:

Parallel Constructs

    Work-Sharing Constructs

    1. Loop Construct

    2. Sections Construct

    3. Single Construct

    4. Workshare Construct (FORTRAN only)

Data-Sharing, Nowait, and Schedule Clauses

Other constructs

1. Barrier Construct

2. Critical Construct

  3. Atomic Construct

  4. Locks

  5. Master Construct
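The sketch below (our illustration, not from the paper) combines a few of the items listed above: a loop construct with schedule and reduction clauses, and a critical construct inside a parallel construct:

    #include <omp.h>
    #include <cstdio>

    int main() {
        double sum = 0.0;

        // Loop construct with schedule and reduction clauses.
        #pragma omp parallel for schedule(static) reduction(+:sum)
        for (int i = 0; i < 1000; ++i)
            sum += 0.5 * i;

        // Critical construct: one thread at a time executes the print.
        #pragma omp parallel
        {
            #pragma omp critical
            std::printf("thread %d sees sum = %g\n",
                        omp_get_thread_num(), sum);
        }
        return 0;
    }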

  1. Tools Used

    1. Intel VTune Performance Analyzer

This tool helps to streamline the code in just a few clicks. It locates and removes performance bottlenecks with low overhead through a graphical interface on Windows platforms, with strong Visual Studio .NET integration.

    2. Intel Thread Checker

The tool is designed to observe the execution of a program and to inform the user of places in the application where problems may exist. The problems detected are specific to threads, and include incorrect use of the threading and synchronization API functions.

    3. Intel Thread Profiler

The tool is very useful for analysing bottlenecks in threaded code. Thread Profiler quickly pinpoints problem areas and shows the reasons for the slowdown, so the user is able to restructure the code for better threaded performance [4].

  2. Results and Discussion

In this research paper we have implemented the perceptron algorithm of the neural network model. Both sequential and parallel methodologies have been used to develop this algorithm, and the Intel VTune Performance Analyzer is used to evaluate the program.

The sequential and parallel programs both have the same input values, and the performance is checked with the Intel tools. The Intel VTune Performance Analyzer finds the hotspots in the program. In hotspot analysis it views the results of time and event sampling at multiple levels, drilling down to the exact operating system process, thread, module executable, function/method, individual line of source code, or individual machine/assembly language instruction to identify specific bottlenecks.

Evaluation: Neural network model with the perceptron algorithm

The program contains one class named neuron, having four member functions:

Initialization

Calculation of activation

Weight change

Weight adjustment
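A hypothetical skeleton of such a class is sketched below; the member-function names and bodies are our reading of the description above, not the authors' code:

    #include <cstddef>
    #include <vector>

    // Sketch of the neuron class with the four member functions named
    // above, here as a single threshold (perceptron-style) unit.
    class neuron {
        std::vector<double> w;   // synaptic weights
        double bias = 0.0;
        double out = 0.0;        // last computed activation
    public:
        void initialization(std::size_t n) {                // 1. initialization
            w.assign(n, 0.0);
        }
        void calcActivation(const std::vector<double>& x) { // 2. activation
            double sum = bias;
            #pragma omp parallel for reduction(+:sum)
            for (int j = 0; j < static_cast<int>(w.size()); ++j)
                sum += w[j] * x[j];
            out = (sum >= 0.0) ? 1.0 : 0.0;                 // threshold output
        }
        double weightChange(double target) const {          // 3. weight change
            return target - out;                            // error signal
        }
        void weightAdjustment(const std::vector<double>& x,
                              double delta, double eta) {   // 4. adjustment
            for (std::size_t j = 0; j < w.size(); ++j)
                w[j] += eta * delta * x[j];
            bias += eta * delta;
        }
    };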

Statistical data of the perceptron algorithm:

Table 1. Statistical data of the sequential Perceptron program

Execution time: 0.03100 sec

Time statistics
1) Clockticks: 1,778,000,000 events
2) Processor Time: 0.64 sec

Characterization data
1) System CPI: 7.47 clockticks per instruction retired
2) Parallel activity: 16.99 %
3) Processor Utilization: 58.5 %

Table 2. Statistical data of the parallel Perceptron program

Execution time: 0.0460 sec

Time statistics
1) Clockticks: 1,156,400,000 events
2) Processor Time: 0.41 sec

Characterization data
1) System CPI: 4.49 clockticks per instruction retired
2) Parallel activity: 28.82 %
3) Processor Utilization: 64.41 %

  3. References

1. Alexandra Oliveira, "Neural network software tool development: exploring programming language options", INEB Instituto de Engenharia Biomedica, FEUP/DEEC, Rua Dr. Roberto Frias, 4200-645 Porto.

2. Anil K. Jain (Michigan State University), Jianchang Mao, and K. M. Mohiuddin (IBM Almaden Research Center), "Artificial Neural Networks: A Tutorial".

3. Christian Terboven, "C++ and OpenMP", Center for Computing and Communication, RWTH Aachen University, Germany.

4. Information about Intel tools. [Online]. Available: www.intel.com

5. LiMin Fu, Neural Networks in Computer Intelligence, University of Florida, Gainesville.

6. "OpenMP and C++", MSDN Magazine article. Available: http://www.indopedia.org/Neuaral_network.html

7. Tim Mattson, Intel Corporation, OpenMP C/C++ Application Program Interface, Version 1.0, October 1998.
