- Open Access
- Total Downloads : 139
- Authors : Ranjan Kumar Singh, Sushila Maheshkar, Vikas Maheshkar
- Paper ID : IJERTV6IS050313
- Volume & Issue : Volume 06, Issue 05 (May 2017)
- DOI : http://dx.doi.org/10.17577/IJERTV6IS050313
- Published (First Online): 12-05-2017
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Static Signature Authentication based on J48 and Random Forest
Ranjan Kumar Singh Sushila Maheshkar
Department of Computer Science Indian Institute of Technology (ISM) Dhanbad (Jharkhand – 826004), India
Vikas Maheshkar
Division of Information Technology Netaji Subhas Institute of Technology Delhi 110078
Abstract With the exposure and development of technology, a better security system is needed to protect the data. Off-line signature verification is one of the prime area of research which is most widely recommended by the research community for security issues. For the Off-line signature verification from spontaneous handwritten signature image a precise and effective method is proposed which contributes significantly to that area and comprises of image prepossessing, feature extraction and decision tree classifier such as J48 and Random Forest. It exploit totally different features of writing like number of closed loop, standard deviation, skewness, number of edge points, kurtosis, centroid and number of cross points by computing a collection of options from writing samples at totally different levels of observations. These features are extracted from the standard signature database and extracted features are analyzed correspondingly. The method proposed here monitor an effective accuracy of the proposed algorithm.
Index Terms Decision Tree, J48, True Negative Rate, False Negative Rate, Accuracy, False Positive Rate, True Positive Rate.
-
INTRODUCTION
The necessity for security is difficult often compounded by a tendency towards individual levels of general apathy. Biometrics has received recognition from a vivid range of applications available [1]. Signature has been the most important and useful behavioral biometrics that is widely accepted [1]. The surge in the application of authentication and identification for personal verification and security related application can be observed these years. The increasing studies and development on signature verification has drawn the attention of researchers from different fields of education.
Each person has their own traits on the basis of which they can be distinguished and hence authenticated [2].
One can verify and identify the claimed person by two mean of biometric system. Verification of signature means to verify whether the claimed person is the actual genuine person whereas signature identification means persons existing is there in the given database or not. Biometric system relies on specific data about unique biological traits [11]. It thus verifies or identifies the claimed users. Biometric identifier are usually classified as behavioral and physiological. Physiological characteristics refers to the form of body similar to fingerprints, face recognition, DNA, etc while behavioral
characteristics are related to the pattern of behavior of a person including voice, gait etc. [6, 10].
Automatic Off-line Signature authentication and verification system occupy a very special position among the other automatic identification methods [13, 8]. Handwritten signatures are persistent and are being used for identity verification. They are used for getting permission in banks related works and transactions and thus have paved a way for criminal deception. Hence automatic signature verification is required for access grant in such areas. Depending upon ownership, signature can be genuine or insincere. Verifications are done on the basis of the traits of users that can neither be changed nor be remodeled [7].
Signature verification is advantageous since it is biologically linked to a specific individual [4]. Digitalized system may exacerbate the problem of technological obsolescence. A signature is more difficult to be forged then a fingerprint when a person is in unconscious state. Handwritten signature results in complex process and features. It is used as a legal mean of verifying individuals identity by financial sectors and administrative [4, 5].
-
RELATED WORK
A lots of research had already been examined and shown their pivot contribution in the field of biometric system mainly in off-line signature verification. Since off-line signature verification is the most important biometric verification in todays era, still it is very challenging area of research.
Bhargava et.al [11] proposed a discussion about univariate and multivariate approaches about decision tree and implement various algorithm in a data mining tool WEKA. Offline Handwritten Signature verification using Neural Network proposed by Hatkar et.al [9] proposed their work in which they used machine learning as classifier and have found accuracy 86.25%. Suryawanshi et.al [3] proposed as signature verification method that uses Artificial Neural Network as classifier. Jain et.al [1] proposed a method for Offline signature verification using Adaptive Resonance Theory 1. A standard database of 250 signatures is used for calculating the performance of system in this paper. The false rejection rate (FRR) and false acceptance rate (FAR) are found as 3.9% and 2.7%. Handwritten signature verification using neural network proposed by Ashwini and Shalini [2] having false rejection ratio (FRR) as 20%. The test is carried out with 150 as standard signature database. Ferrer et.al [7]
proposed a method in which classification of the signature is done in very precise way and uses high intensity variation and cross over points. Prakash and Guru [14] proposed score level fusion method for offline signature verification and found AER for mean as 17.33. Grid Based feature extraction techniques has been optimize by using 2D Gaussian filter which is proposed by Nguyen and Blumenstein [15]. Vielhauer and Dittmann [12] proposed different behaviours related to various sets of reference signatures, a multi-level verification strategy is proposed which uses well-selected sets of reference signatures.
The flow of the remaining part of paper: complete proposed work comprises in section III. Experiment results is shown in section IV and in section VI the conclusion of the work is given.
-
PROPOSED WORK
The proposed method is based on behavioral biometric authentication system and the core part of our discussion is static and offline signature verification. Now a days this signature is having broad application that includes banking application, students credentials, government document, treaties between two countries etc. Due to demand of such mode of verification we need a system which is most reliable which is having proficient success rate as well as least time consumable. Such an effective method whose success rate is 75% will help to eradicate and eliminate the verification which is to be done manually. The signatures are stored in database and finally compared with specimen signature using feature like kurtosis, skewness, entropy, edge points, cross points and centroid so as to verify whether the specimens signature is forged of genuine.
The three main phases of signature verifications used here include:-
Fig.2. Flow Diagram of Authentication of Signature based on Machine Learning
-
Signature Pre-Processing
The preprocessing step is applied for already stored signature in database as well as the signature is to be test. The purpose of this step is to make the signature normalized in one form and improve the quality of image which is suitable for feature extraction. The preprocessing stages includes
-
Pre – Processing
-
Feature extraction
-
Classifier (Machine Learning Approach)
Fig. 1. Block Diagram of Authenticaton of Signature based on Decision Tree
Block diagram of the work is depicted in Fig.1. The proposed method is having different stages. The flow of the proposed method is given by Fig.2 which is having separate procedure for training the supervised machine.
-
Binarization
Binarization of gray scale signature is obtained by adaptive thresholding value. It begin with a threshold value for different signature which is being calculated by the method of thresholding. For each signature pixel element it computes associate intensity gradient by selecting a maximum of difference of left and right signature pixel intensity and upper and lower signature pixel intensity respectively to calculate the corresponding threshold. Finally, thresholding method is exploited for Binarization. During this technique intensity of every signature pixel intensity is compared with the threshold value. The signature pixel intensity is set to 1 for the pixel intensity value higher than the threshold while for lower value it is set to 0.
-
Complementation
Complement of a binarized signature means converting the zeros into ones and ones into zeros. Complementation of signature image results in better visibility of great difference of gray levels. It helps us to identify the fine detail in precise manner for signature image which helps to correctly calculate the features to classify about signature between genuine or forged. Complemented signature image is having more clarity for further operation since the lighter pixel in signature become dark and the dark area become lighter.
-
Noise Reduction
A noise is some specific portion of signature image which do not represent the part of signature. These type of portion need to be nullified. After getting complemented binary image filter is used which reduces the noise by excluding or ignoring the single black pixel of signature on the background of white. With the help of connectivity properties of pixels as 8- neighbors a chosen pixel is examined whether that is alone or not.
-
Thinning
Removal of some selected signature pixel from the binarized signature with morphological operation such as erosion or opening known as thinning. Skeletonization is the prime scheme for thinning with many other applications is remaining. In this work, it is used to precise the output by reducing the lines into the single pixel which results in better number of edge detection. Thinning means reducing binary objects or shapes to strokes that are single pixel wide. Thinning preserves the properties of Euler number En where the number En is the total number of objects in the image obtained by subtraction of number of Holes Hn in the object from the connected component Cc which is explained in (1).
= (1)
In the operation thinning, four condition (2), (3), (4) and (5) determines whether the pixel should be deleted or not which is listed below.
=1
Condition 1:
-
-
-
Feature Extraction
It is quite difficult that which feature is to be choose for further verification of signature. Feature of signature image is thus been extracted and can be used as a reference points for further verification of signature. In our work, feature as a vector of seven entities which include entropy, number of closed loop and so on are used to precisely authenticate and verify the test signature. These features that has been depicted in Table 1.
TABLE 1
Extracted features from sample signature
Features
Sample Signature
1
Sample Signature
2
Sample Signature
3
Entropy
0.2586
0.2311
0.2586
No. of Closed Loop
1
1
2
No. of cross
points
155
74
91
No. of edge points
8
13
23
Skewness
4.4691
4.8623
4.4691
Kurtosis
20.97
24.6424
20.9729
centroid
66.1576
77.5271
82.0974
85.2767
61.7636
58.7769
1) Entropy
Image entropy is a quantity which is used to describe the
Where
() = 4
(2)
amount of information that must be coded for by an algorithm for better observation on image data. Image
1, 21 = 0 (2 = 1 2+1 = 1)
having low entropy must contain a lots of black pixel and
= {
0,
having low amount of white pixel. Image having entropy as zero contain only flat pixel. Image having high entropy
() = 1, 1, . . . , 8 represents eight neighbor of pixel (p) such that 1 is the east pixel 2 to 8 can be obtained in traversing in counter clockwise direction.
Condition 2:
(2 3 ~8) 1 = 0 (3)
Condition 3:
(2 3 ~8) 1 = 0 (4)
Condition 4:
(6 7 ~4) 5 = 0 (5)
On the basis of these listed conditions pixel (p) is deleted in first sub iteration, if condition 1, 2 and 3 are satisfied whereas in the second sub iteration if condition 1, 2 and 4 are satisfied.
-
Bounding Box of Signature
Bounding box of signature basically draw a rectangle trace which form outline to the area of signature. With the help of this process one can reduce the effort and time to process the signature further. It is the possible region of interest of an image. Maximum and minimum coordinates value of x and y are calculated which can be fixed as the corner of rectangles and by this the bounding box is calculated. The coordinated represents as the top left and the bottom right of the bounding box.
contain larger number of white pixel than that of the black pixel and cannot be compressed as much as low entropy images. Entropy that is calculated over here is the same formula used by Gallileo Imaging Team and which is calculated using equation (6).
= ()2()
(6)
In above equation q(i) refers to the probability that the difference between two adjacent pixels is equal to i, and log2 is the base 2 logarithm.
-
Edge Points Number
Edge detection is the identification of points (edge points) in digital images where brightness of the images changes sharply or does not continued. The pixel position where an abrupt changes occur are basically organized into a set of curved line segment which is termed as edges. The pixel which has only one neighbor, which belongs to the signature, in 8-neighbor is known as edge points.
-
Skewness
-
The symmetricity of distribution of pixel intensity is measured by skewness. The distribution is symmetric when
there is equal distribution from the center point that is, it has identical distribution to the right and the left with respect to center. For normal data distribution the skewness will be zero and for symmetric data it closes to zero. If the data is partially distributed or distribution is tilted toward left than it value is negative. Skewness value is positive when the skewness of data is right. Hence it can be state that the surface having darkness tends to be have more skewness than that of the lighter surface.
The skewness of a random variable X is the third standardized moment 1, defined and explained in (7):
2) Random Forest
Random Forest is decision tree type machine learning algorithm. The features are randomly selected for replacement for each learner. In this algorithm, one modifies the training data. However, this modification is performed in the feature space.
D.Proposed Algorithm
Input: signature from database
Output: classified sigature as forged or genuine
-
Extract signature image from a database.
-
Pre-processing the extracted signature.
-
Converting signature image into binary image.
-
Image complementation.
-
Noise Reduction of complimented image.
-
Finding bounding box of the thinned signature.
-
Thinning (Ithin).
-
Feature extraction of Ithin.
-
Finding Entropy of Ithin.
-
Calculate total number of edge points of Ithin.
-
Calculate total number of cross points of Ithin.
-
Count total number of closed loop for Ithin.
-
Find skewness and kurtosis of Ithin.
-
Find centroid of Ithin.
-
Feature vector is formed by combining extracted features.
-
Train a neural network with a normalized vector.
-
Steps 1 to 16 are repeated for testing signatures.
-
Feature vector is then apply for test signature to trained neural network using WEKA tools with two different algorithm.
-
Result generated by WEKA declaring a signature as a forged or genuine.
[( 3
-
Kurtosis
1 = ) ] (7)
Kurtosis helps us to verify and compute whether the data are sharped or flat, when compare with normal distribution. If the data is having large kurtosis than it has a distinct number of peak near mean of the data. In normal distribution, it decline with high rate and larger tails which is the extreme case. Similarly data with flat top near mean is having low kurtosis with no sharp peak.
The Kurtosis is standardized and can be calculated as shown in (8).
Kt[Y] = 4 = [()4]
(8)
2 ([()2])2
Where is the central moment and is the standard deviation.
-
Centroid
We find the geometric centroid of the image by traversing the whole signature image iteratively and consider the centroid of that moment. In this case we are having the centroid as the coordinate of both in x-coordinate as well as y-coordinate (center X, center Y).
C. Classifier
In this proposed method of off-line signature verification we use WEKA tools for classification of feature extracted. The most widely used data mining as an open source software is WEKA [11]. Full form of WEKA is Waikato Environment for knowledge analysis. The different operation like association, filtering, classification, clustering, visualization, regression etc. can be performed using this tools which is written using object oriented language java. J48 and Random Forest is used which is basically a type of decision tree.
1) J48
J48 is associated with open source Java implementation of C4.5 algorithm data mining tool WEKA. The improved version of ID3 (Iterative Dicho 3) is C4.5. This essentially produce a threshold so it split the list into two. First list consists of data having higher value than the threshold while second list comprises of those with equal or lesser value.
-
-
EXPERIMENTAL RESULTS
In our work we have implemented the algorithm with the help of data mining tool WEKA on the system having configuration i3-5005U CPU 2.00 GHz, 4.00 GB RAM of 64- bit operating system. We have performed the experiment with the help of standard J48 and Random Forest as an algorithm.
Signature Database
We have used MCYT corpus for our experiment and analysis. The dataset contain 28 users and for each user 5 genuine and 5 forged signatures are used. Thus in all 280 signature images are present in the data set.
A. Experimental Configuration
We have used WEKA (Waikto Environment for Knowledge Analysis) as a simulation tools for experimentation. WEKA provided a homogeneous interface platform to different number of machine learning algorithm for calculation of result of learning method on any given dataset. WEKA gives the performance of learning algorithm that can easily applicable to any dataset and also include diversity of tools for transforming the dataset. This tool allows huge support for the complete process of experimental data mining together with preparing the input data, calculate and evaluating learning method statistically, visualizing the learning data and the result of learning.
-
PERFORMANCE MEASURE
The list of symbols used for performance measure in experimentation is shown in Table 2. The performance measure of the signature verification is measured in term of different evaluation metric as shown in Table 2.
TABLE 2
Different Evaluation metric
Evaluation Metric
Equation
TPP(True Positive
Rate)
TP/(TP+FN)
TNR(True Negative Rate)
TN/(FP+TN)
FPR(False
Positive Rate)
FP/(FP+TN)
FNR(False Negative Rate)
FN/(TP+FN)
ACC(Accuracy)
(TP+TN) / (P+N)
F-Measure
2TP=(2TP+FN+FP)
MCC(Mathews Correlation Coefficient)
(TP TN FP FN)
(TN + FP)(TP + FP)(TN + FN)(TP + FN)
A. Experimental Result
In this proposed method, the experimentation is done excessively on two different classifier. The different feature obtained after pre-processing stage extracted from signatures are centroid, closed area, edge points, cross points, kurtosis, skewness and entropy.
1) J48
The time taken by J48 to build the model is 0.09 sec and to test the model on training data is 0.03 sec. The performance measure by J48 with 10 iteration and base learner is upto 71%.
2) Random Forest
The time taken by Random Forest decision tree to build the model is 0.27 sec. The performance measure by Random Forest with 10 iteration and base learner is up to 80 %.
Table 3 shows the final acceptance and rejection of signature of different classifiers in terms confusion matrix. Here, the number of genuine and number of forged signatures are 140 each.
TABLE 3
Measured value of confusion matrix for different classifier
Model
TP
FN
TN
FP
J48
104
36
94
46
Random Forest
110
27
113
30
The different accuracy measured by class of different classifier is shown in Table 4. It includes rate like TPR, FPR, FNR, TNR and ACC. Accuracy of J48 model is calculated as 70.71% and that for the Random Forest is 79.64%.
TABLE 4
Different accuracy by class of different classifier
Model
TPR
FPR
FNR
TNR
ACC
J48
0.74
0.32
0.25
0.67
70.71%
Random Forest
0.80
0.20
0.19
0.79
79.64%
The measured statistical value of different classifier are shown in Table 5. From the set of 280 instance J48 correctly classified 198 instance while 82 are misclassified. This results in 70.71% accuracy. In the case of Random Forest out of 280 instance 223 are correctly classified but still 57 are misclassified. This results in 79.64 as accuracy in Random forest classification. The different rate valued measured by two classifiers are shown in Table 5.
TABLE 5
Different rate valued measured by two classifier
Statistical Variables
J48
Random Forest
Correctly Classified Instances
198
223
Incorrecly Classified Instances
82
57
Kappa Statistics
0.4143
0.5929
Mean absolute error
0.3306
0.3331
Root mean square error
0.4951
0.3953
Relative absolute error
66.12%
66.62%
Root relative squared error
99.02%
79.08%
Table 6 depicts the comparison between the existing methods with the proposed method of the paper.
TABLE 6
Comparison of this work with existing methods
Author
Parameter
Matching Technique
ACC
Database
Prakash et al. [14]
Centroid Based
NN, HMM, SVM
NS
MCYT-75
Vu Nguyen et al. [15]
Gaussian Grid Based
SVM
NS
GPDS-960
Proposed Work
Local & Global
Processing
J48,
Random
Forest
70.71%
79.64%
MCYT-75
-
CONCLUSION
In this work, a new approach is proposed on the base of two classifier for offline signature verification. The proposed method classified good between the genuine and forged signature. The features like Skewness, Kurtosis, Centroid, Edge Number, Cross Points, Closed loop and Entropy are calculated of the standard signature corpus MCYT-75. The validation of these extracted feature is done with the help of machine learning approach. Now the performance of proposed method aimed at learning the ability to differentiate the given signature between genuine and forgery. We have use J48 and Random forest for classification of signature. The extracted features are now given to the both machine learning classifier one by one and come up with experimental results. J48 Classification gives 70% accuracy but Random Forest decision tree having the accuracy of 80%.
-
REFERENCES