- Open Access
- Authors : S. S. Magare, Y. K. Gedam, D. S. Randhave, Prof. R. R. Deshmukh
- Paper ID : IJERTV3IS11115
- Volume & Issue : Volume 03, Issue 01 (January 2014)
- Published (First Online): 31-01-2014
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Character Recognition of Gujarati and Devanagari Script : A Review
S. S. Magare
Dept. of CS-IT Dr. B.A.M. University
Aurangabad (M.S)
Y. K. Gedam
Dept. of CS-IT Dr. B.A.M. University
Aurangabad (M.S)
D. S. Randhave
Dept. of CS-IT Dr. B.A.M. University
Aurangabad (M.S)
Prof. R. R. Deshmukh
Dept. of CS-IT Dr. B.A.M. University
Aurangabad (M.S)
Abstract
In this paper, we describe different techniques of character recognition for the Gujarati and Devanagari scripts. Character recognition is usually referred to as optical character recognition (OCR). This review is intended to help researchers develop tools for Gujarati and Devanagari script recognition. The paper describes the basics of character recognition, its types, the challenges associated with it and the special properties of the Gujarati and Devanagari scripts.
Keywords
Offline character recognition, Online Character recognition, Handwritten character recognition, Printed character recognition, feature extraction techniques, classification.
-
Introduction
Optical character recognition is the process of recognizing optically scanned characters. Character recognition has two types: offline and online. Offline character recognition is one of the challenging problems in pattern recognition. It takes a scanned image of the required paper document as input. The scanned image can be in color. For the recognition process, this image is converted to grayscale and binarization is then applied to the grayscale image, so that the image contains only the values 0 and 1. Offline character recognition can be done in two ways: handwritten and printed.
Handwritten character recognition is abbreviated as HCR. Handwritten characters show a number of variations, as different people have different writing styles. HCR can recognize both offline and online characters. Offline HCR takes its input from a scanned image of a paper document, and online HCR takes its input from a digital pen. Many handwritten historical documents exist in electronic form, and HCR is used to recognize such documents.
-
Gujarati Script
Gujarati script was derived from Devanagari script during the 16th century and is used to write Gujarati, a modern language of India. The main differences between the Gujarati and Devanagari scripts are the absence of the horizontal header line above the characters in Gujarati and small modifications in the characters. Until the 19th century, Gujarati script was used only for accounting and letter writing; because of this it was also known as the bankers', merchants' and traders' script. Recognition of Gujarati script
-
Devanagari Script
Devanagari script consists of 14 vowels and 34 consonants. It is the base for writing 28 languages such as Marathi, Hindi, Sanskrit, Kashmiri, Bhojpuri and many more. Devanagari script was formerly used to write Gujarati. It is written from left to right. Devanagari characters carry a horizontal line at the header, called the Shirorekha. Devanagari is one of the most widely used and adopted writing systems. Character recognition of Devanagari script is somewhat challenging due to the curves involved in most characters.
Figure 1. Consonants of Gujarati Script
Figure 2. Vowels and Consonants of Devanagari Script
-
HCR system
The block diagram of an HCR system is shown in fig. 1. It consists of preprocessing, segmentation, feature extraction, classification and recognition.
-
Preprocessing
Preprocessing improves the image data: it enhances the image features required for further processing, suppresses unwanted noise and distortion, and aims to correct degradation in the image.
-
Binarization
Binarization is the process of converting a grayscale image into a binary (black and white) image, so that the image data contains only 0 and 1. Binarization is usually used for separating the foreground from the background by thresholding at a suitable level.
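As an illustration of thresholding, the following minimal sketch binarizes a grayscale image stored as a list of rows of 8-bit integers. The choice of Otsu's method for picking the threshold is an assumption made here for the example and is not prescribed by the works reviewed; the function names are hypothetical, and dark ink on light paper is assumed.

```python
def otsu_threshold(gray):
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]          # pixels with value <= level
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg     # pixels with value > level
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(gray, threshold=None):
    """Map dark ink (value <= threshold) to 1 and light paper to 0."""
    if threshold is None:
        threshold = otsu_threshold(gray)
    return [[1 if value <= threshold else 0 for value in row] for row in gray]
```

A fixed threshold can be passed instead when the scanning conditions are known in advance.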
-
Noise Removal
A digital image can contain a variety of noise, which must be removed for better processing. Morphological operations, the median filter and the Wiener filter are used to remove noise from an image. The median filter removes noise while limiting the blurring of edges.
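The median filter can be sketched as follows. This is a minimal example assuming a 3x3 window and an image stored as a list of rows; the function name and the border handling (using only the neighbours that exist) are choices made for this sketch.

```python
def median_filter3(image):
    """Apply a 3x3 median filter; border pixels use the neighbours that exist."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            window.sort()
            # For even-sized border windows this picks the upper median.
            result[y][x] = window[len(window) // 2]
    return result
```

An isolated noisy pixel is replaced by the value of its surroundings, while a straight boundary between ink and paper is left where it was.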
-
Thinning and Filling
Smoothing implies both filling and thinning. Thinning reduces the width of a character, while filling eliminates gaps, small breaks and holes in the digitized character.
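Filling can be illustrated with a morphological closing (dilation followed by erosion). The sketch below is one possible realisation, assuming a binary image with ink as 1 and a 3x3 structuring element; the function names are hypothetical, and thinning is not covered here.

```python
def _morph(image, keep_if):
    """Apply a 3x3 neighbourhood rule; pixels outside the image count as 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            neighbours = [
                image[y + dy][x + dx]
                if 0 <= y + dy < height and 0 <= x + dx < width else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = 1 if keep_if(neighbours) else 0
    return out


def dilate(image):
    """A pixel becomes ink if any pixel in its 3x3 neighbourhood is ink."""
    return _morph(image, any)


def erode(image):
    """A pixel stays ink only if its whole 3x3 neighbourhood is ink."""
    return _morph(image, all)


def fill_gaps(image):
    """Morphological closing: bridges one-pixel breaks and small holes."""
    return erode(dilate(image))
```

Because pixels outside the image are treated as background, strokes that touch the image border may be shortened by the erosion step; padding the image first avoids this.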
-
Normalization
Normalization is applied to the image to obtain characters of uniform size, rotation and slant. By reducing shape variation, normalization improves the accuracy of character recognition.
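Size normalization, the simplest of these steps, can be sketched as below. The example assumes a binary character image, crops it to its bounding box and rescales it to a fixed grid by nearest-neighbour sampling; the function name and the default grid size are assumptions of this sketch, and rotation and slant are not handled.

```python
def normalize_size(image, size=8):
    """Crop a binary character to its bounding box and rescale to size x size."""
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    if not rows:                      # blank image: nothing to normalise
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    box_h = rows[-1] - top + 1
    box_w = cols[-1] - left + 1
    # Nearest-neighbour sampling: each output pixel copies one source pixel.
    return [
        [image[top + (y * box_h) // size][left + (x * box_w) // size]
         for x in range(size)]
        for y in range(size)
    ]
```

This version stretches the character to fill the grid, so the aspect ratio is not preserved; whether that is desirable depends on the features extracted afterwards.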
-
Skew detection and correction
During digitization of a document page, the image is often not aligned correctly; the misalignment may also be introduced by the writer while writing the document. Skew detection and correction is used to align the image correctly.
Skew detection techniques can be classified into the following groups: projection profile analysis, Hough transform, clustering, connected components and correlation between lines.
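Projection profile analysis can be sketched as follows. For each candidate angle the ink pixels are projected onto rows as if the page had been rotated back by that angle, and the angle giving the sharpest profile is kept. The function name, the search range and the scoring rule (sum of squared row counts) are assumptions of this sketch. A positive result means the text lines run downwards to the right in image coordinates, where y grows downwards.

```python
import math


def estimate_skew(image, max_angle=10.0, step=1.0):
    """Return the candidate angle (degrees) whose projection profile is sharpest."""
    points = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    steps = int(round(2 * max_angle / step))
    best_angle, best_score = 0.0, -1.0
    for i in range(steps + 1):
        angle = -max_angle + i * step
        sin_a = math.sin(math.radians(angle))
        cos_a = math.cos(math.radians(angle))
        profile = {}
        for x, y in points:
            # Row index of the pixel after rotating the page back by `angle`.
            row = int(round(y * cos_a - x * sin_a))
            profile[row] = profile.get(row, 0) + 1
        # Well-aligned text gives a few tall peaks, so squared counts are large.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The estimated angle can then be used to rotate the page in the opposite direction. A finer step gives a more precise estimate at the cost of more candidate angles to evaluate.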
-
Segmentation
Segmentation of an image is the process of subdividing the image into a number of parts. It takes the form of paragraph segmentation, line segmentation, word segmentation and character segmentation.
Paragraph-wise segmentation divides the document into paragraphs. Line-wise segmentation divides a paragraph into lines and can use horizontal projection profile based techniques. Word-wise segmentation divides a line into words. Finally, character-wise segmentation divides words into characters.
A chain code histogram can be used for each segment. The horizontal projection profile method is used for segmentation.
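Line segmentation by horizontal projection profile can be sketched as follows, assuming a binary image with ink as 1. The function names are hypothetical. The same routine applied to the transposed image gives a vertical projection, shown here as a second helper.

```python
def segment_lines(image):
    """Return (top, bottom) row ranges of text lines; bottom is exclusive."""
    profile = [sum(row) for row in image]     # horizontal projection profile
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                          # first inked row of a line
        elif count == 0 and start is not None:
            lines.append((start, y))           # blank row closes the line
            start = None
    if start is not None:                      # line touching the bottom edge
        lines.append((start, len(image)))
    return lines


def segment_columns(image):
    """Return (left, right) column ranges separated by blank columns."""
    transposed = [list(column) for column in zip(*image)]
    return segment_lines(transposed)
```

In Devanagari the Shirorekha joins the characters of a word, so blank columns typically separate words rather than characters; character segmentation then needs an additional step such as removing the header line first.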
-
Feature Extraction
Feature extraction aims to extract the essential and important features and characteristics of the given image. In pattern recognition this is one of the most difficult stages to implement. Selecting the right feature extraction technique leads to high recognition performance.
Feature extraction techniques are divided into three groups: distribution of points, transformation and series expansion, and structural analysis. Structural analysis extracts features that represent the geometric and topological structure of a character, and gives features with high tolerance to noise and style variation. Commonly used features are intersections between lines and loops.
Table 1. FE methods for various image representation forms
| Feature extraction method | Gray-scale sub-image | Binary image: solid character | Binary image: outer contour | Vector (skeleton) |
|---|---|---|---|---|
| Template matching | Yes | Yes | No | Yes |
| Deformable templates | Yes | No | No | Yes |
| Graph description | No | No | No | Yes |
| Unitary transform | Yes | Yes | No | No |
| Discrete features | No | No | No | Yes |
| Zoning | Yes | Yes | Yes | Yes |
| Fourier descriptors | No | No | Yes | Yes |
| Geometric moments | Yes | Yes | No | No |
| Zernike moments | Yes | Yes | No | No |
| Projection histogram | No | Yes | No | No |
| Contour profile | No | No | Yes | No |
| Spline curve | No | No | Yes | No |
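Of the methods in Table 1, zoning is the simplest to illustrate. The sketch below assumes a binary character image and uses the fraction of ink pixels in each zone as the feature; the function name and the default 2x2 grid are assumptions of this example.

```python
def zoning_features(image, zones=2):
    """Split a binary image into zones x zones cells and return ink densities."""
    height, width = len(image), len(image[0])
    features = []
    for zy in range(zones):
        top, bottom = zy * height // zones, (zy + 1) * height // zones
        for zx in range(zones):
            left, right = zx * width // zones, (zx + 1) * width // zones
            cell = [image[y][x] for y in range(top, bottom)
                    for x in range(left, right)]
            # An empty cell (image smaller than the grid) contributes 0.0.
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features
```

The result is a real-valued feature vector of length zones x zones, listed row by row, which can be passed directly to a statistical classifier.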
-
Classification
After selection of the features, the next step is to classify them according to their properties. Training and testing are done in the classification phase. A number of classifiers can be used to train on the characters. The k-NN method is commonly used at the classification stage.
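A k-NN classifier can be sketched in a few lines. The example below assumes feature vectors of equal length, Euclidean distance and majority voting among the k nearest training samples; the function name and the labels in the usage note are hypothetical.

```python
from collections import Counter
import math


def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label); returns the majority label."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

For instance, with training samples labelled "ka" near the origin and samples labelled "kha" near (1, 1), a query close to the origin is assigned "ka". k-NN needs no training phase, but every query is compared with all stored samples.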
-
Neural network
A neural network is one of the well-known classifiers used in character recognition systems. Neural networks have the advantage of being adaptive. Feed-forward and back-propagation neural networks are used for character recognition.
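The following is a minimal sketch of a feed-forward network trained by back propagation. It assumes one hidden layer, sigmoid units, a single output and a squared-error loss; the class name, learning rate and weight initialisation are choices made for this example, not those of any reviewed system.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer, sigmoid units, trained by online back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias term.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, output

    def train_step(self, x, target, rate=0.5):
        hidden, output = self.forward(x)
        # Error signal at the output unit (squared error, sigmoid derivative).
        delta_out = (output - target) * output * (1.0 - output)
        # Propagate the error signal back to each hidden unit.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_out[j] -= rate * delta_out * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= rate * delta_hidden[j] * v

    def loss(self, samples):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)
```

Inputs are passed as lists of floats. A character recognizer would use one output unit per class and feature vectors such as zone densities as input, but the update rule is the same.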
-
SVM
A support vector machine constructs hyperplanes in a high- or infinite-dimensional space. SVM is based on statistical learning theory. It was originally defined for the two-class problem, where it looks for the optimal hyperplane, that is, the one that maximizes the margin, the distance between the nearest examples of both classes.
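The margin maximization can be written out for the linearly separable two-class case. The notation is introduced here for illustration: training examples $\mathbf{x}_i$ with labels $y_i \in \{-1, +1\}$, and a separating hyperplane $\mathbf{w}^{\top}\mathbf{x} + b = 0$. With the constraints scaled as below, the margin equals $2 / \lVert \mathbf{w} \rVert$, so maximizing it is equivalent to the following problem:

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n
```

The examples for which the constraint holds with equality are the support vectors. Multi-class character recognition is typically built from several such two-class machines, for example one per class against all others.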
-
Techniques used for scripts
-
Gujarati Script
Paper [1] presented a zone identification technique, which identifies three zones in Gujarati characters: the base character zone, the upper modifier zone and the lower modifier zone. The authors found that several characters are discriminated by specific modifiers, which exist in the upper and lower zones; therefore they used the zone identification technique.
In [2], thinning and skew correction are used for preprocessing and a multi-layered feed-forward network is used for classifying digits.
-
Devanagari Script
Paper [3] describes various feature extraction methods, such as template matching, deformable templates, graph description, discrete features, zoning and Fourier descriptors. The authors found that real-valued feature vectors are ideal for statistical classifiers.
In [4], chain coding is used to extract chain code features at the feature extraction stage, and a combined MLP and minimum edit distance classifier is used for classification.
In paper [5], median and Wiener filters are used for de-noising. The authors used a structural segmentation algorithm for segmentation and a zone-based approach for feature extraction.
In [6], an encoded binary variation method is used for extracting the features, and SVM comparison techniques are used for classification.
Segmentation is based on character height and width [6]. For classification, the authors used an MLP with two hidden layers trained with back propagation for character identification.
-
Dataset
There is no standard dataset available for handwritten characters. Researchers have to develop their own character dataset, collected from a minimum of 10-15 people. For better results and accuracy, the dataset should be collected from a large number of people: as different people have different writing styles, it will include variation in the characters, which is useful during the training and testing phases of character recognition.
-
Acknowledgement
The authors would like to thank the University Authorities for providing the infrastructure to carry out the research. This work is supported by University Grants Commission.
-
References
[1] J. Dholakia, A. Negi, S. Rama Mohan, Zone Identification in Printed Gujarati Text, ICDAR, Vol. 1, pp. 272-276, 2005.
[2] A. Desai, Gujarati Handwritten Numeral Optical Character Recognition through Neural Network, Pattern Recognition, Vol. 43, pp. 2582-2589, 2010.
[3] Øivind Due Trier, Anil K. Jain, Torfinn Taxt, Feature Extraction Methods for Character Recognition: A Survey, Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.
[4] S. Arora, D. Bhattacharjee, M. Nasipuri, D. K. Basu, M. Kundu, Recognition of Non-Compound Handwritten Devanagari Characters using a Combination of MLP and Minimum Edit Distance, International Journal of Computer Science and Security (IJCSS), Vol. 4, Issue 1.
[5] Veena Bansal, R. M. K. Sinha, Segmentation of Touching and Fused Devanagari Characters, Pattern Recognition, Vol. 35, Issue 4, pp. 875-893, 2002.
[6] U. Garain, B. B. Chaudhuri, Touching Characters in Printed Devanagari and Bangla Scripts Using Fuzzy Multifactorial Analysis, IEEE Transactions, Vol. 32, Issue 4, pp. 449-459, 2002.
[7] Stuti Asthana, Farha Haneef, Rakesh K. Bhajade, Handwritten Multiscript Numeral Recognition using Artificial Neural Networks, International Journal of Soft Computing and Engineering, ISSN: 2231-2307, Vol. 1, Issue 1, March 2011.