Character Recognition of Gujarati and Devanagari Script : A Review

S. S. Magare; Y. K. Gedam; D. S. Randhave; Prof. R. R. Deshmukh

doi:10.17577/IJERTV3IS11115

Volume 03, Issue 01 (January 2014)


Open Access
Authors : S. S. Magare, Y. K. Gedam, D. S. Randhave, Prof. R. R. Deshmukh
Paper ID : IJERTV3IS11115
Volume & Issue : Volume 03, Issue 01 (January 2014)
Published (First Online): 31-01-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


S. S. Magare

Dept. of CS-IT Dr. B.A.M. University

Aurangabad (M.S)

Y. K. Gedam

Dept. of CS-IT Dr. B.A.M. University

Aurangabad (M.S)

D. S. Randhave

Dept. of CS-IT Dr. B.A.M. University

Aurangabad (M.S)

Prof. R. R. Deshmukh

Dept. of CS-IT Dr. B.A.M. University

Aurangabad (M.S)

Abstract

This paper reviews the different techniques of character recognition for the Gujarati and Devanagari scripts. Character recognition is usually referred to as optical character recognition (OCR). The review is intended to give researchers a starting point for developing recognition tools for Gujarati and Devanagari script. It describes the basics of character recognition, its types, the challenges associated with it, and the special properties of the Gujarati and Devanagari scripts.

Keywords

Offline character recognition, Online Character recognition, Handwritten character recognition, Printed character recognition, feature extraction techniques, classification.

Introduction

Optical character recognition is the process of recognizing optically scanned characters. Character recognition has two types: offline and online. Offline character recognition is one of the challenging problems in pattern recognition. It takes a scanned image of the document as input. The scanned image can be in color. For the recognition process the image is converted to grayscale, and binarization is then applied to the grayscale image so that it contains only the values 0 and 1. Offline character recognition can be done in two ways: handwritten and printed.

Handwritten character recognition is abbreviated as HCR. Handwritten characters show many variations because different people have different writing styles. HCR can recognize both offline and online characters: offline HCR takes its input from a scanned image of a paper document, while online HCR takes its input from a digital pen. Many handwritten historical documents exist in electronic form, and HCR is used to recognize such documents.
1. Gujarati Script
  
  The Gujarati script was derived from the Devanagari script during the 16th century and is used to write Gujarati, a modern language of India. The main differences between the two scripts are the absence of the horizontal line at the head of each character in Gujarati and small modifications to the character shapes. Until the 19th century the Gujarati script was used only for accounting and writing letters, which is why it was also known as the bankers', merchants' and traders' script. Recognition of Gujarati script
2. Devanagari Script
  
  Devanagari script consists of 14 vowels and 34 consonants. It is the base for writing 28 languages such as Marathi, Hindi, Sanskrit, Kashmiri, Bhojpuri and many more. It was formerly also used to write Gujarati. It is written from left to right. Devanagari characters carry a horizontal line at the head of the character called the Shirorekha. Devanagari is one of the most widely used and adopted writing systems. Character recognition of Devanagari script is somewhat challenging because of the curves involved in most characters.
  
  Figure 1. Consonants of Gujarati Script
  
  Figure 2. Vowels and Consonants of Devanagari Script

HCR system

The block diagram of an HCR system is shown in Fig. 1. The system consists of preprocessing, segmentation, feature extraction, classification and recognition.

Preprocessing

Preprocessing improves the image data: it enhances the image features required for later processing, suppresses unwanted noise and distortion, and aims to correct degradation in the image.

Binarization

Binarization is the process of converting a grayscale image into a binary (black and white) image, so that the image data contains only 0 and 1. Binarization is usually used to separate the foreground from the background by thresholding at a suitable level.
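The thresholding step can be illustrated with a short sketch. The text above does not name a particular way of choosing the threshold, so Otsu's method is used here purely as an assumption; the function names `otsu_threshold` and `binarize`, the list-of-rows image layout, and the convention that dark ink maps to 1 are all illustrative choices, not part of the reviewed work.

```python
def otsu_threshold(gray):
    """Pick the threshold that maximizes between-class variance (Otsu's method).

    `gray` is a list of rows of integer intensities in 0..255 (assumed layout).
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold=None):
    """Return 1 for dark (ink) pixels and 0 for background pixels."""
    if threshold is None:
        threshold = otsu_threshold(gray)
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


gray = [[10, 200],
        [20, 220]]
binary = binarize(gray)
```

A fixed threshold can be passed instead, for example `binarize(gray, threshold=128)`, when the scanning conditions are known in advance.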
Noise Removal

A digital image can contain a variety of noise, which has to be removed for better processing. Morphological operations, the median filter and the Wiener filter are used to remove noise from an image. The median filter removes noise with less blurring of edges.
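As a rough illustration of the median filter mentioned above, the sketch below replaces each pixel with the median of its 3x3 neighbourhood. The window size, the border handling (edge pixels are repeated) and the name `median_filter3` are assumptions made for the example.

```python
def median_filter3(img):
    """Apply a 3x3 median filter; border pixels reuse the nearest valid pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the image
                    xx = min(max(x + dx, 0), w - 1)
                    window.append(img[yy][xx])
            window.sort()
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


# A single bright speck on a dark background is removed.
noisy = [[0, 0, 0],
         [0, 255, 0],
         [0, 0, 0]]
cleaned = median_filter3(noisy)

# A vertical step edge passes through unchanged.
edge = [[0, 0, 255, 255] for _ in range(3)]
edge_out = median_filter3(edge)
```

The second example shows why the filter is popular for character images: isolated specks disappear while a sharp stroke boundary is kept.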
Thinning and Filling

Smoothing implies both filling and thinning. Thinning reduces the width of the character strokes, while filling eliminates gaps, small breaks and holes in the digitized character.
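One common way to realize the filling step is a morphological closing (dilation followed by erosion). The text does not say which filling method is intended, so the sketch below is only one possible reading; the 3x3 structuring element, the function names, and the assumption that the character has at least a one-pixel background margin are all illustrative.

```python
def dilate(binary):
    """Set a pixel to 1 if any pixel in its 3x3 neighbourhood is 1."""
    h, w = len(binary), len(binary[0])
    return [[1 if any(binary[yy][xx]
                      for yy in range(max(y - 1, 0), min(y + 2, h))
                      for xx in range(max(x - 1, 0), min(x + 2, w)))
             else 0
             for x in range(w)]
            for y in range(h)]


def erode(binary):
    """Keep a pixel only if its whole 3x3 neighbourhood is 1.

    Pixels outside the image count as 0, so border pixels are always cleared.
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(binary[yy][xx]
                   for yy in (y - 1, y, y + 1)
                   for xx in (x - 1, x, x + 1)):
                out[y][x] = 1
    return out


def fill_gaps(binary):
    """Morphological closing: bridges one-pixel breaks in a stroke."""
    return erode(dilate(binary))


# A horizontal stroke with a one-pixel break at column 3.
stroke = [[0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0],
          [0, 1, 1, 0, 1, 1, 0],
          [0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0]]
filled = fill_gaps(stroke)
```

Thinning is not shown here; it is usually done with an iterative skeletonization algorithm and needs considerably more code.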
Normalization

Normalization is applied to the image to obtain characters of uniform size, rotation and slant. By reducing shape variation, normalization improves the accuracy of character recognition.
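Size normalization, the simplest of these, can be sketched as cropping the character to its bounding box and resampling it to a fixed grid. The nearest-neighbour resampling, the default grid size and the name `normalize_size` are assumptions for illustration; note that this version does not preserve the aspect ratio.

```python
def normalize_size(binary, size=8):
    """Crop a binary character to its bounding box and rescale it to size x size."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    if not rows:  # empty image: nothing to crop
        return [[0] * size for _ in range(size)]

    top, left = rows[0], cols[0]
    box_h = rows[-1] - top + 1
    box_w = cols[-1] - left + 1
    # Nearest-neighbour sampling from the bounding box into the target grid.
    return [[binary[top + (y * box_h) // size][left + (x * box_w) // size]
             for x in range(size)]
            for y in range(size)]


small = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
normalized = normalize_size(small, size=4)
```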

Skew detection and correction

During digitization a document page is often not aligned correctly, or the text may have been written at a slant by the writer. Skew detection and correction techniques are used to align the image correctly.

Skew detection techniques can be classified into the following groups: projection profile analysis, Hough transform, clustering, connected components, and correlation between lines.
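A minimal sketch of the projection profile approach is given below. It tries a range of candidate angles, rotates the ink pixel coordinates by each one, and keeps the angle whose horizontal projection is most sharply peaked. The scoring function (sum of squared row counts), the angle range and step, and the function names are assumptions chosen for the example, not values taken from the reviewed papers.

```python
import math


def profile_score(points, angle_deg):
    """Sum of squared row counts of the ink pixels after rotating by angle_deg.

    The score is largest when the ink collapses onto few rows, which is
    what happens when the text lines become horizontal.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    counts = {}
    for x, y in points:
        row = round(y * cos_a - x * sin_a)  # row index after rotation
        counts[row] = counts.get(row, 0) + 1
    return sum(c * c for c in counts.values())


def estimate_skew(points, max_angle=15.0, step=0.5):
    """Return the candidate angle (in degrees) with the sharpest profile."""
    n = int(max_angle / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda ang: profile_score(points, ang))


# Synthetic test input: one text baseline tilted by 5 degrees,
# given as (x, y) coordinates of ink pixels.
theta = math.radians(5.0)
points = [(x, round(x * math.tan(theta))) for x in range(200)]
skew = estimate_skew(points)
```

Once the angle is known, the image can be rotated by the opposite amount to correct the skew. A finer step gives a more precise estimate at the cost of more evaluations.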

Segmentation

Segmentation of an image is the process of subdividing image into number of parts. Segmentation takes the form as Paragraph Segmentation, Line Segmentation, Word Segmentation and Character Segmentation.

Paragraph-wise segmentation divides the document into paragraphs. Line-wise segmentation divides a paragraph into lines and can use horizontal projection profile based techniques. Word-wise segmentation divides a line into words. Finally, character-wise segmentation divides words into characters.

A chain code histogram can be computed for each segment. The horizontal projection profile method is used for segmentation.
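Line segmentation with a horizontal projection profile can be sketched as follows: count the ink pixels in each row, then treat every run of non-empty rows as one text line. The function name `segment_lines` and the rule that a single empty row separates two lines are assumptions for illustration; real documents usually need a small tolerance for noise.

```python
def segment_lines(binary):
    """Return (first_row, last_row) pairs for each text line in a binary page."""
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a line begins
        elif count == 0 and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


page = [[0, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 1]]
lines = segment_lines(page)
```

The same idea applied to column sums (a vertical projection profile) inside one line is a common starting point for word and character segmentation.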

Feature Extraction

Feature extraction aims to extract the essential features and characteristics of the given image. In pattern recognition this is one of the most difficult stages to implement. Selecting the right feature extraction technique leads to high recognition performance.

Feature extraction techniques are divided into three groups: distribution of points, transformation and series expansion, and structural analysis. Structural analysis extracts features that represent the geometric and topological structure of a character. It gives features with high tolerance to noise and style variation. Commonly used structural features are intersections between lines and loops.

Table 1. FE method for various image representation forms

Feature extraction method	Gray-scale subimage	Binary image (solid character)	Binary image (outer contour)	Vector (skeleton)
Template matching	Yes	Yes	No	Yes
Deformable templates	Yes	No	No	Yes
Graph description	No	No	No	Yes
Unitary transform	Yes	Yes	No	No
Discrete features	No	No	No	Yes
Zoning	Yes	Yes	Yes	Yes
Fourier descriptors	No	No	Yes	Yes
Geometric moments	Yes	Yes	No	No
Zernike moments	Yes	Yes	No	No
Projection histogram	No	Yes	No	No
Contour profile	No	No	Yes	No
Spline curve	No	No	Yes	No
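Zoning, one of the methods listed in Table 1, is simple enough to sketch directly. The character image is divided into a grid of zones and the fraction of ink pixels in each zone becomes one feature. The grid size, the use of density as the per-zone feature, and the name `zoning_features` are assumptions made for this example.

```python
def zoning_features(binary, zones=2):
    """Return the ink density of each zone in a zones x zones grid, row by row."""
    h, w = len(binary), len(binary[0])
    features = []
    for zy in range(zones):
        y0, y1 = zy * h // zones, (zy + 1) * h // zones
        for zx in range(zones):
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            ink = sum(binary[y][x] for y in range(y0, y1) for x in range(x0, x1))
            area = (y1 - y0) * (x1 - x0)
            features.append(ink / area if area else 0.0)
    return features


glyph = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
features = zoning_features(glyph, zones=2)
```

The resulting list is a real-valued feature vector that can be passed to a statistical classifier.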

Classification

After the features are selected, the next step is to classify the characters according to those features. Training and testing are done in the classification phase. A number of classifiers can be used to train on the characters. The k-nearest neighbor (k-NN) method is widely used at the classification stage.
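The k-NN method can be sketched in a few lines: a query feature vector is assigned the label that is most common among its k closest training samples. The squared Euclidean distance, the value of k, the toy feature vectors and the labels below are all made up for illustration.

```python
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs.
    """
    def squared_distance(sample):
        vector, _ = sample
        return sum((a - b) ** 2 for a, b in zip(vector, query))

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors for two character classes.
train = [([0.0, 0.0], "ka"),
         ([0.1, 0.2], "ka"),
         ([0.2, 0.1], "ka"),
         ([1.0, 1.0], "kha"),
         ([0.9, 0.8], "kha")]

label_a = knn_classify(train, [0.15, 0.15])
label_b = knn_classify(train, [0.95, 0.90])
```

k-NN needs no training phase beyond storing the samples, which makes it a convenient baseline before trying neural networks or SVMs.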
1. Neural network
  
  The neural network is one of the well-known classifiers used in character recognition systems. Neural networks have the advantage of being adaptive. Feed-forward and back-propagation neural networks are used for character recognition.
2. SVM

A support vector machine constructs hyperplanes in a high- or infinite-dimensional space. SVM is based on statistical learning theory. It was originally defined for the two-class problem, where it looks for the optimal hyperplane: the one that maximizes the distance (the margin) between the nearest examples of the two classes.
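The margin maximization described above is usually written as the following optimization problem for linearly separable data. The symbols are assumptions introduced here for illustration: $x_i$ is the feature vector of training sample $i$, $y_i \in \{-1, +1\}$ is its class label, $w$ is the normal vector of the hyperplane and $b$ is its offset.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

The two classes are then separated by a margin of width $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin. Recognizing more than two character classes requires combining several such two-class machines, for example in a one-against-rest arrangement.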

Techniques used for scripts
1. Gujarati Script
  
  Paper [1] presents a zone identification technique. The technique identifies three zones in Gujarati characters: the base character zone, the upper modifier zone and the lower modifier zone. The authors found that several characters are discriminated by specific modifiers that exist in the upper and lower zones, and therefore used zone identification.
  
  In [2], thinning and skew correction are used for preprocessing, and a multilayer feed-forward network is used for classifying digits.
2. Devanagari Script
  
  Paper [3] describes various feature extraction methods, such as template matching, deformable templates, graph description, discrete features, zoning and Fourier descriptors. The authors found that real-valued feature vectors are ideal for statistical classifiers.
  
  In [4], chain coding is used to extract chain code features at the feature extraction stage, and a combination of an MLP and a minimum edit distance classifier is used for classification.
  
  In paper [5], median and Wiener filters are used for de-noising. The authors use a structural segmentation algorithm for segmentation and a zone-based approach for feature extraction.
  
  In [6], an encoded binary variation method is used for extracting the features, and SVM comparison techniques are used for classification.
  
  Segmentation is based on character height and width [6]. For classification, an MLP with two hidden layers trained with back propagation is used for character identification.
Dataset

There is no standard dataset available for handwritten characters. Researchers have to develop their own character dataset, collected from a minimum of 10 to 15 people. For better results and accuracy the dataset should be collected from a large number of people: because different people have different writing styles, the dataset will then include variations of each character, which is useful during the training and testing phases of character recognition.
Acknowledgement

The authors would like to thank the University Authorities for providing the infrastructure to carry out the research. This work is supported by University Grants Commission.
References

[1] J. Dholakia, A. Negi, S. Rama Mohan, Zone Identification in Printed Gujarati Text, ICDAR, Vol. 1, pp. 272-276, 2005.
[2] A. Desai, Gujarati Handwritten Numeral Optical Character Recognition through Neural Network, Pattern Recognition, Vol. 43, pp. 2582-2589, 2010.
[3] Øivind Due Trier, Anil K. Jain, Torfinn Taxt, Feature Extraction Methods for Character Recognition: A Survey, Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.
[4] S. Arora, D. Bhattacharjee, M. Nasipuri, D. K. Basu, M. Kundu, Recognition of Non-Compound Handwritten Devanagari Characters using a Combination of MLP and Minimum Edit Distance, International Journal of Computer Science and Security (IJCSS), Vol. 4, Issue 1.
[5] Veena Bansal, R. M. K. Sinha, Segmentation of Touching and Fused Devanagari Characters, Pattern Recognition, Vol. 35, Issue 4, pp. 875-893, 2002.
[6] U. Garain, B. B. Chaudhuri, Touching Characters in Printed Devanagari and Bangla Scripts Using Fuzzy Multifactorial Analysis, IEEE Transaction on Vol. 32, Issue 4, pp. 449-459, 2002.
[7] Stuti Asthana, Farha Haneef, Rakesh K. Bhajade, Handwritten Multiscript Numeral Recognition using Artificial Neural Networks, International Journal of Software Computing and Engineering, ISSN: 2231-2307, Vol. 1, Issue 1, March 2011.
