A Research Survey of Devnagari Handwritten Word Recognition

Ms. Prachi M. Patil; Prof. Saniya Ansari

doi:10.17577/IJERTV2IS100372

Volume 02, Issue 10 (October 2013)

Published (First Online): 12-10-2013
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Ms. Prachi M. Patil

ME (Elect & Telecom), DYPSOE, Pune

Prof. Saniya Ansari

DYPSOE, Pune

Abstract

Devnagari is the most widely used script in India and is used by over 400 million people worldwide. Recognition of handwritten Devnagari words has been a popular research area for many years because of its many applications. This paper describes different techniques for pre-processing, segmentation, feature extraction and classification, which play an important role in word recognition.

Introduction

India is a multilingual, multi-script country with many languages, including Gujarati, Marathi, Konkani, Bengali, Tamil, Telugu, Punjabi, Sanskrit and Urdu. Handwriting recognition is classified into two types: offline and online. In offline recognition, the document is scanned and the complete writing is available as an image. Online handwritten word recognition has gained attention because computing devices such as Tablet PCs, PDAs and smartphones are now available in the market and affordable to the average Indian user. In online recognition, input is given through a Tablet PC, PDA or smartphone equipped with pen-based input technology. The input data to such an online handwriting recognition system consist of the (x, y) coordinates along the trajectory of the pen, together with other information such as pen-up and pen-down events.
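The online input described above can be pictured as a stream of pen samples. The following is a minimal sketch, assuming each sample carries an (x, y) position and a pen-down flag; the names `PenSample` and `split_into_strokes` are hypothetical and not taken from any of the surveyed systems.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PenSample:
    """One digitizer reading (hypothetical layout, for illustration only)."""
    x: float
    y: float
    pen_down: bool  # True while the pen touches the writing surface


def split_into_strokes(samples: List[PenSample]) -> List[List[Tuple[float, float]]]:
    """Group consecutive pen-down samples into strokes.

    A pen-up sample closes the stroke that is currently being collected.
    """
    strokes: List[List[Tuple[float, float]]] = []
    current: List[Tuple[float, float]] = []
    for s in samples:
        if s.pen_down:
            current.append((s.x, s.y))
        elif current:
            strokes.append(current)
            current = []
    if current:  # the recording may end while the pen is still down
        strokes.append(current)
    return strokes


# Toy recording: a horizontal stroke, a pen lift, then a vertical stroke.
samples = [
    PenSample(0, 0, True), PenSample(1, 0, True), PenSample(2, 0, True),
    PenSample(2, 0, False),
    PenSample(1, 1, True), PenSample(1, 2, True),
]
strokes = split_into_strokes(samples)
```

Real devices usually report timestamps and sometimes pressure as well; those fields are left out here to keep the sketch short.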
1. Features of Devnagari Script
  
  Devnagari script plays an important role in the development of literature. Devnagari is used to write many languages, such as Marathi, Hindi, Konkani and Sanskrit. It is used by approximately 400 million people in northern India and is the most widely used Indic script. Devnagari is written from left to right and has no distinction between lower-case and upper-case letters. It consists of 11 vowels and 33 consonants.
  
  Figure 1. Set of vowels
  
  Figure 2. Set of consonants
  
  The Shirorekha, or headline, is the horizontal line at the upper part of a character or word. It does not carry any useful information for recognition, so it should be detected and then discarded. N. Joshi [14] describes a Shirorekha detection algorithm in the context of online Devnagari character recognition.
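One way to picture headline detection on a scanned (offline) word image is a horizontal projection: the rows belonging to the Shirorekha contain far more ink than any other row. The sketch below is an illustrative heuristic under that assumption, not the online algorithm of [14]; the function names and the `band_ratio` parameter are hypothetical.

```python
from typing import List, Tuple

Binary = List[List[int]]  # rows of 0/1 pixels, 1 = ink


def find_shirorekha(image: Binary, band_ratio: float = 0.8) -> Tuple[int, int]:
    """Return (top, bottom) row indices of the presumed headline band.

    The densest row is taken as the core of the headline; neighbouring rows
    are added while their ink count stays above band_ratio * peak count.
    Assumes a non-empty image that contains some ink.
    """
    row_ink = [sum(row) for row in image]
    peak = max(range(len(row_ink)), key=row_ink.__getitem__)
    limit = band_ratio * row_ink[peak]
    top = peak
    while top > 0 and row_ink[top - 1] >= limit:
        top -= 1
    bottom = peak
    while bottom < len(row_ink) - 1 and row_ink[bottom + 1] >= limit:
        bottom += 1
    return top, bottom


def remove_shirorekha(image: Binary, top: int, bottom: int) -> Binary:
    """Return a copy of the image with the headline rows blanked out."""
    return [
        [0] * len(row) if top <= r <= bottom else list(row)
        for r, row in enumerate(image)
    ]


# Toy word image: a two-pixel-thick headline above three vertical strokes.
word = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
top, bottom = find_shirorekha(word)
cleaned = remove_shirorekha(word, top, bottom)
```

Once the headline is blanked, the characters hanging from it are no longer joined along the top, which is why this step commonly precedes character segmentation.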
Recognition of Devnagari Handwritten Word

The schematic block diagram in Figure 3 shows the various stages of Devnagari handwritten word recognition.

Figure 3. Stages in Handwritten word recognition
1. Pre-processing
  
  Digital images obtained from scanning may contain some amount of noise, depending on the quality of the scanner. For online recognition, handwriting varies from writer to writer. Pre-processing is therefore required; it involves noise elimination, binarization of images, size normalization, skew correction, thresholding and skeletonization [5] [6].
  1. Binarization. It is a method of transforming a gray scale image into a black and white image.
  2. Size Normalization. Each segmented character is normalized to fit within a suitable matrix so that all characters have the same data size.
  3. Thresholding. Thresholding is the process of reducing a gray scale or colour image to a binary image.
  4. Noise Removal. It is necessary to eliminate imperfections such as disconnected lines and gaps in lines. Median filtering, Wiener filtering and morphological operations can be performed to remove noise.
    
    The Sobel technique is used to detect edges in the binarized image [10].
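Two of the steps above, thresholding and noise removal, can be sketched in a few lines. This is a minimal illustration, assuming an 8-bit gray scale image stored as nested lists with dark ink on a light background. Otsu's method is used here as one common way to choose a global threshold; the text above does not prescribe a particular one, and the function names are hypothetical.

```python
from typing import List

Gray = List[List[int]]    # rows of 0..255 intensities
Binary = List[List[int]]  # rows of 0/1 pixels, 1 = ink


def otsu_threshold(gray: Gray) -> int:
    """Choose the threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Gray, threshold: int) -> Binary:
    """Mark pixels at or below the threshold as ink (1)."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray]


def median_filter3(img: Binary) -> Binary:
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [list(row) for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[rr][cc]
                      for rr in (r - 1, r, r + 1)
                      for cc in (c - 1, c, c + 1)]
            out[r][c] = sorted(window)[4]  # middle of the 9 values
    return out


# Toy scan: a 3x3 ink blob plus one isolated dark speck at row 1, column 1.
gray = [
    [210, 210, 210, 210, 210, 210, 210],
    [210,  20, 210, 210, 210, 210, 210],
    [210, 210, 210,  40,  40,  40, 210],
    [210, 210, 210,  40,  40,  40, 210],
    [210, 210, 210,  40,  40,  40, 210],
    [210, 210, 210, 210, 210, 210, 210],
]
threshold = otsu_threshold(gray)
binary = binarize(gray, threshold)
cleaned = median_filter3(binary)
```

The median filter removes the isolated speck but also rounds off the corners of the blob, which is the usual trade-off of this kind of smoothing on thin strokes.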

Comprehensive Study

The table below gives a comprehensive study of the different techniques used for handwritten character, word and script recognition.

Reference Paper	Preprocessing	Segmentation	Feature Extraction	Classification/Recognition
[1]	Preprocessing is done to normalize the position and size of the sample.	-	NPen++ features are used for curliness, linearity and slope.	Hidden Markov Model based lexicon-driven and lexicon-free techniques are used.
[2]	Image binarization, thinning of the binarized image, and windowing.	Character recognition by neural network.	Replacing the recognized characters by standard fonts.	Assembling all the separated characters in the same order as they appeared in the input image to give the final output.
[3]	Thresholding method is used for binarization.	Lines are segmented by noting the valleys of the projection profile.	Vertical Feature Bar, Horizontal Zero Crossing, Moments.	Tree classifiers.
[4]	Morphological operations are used for noise removal. A thinning algorithm is used to remove distortions. Bicubic interpolation is used to obtain a standard-sized image.	Differential distance based technique is used for identifying the Shirorekha and spine.	Top, bottom, left, right or on a combination technique. A single or double vertical line called a Danda (spine) was traditionally used to indicate the end of a phrase or sentence.	Preliminary classification is performed for better results.
[5]	Gaussian filter is used to make the input stroke data smoother and to reduce noise.	-	Sequential floating search method is used for Indic script.	K-nearest neighbor and Support Vector Machine (SVM) are used for recognition.
[6]	Edge detection and thinning are done for the slant and slope of the word.	-	Global word features are extracted from the whole word.	Artificial neural network.
[7]	Noise removal.	-	Five different features from a vertical strip of uniform width, using a sliding window.	Neural network classifier known as Bidirectional Long Short Term Memory (BLSTM) is used for recognition.
[8]	Smoothing, resampling and computation of the length of the input stroke; a stroke shorter than a threshold set a priori is ignored in the next phases. This approach is used for noise removal.	Cursive stroke segmentation for line and word segmentation.	Histogram of the direction codes calculated for each sub-stroke. The co-ordinates of the centre of gravity are obtained and normalized by the width and height of the stroke.	Modified Quadratic Discriminant Function (MQDF) classifier is used. It improves efficiency over QDF.
[9]	Preprocessing is done to normalize the position and size of the sample and to remove local noise so that the features extracted from the sample become robust.	Horizontal projection profile method is used for segmentation.	Images are scaled in height and width using the bilinear interpolation technique.	Feed forward algorithm.
[10]	Detection of edges in the binarized image using the Sobel technique.	Preprocessed input image is segmented into isolated characters by assigning a number to each character using a labelling process.	Diagonal feature extraction scheme is used to extract features from each zone.	A feed forward back propagation neural network is used for classification.
[11]	Gabor thresholding and Otsu thresholding methods (global) are used for binarization.	Horizontal and vertical profile method is used for segmentation.	Zone based approach is used for feature extraction.	Support Vector Machine (SVM) method is used for classification.
[12]	Detection of edges in the binarized image is done by the Canny technique.	Preprocessed input image is segmented into isolated characters by assigning a number to each character using a labelling process.	Diagonal feature extraction scheme is used to extract features from each zone.	Chromosome function generation and chromosome fitness function are used for classification.
[13]	Thresholding method is used for binarization. Thinning algorithm is used to thin the characters.	Histogram method is used to convert the image to glyphs.	Character height, width, number of horizontal and vertical lines.	Support Vector Machine (SVM) is used for classification.
[14]	Threshold technique is used for preprocessing.	-	Encoding binary variation method is used to extract the features. The trained text and the tested image are then compared to recognize the characters.	Support Vector Machine (SVM) is used for classification.
[15]	Global thresholding approach was used to binarize the scanned gray scale image.	-	Top and bottom profile based features are used for feature extraction.	Learning algorithm is used for classification.
[16]	Scanned document is filtered and binarized.	Line and word segmentation is done through projection profiles.	Considers some selected moment and shape features; their dimensionality is reduced by principal components.	K-nearest neighbor and neural network classifiers are used for recognition.
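To make the last two columns of the table concrete, the sketch below pairs a simple zone-based feature with a k-nearest-neighbor classifier. It is only an illustration under stated assumptions: plain ink density per zone stands in for the zone-based and diagonal schemes of [10] and [11], the classifier is a bare-bones version of the k-NN idea mentioned for [5] and [16], and the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Binary = List[List[int]]  # rows of 0/1 pixels, 1 = ink


def zone_features(image: Binary, zones_y: int, zones_x: int) -> List[float]:
    """Split the image into a zones_y x zones_x grid and return the ink
    density of each zone, row by row.

    Assumes the image height and width are multiples of the grid size;
    any remainder rows or columns would be ignored.
    """
    zh, zw = len(image) // zones_y, len(image[0]) // zones_x
    feats = []
    for zy in range(zones_y):
        for zx in range(zones_x):
            ink = sum(image[r][c]
                      for r in range(zy * zh, (zy + 1) * zh)
                      for c in range(zx * zw, (zx + 1) * zw))
            feats.append(ink / (zh * zw))
    return feats


def knn_predict(train: Sequence[Tuple[List[float], str]],
                query: List[float], k: int = 1) -> str:
    """Return the majority label among the k training vectors closest to
    the query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy 4x4 glyphs: a vertical bar and a horizontal bar.
vertical = [[0, 1, 0, 0]] * 4
horizontal = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
train = [
    (zone_features(vertical, 2, 2), "vertical_bar"),
    (zone_features(horizontal, 2, 2), "horizontal_bar"),
]

# A vertical bar with one stray ink pixel.
noisy = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0]]
predicted = knn_predict(train, zone_features(noisy, 2, 2), k=1)
```

A real system would size-normalize each character first so that the zones line up across samples, and would use many training examples per class.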

Conclusion

In this paper we have presented a survey of preprocessing, segmentation, feature extraction, classification and recognition techniques for handwritten Devnagari word recognition. This survey helps researchers and developers understand the various techniques that have been implemented for recognition. There is great scope for future research in the area of Devnagari word recognition.

References

[1] A. Bharat, Sriganesh Madhvanath, HMM-Based Lexicon-Driven and Lexicon-Free Word Recognition for Online Handwritten Indic Scripts, IEEE, April 2012.
[2] K. Y. Rajput and Sangeeta Mishra, Recognition and Editing of Devnagari Handwriting Using Neural Network, Proceedings of SPIT-IEEE Colloquium and International Conference, Mumbai, India, Vol. 1, 66.
[3] Veena Bansal and R M K Sinha, A Complete OCR for Printed Hindi Text in Devnagari Script, IEEE 2002.
[4] Sandhya Arora, Debotosh Bhattacharjee, Mita Nasipuri, Latesh Malik, A Two Stage Classification Approach for Handwritten Devanagari Characters, Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999, pp. 653-656.
[5] Anoop Namboodiri, Online Handwritten Script Recognition, IEEE Vol. 26 No. 1, January 2004.
[6] Somaya Alma, Recognition of Off-Line Handwritten Arabic Words Using Neural Networks, IEEE 2006.
[7] Naveen Sankaram, C. V. Jawahar, Recognition of Printed Devnagari Text Using BLSTM Neural Network, ICPR, Nov. 11-15, 2012.
[8] U. Bhattacharya, A. Nigam, An Analytic Scheme for Online Handwritten Bangla Cursive Word Recognition.
[9] Chandan Biswas, Ujjwal Bhattacharya, Swapan Kumar Parui, HMM Based Online Handwritten Bangla Character Recognition using Dirichlet Distributions, International Conference on Frontiers in Handwriting Recognition 2012.
[10] J. Pradeep, E. Srinivasan, S. Himavathi, Diagonal Feature Extraction Based Handwritten Character System Using Neural Network, International Journal of Computer Applications, Vol. 8 No. 9, October 2010.
[11] Shanthi N and Duraiswami K, A Novel SVM-based Handwritten Tamil Character Recognition System, Springer Pattern Analysis & Applications, Vol. 13, No. 2, 173-180, 2010.
[12] Ved Agnihotri, Offline Handwritten Devnagari Script Recognition, IJCSI International Journal of Computer Science Issues, 2012.
[13] Suresh Kumar C and Ravichandran T, Handwritten Tamil Character Recognition using RCS Algorithms, Int. J. of Computer Applications, (0975 8887) Volume 8 No. 8, October 2010.
[14] U. Garain, B.B. Chaudhuri, Segmentation of Touching Characters in Printed Devnagari and Bangla Scripts Using Fuzzy Multifactorial Analysis, Proceedings of the 6th International Conference on Document Analysis and Recognition.
[15] M.C. Padma, P.A. Vijaya, Identification of Telugu, Devnagari and English Scripts using Discriminating Features, IJCSIT, Vol. 1, No. 2, November 2009.
[16] C. V. Jawahar, M. N. S. S. K. Pavan Kumar, S. S. Ravi Kiran, A Bilingual OCR for Hindi-Telugu Documents and its Applications, Proc. of the 11th ICPR, vol. II, pp. 200-203, 1992.
[17] Sandhya Arora et al., Performance Comparison of SVM and ANN for Handwritten Devnagari Character Recognition, IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 3, May 2010.
[18] S. Jaeger, S. Manke, J. Reichert, and A. Waibel, Online Handwriting Recognition: The NPen++ Recognizer, Intl J. Document Analysis and Recognition, vol. 3, no. 3, pp. 169-180, Mar. 2001.
