Recognition of Handwritten Devnagari Characters through Segmentation and Artificial neural networks

DOI : 10.17577/IJERTV1IS6065


Mitrakshi B. Patil #1

Department of Computer Engineering, MGM's College of Engineering and Technology, Navi Mumbai, University of Mumbai, India.

Vaibhav Narawade #2

Head of Department of Information Technology, Padmabhushan Vasantdada Patil College of Engineering and Technology, Mumbai, University of Mumbai, India.

Abstract

Handwritten character recognition is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. Handwritten Marathi characters are more complex to recognize than the corresponding English characters because of the many possible variations in the order, number, direction and shape of the constituent strokes. The main purpose of this paper is to introduce a new method for recognition of offline handwritten Devnagari characters using segmentation and artificial neural networks. The whole recognition process has two phases: segmentation of the text into lines, words and characters, and then recognition through a feed-forward neural network.

Keywords: handwritten character recognition, segmentation, line segmentation, word segmentation, character segmentation, lower modifier, upper modifier, header line, baseline, feed-forward neural network.

  1. Introduction

    Character recognition plays an important role in the modern world. It can help solve complex problems and make human work easier. One example is handwritten character recognition. Every individual has a distinct style of writing. A person with good knowledge of a script can easily read words written on paper, even badly written ones, by relying on his or her mental dictionary. A machine cannot read such words easily, because the irregularities introduced in writing them are difficult for it to handle. Highly varied writing styles therefore cause many difficulties in the machine recognition process. In recent years, a lot of research has been done on handwritten character recognition, but very little on integrating segmentation with recognition based on neural networks. Optical character recognition (OCR) is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It converts the words or characters on a printed page into a digital image and creates a digital file, so that users can later search for that text and for characters within it. Handwritten character recognition is an important field of OCR. In this paper, we consider the integration of segmentation and recognition using artificial neural networks.

    The paper is organized as follows. Optical character recognition is introduced in Section 2, and applications of OCR are discussed in Section 2.1. Section 3 describes the Devnagari script, Section 3.1 gives the properties of Devnagari characters, and Section 3.2 discusses the structural analysis of touching characters. Section 4 gives the segmentation process, and Section 5 the proposed system. Concluding remarks are given in Section 6.

  2. Optical Character Recognition

    Optical Character Recognition (OCR) translates the scanned printed or handwritten document images into a text document.

    Handwritten Character Recognition is an intelligent OCR capable of handling the complexity of writing, writing environment, materials, etc. Here is the Traditional OCR system structure:

    Figure 1: Traditional OCR (block diagram; recoverable labels: Document Imaging, Pre-processing, Text Lines, Segmentation, Labels, Classifier, Feature Extraction, Character Image)

    1. Applications of OCR

An OCR system can convert the text in an image into text that can be easily edited on the computer. The following are applications of OCR.

  1. Automatic text entry into the computer for desktop publication, library cataloguing, ledgering, etc.

  2. Automatic reading for sorting of postal mail, bank cheques, postal codes, commercial forms, government records, manuscripts and their archival, and other documents.

  3. Document data compression: from document image to ASCII format.

  4. Language processing such as indexing, spell checking, grammar checking, etc.

  5. Multi-media system design, etc.

  1. Devnagari Script

    Devnagari script is different from Roman script in several ways. The script has a two-dimensional composition of symbols: core characters in the middle strip, with optional modifiers above and/or below the core characters. Two characters may be in the shadow of each other. While line segments (strokes) are the predominant features of English, most characters in Devnagari script are formed by curves and holes as well as strokes. In Devnagari script, the concept of upper-case and lower-case characters is absent, but the alphabet itself contains more symbols than that of English. Marathi is an Indo-Aryan language spoken by about 71 million people, mainly the Marathi people of western and central India [8]. It is the official language of the state of Maharashtra. Marathi is thought to be a descendant of Maharashtri, one of the Prakrit languages which developed from Sanskrit. Handwriting style varies from person to person. The script has a large character set whose shapes are formed from curves and lines, and characters may overlap or touch within a word. Touching characters can touch each other at different positions because individual writing styles vary greatly. The following are the various regions of Devnagari script [11, 15].

    Figure 2: Devnagari script structure

    Devanagari script has 13 vowels (svar) and 36 consonants (vyanjan) [2], and 10 numerals, along with modifier symbols. The individual characters of a word are joined by a header line called the Shiro Rekha, which makes it difficult to isolate individual characters from words. Various vowel modifiers add to the confusion [3]. Handwriting can also introduce minor variations in otherwise similar characters.

    Figure 3: Modifiers

    Figure 4: Vowels and consonants

    1. Properties Of The Devnagari Characters

      Basically, there are three classes of basic characters, based on the presence and position of the vertical bar.

      • End bar characters

      • Middle bar characters

      • Non bar characters

    2. Structural Analysis of Touching Characters

      Based on the above discussion, we can categorize the Devnagari characters as follows [1]:

      Category 1: Touching characters containing sidebars or no bar at right end.

      Category 2: Half character touching to full characters containing sidebars at right end.

      Category 3: Pattern between two vertical bars of touching characters that may have a middle bar character.

  2. The Proposed System

    So, the proposed system can be summarized as follows [6, 7, 9, 10, 12].

    [Flowchart of the proposed system; recoverable labels: Image Acquisition, Pre-processing, Segmentation (Line Segmentation, Word Segmentation, Character Segmentation), Recognition]

  3. The Segmentation Process

In the proposed system, the process of recognizing a scanned text document image consists of the following steps [5, 17]: preprocessing, segmentation of lines, segmentation of words, segmentation of characters, and recognition using a neural network.

Preprocessing

The complete preprocessing of the image can be summarized as follows [4, 13, 16]: scanning the image, skew detection and correction, noise removal, binarization, and normalization.
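The preprocessing steps listed above can be illustrated with a minimal sketch. The paper does not specify how each step is performed, so everything here is an assumption made for illustration: a fixed grey-level threshold for binarization, removal of isolated ink pixels as a simple form of noise removal, and nearest-neighbour rescaling as normalization. The function names `binarize`, `remove_isolated_pixels` and `normalize_height` are hypothetical, and skew correction is not shown.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map grey levels (0-255) to 1 for ink (dark) and 0 for background.

    The fixed threshold is an assumption; the paper names no method.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_isolated_pixels(binary: Image) -> Image:
    """Clear ink pixels that have no ink pixel among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1:
                neighbours = sum(
                    binary[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                ) - 1  # subtract the pixel itself
                if neighbours == 0:
                    out[y][x] = 0
    return out


def normalize_height(binary: Image, target_height: int) -> Image:
    """Nearest-neighbour rescale to target_height, keeping the aspect ratio."""
    h, w = len(binary), len(binary[0])
    target_width = max(1, round(w * target_height / h))
    return [
        [binary[y * h // target_height][x * w // target_width]
         for x in range(target_width)]
        for y in range(target_height)
    ]
```

A scanned grey-level image would pass through `binarize`, then `remove_isolated_pixels`, then `normalize_height`, in that order.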

Line segmentation

Lines are segmented on the basis of bounding box formation. Before that, the characters will be thinned. We make the following assumptions while performing the segmentation.

  1. The height of the character (including the modifiers) should be 100 pixels.

  2. The skew should be small.

[Figures: original image; image after line segmentation]
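As an illustration of line segmentation, the sketch below finds the top and bottom rows of each text line, which are the vertical extents of a line's bounding box. The paper does not say how the bounding boxes are computed; a horizontal projection profile (counting ink pixels per row) is a common technique and is assumed here. It relies on the small-skew assumption stated above. The name `segment_lines` and the `min_ink` parameter are hypothetical.

```python
from typing import List, Tuple


def segment_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    binary holds 1 for ink and 0 for background. A row belongs to a line
    when it contains at least min_ink ink pixels.
    """
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # first inked row of a new line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(profile)))
    return lines
```

Each returned pair can be used to slice the page image into one sub-image per line, for example `page[top:bottom]`.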

Word segmentation

Character segmentation
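The paper gives no details for the word and character segmentation steps. One common approach, assumed here purely for illustration, is a vertical projection profile: words are separated by wide runs of empty columns, and characters by narrow ones once the header line has been removed. The sketch takes the header line to be the row with the most ink, which is plausible only after thinning has reduced it to a single row. The names `column_runs` and `remove_header_line` and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 for ink, 0 for background


def column_runs(binary: Image, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (left, right) column ranges, right exclusive.

    Runs are split wherever at least min_gap consecutive columns are empty.
    """
    width = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(width)]
    runs = []
    start = None
    gap = 0  # empty columns seen since the last inked column
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        runs.append((start, width - gap))
    return runs


def remove_header_line(word: Image) -> Image:
    """Blank the row(s) with maximal ink, assumed to be the header line."""
    profile = [sum(row) for row in word]
    peak = max(profile)
    return [
        [0] * len(row) if ink == peak else row[:]
        for row, ink in zip(word, profile)
    ]
```

Under these assumptions, `column_runs(line_image, min_gap=...)` with a larger gap would separate words, and `column_runs(remove_header_line(word_image))` would separate the characters of one word.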

Neural Networks

Neural networks are a preferred approach for recognizers when the variability of patterns is small. A neural network is a powerful data modelling tool that is able to capture and represent relationships between inputs and outputs. Neural networks are well suited to specific types of problems, such as analysing stock market data or finding trends in graphical patterns.

Here, a feed-forward neural network will be used to recognize the segmented characters of Devnagari script [14]. In this paper, we propose a system capable of recognizing handwritten characters or symbols with the help of neural networks.
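To make the recognition step concrete, the following is a minimal sketch of the forward pass of a feed-forward network that maps a flattened character image to a class index. The paper does not specify the architecture, so the layer sizes, the sigmoid activation and the random initialization are all assumptions, and the names `dense`, `init_layer` and `predict` are hypothetical. Training of the weights is not shown; with untrained weights the predicted label is arbitrary.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs: List[float], weights: List[List[float]],
          biases: List[float]) -> List[float]:
    """One fully connected layer with sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases (an illustrative choice)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict(pixels: List[float], layers: List[Layer]) -> int:
    """Propagate the input through every layer; return the most active output."""
    activations = pixels
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return max(range(len(activations)), key=activations.__getitem__)


# Hypothetical sizes: a 4x4 character image, 8 hidden units, 4 classes.
rng = random.Random(0)
layers = [init_layer(16, 8, rng), init_layer(8, 4, rng)]
label = predict([0.0, 1.0] * 8, layers)
```

In a full system the input would be a normalized segmented character image and the output index would correspond to one Devnagari character class.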

  1. Conclusion

    We can conclude that most of the work in the character recognition area addresses either segmentation alone or recognition of already segmented characters alone. Development of a handwritten Devnagari OCR is still a challenging task in the pattern recognition area. Here, we propose a method that segments handwritten text into lines, words and characters; the recognition process is then carried out with the help of neural networks. The aim is to improve performance in terms of time and to get results closer to the correct text.

  2. References

  1. Satish Kumar, A Three Tier Scheme for Devanagari Hand-printed Character Recognition, 978-1-4244-5612-3/09/$26.00 © 2009 IEEE

  2. S. Arora, D. Bhattacharjee, M. Nasipuri, D. K. Basu & M Kundu, Recognition of Non-Compound Handwritten Devnagari Characters using a Combination of MLP and Minimum Edit Distance

  3. Sandhya Arora, Debotosh Bhattacharjee, Mita Nasipuri, Latesh Malik, A Two Stage Classification Approach for Handwritten Devanagari Characters, International Conference on Computational Intelligence and Multimedia Applications 2007, 0-7695-3050-8/07 $25.00 © 2007 IEEE, DOI 10.1109/ICCIMA.2007.254

  4. J. Pradeep, E. Srinivasan and S. Himavathi, Diagonal Based Feature Extraction for Handwritten Alphabets Recognition System Using Neural Network, International Journal of Computer Science & Information Technology (IJCSIT), Vol 3, No 1, Feb 2011

  5. Naresh Kumar Garg, Lakhwinder Kaur, M. K. Jindal, Segmentation of Handwritten Hindi Text, ©2010 International Journal of Computer Applications (0975-8887), Volume 1, No. 4

  6. Seong-Whan Lee and Sang-Yup Kim, Integrated Segmentation and Recognition of Handwritten Numerals with Cascade Neural Network, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 29, No. 2, February 1999

  7. Dayashankar Singh, Sanjay Kr. Singh, Dr. (Mrs.) Maitreyee Dutta, Hand Written Character Recognition Using Twelve Directional Feature Input and Neural Network, ©2010 International Journal of Computer Applications (0975-8887), Volume 1, No. 3

  8. Ajmire P.E. and Warkhede S.E., Handwritten Marathi character (vowel) recognition, Advances in Information Mining, ISSN: 09753265, Volume 2, Issue 2, 2010, pp-11-13

  9. http://en.wikipedia.org/wiki/Handwriting_recognition

  10. http://tcts.fpms.ac.be/rdf/hcrinuk.htm

  11. Satish Kumar, An Analysis of Irregularities in Devanagari Script Writing: A Machine Recognition Perspective, (IJCSE) International Journal on Computer Science and Engineering, Vol. 2, No. 2, 2010, 274-279

  12. Surbhi Syal, Sandeep Sood, Sunny Sharma, Er. Navneet Randhawa, Segmented Character Recognition using Neural networks, International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 1, Issue 4, pp. 1731-1735

  13. Pooja Agrawal, M. Hanmandlu, Brejesh Lall, Coarse Classification of Handwritten Hindi Characters, International Journal of Advanced Science and Technology, Vol. 10, September, 2009

  14. Srinivasa Kumar Devireddy, Settipalli Appa Rao, Hand Written Character Recognition Using Back Propagation Network, Journal of Theoretical and Applied Information Technology © 2005 – 2009 JATIT

  15. N. Sharma, U. Pal, F. Kimura, and S. Pal, Recognition of Off-Line Handwritten Characters Using Quadratic Classifier, P. Kalra and S. Peleg (Eds.): ICVGIP 2006, LNCS 4338, pp. 805-816, 2006. © Springer-Verlag Berlin Heidelberg 2006

  16. Sandhya Arora, Debotosh Bhattacharjee, Mita Nasipuri, Dipak Kumar Basu, Mahantapas Kundu, Combining Multiple Feature Extraction Techniques for Handwritten Devnagari Character Recognition, 2008 IEEE Region 10 Colloquium and the Third ICIIS, Kharagpur, INDIA, December 8-10.

  17. Anita Pal, Dayashankar Singh, Handwritten English Character Recognition Using Neural Network, International Journal of Computer Science & Communication, Vol. 1, No. 2, July-December 2010, pp. 141-144
