IJERT-EMS

Text Based Language Identification System for Indian Languages Following Devanagiri Script


Authors : Indhuja K, Indu M, Sreejith C, P. C. Reghu Raj
Publication Date: 10-04-2014


Published in:   International Journal of Engineering Research & Technology

License:  This work is licensed under a Creative Commons Attribution 4.0 International License.

Website: www.ijert.org

Volume/Issue:   Vol. 3 - Issue 4 (April 2014)

e-ISSN:   2278-0181

Abstract

Text-based language identification is the task of automatically recognizing the language of a given text or document. It is more difficult to discriminate between languages within a language family than between languages across families. In this paper, we investigate the performance of statistical measures for text-based language identification, with an emphasis on five languages used in India that are written in the Devanagiri script: Hindi, Sanskrit, Marathi, Nepali and Bhojpuri. The proposed system uses n-grams as features for classification. Language identification is an important pre-processing step in many Natural Language Processing (NLP) tasks. In a multilingual society like India, there is wide scope for automatic language identification, since it would be a vital step in bridging the digital divide between the Indian masses and the world.
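
To make the idea of n-gram features concrete, the following is a minimal sketch of character n-gram language identification. It is not the system evaluated in the paper: the function names, the two toy training sentences, the bigram size and the use of cosine similarity as the statistical measure are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Count the overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(text, profiles, n=2):
    """Return the language whose profile is most similar to the text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Toy training data (an assumption for illustration, not the paper's corpus).
samples = {
    "hindi": "मैं घर जा रहा हूँ और वह भी आ रहा है",
    "marathi": "मी घरी जात आहे आणि तो पण येत आहे",
}
profiles = {lang: char_ngrams(text) for lang, text in samples.items()}

print(identify("रहा है", profiles))
print(identify("आहे आणि", profiles))
```

With realistic corpora, each profile would be built from a much larger body of text per language, and closely related languages that share a script would likely need longer n-grams or word-level features to be separated reliably.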

