Automatic Question Generator in Tamil

N.Vignesh; S.Sowmya

doi:10.17577/IJERTV2IS100414

Volume 02, Issue 10 (October 2013)


Open Access
Published (First Online): 12-10-2013
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


N.Vignesh (1), S.Sowmya (2)

(1) Research Associate, Indian Institute of Management Ahmedabad
(2) SDE, ACS Oracle India

Abstract:

The automatic question generator for Tamil is a language processing system that generates questions from valid Tamil sentences, that is, sentences which follow the grammatical rules and constraints imposed by the language. This Natural Language Processing (NLP) system combines knowledge about the language and the application domain to generate questions for Tamil sentences automatically. Although the system dynamically generates optimal questions for sentences, it requires a lot of information to process the data. A structurally correct input sentence and proper semantic analysis lead to good results, whereas misinterpreted or missing semantic information in the resources may lead to uncontrolled output.

Keywords: NLP, dynamical, optimal questions

INTRODUCTION

Natural language generation (NLG) systems combine knowledge about language and the application domain to automatically produce documents, reports, explanations, help messages, and other kinds of texts. The automatic Tamil Question Generator is a form of NLG which produces an optimal question set for any given Tamil sentence, provided the input conforms to Tamil grammar.

The figure below describes the architecture of the Question Generator system.

This paper is organized as follows: section 2 explains the state of the art in natural language generation, section 3 explains the proposed Tamil question generation system, section 4 describes the algorithm of the proposed system, and the final section concludes with the scope of this work.
EXISTING SYSTEM

Automatic generation of natural language text is an essential task in many language processing applications such as summarization, automatic document generation, question answering systems and translators. It consists of a learner that learns how to realize a sentence from the content of semantic role information. This learner is designed as a statistical model formulated from a preprocessed corpus of sentences. The preprocessing is handled by POS tagging, chunking and semantic role labeling. A POS tagging tool [1] for Tamil was developed at Anna University. Similarly, a chunking tool [2] for Tamil was also developed at Anna University, and it can be used for chunking POS-tagged texts. These systems are the precursors of, and the existing works related to, our project.
PROPOSED SYSTEM
The word which is to be tagged as a verb descriptor must end with either the rhyme [...] or [...]. A word can be tagged as a verb descriptor by checking it against this rule. In the example given above, the word [...] ends with the rhyme [...], thus satisfying the rule, and hence it gets the verb quantifier (verb descriptor) tag. In this way the tagger section of the system tags all the words of the input sentence with the appropriate tags and sends the result as input to the question generator phase.

Considering the example given above, the whole sentence gets tagged as follows:

Time Descriptor + Noun Quantifier + Noun + Noun Quantifier + Noun + Verb Quantifier + Noun + Past tense Male singular verb.
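The suffix check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tagger: the Tamil rhymes are not reproduced in this text, so the suffix table is passed in by the caller, and the names `tag_word`, `tag_sentence` and the fallback tag `"Unknown"` are hypothetical.

```python
from typing import Dict, List, Tuple

# Maps a tag name to the word endings (rhymes) that trigger it.
# The actual Tamil rhymes are NOT given here; callers must supply them.
SuffixRules = Dict[str, Tuple[str, ...]]


def tag_word(word: str, rules: SuffixRules, default: str = "Unknown") -> str:
    """Return the first tag whose suffix list matches the end of the word."""
    for tag, suffixes in rules.items():
        # str.endswith accepts a tuple and is True if any suffix matches.
        if word.endswith(suffixes):
            return tag
    return default  # hypothetical fallback when no rule applies


def tag_sentence(sentence: str, rules: SuffixRules) -> List[Tuple[str, str]]:
    """Tag every whitespace-separated word, scanning left to right."""
    return [(word, tag_word(word, rules)) for word in sentence.split()]
```

Calling `tag_sentence` with a rule table such as `{"Verb Quantifier": (suffix_1, suffix_2)}` returns a list of (word, tag) pairs, which is the form the question generator phase consumes in the sketches that follow the same convention.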

3.3 QUESTION GENERATOR:

This phase of the language processing system takes the tagged sentence as input and produces all possible questions as output. The tags are used to generate questions according to the rules given below. The tagged sentence is again scanned from left to right, word by word, and each time a word is replaced in the sentence, based on the tag of that word, to generate a question:

3.3.1 RULES FOR QUESTION GENERATION:

Rule 1: Replace the word tagged as the time marker with the question word [...] to generate a question. For the example, the question generated by replacing the time marker will be of the form:

Rule 2: Replace the word tagged as a noun quantifier with the question word [...] to generate a question.

Rule 3: Replace the first occurrence of a noun tag in the sentence with the question word [...] if the gender is either male or female; otherwise replace it with [...]. For the example, the question generated by replacing the noun will be of the form:

Rule 4: Replace the word tagged as a verb quantifier with the question word [...] and also delete the noun and its corresponding noun quantifier to generate a question. For the example, the question generated by replacing the verb quantifier will be of the form:

Rule 5: Find the words tagged as nouns and ending with the rhymes [...] and replace them with [...] if the verb in that sentence has either a male or female gender descriptor; otherwise replace them with [...].

Note: The second question generated by following rule 5 is not valid, which can be detected by checking the structural pattern of the language. (See section 5, Future Work, for further reference regarding this issue.)
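As a rough illustration of the left-to-right replacement scan, the following Python sketch covers Rules 1 to 3 only. It rests on several assumptions: the Tamil question words are not reproduced in this text, so the constants `Q_TIME`, `Q_NOUN_QUANTIFIER`, `Q_PERSON` and `Q_THING` are placeholders; the tag name "Time Descriptor" is taken from the tagged example; and the gender is passed in as a parameter because the text does not say how it is derived for Rule 3.

```python
from typing import List, Optional, Tuple

TaggedWord = Tuple[str, str]  # (word, tag)

# Placeholder question words (assumptions; the Tamil words are not given here).
Q_TIME = "<Q_TIME>"
Q_NOUN_QUANTIFIER = "<Q_NOUN_QUANTIFIER>"
Q_PERSON = "<Q_PERSON>"   # used when the gender is male or female
Q_THING = "<Q_THING>"     # used otherwise


def generate_questions(tagged: List[TaggedWord],
                       gender: Optional[str]) -> List[str]:
    """Produce one question per applicable replacement (Rules 1 to 3)."""
    questions = []
    first_noun_seen = False
    for i, (_, tag) in enumerate(tagged):
        if tag == "Time Descriptor":                   # Rule 1
            replacement = Q_TIME
        elif tag == "Noun Quantifier":                 # Rule 2
            replacement = Q_NOUN_QUANTIFIER
        elif tag == "Noun" and not first_noun_seen:    # Rule 3: first noun only
            first_noun_seen = True
            replacement = Q_PERSON if gender in ("male", "female") else Q_THING
        else:
            continue
        # Copy the sentence and swap exactly one word for the question word.
        words = [word for word, _ in tagged]
        words[i] = replacement
        questions.append(" ".join(words))
    return questions
```

Each returned string differs from the input sentence in exactly one position, which matches the single-replacement behaviour described for this phase. Rules 4 and 5 are left out because they also delete words or depend on rhymes that are not available here.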

Thus the various questions generated by following the above rules for the given input sentence are as follows:

3.4 QUESTION OPTIMIZER

This phase of the question generator system narrows the output down to a single optimal question for a given sentence rather than the set of all possible questions. This is done by following the rules described below:

Allot points to each tag based on the precedence of tags, say:

TAG	POINTS
Verb Quantifier	10
Time Quantifier	9
Noun	8
Noun Quantifier	7
Verb	5

Using the above precedence, check each replacement and choose the question with the maximum points as the optimal one. With this approach, the optimal question generated from the possible set of questions is:
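The point-based selection can be sketched as follows. This is a minimal illustration, assuming each candidate question is stored together with the tag of the word that was replaced; the function name `pick_optimal`, the score of 0 for tags missing from the table, and the tie-breaking behaviour (first candidate wins) are assumptions, not details from the paper.

```python
from typing import Dict, List, Tuple

# Points per tag, copied from the precedence table above.
TAG_POINTS: Dict[str, int] = {
    "Verb Quantifier": 10,
    "Time Quantifier": 9,
    "Noun": 8,
    "Noun Quantifier": 7,
    "Verb": 5,
}


def pick_optimal(candidates: List[Tuple[str, str]]) -> str:
    """Return the question whose replaced tag carries the most points.

    Each candidate is (question_text, tag_of_replaced_word). Tags missing
    from the table score 0 (an assumption); on ties the earliest wins.
    """
    if not candidates:
        raise ValueError("no candidate questions")
    best_question, _ = max(candidates, key=lambda c: TAG_POINTS.get(c[1], 0))
    return best_question
```

With this table, a question obtained by replacing a verb quantifier is always preferred over one obtained by replacing a noun or a time quantifier.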
ALGORITHM
FUTURE WORK

The question generation system we have designed replaces only one word in the input sentence with one question word, while preserving the structural and grammatical correctness of the language. Question generation may be improved by allowing more than one replacement of tagged words in a sentence with question words, and by generating optimal questions while keeping upper and lower bounds on the number of replacements permitted. Also, classifying nouns as living or non-living things would help to better interpret the noun quantifier: additional information could be attached to the noun descriptor, what it currently quantifies could be identified, and the corresponding questions could then be generated.
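One way the bounded multiple-replacement idea might be enumerated is sketched below. This is purely illustrative and not part of the implemented system: the function name `replacement_sets` and the choice to treat the lower bound as at least one replacement are assumptions.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def replacement_sets(positions: List[int],
                     lower: int,
                     upper: int) -> Iterator[Tuple[int, ...]]:
    """Yield every set of word positions to replace, by increasing size.

    `positions` are the indices of words whose tags allow a replacement;
    only sets with between `lower` and `upper` members are produced.
    """
    upper = min(upper, len(positions))
    for size in range(max(lower, 1), upper + 1):
        # combinations keeps the positions in their original order.
        yield from combinations(positions, size)
```

Each yielded tuple identifies one candidate question; how such multi-word candidates should be scored is left open by the text.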
CONCLUSION

The automatic question generation system implemented above can be used to frame questions from a valid input sentence or group of sentences. This language processing system finds application in generating optimal questions, which aids in better understanding the structural and grammatical aspects of a language. The same technique can be applied to other languages by varying only the basic blocks of grammar for the corresponding language.
ACKNOWLEDGEMENT

I express my deep gratitude to my guide, Dr. Rajeshwari Sridhar, College of Engineering Guindy, Anna University, Chennai, for guiding me through every phase of this project. I appreciate her thoroughness, tolerance and ability to share her knowledge with me. I thank her for being easily approachable and quite thoughtful. Apart from adding her own input, she has encouraged me to think on my own and give form to my thoughts. I owe her for harnessing my potential and bringing out the best in me. Without her immense support through every step of the way, I could never have made it to this extent.
REFERENCES
1. S. Lakshmana Pandian, T. V. Geetha, Morpheme based Language Model for Part of Speech tagging, Research journal "POLIBITS" on Computer science and computer engineering with applications, Issue 38 (July-December 2008).
2. S. Lakshmana Pandian, T. V. Geetha, CRF based Tamil part of speech tagging and chunking, Lecture Notes in Computer Science, 2009 (5221).
3. S. Lakshmana Pandiyan and T. V. Geetha, Morpheme based language model for Tamil Part-of-Speech tagging, in the International Journal of Communications and Engineering.
4. S. Lakshmana Pandiyan and T. V. Geetha, Semantic role based Tamil sentence generator, in 2009 International Conference on Asian Languages Processing.
5. S. Lakshmana Pandiyan and T. V. Geetha, Semantic role labelling for Tamil documents, in International Journal of Recent Trends in Engineering, Vol. 1, No. 1, May 2009.

N.Vignesh is working as a Research Associate at Indian Institute of Management, Ahmedabad. He received his B.E degree in Computer Science and Engineering from College of Engineering Guindy, Anna University Chennai in 2013. His research interest includes Natural Language Processing and Computer Networks.

S.Sowmya is working as a Software Development Engineer at ACS, Oracle India. She received her B.E degree in Computer Science and Engineering from College of Engineering Guindy, Anna University Chennai in 2013. Her research interest includes Natural Language Processing and Computer Networks.
