Author(s): S. Sivakumar, Dr. C. Chandrasekar
Published in: International Journal of Engineering Research & Technology
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Volume/Issue: Vol.2 - Issue 2 (February - 2013)
In this paper we propose an innovative method of categorizing text documents that preserves the sequence in which terms occur in a document. We collect the terms of the training documents of each class to create a knowledge base. For a given query document, we generate a category matrix that preserves the sequence of term appearance in the query document. Because the terms are collected into the knowledge base, the term sequence is not preserved during the training stage. Moreover, the occurrence of persistent entries in the category matrix does not ensure that the database contains any document having the same sequence of terms as the test document. We therefore study the sequence of term appearance using the concept of the category matrix on the training documents as well, thereby preserving the topological sequence of term occurrence in a document, which is useful for semantic retrieval. In addition, to avoid sequential matching during classification, we propose to index the terms in a balanced search tree, an efficient indexing scheme. Each term in the balanced search tree is associated with a list of class labels of those documents that contain the term. A corresponding classification technique is also proposed.
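The indexing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: a sorted array searched with `bisect` stands in for the balanced search tree (Python's standard library has none), the category matrix is taken to be a binary class-by-term matrix whose columns follow the term order of the query document, and the decision rule (longest run of consecutive matches, ties broken by total matches) is a guess at how such a matrix might be scored. The names `TermIndex`, `build_index`, `category_matrix`, `longest_run`, and `classify` are hypothetical.

```python
from bisect import bisect_left


class TermIndex:
    """Sorted-array stand-in for a balanced search tree (assumption).

    Lookups are O(log n) via binary search. A real balanced tree would also
    give O(log n) inserts; list insertion here is O(n).
    """

    def __init__(self):
        self._terms = []   # distinct terms, kept in sorted order
        self._labels = []  # _labels[i] is the set of class labels for _terms[i]

    def add(self, term, label):
        i = bisect_left(self._terms, term)
        if i == len(self._terms) or self._terms[i] != term:
            self._terms.insert(i, term)
            self._labels.insert(i, set())
        self._labels[i].add(label)

    def lookup(self, term):
        i = bisect_left(self._terms, term)
        if i < len(self._terms) and self._terms[i] == term:
            return self._labels[i]
        return set()


def build_index(training):
    """training maps a class label to a list of documents (lists of terms)."""
    index = TermIndex()
    for label, documents in training.items():
        for document in documents:
            for term in document:
                index.add(term, label)
    return index


def category_matrix(index, query_terms, classes):
    """Rows are classes, columns are query terms in their original order.

    Entry is 1 if the term occurs in some training document of the class.
    This binary definition is an assumption, not taken from the paper.
    """
    return [[1 if c in index.lookup(t) else 0 for t in query_terms]
            for c in classes]


def longest_run(row):
    """Length of the longest run of consecutive 1s in a row."""
    best = current = 0
    for value in row:
        current = current + 1 if value else 0
        best = max(best, current)
    return best


def classify(index, query_terms, classes):
    """Hypothetical rule: longest run of matches wins, then total matches."""
    matrix = category_matrix(index, query_terms, classes)
    scored = zip(classes, matrix)
    return max(scored, key=lambda cr: (longest_run(cr[1]), sum(cr[1])))[0]


# Toy data, invented purely for illustration.
training = {
    "sports": [["team", "wins", "match"]],
    "finance": [["bank", "raises", "rates"]],
}
classes = sorted(training)
index = build_index(training)
query = ["team", "wins", "match", "bank"]
predicted = classify(index, query, classes)
```

Because each term is looked up once in the index, the query is never compared against every training document in turn, which is the sequential matching the abstract aims to avoid.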