Maximizing User Experience While Navigating Query Results Through Concept Hierarchies

DOI : 10.17577/IJERTV1IS9002

Download Full-Text PDF Cite this Publication

Text Only Version

Maximizing User Experience While Navigating Query Results Through Concept Hierarchies

Ravikumarreddy. B, Department Of CSE,

Bharat Institute Of Engineering And Technology, Ibrahim Patnam, Hyderabad, India.

Abstract

Web databases such as PubMed return large number of tuples against user queries. However, user may not be interested in all the results. To navigate to the required results user has to spend some time in browsing the results. This is the problem that needs research. To solve this problem the query results can be presented in the form of ontologies. This paper provides required algorithms to reduce the number of tuples and also focuses on presentation of query results that would be effective and results in rich user experience by minimizing the navigation cost and reducing the number of expansions. The experiments are done using PubMed database which is having entries with specific annotations that is used to effectively prepare query results for effective navigation by end users. This paper presents a new search interface and the algorithms that make the results to be most effective navigation so as to reduce the browsing time. The experimental results revealed that the proposed system is useful and can be used in the real time applications.

Index TermsEffective navigation, PubMed database, ontologies, navigation cost

INTRODUCTION

The amount of data provided over World Wide Web (WWW) is increasing rapidly every year. In the past decade in started growing drastically. Especially biomedical data and the literature pertaining to it thatreviews the aspects of biomedical data across the globe have seen tremendous growth in terms of quantity. Biological data sources such as [1], [2], and [3]are growing in terms of lakhs of new citations every year. The queries made by people associated with healthcare domain have to search such databases by providing a search keyword. The results are very huge in number and the users are not able to view all the records

when they actually need a subset of them. This has led to users to refine query with other keywords and get the desired results after many trials. Here it has to be observed that user time is wasted in refining search criteria and also the navigation of query results which are abundant and bulky. The navigation cost is more as user has to spend lot of time in finding the required subset of rows from the bulk of search results. This problem has been researched in [1], [2], [3] and the problem is identified as information overload. Figure 1 shows static navigation of MeSh hierarchy of biomedical data.

Fig. 1: Static structure of MeSh hierarchy of biomedical data[7]

The solutions are of two types namely categorization and ranking. However, these two can be combined to have more desired results. The proposed system is specially meant for presenting results in such a way that the navigation cost is reduced. For this purpose categorization techniques is used and concept hierarchies are built. The categorization techniques are supported by simple ranking techniques. The proposed solution uses citations as described in [4] and effectively constructs a navigation tree that can reduce cost of navigation and users experience is much better when compared with existing systems that do not use these techniques. These techniques are being used

by e-Commerce systems to let their users have smooth navigation to the results returned by such systems.

The proposed system uses a cost model that lets it estimate the cost of navigation and make decisions in providing concept hierarchies. The cost of navigation is directly proportionate to the navigation subtree instead of the whole results in the tree. Earlier work on dynamic categorization of query results are in [2], [3], [5] and [6]. They made use of query dependent clusters based on the unsupervised technique. However, they neglect the process of navigation of clusters. In this aspect the proposed system is distinct and provides dynamic navigation on a pre-defined concept hierarchy. Another telling difference between existing systems and the proposed one is that the proposed system uses navigation cost model that minimizes navigation cost no matter what the bulky of search results is. Overall, our contributions are development of a framework for effective navigation of query results; a formal model for cost estimation; algorithm to optimize the results navigation cost; experimental evidence on the effectiveness.

OVERVIEW OF THE PROPOSED FRAMEWORK

MeSH (Medical Subject Headings) is the hierarchy based on which the framework is built. It is a made up of a concept hierarchy. A concept hierarchy is a labeled tree which has concept nodes and edges with a root node. Each node is identified by a unique ID and associated with a label. As discussed in [8], as per the MeSH concept hierarchy the label of parent is not specific when compared with that of child. Almost all concept hierarchies follow the same conventions. We have built a prototype application whose interface after search operation is as shown in fig. 2.

Fig. 2 Prototype application

When a query made by user, the application returns results in terms of citations list associated with MeSH concepts. The application constructs an initial navigation tree which is nothing but a concept hierarchy. Every node in that hierarchy is known as a concept. The navigational tree presented initially my contain nodes that are empty. As MeSH is very large hierarchy, the proposed application removes empty nodes and thus reduces size of the navigation tree. The removal of empty node is done carefully by preserving node relationships. The resultant navigation tree is nothing but the initial navigation tree where the empty nodes have been removed to reduce the size of the tree.

By removing empty nodes from the navigation tree, the tree size is reduced. However, the structure of the navigation tree is very big and cant be presented to end user as it is. The proposed application automatically expands the tree at required place to ensure that user navigation cost is less and user will directly see the desired results. Towards this end the application takes care of hiding unnecessary nodes and presenting desired nodes in such a way that the navigation cost of to the user is negligible. We consider a node expansion at given place as an EdgeCut. In case of trees, in general an EdgeCut is a subset of edges because the removal of edge causes a new component to be created. Visualization of an EdgeCut is shown in fig. 3.

Fig. 3 Visualization of EdgeCut[9]

Navigation tree with an EdgeCut is shown in fig. 9 on the user interface. The EdgeCut results in creation of sub trees and they are of two types namely lower and upper. Before an EdgeCut is performed, the navigation tree is converted into something known as active tree. It is achieved by annotating root node. The navigation tree is reduced both width and height wise due to the EdgeCut and presentation of the resultant tree. We do assume any users preference on the results and every part of the tree can be reached by user as the navigation does not result in information loss in the proposed framework. We also use a convention i.e., >>> used to provide hyperlinks that facilitate user to perform EdgeCut operations in recursive manner. Generally we expect EdgeCut to be triggered by user on the lower component of sub trees. However, it is also possible that such operation may occur at upper component sub tree. Fig. 4 shows the resultant tree with hyperlinks in >>> convention.

Fig. 4 Reultant Tree

ARCHITECTURE OF PROTOTYPE SYSTEM

The architecture of the prototype application is as shown in fig. 5. The architectural operations are classified into online and offline operations.

Fig. 5 Architecture of Proposed Application

As seen in fig. 5, the proposed system operations are classified into online and offline operations. The offline operations include storing MeSH concepts hierarchies into local database, exchanging MeSH concepts between PubMed DB and Local DB. The online operations include question results generation, obtaining citations from MEDLINE database. Navigation tree construction and other navigation sub system operations such as active tree visualization are also online operations. The online operations actively start with query given by end user on biomedical database.

NAVIGATION AND COST MODEL

Once user issues a query keyword, the prototype application generates an initial active tree with single component tree. Its root is the root of the MeSH hierarchy. Later on user performs one of the following operations.

EXPAND: It takes place when user clicks >>> hyperlink that causes EdgeCut to be performed that shows a set of nodes.

SHOWRESULTS: User performs this operation to view results.

IGNORE:User can simply ignore a node after looking at its label and moves to next concept.

BACKTRACK:This action occurs when user performs undo on the last EdgeCut operation.

User continues these operations until he gets the intended results. We simplify this simple navigational model and call it TOPDOWN. The TOPDOWN model has only three operations namely EXPAND, SHOWRESULTS and IGNORE.

Listing 1 TOPDOWN Navigation Model

The cost model as specified in [2]considers the following to compute navigation cost.

  • Number of EXPAND actions.

  • Number of concept nodes shown by a single EXPAN action.

  • Number of citations presented for a single SHOWRESULTS action.

For each of these things cost of 1 is considered. The cost of exploring a component sub tree I(n) rooted at node n is

Cost(I(n)) = PEN (I(n)). (1-Pc(I(n)).|L(I(n))|

+ P (I(n)).(B+|S| + S cost(I (S))) ),

Listing 2 OptEdgeCut Algorithm

The algorithm Opt-EdgeCut is supposed to compute the minimum expected navigation cost required for navigating the tree from bottom up in post order fashion.

Heuristic-ReducedOpt Algorithm

The Opt-EdgeCut algorithm is computational more expensive and cant be practically used for most of the queries. Therefore, we proposed a new algorithm known as Heuristic-ReducedOpt as shown in listing 3.

c S c

where normalized PE(I(n))is represented as PEN(I(n)) thus the sum of normalization of subtree of the component after an EdgeCut is equal to 1.

BEST EDGECUT ALGORITHMS

Optimal cost can be computed by recursively listing all possible sets of EdgeCuts. This starts from the root and traverses every concept in the tree. This algorithm is expensive. To overcome this OptEdgeCut algorithm is proposed which provides minimum expected navigation cost.

Listing 3 Heuristic ReducedOpt Algorithm

This algorithm is based on k-partition algorithm [10]. This is adapted for our use here. The algorithm works in bottom-up

fashion. For every node (n), the algorithm prunes heaviest children one by one till the weight of nfalls below k.

EXPERIMENTAL EVALUATION

For evaluating the proposed application, expansion time performance and average navigation cost are considered. The empirical studies are made in a PC with XP as operating system. Oracle 10 g is used as backend and Java is used to implement all

Fig. 7 shows the number of concepts shown when an EXAPND action takes place. The results revealed that our approach is superior to many other approaches.

800

700

600

algorithms. The proposed application achieves improvement in navigation cost when compared with Top1-Leave|Wise. Minimum improvement is 16 percent and maximum improvement is 41 percent. Fig. 6 shows number of EXPAND actions comparison.

500

400

300

200

100

H-ROpt(B=15)

H-Ropt(B=10) H-Ropt(B=5) H-Ropt(B=1)

Top10Level Satic

60

H_Ropt

50 (B=15)

0

1 2 3 4 5 6 7 8 9 10

# of Concepts Revealed(Biochemistry)

40

30

20

10

0

1 2 3 4 5 6 7 8 9

# of EXPAND Actions(Biochemistry)

H_Ropt (B=10)

H_Ropt (B=5)

H_Ropt (B=1)

Top10L evel

static

Fig. 8 Comparison of overall navigation cost

As can be seen in fig. 8 comparison is made on proportional navigation cost of Heuristic- ReducedOpt over Opt-EdgeCut. Opt- EdgeCut algorithm has best results when executed on biochemistry database.

Fig. 6 Comparison of number of expand operations

H-ROpt

20

18

16

14

12

10

8

6

4

2

0

As can be seen in fig. 6, ten queries have been presented for six types of algorithms. The X axis takes question numbers while the Y axis represents value. Out of all the algorithms the proposed application and its algorithm. Fig. 7 shows the results of number of concepts shown.

1 2 3 4 5 6 7 8 9 10

Overall Navigation cost(Biochemistry)

opt-Edge

cut

Fig. 7 Comparison of number of concepts revealed

1600

1400

1200

1000

800

600

400

200

0

1 3 5 7 9 11 13 15 17 19

Average Execution Time(ms)

Fig. 9-Heuristic-ReducedOpt EXPAND performance.

As seen in fig. 9, the average time of Heuristic-ReducedOpt to execute and EXPAND action with respect to each query of table 1. The average values are taken from the number of EXPAND action provided in fig. 6.

CONCLUSION

This paper focuses on developing a framework that facilitates rich user experience while navigating query results. This framework is required as the large medical databases available over Internet return huge number of records as results. Navigating the results to obtain required information causes wastage of users time. The proposed framework can effectively reduce the navigation cost and provide easy access to required results. Thus it can solve information overload problem effectively. The results organized are associated with MeSH annotations by user a dynamic navigation method. A new cost model is proposed that reduces the navigation cost as it can eliminate unnecessary navigation steps. A prototype application is built with web based interface, which facilitates users to search for required information and also navigate the results. The experimental results revealed that the user navigation cost is reduced substantially and rich user experience is achieved.

REFERENCES

  1. J S. Agrawal, S. Chaudhuri, G. Das and A. Gionis: Automated Ranking of Database Query Results. In Proceedings of First BiennialConference on Innovative Data Systems Research (CIDR),2003.

  2. K. Chakrabarti, S. Chaudhuri and S.W. Hwang: Automatic Categorization of Query Results. SIGMOD Conference 2004: 755-766.

  3. Z. Chen and T. Li: Addressing Diverse User Preferences in SQLQuery- Result Navigation. SIGMOD Conference 2007: 641-652.

  4. Medical Subject Headings (MeSH®). http://www.nlm.nih.gov/mesh/

  5. Vivísimo, Inc. Clusty. (2008) [Online].Available: http://clusty.com/

  6. A. Kashyap, V. Hristidis, M. Petropoulos, and S. Tavoulari: BioNav: Effective Navigation on Query Results of Biomedical Databases. (Short Paper), ICDE 2009, to appear. Available at http://www.cs.fiu.edu/~vagelis/publications/BioNavICDE0 9.pdf

  7. HON (2010): Health On the Net Foundation: Medical information. Available online at: http://www.hon.ch/cgi- bin/HONselect?cat+A [viewed: 15 August 2012]

  1. Medical Subject Headings (MeSH), http:

    //www.nlm.nih.gov/mesh/, 2010.

  2. Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari (2011), Effective Navigation of Query Results Based on Concept Hierarchies.IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 23, NO. 4.

  3. S. Kundu and J. Misra, A Linear Tree Partitioning Algorithm, SIAM J. Computing, vol. 6, no. 1, pp. 151- 154, 1977.

Leave a Reply