Web Mining Classification: a Survey

Kalaichelvi. S

doi:10.17577/IJERTV3IS100538

Volume 03, Issue 10 (October 2014)


Open Access
Authors : Kalaichelvi. S
Paper ID : IJERTV3IS100538
Volume & Issue : Volume 03, Issue 10 (October 2014)
Published (First Online): 28-10-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


S. Kalaichelvi

Student M.E. Second Year

Department Of Computer Science And Engineering K.S.Rangasamy College Of Technology Tiruchengode

Abstract:- The World Wide Web is huge, unstructured, universal and heterogeneous. In recent years the growth of the World Wide Web has exceeded all expectations. Today there are several billion HTML and XML documents, pictures and other multimedia files available via the Internet, and the number is still rising. Given the impressive variety of the Web, however, retrieving interesting content has become a very difficult task. Web usage mining, one of the techniques of web mining, is very useful for discovering knowledge from the secondary data obtained from the interaction of users with the Web. It is essential for an effective website. Web usage mining is the mining of usage data captured through various logs stored on the server, client or proxy. This paper presents the basic ideas of web usage mining.

INTRODUCTION
Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data in order to understand and better serve the needs of Web-based applications [4]. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site, and mining it discovers and analyzes user access patterns. The term web usage mining was introduced by Cooley et al. in 1997; according to their definition, web usage mining is the automatic discovery of user access patterns from web servers. Web usage mining is the process of identifying browsing patterns by analyzing the users' navigational behavior. This process takes as input the usage data, i.e. the data residing in the web server logs, which record the visits of users to a web site. Web usage mining itself can be classified further depending on the kind of usage data considered:
1. Web Server Data: The user logs are collected by the Web server. Typical data includes IP address, page reference and access time.
2. Application Server Data: Commercial application servers such as Weblogic and StoryServer have significant features that enable E-commerce applications to be built on top of them with little effort. A key feature is the ability to track various kinds of business events and log them in application server logs.
3. Application Level Data: New kinds of events can be defined in an application, and logging can be turned on for them, generating histories of these specially defined events.
It must be noted, however, that many end applications require a combination of one or more of the techniques applied in the above categories.
  
  Figure 1. Taxonomy of web mining
STRUCTURE OF DATA IN WEB LOGS [5]
The log files are text files that can range in size from 1KB to 100MB, depending on the traffic at a given website. The data is taken for a particular website at a given time. The log data contains various fields.
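As a minimal sketch of what such fields can look like, the snippet below parses one line in the widely used Common Log Format. The sample line, the regular expression, and the function name `parse_log_line` are illustrative assumptions and are not taken from the paper's data.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the textual fields that downstream steps need as typed values.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# Hypothetical sample line, made up for this example.
sample = '192.168.1.10 - - [28/Oct/2014:10:15:32 +0530] "GET /index.html HTTP/1.1" 200 5120'
entry = parse_log_line(sample)
print(entry["host"], entry["url"], entry["time"].isoformat(), entry["status"])
```

The IP address, page reference and access time mentioned earlier correspond here to the `host`, `url` and `time` fields.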
PHASES TO PERFORM WEB USAGE MINING [6]
The phases to perform web usage mining are depicted in Figure 2.
After the data is preprocessed, it is utilized for discovering homogeneous patterns [7].
The goal of this technique is to establish a model that is able to represent significant dependencies among the various variables in the Web domain. The modeling technique provides a theoretical framework for analyzing the behavior of users, and is potentially useful for predicting future Web resource consumption.
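One simple way to represent dependencies between page visits is a first-order Markov model, which estimates the probability of the next page given the current one. The text does not name a specific model, so this choice, the sample sessions, and the function name `transition_probabilities` are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate P(next page | current page) from ordered page sequences."""
    counts = defaultdict(Counter)
    for pages in sessions:
        # Count every consecutive (current, next) pair in the session.
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {page: n / total for page, n in followers.items()}
    return probabilities

# Hypothetical sessions, made up for this example.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/home"],
    ["/home", "/about"],
]
model = transition_probabilities(sessions)
print(model["/home"])
```

A model of this kind could be used to guess which page a visitor is likely to request next, which is one form of predicting future Web resource consumption.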
WEB USAGE MINING ARCHITECTURE

WEBMINER is a system that implements parts of this general architecture [11, 12]. The architecture divides the Web usage mining process into two main parts. The first part includes the domain-dependent processes of transforming the Web data into a suitable transaction form. This includes the preprocessing, transaction identification, and data integration components. The second part includes the largely domain-independent application of generic data mining and pattern matching techniques (such as the discovery of association rules and sequential patterns) as part of the system's data mining engine. The overall architecture for the Web mining process is depicted in Figure 3.
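To make the association rule discovery mentioned above more concrete, the following sketch computes support and confidence for pairs of pages that occur together in the same transaction. It is a simplified illustration and not the WEBMINER implementation; the function name `pair_rules`, the `min_support` threshold of 0.5, and the sample transactions are assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent page pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for pages in transactions:
        unique = sorted(set(pages))  # ignore repeated visits within a transaction
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both pages
        if support < min_support:
            continue
        # Confidence of a -> b is P(b in transaction | a in transaction).
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

# Hypothetical transactions, made up for this example.
transactions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(transactions)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Restricting the search to pairs keeps the sketch short; a full association rule miner would also consider larger page sets.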

Data cleaning is the first step performed in the Web usage mining process. Some low-level data integration tasks may also be performed at this stage, such as combining multiple logs, incorporating referrer logs, etc. After the data cleaning, the log entries must be partitioned into logical clusters using one or a series of transaction identification modules. The goal of transaction identification is to create meaningful clusters of references for each user. The task of identifying transactions is one of either dividing a large transaction into multiple smaller ones or merging small transactions into fewer larger ones. The input and output transaction formats match, so that any number of modules can be combined in any order, as the data analyst sees fit. Once the domain-dependent data transformation phase is completed, the resulting transaction data must be formatted to conform to the data model of the appropriate data mining task. For instance, the format of the data for the association rule discovery task may be different from the format necessary for mining sequential patterns. Finally, a query mechanism allows the user (analyst) to exert more control over the discovery process by specifying various constraints.
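The cleaning and transaction identification steps described above can be sketched as follows. This is one possible approach under stated assumptions: requests for images, stylesheets and scripts are treated as irrelevant, the client IP address stands in for the user, and a gap of more than 30 minutes starts a new transaction. The names `clean`, `identify_transactions`, `IGNORED_SUFFIXES` and `SESSION_TIMEOUT` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed irrelevant requests
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold

def clean(entries):
    """Drop requests for embedded resources such as images and stylesheets."""
    return [e for e in entries if not e["url"].lower().endswith(IGNORED_SUFFIXES)]

def identify_transactions(entries):
    """Group each user's requests into transactions split by long time gaps."""
    by_user = defaultdict(list)
    for e in sorted(entries, key=lambda e: e["time"]):
        by_user[e["host"]].append(e)  # assumption: one IP address is one user
    transactions = []
    for host, requests in by_user.items():
        current = [requests[0]["url"]]
        for prev, nxt in zip(requests, requests[1:]):
            if nxt["time"] - prev["time"] > SESSION_TIMEOUT:
                transactions.append((host, current))
                current = []
            current.append(nxt["url"])
        transactions.append((host, current))
    return transactions

def t(hhmm):
    """Build a timestamp on a fixed, made-up day for the sample data."""
    return datetime.strptime("2014-10-28 " + hhmm, "%Y-%m-%d %H:%M")

# Hypothetical log entries, made up for this example.
log = [
    {"host": "10.0.0.1", "time": t("10:00"), "url": "/home"},
    {"host": "10.0.0.1", "time": t("10:00"), "url": "/logo.gif"},
    {"host": "10.0.0.1", "time": t("10:05"), "url": "/products"},
    {"host": "10.0.0.2", "time": t("10:06"), "url": "/home"},
    {"host": "10.0.0.1", "time": t("11:30"), "url": "/cart"},
]
transactions = identify_transactions(clean(log))
for host, pages in transactions:
    print(host, pages)
```

Because `identify_transactions` takes and returns plain lists, further modules that split or merge transactions could be chained after it, in the spirit of the matching input and output formats described above.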

Figure 3. General Architecture for Web Usage Mining
PROBLEMS FACED WHILE PERFORMING WEB USAGE MINING [13]
- Processing of logs, that is, cleaning of log files
- Cleaning of log files, that is, removing data that is not relevant
- Identification of user sessions
- Identification of user habits
CONCLUSION

The Web has become the world's largest knowledge repository. Extracting knowledge from the Web efficiently and effectively is becoming increasingly important for a variety of reasons. The hidden Web, also known as the invisible Web or deep Web, has given rise to another issue facing Web mining research. The hidden Web refers to documents on the Web that are dynamic and not accessible by general search engines. Most documents in the hidden Web, including pages hidden behind search forms, specialized databases, and dynamically generated Web pages, are not accessible by general Web mining applications. Moreover, without appropriate knowledge representation and knowledge discovery algorithms, the Web is like a human being with extraordinary memory but no ability to think and reason. Hence we believe that research in Web mining is promising as well as challenging, and that it will help to produce applications that can more effectively and efficiently utilize the Web of knowledge for humankind.

REFERENCES

[1] http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
[2] http://en.wikipedia.org/wiki/Web_mining
[3] http://en.wikipedia.org/wiki/Web_mining/web_structure_mining
[4] http://en.wikipedia.org/wiki/Web_mining/web_usage_mining
[5] http://www.web-datamining.net/usage/
[6] Sonali Muddalwar and Shashank Kawar (2012). Applying artificial neural network in web usage mining. International Journal of Computer Science and Management, Vol. 1, Issue 4.
[7] Anshuman Sharma (2012). Web usage mining using neural network. International Journal of Reviews in Computing.
[8] R. Cooley. Web Usage Mining: Discovery and Application of Interesting Patterns from Web data. PhD thesis, Dept. of Computer Science, University of Minnesota, May 2000.
[9] R. Cooley, B. Mobasher, and J. Srivastava. Data preparation for mining World Wide Web browsing patterns. Knowledge and Information Systems, 1(1), 1999.
[10] O. Zaiane, M. Xin, and J. Han. Discovering Web Access Patterns and Trends by applying OLAP and Data Mining Technology on Web Logs. In Advances in Digital Libraries, pages 19-29, Santa Barbara, CA, 1998.
[11] R. Cooley, B. Mobasher, and J. Srivastava. Web mining: Information and pattern discovery on the world wide web. Technical Report TR 97-027, University of Minnesota, Dept. of Computer Science, Minneapolis, 1997.
[12] B. Mobasher, N. Jain, E. Han, and J. Srivastava. Web mining: Pattern discovery from world wide web transactions. Technical Report TR 96-050, University of Minnesota, Dept. of Computer Science, Minneapolis, 1996.
[13] Ketki Muzumdar, Ravi Mante, and Prashant Chatur (2013). Neural Network Approach for Web Usage Mining. International Journal of Recent Technology and Engineering (IJRTE), Volume 2, Issue 2.
