Relevance of Web Mining in identifying User Behavior

Abhijit Rao
Department of Computer Engineering
Manipal Institute Of Technology
Manipal, India
abhijit_rao1@rediffmail.com

 

Abstract
Web Mining is rapidly gaining ground in industry. Acquiring and analyzing web data is fundamental, and what we need to understand is how this dynamic technology can be applied in practice. In particular, we need to look for patterns and correlations in User Behavior within the World Wide Web environment. This white paper draws attention to some of the significant applications of Web Mining in identifying user behavior.
 

Introduction
Web sites are most often organized in a way the providers consider appropriate for the majority of the site’s visitors. However, our knowledge of the actual navigational behavior of the visitors is still sparse and fragmentary. Simple access statistics provide only rudimentary feedback, while studies of specific behavioral patterns remain limited. Knowledge about the navigation patterns occurring in or dominating the usage of a web site can greatly help the site’s owner or administrator in improving its quality.
Data mining can assist in this task by effectively extracting knowledge from the past, i.e. from the site access recordings. The term “web mining” has been suggested to describe this type of mining activity performed on data collected from the web. One such approach employs an innovative technique for discovering navigation patterns over an aggregated materialized view of the web log. The technique offers the expert a mining language as an interface for specifying the generic characteristics that make a pattern interesting to that person. Thus, only patterns having the desired characteristics are constructed, while uninteresting patterns are pruned out early.

Applications of Web Mining

1.    Path Analysis of Users

As visitors navigate through a company’s web site, their interactions are captured in web logs. Analyses of these web logs provide valuable insight into which products, services and offerings are of interest to visitors, what percentage of those visitors become on-line purchasers, and whether and how those purchasers can be turned into loyal customers. Path analysis in particular deals with the navigational behavior of the site’s visitors. User navigation paths through the web, or even fragments of visits to web sites, are an important source of information. For higher-level analytical tasks and applications such as user segmentation and recommender systems, the paths of different users have to be compared. Most path distances can be viewed as ordinary distance measures on a feature space of path fragments.
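The last point, that path distances can be viewed as distances on a feature space of path fragments, can be illustrated with a minimal sketch. It represents each path by the set of its contiguous page pairs and compares two paths with the Jaccard distance. The page names, the fragment length and the choice of Jaccard distance are assumptions made for this example and are not measures prescribed by this paper.

```python
def path_fragments(path, n=2):
    """Return the set of contiguous length-n fragments of a navigation path."""
    return {tuple(path[i:i + n]) for i in range(len(path) - n + 1)}


def path_distance(path_a, path_b, n=2):
    """Jaccard distance between the fragment sets of two paths.

    0.0 means the paths share all fragments, 1.0 means they share none.
    """
    frags_a = path_fragments(path_a, n)
    frags_b = path_fragments(path_b, n)
    union = frags_a | frags_b
    if not union:
        return 0.0  # both paths are shorter than n pages: treat as identical
    return 1.0 - len(frags_a & frags_b) / len(union)


# Hypothetical visits to an online shop (page names are invented).
visit_1 = ["home", "catalog", "product", "cart"]
visit_2 = ["home", "catalog", "product", "reviews"]
visit_3 = ["home", "help", "contact"]

print(path_distance(visit_1, visit_2))  # two of four distinct fragments shared
print(path_distance(visit_1, visit_3))  # no fragments shared
```

Distances of this kind could then feed a clustering step for user segmentation, with similar visits ending up in the same group.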

2.    Learning from User Access Patterns for Web Designing

Designing a web site is a complex and difficult problem. As with any user interface, designers must structure and present their content in a way that is clear and intuitive to users, or those users will become lost and disgruntled. Good design is often facilitated by observing people using the software. However, because traditional software is sold to the customer and used in the privacy of a home or office, software designers have had to resort to testing small groups of users in special labs. On the World Wide Web, however, users interact directly with a server maintained by the inventors of the service or authors of the content. Popular web sites, therefore, facilitate large scale direct observation of real users. Any web site can maintain logs of user accesses, and a designer can use this information to improve the site. Raw data, however, is difficult to use; especially at a large and popular site, access logs may amount to megabytes a day - too much for an overworked webmaster to process regularly. Web server logs, therefore, are ripe targets for automated data mining.
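As a rough illustration of how raw access logs might be turned into something a designer can read, the sketch below counts successful page requests per URL. It assumes the log is in the Common Log Format written by many web servers; the paper does not say which format is used, and the sample lines, addresses and file names are invented.

```python
import re
from collections import Counter

# One Common Log Format entry: host, identity, user, [time], "request", status, size.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def page_counts(lines):
    """Count successful GET requests per URL, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("method") == "GET" and match.group("status") == "200":
            counts[match.group("url")] += 1
    return counts


# Invented sample entries, for illustration only.
sample_log = [
    '10.0.0.1 - - [10/Sep/2001:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Sep/2001:10:00:05 +0000] "GET /products.html HTTP/1.0" 200 5120',
    '10.0.0.2 - - [10/Sep/2001:10:01:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Sep/2001:10:01:09 +0000] "GET /missing.html HTTP/1.0" 404 312',
]

counts = page_counts(sample_log)
for url, hits in counts.most_common():
    print(url, hits)
```

Simple counts like these are only the rudimentary statistics mentioned in the Introduction; mining navigation patterns would additionally require grouping requests into visits.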

Adaptive Sites

Adaptive Sites are web sites that use information about user access patterns to improve their organization and presentation. Adaptive sites observe user activity and user difficulties and learn about types of users, regular access patterns, and common problems with the site.

3.    Prefetching

The problem of predicting web-user accesses has recently attracted significant attention. Prefetching is the process of deducing a client’s future requests for Web objects from access log information and loading those objects into the cache, in the background, before an explicit request is made for them. Its objective is to reduce user-perceived latency. The Web’s popularity has led to heavy traffic on the Internet, and the net effect of this growth has been a significant increase in user-perceived latency. Potential sources of latency are heavy load on the web server, network congestion, low bandwidth, bandwidth underutilization and propagation delay. The obvious solution, increasing the bandwidth, does not seem viable, since the Web infrastructure (the Internet) cannot easily be changed without significant economic cost. Moreover, propagation delay cannot be reduced beyond a certain point, since it depends on the physical distance between the communicating end points. The main advantage of prefetching is that it prevents bandwidth underutilization and hides part of the latency.
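One simple way to deduce forthcoming page accesses from past log data is a first-order Markov model: count how often each page was followed by each other page, and prefetch the likely successors of the page currently being viewed. The sketch below is an illustration under that assumption, not the method of any particular prefetching system; the session data, the URLs and the probability threshold are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count, for every page, how often each other page was requested next."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def pages_to_prefetch(transitions, current_page, min_probability=0.5):
    """Return successors of current_page whose observed probability meets the threshold."""
    followers = transitions.get(current_page)
    if not followers:
        return []  # page never seen, or never followed by another request
    total = sum(followers.values())
    return [
        page
        for page, count in followers.most_common()
        if count / total >= min_probability
    ]


# Hypothetical sessions reconstructed from an access log.
past_sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/news", "/weather"],
    ["/home", "/news", "/sports"],
    ["/home", "/about"],
]

model = build_transition_counts(past_sessions)
print(pages_to_prefetch(model, "/home"))
print(pages_to_prefetch(model, "/news"))
```

The threshold reflects a trade-off: a low value prefetches more pages and hides more latency, but wastes bandwidth on objects the user never requests.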

4.    Site Semantics

Site semantics denotes any kind of formal description of the 'meaning' of a site's different URLs. Various kinds of schemes for classifying a site's URLs have been proposed. These allow a larger number of visitor sessions or episodes to be identified as instances of one general pattern. Without such a classification, the very specific paths that individual visitors take are so varied, and the number of times any individual URL is requested so small, that no meaningful results would be obtained by mining the raw log data. Classification helps towards such diverse goals of analysis as identifying association rules between purchases of goods, determining differences between site designers' goals and visitors' actual behavior, identifying semantically meaningful navigation episodes, improving the interface, and characterizing the workload of a site.
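The effect of a URL classification scheme can be shown with a small sketch. Each URL is mapped to a concept by a prefix rule, so that sessions which differ at the URL level become instances of the same general pattern. The concepts, prefixes and sessions below are hypothetical, and prefix matching is only one of many possible classification schemes.

```python
from collections import Counter

# Hypothetical classification scheme: URL prefix -> concept.
URL_CONCEPTS = [
    ("/products/", "PRODUCT"),
    ("/search", "SEARCH"),
    ("/cart", "CART"),
]


def concept_of(url):
    """Map a URL to the concept of the first matching prefix, else OTHER."""
    for prefix, concept in URL_CONCEPTS:
        if url.startswith(prefix):
            return concept
    return "OTHER"


def abstract_session(session):
    """Replace every URL in a session by its concept."""
    return tuple(concept_of(url) for url in session)


# Invented visitor sessions.
sessions = [
    ["/search?q=lamp", "/products/1042", "/cart"],
    ["/search?q=desk", "/products/77", "/cart"],
    ["/index.html", "/products/5"],
]

raw_patterns = Counter(tuple(session) for session in sessions)
concept_patterns = Counter(abstract_session(session) for session in sessions)

print(max(raw_patterns.values()))       # highest support of any raw URL path
print(concept_patterns.most_common(1))  # most frequent concept-level pattern
```

At the URL level every session here is unique, whereas at the concept level two of the three sessions follow the same search, product, cart episode.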

Conclusion

Web Mining gives Web Analysts an opportunity to recognize the strengths and deficiencies of their designs and concepts, and to work on various aspects of user behavior. User behavior plays an important role in user-intensive applications such as E-Auctions and E-retailing. Beyond E-Commerce, Web Mining can also be used to investigate user interfaces or usability at the site level.


White paper uploaded on 10th September 2001