Relevance of Web Mining in
Identifying User Behavior
Abhijit Rao
Department of Computer Engineering
Manipal Institute Of Technology
Manipal, India
abhijit_rao1@rediffmail.com
Abstract
Web Mining is rapidly gaining ground in industry. Acquiring and analyzing web
data is fundamental to it, and what we need to understand is how this technology
can be applied in practice. In particular, we need to find regularities in User
Behavior in the World Wide Web environment. This white paper draws attention to
some of the significant applications of Web Mining to identifying user
behavior.
Introduction
Web sites are most often organized in a way the providers consider appropriate
for the majority of the site’s visitors. However, our knowledge of the actual
navigational behavior of the visitors is still sparse and fragmentary. Simple
access statistics provide only rudimentary feedback, while studies on specific
behavioral patterns remain limited. Knowledge about the navigation patterns occurring in or
dominating the usage of a web site can greatly help the site’s owner or
administrator in improving its quality.
Data mining can assist in this task by extracting knowledge from the past,
i.e. from the recorded accesses to the site. The term “web mining” has been
suggested for this kind of mining activity carried out on data collected from
the web. One such technique discovers navigation patterns over an aggregated
materialized view of the web log. It offers the expert a mining language as an
interface for specifying the generic characteristics that make a pattern
interesting to that person. Thus, only patterns having the desired
characteristics are constructed, while uninteresting patterns are pruned out
early.
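The early-pruning idea can be sketched in a few lines of Python. This is a minimal illustration and not the mining language described above: the function name frequent_paths, the min_support threshold and the representation of sessions as lists of page names are all assumptions made for the example. A pattern here is a contiguous sequence of pages, and a path is extended only if its prefix already met the threshold, so uninteresting paths are never built out.

```python
from collections import Counter

def frequent_paths(sessions, min_support, max_len=3):
    """Return contiguous page sequences found in at least min_support sessions.

    Paths are grown one page at a time. A candidate is counted only if its
    prefix survived the previous round, which prunes infrequent paths early.
    """
    frequent = {}
    survivors = {()}  # the empty prefix lets every single page be counted
    for length in range(1, max_len + 1):
        counts = Counter()
        for session in sessions:
            seen = set()  # count each candidate at most once per session
            for i in range(len(session) - length + 1):
                candidate = tuple(session[i:i + length])
                if candidate[:-1] in survivors:
                    seen.add(candidate)
            counts.update(seen)
        survivors = {p for p, c in counts.items() if c >= min_support}
        if not survivors:
            break
        for path in survivors:
            frequent[path] = counts[path]
    return frequent

# Hypothetical sessions, for illustration only.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "about"],
    ["home", "about"],
]
patterns = frequent_paths(sessions, min_support=2)
```

With a threshold of two sessions, the path from "home" to "products" is kept, while its two rarer extensions are discarded as soon as they are counted.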
Applications of Web Mining
1. Path Analysis of Users
As visitors navigate through a company's web site, their
interactions are captured in web logs. Analyses of these web logs provide
valuable insight into which products, services and offerings are of interest to
visitors, what percentage of those visitors become on-line purchasers, and
whether and how those purchasers can be turned into loyal customers. Path
analysis in particular deals with the navigational behavior of visitors. User
navigation paths on the web, or even fragments of visits to web sites,
constitute an important source of information. For higher-level analytical
tasks and applications such as user segmentation and recommender systems, the
paths of different users have to be compared. Most path distances can be viewed
as ordinary distance measures on a feature space of path fragments.
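One way to read the last statement is sketched below. The choice of contiguous page pairs as fragments and of Euclidean distance as the measure are assumptions made for illustration; the text does not prescribe either, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def fragments(path, n=2):
    """Count the contiguous length-n fragments of a navigation path."""
    return Counter(tuple(path[i:i + n]) for i in range(len(path) - n + 1))

def path_distance(path_a, path_b, n=2):
    """Euclidean distance between the fragment-count vectors of two paths."""
    fa, fb = fragments(path_a, n), fragments(path_b, n)
    keys = set(fa) | set(fb)
    # A Counter returns 0 for fragments that a path does not contain.
    return sqrt(sum((fa[k] - fb[k]) ** 2 for k in keys))

# Hypothetical paths of two visitors.
path_a = ["home", "products", "cart"]
path_b = ["home", "products", "about"]
d = path_distance(path_a, path_b)
```

Each path becomes a vector indexed by fragments, so any ordinary vector distance could be substituted for the Euclidean one used here.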
2. Learning from User Access Patterns for Web Design
Designing a web site is a complex and difficult problem. As
with any user interface, designers must structure and present their content in a
way that is clear and intuitive to users, or those users will become lost and
disgruntled. Good design is often facilitated by observing people using the
software. However, because traditional software is sold to the customer and used
in the privacy of a home or office, software designers have had to resort to
testing small groups of users in special labs. On the World Wide Web, however,
users interact directly with a server maintained by the creators of the service
or authors of the content. Popular web sites, therefore, facilitate large-scale
direct observation of real users. Any web site can maintain logs of user
accesses, and a designer can use this information to improve the site. Raw data,
however, is difficult to use; especially at a large and popular site, access
logs may amount to megabytes a day - too much for an overworked webmaster to
process regularly. Web server logs, therefore, are ripe targets for automated
data mining.
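As a minimal sketch of such automated processing, the Python fragment below reduces raw log lines to page hit counts and per-host page sequences. It assumes the server writes the Common Log Format; the function name summarize_log and the sample lines are hypothetical, and treating each host as one user is a simplification, since several users may share one address.

```python
import re
from collections import Counter, defaultdict

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def summarize_log(lines):
    """Return (hits per URL, visited URLs per host) for successful GET requests."""
    hits = Counter()
    paths = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match.group("method") != "GET" or match.group("status") != "200":
            continue  # ignore failed requests and methods other than GET
        hits[match.group("url")] += 1
        paths[match.group("host")].append(match.group("url"))
    return hits, dict(paths)

# Hypothetical log lines, for illustration only.
sample = [
    '10.0.0.1 - - [10/Sep/2001:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [10/Sep/2001:10:00:05 +0000] "GET /products.html HTTP/1.0" 200 2310',
    '10.0.0.2 - - [10/Sep/2001:10:00:07 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [10/Sep/2001:10:00:09 +0000] "GET /missing.html HTTP/1.0" 404 210',
    'not a log line',
]
hits, paths = summarize_log(sample)
```

The per-host sequences are the raw material for the path analysis discussed in the previous section.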
Adaptive Sites
Adaptive Sites are web sites that use information about
user access patterns to improve their organization and presentation. Adaptive
sites observe user activity and user difficulties and learn about types of
users, regular access patterns, and common problems with the site.
3. Prefetching
The problem of predicting web-user accesses has recently
attracted significant attention. Prefetching refers to the process of deducing
a client’s forthcoming requests for Web objects, based on access log
information, and fetching those objects into the cache in the background before
an explicit request is made for them. The objective of prefetching is to reduce
the user-perceived latency. The popularity of the Web has resulted in heavy
traffic on the Internet, and the net effect of this growth has been a
significant increase in user-perceived latency. Potential sources of latency
are heavy load on the web server, network congestion, low bandwidth, bandwidth
underutilization and propagation delay. The obvious solution, increasing the
bandwidth, does not seem viable, since the Web infrastructure (the Internet)
cannot easily be changed without significant economic cost. Moreover,
propagation delay cannot be reduced beyond a certain point, since it depends on
the physical distance between the communicating end points. The main advantage
of prefetching is that it prevents bandwidth underutilization and hides part of
the latency.
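The prediction step can be illustrated with a simple first-order Markov predictor built from logged sessions. This technique is chosen here only for illustration; the text does not name a specific prediction method. The function names, the min_prob threshold and the sample sessions are assumptions.

```python
from collections import Counter, defaultdict

def build_transitions(sessions):
    """Count how often each page was followed directly by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions

def pages_to_prefetch(transitions, current_page, min_prob=0.5):
    """Return pages whose estimated probability of being requested next
    is at least min_prob, most likely first."""
    followers = transitions.get(current_page)
    if not followers:
        return []  # page never seen, or never followed by another page
    total = sum(followers.values())
    return [page for page, count in followers.most_common()
            if count / total >= min_prob]

# Hypothetical sessions reconstructed from an access log.
sessions = [
    ["/", "/products", "/cart"],
    ["/", "/products", "/about"],
    ["/", "/about"],
]
transitions = build_transitions(sessions)
candidates = pages_to_prefetch(transitions, "/")
```

A cache would fetch the returned pages in the background while the user is still reading the current one. The threshold trades wasted bandwidth against hidden latency: a lower value prefetches more pages, more of which will never be requested.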
4. Site Semantics
Site semantics denotes any kind of formal description of
the “meaning” of a site's different URLs. Various schemes for
classifying a site's URLs have been proposed. These allow a larger number of
visitor sessions or episodes to be identified as instances of one general
pattern. Without such a scheme, the paths individual visitors take are very
specific, and the number of times an individual URL is requested is so small
that no meaningful results would be obtained by mining the raw log data.
Classification thus helps towards such diverse goals of
analysis as identifying association rules between purchases of goods,
determining differences between site designers' goals and visitors' actual
behavior, identifying semantically meaningful navigation episodes, improving the
interface, and characterizing the work-load of a site.
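A small sketch shows how a classification scheme lets different sessions fall under one general pattern. The URL layout, the concept names and the rule list below are entirely hypothetical; a real scheme would be derived from the structure and content of the site being analyzed.

```python
import re

# Hypothetical classification scheme: (pattern, concept) pairs, first match wins.
URL_CONCEPTS = [
    (re.compile(r"^/products/\d+$"), "product_detail"),
    (re.compile(r"^/products/?$"), "product_list"),
    (re.compile(r"^/cart"), "cart"),
    (re.compile(r"^/$"), "home"),
]

def concept_of(url):
    """Map a URL to the concept of the first matching rule, or 'other'."""
    for pattern, concept in URL_CONCEPTS:
        if pattern.search(url):
            return concept
    return "other"

def abstract_session(session):
    """Replace every URL in a session by its concept."""
    return [concept_of(url) for url in session]

# Two visitors who looked at different products follow the same general pattern.
session_a = abstract_session(["/", "/products", "/products/17", "/cart"])
session_b = abstract_session(["/", "/products", "/products/42", "/cart"])
```

At the URL level the two sessions differ, but after classification both are instances of the same concept sequence, which is what makes mining them together meaningful.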
Conclusion
Web Mining gives Web Analysts an opportunity to
recognize the strengths and deficiencies of their designs and concepts, and to
work on various aspects of user behavior. User behavior plays an important role
in user-intensive applications such as E-Auctions and E-Retailing. Beyond
E-Commerce, Web Mining can also be used to investigate user interfaces and
usability at site level.
White paper uploaded on 10th September 2001