An Innovative Multi-Layered Conceptual Model for Customer Behavior Analysis in an E-Business Environment
Abhijit Rao
Department of Computer Engineering, Manipal Institute of Technology, Manipal.
This research paper introduces an innovative multi-layered conceptual model for customer behavior analysis in an E-Business environment. It reviews the technology currently used for Web data collection and argues that the proposed model is more capable in several aspects of that task. The existing technology for gathering data has several limitations; the model addresses some of them, with the aim of making data acquisition from the Web more efficient and accurate.
A company can gather a large amount of data at its website. The vast data collected by aggregating every point of contact a company has with its customers can later be analyzed and mined. The Web provides companies with an unprecedented opportunity to analyze customer behavior and preferences. Every visit to a Web site generates important customer behavioral data, regardless of whether or not a sale is made. Every visitor action is a digital gesture that offers three important perspectives on the visitor: habits, preferences, and tendencies.

These
interactions reveal important trends and patterns that can help a company
design a Web site that effectively communicates and markets its products and
services. Companies can aggregate, enhance, and mine Web data to learn what
sells, what works and what doesn't, and who is or isn't buying [1].
Table 1: Web data utilization in large U.S. corporations

| Web Data Applications | Share of companies |
|---|---|
| Marketing | 18% |
| Customer Service | 16% |
| Don't Use Web Data | 72% |
However, according to a recent survey by Forrester Research, few companies are listening: of 50 of the largest U.S. corporations, only 18 percent use their Web data for marketing and 16 percent for customer service, while 72 percent do not use it at all (see Table 1, above) [1].
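The percentages in Table 1 add up to more than 100 percent, so the categories overlap. Assuming the "Don't Use Web Data" row means that 72 percent of the surveyed companies make no use of their Web data at all, and writing M and C for the shares using it for marketing and for customer service, a short inclusion-exclusion calculation (our own reading of the table, not a figure reported by the survey) bounds the overlap:

```latex
\begin{aligned}
P(M \cup C) &\le 1 - 0.72 = 0.28 \\
P(M \cap C) &= P(M) + P(C) - P(M \cup C) \ge 0.18 + 0.16 - 0.28 = 0.06
\end{aligned}
```

Under that reading, at least 6 percent of the companies use their Web data for both purposes.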
There are two main factors that companies are not concentrating on. They are as follows:

Fig. 2: Phases in Web Mining
Let's take a look at how to collect data on visitors. The four main sources are log files, cookies, forms, and clickstream data.
4.1 Log Files
Server log files provide domain types, time of access, keywords, and search engines used by visitors. Table 2 illustrates the information gathered in a log file. The referrer (HTTP "Referer") field of a log file provides valuable information about where visitors are coming from. It can tell you what your visitors were looking for when they came to your site, by identifying the keywords they used in their search (assuming they found you through a search engine), and which search engine or banner ad referred them.
Table 2: Information included in a typical log file (Anatomy of a Log File)
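As a sketch of how the referrer and cookie fields can be put to work, the Python function below parses one log line. It assumes the Apache-style "combined" log format with the cookie value appended as an extra quoted field at the end of the line; the list of query-string parameters treated as search keywords (`q`, `query`, `p`, `search`) is a hypothetical choice, since real search engines differ.

```python
import re
from urllib.parse import parse_qs, urlparse

# Combined log format, plus an optional trailing quoted field assumed to hold the cookie.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    r'(?: "(?P<cookie>[^"]*)")?'
)

# Hypothetical query-string parameters that carry search terms in a referring URL.
KEYWORD_PARAMS = ("q", "query", "p", "search")


def search_keywords(referer):
    """Return the search terms embedded in a referring URL, or None."""
    if referer in ("", "-"):
        return None
    params = parse_qs(urlparse(referer).query)
    for name in KEYWORD_PARAMS:
        if name in params:
            return params[name][0]
    return None


def parse_line(line):
    """Split one log line into named fields; None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["keywords"] = search_keywords(entry["referer"])
    return entry


sample = ('192.0.2.7 - - [02/Mar/2001:10:15:32 +0530] '
          '"GET /products/cameras.html HTTP/1.0" 200 5120 '
          '"http://search.example.com/search?q=digital+camera" '
          '"Mozilla/4.0 (compatible; MSIE 5.5)" "visitor_id=A1B2C3"')
entry = parse_line(sample)
print(entry["keywords"], entry["cookie"])  # digital camera visitor_id=A1B2C3
```

Grouping the parsed entries of a day's log by `keywords` and by the referring host gives a first view of what visitors searched for and where they came from.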
4.2 Cookies
Cookies dispensed from the server can track browser visits and pages viewed, and can provide some insight into how often a visitor has been to your site and which sections they wander into. A cookie is a small piece of data that a server passes to a browser in a special HTTP header (Set-Cookie); the browser sends it back with later requests to the same site.
Browsers store cookies in small text files on the user's hard disk. You can find the cookie value in the last field of the extended log format file. A retail Web site can issue cookies to:
Cookies are standard components for tracking customer activity on most e-commerce sites. They are used as counters and unique identification values that tell retailers who is a first-time visitor and where returning visitors have been within a site.
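The counter and unique-identifier roles of cookies can be sketched with Python's standard `http.cookies` module. The cookie name `visitor_id`, the one-year lifetime, and the in-memory `history` dictionary are assumptions made for illustration; a production site would keep the history in persistent storage.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def handle_request(cookie_header, page, history):
    """Identify the visitor behind one request and record the page viewed.

    cookie_header: value of the request's Cookie header ('' if absent).
    history: dict mapping visitor IDs to the pages they have viewed.
    Returns (visitor_id, is_first_time, set_cookie_line).
    """
    incoming = SimpleCookie()
    if cookie_header:
        incoming.load(cookie_header)
    if COOKIE_NAME in incoming:
        visitor_id = incoming[COOKIE_NAME].value
    else:
        visitor_id = uuid.uuid4().hex  # new unique identification value
    is_first_time = visitor_id not in history
    history.setdefault(visitor_id, []).append(page)  # where the visitor has been

    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = visitor_id
    outgoing[COOKIE_NAME]["path"] = "/"
    outgoing[COOKIE_NAME]["max-age"] = 365 * 24 * 3600  # keep for about a year
    return visitor_id, is_first_time, outgoing.output()
```

On the first request `is_first_time` is true and the returned `Set-Cookie` line hands the browser its identifier; when the browser returns that cookie, the same ID is recognized and the new page is appended to the visitor's history.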
4.3 Forms
By far the
most effective method of gathering Web site visitor and customer information is
via registration and purchase forms (see Figure 3, below). Forms can provide
important personal information about visitors, such as gender, age,
and ZIP code. Form submissions can launch a CGI program that returns a
response to the Web site visitor.
Forms are simple browser-to-server mechanisms that can lead to a complex array of customer interactions from which relationships can evolve. These relationships can develop into direct feedback systems through which customers communicate with a retailer and servers continue to gather information from browsers.

Fig. 3: A Web registration form for collecting visitor information (Source: www.excite.com)
Using CGI programs to process your forms, you can record form entries either in relational tables or in comma-delimited flat files. These files of customer-provided information can be analyzed directly or imported into a relational database. The database engine not only makes data management easier, but also handles issues such as integrity, security, backup, and restoration.
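The path from form submission to flat file to database can be sketched in Python. This is not a CGI program itself: it shows only the steps such a program might perform, using the standard `urllib.parse`, `csv`, and `sqlite3` modules. The field names (`email`, `gender`, `age`, `zip`) and the table layout are hypothetical.

```python
import csv
import io
import sqlite3
from urllib.parse import parse_qs

# Hypothetical registration-form fields.
FIELDS = ("email", "gender", "age", "zip")


def parse_submission(body):
    """Decode an application/x-www-form-urlencoded form body into a record."""
    params = parse_qs(body)
    return {name: params.get(name, [""])[0] for name in FIELDS}


def append_csv(record, stream):
    """Append one record as a comma-delimited line to a flat file."""
    csv.writer(stream).writerow([record[name] for name in FIELDS])


def load_into_db(csv_text, conn):
    """Import the flat file into a relational table keyed by e-mail address."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visitors "
        "(email TEXT PRIMARY KEY, gender TEXT, age INTEGER, zip TEXT)"
    )
    rows = csv.reader(io.StringIO(csv_text))
    # The primary key enforces integrity: a repeat registration replaces the old row.
    conn.executemany("INSERT OR REPLACE INTO visitors VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

Keeping the comma-delimited file as an intermediate step preserves a raw copy that can be analyzed directly, while the database adds the integrity guarantees mentioned above.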
4.4 Clickstream Data
Clickstream data allow us to investigate how customers respond to advertising over time at an individual level. The clickstream represents a new source of customer response data, detailing the content and banner ads that customers click on during the online navigation process [3].
This technique uses behavioral "clickstream" data on customer navigation, drawn from Web server access log files. The main strengths of server access log data are:
· They recreate behavior in the actual media environment
· They are collected unobtrusively and are based on observed behavior rather than self-reports
· They are free from the confounding effects of researcher interaction
· The timing and order of activity are recorded
· Longitudinal data are obtained on the entire population (census) of customers, not just a sample
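Because access logs record the timing and order of every request, they can be regrouped into per-visitor sessions. The sketch below assumes each hit has already been reduced to a `(visitor, timestamp, url)` tuple (for example, a cookie value, a parsed time, and the requested path) and uses a 30-minute inactivity timeout, a widely used heuristic rather than anything prescribed by the source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# 30 minutes of inactivity ends a session: a common heuristic, assumed here.
SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass
class Session:
    visitor: str
    start: datetime
    end: datetime
    pages: list = field(default_factory=list)  # URLs in the order they were requested


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor, timestamp, url) hits into per-visitor sessions."""
    sessions = []
    open_session = {}  # visitor -> that visitor's most recent session
    for visitor, ts, url in sorted(hits, key=lambda hit: (hit[0], hit[1])):
        current = open_session.get(visitor)
        if current is None or ts - current.end > timeout:
            current = Session(visitor, ts, ts)
            sessions.append(current)
            open_session[visitor] = current
        current.pages.append(url)
        current.end = ts
    return sessions
```

Each resulting `Session` keeps the ordered path through the site, which is what makes longitudinal, individual-level analysis of responses to content and banner ads possible.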
5. Enhancing Your Web Data
The success of any Web-mining project largely depends on the quality and depth of its data. A common methodology in data warehousing is to leverage the value of internal customer information by appending external demographic and behavioral data. Similarly, you can append a variety of demographics to the information you capture from your registration and purchase forms.
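Appending external demographics is essentially a join on a shared key. The sketch below assumes the external data are keyed by ZIP code, one of the fields captured by registration forms; the record layout and attribute names are hypothetical.

```python
def append_demographics(customers, demographics_by_zip):
    """Return copies of internal customer records with external attributes appended.

    demographics_by_zip: hypothetical external data keyed by ZIP code.
    Internal fields take precedence on a name clash; customers whose ZIP code
    has no external match are returned unchanged.
    """
    enhanced = []
    for customer in customers:
        external = demographics_by_zip.get(customer.get("zip"), {})
        enhanced.append({**external, **customer})
    return enhanced
```

Letting internal fields win on a clash reflects the idea that data a customer supplied directly are usually more reliable than area-level averages.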
Obtaining a cohesive and comprehensive view of customers requires not only powerful data mining technologies but also the enhancement of internal transactional data with this external customer information, which describes the tendencies and values of customers in detail.
Access to different types of
customer demographics, coupled with data mining technology, can significantly
boost customer relationship management in ways that directly affect online
visitor acquisition and retention [2].
Data mining algorithms can search for relationships in Web
data to determine if patterns exist that can yield actionable business and
marketing intelligence. Data mining solutions come in many types, such as association,
segmentation, clustering, classification (prediction), and
visualization:
Association: uses
affinity market basket analysis to determine which products tend to sell
together.
Segmentation:
determines distinguishing features of your most profitable customers.
Clustering: profiles
customers to identify the characteristics of your visitors.
Classification/Prediction:
anticipates customer behavior to discover who is likely to make multiple
purchases.
Visualization: views
distributions and relationships to reveal what your visitors are purchasing [1].
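To make the association item concrete, the sketch below computes support and confidence for pairs of products bought together. It is a deliberately minimal pairwise version of market basket analysis, not a full algorithm such as Apriori, and the threshold defaults are arbitrary.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) rules for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]  # P(y in basket | x in basket)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


baskets = [
    {"camera", "memory card"},
    {"camera", "memory card", "tripod"},
    {"camera"},
    {"tripod"},
]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.9))
# [('memory card', 'camera', 0.5, 1.0)]
```

Here every basket with a memory card also contains a camera, while the reverse rule (camera to memory card) holds only two times out of three and falls below the confidence threshold.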
A discussion of these data mining solutions is beyond the scope of this paper.
Data
mining is the key to customer knowledge and intimacy in this type of
competitive and crowded marketplace. In hyper-competitive markets, the
strategic use of customer information is critical to survival. In a networked
electronic environment, the margins and profits go to the fast, responsive
players who are able to leverage predictive models to anticipate customer
behavior and preferences.
The techniques described above for gathering Web data have several limitations. Some of the crucial and challenging issues that must be addressed are as follows:
A conceptual model is put forward here that attempts to address these limitations.

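This model essentially involves four layers: the User layer, the Data Acquisition layer, the Server layer, and a final layer in which the stored Web data are mined.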
The User layer is the GUI presented by any standard browser. A retail website generally contains banner advertisements, hyperlinks to the product of choice, hyperlinks to other genres of products, hyperlinks that provide an "add to shopping cart" function, and hyperlinks that provide more information about, or an illustration of, the displayed product.
The other links are those provided by the standard browser. These include the click buttons, i.e., Next, Back, Stop, Refresh, Home, etc. The same click-button functions can also be invoked with shortcut keys. A particular sub-page of a website can be reached by typing its address. A page can also offer its own explicit Previous and Next buttons.
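For the later layers to use these interactions, each one has to be recorded as a distinct kind of event. A minimal sketch, assuming the browser side reports each action as a string label, is an enumeration that mirrors the interaction types listed above; the labels and the `action_profile` helper are hypothetical names.

```python
from collections import Counter
from enum import Enum


class UserAction(Enum):
    # Links on a retail page
    BANNER_AD = "banner_ad"
    PRODUCT_LINK = "product_link"
    OTHER_GENRE_LINK = "other_genre_link"
    ADD_TO_CART = "add_to_cart"
    PRODUCT_INFO = "product_info"
    # Browser-provided controls
    BROWSER_BUTTON = "browser_button"  # Next, Back, Stop, Refresh, Home
    SHORTCUT_KEY = "shortcut_key"      # keyboard equivalents of those buttons
    TYPED_ADDRESS = "typed_address"    # sub-page reached by typing its address
    PAGE_PREV_NEXT = "page_prev_next"  # explicit Previous/Next buttons on the page


def action_profile(raw_events):
    """Count one session's events by action type.

    raw_events: iterable of string labels; an unknown label raises ValueError.
    """
    return Counter(UserAction(label) for label in raw_events)
```

A profile dominated by product links and add-to-cart actions describes a different visit from one dominated by Back and Refresh, which is the kind of distinction the Data Acquisition layer is meant to capture.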
This layer (the Data Acquisition layer) is the most critical layer of this conceptual model. It performs the essential task of gathering the Web data on which the other layers depend. This layer resides in a system called Scenario.
Scenario is a system responsible for understanding and interpreting the digital behavior of the customer. Scenario is component-based, and each component handles part of the data acquisition. The two components are:
7.2.1.1 Active Media Environment Information (AcME)
This component concentrates on the physical interface side of the customer's browsing. It is in turn composed of Active Window Information (AWI) and Visitor Entry Information (VNI).
7.2.1.1.1 Active Window Information (AWI)
The AWI sub-component contains the following information:
The active window is the window the customer is currently using; any window in which activity is taking place is considered the active window. A passive window is one that the customer is momentarily not using. Every window switches from the active to the passive state and vice versa as the customer moves between windows.
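The active/passive definitions amount to a small state machine: at any moment at most one window is active and the rest are passive. The sketch below assumes the browser reports a focus event with a window identifier and a timestamp; `ActiveWindowTracker` and its methods are hypothetical names, not part of any existing library or of Scenario's actual design.

```python
from enum import Enum


class WindowState(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


class ActiveWindowTracker:
    """Track which window is active and log every state change."""

    def __init__(self):
        self.states = {}       # window_id -> WindowState
        self.transitions = []  # (time, window_id, new_state)

    def _set(self, window_id, state, t):
        if self.states.get(window_id) != state:
            self.states[window_id] = state
            self.transitions.append((t, window_id, state))

    def focus(self, window_id, t):
        """Make window_id active; the previously active window becomes passive."""
        for other, state in list(self.states.items()):
            if other != window_id and state is WindowState.ACTIVE:
                self._set(other, WindowState.PASSIVE, t)
        self._set(window_id, WindowState.ACTIVE, t)

    def active_window(self):
        for window_id, state in self.states.items():
            if state is WindowState.ACTIVE:
                return window_id
        return None
```

The `transitions` log is the raw material for separating active from passive time when the visit duration is computed.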
7.2.1.1.2 Visitor Entry Information (VNI)
The time between a consumer's entry to and exit from a Web site is defined as the visit duration (or the session duration, if only one site was visited during the session). Separate data entries are to be recorded for active and passive windows.
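One way to record separate entries for active and passive windows is to split the visit duration by focus changes. The sketch below assumes times in seconds and a chronologically ordered list of focus events, and it treats each window as opened at its first focus and kept open until the visit ends; those are simplifying assumptions, and `visit_durations` is a hypothetical helper.

```python
def visit_durations(entry_time, exit_time, focus_events):
    """Split one visit into per-window active and passive time.

    focus_events: chronologically ordered list of (time, window_id) pairs,
    each marking the moment window_id becomes the active window.
    Returns (visit_duration, active_time, passive_time); the last two map
    window IDs to seconds.
    """
    visit_duration = exit_time - entry_time
    active_time = {}
    opened_at = {}
    for t, window in focus_events:
        opened_at.setdefault(window, t)  # first focus = assumed opening time
        active_time.setdefault(window, 0)
    # Each window stays active until the next focus event (or the exit).
    boundaries = focus_events + [(exit_time, None)]
    for (start, window), (end, _) in zip(boundaries, boundaries[1:]):
        active_time[window] += end - start
    passive_time = {
        window: (exit_time - opened_at[window]) - active_time[window]
        for window in active_time
    }
    return visit_duration, active_time, passive_time
```

For a 100-second visit in which window A is focused at 0, window B at 30, and A again at 70, A is active for 60 seconds and passive for 40, while B is active for 40 and passive for 30.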
7.2.1.2 Customer Navigation Information (CuNa)
Access
logs record customer navigation at a Web site in terms of the following
[3]:
Scenario collects the data described above and passes it to the next layer, the Server layer. The Server layer stores the information after an Error Checking and Correction (ECC) cycle, which ensures that the stored data are neither redundant nor invalid. Hence there is a close correlation between the Data Acquisition and Server layers. This association becomes clearer when we discuss the Informer Agent.
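The paper does not specify how the ECC cycle works; the sketch below is one hypothetical reading in which "correction" trims stray whitespace, "checking" rejects records with missing fields or impossible values, and redundancy is detected by a (visitor, URL, time) key. The record fields are assumptions.

```python
REQUIRED = ("visitor", "url", "time")  # hypothetical required fields


def ecc_cycle(records, already_stored):
    """Hypothetical Error Checking and Correction pass before storage.

    records: dicts with hypothetical keys 'visitor', 'url', 'time', 'dwell'.
    already_stored: set of keys of records stored earlier (updated in place).
    Returns the records that are safe to store.
    """
    accepted = []
    for record in records:
        # Correction: trim stray whitespace from string fields.
        fixed = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        # Checking: reject records with missing fields or impossible values.
        if any(fixed.get(k) in (None, "") for k in REQUIRED):
            continue
        if fixed.get("dwell", 0) < 0:
            continue
        # Redundancy: skip repeats of a visitor requesting the same URL at the same time.
        key = (fixed["visitor"], fixed["url"], fixed["time"])
        if key in already_stored:
            continue
        already_stored.add(key)
        accepted.append(fixed)
    return accepted
```

Correcting before checking lets a record that differs from an earlier one only in whitespace be recognized as a duplicate.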
"Informer Agent" is an author-defined term, defined as follows:
A software system that recognizes the psychological behavior demonstrated by a customer using a website, and that resides between the Server layer and the Data Acquisition layer.
The Informer Agent is based on the principles of Human-Computer Interaction. Its sole component is the Flow Determination Information.
It is important to understand a customer not only in terms of his digital behavior but also in terms of his psychological behavior. Hence the concept of Flow must be understood [4].
Let us first define Flow. Flow is "a holistic sensation where one acts with total involvement, with a narrowing of focus of attention." Here are some definitions of Flow by different authors:
"The
flow experience begins only when challenges and skills are above a certain
level, and are in balance." [5]
"When
both challenges and skills are high, the person is not only enjoying the
moment, but is also stretching his or her capabilities with the likelihood of
learning new skills and increasing self-esteem and personal complexity. This
process of optimal experience has been called flow." [6]
"The two key characteristics of flow are (a) total concentration in an activity and (b) the enjoyment which one derives from an activity...There is an optimum level of challenge relative to a certain skill level. ...A second factor affecting the experience of flow is a sense of control over one's environment." [7]
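The condition quoted from [5], that flow begins only when challenges and skills are above a certain level and in balance, can be written as a simple predicate. The 0-1 scales and the `level` and `tolerance` cut-offs below are hypothetical placeholders, not values taken from the cited work.

```python
def in_flow(challenge, skill, level=0.5, tolerance=0.15):
    """Rough encoding of the quoted condition for flow.

    challenge, skill: scores on a hypothetical 0-1 scale.
    level: minimum both scores must reach (hypothetical cut-off).
    tolerance: largest gap still counted as "in balance" (hypothetical).
    """
    above_level = challenge >= level and skill >= level
    balanced = abs(challenge - skill) <= tolerance
    return above_level and balanced
```

A customer with high, matched challenge and skill scores (0.8 and 0.75) counts as in flow; matched but low scores (0.2 and 0.2), or a high challenge paired with a much lower skill (0.9 and 0.55), do not.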
The Informer Agent first determines how a customer interacts with the website interface. It then draws a few conclusions based on the study of Human-Computer Interaction and derives numerical values that are finally stored as Web data.
These are some of the components of
Flow Determination Information [4]:
Skills Factor: This component checks the computer and navigation skills the customer possesses. This helps the company recognize whether the customer is a first-timer or an experienced shopper.
Challenges Factor: This component determines how motivated the customer is in buying or appreciating the product.
Focused Attention Factor: This is one of the important factors that helps the company establish the relationship between the customer's level of concentration and his inclination to buy the product. It depends purely on the customer's navigation pattern, along with how long he pauses on each Web page he encounters during his session.
Curiosity Factor: This component captures a customer's inquisitive behavior in exploring a particular product or a range of products.
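As a sketch of how such factors might be turned into the numerical values that are stored as Web data, the function below derives two stand-ins from one session: the share of dwell time spent on product pages (for the Focused Attention Factor) and the number of distinct product pages viewed (for the Curiosity Factor). Both formulas and the `/products/` URL prefix are hypothetical choices for illustration, not the paper's definitions.

```python
def flow_indicators(page_views, product_prefix="/products/"):
    """Derive two hypothetical numeric indicators from one session.

    page_views: ordered (url, dwell_seconds) pairs.
    product_prefix: hypothetical URL prefix that marks product pages.
    """
    total = sum(dwell for _, dwell in page_views)
    product_views = [(url, dwell) for url, dwell in page_views
                     if url.startswith(product_prefix)]
    product_time = sum(dwell for _, dwell in product_views)
    return {
        # Share of the session's dwell time spent on product pages.
        "focused_attention": product_time / total if total else 0.0,
        # Number of distinct product pages the customer explored.
        "curiosity": len({url for url, _ in product_views}),
    }
```

For a session of 120 seconds in which 110 seconds are spent across two distinct product pages, the sketch yields a focused-attention value of about 0.92 and a curiosity value of 2.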
This is the final layer, in which all the Web data collected in the data repository are mined for interesting patterns. A discussion of the various data mining techniques is beyond the scope of this research paper.

Fig. 5: Anatomy of the Multiple Layered Conceptual Model
The concepts presented in the multi-layered conceptual model give a theoretical backdrop to the whole subject of Web data collection. There is no discussion of implementation details such as the type of data repository or the choice of a specific programming language. These issues are designer-dependent; this paper puts forth only the finer points involved in this practice of Web mining. The main open question in realizing the proposed model is whether such a mechanism can operate in real time.
[1] Mena, Jesus. "Mining E-Customer Behavior." DB2 Magazine, Winter 1999. Accessed 02 Mar. 2001. <http://www.db2mag.com/db_area/archives/1999/q4/>
[2] Mena, Jesus. "Bringing Them Back." Intelligent Enterprise, 17 July 2000, Vol. 3, No. 2. Accessed 02 Mar. 2001. <http://www.intelligententerprise.com/000717/index.shtml>
[3] Chatterjee, Patrali, Donna L. Hoffman, and Thomas P. Novak. "Modeling the Clickstream: Implications for Web-Based Advertising Efforts." Elab Research Manuscripts, 1998. Accessed 07 Feb. 2001. <http://www.elabweb.com/papers/clickstream/clickstream.html>
[4] Hoffman, Donna L., and Thomas P. Novak. "Measuring the Flow Experience Among Web Users." Paper presented at Interval Research Corporation, 31 July 1997. Accessed 12 Mar. 2001. <http://www.elabweb.com/novak/flow.july.1997/flow.htm>
[5] Csikszentmihalyi, Mihaly, and Isabella Csikszentmihalyi. "Introduction to Part IV." In Optimal Experience: Psychological Studies of Flow in Consciousness, edited by Mihaly Csikszentmihalyi and Isabella Selega Csikszentmihalyi, 260. Cambridge: Cambridge University Press, 1988.
[6] Csikszentmihalyi, Mihaly, and Judith LeFevre. "Optimal Experience in Work and Leisure." Journal of Personality and Social Psychology 56(5), 1989: 815-822.
[7] Ghani, Jawaid A., and Satish P. Deshpande. "Task Characteristics and the Experience of Optimal Flow in Human-Computer Interaction." The Journal of Psychology 128(4), 1994: 381-391.