Innovative multi-layered conceptual model used for Customer Behavior Analysis in E-Business environment
Department of Computer Engineering,
Manipal Institute of Technology, Manipal.
This research paper introduces an innovative multi-layered conceptual model for customer behavior analysis in an E-Business environment. It reviews the technology currently used for Web data collection and explains how the proposed model improves on it in several respects. Existing data collection techniques have several limitations; the model addresses some of them to make data acquisition from the Web more resourceful and accurate.
A company can gather a large amount of data at its Web site. The data collected by aggregating each point of contact a company has with its customers can later be analyzed and mined. The Web provides companies with an unprecedented opportunity to analyze customer behavior and preferences. Every visit to a Web site generates important customer behavioral data, regardless of whether or not a sale is made. Every visitor action is a digital gesture that reveals three things about the visitor: habits, preferences and tendencies.
These interactions reveal important trends and patterns that can help a company design a Web site that effectively communicates and markets its products and services. Companies can aggregate, enhance, and mine Web data to learn what sells, what works and what doesn't, and who is or isn't buying.
Table 1: Web data utilization in large U.S. corporations
Web Data Applications
Don't Use Web Data
However, according to a recent survey by Forrester Research, few companies are listening: Of 50 of the largest U.S. corporations, only 18 percent are using their Web data (see Table 1, above).
There are two main factors on which companies are not concentrating. They are as follows:
Fig. 2: Phases in Web Mining
Let's take a look at how to collect data on visitors. The four main sources are log files, cookies, forms, and clickstream data.
4.1 Log Files
Server log files provide domain types, time of access, keywords, and search engines used by visitors. Table 2 illustrates the amount of information gathered in a log file. The referer section of a log file provides valuable information about where visitors are coming from. It can tell you what your visitors were looking for when they came to your site by identifying the keywords they used in their search (assuming they found you through a search engine) and which search engine or banner ad referred them.
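As an illustration of how the referer field can be put to use, the following minimal Python sketch extracts the referring host and the search keywords from a referer URL. The query-parameter names (q, query, search), the function name referer_keywords, and the example URL are assumptions made for illustration; actual search engines encode keywords in different ways.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed names of the query parameter that carries the search phrase.
# Real search engines differ, so this tuple would need to be extended.
KEYWORD_PARAMS = ("q", "query", "search")


def referer_keywords(referer):
    """Return (referring host, list of search keywords) for a referer URL."""
    parts = urlsplit(referer)
    params = parse_qs(parts.query)  # decodes '+' and %XX escapes
    for name in KEYWORD_PARAMS:
        if name in params:
            return parts.netloc, params[name][0].split()
    # No recognizable search phrase, e.g. a banner ad or a plain link.
    return parts.netloc, []


# Hypothetical referer value taken from a log entry.
host, words = referer_keywords("http://search.example.com/find?q=digital+camera")
print(host, words)
```

A referer with no recognizable search parameter still yields the referring host, which is enough to tell a banner ad referral from a search engine referral.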
Table 2: Information included in a typical log file
Anatomy of A Log File
Cookies dispensed from the server can track browser visits and pages viewed, and can provide some insight into how often a visitor has been to your site and what sections they wander into. Cookies are special HTTP headers that servers pass to a browser.
They reside in small text files on the visitor's hard disk. You can find the cookie value in the last field of the extended log format file. A retail Web site can issue cookies to:
Cookies are standard components for tracking customer activity in most e-commerce sites. They are used as counters and unique identification values that tell retailers who is a first-time visitor and where returning visitors have been within a site.
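The use of cookies as counters and unique identification values can be sketched as follows. This is a minimal illustration using Python's standard http.cookies module; the cookie names visitor_id and visits and the function issue_visitor_cookie are hypothetical and do not come from any particular e-commerce platform.

```python
import uuid
from http.cookies import SimpleCookie


def issue_visitor_cookie(cookie_header):
    """Read the browser's Cookie header and prepare updated cookies.

    Returns (visitor_id, visit_count, is_first_visit, set_cookie_values).
    """
    incoming = SimpleCookie(cookie_header or "")
    first_visit = "visitor_id" not in incoming
    # Unique identification value: reuse the existing one or create a new one.
    visitor_id = uuid.uuid4().hex if first_visit else incoming["visitor_id"].value
    # Counter: increment on every visit; restart at 1 if missing or malformed.
    try:
        visits = int(incoming["visits"].value) + 1 if "visits" in incoming else 1
    except ValueError:
        visits = 1
    outgoing = SimpleCookie()
    outgoing["visitor_id"] = visitor_id
    outgoing["visits"] = str(visits)
    # One string per cookie, suitable as the value of a Set-Cookie header.
    set_cookie_values = [morsel.OutputString() for morsel in outgoing.values()]
    return visitor_id, visits, first_visit, set_cookie_values


# A returning visitor presenting previously issued cookies.
print(issue_visitor_cookie("visitor_id=abc123; visits=4"))
```

A browser that sends no cookies is treated as a first-time visitor and receives a freshly generated identifier.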
By far the most effective method of gathering Web site visitor and customer information is via registration and purchase forms (see Figure 3, below). Forms can provide important personal information about visitors, such as gender, age, and ZIP code. Form submissions can launch a CGI program that returns a response to the Web site visitor.
Forms are simple browser-to-server mechanisms that can lead to a complex array of customer interaction from which relationships can evolve. These customer relationships can evolve into direct feedback systems through which customers can communicate with a retailer and servers can continue to gather information from browsers.
Fig. 3: A Web registration form for collecting visitor information
Illustration Source: www.excite.com
Using CGI forms, you can create either relational tables or comma-delimited flat files recording the entries from your forms. These customer-provided information files can be analyzed directly or imported into a relational database.
The database engine not only makes data management easier, but it also handles issues such as integrity, security, backup, and restoration.
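The path from form entries to a comma-delimited flat file and then into a relational database can be sketched as below. This is only an illustration under stated assumptions: the field names (gender, age, zip_code), the table name registrations, and the sample entries are hypothetical, and an in-memory SQLite database stands in for whatever database engine a site actually uses.

```python
import csv
import io
import sqlite3

FIELDS = ["gender", "age", "zip_code"]  # hypothetical registration form fields


def save_as_flat_file(entries, stream):
    """Write form entries as a comma-delimited flat file with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def import_into_database(stream, connection):
    """Load the flat file into a relational table; return the number of rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS registrations "
        "(gender TEXT, age INTEGER, zip_code TEXT)"
    )
    rows = [(r["gender"], int(r["age"]), r["zip_code"]) for r in csv.DictReader(stream)]
    connection.executemany("INSERT INTO registrations VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Made-up entries standing in for submitted registration forms.
entries = [
    {"gender": "F", "age": 34, "zip_code": "94105"},
    {"gender": "M", "age": 27, "zip_code": "10001"},
]
flat_file = io.StringIO()  # stands in for a file on disk
save_as_flat_file(entries, flat_file)
flat_file.seek(0)
db = sqlite3.connect(":memory:")
print(import_into_database(flat_file, db), "rows imported")
```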
Clickstream data allow us to investigate how customers respond to advertising over time at an individual level. The clickstream represents a new source of customer response data detailing the content and banner ads that customers click on during the online navigation process.
Behavioral "clickstream" data of customer navigation from Web server access log files is used in this technique. The main strengths of using server access log data are:
· They recreate behavior in the actual media environment
· They are collected unobtrusively and based on observed behavior, rather than self reports
· They are free from confounds of researcher interaction
· Time pattern and order of activity are recorded
· Longitudinal data on a census of customers (and not just a sample) is obtained
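How the time pattern and order of activity can be recovered from an access log is sketched below. The sketch assumes entries in the Common Log Format and, as a simplification, treats the requesting host as the visitor identifier; the sample log lines and the function name clickstreams are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: host ident user [time] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def clickstreams(log_lines):
    """Group requests by host and return each host's pages in time order."""
    hits = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        hits[match["host"]].append((when, match["path"]))
    return {host: [path for _, path in sorted(entries)] for host, entries in hits.items()}


# Made-up log entries, deliberately out of order.
SAMPLE_LOG = [
    '10.0.0.1 - - [02/Mar/2001:10:00:05 +0000] "GET /cart HTTP/1.0" 200 512',
    '10.0.0.1 - - [02/Mar/2001:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 2048',
    '10.0.0.2 - - [02/Mar/2001:10:00:02 +0000] "GET /index.html HTTP/1.0" 200 2048',
]
print(clickstreams(SAMPLE_LOG))
```

Identifying visitors by host alone is a rough approximation; a cookie value recorded in the log, where available, would be a better key.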
5. Enhancing Your Web Data
The success of any Web-mining project largely depends on the quality and depth of its data. A common methodology in data warehousing is to leverage the value of internal customer information by appending external demographic and behavioral data. Similarly, you can append a variety of demographics to the information you capture from your registration and purchase forms.
Obtaining a cohesive and comprehensive view of customers involves not only using powerful data mining technologies; it also requires enhancing internal transactional data with this external customer information, which describes the tendencies and values of customers in detail.
Access to different types of customer demographics, coupled with data mining technology, can significantly boost customer relationship management in ways that directly affect online visitor acquisition and retention.
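Appending external demographics to form data can be pictured as a simple keyed lookup. The sketch below assumes the ZIP code is the joining key; the field names, the function append_demographics, and all values are hypothetical and purely illustrative, not real demographic data.

```python
def append_demographics(registrations, demographics_by_zip):
    """Return copies of the registration records enriched with external data."""
    enhanced = []
    for record in registrations:
        # Records whose ZIP code has no external match are kept unchanged.
        extra = demographics_by_zip.get(record["zip_code"], {})
        enhanced.append({**record, **extra})
    return enhanced


# Internal data captured from a registration form (made-up values).
registrations = [
    {"visitor_id": "abc123", "zip_code": "94105"},
    {"visitor_id": "def456", "zip_code": "00000"},
]
# External demographic data keyed by ZIP code (made-up values).
demographics_by_zip = {"94105": {"household_type": "urban", "income_band": "high"}}

for row in append_demographics(registrations, demographics_by_zip):
    print(row)
```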
Data mining algorithms can search for relationships in Web data to determine if patterns exist that can yield actionable business and marketing intelligence. Data mining solutions come in many types, such as association, segmentation, clustering, classification (prediction), and visualization:
Association: uses affinity market basket analysis to determine which products tend to sell together.
Segmentation: determines distinguishing features of your most profitable customers.
Clustering: profiles customers to identify the characteristics of your visitors.
Classification/Prediction: anticipates customer behavior to discover who is likely to make multiple purchases.
Visualization: views distributions and relationships to reveal what your visitors are purchasing.
Discussion on any of the above-mentioned data-mining solutions is beyond the scope of this paper.
Data mining is the key to customer knowledge and intimacy in this type of competitive and crowded marketplace. In hyper-competitive markets, the strategic use of customer information is critical to survival. In a networked electronic environment, the margins and profits go to the fast, responsive players who are able to leverage predictive models to anticipate customer behavior and preferences.
The above-mentioned techniques for gathering Web data have several limitations. Some of the crucial and challenging issues that have to be dealt with are as follows:
Here a conceptual model is put forward that endeavors to address the above limitations.
There are essentially four layers involved in this model.
The user layer is the GUI presented by any standard browser. A retailing Web site generally contains banner advertisements, hyperlinks to the product of choice, hyperlinks to other genres of products, hyperlinks that provide an “add to shopping cart” function, and hyperlinks that provide more information or an illustration of the displayed product.
The other links are the ones provided by the standard browser itself. These include the click-button links, i.e., the Next, Back, Stop, Refresh, and Home buttons. The same click-button features can also be invoked using shortcut keys. A particular sub-page of a Web site can be reached by typing the page address. There can also be explicit buttons on the Web page itself that perform the Previous and Next functions.
The data acquisition layer is the most critical layer of this conceptual model. It performs the imperative task of gathering the Web data on which the rest of the layers depend. This layer resides in a system called Scenario.
Scenario is a system that is responsible for understanding and interpreting the digital behavior of the customer. Scenario is component-based, and each component involves the acquisition of data. The two components are:
Active Media Environment Information (AcME)
This component concentrates on the physical interface part of the customer browsing. This is in turn composed of Active Window Information (AWI) and Visitor Entry Information (VNI).
Active Window Information (AWI)
The sub-component of AWI contains the following information:
The active window is the window the customer is presently using; any window in which activity is taking place is considered active. A passive window is one that the customer is momentarily not using. Every window can change its state from active to passive and vice versa.
Visitor Entry Information (VNI)
The time between the consumer's entry to and exit from a Web site is defined as the visit duration (or session duration if only one site was visited during the session). Separate data entries are to be recorded for active and passive windows.
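A minimal sketch of how visit duration could be split into active and passive window time is given below. It assumes that the window state changes are available as a time-ordered list of (seconds since entry, state) pairs; this event format and the function name window_times are assumptions made for illustration, since the model does not prescribe how the states are captured.

```python
def window_times(events, exit_time):
    """Total the seconds spent in 'active' and 'passive' window states.

    events: time-ordered list of (timestamp_seconds, state) pairs, where each
    pair marks the moment the window entered that state.
    exit_time: timestamp at which the visitor left the Web site.
    """
    totals = {"active": 0.0, "passive": 0.0}
    # Pair every state change with the next one; the last state ends at exit.
    following = events[1:] + [(exit_time, None)]
    for (start, state), (end, _) in zip(events, following):
        totals[state] += end - start
    return totals


# Made-up session: entry at t=0, window loses focus at t=40, regains it at
# t=100, and the visitor exits at t=130.
events = [(0, "active"), (40, "passive"), (100, "active")]
totals = window_times(events, exit_time=130)
visit_duration = 130 - events[0][0]
print(totals, visit_duration)
```

The active and passive totals add up to the visit duration, so both separate entries and the overall duration can be stored from the same events.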
Customer Navigation Information (CuNa)
Access logs record customer navigation at a Web site in terms of the following:
Scenario gathers the above-mentioned data and conveys it to the next layer, the server layer. The server layer stores the information after an Error Checking and Correction Cycle (ECC). This cycle ensures that the stored data is neither redundant nor invalid. Hence there is a close association between the data acquisition layer and the server layer. This association will be confirmed when we discuss the Informer Agent.
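The paper does not specify how the ECC is carried out. As one possible reading, the sketch below filters incoming records so that redundant (duplicate) and invalid entries are not stored. The record fields (visitor_id, page, timestamp), the validity rule, and the function name are all assumptions, and the sketch only rejects bad records rather than attempting to correct them.

```python
def error_check_and_correct(records):
    """Return only the records that are valid and not already seen."""
    seen = set()
    accepted = []
    for record in records:
        key = (record.get("visitor_id"), record.get("page"), record.get("timestamp"))
        # Invalid: a required field is missing.
        if None in key:
            continue
        # Invalid: the timestamp is not a non-negative number.
        timestamp = record["timestamp"]
        if not isinstance(timestamp, (int, float)) or timestamp < 0:
            continue
        # Redundant: an identical entry has already been accepted.
        if key in seen:
            continue
        seen.add(key)
        accepted.append(record)
    return accepted


# Made-up records arriving from the data acquisition layer.
incoming = [
    {"visitor_id": "abc123", "page": "/index.html", "timestamp": 0},
    {"visitor_id": "abc123", "page": "/index.html", "timestamp": 0},  # duplicate
    {"visitor_id": "abc123", "page": "/cart", "timestamp": -5},       # invalid
    {"visitor_id": "abc123", "page": "/cart"},                        # incomplete
    {"visitor_id": "abc123", "page": "/cart", "timestamp": 42},
]
print(error_check_and_correct(incoming))
```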
The Informer Agent is an author-defined term, defined as follows:
A software system that recognizes the psychological behavior demonstrated by a customer using a Web site; it resides between the server layer and the data acquisition layer.
The Informer Agent is based on the principle of Human Computer Interaction. The sole component of the Informer Agent is the Flow determination Information.
It is important that we understand a customer not only in terms of his digital behavior but also in terms of his psychological behavior. Hence the concept of Flow needs to be understood.
Let us first define Flow: it is "a holistic sensation where one acts with total involvement, with a narrowing of focus of attention." Here are some definitions of Flow according to different authors:
"The flow experience begins only when challenges and skills are above a certain level, and are in balance." 
"When both challenges and skills are high, the person is not only enjoying the moment, but is also stretching his or her capabilities with the likelihood of learning new skills and increasing self-esteem and personal complexity. This process of optimal experience has been called flow." 
"The two key characteristics of flow are (a) total concentration in an activity and (b) the enjoyment which one derives from an activity...There is an optimum level of challenge relative to a certain skill level. ...A second factor affecting the experience of flow is a sense of control over one's environment." 
This Informer Agent will first determine the association of a customer with the Web site interface. It will draw a few conclusions based on the study of Human Computer Interaction and will then derive numerical values that are finally stored as Web data.
These are some of the components of Flow Determination Information:
Skills Factor: This component will check the computer and navigation skills the customer possesses. This will help the company recognize whether the customer is a first-timer or an experienced shopper.
Challenges Factor: This component will determine how motivated the customer is in buying or appreciating the product.
Focused Attention Factor: This component is one of the important factors that will help the company establish a relation between the customer’s level of concentration and his inclination to buy the product. It depends purely on the customer’s navigation pattern along with his pause on each Web page he encounters during his session.
Curiosity Factor: This component will project a customer’s inquisitive behavior in understanding a particular or a range of products.
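The model does not give formulas for these factors. Purely as an illustration of how a numerical value might be derived for the Focused Attention Factor, the sketch below computes the pause on each page from the navigation pattern and scores a session as the fraction of pages with a pause of at least a threshold. The scoring rule, the 10-second threshold, the function names, and the sample session are all assumptions, not part of the proposed model.

```python
def page_pauses(page_views, exit_time):
    """Return (page, seconds spent) for each page in a time-ordered session.

    page_views: list of (timestamp_seconds, page) pairs in navigation order.
    """
    following = page_views[1:] + [(exit_time, None)]
    return [(page, end - start) for (start, page), (end, _) in zip(page_views, following)]


def focused_attention_score(pauses, min_pause=10.0):
    """Fraction of pages on which the customer paused at least min_pause seconds."""
    if not pauses:
        return 0.0
    attentive = sum(1 for _, seconds in pauses if seconds >= min_pause)
    return attentive / len(pauses)


# Made-up session: four page views, exit at t=140 seconds.
views = [(0, "/home"), (5, "/cameras"), (65, "/cameras/x100"), (125, "/cart")]
pauses = page_pauses(views, exit_time=140)
print(pauses, focused_attention_score(pauses))
```

Other factors, such as skills or curiosity, would need their own hypothetical measures; the point here is only that navigation timestamps are enough to produce a storable number.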
This is the final layer, where all the Web data collected in the data repository is mined for interesting patterns. The discussion of the various data mining techniques is beyond the scope of this research paper.
Fig. 5: Anatomy of the Multiple Layered Conceptual Model
The concepts presented in the multiple layered conceptual model give a theoretical backdrop to the whole subject of Web data collection. There is no discussion of implementation specifics such as the type of data repository or the use of any particular programming language. These issues are designer-dependent, and this paper only puts forth the finer points involved in the practice of Web mining. The one concern that needs further insight in realizing the proposed model is whether such a mechanism is feasible in real time.
1 Mena Jesus. "Mining E-Customer Behavior" DB2 Magazine. Winter 1999. 02 Mar. 2001. <http://www.db2mag.com/db_area/archives/1999/q4/>
2 Mena Jesus. "Bringing them back" Intelligent Enterprise. 17 July 2000. Vol.3 No.2. 02 Mar. 2001. <http://www.intelligententerprise.com/000717/index.shtml>
3 Chatterjee Patrali, Hoffman Donna L., Novak Thomas P. “Modeling the Clickstream: Implications for Web-Based Advertising Efforts” Elab Research Manuscripts. 1998. 07 Feb. 2001. <http://www.elabweb.com/papers/clickstream/clickstream.html>
4 Hoffman Donna L., Novak Thomas P. “Measuring the Flow Experience Among Web Users” Paper Presented at Interval Research Corporation, Jul. 31, 1997. 12 Mar. 2001.
5 Csikszentmihalyi, Mihaly and Isabella Csikszentmihalyi (1988), "Introduction to Part IV" in Optimal Experience: Psychological Studies of Flow in Consciousness, Mihaly Csikszentmihalyi and Isabella Selega Csikszentmihalyi, eds., Cambridge, Cambridge University Press, 260.
6 Csikszentmihalyi, Mihaly and Judith LeFevre (1989), "Optimal Experience in Work and Leisure," Journal of Personality and Social Psychology, 56 (5), 815-822.
7 Ghani, Jawaid A. and Deshpande Satish P. (1994), "Task Characteristics and the Experience of Optimal Flow in Human-Computer Interaction," The Journal of Psychology, 128(4), 381-391.