Friday, February 8, 2019
Thursday, February 7, 2019
Market Basket Analysis
A few terms:
Transaction: a set of items (an itemset).
Confidence: a measure of the certainty or trustworthiness associated with each discovered pattern.
Support: a measure of how often the items in an association occur together, as a percentage of all transactions.
Frequent itemset: an itemset that satisfies the minimum support threshold.
Strong association rules: rules that satisfy both a minimum support threshold and a minimum confidence threshold.
In association rule mining, we first find all frequent itemsets and then generate strong association rules from those frequent itemsets.
The Apriori algorithm is the most established algorithm for mining frequent itemsets. The basic principle of Apriori is “Any subset of a frequent itemset must be frequent”. We use these frequent itemsets to generate association rules.
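To make the Apriori principle concrete, here is a minimal sketch in Python. It is not a tuned implementation: the function name `apriori`, the `min_support` parameter, and the sample baskets are assumptions made up for illustration. Candidates of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned before its support is counted.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (illustrative sketch)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


# Hypothetical baskets, used only to exercise the function.
baskets = [
    {"shirt", "trousers", "tie"},
    {"trousers"},
    {"shirt", "tie"},
    {"shirt", "tie"},
    {"shirt", "trousers"},
]
frequent_itemsets = apriori(baskets, min_support=0.6)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a 60% minimum support, the only pair that survives in these baskets is shirt and tie. No three-item candidate is generated at all, because the other pairs are already infrequent.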
Association rule mining
Finding frequent patterns, associations, correlations, or causal structures among sets of items in transaction databases.
Understand customer buying habits by finding associations and correlations between the different items that customers place in their “shopping basket”.
Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, web log analysis, fraud detection (supervisor -> examiner).
Rule form
Antecedent → Consequent [support, confidence]
(support and confidence are user-defined measures of interestingness)
Let the rule discovered be {Jamun, ...} → {Potato Chips}
Potato Chips as the consequent => can be used to determine what should be done to boost its sales.
Jamun in the antecedent => can be used to see which products would be affected if the store stops selling Jamun.
Jamun in the antecedent and Potato Chips in the consequent => can be used to see what products should be sold with Jamun to promote the sale of Potato Chips.
Find all itemsets that have high support; these are known as frequent itemsets. Then generate association rules from the frequent itemsets.
Let us see an example: finding the support, confidence and lift of i) shirts and ties, ii) trousers and ties.
Transaction Id | Shirts | Trousers | Ties
001            | 1      | 1        | 1
002            | 0      | 1        | 0
003            | 1      | 0        | 1
004            | 1      | 0        | 1
005            | 1      | 1        | 0
For our data there are 3 transactions with both shirts and ties (shirts ∩ ties) out of a total of 5 transactions.
Support = 3/5 = 0.6 or 60%
Confidence for the association is calculated using the following formula:
Confidence = P(ties | shirts) = (transactions with both shirts and ties) / (transactions with shirts)
In our example, there are 3 transactions with both shirts and ties together out of 4 transactions with shirts. The calculation of confidence for our dataset is:
Confidence = 3/4 = 0.75 or 75%
A third useful metric for association analysis is lift; it is defined as:
Lift = confidence / expected confidence = P(ties | shirts) / P(ties)
Expected confidence in the above formula is the presence of ties in the overall dataset, i.e. there are 3 transactions with a tie purchase out of 5.
Lift = (3/4) / (3/5) = 15/12 = 1.25 or 125%
The value for lift, 125%, shows that purchases of ties improve when customers buy shirts. The point to note here is whether a customer's chance of buying ties goes up if he buys a shirt, i.e. whether the value of lift is above 100%.
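Similarly, for trousers and ties:
Support = 1/5 = 0.2 or 20%
Confidence = 1/3 = 0.333 or 33.3%
Lift = (1/3) / (3/5) = 5/9 = 0.556 or 55.6%
A lift below 100% means that customers who buy trousers are less likely than average to buy ties.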
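The arithmetic above can be checked with a few lines of Python. This is just a sketch over the five transactions in the table; the helper names `support`, `confidence` and `lift` are my own, and `Fraction` is used so the results stay exact instead of being rounded floats.

```python
from fractions import Fraction

# The five transactions from the table, as sets of purchased items.
transactions = {
    "001": {"shirts", "trousers", "ties"},
    "002": {"trousers"},
    "003": {"shirts", "ties"},
    "004": {"shirts", "ties"},
    "005": {"shirts", "trousers"},
}


def support(items):
    """Share of all transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions.values() if items <= basket)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence divided by the expected confidence, i.e. support(consequent)."""
    return confidence(antecedent, consequent) / support(consequent)


for antecedent in ({"shirts"}, {"trousers"}):
    consequent = {"ties"}
    print(
        sorted(antecedent), "->", sorted(consequent),
        "support:", support(antecedent | consequent),
        "confidence:", confidence(antecedent, consequent),
        "lift:", lift(antecedent, consequent),
    )
```

For shirts → ties this gives support 3/5, confidence 3/4 and lift 5/4; for trousers → ties it gives 1/5, 1/3 and 5/9, matching the hand calculation.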
Customer Lifetime Value
Just like we use Net Present Value (NPV) to evaluate investments and companies, we use CLV to evaluate customer relationships. CLV is the expected NPV of the cash flows from a customer relationship. It is defined as the discounted sum of all future customer revenue streams minus product and servicing costs and remarketing costs.
Let us assume we are analyzing the customer lifetime value of a company which offers a service of some kind. The following are obtained:
[formula and variable definitions: image not preserved]
This is a simple model of calculation. The formula differs when the service is offered before the contribution from the customer (e.g. credit cards).
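As a rough illustration of the definition above (a discounted sum of revenue minus costs), here is a small Python sketch. It is a generic NPV-style calculation, not necessarily the exact formula used in this post. It assumes one cash flow at the end of each period and a constant discount rate, and the function name, parameters and sample numbers are all hypothetical.

```python
def customer_lifetime_value(revenues, costs, discount_rate):
    """Discounted sum of (revenue - cost) per period.

    Assumptions for this sketch: cash flows arrive at the end of each
    period, and `costs` already combines product, servicing and
    remarketing costs for that period.
    """
    if len(revenues) != len(costs):
        raise ValueError("revenues and costs must cover the same periods")
    clv = 0.0
    for period, (revenue, cost) in enumerate(zip(revenues, costs), start=1):
        net_cash_flow = revenue - cost
        clv += net_cash_flow / (1 + discount_rate) ** period
    return clv


# Hypothetical customer: three yearly periods, 10% discount rate.
yearly_revenue = [500.0, 500.0, 500.0]
yearly_cost = [200.0, 200.0, 200.0]
print(round(customer_lifetime_value(yearly_revenue, yearly_cost, 0.10), 2))
```

A model where the service is delivered before the customer pays would shift the timing of the cash flows, so the exponents in the discount factor would change.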
Segmentation of data - K-Means
- Demographic segmentation
In business, demographic segmentation is when an organization uses data about the demographic characteristics of its customers to better target and enhance its marketing efforts.
By segmenting a market by demographic variables such as the age of the customer, gender, income, education, religion, and family life cycle, business professionals are able to create groups of customers that display similar wants and needs.
Examples:
Age
Gender
Income
Education
Family life cycle
Other segmentations:
- Behavioral segmentation
- Psychographic segmentation
- Geographic segmentation
Illustration of the algorithm: choose two random centroids.
Step 3: [illustration not preserved]
Step 4: [illustration not preserved]
The clusters we expect and the clusters we actually get can be different. Getting good clusters can be accomplished by making the WCSS (within-cluster sum of squares) as small as possible. This metric is also used to determine the optimum number of clusters.
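Here is a minimal K-Means sketch in plain Python that also reports the WCSS, so the metric can be compared across different numbers of clusters. It makes several assumptions: points are 2-D tuples, the initial centroids are random data points drawn with a fixed seed, and the sample points are made up for illustration. A library implementation would normally be used in practice.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid_of(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, max_iterations=100, seed=0):
    """Return (centroids, clusters, wcss) for a list of point tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k random data points as starting centroids
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            centroid_of(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the assignments no longer change the centroids
        centroids = new_centroids
    # WCSS: squared distance of every point to its nearest centroid, summed.
    wcss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, clusters, wcss


# Hypothetical customers described by two numeric features.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
for k in (1, 2, 3):
    _, _, wcss = kmeans(points, k)
    print(k, round(wcss, 3))
```

Because the starting centroids are random, different seeds can give different clusters on harder data, which is one reason what we expect and what we get may differ. Plotting WCSS against k and looking for the point where the drop flattens out (the "elbow") is a common way to pick the number of clusters.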