
Thursday, April 26, 2018

What’s there in wine? Principal component analysis problem - Data analytics


What’s there in wine?
Which wine suits which customer segment, and what are that segment’s preferences?
The objective is to understand the mathematics and the data science behind the problem. The same model can be replicated for other, similar business problems.
Here is a classic problem for understanding PCA (principal component analysis). There are 178 records and 13 variables (the chemical components of each wine), and each record belongs to one of three customer categories.

Problem statement: identify the variables that contribute to the customer’s preference, find the directions (principal components) that carry the maximum variance, and visualize what the machine has learned.
The task is to classify wines by customer category, which reflects the customers’ taste. For each new wine, the model then predicts the customer segment to which it could be recommended.
PCA itself is an example of unsupervised learning: it finds the components from the variables alone, without being told the customer segments. The classification step that follows (logistic regression) is supervised, because it learns from the labelled training data.
Let’s dive deep


Importance of PCA
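            1. Reduces “n” variables to “m” new variables (the principal components), where m < n. Each component is a weighted combination of the original variables rather than one of them.
            2. The m components together explain most of the variance in the dataset.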
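The two points above can be checked directly. This minimal NumPy sketch (the function name `pca_by_eigen` is hypothetical) computes the principal components as the eigenvectors of the covariance matrix of the centred data. Each eigenvalue divided by the sum of all eigenvalues is the share of variance that component explains, which is the quantity sklearn reports as `explained_variance_ratio_`. The demo at the bottom assumes sklearn's bundled copy of the wine data.

```python
import numpy as np


def pca_by_eigen(X, m):
    """Project X (rows = records, columns = variables) onto its top m principal components.

    Returns (scores, ratio): the m new variables for every record, and the share of
    total variance explained by each of the n components (largest first).
    """
    X = np.asarray(X, dtype=float)
    # Centre each variable: PCA works on deviations from the mean.
    Xc = X - X.mean(axis=0)
    # n x n covariance matrix of the original variables.
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()              # share of variance per component
    scores = Xc @ eigvecs[:, :m]                 # weighted combinations of all n variables
    return scores, ratio


if __name__ == "__main__":
    from sklearn.datasets import load_wine
    from sklearn.preprocessing import StandardScaler

    X = StandardScaler().fit_transform(load_wine().data)
    scores, ratio = pca_by_eigen(X, m=2)
    print(scores.shape)                          # (178, 2)
    print("variance explained by the first two components:", ratio[:2].sum())
```

Because each score is a weighted mix of all the original variables, PCA is better described as building m new variables than as picking m of the old ones. The sign of each component is arbitrary, so sklearn's `PCA` may return the same scores with flipped signs.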
Now let us work out this problem in Python.
As a standard process:

  • Split the dataset into a training set and a test set: the model learns from the training set, and what it has learned is then applied to the test set to check the results.
  • Scale the data so that all variables are on a comparable scale; otherwise variables with large ranges dominate the variance. There are several ways to scale; here I have used standard scaling (zero mean, unit variance), available in Python as StandardScaler in sklearn.
  • Import PCA. Initially set the number of components to None so that all components are kept; after viewing how much variance each component explains, decide how many to keep.
  • Here two components are kept, the two that explain the most variance.
  • Once we have the top two components, we fit a logistic regression on them to check how effectively the customer segments are classified.
  • Let us look at the results in the confusion matrix.
  • The results are very good: the model predicted 0 as 0 on 14 occasions, 1 as 1 on 15 and 2 as 2 on 6, with only one misclassification, so 35 of the 36 test records (about 97%) were classified correctly.
  • Use matplotlib to visualize the results.
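The steps above can be sketched end to end with scikit-learn. This is a minimal sketch, not the author's exact script: it assumes sklearn's bundled copy of the wine data (`load_wine`, 178 records, 13 variables, classes 0/1/2) in place of the post's file, a 20% test split and `random_state=0`. The confusion-matrix counts it prints depend on those choices and need not match the ones quoted above. The matplotlib plot is left out to keep it short.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled wine data stands in for the post's CSV file.
X, y = load_wine(return_X_y=True)

# 1. Split: learn on the training set, evaluate on the held-out test set (20%, assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 2. Standard scaling: fit on the training set only, then apply the same scaling to both.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 3. First pass with n_components=None keeps every component, so we can inspect
#    how much variance each one explains before deciding how many to keep.
explained = PCA(n_components=None).fit(X_train).explained_variance_ratio_
print("explained variance ratio:", np.round(explained, 3))

# 4. Keep the two components that explain the most variance.
pca = PCA(n_components=2)
X_train_pc = pca.fit_transform(X_train)
X_test_pc = pca.transform(X_test)

# 5. Logistic regression on the two components (three classes handled automatically).
clf = LogisticRegression(random_state=0)
clf.fit(X_train_pc, y_train)
y_pred = clf.predict(X_test_pc)

# 6. Confusion matrix: rows are true classes, columns are predicted classes,
#    so correct predictions sit on the diagonal.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("accuracy:", accuracy_score(y_test, y_pred))
```

Fitting the scaler and PCA on the training set only, and merely applying them to the test set, keeps the test records unseen until evaluation.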





Wednesday, March 21, 2018

Analytics in Banking - Acceptance of Personal Loan


LOGISTIC REGRESSION
Lending is a major part of a bank's business, and personal loans make up a large share of lending. Wouldn't it be interesting if, from the given data, analytics software could predict who needs a loan and how likely they are to accept one? Let us look deeper into it.
Data Description:
ID: Customer ID
Age: Customer's age in completed years
Experience: Years of professional experience
Income: Annual income of the customer (Rs 000)
PinCode: Home address pin code
Family: Family size of the customer
CCAvg: Average spending on credit cards per month (Rs 000)
Education: Education level (1: Undergrad; 2: Graduate; 3: Advanced/Professional)
Mortgage: Value of house mortgage, if any (Rs ###)
Personal Loan: Did this customer accept the personal loan offered in the last campaign? (target variable)
Securities Account: Does the customer have a securities account with the bank?
CD Account: Does the customer have a fixed deposit (FD) account with the bank?
Online: Does the customer use internet banking facilities?
CreditCard: Does the customer use a credit card issued by WWWXXXYYY Bank?
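Step 1 below is to fix the data types. As a minimal sketch in pandas (a swapped-in tool; the post itself works in SPSS Modeler), the fields in the data description could be typed as follows. The column names come from the table above; the helper name `fix_dtypes`, the 0/1 coding of the Yes/No flags and the grouping into identifier, ordinal, flag and continuous fields are assumptions.

```python
import pandas as pd

# Yes/No fields from the data description (assumed to be coded 0/1 in the file).
FLAG_COLUMNS = ["Personal Loan", "Securities Account", "CD Account", "Online", "CreditCard"]
NUMERIC_COLUMNS = ["Age", "Experience", "Income", "Family", "CCAvg", "Mortgage"]


def fix_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of the loan data with explicit data types."""
    out = df.copy()
    # Identifiers are labels, not quantities, so keep them out of numeric modelling.
    out["ID"] = out["ID"].astype(str)
    out["PinCode"] = out["PinCode"].astype(str)
    # Education is ordinal: 1 Undergrad < 2 Graduate < 3 Advanced/Professional.
    out["Education"] = pd.Categorical(out["Education"], categories=[1, 2, 3], ordered=True)
    # Binary flags stored as small 0/1 integers.
    for col in FLAG_COLUMNS:
        out[col] = out[col].astype("int8")
    # Continuous (or count) measures.
    for col in NUMERIC_COLUMNS:
        out[col] = pd.to_numeric(out[col])
    return out
```

With the file loaded, for example `df = fix_dtypes(pd.read_excel(path))` where `path` is your own copy of the data (reading .xlsx files needs the openpyxl package), ID and PinCode can no longer be mistaken for numeric predictors.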

Steps:
1. Fix the appropriate data type for each field.
2. Choose the tool to build the solution. The tool chosen here is SPSS Modeler.
You can choose any appropriate tool you are comfortable with.
3. A model is created with the following nodes:


4. Source tab: Excel node to input the data.
5. Output tab: Table node to view the input data.
6. Output tab: Data Audit node to check the quality of the data.
7. The data is partitioned, for example 60% for training and 40% for testing.
8. A Type node is connected (to set Personal Loan as the target), followed by a Logistic regression node.
Before we move on to logistic regression, we should understand why it is needed.
Linear regression fits a line (or plane) to the given data and predicts a continuous outcome from the inputs.
Logistic regression: when the output must be Yes/No (0 or 1, acceptance/non-acceptance), we need this model. It predicts the probability of the Yes outcome, which is then turned into a 0/1 decision. A detailed comparison of the two models and their mathematics is not given here.
9. Here our objective is to know whether a customer needs, and will accept, a personal loan (PL) or not, so logistic regression is an appropriate fit.
10. Once the model is built, we evaluate it and identify the customers who are potentially in need of a loan.
11. This process is known as lift analysis: score the customers, sort them by predicted probability, divide them into deciles, and take the top deciles as the prospect list.
12. The whole objective is maximum effectiveness: approach only the prospective clients. This greatly reduces the time, effort and, of course, money spent on campaigns, contact attempts, meetings and so on.
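The post builds this stream in SPSS Modeler. As a minimal sketch of the same idea in Python (a swapped-in tool, which step 2 allows), the code below fits a logistic regression on a 60/40 partition and builds the decile lift table described in step 11. The data is synthetic, generated with `make_classification` as a stand-in for the bank's file, so the numbers it prints say nothing about real customers. The helper `lift_table` and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def lift_table(y_true, y_score, n_bins=10):
    """Rank customers by predicted probability and summarise each of n_bins equal groups."""
    df = pd.DataFrame({"y": np.asarray(y_true), "score": np.asarray(y_score)})
    df = df.sort_values("score", ascending=False).reset_index(drop=True)
    # Group 1 (top decile) holds the highest-scoring customers.
    df["decile"] = np.arange(len(df)) * n_bins // len(df) + 1
    table = df.groupby("decile").agg(customers=("y", "size"), responders=("y", "sum"))
    table["response_rate"] = table["responders"] / table["customers"]
    # Lift = response rate in the group / overall response rate.
    table["lift"] = table["response_rate"] / df["y"].mean()
    # Share of all responders captured by this group and the ones above it.
    table["cum_capture"] = table["responders"].cumsum() / table["responders"].sum()
    return table


# Synthetic stand-in for the bank data (NOT the post's data): roughly 10% acceptors.
X, y = make_classification(n_samples=5000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)

# 60/40 training/testing partition, as in step 7.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=0, stratify=y)

# Logistic regression models P(accept) = 1 / (1 + exp(-(b0 + b . x))).
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_accept = model.predict_proba(X_test)[:, 1]    # probability of class 1 (accepts)

print(lift_table(y_test, p_accept).round(2))
```

Reading the table: the prospect list is the top deciles, where `lift` is well above 1, and `cum_capture` shows what share of all acceptors those deciles contain, which is what lets the campaign skip the rest.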
