What's there in wine?
Which wine is suitable for a typical customer segment, and what are their preferences?
The objective is to understand the mathematics and the data science behind the model, which can then be replicated for any other similar business problem.
Here is a classical problem for understanding PCA (Principal Component Analysis). There are 178 records and 12 variables (the components used to prepare the wine), distributed across three categories of customers.
Problem statement: identify the variables that contribute to customer preference, find the variables with the maximum variance, and visualize what the machine has learned.
The task is to classify the categories of customers by their taste. For each new wine, the model will predict the customer segment to which it could be recommended.
PCA is an example of unsupervised learning, where we ask the machine to find structure in the data on its own, without giving it labels or step-by-step instructions.
Let's dive deep.
Importance of PCA
1. It extracts "m" components out of the "n" original variables, where m < n.
2. The chosen m components explain most of the variance in the dataset.
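To make the "m out of n" idea concrete, here is a minimal sketch on synthetic data (not the wine dataset): three correlated variables are reduced to two components, and explained_variance_ratio_ shows how much variance each component carries.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Three variables, two of which are noisy copies of the first.
X = np.column_stack([x,
                     x + 0.1 * rng.normal(size=200),
                     x + 0.1 * rng.normal(size=200)])

# Reduce n=3 variables to m=2 components.
pca = PCA(n_components=2)
pca.fit(X)

# The first component alone captures almost all the variance here.
print(pca.explained_variance_ratio_)
```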
Now let us work out this problem in Python.
As a standard process (a runnable sketch of all these steps follows the list):
- Divide the dataset into a training set and a test set; the learning made on the training set is applied to the test set to see the results.
- Scale the data so all variables are on a uniform scale. There are a number of ways to do scaling; here I have preferred standard scaling, which is available as a package in Python in sklearn.
- Import the PCA. Initially set the number of components to None, and after viewing the results of the PCA, decide on the number of components to keep.
- Here it is decided as two components, which had the maximum variance.
- Once we have the top two components, we use logistic regression to check the effectiveness and see whether it has classified as planned.
- Let us see the results in the confusion matrix.
- We have got wonderful results, as it has predicted 0 as 0 on 14 occasions, 1 as 1 on 15, and 2 as 2 on 6, with one misclassification.
- Use matplotlib to visualize the results.
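Here is a minimal end-to-end sketch of the steps above. It is an assumption-laden illustration, not the post's exact script: the data is loaded via sklearn's built-in load_wine() (which carries 13 features, so the column count may differ slightly from the CSV used in the post), and the split and random seeds are arbitrary, so the exact confusion-matrix counts may differ too.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data: X holds the wine components, y the customer segment.
# (Assumption: load_wine() stands in for the post's own CSV.)
X, y = load_wine(return_X_y=True)

# 1. Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 2. Standard scaling: fit on the training set only, then apply to both.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 3. First pass with n_components=None to inspect the variance
#    explained by every component before deciding how many to keep.
pca_all = PCA(n_components=None)
pca_all.fit(X_train)
print(pca_all.explained_variance_ratio_)

# 4. Keep the top two components, which carry the most variance.
pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

# 5. Fit a logistic regression on the two components.
clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)

# 6. Confusion matrix on the test set: diagonal entries are correct
#    classifications, off-diagonal entries are misclassifications.
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))

# 7. Visualize the test set in the two-component space.
for label in np.unique(y_test):
    plt.scatter(X_test[y_test == label, 0],
                X_test[y_test == label, 1],
                label=f"Segment {label}")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()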