Friday, November 23, 2018

Understanding Misclassifier


A common challenge during model building is understanding the model's output and analysing it. One term that needs to be understood well is 'misclassifier'.

It also helps to understand the following terms:
true positive, false positive, true negative and false negative.

Depending on the business or technical problem, the misclassifier that matters most needs to be identified and analysed.

For instance,
a false positive means predicting a positive result for a case that is actually negative.
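The four terms can be made concrete by counting them from a list of actual and predicted labels. The snippet below is a minimal sketch in plain Python; the function name `confusion_counts` and the toy labels are made up for illustration and are not taken from the models discussed in this post.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, really positive
        elif p == positive and a != positive:
            fp += 1          # predicted positive, really negative
        elif p != positive and a != positive:
            tn += 1          # predicted negative, really negative
        else:
            fn += 1          # predicted negative, really positive
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Toy labels, invented for illustration only.
actual    = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```

The false positives and false negatives together are the misclassified records; which of the two matters more depends on the problem at hand.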

The example below uses fictitious sample data for illustration.
It has 32 variables and 1000 records. The data is scaled suitably for analysis, which is a typical data preprocessing activity.
It is partitioned into training and testing sets in an 80%-20% split.
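The preprocessing in this post was done in SPSS Modeler, and the exact scaling method is not stated. As a rough equivalent, the sketch below assumes min-max scaling and a random 80/20 partition on randomly generated data of the same shape (1000 records, 32 variables). The function names are hypothetical, and the scaling ranges are taken from the training part only, which is a design choice of this sketch rather than something the post specifies.

```python
import random

def split_80_20(rows, seed=0):
    """Shuffle the rows and return (train, test) in an 80/20 split."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = (len(rows) * 80) // 100
    train = [rows[i] for i in idx[:cut]]
    test = [rows[i] for i in idx[cut:]]
    return train, test

def min_max_scale(train, test):
    """Scale each column using the min and max seen in the training rows."""
    n_cols = len(train[0])
    lo = [min(r[j] for r in train) for j in range(n_cols)]
    hi = [max(r[j] for r in train) for j in range(n_cols)]

    def scale(row):
        return [(v - l) / (h - l) if h > l else 0.0
                for v, l, h in zip(row, lo, hi)]

    return [scale(r) for r in train], [scale(r) for r in test]

# Randomly generated stand-in data: 1000 records, 32 variables.
rng = random.Random(42)
data = [[rng.uniform(0, 100) for _ in range(32)] for _ in range(1000)]

train, test = split_80_20(data)
train_s, test_s = min_max_scale(train, test)
print(len(train_s), len(test_s))
```

Test rows can fall slightly outside the 0 to 1 range here, because their values were not used to set the ranges.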

Two models were built: ANN and CRT.

How should we understand the results?







Figure 1: Models built (ANN and CRT) using SPSS Modeler.






Figure 2: Results from the CART algorithm.


Figure 3: Results from the Artificial Neural Network.


Based on the output of the two models, the question that arises is: which one should be chosen?
In Figure 2 (CART) the false positive figure is 64%, while Figure 3 (ANN) gives 55.4%. This means the former model labels negative cases as positive more often than the ANN model does.
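For reference, a false positive rate is commonly computed as FP / (FP + TN), that is, the share of actual negatives that the model marked as positive. The sketch below compares two models on that measure. The counts are hypothetical and were chosen only to show the arithmetic; they are not read from Figure 2 or Figure 3, and the post does not say which exact measure SPSS Modeler reported.

```python
def false_positive_rate(fp, tn):
    """Share of actual negatives that were predicted as positive."""
    negatives = fp + tn
    return fp / negatives if negatives else 0.0

# Hypothetical counts, NOT taken from the figures in this post.
model_counts = {
    "CRT": {"FP": 30, "TN": 70},
    "ANN": {"FP": 20, "TN": 80},
}

rates = {name: false_positive_rate(c["FP"], c["TN"])
         for name, c in model_counts.items()}
lowest = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: false positive rate = {rate:.1%}")
print("Lowest false positive rate:", lowest)
```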
What is the conclusion?
If we care about a model that reports the facts as they are, then both of these are dangerous. Say, for instance, we are working on hospital data where treatment is given for a symptom, and the model predicts that a person has a disease when they do not. Isn't this dangerous? Hence, work on the misclassifier and refine the model.
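One possible way to work on a misclassifier, assuming the model outputs a score or probability for the positive class, is to move the decision threshold and watch how the false positives and false negatives trade off. This is only one option and is not necessarily what was done in SPSS Modeler; the scores and labels below are invented for illustration.

```python
def count_errors(actual, scores, threshold):
    """Return (false positives, false negatives) at a given threshold."""
    fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
    fn = sum(1 for a, s in zip(actual, scores) if s < threshold and a == 1)
    return fp, fn

# Invented labels (1 = has the disease) and model scores.
actual = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.55, 0.65, 0.60, 0.80, 0.90, 0.30, 0.45, 0.70]

errors_by_threshold = {t: count_errors(actual, scores, t) for t in (0.5, 0.75)}
for t, (fp, fn) in errors_by_threshold.items():
    print(f"threshold {t}: false positives = {fp}, false negatives = {fn}")
```

On this toy data, raising the threshold removes the false positives but adds a false negative, so the right setting depends on which error is more costly for the problem.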
