A common issue that has to be addressed during model building is understanding the output and analysing it. One concept that needs to be understood well is the 'misclassifier', that is, a case the model assigns to the wrong class.
It is also necessary to understand the following terms:
true positive, false positive, true negative, and false negative.
Depending on the business or technical problem, the most important type of misclassification needs to be identified and analysed.
For instance, a false positive means predicting a positive result for a case that is actually negative.
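The four terms can be made concrete by counting them from a list of actual and predicted labels. The following is a minimal sketch in Python; the function name `confusion_counts` and the toy labels are made up for illustration and are not taken from the models discussed in this article.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["tp"] += 1   # predicted positive, actually positive
        elif p == positive and a != positive:
            counts["fp"] += 1   # predicted positive, actually negative
        elif p != positive and a != positive:
            counts["tn"] += 1   # predicted negative, actually negative
        else:
            counts["fn"] += 1   # predicted negative, actually positive
    return counts

# Toy labels, invented purely for illustration (1 = positive, 0 = negative).
actual    = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
# Share of actual negatives that were wrongly predicted as positive.
false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])

print(counts)               # {'tp': 2, 'fp': 2, 'tn': 3, 'fn': 1}
print(false_positive_rate)  # 0.4
```

Here the false positive rate is computed as FP / (FP + TN). Reporting tools may define their percentages differently, so it is worth checking which denominator a given output uses.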
The example below uses fictitious sample data for illustration.
It has 32 variables and 1000 records. The data is scaled suitably for analysis, which is a typical data preprocessing activity.
It is partitioned into training and testing sets in an 80%/20% split.
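The preparation steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the article does not say which scaling method was used, so min-max scaling is assumed here, the data is randomly generated, and the helper names `min_max_scale` and `train_test_split` are hypothetical.

```python
import random

def min_max_scale(rows):
    """Rescale every column of a list of rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Randomly generated stand-in for the fictitious data: 1000 records, 32 variables.
rng = random.Random(0)
data = [[rng.uniform(0, 100) for _ in range(32)] for _ in range(1000)]

scaled = min_max_scale(data)
train, test = train_test_split(scaled)
print(len(train), len(test))  # 800 200
```

The sketch follows the order given in the text (scale, then partition). In practice the scaling parameters are often estimated from the training partition only, so that no information from the test records leaks into training.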
Two models were built: an artificial neural network (ANN) and a classification and regression tree (CRT, also written CART).
How should we understand the results?
Figure 1: Models built (ANN and CRT) using SPSS Modeler.
Figure 2: Results from the CART algorithm.
Figure 3: Results from the artificial neural network (ANN).
Based on the output of the two models, the question that arises is: which one should be chosen?
In Figure 2 (CART) the false positive figure is 64%, while Figure 3 (ANN) gives 55.4%. This means that the CART model wrongly reports a positive result more often than the ANN model.
What is the conclusion?
If we need a model that reports the facts as they are, then both of these models are dangerous. Suppose, for instance, that we are working with hospital data in which treatment is given on the basis of a symptom, and the model predicts that a person has a disease when they do not. Isn't this dangerous? Hence, examine the misclassifications and refine the model.
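One possible way to work on false positives, not described in this article and shown here only as an assumption, is to raise the score threshold above which a case is called positive. The sketch below uses invented scores and labels and a hypothetical helper `count_errors`; it shows that a stricter threshold trades false positives for false negatives.

```python
def count_errors(actual, scores, threshold):
    """Return (false positives, false negatives) at a given decision threshold."""
    fp = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= threshold)
    fn = sum(1 for a, s in zip(actual, scores) if a == 1 and s < threshold)
    return fp, fn

# Invented labels (1 = has the disease) and model scores, for illustration only.
actual = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.55, 0.62, 0.58, 0.80, 0.91, 0.30, 0.45, 0.70]

for threshold in (0.5, 0.6, 0.75):
    fp, fn = count_errors(actual, scores, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
# threshold=0.5: false positives=3, false negatives=1
# threshold=0.6: false positives=2, false negatives=2
# threshold=0.75: false positives=0, false negatives=2
```

Which error matters more depends on the problem. In the hospital example, a false positive may lead to unnecessary treatment, while a false negative may leave a disease untreated, so the threshold should be chosen with both costs in mind.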