How to read a confusion matrix in Python
Anyone can build a machine learning (ML) model with a few lines of code, but building a good one is a whole other story. What do I mean by a GOOD machine learning model? It depends, but generally you’ll evaluate your model against metrics that you choose in advance. When it comes to building classification models, you’ll most likely use a confusion matrix and its related metrics. Confusion matrices are useful not just in model evaluation but also in model monitoring and model management! Don’t worry, we’re not talking about linear algebra matrices here!

In this article, we’ll cover what a confusion matrix is, some key terms and metrics, an example of a 2x2 matrix, and all of the related Python code! With that said, let’s dive into it!

A confusion matrix, also known as an error matrix, is a summary table used to assess the performance of a classification model. The numbers of correct and incorrect predictions are summarized as counts and broken down by class. Below is an image of the structure of a 2x2 confusion matrix.

To give an example, let’s say that there were ten instances where a classification model predicted ‘Yes’ and the actual value was ‘Yes’. Then the number ten would go in the top left corner, in the True Positive quadrant. This leads us to some key terms:

True Positive (TP): the model predicted positive and the actual value was positive.
False Positive (FP): the model predicted positive but the actual value was negative.
False Negative (FN): the model predicted negative but the actual value was positive.
True Negative (TN): the model predicted negative and the actual value was negative.
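To make the four quadrants concrete, here is a minimal sketch that counts them from paired labels. The helper name `count_cells` and the labels are made up for illustration; they are not data from this article.

```python
def count_cells(actual, predicted, positive="Yes"):
    """Count the four cells of a 2x2 confusion matrix from paired labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up labels for illustration only
actual = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No"]

cells = count_cells(actual, predicted)
print(cells)
```

Every prediction lands in exactly one cell, so the four counts always add up to the number of predictions.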
Confusion Matrix Metrics
Now that you understand the general structure of a confusion matrix as well as the associated key terms, we can dive into some of the main metrics that you can calculate from it. Note: this list is not exhaustive. If you want to see all of the metrics that you can calculate, check out Wikipedia’s page on the confusion matrix.

Accuracy
This is simply the proportion of predictions that the model classified correctly: (TP + TN) / (TP + TN + FP + FN).

Precision
Precision, also known as positive predictive value, is the proportion of relevant instances among the retrieved instances: TP / (TP + FP). In other words, it answers the question “What proportion of positive identifications was actually correct?”

Recall
Recall, also known as sensitivity, hit rate, or the true positive rate (TPR), is the proportion of all relevant instances that were actually retrieved: TP / (TP + FN). It answers the question “What proportion of actual positives was identified correctly?”

To really hit it home, the diagram below is a great way to remember the difference between precision and recall (it certainly helped me)!

Specificity
Specificity, also known as the true negative rate (TNR), measures the proportion of actual negatives that are correctly identified as such: TN / (TN + FP). It is the counterpart of recall for the negative class.

F1 Score
The F1 score is a measure of a test’s accuracy: it is the harmonic mean of precision and recall, 2 × precision × recall / (precision + recall). It has a maximum of 1 (perfect precision and recall) and a minimum of 0. Overall, it summarizes how well your model balances precision and recall.

Example of 2x2 Confusion Matrix
If this still isn’t making sense to you, it will after we take a look at the example below. Imagine that we created a machine learning model that predicts whether a patient has cancer or not. The table on the left shows twelve predictions that the model made as well as the actual result of each patient. With our paired data, you can then fill out the confusion matrix using the structure that I showed above.
Once this is filled in, we can learn a number of things about our model:
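As a minimal sketch of how those numbers can be read off a filled-in matrix, the function below computes the five metrics from the four cell counts. The function name `summarize` and the counts passed to it are hypothetical; they are not the values from the patient table.

```python
def summarize(tp, fp, fn, tn):
    """Return the main confusion-matrix metrics as a dict.

    Assumes every denominator is non-zero, i.e. there is at least one
    actual positive, one actual negative and one predicted positive.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow
report = summarize(tp=4, fp=1, fn=2, tn=3)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts the model looks reasonable on accuracy and precision, yet recall is the weakest number because two of the six actual positives were missed.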
In reality, you would want the recall of a cancer detection model to be as close to 100% as possible. It’s far worse if a patient with cancer is diagnosed as cancer-free than if a cancer-free patient is diagnosed with cancer, only to find out after more testing that they don’t have it.

Python Code
Below is a summary of code that you need to calculate the metrics above:
# Confusion Matrix
There are three ways you can calculate the F1 score in Python:
# Method 1: sklearn

Conclusion
Now that you know what a confusion matrix is as well as its associated metrics, you can effectively evaluate your classification ML models. This is also essential to understand even after you finish developing your ML model, as you’ll be leveraging these metrics in the model monitoring and model management stages of the machine learning life cycle.
How do you analyze the confusion matrix?
Below is the process for calculating a confusion matrix.
You need a test dataset or a validation dataset with expected outcome values.
Make a prediction for each row in your test dataset.
From the expected outcomes and predictions, count the number of correct predictions for each class.

How do you get the confusion matrix in Python?
Creating a confusion matrix:
import numpy
import matplotlib.pyplot as plt
from sklearn import metrics
actual = numpy.random.binomial(1, 0.9, size = 1000)
predicted = numpy.random.binomial(1, 0.9, size = 1000)
confusion_matrix = metrics.confusion_matrix(actual, predicted)
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix, display_labels = [False, True])
cm_display.plot()
plt.show()

What does a confusion matrix tell you in Python?
A confusion matrix is a matrix (table) that can be used to measure the performance of a machine learning algorithm, usually a supervised learning one. Each row of the confusion matrix represents the instances of an actual class and each column represents the instances of a predicted class.
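Because the row/column convention decides which cell is which, it is worth checking it in code. The sketch below assumes scikit-learn is installed and uses made-up 0/1 labels. With `labels=[0, 1]`, `sklearn.metrics.confusion_matrix` puts the true negatives in the top left corner, unlike the 2x2 layout described earlier in this article, where the true positives sit there.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = positive, 0 = negative
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes,
# both in the order given by `labels`.
cm = confusion_matrix(actual, predicted, labels=[0, 1])
print(cm)

# For a 2x2 matrix, ravel() flattens row by row: TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

To get the true positives in the top left corner instead, pass `labels=[1, 0]`; the rows then read TP, FN and FP, TN.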
How do you find the accuracy of a confusion matrix in Python?
To calculate accuracy, use the following formula: (TP+TN)/(TP+TN+FP+FN).
Misclassification Rate: It tells you what fraction of predictions were incorrect. It is also known as Classification Error. You can calculate it using (FP+FN)/(TP+TN+FP+FN) or (1-Accuracy).
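The two formulas above translate directly into code. This is a minimal sketch with hypothetical counts and made-up function names, showing that the misclassification rate and accuracy always sum to 1.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def misclassification_rate(tp, tn, fp, fn):
    """Fraction of all predictions that were incorrect."""
    return (fp + fn) / (tp + tn + fp + fn)


# Hypothetical counts for illustration
counts = dict(tp=45, tn=40, fp=5, fn=10)

acc = accuracy(**counts)
err = misclassification_rate(**counts)
print(f"accuracy={acc:.2f} misclassification rate={err:.2f}")
```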