{"id":86321,"date":"2025-09-04T17:00:49","date_gmt":"2025-09-04T11:30:49","guid":{"rendered":"https:\/\/www.guvi.in\/blog\/?p=86321"},"modified":"2025-10-03T10:59:17","modified_gmt":"2025-10-03T05:29:17","slug":"confusion-matrix-in-machine-learning","status":"publish","type":"post","link":"https:\/\/www.guvi.in\/blog\/confusion-matrix-in-machine-learning\/","title":{"rendered":"Confusion Matrix in Machine Learning: The Ultimate Beginner\u2019s Guide"},"content":{"rendered":"\n<p>Machine learning is everywhere today, from chatbots, medical diagnosis, recommendation systems and even self-driving cars. But here\u2019s a question: How can we be sure that these models are making the right predictions?<\/p>\n\n\n\n<p>Accuracy alone isn&#8217;t enough. Consider a model that predicts all emails as &#8220;not spam.&#8221; It can have 90% accuracy, but it\u2019s completely useless because it has missed all the actual spam emails.<\/p>\n\n\n\n<p>This is why proper model evaluation is so important in data science. One of the easiest but most effective ways to evaluate a model is with a machine learning confusion matrix. A confusion matrix doesn&#8217;t just give you a right or wrong method for a prediction. It allows you to understand how well the model performed, what it got right, and where mistakes were made.<\/p>\n\n\n\n<p>Here in this blog, we will break down the confusion matrix in ML in easy steps, study some real-world examples, and see how to read it for binary as well as multiclass classification problems. Whether you are a learner or refreshing data science basics, this article will help you get the clarity you need.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is the Confusion Matrix in Machine Learning?<\/h2>\n\n\n\n<p>A confusion matrix in <a href=\"https:\/\/www.guvi.in\/blog\/introduction-to-machine-learning\/\">machine <\/a><a href=\"https:\/\/www.guvi.in\/blog\/introduction-to-machine-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">learning<\/a> is a common method used for evaluating the performance of classification models. 
It is a table that compares the actual target values with the predictions produced by the classifier.<\/p>\n\n\n\n<p>Instead of just telling you the percentage of correct predictions, it highlights where your model is \u201cconfused.\u201d That\u2019s why it\u2019s called a confusion matrix.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Structure of a Confusion Matrix<\/strong><\/h3>\n\n\n\n<p>In a binary classification problem, it shows four possible outcomes at its core:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/2-2-1200x630.png\" alt=\"Confusion matrix in Machine Learning\" class=\"wp-image-86612\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/2-2-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/2-2-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/2-2-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/2-2-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/2-2-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/2-2-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<ul>\n<li>True Positive (TP): The model said &#8220;Yes,&#8221; and it really was &#8220;Yes.&#8221;<\/li>\n\n\n\n<li>True Negative (TN): The model said &#8220;No,&#8221; and it really was &#8220;No.&#8221;<\/li>\n\n\n\n<li>False Positive (FP): The model said &#8220;Yes,&#8221; but it was really &#8220;No.&#8221; (Type I error)<\/li>\n\n\n\n<li>False Negative (FN): The model said &#8220;No,&#8221; but it was really &#8220;Yes.&#8221; (Type II error)<\/li>\n<\/ul>\n\n\n\n
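<p>In scikit-learn, these four counts live in a fixed layout: rows are actual classes and columns are predicted classes, sorted by label. For a binary problem with labels 0 and 1, you can unpack them directly. Here is a minimal sketch with made-up labels:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>from sklearn.metrics import confusion_matrix<br><br># Made-up labels: 1 = positive (Yes), 0 = negative (No)<br>y_true = [1, 0, 1, 1, 0, 0, 1, 0]<br>y_pred = [1, 0, 0, 1, 0, 1, 1, 0]<br><br># ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP<br>tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()<br>print(tp, tn, fp, fn)&nbsp; &nbsp;# 3 3 1 1<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n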
<div style=\"background-color: #099f4e; border: 3px solid #110053; border-radius: 12px; padding: 18px 22px; color: #FFFFFF; font-size: 18px; font-family: Montserrat, Helvetica, sans-serif; line-height: 1.6; box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15); max-width: 750px;\">\n  <strong style=\"font-size: 22px; color: #FFFFFF;\">\ud83d\udca1 Did You Know?<\/strong> \n  <br \/><br \/> \n  The term <strong style=\"color: #FFFFFF;\">\u201cconfusion matrix\u201d<\/strong> comes from the fact that it literally shows where the model is <em>confused<\/em> between different classes. \n  <br \/><br \/> \n  Confusion matrices aren\u2019t new, they\u2019ve been used in <strong style=\"color: #FFFFFF;\">statistics since the 1960s<\/strong>, long before machine learning became popular. \n  <br \/><br \/> \n  A model can have <strong style=\"color: #FFFFFF;\">95% accuracy<\/strong> and still be terrible if the data is imbalanced (like predicting \u201cnot spam\u201d every time). The confusion matrix helps you spot such hidden weaknesses.\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Confusion Matrix Example in Machine Learning<\/strong><\/h3>\n\n\n\n<p>Let\u2019s imagine we built a spam email classifier:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong>Predicted Spam<\/strong><\/td><td><strong>Predicted Not Spam<\/strong><\/td><\/tr><tr><td><strong>Actual Spam<\/strong><\/td><td>85 (TP)<\/td><td>15 (FN)<\/td><\/tr><tr><td><strong>Actual Not Spam<\/strong><\/td><td>10 (FP)<\/td><td>90 (TN)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Here\u2019s how to interpret it:<\/p>\n\n\n\n<ul>\n<li>The model correctly identified 85 spam emails (TP).<\/li>\n\n\n\n<li>It missed 15 spam emails (FN).<\/li>\n\n\n\n<li>It wrongly flagged 10 legitimate emails as spam (FP).<\/li>\n\n\n\n<li>It correctly identified 90 legitimate emails (TN).<\/li>\n<\/ul>\n\n\n\n<p>This table tells us far more than just overall accuracy; it shows where the model struggles.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How to Interpret a Confusion Matrix in ML<\/strong><\/h2>\n\n\n\n<p>From the confusion matrix, we can calculate key performance metrics:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1. Accuracy<\/strong><\/h3>\n\n\n\n<p>Accuracy is the overall percentage of correct predictions that a model makes. It is the ratio of the number of predictions the model got correct to the total number of predictions it makes. While accuracy is helpful as an overall measure of performance, it can be misleading, especially with imbalanced data (data where one class appears much more frequently than the other). For example, when 95% of emails are \u201cnot spam\u201d, a model that always predicts \u201cnot spam\u201d will have 95% accuracy even if it never catches any spam emails.<\/p>\n\n\n\n<p>Formula:&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" width=\"1200\" height=\"167\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-16-1200x167.png\" alt=\"How to Interpret a Confusion Matrix in ML\" class=\"wp-image-86323\" style=\"aspect-ratio:7.18562874251497;width:550px;height:auto\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-16-1200x167.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-16-300x42.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-16-768x107.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-16-1536x213.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-16-150x21.png 150w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-16.png 1600w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. Precision<\/strong><\/h3>\n\n\n\n<p>Precision tells us the quality of positive predictions. It answers the question: of all the times the model predicted something as positive, how many were actually correct? High precision indicates that there are fewer false positives. 
For example, if a medical test flags patients for a rare disease, precision tells us how many of the patients flagged as &#8220;sick&#8221; were actually sick.<\/p>\n\n\n\n<p>Formula:&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" width=\"1200\" height=\"258\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-17-1200x258.png\" alt=\"How to Interpret a Confusion Matrix in ML\" class=\"wp-image-86324\" style=\"aspect-ratio:4.651162790697675;width:372px;height:auto\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-17-1200x258.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-17-300x65.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-17-768x165.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-17-150x32.png 150w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-17.png 1477w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3. Recall (Sensitivity)<\/strong><\/h3>\n\n\n\n<p>Recall measures a model&#8217;s ability to find all of the actual positives. Recall answers: of all the positives in reality, how many did the model find? In other words, a model with high recall misses few true cases. For example, in fraud detection, recall tells us how many of the actual fraudulent transactions the system identified correctly.<\/p>\n\n\n\n<p>Formula:&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" width=\"1200\" height=\"301\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-18-1200x301.png\" alt=\"How to Interpret a Confusion Matrix in ML\" class=\"wp-image-86325\" style=\"aspect-ratio:3.9867109634551494;width:350px;height:auto\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-18-1200x301.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-18-300x75.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-18-768x193.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-18-150x38.png 150w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-18.png 1267w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4. F1-Score<\/strong><\/h3>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/F-score\" target=\"_blank\" rel=\"noreferrer noopener\">F1 score<\/a> is a balance between precision and recall. Focusing only on precision or recall can sometimes be misleading. The F1 score brings both precision and recall together into a single number, which is especially useful with imbalanced data. The higher the F1 score, the better the model is at finding positives while keeping false positives low.<\/p>\n\n\n\n<p>Formula:&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" width=\"1200\" height=\"166\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-21-1200x166.png\" alt=\"How to Interpret a Confusion Matrix in ML\" class=\"wp-image-86328\" style=\"aspect-ratio:7.228915662650603;width:466px;height:auto\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-21-1200x166.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-21-300x41.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-21-768x106.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-21-1536x212.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-21-150x21.png 150w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/image-21.png 1600w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n
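<p>To make these formulas concrete, here is a minimal Python sketch that plugs the counts from the spam example above (TP = 85, FN = 15, FP = 10, TN = 90) into each formula:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td># Counts from the spam classifier example above<br>TP, FN, FP, TN = 85, 15, 10, 90<br><br>accuracy = (TP + TN) \/ (TP + TN + FP + FN)<br>precision = TP \/ (TP + FP)<br>recall = TP \/ (TP + FN)<br>f1 = 2 * precision * recall \/ (precision + recall)<br><br>print(round(accuracy, 3))&nbsp; &nbsp;# 0.875<br>print(round(precision, 3))&nbsp; # 0.895<br>print(round(recall, 3))&nbsp; &nbsp; &nbsp;# 0.85<br>print(round(f1, 3))&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;# 0.872<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Notice how precision and recall tell different stories from the same four counts, which is exactly why a single accuracy number is not enough.<\/p>\n\n\n\n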
<h2 class=\"wp-block-heading\">Confusion Matrix for Binary Classification<\/h2>\n\n\n\n<p>When we have only two classes (e.g., Spam vs. Not Spam, Fraud vs. Not Fraud, Positive vs. Negative), the confusion matrix is a 2\u00d72 grid.<\/p>\n\n\n\n<p>It looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong>Predicted Positive<\/strong><\/td><td><strong>Predicted Negative<\/strong><\/td><\/tr><tr><td><strong>Actual Positive<\/strong><\/td><td>True Positive (TP)<\/td><td>False Negative (FN)<\/td><\/tr><tr><td><strong>Actual Negative<\/strong><\/td><td>False Positive (FP)<\/td><td>True Negative (TN)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Example: Email Spam Detection<\/strong><\/p>\n\n\n\n<p>Suppose we have 100 emails:<\/p>\n\n\n\n<ul>\n<li>50 are actually spam,<\/li>\n\n\n\n<li>50 are not spam.<\/li>\n<\/ul>\n\n\n\n<p>The model makes the following predictions:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong>Predicted Spam<\/strong><\/td><td><strong>Predicted Not Spam<\/strong><\/td><\/tr><tr><td><strong>Actual Spam<\/strong><\/td><td>45 (TP)<\/td><td>5 (FN)<\/td><\/tr><tr><td><strong>Actual Not Spam<\/strong><\/td><td>7 (FP)<\/td><td>43 (TN)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Interpretation:<\/strong><\/p>\n\n\n\n<ul>\n<li><strong>True Positives (45):<\/strong> Model correctly identified 45 spam emails.<\/li>\n\n\n\n<li><strong>False Negatives (5):<\/strong> Model missed 5 spam emails (they slipped into the inbox).<\/li>\n\n\n\n<li><strong>False Positives (7):<\/strong> Model marked 7 normal emails as spam.<\/li>\n\n\n\n<li><strong>True Negatives (43):<\/strong> Model correctly identified 43 normal emails.<\/li>\n<\/ul>\n\n\n\n<p>Plugging these counts into the formulas above: accuracy = (45 + 43) \/ 100 = 0.88, precision = 45 \/ (45 + 7) \u2248 0.87, and recall = 45 \/ 50 = 0.90.<\/p>\n\n\n\n<p>This simple 2\u00d72 confusion matrix gives you an exact breakdown of where your binary classifier is performing well and where it\u2019s making mistakes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Confusion Matrix in Python for Binary Classification<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 1: Import the Libraries<\/strong><\/h3>\n\n\n\n
target=\"_blank\" rel=\"noreferrer noopener\"> Python<\/a>, we\u2019ll use NumPy for handling arrays, sklearn.metrics for generating the confusion matrix and classification report, and seaborn + <a href=\"https:\/\/www.guvi.in\/blog\/fundamentals-of-matplotlib\/\" target=\"_blank\" rel=\"noreferrer noopener\">matplotlib<\/a> for visualization.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>import numpy as np<br>import matplotlib.pyplot as plt<br>import seaborn as sns<br>from sklearn.metrics import confusion_matrix, classification_report<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 2: Define the Actual and Predicted Labels<\/strong><\/h3>\n\n\n\n<p>Imagine we are training a model to detect whether an image is of a Dog or not a dog.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td># Ground truth (what&#8217;s actually correct)<br>true_labels = np.array([<br>&nbsp; &nbsp; &#8216;Dog&#8217;,&#8217;Dog&#8217;,&#8217;Dog&#8217;,&#8217;Not Dog&#8217;,&#8217;Dog&#8217;,<br>&nbsp; &nbsp; &#8216;Not Dog&#8217;,&#8217;Dog&#8217;,&#8217;Dog&#8217;,&#8217;Not Dog&#8217;,&#8217;Not Dog&#8217;<br>])<br><br># Model predictions<br>pred_labels = np.array([<br>&nbsp; &nbsp; &#8216;Dog&#8217;,&#8217;Not Dog&#8217;,&#8217;Dog&#8217;,&#8217;Not Dog&#8217;,&#8217;Dog&#8217;,<br>&nbsp; &nbsp; &#8216;Dog&#8217;,&#8217;Dog&#8217;,&#8217;Dog&#8217;,&#8217;Not Dog&#8217;,&#8217;Not Dog&#8217;<br>])<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 3: Generate the Confusion Matrix<\/strong><\/h3>\n\n\n\n<p>The confusion matrix compares actual labels with predicted labels.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>cm = confusion_matrix(true_labels, pred_labels)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 4: Visualize the Confusion Matrix<\/strong><\/h3>\n\n\n\n<p>We\u2019ll use a heatmap to make it easy to interpret.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>plt.figure(figsize=(5,4))<br>sns.heatmap(cm, annot=True, fmt=&#8217;d&#8217;, cmap=&#8221;Blues&#8221;,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; xticklabels=[&#8216;Dog&#8217;,&#8217;Not Dog&#8217;],<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; yticklabels=[&#8216;Dog&#8217;,&#8217;Not Dog&#8217;])<br>plt.ylabel(&#8216;Actual&#8217;)<br>plt.xlabel(&#8216;Predicted&#8217;)<br>plt.title(&#8216;Confusion Matrix &#8211; Binary Classification&#8217;)<br>plt.show()<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-2-1200x630.png\" alt=\"\" class=\"wp-image-86613\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-2-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-2-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-2-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-2-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-2-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-2-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 5: Classification Report<\/strong><\/h3>\n\n\n\n<p>Instead of manually calculating precision, recall, and 
<h2 class=\"wp-block-heading\">Confusion Matrix for Multiclass Classification<\/h2>\n\n\n\n<p>Not all problems are binary. In many real-world cases, there are more than two categories. For example:<\/p>\n\n\n\n<ul>\n<li>Classifying handwritten digits (0\u20139).<\/li>\n\n\n\n<li>Identifying fruit types (apple, banana, orange).<\/li>\n\n\n\n<li>Predicting weather conditions (sunny, rainy, cloudy).<\/li>\n<\/ul>\n\n\n\n<p>In such cases, the confusion matrix becomes an N\u00d7N table, where <em>N<\/em> is the number of classes. Each row represents the actual class, and each column represents the predicted class.<\/p>\n\n\n\n<p><strong>Example: Animal Classification<\/strong><\/p>\n\n\n\n<p>Let\u2019s say we have a model that classifies animals into Cat, Dog, and Horse. After testing, we get this matrix:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong>Predicted Cat<\/strong><\/td><td><strong>Predicted Dog<\/strong><\/td><td><strong>Predicted Horse<\/strong><\/td><\/tr><tr><td><strong>Actual Cat<\/strong><\/td><td>40<\/td><td>8<\/td><td>2<\/td><\/tr><tr><td><strong>Actual Dog<\/strong><\/td><td>5<\/td><td>50<\/td><td>5<\/td><\/tr><tr><td><strong>Actual Horse<\/strong><\/td><td>3<\/td><td>4<\/td><td>43<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Interpretation:<\/strong><\/p>\n\n\n\n<ul>\n<li>Out of 50 actual Cats, the model predicted 40 correctly but confused 8 as Dogs and 2 as Horses.<\/li>\n\n\n\n<li>Out of 60 actual Dogs, the model got 50 right but confused 5 as Cats and 5 as Horses.<\/li>\n\n\n\n<li>Out of 50 actual Horses, it predicted 43 correctly but mislabeled 7 as either Cats or Dogs.<\/li>\n<\/ul>\n\n\n\n<p>This helps us pinpoint where the model struggles. In this example, it sometimes confuses Cats with Dogs, which could mean the features of these two classes are similar.<\/p>\n\n\n\n
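<p>Precision and recall extend naturally to the multiclass case: they are computed per class. For any class, TP is its diagonal cell, the rest of its column are false positives, and the rest of its row are false negatives. Here is a minimal NumPy sketch for the animal matrix above (the class names are just labels for printing):<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>import numpy as np<br><br># Animal confusion matrix from above (rows = actual, columns = predicted)<br>cm = np.array([[40, 8, 2],&nbsp; &nbsp; # Cat<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;[5, 50, 5],&nbsp; &nbsp; # Dog<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;[3, 4, 43]])&nbsp; &nbsp;# Horse<br><br>tp = np.diag(cm)<br>precision = tp \/ cm.sum(axis=0)&nbsp; # column sums = everything predicted as that class<br>recall = tp \/ cm.sum(axis=1)&nbsp; &nbsp; # row sums = everything actually in that class<br><br>for name, p, r in zip(['Cat', 'Dog', 'Horse'], precision, recall):<br>&nbsp; &nbsp; print(name, 'precision:', round(p, 2), 'recall:', round(r, 2))<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This should print a precision of about 0.83 for Cat, 0.81 for Dog, and 0.86 for Horse, with recalls of 0.80, 0.83, and 0.86 respectively.<\/p>\n\n\n\n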
<h2 class=\"wp-block-heading\">Confusion Matrix in Python for Multiclass Classification<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 1: Import the Libraries<\/strong><\/h3>\n\n\n\n<p>(Same libraries as before; repeated here so the example is self-contained.)<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>import numpy as np<br>import matplotlib.pyplot as plt<br>import seaborn as sns<br>from sklearn.metrics import confusion_matrix, classification_report<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 2: Define the Actual and Predicted Labels<\/strong><\/h3>\n\n\n\n<p>Here, the true labels are what the fruits really are, while the predicted labels are what our model thinks they are.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td># Ground truth (actual fruits)<br>true_labels = np.array([<br>&nbsp; &nbsp; 'Apple','Banana','Orange','Apple','Orange',<br>&nbsp; &nbsp; 'Banana','Apple','Orange','Orange','Banana'<br>])<br><br># Model predictions<br>pred_labels = np.array([<br>&nbsp; &nbsp; 'Apple','Orange','Orange','Apple','Banana',<br>&nbsp; &nbsp; 'Banana','Apple','Orange','Apple','Banana'<br>])<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 3: Compute the Confusion Matrix<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>cm = confusion_matrix(true_labels, pred_labels, labels=['Apple','Banana','Orange'])<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 4: Visualize the Confusion Matrix<\/strong><\/h3>\n\n\n\n<p>A heatmap makes it easy to see where the model is doing well and where it\u2019s confusing one fruit with another.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>plt.figure(figsize=(6,5))<br>sns.heatmap(cm, annot=True, fmt='d', cmap=\"Oranges\",<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; xticklabels=['Apple','Banana','Orange'],<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; yticklabels=['Apple','Banana','Orange'])<br>plt.ylabel('Actual')<br>plt.xlabel('Predicted')<br>plt.title('Confusion Matrix - Multiclass Classification')<br>plt.show()<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-1-1-1200x630.png\" alt=\"\" class=\"wp-image-86694\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-1-1-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-1-1-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-1-1-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-1-1-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-1-1-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/3-1-1-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 5: Classification Report<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>print(classification_report(true_labels, pred_labels))<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This gives a detailed summary for each class separately (precision, recall, F1-score), unlike binary classification, where there are just two classes.<\/p>\n\n\n\n
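<p>With the fruit labels above, you can again predict the numbers before running the code. Every Apple is classified correctly, but one Orange is mislabeled as an Apple and one as a Banana, so Orange recall drops to 0.50. A quick check:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>print(cm)<br># Expected output (rows = actual, columns = predicted, order: Apple, Banana, Orange):<br># [[3 0 0]<br>#&nbsp; [0 2 1]<br>#&nbsp; [1 1 2]]<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The report should then show a precision of about 0.75 for Apple and 0.67 for Banana and Orange, with an overall accuracy of 0.70.<\/p>\n\n\n\n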
<h2 class=\"wp-block-heading\">Confusion Matrix Advantages and Disadvantages<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Advantages<\/strong><\/td><td><strong>Disadvantages<\/strong><\/td><\/tr><tr><td>Provides a detailed breakdown of classification results.<\/td><td>Can become complex to interpret for many classes.<\/td><\/tr><tr><td>Helps identify specific types of errors (False Positives vs False Negatives).<\/td><td>Doesn\u2019t directly show overall performance in one number.<\/td><\/tr><tr><td>Useful for imbalanced datasets where accuracy is misleading.<\/td><td>Needs additional metrics like Precision, Recall, and F1-score for complete evaluation.<\/td><\/tr><tr><td>Works for both binary and multiclass classification.<\/td><td>May overwhelm beginners when too many classes are involved.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Why the Confusion Matrix Matters in Data Science<\/h2>\n\n\n\n<p>In practical <a href=\"https:\/\/www.guvi.in\/blog\/top-data-science-projects-in-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science projects<\/a>, simply knowing that your model is \u201c80% accurate\u201d doesn\u2019t mean much. Take a fraud detection model in which only 1 in 100 transactions is fraudulent. If your model predicts \u201cNo fraud\u201d every time, it will be 99% accurate, which is useless.<\/p>\n\n\n\n<p>The confusion matrix in classification gives you the complete story. It helps you identify whether your model is really learning patterns or just taking shortcuts.<\/p>\n\n\n\n
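<p>You can see this failure mode directly in code. Here is a minimal sketch (with made-up data, assuming scikit-learn is installed) of a \u201cmodel\u201d that always predicts the majority class: accuracy looks impressive, but recall for the fraud class is zero, and the confusion matrix exposes it immediately:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>import numpy as np<br>from sklearn.metrics import accuracy_score, recall_score, confusion_matrix<br><br># Made-up data: 1 fraudulent transaction out of 100 (1 = fraud, 0 = legitimate)<br>y_true = np.array([0] * 99 + [1])<br>y_pred = np.zeros(100, dtype=int)&nbsp; # always predicts 'no fraud'<br><br>print(accuracy_score(y_true, y_pred))&nbsp; &nbsp;# 0.99, looks impressive<br>print(recall_score(y_true, y_pred))&nbsp; &nbsp; &nbsp;# 0.0, catches zero fraud<br>print(confusion_matrix(y_true, y_pred))&nbsp; # [[99 0] [1 0]] tells the real story<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n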
<p><em>Now that you know how the Confusion Matrix works, why stop here? Take your next step with HCL GUVI\u2019s Advanced <\/em><a href=\"https:\/\/www.guvi.in\/mlp\/artificial-intelligence-and-machine-learning\/?utm_source=blog&amp;utm_medium=hyperlink&amp;utm_campaign=confusion-matrix-in-machine-learning\" target=\"_blank\" rel=\"noreferrer noopener\"><em>AI &amp; Machine Learning Course<\/em><\/a><em>, designed in collaboration with <\/em><strong><em>Intel and IITM Pravartak<\/em><\/strong><em>. This program takes you from theory to <\/em><strong><em>hands-on projects and real-world applications<\/em><\/strong><em> in Python, ML algorithms, Deep Learning, NLP, and model deployment.<\/em><\/p>\n\n\n\n<p><strong><em>Highlights of the Program:<\/em><\/strong><\/p>\n\n\n\n<ul>\n<li><strong><em>Intel-certified credential<\/em><\/strong><em> recognized globally<\/em><\/li>\n\n\n\n<li><em>Learn from industry experts and <\/em><strong><em>IITM Pravartak<\/em><\/strong><em> faculty<\/em><\/li>\n\n\n\n<li><em>Real-world <\/em><strong><em>case studies<\/em><\/strong><em> and hands-on projects<\/em><\/li>\n\n\n\n<li><em>Flexible learning with mentor support &amp; career guidance<\/em><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Final Thoughts<\/h2>\n\n\n\n<p>In machine learning, the confusion matrix is more than just a table: it&#8217;s a window into your model&#8217;s strengths and weaknesses. It is your reference point for sound model evaluation, whether you&#8217;re dealing with binary classification, multiclass classification, or calculating precision, recall, and F1-score.<\/p>\n\n\n\n<p>If you&#8217;re learning to build confusion matrices in Python using scikit-learn, start small with a binary example, and then build up to multiclass problems.<\/p>\n\n\n\n<p>I hope this blog helped you understand what a confusion matrix in machine learning is and how to use it for binary and multiclass classification problems. Remember, learning machine learning concepts takes a lot of practice, so don&#8217;t worry if you feel stuck initially; you\u2019ll improve each time you complete a project.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">FAQs<\/h2>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1756978183542\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>1. What is a confusion matrix in simple terms?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>A confusion matrix is a table that shows how well a machine learning model performed by comparing actual values with predicted values.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1756978276579\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>2. Why do we need a confusion matrix when we already have accuracy?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Accuracy simply tells you the overall percentage of correct predictions, whereas a confusion matrix shows exactly which types of errors were made (false positives or false negatives).<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1756978309452\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>3. What is the difference between binary and multiclass confusion matrices?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>For binary classification, the confusion matrix is 2\u00d72, since there are only two possible outcomes (e.g., spam vs. not spam).<br \/>For multiclass classification, the confusion matrix is larger (3\u00d73, 4\u00d74, etc.), depending on how many classes there are.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1756978378055\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>4. 
Which metrics can we calculate from a confusion matrix?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>From a confusion matrix, we can compute accuracy, precision, recall, and F1-score.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Machine learning is everywhere today, from chatbots, medical diagnosis, recommendation systems and even self-driving cars. But here\u2019s a question: How can we be sure that these models are making the right predictions? Accuracy alone isn&#8217;t enough. Consider a model that predicts all emails as &#8220;not spam.&#8221; It can have 90% accuracy, but it\u2019s completely useless [&hellip;]<\/p>\n","protected":false},"author":63,"featured_media":86610,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[933],"tags":[],"views":"2752","authorinfo":{"name":"Vishalini Devarajan","url":"https:\/\/www.guvi.in\/blog\/author\/vishalini\/"},"thumbnailURL":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/1-300x116.png","jetpack_featured_media_url":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/09\/1.png","_links":{"self":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/86321"}],"collection":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/users\/63"}],"replies":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/comments?post=86321"}],"version-history":[{"count":8,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/86321\/revisions"}],"predecessor-version":[{"id":86696,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/86321\/revisions\/86696"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media\/86610"}],"wp:attachment":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media?parent=86321"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/categories?post=86321"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/tags?post=86321"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}