{"id":89711,"date":"2025-10-14T13:56:41","date_gmt":"2025-10-14T08:26:41","guid":{"rendered":"https:\/\/www.guvi.in\/blog\/?p=89711"},"modified":"2025-11-19T11:20:29","modified_gmt":"2025-11-19T05:50:29","slug":"eda-with-numpy-pandas-matplotlib-and-seaborn","status":"publish","type":"post","link":"https:\/\/www.guvi.in\/blog\/eda-with-numpy-pandas-matplotlib-and-seaborn\/","title":{"rendered":"Exploratory Data Analysis (EDA) With NumPy, Pandas, Matplotlib And Seaborn"},"content":{"rendered":"\n<p>In the world of data science, every project starts with one vital step: understanding your data. Before building models or drawing insights, it\u2019s essential to know what your data truly represents. This is where Exploratory Data Analysis (EDA) plays a key role.<\/p>\n\n\n\n<p>EDA helps you uncover hidden patterns, detect anomalies, test hypotheses, and validate assumptions through summary statistics and visual exploration. Simply put, it\u2019s the process of understanding your dataset.<\/p>\n\n\n\n<p>In this blog, you\u2019ll discover how to perform EDA using four powerful Python libraries: NumPy, Pandas, Matplotlib, and Seaborn. 
Through hands-on examples and code snippets, you\u2019ll learn how to explore, visualize, and interpret your data effectively.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What Is Exploratory Data Analysis (EDA)?<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/WHAT-IS-EXPLORATORY-DATA-ANALYSIS-EDA_-1200x630.png\" alt=\"The infographic depicts the title of the section and a Data scientist analyzing data visualizations on a screen during EDA\" class=\"wp-image-93777\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/WHAT-IS-EXPLORATORY-DATA-ANALYSIS-EDA_-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/WHAT-IS-EXPLORATORY-DATA-ANALYSIS-EDA_-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/WHAT-IS-EXPLORATORY-DATA-ANALYSIS-EDA_-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/WHAT-IS-EXPLORATORY-DATA-ANALYSIS-EDA_-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/WHAT-IS-EXPLORATORY-DATA-ANALYSIS-EDA_-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/WHAT-IS-EXPLORATORY-DATA-ANALYSIS-EDA_-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>Before proceeding with EDA using NumPy, Pandas, Matplotlib, and Seaborn, let\u2019s first understand what <a href=\"https:\/\/www.guvi.in\/blog\/exploratory-data-analysis-eda-in-data-science\/\" target=\"_blank\" rel=\"noreferrer noopener\">EDA<\/a> actually means. Exploratory Data Analysis is the first crucial step in any data science workflow: a process that helps you make sense of your dataset before diving into complex modeling or machine learning.<\/p>\n\n\n\n<p>Think of EDA as your opportunity to explore, question, and interpret the data. 
It allows you to uncover patterns, detect anomalies, and gain insights that guide your next steps in analysis. By using statistical summaries and visualizations, EDA helps you:<\/p>\n\n\n\n<ul>\n<li>Understand data distributions<\/li>\n\n\n\n<li>Detect missing values or outliers<\/li>\n\n\n\n<li>Identify relationships among variables<\/li>\n\n\n\n<li>Summarize key features that influence outcomes<\/li>\n<\/ul>\n\n\n\n<p><strong>Example &#8211; <\/strong>Suppose you\u2019re analyzing sales data for an e-commerce company. Before forecasting revenue or predicting customer churn, you\u2019d begin by exploring the dataset, checking which products sell best, how often customers return, and whether certain seasons impact sales.<\/p>\n\n\n\n<p>Kickstart your data journey with HCL GUVI\u2019s<a href=\"https:\/\/www.guvi.in\/mlp\/data-science-ebook?utm_source=blog&amp;utm_medium=hyperlink&amp;utm_campaign=exploratory-data-analysis-with-numpy-pandas-matplotlib-and-seaborn\" target=\"_blank\" rel=\"noreferrer noopener\"> <strong>Data Science eBook<\/strong><\/a>, your step-by-step guide to mastering analytics and visualization. 
Explore topics like data preprocessing, EDA techniques, Python essentials, and visualization tools &#8211; all simplified for beginners to learn at their own pace.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Importance Of EDA In Data Science<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/IMPORTANCE-OF-EDA-IN-DATA-SCIENCE-1200x630.png\" alt=\"A before-and-after comparison showing raw, messy data transforming into meaningful visual insights after applying EDA\" class=\"wp-image-93779\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/IMPORTANCE-OF-EDA-IN-DATA-SCIENCE-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/IMPORTANCE-OF-EDA-IN-DATA-SCIENCE-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/IMPORTANCE-OF-EDA-IN-DATA-SCIENCE-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/IMPORTANCE-OF-EDA-IN-DATA-SCIENCE-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/IMPORTANCE-OF-EDA-IN-DATA-SCIENCE-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/IMPORTANCE-OF-EDA-IN-DATA-SCIENCE-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>Before you begin any kind of analysis or modeling, it\u2019s essential to understand why Exploratory Data Analysis (EDA) plays such a critical role. In <a href=\"https:\/\/www.guvi.in\/blog\/what-is-data-science\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science<\/a>, skipping EDA is like setting sail without a map. You might move forward quickly, but you\u2019ll have no idea if you\u2019re heading in the right direction. 
EDA ensures that the data you use for modeling is clean, accurate, and aligned with your goals.<\/p>\n\n\n\n<p>Here\u2019s why EDA is so important:<\/p>\n\n\n\n<ul>\n<li><strong>Validates Data Quality:<\/strong> Detects missing, incorrect, or inconsistent values that could lead to false conclusions.<\/li>\n\n\n\n<li><strong>Guides the Analytical Approach:<\/strong> Helps you choose the right models, transformations, or techniques based on how the data behaves.<\/li>\n\n\n\n<li><strong>Prevents Costly Errors:<\/strong> Identifies outliers and anomalies that can negatively impact the model\u2019s accuracy.<\/li>\n\n\n\n<li><strong>Reveals Business Insights:<\/strong> Highlights meaningful trends, relationships, and patterns that can shape real-world strategies.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Key Python Libraries For EDA<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/KEY-PYTHON-LIBRARIES-FOR-EDA-1200x630.png\" alt=\" Icons of popular Python libraries for EDA including NumPy, Pandas, Matplotlib, and Seaborn.\n\" class=\"wp-image-93780\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/KEY-PYTHON-LIBRARIES-FOR-EDA-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/KEY-PYTHON-LIBRARIES-FOR-EDA-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/KEY-PYTHON-LIBRARIES-FOR-EDA-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/KEY-PYTHON-LIBRARIES-FOR-EDA-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/KEY-PYTHON-LIBRARIES-FOR-EDA-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/KEY-PYTHON-LIBRARIES-FOR-EDA-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>Before performing any analysis, it\u2019s 
important to know which tools make EDA possible. Python offers a powerful ecosystem of libraries that simplify <a href=\"https:\/\/www.guvi.in\/blog\/guide-to-data-exploration\/\" target=\"_blank\" rel=\"noreferrer noopener\">data exploration<\/a>, visualization, and interpretation. Each library serves a distinct purpose, and together they turn raw data into meaningful insights.<\/p>\n\n\n\n<p>Let\u2019s look at the four main libraries that make EDA efficient and effective.<\/p>\n\n\n\n<ol>\n<li>NumPy<\/li>\n\n\n\n<li>Pandas<\/li>\n\n\n\n<li>Matplotlib<\/li>\n\n\n\n<li>Seaborn<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>A. NumPy \u2013 The Foundation of Numerical Computing<\/strong><\/h3>\n\n\n\n<p>Before any meaningful analysis, you need a solid base to handle numbers efficiently, and that\u2019s exactly what NumPy provides. It\u2019s the core Python library that helps you manage large sets of data and perform quick mathematical operations.<\/p>\n\n\n\n<p>NumPy makes it easy to calculate averages, medians, or standard deviations, and even helps in finding unusual or extreme values in your dataset. 
It\u2019s what keeps your computations fast, clean, and accurate.<\/p>\n\n\n\n<p><strong>Key Functions:<\/strong><\/p>\n\n\n\n<ul>\n<li>np.mean(), np.median(), np.std() \u2013 for quick statistical summaries<\/li>\n\n\n\n<li>np.where() \u2013 to filter data based on conditions<\/li>\n\n\n\n<li>np.percentile() \u2013 for spotting outliers<\/li>\n<\/ul>\n\n\n\n<p><strong>Example:<\/strong><\/p>\n\n\n\n<p><strong>Input<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\n\n# Sample salary data\ndata = np.array(&#91;45, 50, 55, 60, 65, 70, 150])\n\nprint(\"Mean:\", np.mean(data))\nprint(\"Median:\", np.median(data))\n# np.std() returns the population standard deviation (ddof=0) by default\nprint(\"Standard Deviation:\", np.std(data))\n<\/code><\/pre>\n\n\n\n<p><strong>Output<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Mean: 70.71428571428571\nMedian: 60.0\nStandard Deviation: 33.31972511340172\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>B. Pandas \u2013 The Heart of Data Analysis<\/strong><\/h3>\n\n\n\n<p>Once your data is ready to explore, Pandas is the go-to tool for working with it. It helps you easily load, clean, organize, and analyze data in Python. Pandas uses a structure called a DataFrame, which organizes data into rows and columns, similar to how information is arranged in an Excel table. This makes it simple to view, edit, and understand your data. 
In short, Pandas helps you turn messy, unorganized data into clear and useful information before you start creating <a href=\"https:\/\/www.guvi.in\/blog\/data-visualization-in-data-science\/\" target=\"_blank\" rel=\"noreferrer noopener\">visualizations<\/a> or building models.<\/p>\n\n\n\n<p><strong>Key Functions:<\/strong><\/p>\n\n\n\n<ul>\n<li>df.head(), df.info(), df.describe() \u2013 get a quick overview of your data<\/li>\n\n\n\n<li>df.isnull().sum() \u2013 find missing values<\/li>\n\n\n\n<li>df.groupby() \u2013 group and analyze related data<\/li>\n\n\n\n<li>df.value_counts() \u2013 count how many times each value appears<\/li>\n<\/ul>\n\n\n\n<p><strong>Example:<\/strong><\/p>\n\n\n\n<p><strong>Input:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\n\n# Creating a sample sales dataset\ndata = {\n    \"Product\": &#91;\"Laptop\", \"Phone\", \"Tablet\", \"Laptop\", \"Phone\", \"Tablet\"],\n    \"Category\": &#91;\"Electronics\", \"Electronics\", \"Electronics\", \"Electronics\", \"Electronics\", \"Electronics\"],\n    \"Revenue\": &#91;85000, 45000, 30000, 90000, 48000, 31000],\n    \"Quantity\": &#91;5, 8, 10, 6, 7, 9]\n}\n\ndf = pd.DataFrame(data)\n\n# 1. Viewing first few rows of the dataset\nprint(\"First five rows of the dataset:\")\nprint(df.head())\n\n# 2. Checking basic info\nprint(\"\\nDataset Info:\")\nprint(df.info())\n\n# 3. Summary statistics\nprint(\"\\nSummary Statistics:\")\nprint(df.describe())\n\n# 4. Checking for missing values\nprint(\"\\nMissing Values:\")\nprint(df.isnull().sum())\n\n# 5. Grouping and analyzing average revenue by product\nprint(\"\\nAverage Revenue by Product:\")\nprint(df.groupby(\"Product\")&#91;\"Revenue\"].mean())\n\n# 6. 
Counting how many times each product appears\nprint(\"\\nProduct Frequency Count:\")\nprint(df&#91;\"Product\"].value_counts())\n<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>First five rows of the dataset:\n  Product    Category  Revenue  Quantity\n0  Laptop  Electronics    85000         5\n1   Phone  Electronics    45000         8\n2  Tablet  Electronics    30000        10\n3  Laptop  Electronics    90000         6\n4   Phone  Electronics    48000         7\n\nDataset Info:\n&lt;class 'pandas.core.frame.DataFrame'&gt;\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\n #   Column    Non-Null Count  Dtype \n---  ------    --------------  ----- \n 0   Product   6 non-null      object\n 1   Category  6 non-null      object\n 2   Revenue   6 non-null      int64 \n 3   Quantity  6 non-null      int64 \ndtypes: int64(2), object(2)\nmemory usage: 320.0 bytes\nNone\n\nSummary Statistics:\n            Revenue   Quantity\ncount      6.000000   6.000000\nmean   54833.333333   7.500000\nstd    26362.220443   1.870829\nmin    30000.000000   5.000000\n25%    34500.000000   6.250000\n50%    46500.000000   7.500000\n75%    75750.000000   8.750000\nmax    90000.000000  10.000000\n\nMissing Values:\nProduct     0\nCategory    0\nRevenue     0\nQuantity    0\ndtype: int64\n\nAverage Revenue by Product:\nProduct\nLaptop    87500.0\nPhone     46500.0\nTablet    30500.0\nName: Revenue, dtype: float64\n\nProduct Frequency Count:\nLaptop    2\nPhone     2\nTablet    2\nName: Product, dtype: int64\n<\/code><\/pre>\n\n\n\n<p><strong>Explanation:<\/strong><\/p>\n\n\n\n<ul>\n<li>The first few rows help you quickly see if your data is loaded correctly.<\/li>\n\n\n\n<li>The info() command gives a summary of columns, data types, and non-null counts, which reveal missing values.<\/li>\n\n\n\n<li>The describe() command shows key statistics such as the mean, standard deviation, quartiles, and range.<\/li>\n\n\n\n<li>The missing values check confirms whether your data is complete.<\/li>\n\n\n\n<li>The groupby() function helps you compare data, such as finding which product brings in the highest 
revenue.<\/li>\n\n\n\n<li>The value_counts() function shows how often each product appears in your dataset.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>C. Matplotlib \u2013 The Backbone of Visualization<\/strong><\/h3>\n\n\n\n<p>After cleaning and preparing your data, the next step is to visualize it \u2014 to actually see what\u2019s going on. <a href=\"https:\/\/www.guvi.in\/blog\/data-visualization-with-matplotlib\/\" target=\"_blank\" rel=\"noreferrer noopener\">Matplotlib<\/a> is the most widely used library in Python for creating graphs and charts. It helps you spot patterns, trends, and outliers visually, making your analysis much easier to understand. You can use it to create histograms, bar charts, line graphs, scatter plots, and more. It\u2019s a powerful tool for turning numbers into visuals that tell a story.<\/p>\n\n\n\n<p><strong>Key Functions<\/strong><\/p>\n\n\n\n<ul>\n<li>plt.hist() \u2013 shows the distribution of data<\/li>\n\n\n\n<li>plt.boxplot() \u2013 helps detect outliers<\/li>\n\n\n\n<li>plt.scatter() \u2013 shows relationships between two variables<\/li>\n<\/ul>\n\n\n\n<p><strong>Example:<\/strong><\/p>\n\n\n\n<p><strong>Input<\/strong>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import matplotlib.pyplot as plt\n\nplt.hist(df&#91;\"Revenue\"], bins=20, color=\"skyblue\", edgecolor=\"black\")\nplt.title(\"Distribution of Revenue\")\nplt.xlabel(\"Revenue\")\nplt.ylabel(\"Frequency\")\nplt.show()<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<ul>\n<li>The x-axis (Revenue) shows the range of revenue values divided into 20 small bins (intervals).<\/li>\n\n\n\n<li>The y-axis (Frequency) shows how many data points fall into each revenue range.<\/li>\n\n\n\n<li>The title \u201cDistribution of Revenue\u201d appears at the top.<\/li>\n\n\n\n<li>Each bar\u2019s height indicates how many entries in your dataset have revenue values within that range.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>D. 
Seaborn \u2013 The Statistical Visualization Powerhouse<\/strong><\/h3>\n\n\n\n<p>Once your data is clean and ready, it\u2019s time to explore it visually, and that\u2019s where <strong>Seaborn<\/strong> shines. Seaborn is built on top of Matplotlib but offers more advanced and attractive visualizations with just a few lines of code. It\u2019s perfect for understanding relationships, spotting patterns, and comparing different groups in your data. With Seaborn, you can easily create heatmaps, pairplots, and boxplots that help you find correlations and trends hidden in your dataset.<\/p>\n\n\n\n<p><strong>Key Functions<\/strong><\/p>\n\n\n\n<ul>\n<li>sns.heatmap() \u2013 shows how variables are related to each other<\/li>\n\n\n\n<li>sns.pairplot() \u2013 compares relationships between multiple variables<\/li>\n\n\n\n<li>sns.boxplot() \u2013 helps compare categories and detect outliers<\/li>\n<\/ul>\n\n\n\n<p><strong>Example:<\/strong><\/p>\n\n\n\n<p><strong>Input:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import seaborn as sns\nimport matplotlib.pyplot as plt\n\n# Correlation heatmap (numeric_only=True skips text columns such as Product and Category)\nplt.figure(figsize=(8,6))\nsns.heatmap(df.corr(numeric_only=True), annot=True, cmap=\"coolwarm\")\nplt.title(\"Correlation Matrix\")\nplt.show()\n\n# Pairplot\nsns.pairplot(df, hue=\"Category\")\nplt.show()\n<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<ul>\n<li>The <strong>heatmap<\/strong> summarizes numerical relationships at a glance.<\/li>\n\n\n\n<li>The <strong>pairplot<\/strong> shows these relationships in detail: how each pair of variables interacts, grouped by category.<\/li>\n<\/ul>\n\n\n\n<p>Together, they give you both a big-picture and in-depth understanding of your dataset, making Seaborn an excellent tool for the exploratory stage of analysis.<\/p>\n\n\n\n<p>Join HCL GUVI\u2019s<a 
href=\"https:\/\/www.guvi.in\/mlp\/data-science-email-course?utm_source=blog&amp;utm_medium=hyperlink&amp;utm_campaign=exploratory-data-analysis-with-numpy-pandas-matplotlib-and-seaborn\" target=\"_blank\" rel=\"noreferrer noopener\"> <strong>5-day free Data Science Email Series<\/strong><\/a> and learn the essentials of data analysis through daily lessons, practical exercises, and visual guides. Perfect for beginners eager to understand <strong>NumPy, Pandas, Matplotlib, and Seaborn<\/strong> from scratch.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Real-World EDA Workflow Example<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/REAL-WORLD-EDA-WORKFLOW-EXAMPLE-1200x630.png\" alt=\"Infographic depicting the step-by-step workflow of Exploratory Data Analysis (EDA) in Python.\" class=\"wp-image-93781\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/REAL-WORLD-EDA-WORKFLOW-EXAMPLE-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/REAL-WORLD-EDA-WORKFLOW-EXAMPLE-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/REAL-WORLD-EDA-WORKFLOW-EXAMPLE-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/REAL-WORLD-EDA-WORKFLOW-EXAMPLE-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/REAL-WORLD-EDA-WORKFLOW-EXAMPLE-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/11\/REAL-WORLD-EDA-WORKFLOW-EXAMPLE-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>Let\u2019s walk through how to apply Exploratory Data Analysis (EDA) to a <strong>Marketing Campaign dataset<\/strong>. 
This example helps you understand how marketers analyze spending and revenue data to find which channels perform best.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Sample Dataset<\/strong><\/h3>\n\n\n\n<p>Below is a small example dataset named marketing_campaign.csv.<br>You can create it using Excel or a text editor and save it as a CSV file.<\/p>\n\n\n\n<p><strong>Sample Dataset : <\/strong>marketing_campaign.csv<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Campaign_ID<\/strong><\/td><td><strong>Spend<\/strong><\/td><td><strong>Revenue<\/strong><\/td><td><strong>Channel<\/strong><\/td><td><strong>AgeGroup<\/strong><\/td><\/tr><tr><td>1<\/td><td>5000<\/td><td>5200<\/td><td>Social Media<\/td><td>25-34<\/td><\/tr><tr><td>2<\/td><td>7000<\/td><td>6500<\/td><td>Email<\/td><td>35-44<\/td><\/tr><tr><td>3<\/td><td>4000<\/td><td>3000<\/td><td>TV Ads<\/td><td>25-34<\/td><\/tr><tr><td>4<\/td><td>6000<\/td><td>8000<\/td><td>Social Media<\/td><td>18-24<\/td><\/tr><tr><td>5<\/td><td>3000<\/td><td>2000<\/td><td>Print Media<\/td><td>45-54<\/td><\/tr><tr><td>6<\/td><td>9000<\/td><td>12000<\/td><td>Email<\/td><td>35-44<\/td><\/tr><tr><td>7<\/td><td>10000<\/td><td>11500<\/td><td>Social Media<\/td><td>25-34<\/td><\/tr><tr><td>8<\/td><td>3500<\/td><td>4000<\/td><td>TV Ads<\/td><td>18-24<\/td><\/tr><tr><td>9<\/td><td>7500<\/td><td>9000<\/td><td>Email<\/td><td>25-34<\/td><\/tr><tr><td>10<\/td><td>2000<\/td><td>1500<\/td><td>Print Media<\/td><td>45-54<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 1: Import Libraries <\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n<\/code><\/pre>\n\n\n\n<p>These libraries will help handle, analyze, and visualize the data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 2: Load Dataset<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Load the 
dataset\ndf = pd.read_csv(\"marketing_campaign.csv\")\nprint(df.head())\n<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<p>&nbsp;&nbsp;Campaign_ID&nbsp; Spend&nbsp; Revenue &nbsp; &nbsp; Channel &nbsp; &nbsp; AgeGroup<\/p>\n\n\n\n<p>0&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 5000 &nbsp; &nbsp; 5200 &nbsp; &nbsp; Social Media &nbsp; &nbsp; 25-34<\/p>\n\n\n\n<p>1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 7000 &nbsp; &nbsp; 6500 &nbsp; &nbsp; &nbsp; &nbsp; Email &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 35-44<\/p>\n\n\n\n<p>2&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 3&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 4000 &nbsp; &nbsp; 3000 &nbsp; &nbsp; &nbsp; &nbsp; TV Ads&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 25-34<\/p>\n\n\n\n<p>3&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 4&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 6000 &nbsp; &nbsp; 8000 &nbsp; &nbsp; Social Media &nbsp; &nbsp; 18-24<\/p>\n\n\n\n<p>4&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 5&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 3000 &nbsp; &nbsp; 2000&nbsp; &nbsp; &nbsp; Print Media &nbsp; &nbsp; &nbsp; 45-54<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 3: Initial Inspection <\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>print(df.info())\nprint(df.describe())\n<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<p>&lt;class &#8216;pandas.core.frame.DataFrame&#8217;&gt;<\/p>\n\n\n\n<p>RangeIndex: 10 entries, 0 to 9<\/p>\n\n\n\n<p>Data columns (total 5 columns):<\/p>\n\n\n\n<p>&nbsp;# &nbsp; Column &nbsp; &nbsp; &nbsp; Non-Null Count&nbsp; Dtype<\/p>\n\n\n\n<p>&#8212;&nbsp; &#8212;&#8212; &nbsp; &nbsp; &nbsp; &#8212;&#8212;&#8212;&#8212;&#8211;&nbsp; &#8212;&#8211;<\/p>\n\n\n\n<p>&nbsp;0 &nbsp; Campaign_ID&nbsp; 10 non-null &nbsp; &nbsp; int64<\/p>\n\n\n\n<p>&nbsp;1 &nbsp; Spend&nbsp; &nbsp; &nbsp; &nbsp; 10 non-null &nbsp; &nbsp; int64<\/p>\n\n\n\n<p>&nbsp;2 &nbsp; Revenue&nbsp; &nbsp; 
&nbsp; 10 non-null &nbsp; &nbsp; int64<\/p>\n\n\n\n<p>&nbsp;3 &nbsp; Channel&nbsp; &nbsp; &nbsp; 10 non-null &nbsp; &nbsp; object<\/p>\n\n\n\n<p>&nbsp;4 &nbsp; AgeGroup &nbsp; &nbsp; 10 non-null &nbsp; &nbsp; object<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Spend &nbsp; &nbsp; &nbsp; &nbsp; Revenue<\/p>\n\n\n\n<p>count &nbsp; &nbsp; 10.000000 &nbsp; &nbsp; 10.000000<\/p>\n\n\n\n<p>mean &nbsp; 5700.000000 &nbsp; 6270.000000<\/p>\n\n\n\n<p>std &nbsp; &nbsp; &nbsp; 2658.320272 &nbsp; 3780.961077<\/p>\n\n\n\n<p>min&nbsp; &nbsp; &nbsp; 2000.000000 &nbsp; 1500.000000<\/p>\n\n\n\n<p>max&nbsp; &nbsp; 10000.000000&nbsp; 12000.000000<\/p>\n\n\n\n<p>This shows:<\/p>\n\n\n\n<ul>\n<li>No missing values<\/li>\n\n\n\n<li>Spend ranges from \u20b92,000 to \u20b910,000<\/li>\n\n\n\n<li>Revenue ranges from \u20b91,500 to \u20b912,000<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 4: Check Missing Values <\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>print(df.isnull().sum())<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<p>Campaign_ID&nbsp; &nbsp; 0<\/p>\n\n\n\n<p>Spend&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0<\/p>\n\n\n\n<p>Revenue&nbsp; &nbsp; &nbsp; &nbsp; 0<\/p>\n\n\n\n<p>Channel&nbsp; &nbsp; &nbsp; &nbsp; 0<\/p>\n\n\n\n<p>AgeGroup &nbsp; &nbsp; &nbsp; 0<\/p>\n\n\n\n<p>dtype: int64<\/p>\n\n\n\n<p>Everything looks clean &#8211; no missing data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 5: Univariate Analysis<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>sns.histplot(df&#91;\"Spend\"], kde=True, color=\"blue\")\nplt.title(\"Distribution of Marketing Spend\")\nplt.xlabel(\"Spend\")\nplt.ylabel(\"Frequency\")\nplt.show()\n<\/code><\/pre>\n\n\n\n<p><strong>Explanation:<br><\/strong>The output of this is a histogram that shows how marketing spend is spread out.<br>You can see that most campaigns spend between \u20b93,000 and 
\u20b98,000.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 6: Bivariate Analysis<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>sns.scatterplot(data=df, x=\"Spend\", y=\"Revenue\", hue=\"Channel\", s=100)\nplt.title(\"Spend vs Revenue by Marketing Channel\")\nplt.xlabel(\"Spend\")\nplt.ylabel(\"Revenue\")\nplt.show()\n<\/code><\/pre>\n\n\n\n<p><strong>Explanation:<\/strong><strong><br><\/strong>The scatterplot helps you see which channels are most effective.<br>For example, Email and Social Media campaigns tend to generate higher revenue for similar spending, while Print Media performs poorly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step 7: Correlation Matrix<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Correlate only the numeric columns of interest; text columns such as Channel cannot be correlated\nsns.heatmap(df&#91;&#91;\"Spend\", \"Revenue\"]].corr(), annot=True, cmap=\"coolwarm\")\nplt.title(\"Correlation Heatmap\")\nplt.show()\n<\/code><\/pre>\n\n\n\n<p><strong>Output Explanation:<\/strong><\/p>\n\n\n\n<ul>\n<li>The correlation between <strong>Spend<\/strong> and <strong>Revenue<\/strong> is about <strong>0.97<\/strong>, meaning more spend usually brings higher revenue.<\/li>\n\n\n\n<li>A high correlation flags Spend as a likely predictor of Revenue for future modeling, although with only 10 campaigns it should be read as indicative rather than conclusive.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Insights From This Mini Project <\/strong><\/h3>\n\n\n\n<p>From this small dataset, EDA reveals:<\/p>\n\n\n\n<ul>\n<li>Social Media and Email are the most profitable channels.<\/li>\n\n\n\n<li>Print Media shows the lowest ROI: both campaigns earned less than they spent, even on the smallest budgets.<\/li>\n\n\n\n<li>Spending and Revenue are strongly correlated.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>Exploratory Data Analysis is the first and most critical step in any data journey. 
By using NumPy, Pandas, Matplotlib, and Seaborn, you can clean, visualize, and understand your data efficiently, thus laying the groundwork for better decision-making and accurate models.<\/p>\n\n\n\n<p>If you\u2019re eager to master EDA and data visualization, HCL GUVI\u2019s <a href=\"https:\/\/www.guvi.in\/zen-class\/data-science-course\/?utm_source=blog&amp;utm_medium=hyperlink&amp;utm_campaign=exploratory-data-analysis-with-numpy-pandas-matplotlib-and-seaborn\" target=\"_blank\" rel=\"noreferrer noopener\">Data Science Course<\/a> Program offers a practical, project-based learning path. With modules covering Python, NumPy, Pandas, Matplotlib, Seaborn, and machine learning, you\u2019ll gain hands-on experience guided by mentors, preparing you for real-world data roles.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>FAQS<\/strong><\/h2>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1760421936831\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>1. What is the main goal of EDA?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>To summarize data characteristics, identify patterns, and prepare it for modeling.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1760421959636\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>2. How is EDA different from data preprocessing?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>EDA focuses on understanding and visualizing data, while preprocessing involves transforming and cleaning it for models.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1760421981449\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>3. 
Can EDA be automated?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes, tools like <strong>Sweetviz<\/strong> and <strong>ydata-profiling<\/strong> (formerly Pandas-Profiling) automate report generation, but human interpretation remains crucial.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1760422003554\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>4. What are the most used libraries for EDA in Python?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p><strong>Pandas<\/strong> for data manipulation and <strong>Seaborn<\/strong> for visualization are the most commonly used.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1760422027528\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>5. Is EDA required before machine learning?<\/strong><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Absolutely! Skipping EDA can lead to wrong assumptions, poor model performance, and misleading insights.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>In the world of data science, every project starts with one vital step: understanding your data. Before building models or drawing insights, it\u2019s essential to know what your data truly represents. This is where Exploratory Data Analysis (EDA) plays a key role. 
EDA helps you uncover hidden patterns, detect anomalies, test hypotheses, and [&hellip;]<\/p>\n","protected":false},"author":65,"featured_media":93776,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,745,717],"tags":[],"views":"3130","authorinfo":{"name":"Jebasta","url":"https:\/\/www.guvi.in\/blog\/author\/jebasta\/"},"thumbnailURL":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/10\/EXPLORATORY-DATA-ANALYSIS-EDA-WITH-NUMPY-PANDAS-MATPLOTLIB-AND-SEABORN-300x116.png","_links":{"self":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/89711"}],"collection":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/users\/65"}],"replies":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/comments?post=89711"}],"version-history":[{"count":8,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/89711\/revisions"}],"predecessor-version":[{"id":93782,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/89711\/revisions\/93782"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media\/93776"}],"wp:attachment":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media?parent=89711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/categories?post=89711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/tags?post=89711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}