Loading the Dataset

Lesson 2: Loading the Dataset

After extracting the ZIP file, the next step is to load the sales data into the Google Colab notebook. At this stage, we also import the libraries that will be used throughout the project for data handling and visualization.

Pandas is used to load and work with the dataset, while Matplotlib and Seaborn are used later for creating charts and visual analysis. Importing them early keeps the workflow clean and organized.

Code to import libraries and load the dataset:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the CSV file with an explicit encoding
df = pd.read_csv("/content/sales_data_sample.csv", encoding='latin1')

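The encoding is specified because pandas assumes UTF-8 by default, and a file containing bytes that are not valid UTF-8 raises a UnicodeDecodeError when read. Real-world datasets often contain such bytes, for example accented letters saved in an older encoding. latin1 can decode any byte value, so the file loads without that error. Once this code runs successfully, the sales data is stored in the DataFrame df, which is used in all upcoming steps: inspection, cleaning, aggregation, and exploratory data analysis.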
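To see why the encoding matters, the following minimal sketch writes a tiny CSV saved as Latin-1 and reads it back, trying UTF-8 first and falling back to latin1. The helper read_csv_with_fallback, the column names, and the sample rows are hypothetical and used only for illustration; they are not part of the lesson's dataset.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin1")):
    """Hypothetical helper: try each encoding in order and return
    the first DataFrame that loads, plus the encoding that worked."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as err:
            # This encoding could not decode the file; try the next one.
            last_error = err
    raise last_error


# Illustrative data only: two made-up rows with accented characters,
# written to a temporary file as Latin-1 bytes.
sample_text = "customer,amount\nAtelier Génève,2871.0\nLand of Toys,2765.9\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample_text.encode("latin1"))
    sample_path = tmp.name

sample_df, used_encoding = read_csv_with_fallback(sample_path)
os.remove(sample_path)

print(used_encoding)
print(sample_df)
```

Here the UTF-8 attempt fails on the accented characters, so the helper falls back to latin1. In the lesson itself, passing encoding='latin1' directly is simpler. A fallback like this is mainly useful when the encoding of a file is not known in advance.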