Cleaning the Dataset

Real-world datasets often contain missing values, duplicate records, or information that is not needed for the analysis.

One simple approach is to remove rows with excessive missing values or replace missing entries where appropriate.
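As a rough illustration of removing rows with excessive missing values, the sketch below counts the gaps in each column and then drops rows that have too few filled-in fields. The sample data, the column names, and the threshold of 2 are assumptions made for this example; the lesson does not show the real columns of `matches`.

```python
import pandas as pd

# Hypothetical sample data; the real `matches` columns are not shown in the lesson.
matches = pd.DataFrame(
    {
        "team": ["A", "B", None, "D"],
        "venue": ["X", None, None, "W"],
        "score": [1.0, 2.0, None, 4.0],
    }
)

# Count missing values per column before deciding what to drop.
missing_per_column = matches.isna().sum()
print(missing_per_column)

# Keep only rows with at least `min_non_missing` filled-in values.
# The threshold of 2 is an assumed value; choose one that suits your data.
min_non_missing = 2
trimmed = matches.dropna(thresh=min_non_missing)
print(trimmed)
```

Here the third row has no usable values and is dropped, while the second row keeps two of its three fields and stays. A stricter or looser threshold changes how many rows survive, so it is worth checking the row count before and after.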

Example Code

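# Remove rows that exactly duplicate an earlier row (assumes `matches` is a pandas DataFrame).
matches = matches.drop_duplicates()

# Replace every remaining missing value, in any column, with the string "Unknown".
matches = matches.fillna("Unknown")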

Explanation

In this lesson:

  • Duplicate records are removed.
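  • Missing values are replaced with "Unknown". Because fillna("Unknown") is applied to the whole table, gaps in numeric columns are filled with the same text, not only gaps in categorical columns.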

These steps make the dataset more consistent: each record appears only once, and no empty entries remain.
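If the "Unknown" placeholder should go only into categorical columns, one option is to pass `fillna` a dictionary that maps column names to fill values. The sketch below is a minimal example under assumed data: the columns `venue`, `winner`, and `margin` are hypothetical and stand in for whatever `matches` really contains.

```python
import pandas as pd

# Hypothetical sample data; the real `matches` columns are not shown in the lesson.
matches = pd.DataFrame(
    {
        "venue": ["X", None, "X"],
        "winner": ["A", "B", "A"],
        "margin": [5.0, None, 5.0],
    }
)

# The third row repeats the first one exactly, so it is removed.
matches = matches.drop_duplicates()

# Fill only the named categorical column; columns not listed are left untouched.
matches = matches.fillna({"venue": "Unknown"})
print(matches)
```

With this approach the missing `margin` stays missing, so the column remains numeric and can still be used in calculations or handled separately later.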