
Understanding the California Housing Dataset for ML


The dataset used in this project is the California Housing Dataset, one of the most widely used datasets for learning regression techniques.

The dataset contains:

  • 20,640 housing records
  • 8 input features
  • 1 target variable
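One way to load the data and confirm the dimensions listed above is scikit-learn's `fetch_california_housing` loader, which returns the feature matrix, the target vector and the feature names. This is a minimal sketch that assumes scikit-learn and NumPy are installed. The helper names `load_housing` and `has_expected_shape` are hypothetical and not part of any library.

```python
import numpy as np

# Dimensions stated in the text: 20,640 records, 8 input features, 1 target.
EXPECTED_ROWS = 20640
EXPECTED_FEATURES = 8


def load_housing():
    """Hypothetical helper: load features, target and feature names.

    Uses scikit-learn's loader, which downloads and caches the data
    locally the first time it runs.
    """
    from sklearn.datasets import fetch_california_housing  # lazy import

    bunch = fetch_california_housing()
    return bunch.data, bunch.target, list(bunch.feature_names)


def has_expected_shape(X, y):
    """Return True if X is (20640, 8) and y is a matching 1-D target vector."""
    X = np.asarray(X)
    y = np.asarray(y)
    return X.shape == (EXPECTED_ROWS, EXPECTED_FEATURES) and y.shape == (EXPECTED_ROWS,)
```

Calling `X, y, names = load_housing()` followed by `has_expected_shape(X, y)` is a quick sanity check before any modelling. The scikit-learn import sits inside the function so the shape check can be reused on data loaded some other way.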

The target variable is:

PRICE

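which represents the median house value of a census block group (a small geographic area), expressed in units of $100,000. A value of 2.5, for example, means $250,000. scikit-learn's loader calls this column MedHouseVal.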
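scikit-learn documents the target (named MedHouseVal by its loader) in units of $100,000. The sketch below shows how one might load the data as a single table with the target column renamed to PRICE and convert prices back to plain dollars. It assumes scikit-learn, pandas and NumPy are installed. `load_housing_frame`, `price_in_dollars` and `TARGET_UNIT_DOLLARS` are hypothetical names chosen for illustration.

```python
import numpy as np

# scikit-learn documents the target (MedHouseVal) in units of $100,000.
TARGET_UNIT_DOLLARS = 100_000


def load_housing_frame():
    """Hypothetical helper: return the dataset as one DataFrame with PRICE.

    Requires scikit-learn and pandas; downloads the data on first use.
    """
    from sklearn.datasets import fetch_california_housing  # lazy import

    frame = fetch_california_housing(as_frame=True).frame
    return frame.rename(columns={"MedHouseVal": "PRICE"})


def price_in_dollars(price):
    """Convert PRICE values (units of $100,000) to plain dollars."""
    return np.asarray(price, dtype=float) * TARGET_UNIT_DOLLARS
```

Keeping the target in its original units is usually fine for training. Converting to dollars mainly helps when reporting predictions or errors to readers.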

Some important features include:

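Feature       Description
MedInc        Median income in the block group
HouseAge      Median house age in the block group
AveRooms      Average number of rooms per household
AveBedrms     Average number of bedrooms per household
Population    Population of the block group
AveOccup      Average occupancy (household members per household)

The remaining two features, Latitude and Longitude, give the location of the block group.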
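The features in the table sit on very different scales: incomes, ages, room counts and populations are not directly comparable. A quick per-column summary makes this visible. This is a minimal NumPy sketch. The function name `summarize_features` is hypothetical, and it expects a 2-D feature matrix plus a matching list of names.

```python
import numpy as np


def summarize_features(X, feature_names):
    """Return {feature name: (min, mean, max)} for each column of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != len(feature_names):
        raise ValueError("X must be 2-D with one column per feature name")
    return {
        name: (float(X[:, j].min()), float(X[:, j].mean()), float(X[:, j].max()))
        for j, name in enumerate(feature_names)
    }
```

Applied to `data.data` and `data.feature_names` from `sklearn.datasets.fetch_california_housing()`, this prints one line of statistics per feature. Large differences in range are one reason regression pipelines often standardize features before training.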

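Each feature relates to the house price in a different way and with a different strength. Some features, such as median income, are much stronger predictors than others.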
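A simple first look at how strongly each feature relates to the price is its Pearson correlation with the target. The sketch below ranks features by the absolute value of that correlation. It captures only linear relationships, and a constant column would produce NaN. `rank_by_correlation` is a hypothetical helper built on NumPy's `corrcoef`.

```python
import numpy as np


def rank_by_correlation(X, y, feature_names):
    """Rank features by absolute Pearson correlation with the target.

    Returns a list of (feature name, correlation) pairs, strongest first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = [
        (name, float(np.corrcoef(X[:, j], y)[0, 1]))
        for j, name in enumerate(feature_names)
    ]
    # Sort on magnitude so strong negative relationships also rank high.
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)
```

Sorting on magnitude while keeping the sign shows both how strong a relationship is and whether price tends to rise or fall as the feature increases.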

A regression model learns these relationships from the training data, estimating how much each feature should influence the predicted PRICE, instead of having them specified by hand.
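The text does not say which model the project uses. As an assumed example, ordinary least-squares linear regression learns one weight per feature plus an intercept. The sketch below fits it with NumPy's `np.linalg.lstsq`. The names `fit_linear_regression` and `predict` are hypothetical, and scikit-learn's `LinearRegression` fits the same model.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit y ~ b0 + X @ w by ordinary least squares.

    Returns (intercept, weights). np.linalg.lstsq solves the least-squares
    problem directly, so no iterative optimizer is needed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def predict(X, intercept, weights):
    """Predict targets for new rows using the learned parameters."""
    return intercept + np.asarray(X, dtype=float) @ weights
```

On the real data, one would normally hold out a test set first (for example with scikit-learn's `train_test_split`) and fit only on the training portion. The learned weights then indicate how much the predicted PRICE changes per unit change in each feature.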