Understanding the Resulting Datasets

After splitting, four datasets are created:

| Dataset | Purpose           |
|---------|-------------------|
| X_train | Training features |
| X_test  | Testing features  |
| y_train | Training labels   |
| y_test  | Testing labels    |

For the California Housing Dataset, which has 20,640 rows, an 80/20 split gives:

| Dataset | Approximate Size |
|---------|------------------|
| X_train | 16512 rows       |
| X_test  | 4128 rows        |
| y_train | 16512 rows       |
| y_test  | 4128 rows        |

The training data is used for learning, while the testing data remains untouched until evaluation.
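To see where these row counts come from, the split can be sketched in plain Python. This is only an illustration under stated assumptions: `split_rows` is a hypothetical helper written for this example (not a library function), the 20% test fraction and the seed are assumed values, and the data is a synthetic placeholder with the same number of rows as the California Housing Dataset rather than the real features and labels.

```python
import random


def split_rows(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices once, then slice them into train and test parts.

    Hypothetical helper for illustration; a library routine would normally
    be used for this step.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")

    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable

    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # The same indices select features and labels, so rows stay aligned.
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Synthetic placeholder data: 20640 rows, matching the dataset's row count.
N_ROWS = 20640
X = [[float(i), float(i) * 2.0] for i in range(N_ROWS)]
y = [float(i) for i in range(N_ROWS)]

X_train, X_test, y_train, y_test = split_rows(X, y)

print(len(X_train), len(X_test), len(y_train), len(y_test))
# prints: 16512 4128 16512 4128
```

Because the train and test indices come from disjoint slices of one shuffled list, no row appears in both sets, which is what keeps the testing data untouched until evaluation.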