Splitting the Dataset

To evaluate model performance fairly, divide the dataset into training and testing sets.

Code

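from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; stratify on Y to keep class ratios
X_train, X_test, Y_train, Y_test = train_test_split(
    X,
    Y,
    test_size=0.2,    # fraction of rows assigned to the test set
    random_state=2,   # fixed seed so the split is reproducible
    stratify=Y,
)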

Explanation

With test_size=0.2, the dataset is divided into:

  • 80% Training Data
  • 20% Testing Data

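Passing stratify=Y keeps the class proportions of Y approximately the same in the training and testing sets. This matters most for highly imbalanced data, where a purely random split could leave the test set with very few minority-class samples. Setting random_state=2 fixes the shuffle, so the same split is produced on every run.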
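
To check that the split behaves as described, the sketch below runs the same call on a small synthetic dataset and compares class shares before and after splitting. The data here is an assumption made up for illustration (1000 rows, 5% positive class, three random features); it is not the dataset used in this tutorial.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 1000 rows, 50 positives (5%), 3 random features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
Y = np.array([1] * 50 + [0] * 950)

# Same arguments as the split above.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=2, stratify=Y
)

# Row counts for each subset.
print("train rows:", len(X_train), "test rows:", len(X_test))

# Share of the positive class in the full data and in each subset.
print("positive share, full :", Y.mean())
print("positive share, train:", Y_train.mean())
print("positive share, test :", Y_test.mean())
```

With stratification, the three printed shares should all be close to 5%. Removing stratify=Y and rerunning with different random_state values is a quick way to see how much the test-set share can drift without it.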