Understanding Imbalanced Data

One of the biggest challenges in fraud detection is class imbalance.

In the credit card dataset:

  • Genuine transactions make up about 99.8% of the data.
  • Fraudulent transactions make up only about 0.2%.
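
A quick way to see this split is to count the labels before doing anything else. The sketch below is illustrative only: it uses a synthetic label list with the same proportions rather than the real dataset, and it assumes labels are encoded as 0 for genuine and 1 for fraud. The helper name `class_distribution` is hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Return a dict mapping each class label to its percentage share."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Synthetic stand-in for the label column (assumed encoding):
# 0 = genuine, 1 = fraud, in a 99.8% / 0.2% split.
labels = [0] * 9980 + [1] * 20

distribution = class_distribution(labels)
for cls, pct in sorted(distribution.items()):
    print(f"class {cls}: {pct:.1f}%")
```

On a real dataset, the same function could be applied to whichever column holds the fraud label.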

This imbalance can mislead a machine learning model.

For example, a model that labels every transaction as genuine would be about 99.8% accurate on this dataset, yet it would catch none of the fraudulent transactions. Accuracy alone is therefore a poor measure of performance here.
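
The effect is easy to reproduce with a few lines of arithmetic. The following is a minimal sketch on synthetic labels in the same 99.8% / 0.2% proportions, again assuming 0 means genuine and 1 means fraud. The `accuracy` and `recall` helpers are written out by hand for illustration and are not taken from any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Synthetic labels (assumed encoding): 9,980 genuine (0) and 20 fraud (1).
y_true = [0] * 9980 + [1] * 20

# A "model" that predicts genuine for every transaction.
y_pred = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
fraud_recall = recall(y_true, y_pred)

print(f"accuracy:     {baseline_accuracy:.3f}")  # 0.998
print(f"fraud recall: {fraud_recall:.3f}")       # 0.000
```

Recall on the fraud class exposes what accuracy hides: the always-genuine baseline finds none of the 20 fraudulent transactions.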

Therefore, handling imbalanced data is a critical step before training the model.