Imbalanced Datasets in Python: SMOTE, Class Weights, and Best Practices
Last Updated: Jun 19, 2026
Table of contents
- TL;DR
- Introduction
- What Are Imbalanced Datasets?
- Why Accuracy Fails on Imbalanced Data
- Metrics That Matter More Than Accuracy
- Precision
- Recall
- F1-Score
- PR-AUC
- Detecting Imbalance in Python
- Using Class Weights
- Using SMOTE
- Installing SMOTE
- Applying SMOTE
- Training the Model
- Threshold Tuning: The Overlooked Technique
- Class Weights vs SMOTE vs Threshold Tuning
- Use Class Weights When
- Use SMOTE When
- Use Threshold Tuning When
- Real-World Workflow for Handling Imbalanced Datasets
- Best Practices
- Conclusion
- FAQs
- What is an imbalanced dataset in Python?
- Why is accuracy misleading for imbalanced datasets?
- What is SMOTE in machine learning?
- When should I use class weights?
- Should SMOTE be applied before train-test splitting?
- Which metric is best for imbalanced datasets?
TL;DR
- An imbalanced dataset in Python occurs when one class significantly outnumbers another, making it difficult for machine learning models to learn minority-class patterns effectively.
- Accuracy is often misleading for imbalanced classification because a model can achieve high accuracy while missing important minority-class predictions.
- Precision, Recall, F1-Score, and PR-AUC are better metrics for evaluating performance on imbalanced datasets.
- Class weights provide a simple way to improve minority-class learning without changing the dataset.
- SMOTE generates synthetic minority-class samples to help models learn underrepresented patterns.
- Threshold tuning can improve minority-class detection and should be evaluated alongside SMOTE and class weights.
- The most effective approach is to compare multiple techniques and choose the one that performs best on validation data.
Introduction
Machine learning models learn patterns from data, but their performance can suffer when one class significantly outnumbers another. This challenge, known as class imbalance, is common in applications such as fraud detection, medical diagnosis, cybersecurity, and customer churn prediction. Understanding how to handle imbalanced datasets is an essential machine learning skill.
What Are Imbalanced Datasets?
An imbalanced dataset is a dataset where one class contains significantly more samples than another.
Consider a fraud detection dataset:
- Legitimate transactions: 99,000
- Fraudulent transactions: 1,000
Although fraud is the most important outcome, it represents only 1% of the data.
Imbalanced datasets are common in:
- Credit card fraud detection
- Medical diagnosis systems
- Equipment failure prediction
- Cybersecurity threat detection
- Customer churn prediction
- Insurance claim analysis
Because most machine learning algorithms minimize an average error over all samples, the majority class dominates training: the model can score well by getting the majority right while learning little about the minority class.
Why Accuracy Fails on Imbalanced Data
Accuracy measures the percentage of correct predictions made by a model.
Imagine a dataset where 99% of transactions are legitimate and only 1% are fraudulent.
A model that predicts every transaction as legitimate would achieve:
Accuracy = 99%
At first glance, this appears impressive.
However, the model would fail to identify a single fraudulent transaction.
This demonstrates why accuracy alone should never be used to evaluate imbalanced classification problems. A high accuracy score may hide poor minority-class performance.
Instead, practitioners focus on metrics that better reflect a model’s ability to detect important events.
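The 99% figure above is easy to reproduce. The sketch below is for illustration only: the labels are synthetic (990 legitimate, 10 fraudulent), and `accuracy` and `recall` are small helper functions written for this example rather than library calls.

```python
# Synthetic labels: 0 = legitimate, 1 = fraud (1% positive class).
y_true = [0] * 990 + [1] * 10

# A "model" that predicts every transaction as legitimate.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    actual_pos = sum(t == positive for t in y_true)
    true_pos = sum(
        t == positive and p == positive for t, p in zip(y_true, y_pred)
    )
    return true_pos / actual_pos if actual_pos else 0.0


print(f"Accuracy: {accuracy(y_true, y_pred):.2%}")  # 99.00%
print(f"Recall:   {recall(y_true, y_pred):.2%}")    # 0.00%
```

The model is right 99% of the time and still catches none of the fraud, which is exactly the failure that accuracy hides.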
Metrics That Matter More Than Accuracy
1. Precision
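Precision is the fraction of positive predictions that are actually correct: TP / (TP + FP), where TP is the number of true positives and FP the number of false positives.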
A high precision score indicates that the model generates fewer false alarms.
2. Recall
Recall is the fraction of actual positive cases that the model identifies: TP / (TP + FN), where TP is the number of true positives and FN the number of false negatives.
For fraud detection and disease diagnosis, recall is often the most important metric because missing positive cases can be costly.
3. F1-Score
F1-Score is the harmonic mean of precision and recall, combining both into a single metric that is high only when both are high.
It is particularly useful when both false positives and false negatives matter.
4. PR-AUC
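Precision-Recall Area Under Curve (PR-AUC) is often more informative than ROC-AUC for highly imbalanced datasets. Precision and recall both ignore true negatives, so the large majority class cannot inflate the score, and the curve reflects performance on the minority (positive) class.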
When evaluating imbalanced classification models, these metrics provide a much clearer picture than accuracy.
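To make the definitions concrete, here is a minimal sketch that computes precision, recall, and F1-Score from confusion-matrix counts. The counts are made up for illustration, and `precision_recall_f1` is a helper written for this example. In practice, scikit-learn's `precision_score`, `recall_score`, `f1_score`, and `average_precision_score` (a common summary of the precision-recall curve) compute these from label arrays.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical results on 10,000 transactions, 100 of them fraudulent.
tp, fp, fn, tn = 80, 40, 20, 9860

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"Accuracy:  {accuracy:.3f}")   # 0.994
print(f"Precision: {precision:.3f}")  # 0.667
print(f"Recall:    {recall:.3f}")     # 0.800
print(f"F1-Score:  {f1:.3f}")         # 0.727
```

Accuracy looks near-perfect, while precision and recall show that one in three alerts is a false alarm and one in five frauds is missed.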
In many real-world fraud detection systems, fraudulent transactions make up less than 1% of all records, so a model that labels every transaction as legitimate can exceed 99% accuracy while detecting no fraud at all. Beyond choosing better metrics, practitioners address the imbalance itself with techniques such as resampling, anomaly detection, and cost-sensitive learning.
Detecting Imbalance in Python
Before applying any balancing technique, examine the class distribution of the target column.
print(df['target'].value_counts())
To view class percentages:
print(df['target'].value_counts(normalize=True) * 100)
Visualizing the distribution can make the imbalance easier to identify.
import matplotlib.pyplot as plt
df['target'].value_counts().plot(kind='bar')
plt.show()
When splitting data, always preserve class ratios using stratification.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    stratify=y,
    random_state=42
)
This ensures both training and testing datasets contain similar class distributions.
Using Class Weights
Class weights are often the simplest solution for handling imbalanced datasets in Python.
Instead of modifying the dataset, class weights increase the penalty for misclassifying minority-class samples.
Many scikit-learn estimators, including LogisticRegression and RandomForestClassifier, accept a class_weight parameter.
from sklearn.linear_model import LogisticRegression
# 'balanced' weights each class inversely to its frequency in y_train
model = LogisticRegression(
    class_weight='balanced',
    random_state=42
)
model.fit(X_train, y_train)
Class weights are easy to implement, require no synthetic data generation, and work well with many algorithms. However, they may struggle with extremely imbalanced datasets, and results can vary depending on the model.
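It helps to see what 'balanced' actually produces. scikit-learn documents the balanced weight for a class as n_samples / (n_classes * class_count). The sketch below applies that formula to the 99,000 / 1,000 fraud example from earlier; `balanced_class_weights` is a helper written for this illustration, not a scikit-learn function.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# 0 = legitimate (99,000 rows), 1 = fraud (1,000 rows)
y = [0] * 99_000 + [1] * 1_000

weights = balanced_class_weights(y)
print({label: round(w, 3) for label, w in weights.items()})
# {0: 0.505, 1: 50.0}
```

Each misclassified fraud case therefore counts roughly 99 times as much in the loss as a misclassified legitimate transaction, which matches the 99:1 class ratio.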
Using SMOTE
SMOTE (Synthetic Minority Over-sampling Technique) is one of the most popular oversampling techniques in machine learning.
Instead of duplicating minority-class records, SMOTE creates synthetic samples by interpolating between a minority observation and one of its nearest minority-class neighbors.
This helps models learn minority-class patterns more effectively.
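The core idea can be sketched in plain Python. This is a simplified illustration of the interpolation step, not the imbalanced-learn implementation: `smote_like_sample`, the toy points, and the choice of k=3 are all assumptions made for the example.

```python
import math
import random


def smote_like_sample(minority, k=3, rng=None):
    """Create one synthetic point between a random minority sample
    and one of its k nearest minority-class neighbors."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Nearest neighbors of `base` among the other minority samples.
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))


# Toy minority-class points with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]

rng = random.Random(42)
synthetic = [smote_like_sample(minority, k=3, rng=rng) for _ in range(5)]
for point in synthetic:
    print(tuple(round(v, 3) for v in point))
```

Every synthetic point lies on a line segment between two real minority samples, which is why SMOTE tends to fill in the region the minority class already occupies rather than inventing points far from it.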
Installing SMOTE
pip install imbalanced-learn
Applying SMOTE
from imblearn.over_sampling import SMOTE
# Resample the training split only; the test split stays untouched
smote = SMOTE(random_state=42)
X_train_smote, y_train_smote = smote.fit_resample(
    X_train,
    y_train
)
Training the Model
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train_smote, y_train_smote)
SMOTE improves minority-class representation and often boosts recall by generating synthetic samples. However, it can introduce noise, create unrealistic observations, and increase training time.
Threshold Tuning: The Overlooked Technique
Many classification models use a default probability threshold of 0.5.
However, this threshold may not be optimal for imbalanced datasets.
For example, reducing the threshold from 0.5 to 0.3 may allow the model to identify more minority-class cases, improving recall.
Threshold tuning is particularly valuable when:
- Missing positive cases is expensive
- Recall is more important than precision
- Probability estimates are reliable
Some empirical comparisons suggest that threshold tuning can deliver gains comparable to, and sometimes better than, oversampling techniques, so it is worth testing before reaching for more complex methods.
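The effect of lowering the threshold from 0.5 to 0.3 can be sketched as follows. The labels and probabilities are hypothetical values chosen for illustration, and both helper functions are written for this example. With a scikit-learn classifier, the positive-class probabilities would typically come from `model.predict_proba(X_test)[:, 1]`.

```python
def predict_with_threshold(probabilities, threshold=0.5):
    """Label a sample positive when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_and_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical true labels and predicted positive-class probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.32, 0.35, 0.15, 0.55, 0.30, 0.45, 0.60, 0.90]

for threshold in (0.5, 0.3):
    y_pred = predict_with_threshold(y_prob, threshold)
    precision, recall = precision_and_recall(y_true, y_pred)
    print(f"threshold={threshold:.1f} "
          f"precision={precision:.3f} recall={recall:.3f}")
# threshold=0.5 precision=0.667 recall=0.500
# threshold=0.3 precision=0.571 recall=1.000
```

Recall doubles while precision drops, which is the trade-off to weigh against the cost of a missed positive. Choose the threshold on validation data, not on the test set.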
Class Weights vs SMOTE vs Threshold Tuning
Choosing the right technique depends on your dataset and business objective.
Use Class Weights When
- You need a simple baseline solution
- The dataset size is limited
- Models support class weighting
Use SMOTE When
- Minority samples are extremely scarce
- Models struggle to learn minority patterns
- Recall is a critical metric
Use Threshold Tuning When
- False negatives are expensive
- Probability outputs are available
- You want to improve recall without changing the dataset
In practice, experienced machine learning engineers often evaluate all three approaches and compare results before deployment.
Real-World Workflow for Handling Imbalanced Datasets
Analyze Class Distribution
↓
Stratified Train-Test Split
↓
Train Baseline Model
↓
Evaluate Precision, Recall & F1-Score
↓
Apply Class Weights or SMOTE
↓
Compare Performance
↓
Tune Decision Thresholds
↓
Cross-Validation
↓
Deploy Best Model
Best Practices
- Always inspect class distribution before training.
- Use stratified splitting to preserve class ratios.
- Evaluate models using Precision, Recall, F1-Score, and PR-AUC.
- Start with class weights before exploring more advanced techniques.
- Apply SMOTE only to training data.
- Experiment with threshold tuning.
- Use cross-validation to verify results.
- Compare multiple approaches instead of assuming one technique will always win.
Conclusion
Handling imbalanced datasets in Python is not about choosing a single technique but about selecting the approach that best fits your data and objectives. While class weights, SMOTE, and threshold tuning each offer distinct advantages, their effectiveness depends on the problem being solved. By combining proper evaluation metrics, validation strategies, and balancing techniques, you can build machine learning models that perform reliably on the minority class.
FAQs
1. What is an imbalanced dataset in Python?
An imbalanced dataset occurs when one class contains significantly more samples than another class, making it difficult for machine learning models to learn minority-class patterns effectively.
2. Why is accuracy misleading for imbalanced datasets?
Accuracy can appear high even when a model completely fails to identify minority-class samples. Metrics such as Recall and F1-Score provide a more reliable evaluation.
3. What is SMOTE in machine learning?
SMOTE is an oversampling technique that generates synthetic minority-class samples to balance training data and improve model performance.
4. When should I use class weights?
Class weights are ideal when you want a simple solution that does not modify the dataset, and your algorithm supports weighted learning.
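5. Should SMOTE be applied before train-test splitting?
No. Split the data first, then apply SMOTE only to the training set. Resampling before the split lets synthetic points derived from test samples leak into training, which inflates evaluation scores.
6. Which metric is best for imbalanced datasets?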
There is no single best metric. Precision, Recall, F1-Score, and PR-AUC are commonly used depending on business objectives.


