
Decision Tree in Data Science: A Step-by-Step Tutorial
Jul 19, 2025 (Last Updated)
Data science uses many types of algorithms for different use cases. One of the most widely used supervised learning algorithms is the decision tree. This blog is a practical guide to getting started with decision trees in data science.
In this blog, we will look at what a decision tree is, where it is used, how it works, and how to implement one in Python. Let's get started!
Table of contents
- What is a Decision Tree in Data Science?
- Key Applications of Decision Trees in Data Science
- How Decision Trees Work
- Working of Decision Tree
- Implementing Decision Trees in Data Science: Step-by-Step
- Step 1: Import Libraries
- Step 2: Create / Import Dataset
- Step 3: Define Features and Label
- Step 4: Train the Model
- Step 5: Visualize the data
- Step 6: Test the Example
- Getting started with Guvi
- Conclusion
- FAQs
- Q1. Do I need coding for data science?
- Q2. Is DSA required for data science?
- Q3. Which language is best for data science?
- Q4. What industries hire data scientists?
- Q5. Is data science a good career for freshers?
What is a Decision Tree in Data Science?
A decision tree is a supervised learning algorithm used in data science. It follows a flowchart-like, branching structure that models a series of decisions and their possible outcomes. Decision trees can be used for both classification and regression tasks.
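The difference between the two task types shows up in scikit-learn's two decision tree estimators. The sketch below fits `DecisionTreeClassifier` on a categorical target and `DecisionTreeRegressor` on a numeric target. The tiny one-feature dataset is made up purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Made-up toy data with a single input feature
X = [[1], [2], [3], [4]]

# Classification: the target is a category
y_class = ["small", "small", "large", "large"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
print(clf.predict([[1.5], [3.5]]))  # ['small' 'large']

# Regression: the target is a continuous number
y_reg = [1.0, 1.0, 3.0, 3.0]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_reg)
print(reg.predict([[1.5], [3.5]]))  # [1. 3.]
```

Both models learn the same split (X <= 2.5). The classifier's leaves hold a class label, and the regressor's leaves hold a numeric value.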
Key Applications of Decision Trees in Data Science

Decision trees in data science are mainly useful in handling classification problems like classifying objects into distinct categories based on their features, and regression problems like predicting a continuous output based on one or more input variables. Some of the key applications of decision trees are:
- Fraud Detection: It helps detect fraudulent transactions by analyzing transaction patterns and identifying anomalies.
- Credit Scoring: It can be used to estimate credit scores and assess the creditworthiness of loan applicants.
- Product Recommendations: It can suggest products to customers based on their previous purchase history.
- Diagnosis: It can assist doctors in diagnosing diseases by analyzing patient data and medical history.
- Inventory Management: It supports inventory management by predicting sales trends, which helps optimize stock levels.
How Decision Trees Work

This section covers the structure and key components of a decision tree and then walks through how it works, step by step. As the name suggests, a decision tree is drawn as a tree made up of nodes, branches, and leaves. Its key components are:
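- Nodes: These are the building blocks of the tree. Each node holds a subset of the data and either tests a condition or gives an outcome.
- Root Node: It represents the entire dataset and sits at the top of the tree. The first split is made here.
- Internal Nodes: Each internal node tests a condition on a feature (for example, "Is the person younger than 30?") and sends the data down one of its branches.
- Branches: These are the lines that connect a node to its child nodes. Each branch represents one possible answer to a node's test and shows the flow of decisions.
- Leaves (Terminal Nodes): These hold the final outcome (a class label or a predicted value) of a series of decisions.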
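In scikit-learn, these components appear directly in a fitted tree's `tree_` attribute. The sketch below fits a classifier on the toy fitness dataset built in the implementation section below. It then labels every node as root, internal, or leaf, based on the fact that scikit-learn stores -1 as both child ids of a leaf. The `node_role` helper is a hypothetical name used only for this illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Same toy fitness dataset as in the implementation section (1 = yes, 0 = no)
data = {
    "Age": [35, 28, 25, 40, 29, 32],
    "Eats_Pizza": [0, 1, 1, 0, 0, 0],
    "Exercises": [1, 0, 1, 1, 0, 1],
    "Fit": [1, 0, 1, 1, 1, 1],
}
df = pd.DataFrame(data)
X = df[["Age", "Eats_Pizza", "Exercises"]]
y = df["Fit"]
clf = DecisionTreeClassifier(criterion="entropy", random_state=42).fit(X, y)

tree = clf.tree_


def node_role(node_id: int) -> str:
    """Hypothetical helper: label a node as root, internal, or leaf."""
    # Leaves have no children, so both child arrays hold -1 for them.
    if tree.children_left[node_id] == tree.children_right[node_id]:
        return "leaf"
    return "root" if node_id == 0 else "internal"


for node_id in range(tree.node_count):
    role = node_role(node_id)
    if role == "leaf":
        # tree.value holds the class distribution at the node; take the majority class
        predicted = clf.classes_[tree.value[node_id][0].argmax()]
        print(f"node {node_id}: {role}, predicts Fit={predicted}")
    else:
        feature = X.columns[tree.feature[node_id]]
        print(f"node {node_id}: {role}, tests {feature} <= {tree.threshold[node_id]:.2f}")
```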
Working of Decision Tree
To understand how a decision tree works, consider the following classification example: a decision tree that predicts whether a person is fit or unfit by taking a series of decisions. As a first step, let's look at the conditions it checks.

- The first decision checks whether the person is younger than 30.
- The second decision depends on the answer to the first. For a person under 30, the tree checks whether they eat a lot of pizzas. For a person aged 30 or older, it checks whether they exercise in the morning.
- Based on these two decisions, the decision tree predicts whether the person is fit or unfit.
Example 1: Consider a person who is 35 years old and exercises in the morning. Because they are not under 30, the tree next checks whether they exercise in the morning. They do, so the tree predicts that the person is Fit.

Example 2: Consider a person who is 28 years old and eats a lot of pizzas. Because they are under 30, the tree next checks whether they eat a lot of pizzas. They do, so the tree predicts that the person is Unfit.
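The walk-through above can be sketched in plain Python as a small tree of node objects plus a loop that follows branches from the root to a leaf. The tree structure is inferred from the conditions and the two examples above. The class and function names (`Leaf`, `DecisionNode`, `predict`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """Terminal node: holds the final outcome of a series of decisions."""
    label: str


@dataclass
class DecisionNode:
    """Root or internal node: tests one condition and follows a branch."""
    question: str
    test: Callable[[dict], bool]
    yes: "Node"
    no: "Node"


Node = Union[Leaf, DecisionNode]


def predict(node: Node, person: dict) -> tuple[str, list[str]]:
    """Walk from the root to a leaf, recording each decision taken."""
    path = []
    while isinstance(node, DecisionNode):
        answer = node.test(person)
        path.append(f"{node.question} {'Yes' if answer else 'No'}")
        node = node.yes if answer else node.no
    return node.label, path


# Hand-built version of the fitness tree (structure assumed from the text above)
fitness_tree = DecisionNode(
    question="Age < 30?",
    test=lambda p: p["age"] < 30,
    yes=DecisionNode("Eats a lot of pizzas?", lambda p: p["eats_pizza"],
                     yes=Leaf("Unfit"), no=Leaf("Fit")),
    no=DecisionNode("Exercises in the morning?", lambda p: p["exercises"],
                    yes=Leaf("Fit"), no=Leaf("Unfit")),
)

example1 = {"age": 35, "eats_pizza": False, "exercises": True}
example2 = {"age": 28, "eats_pizza": True, "exercises": False}

for name, person in [("Example 1", example1), ("Example 2", example2)]:
    label, path = predict(fitness_tree, person)
    print(name, ":", " -> ".join(path), "=>", label)
```

Each prediction touches only the nodes on a single root-to-leaf path. Predicting with a trained decision tree works the same way.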

Implementing Decision Trees in Data Science: Step-by-Step

In this section, we will see the implementation of a decision tree in data science using Python.
Step 1: Import Libraries
To implement a decision tree in Python, import the DecisionTreeClassifier model and the plot_tree function from scikit-learn, pandas for handling the data, and matplotlib for plotting the tree.
from sklearn.tree import DecisionTreeClassifier, plot_tree
import pandas as pd
import matplotlib.pyplot as plt
Step 2: Create / Import Dataset
The next step is to load a dataset. For the fitness classification example above, we will create a small sample dataset ourselves. The binary columns use 1 for "yes" and 0 for "no".
data = {
    'Age': [35, 28, 25, 40, 29, 32],
    'Eats_Pizza': [0, 1, 1, 0, 0, 0],  # 1 = eats a lot of pizzas
    'Exercises': [1, 0, 1, 1, 0, 1],   # 1 = exercises in the morning
    'Fit': [1, 0, 1, 1, 1, 1]          # label: 1 = Fit, 0 = Unfit
}
df = pd.DataFrame(data)
Step 3: Define Features and Label
The third step is to define the input features (X) that the model uses to make its prediction and the label (y), which is the output column the model should learn to predict.
X = df[['Age', 'Eats_Pizza', 'Exercises']]
y = df['Fit']
Step 4: Train the Model
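The fourth step is to train the decision tree model. Since this is a classification problem, we use DecisionTreeClassifier. The criterion parameter sets how the quality of each split is measured: 'entropy' uses information gain, while 'gini' (the default) uses Gini impurity.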
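# Use criterion='gini' instead to split on Gini impurity.
# random_state fixes how ties between equally good splits are broken, so results are reproducible.
clf = DecisionTreeClassifier(criterion='entropy', random_state=42)
clf.fit(X, y)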
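To see what the two criteria compute, here is a small worked sketch in plain Python. It uses the standard definitions: entropy H = -Σ p_i log2(p_i) and Gini impurity G = 1 - Σ p_i^2, where p_i is the fraction of samples in class i. Information gain is the parent's entropy minus the size-weighted entropy of the child nodes. The helper names are hypothetical, and scikit-learn computes these quantities internally rather than through these functions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# 'Fit' and 'Exercises' columns from the sample dataset
fit = [1, 0, 1, 1, 1, 1]
exercises = [1, 0, 1, 1, 0, 1]

# Split the labels on Exercises (0 = no, 1 = yes)
no_exercise = [f for f, e in zip(fit, exercises) if e == 0]  # [0, 1]
exercise = [f for f, e in zip(fit, exercises) if e == 1]     # [1, 1, 1, 1]

print(f"Root entropy: {entropy(fit):.3f}")  # 0.650
print(f"Root Gini:    {gini(fit):.3f}")     # 0.278
print(f"Gain (Exercises): {information_gain(fit, [no_exercise, exercise]):.3f}")  # 0.317
```

At each node, the tree picks the split with the highest gain (or the largest drop in Gini impurity). On this tiny dataset, several splits happen to score equally well. This is why fixing random_state matters for reproducibility.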
Step 5: Visualize the data
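The next step is visualization. This step is optional, but a picture of the trained tree makes it easier to explain your model and reasoning to others, including stakeholders. Here, plot_tree draws the decision tree that predicts whether a person is fit. Each box shows the node's split condition (for internal nodes), entropy, number of samples, and predicted class.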
plt.figure(figsize=(10, 6))
plot_tree(clf,
          feature_names=['Age', 'Eats_Pizza', 'Exercises'],
          class_names=['Unfit', 'Fit'],
          filled=True)
plt.title("Decision Tree to Predict if a Person is Fit")
plt.show()
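If you are working in an environment without a display (for example, a terminal or a log file), scikit-learn's `export_text` function prints the same learned rules as indented text. The sketch below rebuilds the sample model so that it runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = {
    "Age": [35, 28, 25, 40, 29, 32],
    "Eats_Pizza": [0, 1, 1, 0, 0, 0],
    "Exercises": [1, 0, 1, 1, 0, 1],
    "Fit": [1, 0, 1, 1, 1, 1],
}
df = pd.DataFrame(data)
X = df[["Age", "Eats_Pizza", "Exercises"]]
y = df["Fit"]
clf = DecisionTreeClassifier(criterion="entropy", random_state=42).fit(X, y)

# export_text returns the tree's if/else rules as a plain-text string
rules = export_text(clf, feature_names=list(X.columns))
print(rules)
```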
Step 6: Test the Example
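The last step is to test the model and check whether it gives the expected results. For this problem, we pass Examples 1 and 2 from above to the model and compare its predictions with the expected outcomes. The model predicts either 1 or 0, where 1 means "Fit" and 0 means "Unfit". Example 2 (28 years old, eats pizza, does not exercise) also appears in the training data, so this is only a sanity check. To measure how well a model generalizes, evaluate it on data it was not trained on.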
# Build inputs as DataFrames with the same column names used during training
# Example 1: 35 years old, does not eat a lot of pizzas, exercises in the morning
example1 = pd.DataFrame([[35, 0, 1]], columns=X.columns)  # Expected: Fit
print("Example 1 Prediction (35 y/o, exercises):", "Fit" if clf.predict(example1)[0] == 1 else "Unfit")

# Example 2: 28 years old, eats a lot of pizzas, does not exercise
example2 = pd.DataFrame([[28, 1, 0]], columns=X.columns)  # Expected: Unfit
print("Example 2 Prediction (28 y/o, eats pizza):", "Fit" if clf.predict(example2)[0] == 1 else "Unfit")
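To see which path each example takes through the trained tree, as in the hand-traced Examples 1 and 2 earlier, you can combine scikit-learn's `decision_path` and `apply` methods with the `tree_` arrays. This is a minimal sketch that rebuilds the sample model so it runs on its own, and `trace_paths` is a hypothetical helper name. The dataset is tiny and several splits score equally, so the learned tree may ask different questions than the hand-drawn one even though it reaches the same predictions for these two examples.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = {
    "Age": [35, 28, 25, 40, 29, 32],
    "Eats_Pizza": [0, 1, 1, 0, 0, 0],
    "Exercises": [1, 0, 1, 1, 0, 1],
    "Fit": [1, 0, 1, 1, 1, 1],
}
df = pd.DataFrame(data)
X = df[["Age", "Eats_Pizza", "Exercises"]]
y = df["Fit"]
clf = DecisionTreeClassifier(criterion="entropy", random_state=42).fit(X, y)

examples = pd.DataFrame([[35, 0, 1], [28, 1, 0]], columns=X.columns)


def trace_paths(model, samples):
    """For each row of `samples`, list the split tests on its root-to-leaf path."""
    tree = model.tree_
    indicator = model.decision_path(samples)  # sparse matrix: rows = samples, columns = nodes
    leaf_ids = model.apply(samples)           # id of the leaf each sample ends in
    predictions = model.predict(samples)
    results = []
    for i in range(len(samples)):
        node_ids = indicator.indices[indicator.indptr[i]:indicator.indptr[i + 1]]
        steps = []
        for node_id in node_ids:
            if node_id == leaf_ids[i]:
                continue  # the leaf has no test; it only holds the prediction
            col = tree.feature[node_id]
            value = samples.iloc[i, col]
            threshold = tree.threshold[node_id]
            op = "<=" if value <= threshold else ">"
            steps.append(f"{samples.columns[col]} = {value} ({op} {threshold:.2f})")
        results.append((steps, "Fit" if predictions[i] == 1 else "Unfit"))
    return results


for name, (steps, outcome) in zip(["Example 1", "Example 2"], trace_paths(clf, examples)):
    print(name, ":", " -> ".join(steps), "=>", outcome)
```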
Getting started with Guvi
To learn about each data science subject, you do not need to go and look for resources on the internet. We provide you with the best roadmap and a bootcamp course that covers all the necessary data science algorithms with detailed doubt-clearing sessions.
Check out Guvi's FREE e-book, Master the Art of Data Science – A Complete Guide. It provides a detailed roadmap and guidance for starting your career in data science.
Incorporating emerging trends into data science will fuel the rapid growth of this field, unlocking countless opportunities for aspiring data scientists. If you’re ready to take advantage of this moment and launch your career, enroll in Guvi’s industry-recognized, certified Data Science course today.
Conclusion
In this blog, we explored the structure, components, key applications, working, and step-by-step implementation of decision trees in data science using Python. By translating real-world logic into a model, we demonstrated how decision trees make predictions through clear, rule-based paths using a simple classification problem to predict whether a person is fit or unfit. Happy learning!
FAQs
Q1. Do I need coding for data science?
Yes, coding is an essential skill for data science. Being comfortable with coding is crucial for tasks like data manipulation, building machine learning models, and automating processes. Python and R are the most commonly used programming languages in data science, and both have extensive libraries that make your job easier.
Q2. Is DSA required for data science?
Yes, Data Structures and Algorithms (DSA) are important for data science, especially for machine learning and data engineering. DSA helps with data management, analysis, and problem solving, and it helps you write efficient code. You can also expect DSA-based coding challenges in many data science interviews.
Q3. Which language is best for data science?
The most popular and widely used language for data science is Python. It is beginner-friendly, has a rich ecosystem of libraries (such as pandas and NumPy), and is well suited to data cleaning, analysis, and machine learning. For large-scale systems, Scala (often used with Apache Spark) is popular for big data processing, and Julia is gaining popularity for high-performance numerical computing.
Q4. What industries hire data scientists?
Data scientists are in demand everywhere, from tech giants like Google and Amazon to healthcare, finance, retail, and entertainment. Almost every industry uses data to drive smarter decisions.
Q5. Is data science a good career for freshers?
Absolutely! With growing demand for data-driven insights, freshers with strong analytical skills and a passion for learning can find good opportunities and grow quickly in the field.