Decision Tree in Data Science: A Step-by-Step Tutorial

By Roopa Dharshini

Data science uses different types of algorithms for different use cases. One of the most commonly used supervised learning algorithms is the decision tree. This blog is a guide to getting started with decision trees in data science.

In this blog, we will look at what a decision tree is, its applications, how it works, and how to implement one using Python. Let’s get started!

Table of contents


  1. What is a Decision Tree in Data Science?
  2. Key Applications of Decision Trees in Data Science
  3. How Decision Trees Work
    • Working of Decision Tree
  4. Implementing Decision Trees in Data Science: Step-by-Step
    • Step 1: Import Libraries
    • Step 2: Create / Import Dataset
    • Step 3: Define Features and Label
    • Step 4: Train the Model
    • Step 5: Visualize the data
    • Step 6: Test the Example
  5. Getting started with Guvi
  6. Conclusion
  7. FAQs
    • Q1. Do I need coding for data science?
    • Q2. Is DSA required for data science?
    • Q3. Which language is best for data science?
    • Q4. What industries hire data scientists?
    • Q5. Is data science a good career for freshers?

What is a Decision Tree in Data Science?

A decision tree is a supervised learning algorithm. It has a flowchart-like, branching structure that maps a series of decisions to their possible outcomes. Decision trees can be used for both classification and regression tasks.

Key Applications of Decision Trees in Data Science

Decision trees in data science are mainly useful in handling classification problems like classifying objects into distinct categories based on their features, and regression problems like predicting a continuous output based on one or more input variables. Some of the key applications of decision trees are:

  • Fraud Detection: It helps in detecting fraudulent transactions by analyzing patterns and identifying anomalies.
  • Credit Scoring: It calculates the credit scores and assesses the creditworthiness of loan applicants.
  • Product Recommendations: It is used to suggest products to customers based on their previous purchase history.
  • Diagnosis: It can assist doctors in diagnosing diseases by analyzing patient data and medical history.
  • Inventory Management: It helps in tracking and managing the inventory by predicting sales trends and optimizing stock levels.

How Decision Trees Work

This section covers the structure and key components of the decision tree and its working step by step. The structure of the decision tree is visualized in the form of a tree with its nodes, branches, and leaves. Some of the key components of the decision tree are:

  • Nodes: These are the building blocks of the tree. Each node holds a subset of the data.
    • Root Node: It represents the entire dataset and is present at the top of the tree.
    • Internal Nodes: Each one tests a feature to decide which branch to follow next.
  • Branches: These are the lines that connect one node to another, indicating the flow of decisions.
  • Leaves (Terminal Nodes): These are the final outcomes of the series of decisions.
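
The components listed above can be sketched as a small data structure. This is a minimal illustration, not how any particular library stores its trees. The `Node` class, its field names, and the `count_leaves` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure for illustration)."""
    question: Optional[str] = None   # set on the root and on internal nodes
    yes: Optional["Node"] = None     # branch followed when the answer is yes
    no: Optional["Node"] = None      # branch followed when the answer is no
    outcome: Optional[str] = None    # set only on leaves

    def is_leaf(self) -> bool:
        return self.outcome is not None


def count_leaves(node: Node) -> int:
    """Count the terminal nodes reachable from `node`."""
    if node.is_leaf():
        return 1
    return count_leaves(node.yes) + count_leaves(node.no)


# A root node with two branches, each ending in a leaf.
root = Node(
    question="Exercises in the morning?",
    yes=Node(outcome="Fit"),
    no=Node(outcome="Unfit"),
)

print(root.is_leaf())      # False: the root asks a question
print(count_leaves(root))  # 2
```

Here the root is the only node that asks a question, the `yes` and `no` fields play the role of branches, and the two nodes with an `outcome` are the leaves.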

Working of Decision Tree

To understand the working of decision trees, let us consider the following example for a classification problem. The diagram below shows a decision tree to find whether a person is fit or not by taking a series of decisions. As a first step, let’s look at the given conditions. 

[Figure: How a decision tree works]
  • The first condition checks whether the person is below 30 years old.
  • The second condition depends on that answer: for a person below 30, the tree checks whether they eat a lot of pizza; otherwise, it checks whether they exercise in the morning.
  • Based on these two conditions, the decision tree predicts whether the person is fit or unfit.

Example 1: Consider a person who is 35 years old and exercises in the morning. Since the person is not below 30, the tree checks the exercise condition, follows the path below, and predicts that the person is fit.

[Figure: Example 1]

Example 2: Consider a person who is 28 years old and eats lots of pizzas. Since the person is below 30, the tree checks the pizza condition, follows the path below, and predicts that the person is unfit.

[Figure: Example 2]
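
The two example paths above can be written out as plain if/else rules. The sketch below is hand-written, not learned from data, and it assumes the tree structure implied by the two examples: people under 30 are judged on whether they eat a lot of pizza, and people aged 30 or over on whether they exercise in the morning. The function name `predict_fitness` and its parameters are illustrative.

```python
def predict_fitness(age: int, eats_lots_of_pizza: bool, exercises_in_morning: bool) -> str:
    """Follow the fit/unfit tree by hand (assumed structure, see the text above)."""
    if age < 30:
        # Younger branch: the deciding question is about pizza.
        return "Unfit" if eats_lots_of_pizza else "Fit"
    # Older branch: the deciding question is about morning exercise.
    return "Fit" if exercises_in_morning else "Unfit"


# Example 1: 35 years old, exercises in the morning
print(predict_fitness(35, eats_lots_of_pizza=False, exercises_in_morning=True))  # Fit

# Example 2: 28 years old, eats lots of pizzas
print(predict_fitness(28, eats_lots_of_pizza=True, exercises_in_morning=False))  # Unfit
```

A trained decision tree does the same kind of thing, except that the questions and thresholds are chosen automatically from the training data instead of being written by hand.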

Implementing Decision Trees in Data Science: Step-by-Step

In this section, we will see the implementation of a decision tree in data science using Python. 

Step 1: Import Libraries

For implementing a decision tree in Python, you need to import the decision tree model from scikit-learn, pandas for manipulating the data, and matplotlib for visualizing the data.

from sklearn.tree import DecisionTreeClassifier, plot_tree
import pandas as pd
import matplotlib.pyplot as plt

Step 2: Create / Import Dataset

The next step is to import the dataset into the code. For the above classification example to find whether the person is fit or not, we will create our sample dataset.

data = {
    'Age': [35, 28, 25, 40, 29, 32],
    'Eats_Pizza': [0, 1, 1, 0, 0, 0],
    'Exercises': [1, 0, 1, 1, 0, 1],
    'Fit': [1, 0, 1, 1, 1, 1]
}

df = pd.DataFrame(data)

Step 3: Define Features and Label

The third step is to define the necessary input features that help predict the outcome. The label represents the output column.

X = df[['Age', 'Eats_Pizza', 'Exercises']]
y = df['Fit']

Step 4: Train the Model

The fourth step is to train the decision tree model. Since our example is a classification problem, we will train a decision tree classifier on the sample dataset.

clf = DecisionTreeClassifier(criterion='entropy')  # or use 'gini'
clf.fit(X, y)
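
The `criterion` argument selects the impurity measure that the tree tries to reduce at each split. As a rough illustration, the sketch below computes both measures by hand for the `Fit` column of the sample dataset (five 1s and one 0), using their standard definitions. The helper names `entropy` and `gini` are made up for this example and are not scikit-learn functions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


fit_labels = [1, 0, 1, 1, 1, 1]  # the 'Fit' column from the sample dataset

print(round(entropy(fit_labels), 3))  # 0.65
print(round(gini(fit_labels), 3))     # 0.278
```

Both measures are zero when every label in a node is the same, which is why a split that produces pure groups is preferred.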

Step 5: Visualize the data

The next step is visualization. This step is optional, but it helps you explain the model and your reasoning to teammates and stakeholders. Here, we plot the trained decision tree that predicts whether a person is fit.

plt.figure(figsize=(10, 6))
plot_tree(clf, feature_names=['Age', 'Eats_Pizza', 'Exercises'], class_names=['Unfit', 'Fit'], filled=True)
plt.title("Decision Tree to Predict if a Person is Fit")
plt.show()
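
If a plot window is not available, for example on a remote server, the same tree can be printed as indented text. The sketch below rebuilds the sample dataset so that it runs on its own and uses scikit-learn's `export_text`. The `random_state=0` argument is an assumption added here so that repeated runs produce the same tree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same sample dataset as in Step 2
df = pd.DataFrame({
    'Age': [35, 28, 25, 40, 29, 32],
    'Eats_Pizza': [0, 1, 1, 0, 0, 0],
    'Exercises': [1, 0, 1, 1, 0, 1],
    'Fit': [1, 0, 1, 1, 1, 1],
})
feature_names = ['Age', 'Eats_Pizza', 'Exercises']

# random_state is an added assumption, used only for repeatable output
clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(df[feature_names], df['Fit'])

# One line per node: split rules for internal nodes, predicted class for leaves
rules = export_text(clf, feature_names=feature_names)
print(rules)
```

With only six rows, several splits are equally good, so the learned tree may not match the hand-drawn fit/unfit diagram.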

Step 6: Test the Example

The last step is to test the model to find whether it is providing the expected results or not. For the given problem, we will test the above examples 1 & 2 and verify the predicted result. Here, the model will predict either 1 or 0. ‘1’ represents the person is “Fit” and ‘0’ represents the person is “Unfit”.

# Each example is wrapped in a DataFrame with the training column names,
# so scikit-learn does not warn about missing feature names.

# Example 1: 35 years old, exercises in the morning
example1 = pd.DataFrame([[35, 0, 1]], columns=X.columns)  # Expected: Fit
print("Example 1 Prediction (35 y/o, exercises):", "Fit" if clf.predict(example1)[0] == 1 else "Unfit")

# Example 2: 28 years old, eats a lot of pizzas
example2 = pd.DataFrame([[28, 1, 0]], columns=X.columns)  # Expected: Unfit
print("Example 2 Prediction (28 y/o, eats pizza):", "Fit" if clf.predict(example2)[0] == 1 else "Unfit")
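
A further sanity check is to score the model on the rows it was trained on. The sketch below is self-contained and assumes the same sample dataset; `random_state=0` is added only for repeatability. Because this measures training accuracy, it only confirms that the tree has fit the six sample rows and says nothing about how it would perform on new people.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Same sample dataset as in Step 2
df = pd.DataFrame({
    'Age': [35, 28, 25, 40, 29, 32],
    'Eats_Pizza': [0, 1, 1, 0, 0, 0],
    'Exercises': [1, 0, 1, 1, 0, 1],
    'Fit': [1, 0, 1, 1, 1, 1],
})
X = df[['Age', 'Eats_Pizza', 'Exercises']]
y = df['Fit']

# random_state is an added assumption, used only for repeatable output
clf = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)

# Fraction of training rows that the tree labels correctly
train_accuracy = clf.score(X, y)
print("Training accuracy:", train_accuracy)

# Rows where the prediction disagrees with the label (empty if the fit is perfect)
mismatches = df[clf.predict(X) != y]
print("Misclassified rows:", len(mismatches))
```

For a real project, the usual practice is to hold out part of the data and score the model on rows it has not seen.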

Getting started with Guvi

To learn about each data science subject, you do not need to go and look for resources on the internet. We provide you with the best roadmap and a bootcamp course that covers all the necessary data science algorithms with detailed doubt-clearing sessions.

Check out Guvi’s FREE E-book on Master the Art of Data Science – A Complete Guide. It provides a detailed data science roadmap and guide to start your career in data science. This is a great way to start your career. 

Incorporating emerging trends into data science will fuel the rapid growth of this field, unlocking countless opportunities for aspiring data scientists. If you’re ready to take advantage of this moment and launch your career, enroll in Guvi’s industry-recognized, certified Data Science course today.

Conclusion

In this blog, we explored the structure, components, key applications, working, and step-by-step implementation of decision trees in data science using Python. By translating real-world logic into a model, we demonstrated how decision trees make predictions through clear, rule-based paths using a simple classification problem to predict whether a person is fit or unfit. Happy learning!

FAQs

Q1. Do I need coding for data science?

Yes, coding is an essential skill for data science. Being comfortable with coding is crucial for tasks like data manipulation, building machine learning models, and automating processes. Python and R are the most commonly used programming languages in data science, and they have extensive libraries to make your job easier.

Q2. Is DSA required for data science?

Yes, Data Structures and Algorithms (DSA) are important for data science, especially for machine learning and data engineering. DSA helps with data management, analysis, and problem solving. It can also help you write efficient code. You can expect DSA-based coding challenges during interviews.

Q3. Which language is best for data science?

The most popular and widely used language for data science is Python. It’s beginner-friendly, has a rich ecosystem of libraries (like Pandas and NumPy), and is great for tasks like data cleaning, analysis, and machine learning. For those working on large-scale systems, Scala and Julia are gaining popularity as well, especially for machine learning and high-performance tasks.

Q4. What industries hire data scientists?

Data scientists are in demand everywhere, from tech giants like Google and Amazon to healthcare, finance, retail, and entertainment. Almost every industry is tapping into data to drive smarter decisions.

Q5. Is data science a good career for freshers?

Absolutely! With the booming demand for data-driven insights, freshers with strong analytical skills and a passion for learning can find opportunities and grow rapidly in the field.