Pandas DataFrame describe() Method

In this tutorial, we will learn about the Python pandas DataFrame.describe() method. It generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values.

  • For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns.
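  • For numeric data, the result’s index will include count, mean, std, min, and max, as well as the lower, 50th, and upper percentiles. By default, the lower percentile is the 25th and the upper percentile is the 75th.
  • For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value, and freq is the number of times it occurs.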
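To see the object-data case in isolation, the sketch below describes a small Series of subject names (the values are made up for illustration). Here top should be Science, the most common value, and freq should be 2, the number of times it occurs.

```python
import pandas as pd

# Illustrative values only: a Series of strings is not numeric,
# so describe() reports count, unique, top, and freq instead of mean/std.
subjects = pd.Series(['Science', 'Science', 'Maths'])
summary = subjects.describe()
print(summary)
```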

The syntax of the DataFrame.describe() method is shown below.

Syntax

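DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)

Parameters

  • percentiles: a list-like of numbers between 0 and 1 giving the percentiles to include in the output. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
  • include: 'all', a list-like of dtypes, or None (default). With None, all numeric columns are summarized; if the DataFrame has no numeric columns, the object and categorical columns are summarized instead. Ignored for Series.
  • exclude: a list-like of dtypes, or None (default), naming data types to omit from the result. Ignored for Series.
  • datetime_is_numeric: whether to treat datetime dtypes as numeric. In pandas 2.0 and later this parameter no longer exists, and datetime data is always summarized as numeric.

The method returns a Series or DataFrame of summary statistics.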
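The percentiles parameter is not used in the examples that follow, so here is a minimal sketch of it. The one-column DataFrame is a hypothetical stand-in; the percentile rows in the result are labeled 10%, 50%, and 90% instead of 25%, 50%, and 75%.

```python
import pandas as pd

# Hypothetical one-column DataFrame reusing the marks from the examples below
df = pd.DataFrame({'Marks': [90, 85, 75]})

# Ask for the 10th, 50th, and 90th percentiles instead of the default quartiles
stats = df.describe(percentiles=[0.1, 0.5, 0.9])
print(stats)
```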

Example 1: Describing a DataFrame using the DataFrame.describe() Method

The following example describes a DataFrame using the DataFrame.describe() method. By default, only numeric columns are summarized, so the Name and Subject columns are left out.

import pandas as pd
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df.describe())

Once we run the program, we get the following output.

Output:

          Roll No      Marks
count    3.000000   3.000000
mean   101.333333  83.333333
std      1.527525   7.637626
min    100.000000  75.000000
25%    100.500000  80.000000
50%    101.000000  85.000000
75%    102.000000  87.500000
max    103.000000  90.000000
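Because describe() returns an ordinary DataFrame whose rows are named after the statistics, individual values can be looked up with .loc. The sketch below reuses the DataFrame from Example 1.

```python
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

# describe() returns a DataFrame: rows are statistics, columns are the numeric columns
stats = df.describe()

# Look up single values by (statistic, column) label
mean_marks = stats.loc['mean', 'Marks']
median_roll_no = stats.loc['50%', 'Roll No']
print(mean_marks, median_roll_no)
```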

Example 2: Describing all columns of a DataFrame using the DataFrame.describe() Method

The following example describes all columns of a DataFrame, regardless of data type, by passing include='all' to the DataFrame.describe() method. Statistics that do not apply to a column are shown as NaN.

import pandas as pd
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df.describe(include='all'))

Once we run the program, we get the following output.

Output:

            Name     Roll No  Subject      Marks
count          3    3.000000        3   3.000000
unique         3         NaN        2        NaN
top     Abhishek         NaN  Science        NaN
freq           1         NaN        2        NaN
mean         NaN  101.333333      NaN  83.333333
std          NaN    1.527525      NaN   7.637626
min          NaN  100.000000      NaN  75.000000
25%          NaN  100.500000      NaN  80.000000
50%          NaN  101.000000      NaN  85.000000
75%          NaN  102.000000      NaN  87.500000
max          NaN  103.000000      NaN  90.000000
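The include parameter also accepts a list of dtypes instead of 'all'. As a sketch, passing include=[np.number] on the same DataFrame should reproduce the default, numeric-only summary from Example 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

# include takes a list of dtypes; np.number selects every numeric column
numeric_stats = df.describe(include=[np.number])
print(numeric_stats)
```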

Example 3: Describing a specific column of the DataFrame using the DataFrame.describe() Method

The following example describes a single column of a DataFrame, accessed as an attribute (df.Marks). Because a single column is a Series, the result is also a Series.

import pandas as pd
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df.Marks.describe())

Once we run the program, we get the following output.

Output:

count     3.000000
mean     83.333333
std       7.637626
min      75.000000
25%      80.000000
50%      85.000000
75%      87.500000
max      90.000000
Name: Marks, dtype: float64
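Attribute access such as df.Marks only works when the column name is a valid Python identifier. A column like Roll No contains a space, so it has to be selected with brackets, as in this sketch using the same DataFrame.

```python
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

# 'Roll No' contains a space, so attribute access is not possible; use brackets instead
roll_no_stats = df['Roll No'].describe()
print(roll_no_stats)
```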

Example 4: Describing a DataFrame excluding numeric columns using the DataFrame.describe() Method

The following example shows how to describe a DataFrame excluding numeric columns by passing exclude=np.number to the DataFrame.describe() method. Because np.number comes from NumPy, the program must also import numpy.

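import numpy as np
import pandas as pd
df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85], ['Chetan', 103, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
# np.number matches all numeric dtypes, so only Name and Subject are described
print(df.describe(exclude=np.number))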

Once we run the program, we get the following output.

Output:

            Name  Subject
count          3        3
unique         3        2
top     Abhishek  Science
freq           1        2
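Selecting the non-numeric columns yourself and calling describe() on them should give the same table as exclude=np.number, since a DataFrame with no numeric columns is summarized with count, unique, top, and freq by default. The sketch below compares the two approaches on the same DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

# Approach 1: let describe() drop the numeric columns
by_exclude = df.describe(exclude=np.number)

# Approach 2: select the non-numeric columns first, then describe them
by_selection = df[['Name', 'Subject']].describe()
print(by_selection)
```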

Example 5: Describing a DataFrame containing None values using the DataFrame.describe() Method

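The following example shows how the DataFrame.describe() method handles None values. In numeric columns, pandas stores None as NaN, and NaN values are excluded from the statistics, so count reports only the non-missing values. The std of Roll No is NaN because a sample standard deviation cannot be computed from a single value.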

import pandas as pd
df = pd.DataFrame([['Abhishek', 101, 'Science', None], ['Anurag', None, 'Science', 85], ['Chetan', None, 'Maths', 75]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
print(df.describe())

Once we run the program, we get the following output.

Output:

       Roll No      Marks
count      1.0   2.000000
mean     101.0  80.000000
std        NaN   7.071068
min      101.0  75.000000
25%      101.0  77.500000
50%      101.0  80.000000
75%      101.0  82.500000
max      101.0  85.000000
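To confirm how the missing values were handled, the sketch below rebuilds the Example 5 DataFrame, compares the count row with the number of non-missing values in each numeric column, and shows that the standard deviation of a column with a single value is NaN (pandas uses the sample standard deviation, ddof=1, by default).

```python
import pandas as pd

df = pd.DataFrame([['Abhishek', 101, 'Science', None], ['Anurag', None, 'Science', 85],
                   ['Chetan', None, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

stats = df.describe()

# 'count' is the number of non-missing values per column, not the number of rows
non_missing = df[['Roll No', 'Marks']].notna().sum()
print(stats.loc['count'])
print(non_missing)

# With only one non-missing value, the sample standard deviation is undefined (NaN)
print(df['Roll No'].std())
```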

Conclusion

In this tutorial, we learned the Python pandas DataFrame.describe() method. We covered its syntax and worked through examples that describe the numeric columns of a DataFrame, all columns (include='all'), a single column, only the non-numeric columns (exclude=np.number), and data containing missing values.