Understanding Searching Tags with BeautifulSoup
Find HTML Tags using BeautifulSoup
In this tutorial we will learn how to search for any tag using the BeautifulSoup module. We suggest you go through the previous tutorials: the basic introduction to the BeautifulSoup module, and the tutorial covering all the useful methods of the BeautifulSoup module.
We have already learned different methods to traverse the HTML tree, like parent, parents, next_sibling, previous_sibling, etc. But it becomes difficult to find all the similar tags using those methods. So now we will learn how to find any particular HTML tag using the find and find_all methods of the BeautifulSoup module.
If you are coming from the last tutorial, we will be using the same HTML code. If you are new here, please create a file named sample_webpage.html and copy the following HTML code into it:
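To make the difference between the two methods concrete, here is a minimal sketch. It assumes a small inline HTML snippet (modelled on, but not identical to, the sample page) instead of reading a file. find returns only the first matching tag, while find_all returns every match.

```python
import bs4

# Hypothetical snippet modelled on the tutorial's sample page
html = """
<div id="first-div"><p class="first">First Paragraph</p></div>
<div id="second-div"><p class="second">Second Paragraph</p></div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag
first_p = soup.find("p")
print(first_p.text)                 # First Paragraph

# find_all() returns every matching tag, in document order
all_p = soup.find_all("p")
print([p.text for p in all_p])      # ['First Paragraph', 'Second Paragraph']

# With no match, find() returns None and find_all() returns an empty list
print(soup.find("table"))           # None
print(len(soup.find_all("table")))  # 0
```

Because find can return None, check its result before accessing attributes such as .text.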
```html
<!DOCTYPE html>
<html>
<head>
  <title> Sample HTML Page</title>
  <style>
    * {
      margin: 0;
      padding: 0;
    }
    div {
      width: 95%;
      height: 75px;
      margin: 10px 2.5%;
      border: 1px dotted grey;
      text-align: center;
    }
    p {
      font-family: sans-serif;
      font-size: 18px;
      color: #000;
      line-height: 75px;
    }
    a {
      position: relative;
      top: 25px;
    }
  </style>
</head>
<body>
  <div id="first-div">
    <p class="first">First Paragraph</p>
  </div>
  <div id="second-div">
    <p class="second">Second Paragraph</p>
  </div>
  <div id="third-div">
    <a href="https://www.studytonight.com">Studytonight</a>
    <p class="third">Third Paragraph</p>
  </div>
  <div id="fourth-div">
    <p class="fourth">Fourth Paragraph</p>
  </div>
  <div id="fifth-div">
    <p class="fifth">Fifth Paragraph</p>
  </div>
</body>
</html>
```

To read the content of the above HTML file, use the following Python code to store the content in a variable:
```python
# reading content from the file
with open("sample_webpage.html") as html_file:
    html = html_file.read()
```
Once we have read the file, we create the BeautifulSoup object:
```python
import bs4

# reading content from the file
with open("sample_webpage.html") as html_file:
    html = html_file.read()

# creating a BeautifulSoup object
soup = bs4.BeautifulSoup(html, "html.parser")
```

And the process of web scraping begins…

BeautifulSoup: find_all method
The find_all method is used to find all the tags that match what we are searching for; in its simplest form, we provide the name of the tag as an argument to the method. find_all returns a list containing all the matching HTML elements (an empty list if nothing matches). Following is the syntax:
```python
find_all(name, attrs, recursive, string, limit, **kwargs)
```
We will cover all the parameters of the find_all method one by one. Let's start with the name parameter.
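Before going through the parameters one at a time, here is a quick overview that exercises each one once. This is a minimal sketch that assumes a small hypothetical snippet of markup rather than the sample page; it also shows that the value returned by find_all behaves like an ordinary Python list.

```python
import bs4

# Hypothetical markup chosen to exercise each find_all() parameter
html = """
<div id="outer">
  <p class="intro">Intro</p>
  <div id="inner"><p class="intro note">Nested</p></div>
  <a href="https://www.studytonight.com">Studytonight</a>
</div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")
outer = soup.find("div", id="outer")

# name: match by tag name; the result is a list subclass
by_name = soup.find_all("p")
print(len(by_name), isinstance(by_name, list))   # 2 True

# attrs: filter on attribute values given as a dict
by_attrs = soup.find_all("div", attrs={"id": "inner"})
print(by_attrs[0]["id"])                          # inner

# recursive=False: only look at the direct children of `outer`
direct_only = outer.find_all("p", recursive=False)
print([p.text for p in direct_only])              # ['Intro']

# string: match tags whose text is exactly this string
by_string = soup.find_all("a", string="Studytonight")
print(by_string[0]["href"])                       # https://www.studytonight.com

# limit: stop after the first N matches
limited = soup.find_all("p", limit=1)
print(len(limited))                               # 1

# **kwargs: keyword arguments act as attribute filters;
# class_ is used because "class" is a reserved word in Python
by_kwarg = soup.find_all("p", class_="note")
print([p.text for p in by_kwarg])                 # ['Nested']
```

Note that class_="note" matches the nested paragraph even though its class attribute is "intro note", because class is treated as a multi-valued attribute.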
find_all: name Parameter
Let's find all the p tags in the HTML code:
```python
import bs4
# reading content from the file
with open("sample_webpage.html") as html_file:
    html = html_file.read()

# creating a BeautifulSoup object
soup = bs4.BeautifulSoup(html, "html.parser")

p_tags = soup.find_all("p")
print(p_tags)
print("\n-----Class Names Of All Paragraphs-----\n")
for tag in p_tags:
    # "class" is a multi-valued attribute, so tag['class'] is a list
    print(tag['class'][0])

print("\n-----Content Of All Paragraphs-----\n")
for tag in p_tags:
    print(tag.text)
```
**Output:**
```
[<p class="first">First Paragraph</p>, <p class="second">Second Paragraph</p>, <p class="third">Third Paragraph</p>, <p class="fourth">Fourth Paragraph</p>, <p class="fifth">Fifth Paragraph</p>]

-----Class Names Of All Paragraphs-----

first
second
third
fourth
fifth

-----Content Of All Paragraphs-----

First Paragraph
Second Paragraph
Third Paragraph
Fourth Paragraph
Fifth Paragraph
```
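Besides a single tag name, the name argument also accepts a list of names, a compiled regular expression, True, or a function. The following is a minimal sketch of those forms, again assuming a small hypothetical snippet rather than the sample page; the helper has_class is an illustrative name, not part of BeautifulSoup.

```python
import re
import bs4

# Hypothetical snippet, not the tutorial's sample page
html = """
<div><p class="first">First Paragraph</p>
<a href="https://www.studytonight.com">Studytonight</a></div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# A list of names matches any of them; results stay in document order
from_list = [t.name for t in soup.find_all(["a", "p"])]
print(from_list)       # ['p', 'a']

# A compiled regular expression is searched against each tag name
from_regex = [t.name for t in soup.find_all(re.compile("^d"))]
print(from_regex)      # ['div']

# True matches every tag in the document
from_true = [t.name for t in soup.find_all(True)]
print(from_true)       # ['div', 'p', 'a']

# A function receives each tag and returns True to keep it
def has_class(tag):  # hypothetical helper for this example
    return tag.has_attr("class")

from_function = [t.name for t in soup.find_all(has_class)]
print(from_function)   # ['p']
```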
And with that, we have learned web scraping using the BeautifulSoup module. We have covered all the important and useful methods, but there are many more. If you want to dig deeper, check the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) documentation.
In the next tutorial we will scrape a website.









