
# Introduction to requests Module

The `requests` module is used to send an HTTP request to a server and receive the HTTP response back. We will use the `requests` module to send an HTTP request to a website's URL and get the response in return, from which the **Beautiful Soup** module will then extract the useful data/content of the website.

So let's learn how to send an HTTP request and receive a response from the server using the `requests` module.

## Some Useful requests Module Methods

Following are some of the commonly used methods available in the requests module for making HTTP requests.

  1. `requests.get()`
  2. `requests.post()`
  3. `requests.put()`
  4. `requests.delete()`
  5. `requests.head()`
  6. `requests.options()`

In this tutorial, we will be using the `requests.get()` and `requests.post()` methods to make HTTP requests for web scraping.
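The other methods follow the same call pattern. As a quick illustration (not used in the scraping examples below), here is a minimal sketch of `requests.head()`, which asks the server for only the response headers, without the body:

```python
import requests

# HEAD returns the same headers as GET, but with no response body
response = requests.head("https://www.google.com")

print(response.status_code)
print(response.headers.get("Content-Type"))
```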

If you are new to HTTP requests and are wondering what GET and POST requests are, here is a simple explanation:

  1. GET: It is used to retrieve information (a webpage) from a URL.
  2. POST: It is used to send information to a URL.

## Making a Request using requests.get()

The `requests.get(URL)` method is used to send an HTTP GET request and receive the data back as a response. It takes the URL of a website or any API.

`response.content` is an attribute of the response object returned by the `get()` method; it stores the content of the response.

Let's take an example:

```python
# import the requests module
import requests

# send a request and receive the information from https://www.google.com
response = requests.get("https://www.google.com")

# print the response content
print(response.content)
```


The output of the above script is the entire **page source** (or source code) of the specified URL, which is too long to reproduce here.

You must be wondering how we can read anything from this, as it looks so complicated. To make the response content readable we use the **Beautiful Soup** module, which we will cover in the coming tutorials.
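As a side note, a useful `requests` detail: `response.content` gives the raw response body as bytes, while `response.text` gives the same body decoded to a string, which is usually what you want for HTML:

```python
import requests

response = requests.get("https://www.google.com")

# .content is the raw body as bytes; .text is the body decoded to a str
print(type(response.content))  # <class 'bytes'>
print(type(response.text))     # <class 'str'>
```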

We can print the **header** information sent by the website in the response using the `response.headers` attribute.

For newbies: the **header** information contains general meta information about the HTTP response along with some connection properties.

Let's print the headers for the above GET request:

```python
# import the requests module
import requests

# send a request and receive the information from https://www.google.com
response = requests.get("https://www.google.com")

# print the headers of the response
print(response.headers)
```

To print the values in a more readable format, we can access each header separately using the `response.headers.items()` method and then use a `for` loop to print each key-value pair.

```python
# import the requests module
import requests

# send a request and receive the information from https://www.google.com
response = requests.get("https://www.google.com")

# print each header as a key-value pair, one per line
for key, value in response.headers.items():
    print(key, '\t\t', value)
```


#### **Status of Request**

When we make a GET request using the `requests.get()` method, the request may complete successfully, get redirected to some other URL, or fail on either the client side or the server side.

To know the status of the request, we can check the **status code** of the response received.

This can be done by reading the `response.status_code` value. It's very simple:

```python
# import the requests module
import requests

# send a request and receive the information from https://www.google.com
response = requests.get("https://www.google.com")

# print the status code of the request
print(response.status_code)
```


**Output:**

200
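A `200` means the request succeeded. Besides comparing `response.status_code` yourself, `requests` also provides two handy shortcuts: `response.ok`, which is `True` for any status code below 400, and `response.raise_for_status()`, which raises an exception for 4XX/5XX responses. A small sketch:

```python
import requests

response = requests.get("https://www.google.com")

# response.ok is True for any status code below 400
if response.ok:
    print("Request succeeded with status", response.status_code)
else:
    print("Request failed with status", response.status_code)

# raise_for_status() raises requests.exceptions.HTTPError for 4XX/5XX responses
response.raise_for_status()
```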

Following are the different status values that you may get in the response:

| Status Code | Description |
| --- | --- |
| 1XX | Informational |
| 2XX | Success |
| 3XX | Redirection |
| 4XX | Client Error |
| 5XX | Server Error |

For example, the **200** status code means success, whereas the **201** status code means created (returned when we send a request to create some resource), and so on.
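Similarly, for the 3XX range: by default, `requests` follows redirects automatically, and the final response keeps the chain of intermediate responses in `response.history`. For example:

```python
import requests

# http://google.com redirects to http://www.google.com/
response = requests.get("http://google.com")

# the final URL after redirects, and the status codes of the redirect chain
print(response.url)
print([r.status_code for r in response.history])
```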

Just like a GET request, we can make a POST request using the `requests.post(URL)` method, and handling the response works the same way.
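As a quick sketch, here is a minimal POST request against https://httpbin.org/post, a public testing endpoint that simply echoes the request back (any URL and payload of your own would work the same way):

```python
import requests

# send form data in the body of a POST request
response = requests.post("https://httpbin.org/post", data={"key": "value"})

# handle the response exactly like a GET response
print(response.status_code)
print(response.text)
```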

For web scraping, we will mostly use GET requests.

## Setting Up a User Agent

When we try to access a website using a program, some websites don't allow it, for security reasons: allowing it would make the website susceptible to unnecessary requests generated by programs, which in extreme cases can even burden the website's server with a large number of requests.

To overcome this, we will use the **fake_useragent** module, which generates realistic browser **User-Agent** strings, making a request look as if it were initiated by a user in a browser and not by a program.

To install the module fake_useragent, run the following command:

`pip install fake_useragent`

Once it is installed, we can use it to send a request with a fake user agent like this:

```python
import requests
# import UserAgent from the fake_useragent module
from fake_useragent import UserAgent

# create an instance of the UserAgent class
obj = UserAgent()

# create a dictionary with the key 'user-agent' and a fake Chrome user-agent string as the value
header = {'user-agent': obj.chrome}

# send the request, passing 'header' to the 'headers' parameter of the get() method
r = requests.get('https://google.com', headers=header)

print(r.content)
```


The output of this request will be the page source of **https://google.com**, just as if it had been opened by a user in the **Chrome** browser.
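If you want to confirm which **User-Agent** string was actually sent, the prepared request is available on the response object as `r.request`, so you can inspect its headers:

```python
import requests
from fake_useragent import UserAgent

header = {'user-agent': UserAgent().chrome}
r = requests.get('https://google.com', headers=header)

# r.request holds the request that was actually sent, including its headers
print(r.request.headers['user-agent'])
```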

So now we know how to send an HTTP request to any URL and receive the response using the `requests` module. In the next tutorial, we will learn how to extract the useful content from the HTTP response using the **Beautiful Soup** module.