There are many Python modules, and Requests is one that is widely used to send HTTP requests. It’s a third-party alternative to the standard-library “urllib” (and the legacy “urllib2” from Python 2), which can be confusing and verbose, as well as to the lower-level third-party “urllib3”. Requests greatly simplifies the process of sending HTTP requests to their destination.
Learning to send requests in Python is part of any budding developer’s journey. In this Python requests tutorial, we will outline the grounding principles, the basic uses, and some advanced ones. Additionally, we will provide some Python requests examples. Also, if you’re curious about integrating Oxylabs proxies with Requests, head over to this page.
Essentially, Requests is a widely used library for making HTTP requests in Python, designed to give you a simple way to interact with APIs and web services, as well as to scrape websites and perform other HTTP-based tasks.
Its intuitive API makes it easy to send HTTP requests while at the same time supporting a variety of HTTP methods, including GET, PUT, DELETE, HEAD, OPTIONS, and PATCH.
The Python Requests module is a library that strives to be as easy to use as possible. The standard Python HTTP libraries are harder to use and often require significantly more statements to do the same thing. Let’s compare a urllib3 example with a Requests example:
Urllib3:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib3
http = urllib3.PoolManager()
gh_url = 'https://api.github.com'
headers = urllib3.util.make_headers(user_agent='my-agent/1.0.1', basic_auth='abc:xyz')
requ = http.request('GET', gh_url, headers=headers)
print(requ.status)
print(requ.headers['Content-Type'])
# ------
# 200
# 'application/json'
Requests:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
r = requests.get('https://api.github.com', auth=('user', 'pass'))
print(r.status_code)
print(r.headers['content-type'])
# ------
# 200
# 'application/json'
Not only does Requests reduce the number of statements needed, but it also makes the code significantly easier to understand and debug, even for the untrained eye.
As can be seen, Requests is notably more concise than the standard Python libraries, and that is no accident. Requests has been, and continues to be, developed with several PEP 20 (The Zen of Python) idioms in mind:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Readability counts.
These five idioms form the foundation of ongoing Requests development, and any new contribution should conform to the principles listed above.
Requests isn’t a part of the Python Standard Library, so it needs to be downloaded and installed. Installing Requests is simple, as it can be done through a terminal:
$ pip install requests
We recommend using the terminal provided in the coding environment (e.g. PyCharm) as it will ensure that the library will be installed without any issues.
Finally, before beginning to use Requests in any project, the library needs to be imported:
# In Python "import requests" allows us to use the library
import requests
Out of all the possible HTTP requests, GET is the most commonly used. GET, as the name indicates, is an attempt to acquire data from a specified source (usually, a website). In order to send a GET request, invoke requests.get() in Python and add a destination URL, e.g.:
import requests
requests.get('http://httpbin.org/')
Our basic Python requests example will return a <Response [200]> message. A 200 response means ‘OK’, showing that the request has been successful. The status code can also be inspected by assigning the response to an object and calling print(object.status_code). There are many more status codes; several of the most commonly encountered are:
200 – ‘OK’
400 – ‘Bad request’ is sent when the server cannot understand the request sent by the client. Generally, this indicates a malformed request syntax, invalid request message framing, etc.
401 – ‘Unauthorized’ is sent whenever fulfilling the request requires supplying valid credentials.
403 – ‘Forbidden’ means that the server understood the request but will not fulfill it. In cases where credentials were provided, 403 would mean that the account in question does not have sufficient permissions to view the content.
404 – ‘Not found’ means that the server found no content matching the Request-URI. Sometimes 404 is used to mask 403 responses when the server does not want to reveal reasons for refusing the request.
429 – This error code is trickier, as it means the client has exceeded the rate limit imposed by the server. Essentially, the server is receiving too many requests from the client within a specific time frame and has temporarily blocked further requests from that client. One way to handle such an error might look like this:
import requests
import time

response = requests.get('https://example.com/api')

if response.status_code == 429:
    # Honor the server's Retry-After header, defaulting to 1 second
    retry_after = int(response.headers.get('Retry-After', 1))
    print(f"Rate limit exceeded. Waiting {retry_after} seconds before retrying...")
    time.sleep(retry_after)
    response = requests.get('https://example.com/api')
You can also use our automatically rotated Residential Proxies to avoid sending too many requests from the same IP address. The chance of getting a 429 error will noticeably decrease.
GET requests can be sent with specific parameters if required. Parameters follow the same logic as if one were to construct a URL by hand. Each parameter is sent after a question mark added to the original URL and pairs are split by the ampersand (&) sign:
payload = {'key1': 'value1', 'key2': 'value2'}
requests.get('http://httpbin.org/get', params=payload)
Our URL would now be formed as:
http://httpbin.org/get?key1=value1&key2=value2
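To confirm how Requests encodes the parameters, the request can be prepared without actually being sent and its final URL inspected. This is a quick sketch using the same illustrative keys as above:

```python
import requests

# Prepare the request without sending it to inspect the final URL
payload = {'key1': 'value1', 'key2': 'value2'}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()
print(prepared.url)  # http://httpbin.org/get?key1=value1&key2=value2
```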
Yet while useful, status codes by themselves do not reveal much about the content acquired. So far, we only know if the acquisition was successful or not, and if not, for what possible reason.
In order to view the Python requests response object sent by a GET request, we should create a variable. For the sake of simplicity, let’s name it ‘response’:
response = requests.get('http://httpbin.org/')
In Python Requests, the timeout value is set to None by default, which means that if the server never responds, our application will hang indefinitely.
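To avoid hanging, a timeout (in seconds) can be passed explicitly; if the server fails to respond in time, Requests raises a Timeout exception that can be caught. A minimal sketch:

```python
import requests

try:
    # Give the server at most 5 seconds to start responding
    response = requests.get('http://httpbin.org/', timeout=5)
    print(response.status_code)
except requests.exceptions.Timeout:
    print('The request timed out')
```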
We can now access the status code without using the console. In order to do so we will need to print out a specific section (status_code):
print(response.status_code)
So far, the output will carry the same information as before: the 200 status code. Note that response objects have boolean values assigned to them (status codes from 200 up to 400 evaluate to True; 400 and above evaluate to False). Using responses as boolean values can be useful for several reasons, such as checking whether the request was successful in general before performing other actions on the response.
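This truthiness can be sketched without any network traffic by constructing Response objects by hand; setting status_code manually here is purely for illustration:

```python
from requests.models import Response

# Response objects are truthy below status code 400 and falsy from 400 up
ok = Response()
ok.status_code = 200

error = Response()
error.status_code = 404

if ok:
    print('Request succeeded')   # printed, since bool(ok) is True
if not error:
    print('Request failed')      # printed, since bool(error) is False
```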
In order to read the content of the response, we need to access the text part by using response.text. Printing the output will provide the entire response into the Python debugger window.
print(response.text)
Requests automatically attempts to make an educated guess about the encoding based on the HTTP header, therefore providing a value is unnecessary. In rare cases, changing the encoding may be needed and it can be done by specifying a value to response.encoding. Our specified value will then be used whenever we make a call.
Responses can also be decoded into JSON. The HTTPbin homepage doesn’t return a response body that can be decoded into JSON, and attempting to do so will raise an exception. For explanatory purposes, let’s use GitHub’s API:
response = requests.get('https://api.github.com')
print(response.json())
Using .json() returns a dictionary object that can be accessed and searched.
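For instance, httpbin echoes request details back as JSON, so the dictionary returned by .json() can be indexed directly. A small sketch using the httpbin /get endpoint:

```python
import requests

# httpbin.org/get echoes the query parameters back under the 'args' key
response = requests.get('http://httpbin.org/get', params={'key1': 'value1'})
data = response.json()
print(data['args'])  # {'key1': 'value1'}
```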
Response headers are another important part of the request. While they do not contain any content of the original message, headers hold many important details of the response such as information about the server, the date, encoding, etc. Every detail can be acquired from the initial response by making a call:
print(response.headers)
As with the .json() call, headers create a dictionary type object which can then be accessed. Adding parameters to the call will list out a part of the response, e.g.:
print(response.headers['Date'])
Our function will now print the date stored in the response header. Header names are case-insensitive, therefore Requests will output the same result regardless of whether the parameter was written as ‘date’ or ‘Date’.
You can also send custom Python requests headers. Dictionary-type objects are used yet again, although this time they have to be created. Headers are passed in an identical manner to parameters. To check whether our request header has been sent successfully we will need to make the call response.request.headers:
import requests
headers = {'user-agent': 'my-agent/1.0.1'}
response = requests.get('http://httpbin.org/', headers=headers)
print(response.request.headers)
Running our code should output the request header in the debugger window with the user agent stated as ‘my-agent/1.0.1’. As a general rule, sending most common user agents is recommended as otherwise some websites could return a 403 ‘Forbidden’ response.
Custom HTTP headers are usually used for troubleshooting or informational purposes. User agents are often utilized in web scraping projects in order to change the perceived source of incoming requests.
POST is the second most used HTTP method in Python requests. It is used to create a resource on a server with specified data. Sending a POST request is almost as simple as sending a GET:
response = requests.post('https://httpbin.org/post', data = {'key':'value'})
Of course, all HTTP methods (HEAD is an exception) return a response body which can be read. Responses to POST requests can be read in the same manner as GET (or any other method):
print(response.text)
Responses, rather obviously, vary in relation to the type of request made. For example, a POST request response contains information regarding the data sent to the server.
In most cases, specifying the data directly in the POST request might not be enough. The Requests library accepts dictionary objects, which can be used to send more advanced data:
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://httpbin.org/post', data = payload)
Our new request would send the payload object to the destination server. At times, sending JSON POST requests can be necessary. Requests has an added feature that automatically converts the POST request data into JSON.
import requests
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://httpbin.org/post', json = payload)
print(response.json())
Alternatively, the json library might be used to convert dictionaries into JSON objects. A new import will be required to change the object type:
import json
import requests
payload = {
'key1': 'value1',
'key2': 'value2'}
jsonData = json.dumps(payload)
response = requests.post('https://httpbin.org/post', data = jsonData)
print(response.json())
Note that the “json” argument is ignored if either “data” or “files” is used. Requests will only accept one of the three in a single POST.
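This precedence can be checked without sending anything, by preparing a request that supplies both arguments at once (a quick sketch; the URL is illustrative):

```python
import requests

# When 'data' is supplied, the 'json' argument is ignored
req = requests.Request(
    'POST',
    'http://httpbin.org/post',
    data={'key': 'value'},
    json={'other': 'ignored'},
).prepare()
print(req.body)  # key=value (form-encoded body from 'data', not JSON)
```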
POST and GET are the two most common methods the average user uses. For example, scraper API users utilize only these two HTTP methods to send job requests (POST) and retrieve data (GET). Yet, there are many more ways to interact with servers over HTTP.
PUT – replaces all the current representations of the target resource with the submitted content.
Although PUT and POST requests may look similar, PUT requests are idempotent. This means that multiple requests will have the same result. If you make multiple PUT requests, each will overwrite the same resource, whereas a POST request will create a new resource.
Sending a PUT request is the same as sending a POST request, but using the put() method:
response = requests.put('https://httpbin.org/put', data={'key': 'value'})
DELETE – removes all the existing representations of the target resource given by URL.
The delete() method sends a request to a specified URL in order to delete a specified resource. Sending a DELETE request is the same as sending any other HTTP request; the syntax is as follows:
requests.delete(url, **kwargs)
Let’s make a sample DELETE request that deletes the “delete” resource:
import requests
# Making a DELETE request
response = requests.delete('https://httpbin.org/delete')
We can print out the status code to check out the response received:
print(response.status_code)
HEAD – Like GET, but just transfers the status and header section.
The head() method requests data from the web server. It only returns the information about the web server and not the content itself. This makes it faster than a GET request. Moreover, developers can include a custom header defined by their application to provide additional information or authentication credentials to the server. Here is the basic syntax for making a HEAD request:
requests.head(url, **kwargs)
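As a quick sketch against httpbin, a HEAD request returns the same headers that a GET would, but the body stays empty:

```python
import requests

# Only the status line and headers are transferred; the body is empty
response = requests.head('http://httpbin.org/get')
print(response.status_code)
print(response.headers['Content-Type'])
print(len(response.text))  # 0
```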
OPTIONS – states the communication options for the target resource.
The options() method returns a response object whose headers describe the communication options for the resource, such as the HTTP methods it allows, alongside the usual properties like content, encoding, and cookies. The basic syntax of an OPTIONS request is as follows:
response = requests.options(url)
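For example, httpbin answers an OPTIONS request with an Allow header listing the methods the endpoint accepts (a small sketch; the exact ordering of methods may vary):

```python
import requests

# The Allow header lists the HTTP methods the endpoint supports
response = requests.options('http://httpbin.org/get')
print(response.headers.get('Allow'))
```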
PATCH – applies modifications to a specified resource.
A PATCH request must only include the changes to the resource, not the entire resource. This is similar to PUT, but the body contains instructions describing how to modify the resource currently on the server to produce a new version. The syntax of the patch() method is as follows:
requests.patch(url, **kwargs)
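A minimal sketch using httpbin’s /patch endpoint, which simply echoes the submitted fields back:

```python
import requests

# Send only the field being changed; httpbin echoes it back under 'form'
response = requests.patch('https://httpbin.org/patch', data={'key': 'new value'})
print(response.status_code)
print(response.json()['form'])  # {'key': 'new value'}
```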
All the HTTP methods listed above are rarely used outside server administration, web development, and debugging. An average internet user will not have permission to perform actions such as DELETE or PUT on nearly any website. Other HTTP methods are most useful for testing websites, which are often outside the field of interest of the average internet user.
Of course, the most widely used method is GET, which allows developers to pull data from an API using the requests package. Let’s use the requests.get() method to send an HTTP GET request to http://httpbin.org/ip:
response = requests.get("http://httpbin.org/ip")
Now, we can use this response object and methods such as status_code, content, text, and json() to get more information.
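Putting those together looks like this (the origin IP in the output will naturally differ per client):

```python
import requests

response = requests.get('http://httpbin.org/ip')
print(response.status_code)              # 200 on success
print(response.headers['Content-Type'])  # application/json
print(response.json())                   # e.g. {'origin': '203.0.113.7'}
```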
An authenticated request is a type of request where the sender must provide some form of authentication to access the requested resource. The authentication mechanism can vary, but it usually involves sending a token or credentials along with the request.
To send data via authenticated requests, we first need to obtain the necessary authentication credentials. This can vary depending on the API or service we're working with, but some common authentication methods include API keys, OAuth tokens, and basic authentication.
Once we have the authentication credentials, we can use the requests library to send the request with the appropriate authentication headers. Here's an example of how to send a POST request with a JSON payload and basic authentication:
import requests

url = 'https://example.com/api/data'
payload = {'key1': 'value1', 'key2': 'value2'}
auth = ('username', 'password')
headers = {'Content-Type': 'application/json'}

response = requests.post(url, json=payload, auth=auth, headers=headers)

if response.status_code == 200:
    print('Data sent successfully')
else:
    print('Error sending data:', response.text)
In the example above, we first define the URL we want to send the request to and the payload we want to send in JSON format. We also specify the authentication credentials using the basic authentication method, where we provide a username and password. We also set the Content-Type header to specify that we're sending a JSON payload.
We then use the requests.post() method to send the request with the authentication headers and the JSON payload. If the request is successful (i.e., we receive a 200 status code), we print a success message. Otherwise, we print an error message with the response text.
Retrieving data via authenticated requests is similar to sending data, but instead of sending a payload, we're requesting data from the server. We must also provide the appropriate authentication credentials to access the requested resource.
Here's an example of retrieving data via authenticated requests using the GET method and an API key:
import requests

url = 'https://example.com/api/data'
api_key = 'your-api-key'
headers = {'Authorization': f'Bearer {api_key}'}

response = requests.get(url, headers=headers)

if response.status_code == 200:
    data = response.json()
    print('Data retrieved successfully:', data)
else:
    print('Error retrieving data:', response.text)
In the example above, we define the URL we want to retrieve data from and provide the API key using the Authorization header with the Bearer token type. We then use the requests.get() method to send the request with the appropriate authentication headers. If the request is successful, we print the retrieved data. Otherwise, we print an error message with the response text.
Securing communication across multiple systems is critical to protecting data confidentiality and integrity in today's environment. Using SSL/TLS certificates, which enable encrypted connections between systems, is one technique to ensure safe communication.
SSL verification is a process that checks the identity of the server we are connecting to and ensures the validity of the SSL/TLS certificate presented by the server. SSL verification is essential for preventing man-in-the-middle attacks, in which an attacker intercepts communication between two parties and modifies the data being delivered.
Python's requests package provides a simple interface for making HTTP requests. To ensure secure communication, the library performs SSL verification by default. However, in some circumstances, such as when working with self-signed certificates or testing on a local server, we may need to disable SSL certificate verification.
Set the verify argument to False to disable SSL verification in the requests library. Here's an example of sending a GET request with certificate verification turned off:
import requests

url = 'https://example.com/api/data'
response = requests.get(url, verify=False)

if response.status_code == 200:
    print('Data retrieved successfully')
else:
    print('Error retrieving data:', response.text)
To ensure secure communication, the requests library performs SSL verification by default. If the server's SSL/TLS certificate is invalid, the library will raise an SSLError exception. However, sometimes we need to supply our own SSL/TLS certificate or use a custom CA bundle.
To enable SSL verification with a custom CA bundle, pass the path of the CA bundle file via the verify argument. Here's an example of using a custom CA bundle to send a GET request to a server with SSL verification enabled:
import requests

url = 'https://example.com/api/data'
ca_bundle = '/path/to/ca/bundle.crt'
response = requests.get(url, verify=ca_bundle)

if response.status_code == 200:
    print('Data retrieved successfully')
else:
    print('Error retrieving data:', response.text)
Bearer tokens are often used in APIs for authentication and authorization. To use a bearer token with Python's requests library, we can include it as a header in our HTTP requests. Here's an example of using a bearer token to send a GET request to an API:
import requests

url = 'https://example.com/api/data'
token = 'Bearer my_access_token'
headers = {'Authorization': token}

response = requests.get(url, headers=headers)

if response.status_code == 200:
    data = response.json()
    print('Data retrieved successfully')
else:
    print('Error retrieving data:', response.text)
In the preceding example, we define the URL for the API endpoint from which we wish to get data and pass our bearer token in the Authorization header via the headers argument of the get() method. If the request is successful (a 200 status code is returned), we use the response.json() method to convert the JSON data returned by the API into a Python object. Otherwise, we print an error message along with the response content.
The Python Requests library is both an incredibly powerful and easy-to-use tool that can be utilized to send HTTP requests. Understanding the basics is often enough to create simple applications or scripts.
Want to find out more about developing Python scripts? Check out our Python web scraping tutorial that will help you to develop your first data acquisition application! Our blog has plenty of both basic and advanced guides for all your proxy and scraping needs!
Sadly, it doesn’t; you’ll need to install it separately if you wish to use it.
A good alternative could be HTTPX, a fast and comprehensive HTTP client library designed for Python 3. It supports HTTP/1.1 and HTTP/2 while having a simple API and providing beneficial features such as async support and connection pooling.
Although both Requests and BeautifulSoup are libraries for Python used for web scraping, they serve somewhat different purposes.
For example, Requests allows you to send HTTP/HTTPS requests to a website and receive the response, unlike BeautifulSoup, which provides the ability to extract data from HTML and XML documents.
Essentially, Requests sends requests to a website to get the HTML content, while BeautifulSoup parses the HTML content and extracts the needed data. To learn more about Requests and BeautifulSoup, see this useful article.
Requests is a library, not a package. For Python, a library is a collection of modules or functions which altogether are used in code to perform specific tasks. A package differs from a library in that it is a way of organizing related modules or subpackages into a single namespace.
About the author
Adomas Sulcas
PR Team Lead
Adomas Sulcas is a PR Team Lead at Oxylabs. Having grown up in a tech-minded household, he quickly developed an interest in everything IT and Internet related. When he is not nerding out online or immersed in reading, you will find him on an adventure or coming up with wicked business ideas.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.