Proxies for web scraping are used in multiple scenarios – be it market research, price monitoring, or brand protection. Regardless of your proxy use cases, rotating them when scraping is essential. But why?
Today’s beginner-friendly guide will answer exactly why it’s necessary to rotate proxies while scraping. Afterwards, the guide will lay down the exact steps of rotating proxies using Python. In the last portion of the article, you’ll get some extra professional tips and tricks on proxy rotation – let’s get started.
What is proxy rotation and why is it important?
Proxy rotation is the process of automatically assigning a different IP address to each new web scraping session. The rotation can be triggered by a specific time frame, a status code, or a number of requests.
A common challenge in the web scraping field is avoiding getting blocked by the target website – that’s where proxy rotation comes into play. Websites are not keen on bots and may find thousands of requests coming from the same IP address suspicious. However, with rotating proxy IP addresses, you can enhance your anonymity, imitate the behavior of several organic users, and circumvent most anti-scraping measures.
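To make the idea concrete, here's a minimal sketch (our own illustration, using placeholder proxy addresses taken from the examples later in this guide) of rotating through a small pool with itertools.cycle, switching to the next IP on every request:
import itertools
import requests

# Placeholder proxy addresses – replace them with real ones
PROXY_POOL = itertools.cycle([
    'http://2.56.215.247:3128',
    'http://88.198.24.108:8080',
])

def fetch(url):
    proxy = next(PROXY_POOL)  # a different proxy for every call
    return requests.get(url, proxies={'http': proxy, 'https': proxy}, timeout=10)
A rotator based on time frames or status codes follows the same pattern – you simply call next() whenever the trigger fires.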
Now, there are mainly two options for rotating IP addresses: you can either use a third-party rotator tool (i.e., Oxylabs’ Proxy Rotator) or build your own in Python. Let’s take a look at the latter option.
Start by creating a virtual environment by running this command:
$ virtualenv venv
This will create an isolated environment with its own Python interpreter, pip, and common libraries in the venv folder.
Next, you need to invoke the source command to activate the environment:
$ source venv/bin/activate
The last step is to install the requests module in the current virtual environment:
$ pip install requests
And that’s it – you have successfully installed the requests module.
You need to create a file with the .py extension (for example, no_proxy.py) and add the following script:
import requests
response = requests.get('https://ip.oxylabs.io/location')
print(response.text)
Now, you should run it from a terminal:
$ python no_proxy.py
128.90.50.100
The output will show your current IP address. Our goal is to show you how to hide your IP address and rotate different IP addresses to stay anonymous and avoid getting blocked, so let’s move forward.
Now, let’s start with the basics: how do we use a single proxy? In order to use a proxy server, you’ll need:
Scheme (e.g., http);
IP address;
Port (e.g., 3128);
Username and password to connect to the proxy (optional).
Once you have all the information, you need to set it up in this order:
SCHEME://USERNAME:PASSWORD@YOUR_PROXY_IP:YOUR_PROXY_PORT
Here are a few examples of the proxy formats you may encounter:
http://2.56.215.247:3128
https://2.56.215.247:8091
https://my-user:aegi1Ohz@2.56.215.247:8044
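If your provider gives you these details separately, you can assemble the proxy URL in Python. A quick sketch, reusing the made-up credentials from the example above:
# Made-up credentials from the example above – substitute your own
scheme = 'https'
username = 'my-user'
password = 'aegi1Ohz'
proxy_ip = '2.56.215.247'
proxy_port = 8044

proxy = f'{scheme}://{username}:{password}@{proxy_ip}:{proxy_port}'
print(proxy)  # https://my-user:aegi1Ohz@2.56.215.247:8044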
Note that you can specify multiple protocols and even define specific domains for which a different proxy will be used:
scheme_proxy_map = {
    'http': PROXY1,
    'https': PROXY2,
    'https://example.org': PROXY3,
}
Add the following imports to a new Python file (e.g., single_proxy.py):
import requests
from requests.exceptions import ProxyError, ReadTimeout, ConnectTimeout
Finally, try to make a request by calling requests.get, passing the proxy map and a timeout. The script also handles the exceptions and prints an error when a network issue occurs:
TIMEOUT_IN_SECONDS = 10  # how long to wait for the proxy to respond

try:
    response = requests.get('https://ip.oxylabs.io/location', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
except (ProxyError, ReadTimeout, ConnectTimeout) as error:
    print('Unable to connect to the proxy: ', error)
else:
    print(response.text)
The output of this script should show you the IP of your proxy:
$ python single_proxy.py
2.56.215.247
You are now hidden behind a proxy when making your requests through the Python script. Now, we can move on to learning how to rotate a list of proxies instead of using a single one.
Rotating proxies using a proxy pool
In this part of the tutorial, we’re going to use a list of proxies in a CSV file called proxies.csv:
http://2.56.215.247:3128
https://88.198.24.108:8080
http://50.206.25.108:80
http://68.188.59.198:80
... any other proxy server, each on a separate line
First of all, create a Python file, import the modules you'll need, and define both the file name and how long you are willing to wait for a single proxy to respond:
import csv
import requests
from requests.exceptions import ProxyError, ReadTimeout, ConnectTimeout

TIMEOUT_IN_SECONDS = 10
CSV_FILENAME = 'proxies.csv'
Next, write the code that opens the CSV file, reads every proxy server line by line into a csv_row variable, and builds the scheme_proxy_map configuration needed by the requests module:
with open(CSV_FILENAME) as open_file:
    reader = csv.reader(open_file)
    for csv_row in reader:
        scheme_proxy_map = {
            'https': csv_row[0],
        }
To check if everything is working, we’ll use the same scraping code as before to access the website via proxies:
with open(CSV_FILENAME) as open_file:
    reader = csv.reader(open_file)
    for csv_row in reader:
        scheme_proxy_map = {
            'https': csv_row[0],
        }

        # Access the website via proxy
        try:
            response = requests.get('https://ip.oxylabs.io/location', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
        except (ProxyError, ReadTimeout, ConnectTimeout) as error:
            pass
        else:
            print(response.text)
If you want to scrape publicly available content using any working proxy from the list, add a break after print to stop going through the proxies in the CSV file:
try:
    response = requests.get('https://ip.oxylabs.io/location', proxies=scheme_proxy_map, timeout=TIMEOUT_IN_SECONDS)
except (ProxyError, ReadTimeout, ConnectTimeout) as error:
    pass
else:
    print(response.text)
    break  # notice the break here
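If you plan to reuse this logic, you can wrap it in a helper function. The sketch below (the function name is our own, not part of the tutorial files) walks the CSV and returns the first successful response, or None if every proxy failed:
def fetch_via_first_working_proxy(url):
    # Try each proxy from the CSV in order and return the first successful response
    with open(CSV_FILENAME) as open_file:
        reader = csv.reader(open_file)
        for csv_row in reader:
            scheme_proxy_map = {'https': csv_row[0]}
            try:
                return requests.get(url, proxies=scheme_proxy_map,
                                    timeout=TIMEOUT_IN_SECONDS)
            except (ProxyError, ReadTimeout, ConnectTimeout):
                continue  # this proxy failed, try the next one
    return None  # none of the proxies responded

response = fetch_via_first_working_proxy('https://ip.oxylabs.io/location')
if response:
    print(response.text)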
Now, the only thing holding us back is speed: the proxies are checked one at a time, so let's make the checks asynchronous.
To rotate proxies using async, you should use the aiohttp module. You can install it using the following CLI command:
$ pip install aiohttp
Then, you need to create a new Python file where you import aiohttp, asyncio, and csv, and define:
The CSV filename that contains the proxy list;
A URL that you wish to use to check the proxies;
How long you’re willing to wait for each proxy – the timeout setting.
import aiohttp
import asyncio
import csv
CSV_FILENAME = 'proxies.csv'
URL_TO_CHECK = 'https://ip.oxylabs.io/location'
TIMEOUT_IN_SECONDS = 10
Next, you need to define an async function and run it using the asyncio module. It accepts two parameters:
the URL it needs to request;
the proxy to use to access it.
The function then prints the response. If the script receives an error when attempting to access the URL via the proxy, it prints that as well:
async def check_proxy(url, proxy):
    try:
        session_timeout = aiohttp.ClientTimeout(total=None,
                                                sock_connect=TIMEOUT_IN_SECONDS,
                                                sock_read=TIMEOUT_IN_SECONDS)
        async with aiohttp.ClientSession(timeout=session_timeout) as session:
            async with session.get(url, proxy=proxy, timeout=TIMEOUT_IN_SECONDS) as resp:
                print(await resp.text())
    except Exception as error:
        # you can comment out this line to only see valid proxies printed out in the command line
        print('Proxy responded with an error: ', error)
        return
The next step is to define the main function that reads the CSV file and creates an asynchronous task to check the proxy for every single record in the CSV file:
async def main():
    tasks = []
    with open(CSV_FILENAME) as open_file:
        reader = csv.reader(open_file)
        for csv_row in reader:
            task = asyncio.create_task(check_proxy(URL_TO_CHECK, csv_row[0]))
            tasks.append(task)
    await asyncio.gather(*tasks)
You should run the main function and wait until all the async tasks are completed.
asyncio.run(main())
That’s all – your proxy checks will now run concurrently instead of one by one, at top speed.
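One design note: asyncio.gather fires off a request for every proxy in the file at once. With a long list, you may want to cap concurrency. A common approach (our own addition, not part of the tutorial code) is an asyncio.Semaphore:
async def check_proxy_limited(url, proxy, semaphore):
    # Wait for a free slot before contacting the proxy
    async with semaphore:
        await check_proxy(url, proxy)

async def main():
    semaphore = asyncio.Semaphore(20)  # at most 20 checks at a time; adjust as needed
    tasks = []
    with open(CSV_FILENAME) as open_file:
        reader = csv.reader(open_file)
        for csv_row in reader:
            tasks.append(asyncio.create_task(
                check_proxy_limited(URL_TO_CHECK, csv_row[0], semaphore)))
    await asyncio.gather(*tasks)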
Lastly, let’s take a look at some general tips on proxy rotation to ensure a smooth web scraping process.
Avoid free proxy services
Despite the appeal, using free proxy IP addresses has far more negatives than positives. With multiple people using free proxies simultaneously and a common lack of financial support, they tend to be considerably slower. Free proxy providers have no obligations to guarantee that their proxies will always be available: you may start working on your scraping project one day and find out the proxies you used are no longer available the following day.
Additionally, there are multiple security and privacy issues associated with free proxies. For example, the majority of free proxy providers don’t support encrypted HTTPS connections.
To learn more about the risks of using free proxies, check out our Why You Shouldn't Use Free Proxies - Risks & Reasons blog post.
Pair IP rotation with user-agent rotation
User-agents are strings in HTTP requests that help websites identify details like browser, operating system, software, and device type. With multiple requests coming from the same OS and browser in a short period of time, the target website can detect suspicious activity and ban you. Hence, besides rotating proxies, you should also rotate user agents to further reduce the chance of getting blocked.
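In practice, this can be as simple as picking a random User-Agent header for each request alongside the rotated proxy. A minimal sketch, reusing scheme_proxy_map and TIMEOUT_IN_SECONDS from the earlier examples (the User-Agent strings below are just examples):
import random
import requests

# Example User-Agent strings – extend the list to match the browsers you want to imitate
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15',
]

headers = {'User-Agent': random.choice(USER_AGENTS)}
response = requests.get('https://ip.oxylabs.io/location',
                        headers=headers,
                        proxies=scheme_proxy_map,
                        timeout=TIMEOUT_IN_SECONDS)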
Choose a reliable premium proxy service
Instead of using free proxies, risking your data privacy and security, and dealing with issues like slow speeds, it’s strongly recommended to go for a reputable premium proxy provider. Look for a provider that’s transparent about their proxy sourcing practices and gives proof that their proxies are obtained ethically.
Alternative solution: Oxylabs’ Scraper APIs with zero infrastructure management
Although building a proxy rotator in Python is relatively easy, you’ll still need to put additional time and effort into the process. If you’re looking for an all-in-one product that does all the work for you, Oxylabs Scraper APIs are the ideal solution. Our APIs incorporate a built-in proxy rotator, which automatically changes IP addresses regularly so you won’t have to deal with CAPTCHAs or risk getting banned.
Proxy rotation is an integral part of any successful web scraping project; luckily, building a rotator in Python is relatively easy. However, if you have any further questions related to the topic, feel free to drop a message at support@oxylabs.io and one of our experts will be happy to help out.
Also, if you prefer the visual format, you can check out our video on this topic:
Easy & Quick Tutorial - How to Rotate Proxies With Python
Finally, if you're interested in more Python solutions for web scraping, refer to the Related articles section below, where you'll find other automation tutorials on running tasks as a service and scheduling recurring jobs.
About the author
Roberta Aukstikalnyte
Senior Content Manager
Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.