
How to Bypass CAPTCHA in Web Scraping Using Python

Yelyzaveta Nechytailo

2023-10-03 · 6 min read

Short for Completely Automated Public Turing test to tell Computers and Humans Apart, CAPTCHA is a test that determines whether a user accessing a website is a human or a bot. By posing challenges that are hard for computers to solve, CAPTCHAs help identify suspicious users and bots and prevent automated activities such as scraping and crawling.

This article will provide insights into bypassing CAPTCHA challenges in web scraping. We’ll talk about the different types of tests that can be encountered in the modern internet landscape and discuss useful anti-CAPTCHA solutions to implement in your data-gathering operations.

What are the different types of CAPTCHAs?

Generally, CAPTCHAs come in three formats: text-based, image-based, and sound-based. Two widely used CAPTCHA services, hCAPTCHA and Google reCAPTCHA, are also covered below.

Text-based CAPTCHA

It’s usually a combination of random letters and numbers presented in a hard-to-read format, with the characters rotated, scaled, and distorted in various ways.

Image-based CAPTCHA

Image-based CAPTCHA challenges usually display several pictures in a grid and ask the user to select every image of a specific type, for instance, all images containing traffic lights.

Sound-based CAPTCHA

Also known as an audio CAPTCHA, it plays an audio clip of letters or numbers that the user has to type in, often with background noise added for extra difficulty.

hCAPTCHA

It’s a CAPTCHA service that website owners can add to their sites. Positioned as an alternative to reCAPTCHA, it emphasizes user privacy and gives site owners more control over the CAPTCHA experience.

Google reCAPTCHA

It’s a free CAPTCHA service developed by Google that protects web pages from abuse. Like hCAPTCHA, it uses advanced techniques to catch bot-like activity. For example, newer versions of reCAPTCHA can recognize human users without requiring any interaction: they analyze the user’s behavior, including previous interactions with other websites, which some consider undesirable for privacy reasons.

Google also uses reCAPTCHA across many of its own services and products, such as Google Search, Maps, Play, Shopping, and more.

To learn more about each of these CAPTCHA types and how these tools work in general, see our blog post on how CAPTCHAs work.

How to bypass any CAPTCHA with Web Unblocker using Python

It’s no secret that CAPTCHAs are one of the biggest challenges in public data gathering. They interrupt scraping jobs, delaying data collection and leaving less time for analysis and decision-making. A CAPTCHA response during web scraping may look like this:

When a CAPTCHA challenge is triggered, it blocks access to the desired data until the test is passed. One way to overcome it is to use a service that solves each challenge manually as it appears. However, this is slower than using anti-detection techniques that avoid triggering CAPTCHAs at all, and the cost of solving challenges one by one adds up on larger projects. Avoiding CAPTCHAs in the first place is therefore usually the more streamlined and cost-effective approach.
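Whichever approach you choose, a scraper first has to notice that it received a challenge page instead of the data. The sketch below is a rough heuristic, not a reliable detector: it assumes challenge pages embed the widget markers that reCAPTCHA and hCAPTCHA commonly use (the g-recaptcha and h-captcha classes and their script URLs) or come back with status codes such as 403 or 429. The function name is_probably_challenged is hypothetical, and the marker list is an assumption you would adjust per target.

```python
# Heuristic markers (assumptions, not an exhaustive list): the widget container
# classes and script URLs commonly embedded by reCAPTCHA and hCAPTCHA.
CAPTCHA_MARKERS = (
    "g-recaptcha",
    "www.google.com/recaptcha",
    "h-captcha",
    "hcaptcha.com/1/api.js",
)

# Status codes that often accompany blocks, bot challenges, or rate limiting.
SUSPICIOUS_STATUS_CODES = {403, 429}


def is_probably_challenged(status_code: int, html: str) -> bool:
    """Return True if a response looks like a CAPTCHA or block page
    rather than the data you asked for."""
    page = html.lower()
    if any(marker in page for marker in CAPTCHA_MARKERS):
        return True
    return status_code in SUSPICIOUS_STATUS_CODES


# Usage with a requests response (not executed here):
# if is_probably_challenged(response.status_code, response.text):
#     ...  # back off, switch IP, or route the request through an unblocking service
```

Checking markup as well as status codes matters because some sites serve a CAPTCHA page with a normal 200 status.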

That’s exactly why Web Unblocker was developed. This AI-powered web scraping solution is designed to bypass advanced anti-bot systems, including complex CAPTCHAs. One of its main features is dynamic browser fingerprinting, which selects the right combination of headers, cookies, and other browser parameters so that your requests appear to come from an organic user and you can access the public data you need.

Using Web Unblocker is straightforward because you integrate it the same way as a regular proxy server, so let’s review how to use it in Python. We offer a 1-week free trial for Web Unblocker, so head to the Oxylabs dashboard and create a free account to get started.

1. Install the prerequisites

Begin by installing the requests library, which we’ll use to send a web request to the target website, and the Beautiful Soup package, which we’ll use to navigate the HTML and parse the desired elements. We’ll install both with pip, the Python package installer, which is usually bundled with modern Python installations.

Open up your terminal and enter the following line:

pip install requests beautifulsoup4

2. Inspect your target site

We’ll target a dummy bookstore website, https://books.toscrape.com/, and get all the book titles from the first listing page. Each title is stored in the title attribute of an <a> tag nested inside an <h3> tag.
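To see why the title attribute is the field worth extracting, here is a stdlib-only sketch that runs offline. The HTML sample is hand-written and simplified (an assumption modeled on the structure described above, not a copy of the real page): in it, the visible link text is truncated while the title attribute holds the full name. The TitleCollector class is a hypothetical helper for illustration.

```python
from html.parser import HTMLParser

# Simplified, hand-written sample (illustrative assumption, not the real page).
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a title="A Light in the Attic">A Light in the ...</a></h3>
</article>
<article class="product_pod">
  <h3><a title="Tipping the Velvet">Tipping the Velvet</a></h3>
</article>
"""


class TitleCollector(HTMLParser):
    """Collects the title attribute of every <a> tag nested inside an <h3> tag."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            # attrs is a list of (name, value) pairs
            title = dict(attrs).get("title")
            if title is not None:
                self.titles.append(title)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False


parser = TitleCollector()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.titles)  # ['A Light in the Attic', 'Tipping the Velvet']
```

Reading the link text would return "A Light in the ..." for the first book, which is why the tutorial reads the attribute instead.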

This practice website doesn’t use CAPTCHAs, so for this tutorial, imagine that it does. In that case, one option for dealing with CAPTCHA challenges is a solution like Web Unblocker that avoids triggering them in the first place.

3. Set up the Web Unblocker endpoint

Start by importing the installed Python libraries:

import requests
from bs4 import BeautifulSoup

Next, create the web_unblocker dictionary with proxy URLs built from your Oxylabs sub-user’s credentials and the Web Unblocker endpoint:

web_unblocker = {
    # Replace USERNAME and PASSWORD with your sub-user's credentials
    'http': 'http://USERNAME:PASSWORD@unblock.oxylabs.io:60000',
    'https': 'http://USERNAME:PASSWORD@unblock.oxylabs.io:60000',
}
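Hardcoding credentials works for a quick test, but a password containing characters such as @ or : would break the proxy URL. A minimal sketch, assuming you keep the credentials in environment variables (the names OXYLABS_USERNAME and OXYLABS_PASSWORD and the helper build_unblocker_proxies are hypothetical), builds the same dictionary with percent-encoded credentials, which requests decodes again when it authenticates to the proxy.

```python
import os
from urllib.parse import quote

# Endpoint taken from the tutorial above.
UNBLOCKER_ENDPOINT = "unblock.oxylabs.io:60000"


def build_unblocker_proxies(username: str, password: str) -> dict:
    """Build a requests-style proxies dict, percent-encoding the credentials
    so characters such as '@' or ':' don't break the proxy URL."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    url = f"http://{user}:{pwd}@{UNBLOCKER_ENDPOINT}"
    return {"http": url, "https": url}


# Hypothetical environment variable names; falls back to the placeholders.
web_unblocker = build_unblocker_proxies(
    os.environ.get("OXYLABS_USERNAME", "USERNAME"),
    os.environ.get("OXYLABS_PASSWORD", "PASSWORD"),
)
```

Keeping secrets out of the script also means you can share or commit the code without leaking your account.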

4. Send a request to the target

The next step is to send a GET request to the target website through Web Unblocker. This can be achieved with the following code snippet:

response = requests.get(
    'https://books.toscrape.com/',
    verify=False,
    proxies=web_unblocker
)

Web Unblocker requires you to disable SSL certificate verification, which is done by passing verify=False to requests.get(). The proxies argument then takes the web_unblocker dictionary, so the request is forwarded through the Web Unblocker endpoint.
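With verify=False, urllib3 emits an InsecureRequestWarning on every request, and repeating the same arguments on each call gets tedious. One possible setup, sketched below under the assumption that you send many requests through the same endpoint, is a requests.Session carrying the proxies and verification setting. The helper name make_unblocker_session is hypothetical; trust_env is disabled because otherwise environment variables such as HTTPS_PROXY or REQUESTS_CA_BUNDLE can override the session's proxies and verify settings.

```python
import requests
import urllib3

# Optional: silence the warning that every verify=False request would emit.
# Only do this because certificate checks are intentionally disabled here.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def make_unblocker_session(proxies: dict) -> requests.Session:
    """Return a Session that sends every request through the given proxies
    with SSL certificate verification turned off."""
    session = requests.Session()
    session.trust_env = False  # ignore proxy/CA-bundle environment variables
    session.proxies.update(proxies)
    session.verify = False
    return session


web_unblocker = {
    "http": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
    "https": "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000",
}

session = make_unblocker_session(web_unblocker)
# Example call (needs valid credentials and network access):
# response = session.get("https://books.toscrape.com/", timeout=60)
```

A timeout still has to be passed per call, since requests sessions don't store a default timeout.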

5. Parse the desired data

Next, use the Beautiful Soup library to extract content from the target page. First, create the soup object, which holds the parsed HTML:

soup = BeautifulSoup(response.content, "html.parser")

Then, create a for loop to extract all the titles:

for title in soup.select("h3 a"):
    print(title.get("title"))

The soup.select() method takes a CSS selector; here, "h3 a" matches every <a> tag inside an <h3> tag. Since each full title is stored as the value of the title attribute, you can retrieve it with the .get() method. If you’re new to web scraping, check out our in-depth blog posts on Python Web Scraping and Beautiful Soup to get an easy start.

The complete code should look like this:

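import requests
from bs4 import BeautifulSoup

# Web Unblocker endpoint; replace USERNAME and PASSWORD with your sub-user's credentials
web_unblocker = {
    'http': 'http://USERNAME:PASSWORD@unblock.oxylabs.io:60000',
    'https': 'http://USERNAME:PASSWORD@unblock.oxylabs.io:60000'
}

# Route the request through Web Unblocker; SSL verification must be disabled
response = requests.get(
    'https://books.toscrape.com/',
    verify=False,
    proxies=web_unblocker
)

# Parse the HTML and print the title attribute of every <a> inside an <h3>
soup = BeautifulSoup(response.content, "html.parser")
for title in soup.select("h3 a"):
    print(title.get("title"))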

As you can see, it only takes a few lines of Python code to incorporate Oxylabs’ Web Unblocker. Using the above code, you should expect the following output:

A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red
The Dirty Little Secrets of Getting Your Dream Job
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics
The Black Maria
Starving Hearts (Triangular Trade Trilogy, #1)
Shakespeare's Sonnets
Set Me Free
Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)
Rip it Up and Start Again
Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991
Olio
Mesaerion: The Best Science Fiction Stories 1800-1849
Libertarianism for Beginners
It's Only the Himalayas

Hopefully, these Python examples showed how effortless it is to integrate Web Unblocker. Visit our documentation to learn more about its parameters and general integration steps.

Developing your own solution

Of course, you can always build your own solution for handling complex CAPTCHAs. Development takes time, but you can tailor the solution specifically to the kind of requests you wish to send, which can result in higher success rates and uninterrupted scraping. A couple of tools are well suited to the task:

Playwright

It’s an excellent web testing and automation tool developed by Microsoft that can also be used to avoid CAPTCHAs. It supports popular programming languages such as Python, JavaScript, and Java, and it works with Chromium-based, Firefox, and WebKit browsers, giving users more flexibility. We’ve written a detailed blog post on Playwright, so be sure to take a look at this Playwright Scraping Tutorial for more information.
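As a starting point, here is a minimal sketch of collecting the same book titles with Playwright’s Python sync API. It only shows the basic structure of a browser-based scraper; on its own it does not guarantee that a protected site won’t show a CAPTCHA. The context options (locale and viewport) are illustrative assumptions, and scrape_titles is a hypothetical helper name.

```python
# Browser-context options; the values are illustrative assumptions.
CONTEXT_OPTIONS = {
    "locale": "en-US",
    "viewport": {"width": 1366, "height": 768},
}


def scrape_titles(url: str = "https://books.toscrape.com/") -> list:
    """Open the page in headless Chromium and return the title attribute
    of every <a> tag inside an <h3> tag."""
    # Imported here so this module loads even without Playwright installed.
    # Setup: pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            context = browser.new_context(**CONTEXT_OPTIONS)
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded")
            # Runs in the page: map each matched element to its title attribute.
            return page.locator("h3 a").evaluate_all(
                "els => els.map(el => el.getAttribute('title'))"
            )
        finally:
            browser.close()


# Usage (needs Playwright, a downloaded Chromium, and network access):
# for title in scrape_titles():
#     print(title)
```

Because a real browser executes JavaScript and exposes a complete browser environment, this approach tends to blend in better than bare HTTP requests, at the cost of more CPU and memory per page.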

Puppeteer

It’s also a very effective web automation tool that you can use to build a program that avoids CAPTCHAs. Puppeteer, developed by Google, is a JavaScript (Node.js) library, but you can use it from Python through an unofficial port called Pyppeteer. The downside of Puppeteer is that it primarily targets Chromium-based browsers. If you’re curious to learn more, check out our in-depth blog post on web scraping with Puppeteer and a tutorial on how to overcome CAPTCHA challenges with Puppeteer.

Keep in mind that developing your own solution takes time to write the code and ongoing maintenance to keep up with constant changes on target websites. If that’s an issue, the better option is to use a ready-made web scraper that avoids CAPTCHAs automatically. Building a scalable scraper that moves through the web undetected and uninterrupted takes a mountain of effort, while a pre-built tool can ease the process immensely, saving time and resources. See how both methods differ in this guide to scraping Amazon.

Final thoughts

With CAPTCHAs being one of the most common challenges in public data collection, it’s essential to find a reliable, high-quality way to bypass them. This article presented a few anti-CAPTCHA solutions you can try in your scraping tasks and discussed the different types of CAPTCHA tests in use today.

If you're curious to try out our scraping solutions, you can simply get a free trial and follow our guides for your desired target. Here are some tutorials to get you started: how to scrape Google search results and how to scrape Etsy data.

If you have any questions about this topic or would like to learn more about Web Unblocker, Oxylabs’ ultimate solution for bypassing CAPTCHAs, feel free to contact us at hello@oxylabs.io or via the live chat.

Frequently asked questions

Is there a way to bypass CAPTCHA?

Yes, there are many services on the market, such as CAPTCHA solvers and proxy solutions, designed specifically for bypassing CAPTCHA tests. For instance, Oxylabs’ Web Unblocker chooses the right combination of cookies, headers, browser attributes, and other parameters to appear as an organic user and get past the target website’s blocks.

Can reCAPTCHA be bypassed?

While Google reCAPTCHA is considered to be more sophisticated and harder to bypass than the original CAPTCHA, it’s still possible to bypass it in several different ways. You can either implement a ready-to-use tool or develop your own and tailor it specifically to the kind of requests you wish to send.

Can a bot bypass CAPTCHA?

Even though modern CAPTCHAs are advanced and tend to provide a high level of security for websites, sophisticated bots can still bypass them. These tools usually rely on features like dynamic browser fingerprinting that let users overcome even complex CAPTCHA tests and scrape and crawl without interruption.

Why are CAPTCHAs used?

On websites, CAPTCHAs are used to separate human users from malicious bots. They act as a safety net to stop bots from engaging in possibly harmful or malicious activities like spamming or fraudulent transactions.

How can I avoid CAPTCHAs?

There are several ways to avoid CAPTCHAs when gathering web data. If you’re using a DIY scraper, make sure to use proxies and rotate them. Adjusting User-Agent headers to refine your scraper’s fingerprint is another useful tactic. You might also consider automated tools like Web Unblocker, which are designed to avoid triggering CAPTCHA challenges in the first place. Using CAPTCHA proxies is good practice as well.
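For a DIY scraper, proxy and User-Agent rotation can be as simple as cycling through two lists. The sketch below assumes you already have a pool of proxy endpoints; the proxy URLs are placeholders, the User-Agent strings are examples you would keep up to date, and next_request_kwargs is a hypothetical helper that returns keyword arguments for requests.get().

```python
import itertools

# Placeholder proxy URLs (assumptions); substitute your own endpoints.
PROXIES = [
    "http://USER:PASS@proxy1.example.com:8000",
    "http://USER:PASS@proxy2.example.com:8000",
    "http://USER:PASS@proxy3.example.com:8000",
]
# Example desktop User-Agent strings; keep them current in practice.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.6 Safari/605.1.15",
]

_proxy_cycle = itertools.cycle(PROXIES)
_ua_cycle = itertools.cycle(USER_AGENTS)


def next_request_kwargs() -> dict:
    """Return requests.get() keyword arguments using the next proxy and the
    next User-Agent in round-robin order."""
    proxy = next(_proxy_cycle)
    return {
        "proxies": {"http": proxy, "https": proxy},
        "headers": {"User-Agent": next(_ua_cycle)},
        "timeout": 30,  # seconds; an arbitrary example value
    }


# Usage with requests (not executed here):
# response = requests.get("https://books.toscrape.com/", **next_request_kwargs())
```

Round-robin keeps the load even across the pool; random selection is an equally valid choice. Either way, a User-Agent that contradicts the rest of the request (other headers, TLS fingerprint, behavior) can still give a scraper away, which is the gap fingerprinting tools aim to close.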

What does CAPTCHA stand for?

First patented in 1997, CAPTCHA is an abbreviation for Completely Automated Public Turing Test to tell Computers and Humans Apart.

About the author

Yelyzaveta Nechytailo

Senior Content Manager

Yelyzaveta Nechytailo is a Senior Content Manager at Oxylabs. After working as a writer in fashion, e-commerce, and media, she decided to switch her career path and immerse herself in the fascinating world of tech. And believe it or not, she absolutely loves it! On weekends, you’ll probably find Yelyzaveta enjoying a cup of matcha at a cozy coffee shop, scrolling through social media, or binge-watching investigative TV series.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
