
Scraping the Web With a High Success Rate

Gabija Fatenaite

2019-10-10 · 3 min read

OxyCon, Oxylabs’ very first annual web data harvesting conference, was packed with in-depth talks and workshops. On the second day of the event, Eivydas Vilcinskas, Software Engineer at Oxylabs, took the stage to share some tactical advice on how to reach a high success rate using Oxylabs Scraper APIs (formerly known as Real-Time Crawler). Scraper APIs include SERP Scraper API, E-Commerce Scraper API, Real Estate Scraper API, and Web Scraper API.

According to Eivydas, 99.7% of the time, Scraper APIs successfully deliver data. However, as with all services, there is always that 0.3% chance of system downtime. Fortunately, in his workshop, Eivydas walked through all possible issues and explained how to solve each and every one of them.

How to access Oxylabs’ web crawling tools, Scraper APIs

Before we go into error codes, let’s quickly recap how you could access the service of Scraper APIs.

Using Proxy Endpoint

This method is the simplest (but also the most limited) way of accessing Scraper APIs. Apart from providing the target URL, you can only provide headers to select the desired user-agent type and geo-location to spoof. We also don’t allow the use of our JavaScript rendering service or submitting jobs in batches via this method.

Proxy Endpoint acts as a standard proxy with some added functionality and returns the body of the response from the target verbatim. We don’t wrap it in any structures like JSON or add any additional data.
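For a rough illustration, here is what a Proxy Endpoint request could look like in Python with the requests library. The proxy hostname, port, and the X-Oxylabs-* header names below are assumptions made for this sketch; check the official documentation for the exact values.

```python
# A minimal sketch of a Proxy Endpoint request with Python's requests library.
# The proxy host/port and the X-Oxylabs-* header names are assumptions made
# for illustration; consult the official documentation for the exact values.
import requests

PROXY = "http://USERNAME:PASSWORD@realtime.oxylabs.io:60000"  # hypothetical endpoint

response = requests.get(
    "https://example.com/target-page",
    proxies={"http": PROXY, "https": PROXY},
    headers={
        "X-Oxylabs-User-Agent-Type": "desktop",  # hypothetical header name
        "X-Oxylabs-Geo-Location": "Germany",     # hypothetical header name
    },
    timeout=120,
)

# The body comes back verbatim, with no JSON wrapper around it
print(response.status_code)
print(response.text[:500])
```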

Via real-time data delivery method

When using the real-time data delivery method, you POST the job, and Scraper APIs return the requested data on an open connection. If done correctly, the data should come back with the HTTP status code 200 and should contain a JSON with the data you requested.
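Here is a minimal sketch of that flow in Python. The endpoint URL, the source value, and the layout of the returned JSON are illustrative assumptions, not the documented interface.

```python
# A minimal sketch of the real-time delivery method: POST a job and read the
# result on the same open connection. The endpoint URL and payload field names
# are assumptions for illustration only.
import requests

payload = {
    "source": "universal",               # hypothetical source name
    "url": "https://example.com/page",
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",  # hypothetical endpoint
    json=payload,
    auth=("USERNAME", "PASSWORD"),
    timeout=120,  # the connection stays open until the job is done
)

response.raise_for_status()           # anything but 200 means something went wrong
data = response.json()                # results arrive wrapped in a JSON structure
print(data["results"][0]["content"])  # hypothetical result layout
```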

Via callback data delivery method

The callback method allows you to decide when to retrieve the requested data (but no later than 24 hours after the job is done) and lets you manage the full range of options as well as request/response timings. Check our previous blog post on callback vs. real-time data delivery methods to learn more.
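A rough sketch of the callback flow might look like this. The endpoint URLs, payload fields, and job-result layout are assumptions made for illustration.

```python
# A minimal sketch of the callback delivery method: submit a job with a
# callback URL, then fetch the result when notified. Endpoint URLs and field
# names are assumptions for illustration only.
import requests

job = requests.post(
    "https://data.oxylabs.io/v1/queries",  # hypothetical endpoint
    json={
        "source": "universal",             # hypothetical source name
        "url": "https://example.com/page",
        "callback_url": "https://your-server.example/oxylabs-callback",
    },
    auth=("USERNAME", "PASSWORD"),
    timeout=30,
).json()

# Later, when your callback endpoint is notified, retrieve the stored result
# (remember: it is available for no longer than 24 hours).
result = requests.get(
    f"https://data.oxylabs.io/v1/queries/{job['id']}/results",  # hypothetical layout
    auth=("USERNAME", "PASSWORD"),
    timeout=30,
)
print(result.json())
```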

Scraping the web with Scraper APIs: error types

According to Eivydas, the majority of errors that you might encounter while integrating or using Scraper APIs fall into three categories: request, response, and content.

Request errors

These types of errors are related to the request path and usually arise when the signal doesn’t reach the intended destination. It might mean that the Scraper API servers are physically not reachable over the network and/or that the services are not running correctly.
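Since request errors surface as failed connections rather than HTTP responses, a client typically catches network exceptions and retries after a pause. This is a generic sketch, not Oxylabs-specific code; with retries=5 and pause=60, it gives up after roughly 5 minutes, which is when the table in the next section suggests escalating.

```python
# A generic sketch: request-path failures show up as exceptions, not HTTP
# status codes, so catch them and retry with a pause before escalating.
import time
import requests

def post_with_retries(url, payload, auth, retries=5, pause=60):
    for attempt in range(1, retries + 1):
        try:
            return requests.post(url, json=payload, auth=auth, timeout=120)
        except (requests.ConnectionError, requests.Timeout) as exc:
            print(f"Attempt {attempt} failed: {exc}")
            if attempt == retries:
                raise  # still unreachable after ~5 minutes: contact your account manager
            time.sleep(pause)
```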

Response errors

Usually, a response error means that the network is running smoothly and the services are available to return something; the issue is most probably related to the way a request is made and the data it contains. For example, you might be using the wrong HTTP method to contact the service endpoint, or you might be requesting data that we cannot provide.

Content errors

Once you get the data from the Scraper APIs, you can process it further. We report the job as completed successfully, but during data analysis on your part, you find out that the data is not exactly what you requested, or that it has some flaws.

The most common errors and how to solve them

Depending on which access method you are using and which kind of error you get, there might be different ways to solve an issue. We summarized all of them in the table below.

| Access type | Error type | Error code | Solution | Plan B |
|---|---|---|---|---|
| Proxy Endpoint / Real-Time / Callback | Request | Servers are not reachable | Wait a few minutes before retrying | If the server is still down after 5 minutes, contact your account manager |
| Proxy Endpoint / Real-Time / Callback | Request | Scraper API is not reachable | Check that you’re not hitting the wrong endpoint | Troubleshoot your connection. If the connection is OK, contact Oxylabs |
| Proxy Endpoint / Real-Time / Callback | Response | 400 | Look for the message in the body to see the reason | |
| Real-Time / Callback | Response | 401 | Check that you’re using the correct credentials and that your user wasn’t disabled. To fix this, contact your account manager | You might get this error code if the source you’ve given to the Scraper API is not supported or is disabled for you. Contact your account manager to discuss implementing the necessary source in our system or to have it enabled for you |
| Proxy Endpoint / Real-Time / Callback | Response | 404 | Check the documentation for correct endpoints | |
| Real-Time / Callback | Response | 405 | Use POST to submit your jobs. Any other HTTP method will return a response with this status code | |
| Proxy Endpoint | Response | 407 | Check that you’re not using incorrect credentials | If you want to reset your credentials, contact your account manager |
| Proxy Endpoint / Real-Time | Response | 408 | Increase the default timeout value to 120 s and try again | If the timeout comes from the Scraper API’s side, contact Oxylabs |
| Callback | Response | 408 | Increase the default timeout value to 30 s and try again | If the timeout comes from the Scraper API’s side, contact Oxylabs |
| Proxy Endpoint / Real-Time / Callback | Response | 429 | You have reached the limit of requests per week/month/etc. Reach out to your account manager to increase the limit | You might be making too many requests per minute. Contact your account manager to increase this limit |
| Real-Time / Callback | Response | 5xx | If this error appears for more than 5 minutes, contact Oxylabs | |
| Proxy Endpoint / Real-Time / Callback | Content | Status code is not 200 | Retry the job | The reason for this error might be incorrect job parameters. This status code should be handled on your side, but you can always contact Oxylabs for assistance |
| Proxy Endpoint / Real-Time / Callback | Content | 200, but data for given parameters is incorrect | The target website might have changed its algorithms. There might be a way to get the required data by using different parameters. Contact Oxylabs for assistance | |
| Proxy Endpoint / Real-Time / Callback | Content | Corrupted content | Have you decoded or decompressed the data? If yes and the data is still corrupted, contact Oxylabs | |
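To tie the table together, here is one way a client could dispatch on these status codes. The groupings mirror the table above; the exception types are arbitrary choices made for the sketch.

```python
# A sketch of dispatching on the response codes from the table above. The
# groupings mirror the table; the exception types are arbitrary choices.
class RetryJob(Exception):
    """Raised for codes worth retrying, possibly after a pause."""

RETRYABLE = {408, 500, 502, 503, 504}   # timeouts and 5xx: retry, escalate after ~5 min
FIX_REQUEST = {400, 404, 405}           # wrong payload, endpoint, or HTTP method
CHECK_ACCOUNT = {401, 407, 429}         # credentials, disabled user, or usage limits

def handle(response):
    code = response.status_code
    if code == 200:
        return response.json()
    if code in RETRYABLE:
        raise RetryJob(code)
    if code in FIX_REQUEST:
        # The body usually explains the reason (see the 400 row above)
        raise ValueError(f"{code}: {response.text}")
    if code in CHECK_ACCOUNT:
        raise PermissionError(f"{code}: contact your account manager")
    raise RuntimeError(f"Unexpected status code: {code}")
```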

Structured data errors

While accessing Scraper APIs via the callback data delivery method, you might get structured data errors. According to Eivydas, the results for structured data contain two separate status codes. The one in the root of the object holds the status code of the HTTP response that the target has given us. The other, in content.parse_status_code, marks the status of the parsing effort (see the sketch after this list):

  • 12000 – parse successful. The content should be pristine and contain all the fields that are expected to be parsed.

  • 12004 – parse successful with errors. Some fields somewhere in the tree might not have been parsed correctly. Such fields contain the text “Could not parse xyz: ReasonForFailure”.

  • 12003 – we could not parse the content because it is not supported. For now, we only attempt to parse target responses with a 200 status code.

  • 12002 – the parse attempt failed. A real failure on our side. This might be caused by a significant change in the HTML structure, meaning we have to adapt our code to continue parsing the format.
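A minimal sketch of such a check, assuming the result object exposes the root status code as status_code and the parse status under content.parse_status_code as described above (the rest of the layout is an assumption):

```python
# A sketch of checking both status codes on a structured-data result. The
# status-code values follow the article; the surrounding dict layout is an
# assumption made for the example.
PARSE_OK = 12000           # all expected fields parsed
PARSE_PARTIAL = 12004      # some fields contain "Could not parse ..." messages
PARSE_UNSUPPORTED = 12003  # content not supported (only 200 responses are parsed)
PARSE_FAILED = 12002       # parsing failed on Oxylabs' side; report it

def check_result(result):
    if result["status_code"] != 200:  # root code: the target's HTTP response
        return "target-error"
    parse_code = result["content"]["parse_status_code"]
    if parse_code == PARSE_OK:
        return "ok"
    if parse_code == PARSE_PARTIAL:
        return "partial"              # scan the tree for unparsed fields
    if parse_code == PARSE_UNSUPPORTED:
        return "unsupported"
    if parse_code == PARSE_FAILED:
        return "report-to-oxylabs"
    return "unknown"
```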

Wrapping up

So, we’ve covered the most common errors that you can encounter while integrating and using Scraper APIs. Reaching a high success rate should be a piece of cake now! Moreover, detailed documentation covering what Eivydas presented in his workshop is currently in the works and will be published on our website soon. In the meantime, if you have any questions regarding Scraper APIs, feel free to contact us.

About the author

Gabija Fatenaite

Lead Product Marketing Manager

Gabija Fatenaite is a Lead Product Marketing Manager at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
