Setting the Right Approach to Web Scraping

Iveta Vistorskyte

2020-06-26, 4 min read

Oxylabs recently hosted its first webinar on common mistakes in residential proxy usage and how to solve them. We shared our knowledge on how to start web scraping. In this article, we summarize our tips on setting the right approach to web scraping and the key elements of good web scraping practice.

If you are interested in watching the whole webinar, click here and watch it for free! In the webinar, you will learn how to choose between residential and datacenter proxies and get tips on deciding which proxy service provider works best for you.

Successful web scraping

Just as with most data gathering tasks, getting started is the hardest part. To make it easier, follow these steps: set a preferred session, see if it works with a test query, and then start scraping your target website. Testing is essential because it lets you check whether your web scraping will succeed before you commit to a full run.
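The steps above can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the function name `scrape_with_test_query` and the `fetch` callable (which is assumed to return a status code and a body) are hypothetical, and treating only status 200 as success is an assumption you may want to relax.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical signature: fetch(url) -> (status_code, body).
# Plug in whatever HTTP client and proxy session you actually use.
Fetch = Callable[[str], Tuple[int, str]]


def scrape_with_test_query(
    fetch: Fetch, test_url: str, targets: Iterable[str]
) -> List[Tuple[str, int, str]]:
    """Run one test query first; scrape the targets only if it succeeds."""
    status, _ = fetch(test_url)
    if status != 200:
        # Assumption: anything other than 200 means the setup needs fixing.
        raise RuntimeError(f"test query failed with status {status}")

    results = []
    for url in targets:
        status, body = fetch(url)
        results.append((url, status, body))
    return results
```

Passing the fetcher in as a parameter keeps the workflow independent of any particular proxy or HTTP library, so the same check can be reused when the session settings change.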

Sessions and their importance

Sessions are an essential part of the residential proxy network. They enable you to use the same IP address for multiple requests. By default, every new request that goes through the residential network is carried out by a new proxy and this can cause issues. For example, if you are using a full browser, bot, or a headless browser to download assets from your target websites, all of them must be downloaded using the same IP address. In this case, assets mean everything that comes with the HTML – CSS, JavaScript files, images, and so on.

Reliable proxy providers will offer you flexible and adjustable session control features, so you can be sure that this part will be managed easily.
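As a rough sketch of session control, many providers let you pin an IP address by embedding a session identifier in the proxy credentials. The `-session-<id>` username suffix, the host `proxy.example.com`, and port `8000` below are hypothetical placeholders; check your provider's documentation for the real syntax.

```python
import uuid
import urllib.request


def sticky_proxy_url(user, password, host, port, session_id):
    """Build a proxy URL that carries a session id in the username.

    Assumption: the "<user>-session-<id>" convention is illustrative only.
    """
    return f"http://{user}-session-{session_id}:{password}@{host}:{port}"


def opener_for_session(proxy_url):
    """Return a urllib opener that routes HTTP and HTTPS through one proxy URL."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# One session id per logical "visit": the page and all of its assets
# should be requested through the opener built from the same id.
session_id = uuid.uuid4().hex[:8]
proxy = sticky_proxy_url("USERNAME", "PASSWORD", "proxy.example.com", 8000, session_id)
opener = opener_for_session(proxy)
# opener.open("https://example.com/")           # HTML
# opener.open("https://example.com/style.css")  # asset, same session id
```

Generating a fresh session id for the next visit gives you a new IP address while keeping each individual page load consistent.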

HTTP headers for web scraping

HTTP stands for HyperText Transfer Protocol, which defines how messages are structured and transferred on the web. It also governs how web servers and browsers should respond to different requests. HTTP headers come in several types: request headers, response headers, general headers, entity headers, and so on. If you want to get more information, check out our other blog post, where we covered this topic in detail.

When web scraping, sending the HTTP headers a browser would send, preferably in the same order, is the minimum these days. Requests that lack browser-typical HTTP headers are likely to be blocked very quickly. For successful web scraping, you should think of every possible way to avoid blocks. Optimizing HTTP headers reduces the chances of being blocked by data sources.

To start optimizing HTTP headers, we advise you to see how the browser itself behaves. In Firefox or Chrome, press F12 to open developer tools. Go to the Network tab and refresh the page you are on. You will see all the requests that the browser had to make in order to fully render the page. Find the request that loaded the HTML content, and you will see which headers were sent and in which order. Try to replicate this in your scraper.
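One way to replicate a browser's header order is to write the headers yourself instead of letting the HTTP client add its own. The sketch below uses Python's standard `http.client`; the header names, values, and order in `BROWSER_HEADERS` are illustrative assumptions, so copy the real ones from your own browser's Network tab.

```python
import http.client

# Illustrative only: replace with the headers and order your browser sends.
BROWSER_HEADERS = [
    ("Host", None),  # filled in per request
    ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"),
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Language", "en-US,en;q=0.5"),
    ("Connection", "keep-alive"),
]


def ordered_headers(host):
    """Return (name, value) pairs in the fixed order, with Host filled in."""
    return [(name, host if name == "Host" else value) for name, value in BROWSER_HEADERS]


def fetch(host, path="/"):
    """Send a GET request whose headers go out in exactly the listed order."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    # skip_host / skip_accept_encoding stop http.client from adding
    # its own Host and Accept-Encoding headers ahead of ours.
    conn.putrequest("GET", path, skip_host=True, skip_accept_encoding=True)
    for name, value in ordered_headers(host):
        conn.putheader(name, value)
    conn.endheaders()
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return response.status, body
```

A list of pairs is used rather than a dictionary to make the intended order explicit and easy to compare against what the developer tools show.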

“Fingerprinting” and its relevance

“Fingerprinting” relies on all the information that your browser gives websites about you and your computer, such as mouse input, screen resolution, installed plugins, and much more. A website can combine all this information into a single hash, a fingerprint, which makes it easier to tell whether requests come from a real browser. Fingerprinting is becoming the primary weapon for identifying web scraping bots, and it increases the chances of being blocked.
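To make the "single hash" idea concrete, here is a simplified sketch of how a set of client attributes could be reduced to one value. The attribute names and the choice of SHA-256 over a JSON encoding are assumptions for illustration; real anti-bot systems collect far more signals and do not necessarily work this way.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Collapse client attributes into one stable hex digest."""
    # Sorting keys makes the hash independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute sets: a browser exposes many values,
# while a bare script typically exposes few or none.
browser_like = {"resolution": "1920x1080", "plugins": ["pdf-viewer"], "timezone": "Europe/Vilnius"}
script_like = {"resolution": None, "plugins": [], "timezone": None}

print(fingerprint(browser_like) == fingerprint(script_like))
```

Because identical attributes always yield the same digest, a site can recognize a repeat visitor or flag a client whose attribute set looks unlike any real browser.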

Some websites already have anti-scraping solutions that check “fingerprints”, but it is not very common yet. One major problem is that fingerprinting still produces a lot of false positives: blocked visitors who were real users and might have converted to sales. More importantly, it requires tremendous hardware resources to process all the data. Overall, the chances of running into such issues are quite slim, but if you do, the best approach is to use a headless browser, preferably with stealth addons.

What are headless browsers?

As a last resort, we recommend trying headless browsers. A headless browser is a type of software that can access web pages but does not show them to the user. It can pass the content returned by the target servers to another program. Some headless browsers even have extensions and plugins to hide that they are not real browsers, but usually they work pretty well out of the box. This is your best shot with seriously difficult targets.
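As one possible illustration, Chrome can run headless from the command line and print the rendered DOM, which another program can then capture. The sketch below assumes a Chrome or Chromium binary is installed and reachable as `google-chrome`; the binary name differs between systems, and the helper names are hypothetical.

```python
import subprocess


def headless_dump_command(url, binary="google-chrome"):
    """Build the command line that renders a page without showing a window."""
    # --headless runs without a UI; --dump-dom prints the rendered DOM to stdout.
    return [binary, "--headless", "--disable-gpu", "--dump-dom", url]


def render(url, binary="google-chrome", timeout=30):
    """Run the headless browser and return the rendered HTML as text."""
    completed = subprocess.run(
        headless_dump_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the browser exits with an error
    )
    return completed.stdout
```

Calling `render("https://example.com/")` would hand the JavaScript-rendered page to your parser. Browser automation libraries offer finer control, but this shows the basic idea of directing browser output to another program.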

More practical tips on web scraping

1. Visit the home page before accessing the inner content. Regular users rarely have full links to products or articles. They first land on the home page and then browse further.

2. Data that is behind authentication or protected with a password could be considered private, and scraping such data can be illegal in some cases. Before starting web scraping of any kind, we suggest you consult your legal advisors and carefully read the particular website’s terms of service, or even obtain a scraping license if possible.

3. Choose the right proxy type for your web scraping tasks. Two of the main proxy types are residential and datacenter proxies. Usually, they are used for different targets. You can find more information on this topic by watching the whole webinar.
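The first tip above can be sketched as a small browsing routine that derives the home page from the target URL, visits it first, and keeps cookies between the two requests. The helper names and the default `User-Agent` value are hypothetical, and sending a `Referer` header on the second request is an assumption about what a regular visit looks like.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def visit_plan(target_url):
    """Return the URLs to request, home page first, then the target."""
    parts = urlsplit(target_url)
    home = urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    return [home] if home == target_url else [home, target_url]


def browse(target_url, user_agent="Mozilla/5.0"):
    """Fetch the home page, then the target, sharing cookies across both."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    previous = None
    body = b""
    for url in visit_plan(target_url):
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        if previous:
            # Assumption: a user arriving from the home page sends it as Referer.
            request.add_header("Referer", previous)
        with opener.open(request, timeout=10) as response:
            body = response.read()
        previous = url
    return body  # content of the last page visited
```

For example, `visit_plan("https://example.com/products/42")` yields the home page `https://example.com/` followed by the product URL.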

Conclusions

Figuring out how to start web scraping can be a complicated task. To make it easier, follow this workflow: set a preferred session, see if it works with a test query, and then start scraping your target public data source. Do not forget to consult your legal advisors to make sure you will not encounter any legal issues when web scraping.

The most difficult part is to avoid being blocked by targeted servers. Sessions, HTTP headers, headless browsers, and “fingerprinting” are the essential things you should note to make your web scraping session successful.

If you are interested in web scraping, Oxylabs has a self-service checkout for smaller residential proxy plans! Register here and decide what is best for you. Furthermore, if you have more questions, book a call with our sales team! They are ready to answer all your questions.

About the author

Iveta Vistorskyte

Lead Content Manager

Iveta Vistorskyte is a Lead Content Manager at Oxylabs. Growing up as a writer and a challenge seeker, she decided to welcome herself to the tech-side, and instantly became interested in this field. When she is not at work, you'll probably find her just chillin' while listening to her favorite music or playing board games with friends.

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
