Proxy locations

Europe

North America

South America

Asia

Africa

Oceania

See all locations

Network status Careers

hello@oxylabs.io

English (EN)

English

中文

Proxies

Proxies & Advanced Proxy Solutions

Residential Proxies

Human-like scraping without IP blocking

Mobile Proxies

Harness the power of IP addresses from real mobile devices

Rotating ISP Proxies

Extract the required data without the fear of getting blocked

Web Unblocker

AI-powered proxy solution for block-free scraping

Shared Datacenter Proxies

Fast and reliable proxies for cost-effective scraping

Dedicated Datacenter Proxies

The highest performing proxies on the market

Static Residential Proxies

Combined power of Datacenter and Residential IPs

Tools & Addons

Oxy Proxy Extension for Chrome

Free Chrome proxy manager extension that works with any proxy provider.

Oxy Proxy Manager for Android

Free Android proxy manager app that works with any proxy provider.

Proxy RotatorAdd-on

Rotates your Datacenter Proxies to help increase success rates.

Scraper APIs

SERP Scraper APIFREE TRIAL

Scalable SERP data delivery from major search engines

E-Commerce Scraper APIFREE TRIAL

Enterprise-level data from largest e-commerce marketplaces

Real Estate Scraper APIFREE TRIAL

Real-time data from popular real estate websites

Web Scraper APIFREE TRIAL

Public data delivery from a majority of websites

Features

Web Crawler

Discovers all pages on a website and fetches data at scale.

Scheduler

Schedules multiple scraping and parsing jobs at specified frequencies.

Custom Parser

Parses scraped documents by executing given parsing instructions.

Headless BrowserNEW

Render JavaScript and execute browser instructions.

DatasetsNew

Datasets

Company Data

Comprehensive datasets for business profiling

E-Commerce Product Data

Datasets for product catalog insights from E-Commerce stores

Job Postings Data

Datasets for labour market research and insights

Community and Code Data

Datasets for developer community trends

Product Review Data

Fresh datasets for user sentiment analysis

Pricing

Proxies

Residential Proxies

Human-like scraping

Starts from

$10

Pay as you go

Mobile Proxies

3G/4G/5G Mobile Proxies

Starts from

$22

Pay as you go

Rotating ISP Proxies

Extended sessions

Starts from

$340/month

Shared Datacenter Proxies

Cost-effective solution

Starts from

$50/month

Dedicated Datacenter Proxies

Superior performance

Starts from

$50/month

Scraper APIs

SERP Scraper API

Scalable SERP data delivery

Starts from

$49/month

E-Commerce Scraper API

Enterprise-level product page data

Starts from

$49/month

Web Scraper API

Data from a majority of websites

Starts from

$49/month

Real Estate Scraper API

Real-time real estate data

Starts from

$49/month

Advanced Proxy Solutions

Web Unblocker

AI-powered proxy solution

Starts from

$75/month

Learn

Getting Started

Knowledge Base

Read the latest articles about the world of web scraping, proxies, and more

Webinars

Check our webinars to learn more about data gathering issues and solutions

White papers

Get extensive white papers to understand the most complex scraping topics

OxyCon

Join inspiring discussions at Oxylabs’ annual web scraping conference

Scraping Experts

Watch lessons by industry-leading experts to gain insights on data gathering

Useful Information

Quick Start Guides

Featured

Explore tutorials and code samples to build a web scraping infrastructure with Oxylabs solutions.

Solutions

By Industry

E-Commerce

Get access to valuable e-commerce data with the help of advanced scraping solutions

Cybersecurity

Collect threat intelligence and inspect risky activities anonymously with reliable proxies

Brand protection

Monitor the web on a large scale to ensure no unauthorized product seeped into the market

SERP Monitoring

Monitor SERPs to enhance your business strategy

Travel and hospitality

Gather real-time flight and hotel data to and build a solid strategy for your travel business.

By Use Case

View all

By Target

View all

Back to blog

Data acquisition

Most Common HTTP Headers

Vytautas Kirjazovas

2021-09-204 min read

A common and repetitive question in the world of web scraping is how to avoid getting blocked by target servers? And, how to increase the quality of retrieved data?

HTTP headers for web scraping

Of course, there are proven resources and techniques, such as the use of a proxy or practicing rotating IP address that will help your web scraper to avoid blocks.

However, another sometimes overlooked technique is to use and optimize HTTP headers. This practice will allow to significantly decrease your web scraper’s chances of getting blocked by various data sources, and also ensure that the retrieved data is of high quality.

Don’t be alarmed if you have little knowledge about web headers, as we covered what HTTP headers are and discuss how they are connected in the web scraping process. If you wish to further your knowledge on the topic of scraping, check out our guide on how to scrape a website with Python.

In this article, we are revealing the 5 most common HTTP headers that need to be used and optimized, and provide you with the reasoning behind it.

Here is the brief list of the most common HTTP headers:

Header	Example value
HTTP header User-Agent	Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0
HTTP header Accept-Language	en-US
HTTP header Accept-Encoding	gzip, deflate
HTTP headers Accept	text/html
HTTP header Referer	http://www.google.com/

HTTP headers enable both the client and server to transfer further details within the request or response.

1. HTTP header User-Agent

The User-Agent request header passes information related to the identification of application type, operating system, software, and its version, and allows for data target to decide what type of HTML layout to use in response i.e. mobile, tablet, or pc.

User-Agent

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5)
AppleWebKit/605.1.15 (KHTML, like Gecko)
Version/12.1.1 Safari/605.1.15

Authenticating the User-Agent request header is a common practice by web servers, and it is the first check that allows data sources to identify suspicious requests. For instance, when web scraping is in process, numerous requests are traveling to the web server, and if User-Agent request headers are identical, it will seem as it is a bot-like activity. Hence, experienced web scraping punters will manipulate and differentiate User-Agent header strings, which consequently allow portraying multiple organic users’ sessions.

So, when it comes to the User-Agent request header, remember to frequently alter the information this header carries, which will allow you to substantially reduce your odds of getting blocked.

2. HTTP header Accept-Language

The Accept-Language request header passes information indicating to a web server which languages the client understands, and which particular language is preferred when the web server sends the response back.

Accept-Language

en-gb

It’s worth mentioning that this particular header usually comes into play when web servers are unable to identify the preferred language e.g. via URL.

That said, the key with the Accept-Language request header is relevance. It is essential to ensure that set languages are in accordance with the data-target domain and client’s IP location. Simply because, if requests from the same client would appear in multiple languages this would raise suspicions to the web server of bot-like behavior (non-organic request approach), and consequently, they might block the web scraping process.

3. HTTP header Accept-Encoding

The Accept-Encoding request header notifies the web server of what compression algorithm to use when the request is handled. In other words, it states that the required information can be compressed (if the web server can handle it) when being sent out from the web server to the client.

Accept-Encoding

br, gzip, deflate

However, when optimized it allows saving traffic volume, which is a win-win situation for both the client and the web server from the traffic load perspective. The client still gets the required information (just compressed), and the web server isn’t wasting its resources by transferring a huge load of traffic.

4. HTTP header Accept

The Accept request header falls into a content negotiation category, and its purpose is to notify the web server on what type of data format can be returned to the client.

test/html,application/xhtml+xml,application/x
ml;q=0.9,*/*;q=0.8

It’s as simple as it sounds, but a common hiccup with web scraping is overlooking or forgetting to configure the request header accordingly to the web server’s accepted format. If the Accept request header is configured suitably, it will result in more organic communication between the client and the server, and consequently, decrease the web scraper’s chances of getting blocked.

5. HTTP header Referer

The Referer request header provides the previous web page’s address before the request is sent to the web server.

Referer

http://www.google.com/

It might seem that the Referer request header has very little impact when it comes to blocking the scraping process, when in fact, it actually does. Think of a random organic user’s internet usage patterns. This user is quite likely surfing the mighty internet and losing track of hours in a day. Hence, if you want to portray the web scraper’s traffic to seem more organic, simply specify a random website before starting a web scraping session.

The key is not to jump the gun and instead take this rather straightforward step. Hence, remember to always set up the Referer request header, and boost your chances of slipping under anti-scraping measures implemented by web servers.

Wrapping it up

With the list of common HTTP request headers provided in this article, now you know which web headers to configure, and by doing so, it will allow increasing your web scraper’s chances of a successful and efficient data extraction operation.

It’s safe to state that the more you know about the technical side of web scraping, the more fruitful your web scraping results will be. Use this knowledge wisely, and it’s a given that your web scraper will work more effectively and efficiently. If you’re just looking for web scraping project ideas and wondering how to begin web scraping at all, read it up at our blog. If you want to jump straight to the web scraping tasks, take a look at our own general-purpose web scraper.

Of course, if you have any further questions or would like to get a consultation, feel free to leave a comment below, drop us a line via live chat or email us at hello@oxylabs.io.

About the author

Vytautas Kirjazovas

Head of PR

Vytautas Kirjazovas is Head of PR at Oxylabs, and he places a strong personal interest in technology due to its magnifying potential to make everyday business processes easier and more efficient. Vytautas is fascinated by new digital tools and approaches, in particular, for web data harvesting purposes, so feel free to drop him a message if you have any questions on this topic. He appreciates a tasty meal, enjoys traveling and writing about himself in the third person.

Learn more about Vytautas Kirjazovas

All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.

Proxies