Web scraping is the extraction of public data from web pages. Whenever a browser fetches a page, it sends an HTTP request that includes, among other things, a set of HTTP headers.
HTTP request headers play a crucial role in web scraping because they convey additional information between clients and web servers. Customizing them lets your software identify itself, state which response formats it accepts, and supply credentials to the target website.
In this guide, you’ll learn how to send and receive HTTP headers using cURL, a versatile command-line tool for transferring data with URL syntax.
Every HTTP request and response may carry some additional information called HTTP headers. These headers provide important metadata, such as the content type, language, and caching instructions. Using HTTP headers, web developers can ensure that their websites function properly and provide a seamless experience to users.
Each HTTP header is a name-value pair separated by a colon (:). The name identifies the type of information sent, while the value is the actual data.
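To make the name-value structure concrete, here is a minimal shell sketch that splits a single header line at its first colon. The header_line value and the variable names are made up for illustration; they are not part of cURL.

```shell
# A hypothetical header line, written the way it appears in a request or response
header_line='Content-Type: application/json; charset=utf-8'

# Everything before the first colon is the header name
header_name=${header_line%%:*}

# Everything after the first colon is the value
header_value=${header_line#*:}

# Strip the leading whitespace that usually follows the colon
header_value=${header_value#"${header_value%%[![:space:]]*}"}

printf 'name=%s\nvalue=%s\n' "$header_name" "$header_value"
```

Splitting at the first colon only matters because values may themselves contain colons, for example in URLs or timestamps.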
Some of the most common HTTP headers include User-Agent, Content-Type, Accept, and Cache-Control.
When you send an HTTP request with cURL, it sends the following headers by default:
Host: example.com
User-Agent: curl/7.87.0
Accept: */*
You can change the value of these headers when sending a request.
To send HTTP headers with cURL, you can use the -H or --header option followed by the header name and value in the format "Header-Name: value":
curl -H "User-Agent: MyCustomUserAgent" http://httpbin.org/headers
In the example above, a custom User-Agent header is sent as "MyCustomUserAgent" when requesting the http://httpbin.org/headers page.
The http://httpbin.org/headers page is meant for testing: it returns a JSON response listing every header it found in the request. You can ignore the X-Amzn-Trace-Id header, which comes from the site's own infrastructure rather than from your request.
Custom HTTP headers can serve purposes such as authentication, content negotiation, or adding metadata to your requests.
To send custom HTTP headers with cURL, use the -H option and provide the header name and value as shown in the previous section. Here's another example:
curl -H "Authorization: Bearer my-access-token" http://httpbin.org/headers
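In this example, an Authorization header is sent with the value "Bearer my-access-token", the same way you would send it to a protected resource. The http://httpbin.org/headers page is not protected; it simply echoes the header back so you can confirm it was sent.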
To send multiple headers with cURL, you can use the -H option multiple times in the same command. Each -H option should be followed by a different header name and value:
curl -H "User-Agent: MyCustomUserAgent" -H "Accept: application/json" http://httpbin.org/headers
In this example, two headers are being sent:
A custom User-Agent.
An Accept header indicating the preference for JSON responses.
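When the list of headers grows, it can be easier to build the -H arguments in a loop than to type them out. The following is a minimal sketch that uses the shell's positional parameters as a portable argument list. The loop and the final printout are illustrative only, and nothing is sent until you pass "$@" to curl yourself.

```shell
# Start with an empty argument list
set --

# Append one -H option per header (the same two headers as in the command above)
for header in "User-Agent: MyCustomUserAgent" "Accept: application/json"; do
    set -- "$@" -H "$header"
done

# Print each argument on its own line to confirm the quoting survived
printf '[%s]\n' "$@"

# To send the request with these headers:
# curl "$@" http://httpbin.org/headers
```

Keeping every header as its own quoted argument avoids the word-splitting problems that appear when headers containing spaces are stored in a single string.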
To view the response headers from a web server, you can use the -I or --head option with cURL. This will issue a HEAD request, which retrieves only the headers without the actual content:
curl -I http://httpbin.org/headers
curl --head http://httpbin.org/headers
You can also use the -i or --include option to show both the response headers and the content in the output:
curl -i http://httpbin.org/headers
curl --include http://httpbin.org/headers
If you want to send an empty header, provide the header name followed by a semicolon:
curl -H "User-Agent;" http://httpbin.org/headers
If you want to remove a header that cURL adds by default, you can provide the header name followed by a colon without a value. For example, to remove the User-Agent header:
curl -H "User-Agent:" http://httpbin.org/headers
If you want to see more detailed information about the request and response, including the headers sent and received, you can use the -v or --verbose option. This can be helpful for debugging purposes:
curl -v http://httpbin.org/headers
curl --verbose http://httpbin.org/headers
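In verbose output, cURL prefixes the request lines it sends with ">", the response header lines it receives with "<", and its own informational messages with "*". The sketch below separates the two header groups with sed. The verbose_output text is a shortened, made-up sample, so the example runs without a network connection.

```shell
# Illustrative sample of what `curl -v` prints (real output contains more lines)
verbose_output='* Connected to httpbin.org
> GET /headers HTTP/1.1
> Host: httpbin.org
> User-Agent: curl/7.87.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
<'

# Keep only the lines cURL sent, dropping the "> " prefix
request_headers=$(printf '%s\n' "$verbose_output" | sed -n 's/^> //p')

# Keep only the lines the server returned, dropping the "< " prefix
response_headers=$(printf '%s\n' "$verbose_output" | sed -n 's/^< //p')

printf 'Sent:\n%s\n\nReceived:\n%s\n' "$request_headers" "$response_headers"

# Against a live server, note that verbose output goes to stderr:
# curl -sv -o /dev/null http://httpbin.org/headers 2>&1 | sed -n 's/^> //p'
```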
If you want to save the response headers to a file for further analysis, you can use the -o or --output option along with the -D or --dump-header option:
curl -D headers.txt -o content.txt http://httpbin.org/headers
In this example, the response headers will be saved to headers.txt, and the content will be saved to content.txt.
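Once the headers are in a file, standard text tools can pull out individual values. The sketch below reads the status code and the Content-Type value with awk. To stay runnable offline, it writes a made-up sample to a temporary file instead of calling cURL; the file contents and variable names are assumptions for illustration.

```shell
# Stand-in for a file written by `curl -D`; the contents are invented.
# Header dumps use CRLF line endings, hence the \r characters.
headers_file=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: 175\r\n\r\n' > "$headers_file"

# The status code is the second field of the first line
status=$(awk 'NR == 1 { print $2 }' "$headers_file")

# Match the header name case-insensitively and drop the trailing carriage return
content_type=$(awk -F': ' 'tolower($1) == "content-type" { sub(/\r$/, "", $2); print $2 }' "$headers_file")

printf 'status=%s\ncontent-type=%s\n' "$status" "$content_type"

rm -f "$headers_file"
```

Comparing the lowercased name keeps the lookup working whether the server sends Content-Type or content-type.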
In addition to the examples provided above, there are other use cases for sending custom headers with cURL. The following are some common scenarios where custom headers are particularly useful.
When requesting data from an API or web service, you may need to specify the desired response format, such as JSON or XML. In this case, you can use the Accept header to indicate your preference:
curl -H "Accept: application/json" http://httpbin.org/headers
You can use conditional headers like If-Modified-Since or If-None-Match to request a resource only if it has been modified since a specific date or doesn't match a specific ETag value. If the resource is unchanged, the server can reply with 304 Not Modified and no body, which minimizes bandwidth usage and speeds up repeated scraping runs:
curl -H "If-Modified-Since: Sun, 06 Nov 2022 08:49:37 GMT" http://httpbin.org/headers
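The date in If-Modified-Since has to follow the HTTP date format shown above, always expressed in GMT. As a minimal sketch, a scraper could record a timestamp in that format when it fetches a page and reuse it on the next run. The variable names here are hypothetical.

```shell
# Current time as an HTTP date; LC_ALL=C keeps day and month names in English
http_date=$(LC_ALL=C date -u '+%a, %d %b %Y %H:%M:%S GMT')

# The complete header, ready to be passed to -H on a later request
if_modified_since="If-Modified-Since: $http_date"

printf '%s\n' "$if_modified_since"

# On the next run:
# curl -H "$if_modified_since" http://httpbin.org/headers
```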
When making requests to certain websites or APIs, you may need to include a Referer header to provide the source of the request. This can be important for tracking purposes or to comply with specific API requirements:
curl -H "Referer: http://example.com" http://httpbin.org/headers
While the Authorization header is commonly used for authentication, some APIs or services may require a custom header for this purpose. In such cases, you will need to include a custom header with the appropriate authentication information:
curl -H "X-Api-Key: my-api-key" http://httpbin.org/headers
You might encounter some common issues when working with cURL and HTTP headers. Here are a few tips to help you troubleshoot problems.
Ensure the header name and value are correctly formatted, with a colon (":") separating the name and value. Also, make sure there are no extra spaces or typos in the header name or value.
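A quick format check before sending can catch the most common mistakes. The check_header function below is a hypothetical helper, not a cURL feature, and it is deliberately simplified: it only accepts names made of letters, digits, and hyphens, and it does not handle the semicolon form used for empty headers.

```shell
# Return 0 if the argument looks like "Name: value", non-zero otherwise
check_header() {
    # The header must contain a colon
    case $1 in
        *:*) ;;
        *) return 1 ;;
    esac

    # The name is everything before the first colon
    name=${1%%:*}

    # Reject empty names and names with spaces or other unexpected characters
    case $name in
        '' | *[!A-Za-z0-9-]*) return 1 ;;
    esac

    return 0
}

for candidate in "User-Agent: MyCustomUserAgent" "User Agent: MyCustomUserAgent" "User-Agent MyCustomUserAgent"; do
    if check_header "$candidate"; then
        printf 'ok:  %s\n' "$candidate"
    else
        printf 'bad: %s\n' "$candidate"
    fi
done
```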
Not all headers are supported by every API or web service. Check the documentation for the target API or website to confirm that the header you're using is accepted and properly implemented.
Although HTTP header names are case-insensitive by specification, some APIs or services might expect specific capitalization. If you're having issues, try adjusting the capitalization of the header name to match the API documentation or examples.
When you encounter issues, it's important to review the response from the server, which might include status codes or error messages that can help you diagnose the problem. Use the -i or --include option with cURL to view the response headers and content together, which can help you identify issues related to the headers you're sending.
To add headers in cURL, use the -H or --header option followed by the header name and value in the format "Header-Name: value":
curl -H "User-Agent: MyCustomUserAgent" http://httpbin.org/headers
cURL automatically adds standard headers, such as User-Agent, Accept, and Host, based on the request type and other options. You can override them or add custom headers using the -H option.
To check HTTP headers in cURL, use the -I or --head option to only retrieve headers without the actual content:
curl -I http://httpbin.org/headers
Alternatively, you can use the -i or --include option to show both the response headers and content in the output:
curl -i http://httpbin.org/headers
You can send a header with an empty value by providing the header name followed by a semicolon:
curl -H "User-Agent;" http://httpbin.org/headers
To remove a header that cURL adds by default, provide the header name followed by a colon without a value. For example, to remove the User-Agent header:
curl -H "User-Agent:" http://httpbin.org/headers
About the author
Augustas Pelakauskas
Senior Copywriter