Web Scraper API is an AI-driven tool with a range of smart built-in features for extracting public data from any web page in real time and at scale. There is no need to develop and maintain your own scraping infrastructure: you can focus on what matters most, the data, and leave the technicalities to us.
In this guide, you’ll learn how to start using Web Scraper API and send a first query.
1. Register, or if you already have an account, log in to the dashboard.
2. After you select a free trial or subscription plan, a pop-up window will ask you to create an API user. Choose a username and password to create one.
3. You'll see the following pop-up with a test query for scraping sandbox.oxylabs.io. To test it, copy the provided code to your terminal, insert your API user credentials, and run the query.
A test query from the dashboard
Here’s the query in code:
curl 'https://realtime.oxylabs.io/v1/queries' \
--user 'USERNAME:PASSWORD' \
-H 'Content-Type: application/json' \
-d '{"source": "universal", "url": "https://sandbox.oxylabs.io/", "geo_location": "United States", "render": "html"}'
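If you'd rather not paste your credentials into the command every time, one option is to read them from environment variables. The Bash sketch below sends the same Realtime request as the dashboard example. `OXYLABS_USERNAME`, `OXYLABS_PASSWORD`, `check_credentials`, and `run_test_query` are hypothetical names chosen for this example, not something the API defines or reads.

```shell
#!/usr/bin/env bash
# Sketch: run the dashboard test query with credentials taken from the environment.
# OXYLABS_USERNAME / OXYLABS_PASSWORD are hypothetical names for this example only.

# Print an error and return non-zero if either variable is unset or empty.
check_credentials() {
  if [[ -z "${OXYLABS_USERNAME:-}" || -z "${OXYLABS_PASSWORD:-}" ]]; then
    echo "Set OXYLABS_USERNAME and OXYLABS_PASSWORD first." >&2
    return 1
  fi
}

# Send the same Realtime request as the dashboard example and print the JSON reply.
run_test_query() {
  check_credentials || return 1
  curl --silent --show-error 'https://realtime.oxylabs.io/v1/queries' \
    --user "${OXYLABS_USERNAME}:${OXYLABS_PASSWORD}" \
    -H 'Content-Type: application/json' \
    -d '{"source": "universal", "url": "https://sandbox.oxylabs.io/", "geo_location": "United States", "render": "html"}'
}
```

After sourcing the file and exporting both variables, `run_test_query > response.json` saves a JSON reply like the output example below.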
Here’s an output example of the query:
{
"results": [
{
"content": "<!doctype html>\n<html lang=\"en\">\n<head>
...
</script></body>\n</html>\n",
"created_at": "2023-09-01 08:14:14",
"updated_at": "2023-09-01 08:14:29",
"page": 1,
"url": "https://sandbox.oxylabs.io/products/",
"job_id": "7103288932057528321",
"status_code": 200
}
]
}
For a visual representation of how to set up and manually test Web Scraper API, check the video below.
You can also check how Web Scraper API works in our Scraper APIs Playground, accessible via the dashboard.
The example above implements the Realtime integration method. With Realtime, you can send your request and receive data back on the same open HTTPS connection straight away.
You can integrate Web Scraper API using one of three methods:
Realtime
Push-Pull
Proxy Endpoint
Read more about integration methods and how to choose one here. In short, the main differences are:
Feature | Push-Pull | Realtime | Proxy Endpoint |
---|---|---|---|
Type | Asynchronous | Synchronous | Synchronous |
Job query format | JSON | JSON | URL |
Job status check | Yes | No | No |
Batch query | Yes | No | No |
Upload to storage | Yes | No | No |
For full examples of Push-Pull and Proxy Endpoint integration methods, please see our GitHub or documentation.
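To make the asynchronous flow concrete, here is a rough Bash sketch of Push-Pull: submit a job, poll its status, then fetch the results. The `https://data.oxylabs.io/v1/queries` endpoint, the `id` and `status` response fields, and the `done`/`faulted` status values are assumptions based on our reading of the documentation, so verify them there. All function names are hypothetical, and the grep-based field extraction is a shortcut rather than a JSON parser.

```shell
#!/usr/bin/env bash
# Sketch of the Push-Pull flow: submit a job, poll its status, fetch the results.
# ASSUMPTIONS: the data.oxylabs.io endpoint and the "id"/"status" response fields
# follow our reading of the documentation; verify them before relying on this.
# OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical variable names.
API="https://data.oxylabs.io/v1/queries"

# Print the value of the first "KEY": "string" pair in the JSON read from stdin.
# Crude text matching, not a JSON parser; adequate for flat string fields only.
json_field() {
  grep -o "\"$1\": *\"[^\"]*\"" | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Submit a job ($1 = JSON payload) and print its ID.
submit_job() {
  curl --silent --user "${OXYLABS_USERNAME}:${OXYLABS_PASSWORD}" \
    -H 'Content-Type: application/json' -d "$1" "$API" | json_field id
}

# Print the current status of job $1 (e.g. pending, done, or faulted).
job_status() {
  curl --silent --user "${OXYLABS_USERNAME}:${OXYLABS_PASSWORD}" "$API/$1" |
    json_field status
}

# Print the results of job $1 as JSON.
fetch_results() {
  curl --silent --user "${OXYLABS_USERNAME}:${OXYLABS_PASSWORD}" "$API/$1/results"
}

# Run a status command (everything after $2) until it prints "done" or "faulted".
# $1: max attempts, $2: seconds between attempts.
# Returns 0 on done, 2 on faulted, 1 if the attempts run out.
wait_for_job() {
  local max_attempts=$1 delay=$2 attempt status
  shift 2
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status=$("$@")
    case "$status" in
      done) return 0 ;;
      faulted) return 2 ;;
    esac
    if (( attempt < max_attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `job_id=$(submit_job '{"source": "universal", "url": "https://sandbox.oxylabs.io/"}')` followed by `wait_for_job 30 5 job_status "$job_id" && fetch_results "$job_id" > results.json` polls every five seconds for up to 30 attempts. Passing the status command as an argument keeps the polling loop testable without network access.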
Below are the main query parameters. For more details and additional parameters, such as handling specific context types, visit our documentation.
Parameter | Description |
---|---|
source | Sets the scraper to process your request. The default value is universal. |
url | Direct URL (link) to the target page. |
user_agent_type | Device type and browser. The default value is desktop. |
geo_location | Geolocation of a proxy used to retrieve the data. |
locale | Locale, as expected in the Accept-Language header. |
render | Enables JavaScript rendering. |
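To see how these parameters combine into a request body, here is a minimal Bash sketch. `build_payload` is a hypothetical helper, not part of the API. It accepts only the keys listed in the table, defaults `source` to universal, and does no JSON escaping, so it assumes values without double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch: assemble a request body from key=value arguments, e.g.
#   build_payload url=https://sandbox.oxylabs.io/ render=html
# No JSON escaping is done: values must not contain double quotes or backslashes.
build_payload() {
  local source=universal fields="" pair key value
  for pair in "$@"; do
    if [[ $pair != *=* ]]; then
      echo "Expected key=value, got: ${pair}" >&2
      return 1
    fi
    key=${pair%%=*}    # text before the first "="
    value=${pair#*=}   # text after the first "=" (URLs may contain more "=")
    case "$key" in
      source) source=$value ;;
      url|user_agent_type|geo_location|locale|render)
        fields+=", \"${key}\": \"${value}\"" ;;
      *)
        echo "Unknown parameter: ${key}" >&2
        return 1 ;;
    esac
  done
  printf '{"source": "%s"%s}\n' "$source" "$fields"
}
```

You could then pass the result to curl, for example `-d "$(build_payload url=https://sandbox.oxylabs.io/ render=html)"`. Rejecting unknown keys up front catches misspelled parameters before the API answers with a 400.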
Below are the most common response codes you can encounter using Web Scraper API. Please contact technical support if you receive a code not found in our documentation.
Status code | Message | Description |
---|---|---|
200 | OK | All went well. |
202 | Accepted | Your request was accepted. |
204 | No content | You are trying to retrieve a job that has not been completed yet. |
400 | Multiple error messages | Wrong request structure. It could be a misspelled parameter or an invalid value. The response body will have a more specific error message. |
401 | Authorization header not provided / Invalid authorization header / Client not found | Missing authorization header or incorrect login credentials. |
403 | Forbidden | Your account does not have access to this resource. |
404 | Not found | The job ID you are looking for is no longer available. |
422 | Unprocessable entity | There is something wrong with the payload. Make sure it's a valid JSON object. |
429 | Too many requests | You have exceeded your rate limit. Please contact your account manager to increase limits. |
500 | Internal server error | We're facing technical issues; please retry later. We may already be aware, but feel free to report it anyway. |
524 | Timeout | Service unavailable. |
612 | Undefined internal error | Job submission failed. You can retry faulted jobs at no extra cost, or reach out to us for assistance. |
613 | Faulted after too many retries | Job submission failed. You can retry faulted jobs at no extra cost, or reach out to us for assistance. |
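One way to act on these codes in a script is to sort them into "done", "fix the request", and "try again". The Bash sketch below does that and retries with exponential backoff. Treating 429, 500, 524, 612, and 613 as retryable is our own assumption drawn from the descriptions above, and `classify_status`, `with_retries`, and `realtime_request` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: react to Web Scraper API status codes.
# ASSUMPTION: 429, 500, 524, 612 and 613 are treated as retryable.
classify_status() {
  case "$1" in
    200|202) echo ok ;;
    204) echo pending ;;                        # job not finished yet: poll again later
    400|401|403|404|422) echo client_error ;;   # fix the request; retrying won't help
    429|500|524|612|613) echo retry ;;
    *) echo unknown ;;
  esac
}

# Run a command that prints an HTTP status code; retry retryable codes with
# exponential backoff. $1: max attempts, $2: initial delay (s), rest: command.
# Prints the last status code; returns 1 if every attempt was retryable.
with_retries() {
  local attempts=$1 delay=$2 code i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    code=$("$@")
    if [[ $(classify_status "$code") != retry ]]; then
      echo "$code"
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
      delay=$(( delay * 2 ))
    fi
  done
  echo "$code"
  return 1
}

# Hypothetical example command: save the body to response.json, print the status code.
realtime_request() {
  curl --silent --output response.json --write-out '%{http_code}' \
    --user "${OXYLABS_USERNAME}:${OXYLABS_PASSWORD}" \
    -H 'Content-Type: application/json' \
    -d '{"source": "universal", "url": "https://sandbox.oxylabs.io/"}' \
    'https://realtime.oxylabs.io/v1/queries'
}
```

For example, `with_retries 4 2 realtime_request` waits 2, 4, and then 8 seconds between attempts. A persistent 429 is still worth raising with your account manager rather than retrying indefinitely.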
Web Scraper API has a range of smart built-in features.
Web Crawler lets you crawl any website, select useful content, and have it delivered in bulk. The tool can discover all pages on a website and fetch data from them at scale and in real time. Read more for tech details.
Scheduler automates recurring web scraping and parsing jobs by scheduling them. You can schedule at any interval – every minute, every five minutes, hourly, daily, every two days, and so on. With Scheduler, you don’t need to repeat requests with the same parameters. Read more for tech details.
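Recurring intervals like these are commonly written as cron expressions. The Bash sketch below maps the intervals mentioned above to standard five-field cron syntax and wraps a job in a schedule body. The `cron`, `items`, and `end_time` field names (and a `https://data.oxylabs.io/v1/schedules` endpoint to send them to) are assumptions, so confirm them in the Scheduler documentation. The interval labels and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn a named interval into a five-field cron expression.
interval_to_cron() {
  case "$1" in
    every-minute)    echo '* * * * *' ;;
    every-5-minutes) echo '*/5 * * * *' ;;
    hourly)          echo '0 * * * *' ;;
    daily)           echo '0 0 * * *' ;;
    # Midnight on odd days of the month: roughly every two days,
    # but the cycle restarts on the 1st of each month.
    every-2-days)    echo '0 0 */2 * *' ;;
    *) echo "Unsupported interval: $1" >&2; return 1 ;;
  esac
}

# Build a schedule body. ASSUMED field names: cron, items, end_time.
# $1: interval name, $2: end time, $3: one job payload as JSON.
build_schedule() {
  local cron
  cron=$(interval_to_cron "$1") || return 1
  printf '{"cron": "%s", "items": [%s], "end_time": "%s"}\n' "$cron" "$3" "$2"
}
```

For example, `build_schedule hourly '2030-01-01 00:00:00' '{"source": "universal", "url": "https://sandbox.oxylabs.io/"}'` describes one job repeated at the top of every hour until the end time.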
Custom Parser lets you get structured data from any website. You can parse data with XPath and CSS expressions, taking the necessary information from the HTML and converting it into a readable format. Read more for tech details.
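As an illustration of what an XPath- or CSS-based parsing request might look like, here is a Bash sketch that asks for a single field. The `parse`, `parsing_instructions`, `_fns`, `_fn`, `_args`, `xpath_one`, and `css_one` names reflect our understanding of the Custom Parser format and are assumptions; check them in the documentation. `build_parser_payload` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: request one parsed field extracted with an XPath or CSS expression.
# ASSUMED format: "parse": true plus "parsing_instructions" with _fns/_fn/_args.
# $1: page URL, $2: output field name, $3: xpath|css, $4: expression.
# Expressions must not contain double quotes (use single quotes inside XPath).
build_parser_payload() {
  local url=$1 field=$2 kind=$3 expr=$4 fn
  case "$kind" in
    xpath) fn=xpath_one ;;   # assumed: first node matching the XPath
    css)   fn=css_one ;;     # assumed: first element matching the CSS selector
    *) echo "kind must be xpath or css" >&2; return 1 ;;
  esac
  printf '{"source": "universal", "url": "%s", "parse": true, "parsing_instructions": {"%s": {"_fns": [{"_fn": "%s", "_args": ["%s"]}]}}}\n' \
    "$url" "$field" "$fn" "$expr"
}
```

For example, `build_parser_payload https://sandbox.oxylabs.io/products title xpath '//h4/text()'` would ask for a `title` field holding the text of the first `h4` element.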
Cloud integration delivers your data to a cloud storage bucket of your choice, such as AWS S3 or Google Cloud Storage (GCS). This eliminates the need for additional requests to fetch results: data goes directly to your cloud storage. Read more for tech details.
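Per the comparison table earlier, upload to storage is a Push-Pull feature, so the storage details ride along with a Push-Pull job. The Bash sketch below builds such a job body. The `storage_type` values (`s3`, `gcs`) and the `storage_url` field are assumptions based on our reading of the documentation, as is the need to grant the bucket write access for the upload; the bucket name in the example is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a Push-Pull job body that asks for results to be uploaded to a bucket.
# ASSUMED field names: storage_type (s3 or gcs) and storage_url (bucket/path).
# $1: page URL, $2: storage type, $3: bucket name, optionally with a path.
build_storage_payload() {
  local url=$1 storage_type=$2 storage_url=$3
  case "$storage_type" in
    s3|gcs) ;;
    *) echo "storage_type must be s3 or gcs" >&2; return 1 ;;
  esac
  printf '{"source": "universal", "url": "%s", "storage_type": "%s", "storage_url": "%s"}\n' \
    "$url" "$storage_type" "$storage_url"
}
```

For example, `build_storage_payload https://sandbox.oxylabs.io/ s3 my-bucket/scrapes` targets a hypothetical `my-bucket` bucket under the `scrapes` prefix.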
Headless Browser enables you to interact with a web page, imitate organic user behavior, and efficiently render JavaScript. You don't need to develop and maintain your own headless browser solution, so you can save time and resources on more critical tasks. Read more for tech details.
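To show the general shape of a browser interaction, the Bash sketch below builds a request that clicks an element and then waits before the page is returned. The `browser_instructions` list, the `click` and `wait` types, the `selector` object, and `wait_time_s` are assumptions based on our understanding of the Headless Browser documentation, so verify them there. `build_browser_payload` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: click an element found by XPath, then wait, with JavaScript rendering on.
# ASSUMED format: "browser_instructions" entries with "type", "selector", "wait_time_s".
# $1: page URL, $2: XPath of the element to click (no double quotes),
# $3: seconds to wait afterwards (whole number).
build_browser_payload() {
  local url=$1 xpath=$2 wait_s=$3
  local re='^(0|[1-9][0-9]*)$'   # whole number without leading zeros (valid JSON)
  if [[ ! $wait_s =~ $re ]]; then
    echo "wait time must be a whole number of seconds" >&2
    return 1
  fi
  printf '{"source": "universal", "url": "%s", "render": "html", "browser_instructions": [{"type": "click", "selector": {"type": "xpath", "value": "%s"}}, {"type": "wait", "wait_time_s": %s}]}\n' \
    "$url" "$xpath" "$wait_s"
}
```

Rendering is switched on because browser interactions only make sense on a rendered page.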
In the Oxylabs dashboard, you can follow your usage. Within the Statistics section, you’ll find a graph with scraped pages and a table with your API user's data. It includes average response time, daily request counts, and total requests. Additionally, you can filter the statistics to see your usage during specified intervals.
You can try Web Scraper API for free for a week with 5K results. If you have any questions, please contact us via the live chat or email us at support@oxylabs.io.
Every user account has a rate limit that corresponds to its monthly subscription plan. The limit should be more than enough for the expected volume of scraping jobs.
You can download images either by saving the output to a file with an image extension when using the Proxy Endpoint integration method, or by passing the content_encoding parameter when using the Push-Pull or Realtime integration methods.
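With Push-Pull or Realtime, the image has to be decoded from the JSON reply. Our understanding is that `content_encoding` is set to `base64`, so the image bytes come back base64-encoded in the result's `content` field. The Bash sketch below assumes that, plus a `base64` tool that accepts `--decode` (as GNU coreutils does). `save_image` is hypothetical, and its grep-based extraction is a shortcut rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: decode a base64-encoded "content" field from a saved response into a file.
# ASSUMPTION: the request used "content_encoding": "base64", so "content" holds
# plain base64 text. $1: response JSON file, $2: output image path.
save_image() {
  local encoded
  # Grab the first "content": "<base64>" pair and strip everything but the value.
  encoded=$(grep -o '"content": *"[A-Za-z0-9+/=]*"' "$1" | head -n 1 |
    sed 's/^"content": *"//; s/"$//')
  if [[ -z "$encoded" ]]; then
    echo "No base64 content found in $1" >&2
    return 1
  fi
  printf '%s' "$encoded" | base64 --decode > "$2"
}
```

For example, you could send `{"source": "universal", "url": "<image URL>", "content_encoding": "base64"}` to the Realtime endpoint, save the reply as response.json, and then run `save_image response.json image.png`.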
Yes, with the free trial, you’ll get 5,000 results for a week.
You can choose a plan suited for small businesses and large enterprises, starting from $49/month.
Billing depends on the number of successful results. Failed attempts caused by an error on our side won't affect your bill.
About the author
Augustas Pelakauskas
Senior Copywriter
Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures, the most recent one being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his fourth best friend.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.