OxyCon, Oxylabs’ very first annual web data harvesting conference, was packed with in-depth talks and workshops. On the second day of the event, Eivydas Vilcinskas, Software Engineer at Oxylabs, took the stage to share some tactical advice on how to reach a high success rate using Oxylabs Scraper APIs (formerly known as Real-Time Crawler). Scraper APIs include SERP Scraper API, E-Commerce Scraper API, Real Estate Scraper API, and Web Scraper API.
According to Eivydas, Scraper APIs successfully deliver data 99.7% of the time. However, as with all services, there is always that remaining 0.3% chance of system downtime. Fortunately, in his workshop, Eivydas walked through the possible issues and explained how to solve each of them.
Before we go into error codes, let’s quickly recap the three ways to access Scraper APIs: Proxy Endpoint, real-time, and callback.
The Proxy Endpoint method is the simplest (but also the most limited) way of accessing Scraper APIs. Apart from providing the target URL, you can only provide headers to select the desired User-Agent-Type and the Geo-Location to spoof. This method also doesn’t allow the use of our JavaScript rendering service or submitting jobs in batches.
Proxy Endpoint acts as a standard proxy with some added functionality and returns the body of the response from the target verbatim. We don’t wrap it in any structures like JSON or add any additional data.
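As a rough illustration of the Proxy Endpoint method, the sketch below prepares the proxy settings and the two optional headers for an HTTP client. The host, port, and header names are hypothetical placeholders, not values from the workshop; take the real ones from the Scraper APIs documentation.

```python
from urllib.parse import quote

# Hypothetical placeholders: replace with the values from the documentation.
PROXY_HOST = "proxy-endpoint.example.com"
PROXY_PORT = 60000
USER_AGENT_TYPE_HEADER = "X-User-Agent-Type"
GEO_LOCATION_HEADER = "X-Geo-Location"


def build_proxy_request(username, password, user_agent_type=None, geo_location=None):
    """Return (proxies, headers) that can be passed to an HTTP client."""
    # Percent-encode credentials so characters like '@' or ':' don't break the proxy URL.
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{PROXY_HOST}:{PROXY_PORT}"
    proxies = {"http": proxy_url, "https": proxy_url}

    # Only the User-Agent type and the geo-location can be selected with this method.
    headers = {}
    if user_agent_type:
        headers[USER_AGENT_TYPE_HEADER] = user_agent_type
    if geo_location:
        headers[GEO_LOCATION_HEADER] = geo_location
    return proxies, headers
```

With a client such as `requests`, the returned values would be passed as `requests.get(target_url, proxies=proxies, headers=headers)`, and the response body would be the target’s body, unwrapped.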
When using the real-time data delivery method, you POST the job, and Scraper APIs return the requested data on an open connection. If done correctly, the data should come back with the HTTP status code 200 and should contain a JSON with the data you requested.
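The real-time flow could be sketched as two small helpers: one that builds the JSON job to POST, and one that reads the reply. The field names `source` and `url` and any extra options are assumptions made for illustration; the actual job parameters are defined in the documentation.

```python
import json


def build_job(source, url, **options):
    """Build the JSON body for a job. Field names here are assumptions."""
    payload = {"source": source, "url": url}
    payload.update(options)
    return json.dumps(payload)


def read_realtime_response(status_code, body):
    """Return the decoded JSON for a 200 response, otherwise raise an error."""
    if status_code != 200:
        # Anything other than 200 means the job did not come back as expected.
        raise RuntimeError(f"job failed with HTTP {status_code}: {body[:200]}")
    return json.loads(body)
```

Keeping the payload construction separate from the HTTP call makes it easy to log exactly what was submitted when a job later needs troubleshooting.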
The callback method allows you to decide when to retrieve the requested data (but no later than 24 hours after submitting the job) and lets you manage the full range of options as well as request/response timings. Check our previous blog post on callback vs. real-time data delivery methods to learn more.
According to Eivydas, the majority of errors that you might encounter while integrating or using Scraper APIs fall into three categories: request, response, and content.
Request errors are related to the request path and usually arise when the request doesn’t reach the intended destination. It might mean that the Scraper APIs servers are physically not reachable over the network and/or the services are not running correctly.
Usually, a response error means that the network is running smoothly, the services are available to return something, and the issue is most probably related to the way the request is made and the data it contains. For example, you might be using the wrong HTTP method for contacting the service endpoint, or you might be requesting data that we cannot provide.
Content errors show up once you get the data from the Scraper API and process it further. We report the job as completed successfully, but during the data analysis on your part, you find out that the data is not exactly what you requested, or that it has some flaws.
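The three categories can be expressed as a small classification helper. This is only a sketch of the idea: the function name, its arguments, and the rule that any status of 400 or above counts as a response error are assumptions for illustration.

```python
from enum import Enum


class ErrorCategory(Enum):
    REQUEST = "request"
    RESPONSE = "response"
    CONTENT = "content"


def categorize(connected, api_status=None, content_ok=True):
    """Classify one attempt; return None when nothing went wrong.

    connected  -- False if the API could not be reached at all
    api_status -- HTTP status code returned by the API itself
    content_ok -- outcome of your own validation of the delivered data
    """
    if not connected:
        return ErrorCategory.REQUEST
    if api_status is not None and api_status >= 400:
        return ErrorCategory.RESPONSE
    if not content_ok:
        return ErrorCategory.CONTENT
    return None
```

The checks run in the same order in which a job can fail: first the network path, then the API’s reply, then the data itself.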
Depending on which type of access method you are using and which kind of error you get, there might be different ways to solve an issue. We summarized all of them in the classy table down below.
Access type | Error type | Error code | Solution | Plan B |
Proxy Endpoint / Real-Time / Callback | Request | Servers are not reachable | Wait a few minutes before retrying | If the server is still down after 5 minutes, contact your account manager |
Proxy Endpoint / Real-Time / Callback | Request | Scraper API is not reachable | Check that you’re not hitting the wrong endpoint | Troubleshoot your connection. If the connection is ok, contact Oxylabs |
Proxy Endpoint / Real-Time / Callback | Response | 400 | Look for the message in the body to see the reason | – |
Real-Time / Callback | Response | 401 | Check if you’re using correct credentials or if your user wasn’t disabled. To fix this, contact your account manager | You might get this error code if the source that you’ve given to the Scraper API is not supported or disabled for you. Contact your account manager to discuss implementing the necessary source in our system or to have it enabled for you. |
Proxy Endpoint / Real-Time / Callback | Response | 404 | Check the documentation for correct endpoints | – |
Real-Time / Callback | Response | 405 | Use POST to submit your jobs. Any other HTTP method will return a response with this status code | – |
Proxy Endpoint | Response | 407 | Check that you’re using the correct credentials | If you want to reset your credentials, contact your account manager |
Proxy Endpoint / Real-Time | Response | 408 | Increase the default timeout value to 120 s and try again | If timeout comes from the Scraper API’s side, contact Oxylabs |
Callback | Response | 408 | Increase the default timeout value to 30 s and try again | If timeout comes from the Scraper API’s side, contact Oxylabs |
Proxy Endpoint / Real-Time / Callback | Response | 429 | You have reached the limit of requests per week/month/etc. Reach out to your account manager to increase the limit | You might be making too many requests per minute. Contact your account manager to increase this limit |
Real-Time / Callback | Response | 5xx | If this error appears for more than 5 minutes, contact Oxylabs | – |
Proxy Endpoint / Real-Time / Callback | Content | Status code is not 200 | Retry the job | The reason for this error might be incorrect job parameters. This status code should be handled from your side, but you can always contact Oxylabs for assistance |
Proxy Endpoint / Real-Time / Callback | Content | 200 but data for given parameters are incorrect | Target website might have changed their algorithms. There might be a way to get the required data by using different parameters. Contact Oxylabs for assistance | – |
Proxy Endpoint / Real-Time / Callback | Content | Corrupted content | Have you decoded or decompressed the data? If yes and the data is still corrupted, contact Oxylabs | – |
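The response rows of the table could be turned into a simple lookup on the client side. In the sketch below, the timeout values come from the table, while the function name and the action labels are hypothetical and only meant to show the shape of such a handler.

```python
# Suggested client timeouts in seconds per access type, as listed in the table.
TIMEOUTS = {"proxy_endpoint": 120, "real_time": 120, "callback": 30}


def next_step(status_code):
    """Map an API response status code to a suggested action (labels are illustrative)."""
    if status_code == 200:
        return "process_content"
    if status_code == 400:
        return "read_message_in_body"
    if status_code in (401, 407):
        return "check_credentials"
    if status_code == 404:
        return "check_endpoint"
    if status_code == 405:
        return "use_post"
    if status_code == 408:
        return "retry_with_longer_timeout"
    if status_code == 429:
        return "ask_for_higher_limit"
    if 500 <= status_code <= 599:
        # Contact support if the error persists for more than 5 minutes.
        return "retry_then_contact_support"
    return "unknown"
```

A 200 here only means that the job was delivered; the content checks from the last three rows of the table still apply afterwards.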
When accessing Scraper APIs via the callback data delivery method, you might get structured data errors. According to Eivydas, the results for structured data contain two separate status codes. The one in the root of the object is the status code of the HTTP response that the target has given us. The other one, in content.parse_status_code, marks the status of the parsing efforts:
12000 – parse successful. The content should be pristine, contain all the fields that are expected to be parsed.
12004 – parse successful with errors. There might be some fields that were not parsed correctly somewhere in the tree. Such fields contain the text “Could not parse xyz: ReasonForFailure”.
12003 – we could not parse the content because it is not supported. For now, we only attempt to parse target responses with the status code 200.
12002 – the parsing attempt failed. A real failure on our side. This might be caused by a significant change in the HTML structure, and we have to adapt our code to continue parsing the format.
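The two status codes and the per-field error messages could be checked with a helper like the one below. It is a minimal sketch: the root key name `status_code` and the sample field names are assumptions, while `content.parse_status_code` and the “Could not parse” prefix come from the description above.

```python
PARSE_STATUS = {
    12000: "parse successful",
    12004: "parse successful with errors",
    12003: "content not supported for parsing",
    12002: "parse failed",
}


def find_parse_errors(node, path=""):
    """Recursively collect (path, message) pairs for fields that failed to parse."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_parse_errors(value, f"{path}.{key}" if path else key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_parse_errors(value, f"{path}[{index}]"))
    elif isinstance(node, str) and node.startswith("Could not parse"):
        found.append((path, node))
    return found


def inspect_result(result):
    """Summarize the target status, the parse status, and any field-level errors."""
    content = result.get("content")
    if not isinstance(content, dict):
        content = {}  # unparsed results may carry raw content instead of an object
    code = content.get("parse_status_code")
    return {
        "target_status": result.get("status_code"),  # assumed name of the root key
        "parse_status": PARSE_STATUS.get(code, "unknown"),
        "field_errors": find_parse_errors(content) if code == 12004 else [],
    }
```

For a 12004 result, `field_errors` lists where in the tree the parser gave up, which is useful to include when reporting the issue.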
So, we’ve covered the most common errors which you can encounter while integrating and using Scraper APIs. Reaching a high success rate should be a piece of cake now! Moreover, detailed documentation on what Eivydas has covered in his workshop is currently in the works and will be published on our website soon. In the meantime, if you have any questions regarding Scraper APIs, feel free to contact us.
About the author
Gabija Fatenaite
Lead Product Marketing Manager
Gabija Fatenaite is a Lead Product Marketing Manager at Oxylabs. Having grown up on video games and the internet, she grew to find the tech side of things more and more interesting over the years. So if you ever find yourself wanting to learn more about proxies (or video games), feel free to contact her - she’ll be more than happy to answer you.
All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.