OxyCon Events

OxyCon 2021: The Top Takeaways From Day One

Monika Maslauskaite

2021-08-25 | 5 min read

While the current global situation is unfavorable for face-to-face meetings, Oxylabs held the virtual event we had all been waiting for: OxyCon 2021. This late-summer afternoon finally brought us together for a two-day conference packed with know-how from experts at market-leading businesses. Sixteen speakers gathered online to cover the most relevant aspects of web scraping.

On day one, we had a mix of in-depth presentations and discussions, wrapped up with some friendly interactions. Whether you attended the event or are just curious, we’ve put together a brief rundown of the first day. So, let’s get started.

Data Quality – Your Worst Nightmare

Following the opening speech by Oxylabs CEO Julius Černiauskas, introducing the conference and its moderators Gabija Fatėnaitė, Product Marketing Manager, and Vaidotas Šedys, Head of Risk Management, it was time to meet the first presenter: Allen O’Neill, founder and CTO at DataWorks.

Allen shared tips and tricks on how to achieve top data quality while keeping work with data cost-efficient and not too time-consuming:

First and foremost, he suggested that businesses should focus on their core competencies and leverage experts to ensure data is high-quality.
Allen also noted it’s essential to evaluate the scope of data needed and define the expected value range. Different industries require different data.
Finally, he provided ten key points that are crucial to observe when ensuring data quality. They include data clustering, unexpected values evaluation, impossible values consideration, checking different types of data, keeping in mind extraordinary data, tracking values that are out of scope or range, checking for outliers, making sure data is logical, and eliminating spelling mistakes.
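
A few of the checks Allen listed (impossible values, values out of range, outliers) can be sketched in a few lines of Python. The price data, the expected range, and the median-based outlier rule below are assumptions chosen for illustration; they were not part of the talk.

```python
from statistics import median

def check_prices(prices, low=0.01, high=10_000.0, outlier_factor=5.0):
    """Return a dict of index lists, one per data quality problem.

    low/high form a hypothetical expected value range; outlier_factor is a
    hypothetical threshold relative to the median absolute deviation (MAD).
    """
    issues = {"impossible": [], "out_of_range": [], "outlier": []}
    valid = [p for p in prices if isinstance(p, (int, float)) and p > 0]
    med = median(valid) if valid else 0.0
    # MAD is robust to the very outliers we are trying to find.
    mad = median(abs(p - med) for p in valid) if valid else 0.0

    for i, p in enumerate(prices):
        if not isinstance(p, (int, float)) or p <= 0:
            issues["impossible"].append(i)    # missing or non-positive price
        elif not (low <= p <= high):
            issues["out_of_range"].append(i)  # outside the agreed value range
        elif mad > 0 and abs(p - med) > outlier_factor * mad:
            issues["outlier"].append(i)       # far from the typical value
    return issues

report = check_prices([19.99, 21.50, 20.75, -4.0, None, 950.0, 25000.0])
print(report)
```

Each check reports positions rather than dropping records, so someone who knows the domain can decide whether a flagged value is an error or just extraordinary data.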

Indexing: Scraping Website from Zero to Sitemap

Eivydas Vilčinskas, Senior Software Engineer at Oxylabs, opened up a series of more technical topics designed for developers in the field of web scraping. Eivydas introduced website indexing and explained how businesses could use it for data collection.

Eivydas noted the two most important things businesses have to define before gathering data: 1) what data they need; 2) where to find it. Only after that can the process go on.

According to the speaker, website indexing is crucial, yet how much it can be used depends on the website. For instance, if the content doesn't change, an index can remain usable for a long period of time. For websites with dynamic content, however, the index must be updated constantly.
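
As a rough illustration of the idea, the sketch below builds a small index from a sitemap and picks the URLs that need a fresh crawl. It assumes the target site publishes a standard sitemaps.org sitemap with lastmod dates; the example URLs, the function names, and the crawl-date bookkeeping are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; in practice this would be downloaded.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/about</loc><lastmod>2020-01-10</lastmod></url>
  <url><loc>https://example.com/deals</loc><lastmod>2021-08-24</lastmod></url>
</urlset>"""

def build_index(sitemap_xml):
    """Map each URL in a sitemap to its last-modified date (or None)."""
    root = ET.fromstring(sitemap_xml)
    index = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        # Keep only the date part so full timestamps are accepted too.
        index[loc] = date.fromisoformat(lastmod.strip()[:10]) if lastmod else None
    return index

def stale_urls(index, last_crawled):
    """URLs never crawled, or modified since our last crawl."""
    stale = []
    for loc, modified in index.items():
        crawled = last_crawled.get(loc)
        if crawled is None or (modified is not None and modified > crawled):
            stale.append(loc)
    return stale

index = build_index(SITEMAP)
to_recrawl = stale_urls(index, {
    "https://example.com/about": date(2021, 1, 1),
    "https://example.com/deals": date(2021, 8, 1),
})
print(to_recrawl)
```

The static page stays in the index untouched, while the frequently changing page is queued again, which mirrors the static versus dynamic distinction made in the talk.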

TLS Fingerprinting in Web Scraping

Another technical topic was covered by Martynas Juravičius, Lead Data Analyst at Oxylabs. He presented the concept of TLS fingerprinting and its applications, and discussed the impact this process has on web scraping and bot detection.

First, Martynas explained what fingerprinting is in general: a process of taking protocol settings and combining them into a unique fingerprint stored in a database. Fingerprints are used to track malicious software and identify device parameters. TLS fingerprinting is a passive type of fingerprinting and might be one factor in scrapers getting blocked during data collection.

How do they get blocked? Basically, anti-bot software compares TLS fingerprints with HTTP user agents. If they don't match, the scraper gets restricted. To avoid this, developers use vast databases of user agents.
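
To make the comparison concrete, here is a minimal sketch of how handshake parameters can be combined into one fingerprint and checked against a user agent. It follows the publicly documented JA3 approach (an MD5 hash over five comma-separated ClientHello fields); the talk did not name a specific method, and the parameter values and the lookup table below are hypothetical.

```python
import hashlib

def ja3_style_hash(version, ciphers, extensions, curves, point_formats):
    """Combine ClientHello parameters into one hash, in the style of JA3."""
    fields = [str(version)] + [
        "-".join(str(v) for v in values)
        for values in (ciphers, extensions, curves, point_formats)
    ]
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()

# Hypothetical ClientHello values; real ones would come from captured traffic.
browser_hello = (771, [4865, 4866, 49195], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
script_hello = (771, [49195, 49199, 158], [0, 10, 11, 13], [23, 24], [0])

# Hypothetical lookup table: fingerprint -> client family seen with it before.
KNOWN = {
    ja3_style_hash(*browser_hello): "chrome",
    ja3_style_hash(*script_hello): "python-http-client",
}

def looks_inconsistent(hello, user_agent):
    """True when the fingerprint's known family is absent from the User-Agent."""
    family = KNOWN.get(ja3_style_hash(*hello))
    return family is not None and family not in user_agent.lower()

ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/92.0 Safari/537.36"
flag_script = looks_inconsistent(script_hello, ua)    # script claiming to be Chrome
flag_browser = looks_inconsistent(browser_hello, ua)  # fingerprint matches the claim
print(flag_script, flag_browser)
```

A request whose handshake looks like a scripting library while its user agent claims to be Chrome is the kind of mismatch described above.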

Martynas provided three ways to avoid being blocked because of TLS fingerprinting while web scraping:

Randomize parameters (cipher suites and TLS versions): send varying parameters that are hard for anti-bot software to detect.
Align parameters: employ large user agent databases and align them with TLS parameters.
Use real browsers and user agents: for example, install several different browsers on your computer.
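
The first option can be sketched with Python's standard ssl module, which lets a client choose which cipher suites to offer and in what order. This is only an illustration under assumptions: the cipher pool and the function name are hypothetical, and cipher suites are just one part of what a server can observe in a handshake.

```python
import random
import ssl

# Hypothetical pool of TLS 1.2 cipher suites (OpenSSL names) to choose from.
CIPHER_POOL = [
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-RSA-CHACHA20-POLY1305",
]

def randomized_context(rng=random):
    """Build a client SSLContext offering a shuffled subset of CIPHER_POOL."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # sample() returns the chosen suites in random order.
    chosen = rng.sample(CIPHER_POOL, k=rng.randint(3, len(CIPHER_POOL)))
    # set_ciphers() takes an OpenSSL cipher string and governs TLS 1.2 and
    # below; TLS 1.3 suites are configured separately by OpenSSL.
    ctx.set_ciphers(":".join(chosen))
    return ctx, chosen

ctx, chosen = randomized_context(random.Random(7))
offered = [c["name"] for c in ctx.get_ciphers()]
print(chosen)
```

The returned context can be passed to any client that accepts an SSLContext, for example urllib.request.urlopen(url, context=ctx). Certificate verification stays enabled because the context starts from create_default_context().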

Harnessing the Power of External Data in E-commerce

Tomas Montvilas, Chief Commercial Officer at Oxylabs, continued the conference with one more fascinating topic on data collection for businesses. He outlined the main challenges of collecting external data in real time, namely:

Building and maintaining real-time scraping pipelines.
Managing proxy infrastructure.
Handling CAPTCHAs and website changes.
Data parsing and cleaning.

Tomas also noted that external data is a powerful tool that helps companies stand out from competitors. In the presentation, he discussed the following use cases and solutions:

Optimizing assortment in digital shelves by identifying selection gaps and overlaps using pricing ladders.
Enabling real-time dynamic pricing by applying multiple price recommendation algorithms, such as competitive response, KVI, and elasticity algorithms.
Monitoring search placements in the marketplace in order to move to the first page and thus boost sales.

Monitoring Web Scrapers: Best Practices

Oxylabs Data Analyst Andrius Kūkšta emphasized the necessity of quick reaction to any potential deviations while maintaining web scrapers. During his presentation, Andrius pointed out some beneficial practices for building and maintaining scraper monitoring systems:

Building block detection tools. This includes getting the HTML, parsing it, and passing it to a classifier that predicts whether a block is likely.
Collecting statistics. For this, you'll need to prepare dashboards and analyze operation outcomes, durations, and all the steps of a process chain.
Setting up alerting systems. It's important to set highly sensitive thresholds at first and make them less sensitive later on if you don't want to get alerts so often. It also comes in handy to split alerts into critical and non-critical ones.
Testing scraper monitoring capacity. This covers metrics such as how many requests you can make per proxy and the average number of requests per scraping parameter set.
Making data-based decisions. It’s essential to evaluate if you have enough capacity before onboarding new customers, continuously improve it, and consider pricing.
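
The block detection and alerting steps above can be sketched as follows. The marker phrases, status codes, and thresholds are hypothetical placeholders; the classifier mentioned in the talk would normally be a trained model rather than a keyword check.

```python
from collections import Counter

# Hypothetical marker phrases standing in for a trained classifier.
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")

def looks_blocked(status_code, html):
    """Crude block check: suspicious status code or marker phrase in the page."""
    text = html.lower()
    return status_code in (403, 429) or any(m in text for m in BLOCK_MARKERS)

def alert_level(results, warn_at=0.05, critical_at=0.25):
    """Turn a batch of (status_code, html) results into an alert level."""
    counts = Counter(looks_blocked(code, html) for code, html in results)
    block_rate = counts[True] / max(len(results), 1)
    if block_rate >= critical_at:
        return "critical", block_rate
    if block_rate >= warn_at:
        return "non-critical", block_rate
    return "ok", block_rate

batch = [
    (200, "<html><body>Product list</body></html>"),
    (200, "<html><body>Please solve this CAPTCHA</body></html>"),
    (403, "<html><body>Access denied</body></html>"),
    (200, "<html><body>Product details</body></html>"),
]
level, rate = alert_level(batch)
print(level, rate)
```

Starting with a low warn_at value and raising it once alerts become too frequent follows the advice to begin with sensitive thresholds and relax them later.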

Machine Learning Infrastructure

The final presentation of day one was made by Pujaa Rajan, Machine Learning Engineer at Stripe. Being an experienced professional in machine learning, Pujaa presented what tools and resources are a must when developing machine learning infrastructure in a company. The key takeaways from the presentation:

Pujaa introduced the lifecycle of machine learning: preparing data, building a model, training it, putting the model into production, scaling continuously, and retraining to make it more cost-effective, add new features, and decide how to improve the product overall.
She also discussed how ML infrastructure supports each stage of this lifecycle by going through the steps of development, from writing code to monitoring the overall result.
What's more, Pujaa introduced software that is used for building and maintaining ML infrastructure. According to the speaker, choosing a programming language is a vital step before developing any project.
Pujaa discussed some technical details of the ML infrastructure and summed up her presentation by defining the characteristics of an excellent ML-based infrastructure: usefulness, explainability, simplicity, and scalability.

That's a quick wrap-up of what we experienced on day one. We also had a second inspiring day filled with expertise sharing. Head over to our blog post to learn more about day two of OxyCon 2021!

About the author

Monika Maslauskaite

Former Content Manager

Monika Maslauskaite is a former Content Manager at Oxylabs. A combination of tech-world and content creation is the thing she is super passionate about in her professional path. While free of work, you’ll find her watching mystery, psychological (basically, all kinds of mind-blowing) movies, dancing, or just making up choreographies in her head.
