There are a few challenges when it comes to getting Puppeteer to work properly on AWS Lambda, and we’ll address all of them in this post.
But first, let’s start with introducing both Puppeteer and AWS Lambda.
Jordan Hansen, Owner of Cobalt Intelligence
Simply put, Puppeteer is software for controlling a (headless) browser. It's an open-source Node.js library developed and maintained by Google's developer tools team, and it lets you simulate user interaction with a browser through a simple API.
This is very helpful for doing things like automated tests or, my personal use case, web scraping.
A picture's worth a thousand words, so how much is a gif worth? With the little bit of code shown in the gif below, I can log in to a Google account. You simply click, enter text, paginate, and scrape whatever publicly available data you need.
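The gif itself isn't reproduced here, but a minimal sketch of the same idea looks like this. The URL and CSS selectors are hypothetical placeholders, not a real login flow:

```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Hypothetical login page and selectors -- swap in your target's real ones
  await page.goto('https://example.com/login');
  await page.type('#username', 'my-user');
  await page.type('#password', 'my-password');

  // Click and wait for the resulting navigation in parallel
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);

  // Scrape some publicly available text from the resulting page
  const heading = await page.$eval('h1', (el) => el.textContent);
  console.log(heading);

  await browser.close();
})();
```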
AWS Lambda is what Amazon describes as a way to “run code without thinking about servers or clusters.” You simply create a function on Lambda and then execute it. It's that easy.
Simply put, you can do everything on AWS Lambda. Okay, everything is a strong word, but almost. For example, I scrape thousands of public web pages every night with AWS Lambda functions and insert the results into databases. All of the back-end API routes that handle sign-ups for my services are hosted on AWS Lambda as well.
Getting started with AWS Lambda is simple and inexpensive. You only pay for what you use, and there is also a generous free tier.
AWS Lambda has a 50 MB limit on the zip file you push directly to it. Because it downloads a full Chromium build, the Puppeteer package is significantly larger than that. However, this 50 MB limit doesn't apply when you load the function from S3! See the AWS Lambda quotas documentation for details.
AWS Lambda quotas can be tight for Puppeteer:

- 50 MB zipped deployment package when uploading directly
- 250 MB unzipped deployment package, including layers
The 50 MB zipped limit only applies to direct uploads and can be bypassed by uploading from an S3 bucket (the unzipped package still has to fit within 250 MB). So I create a bucket in S3, use npm scripts to zip and upload the function, and then update my Lambda code from that bucket. The scripts in my package.json look something like this:
"zip": "npm run build && 7z a -r function.zip ./dist/* node_modules/",
"sendToLambda": "npm run zip && aws s3 cp function.zip s3://chrome-aws && rm function.zip && aws lambda update-function-code --function-name puppeteer-examples --s3-bucket chrome-aws --s3-key function.zip"
By default, the Linux environment that AWS Lambda runs on doesn't include the shared libraries Chromium needs, so Puppeteer can't launch a browser out of the box.
Fortunately, there is already a package, chrome-aws-lambda, that ships a Chromium build compiled to run on Lambda. You will need to install it, along with puppeteer-core, in the function you are sending to Lambda.
The regular Puppeteer package isn't needed and, in fact, its bundled Chromium would count against your 250 MB limit.
`npm i --save chrome-aws-lambda puppeteer-core`
Then, when you launch a browser from Puppeteer, use the executable path and arguments that chrome-aws-lambda provides:

```javascript
// chrome-aws-lambda bundles puppeteer-core plus a Lambda-compatible Chromium
const chromium = require('chrome-aws-lambda');

const browser = await chromium.puppeteer.launch({
  args: chromium.args,
  defaultViewport: chromium.defaultViewport,
  executablePath: await chromium.executablePath,
  headless: chromium.headless,
});
```
Puppeteer requires more memory than a regular script, so keep an eye on your function's max memory usage. When using Puppeteer, I recommend giving your AWS Lambda function at least 512 MB of memory.
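If you manage the function from the command line, you can raise the memory allocation with the AWS CLI; the function name below is the one from the deploy script above:

```bash
aws lambda update-function-configuration --function-name puppeteer-examples --memory-size 512
```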
Also, don't forget to run `await browser.close()` at the end of your script. Otherwise, the browser may stay alive waiting for commands and keep your function running until it times out.
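Putting those pieces together, a minimal handler sketch might look like the following. The target URL is a placeholder, and `browser.close()` sits in a `finally` block so it runs even if scraping throws:

```javascript
const chromium = require('chrome-aws-lambda');

exports.handler = async (event) => {
  const browser = await chromium.puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath,
    headless: chromium.headless,
  });

  try {
    const page = await browser.newPage();
    // Placeholder URL -- swap in the page you want to scrape
    await page.goto('https://example.com');
    const title = await page.title();
    return { statusCode: 200, body: title };
  } finally {
    // Always close the browser, or the function may run until timeout
    await browser.close();
  }
};
```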
About the author
Jordan Hansen
Founder of Cobalt Intelligence
Jordan Hansen is a professional web scraper who lives in Eagle, ID in the United States. His company, Cobalt Intelligence, gets Secretary of State business data for banks and business lenders via API.