How to Scrape Product Data From Wayfair: A Step-by-Step Guide

Augustas Pelakauskas

2023-04-03, 6 min read

Wayfair is a retailer specializing in furniture and home appliances. As a big player in the home appliances business, it’s a major source of public web data, especially for the e-commerce industry.

On the Wayfair website, you can find various product data types – potential targets for analysis. When collected at scale and in real time, such data can be used to forecast trends and track changes in prices and other product attributes.

With Oxylabs Wayfair Scraper API, you can extract e-commerce data to weigh your stance against the competition, position a product at a golden spot to maximize revenues, or simply buy an item at the lowest price. No matter the use case, the API’s maintenance-free infrastructure will save time and effort.

Collecting and analyzing e-commerce data helps maintain a competitive advantage with the following applications:

Pricing intelligence – create a long-term product pricing strategy.
Dynamic pricing – adjust your prices according to the competition.
Real-time product monitoring – check various product attributes.
MAP monitoring – track MAP violators to enforce policy agreements.

The following tutorial will explore how to scrape data from Wayfair. Read on for the page layout overview, project environment preparation, fetching the Wayfair product page for data extraction, and export to CSV or JSON format.

Overview of Wayfair page layout

Before getting technical, let’s analyze the Wayfair page layout. Here are the most relevant page types.

Search result page

The search result page appears when searching for products. For example, if you search for the term Sofa, the search result will be similar to the one below:

Search results page

You can extract all the products listed for the search term “Sofa” as well as their links, titles, prices, ratings, and images.

Product listing page

The product listing page appears when you click on a product to see its details. It shows all the product information in addition to the main data already visible on the search result page.

Product listing page

reCAPTCHA protection page

The reCAPTCHA protection page appears when Wayfair detects unusual browsing behavior, such as repeated or too-fast (for an organic user) navigation from page to page, indicating the use of automated scripts such as scrapers. The page looks similar to the one below:

reCAPTCHA protection page

Bypassing Wayfair scraping challenges

Elaborate anti-bot systems and an ever-changing web layout make automated data extraction difficult. As a consequence, when collecting data at scale from Wayfair, you might get blocked, banned, or blacklisted, not to mention constantly micromanaging the script to fix code breaks.

Wayfair uses Google’s reCAPTCHA service to block automated scrapers. It is an anti-bot protection service that relies on fingerprinting algorithms and behavioral pattern recognition.

Oxylabs Wayfair API provides out-of-the-box support for bypassing the anti-bot measures by providing proxies, custom headers, user agents, and other features. This immensely eases the process and simplifies the scraper you would have to build.

Compared to regular scrapers, Wayfair Scraper API has multiple advantages, including:

ML-driven proxy management
Dynamic browser fingerprinting
JavaScript rendering

How to Scrape Wayfair Product Data

Now, let’s see how to use Oxylabs Wayfair API to extract data from the Wayfair product page.

1. Set up the project environment by installing Python and required libraries

To begin scraping Wayfair data, prepare the project environment. If you already have Python installed, you can skip the Python installation and only install the dependencies in your active Python environment.

Installing Python

This tutorial is written using Python 3.11.2. However, it should also work with other recent versions of Python 3. You can download the latest version of Python from the official web page.

Installing dependencies

Once you have downloaded and installed Python, install the following dependencies by executing the command below in the terminal or command prompt:

python -m pip install requests bs4 pandas

This command will install the Requests, Beautiful Soup, and Pandas libraries. These modules will interact with the API and store data.

2. Fetch Wayfair product data using Wayfair Scraper API

Here’s a target product page. Use Wayfair Scraper API to fetch Wayfair product data and parse it using the Beautiful Soup library.

Sign up for an Oxylabs account

To use the Oxylabs Wayfair API, create an Oxylabs account. You will find all the available APIs here. Make sure to use a 1-week free trial. You’ll have enough time to fine-tune the scraper. After the trial ends, you can keep using the service by seamlessly upgrading to a preferred plan.

Once you create your account, you will get your sub-user’s credentials, with which you can send network requests to the API.

Wayfair Scraper API overview

Before starting, let’s discuss some of the most useful query parameters of Wayfair Scraper API. The API operates in two modes.

Scraping using URL

Using this method, you can scrape any Wayfair URL. You will only have to pass two required parameters: url and source. The source parameter should be set to wayfair, and the url should be a Wayfair web page URL.

It also takes optional parameters such as user_agent_type and callback_url. The user_agent_type parameter tells the API which device type the user agent should represent (e.g., desktop). The callback_url parameter specifies a URL to which the server should send a response after processing the request. Take a look at an example of a payload:

payload = {
    "source": "wayfair",
    "url": "https://www.wayfair.com/furniture/pdp/wade-logan-freetown-885-wide-reversible-sleeper-sofa-chaise-w010379019.html",
    "user_agent_type": "desktop",
    "callback_url": "<URL to your callback endpoint.>"
}

Scraping with query

The other method is to scrape data from search results. It also needs two parameters: source and query. This time, set the source to wayfair_search and put the search terms in the query parameter. This endpoint also supports additional parameters such as start_page, pages, limit, callback_url, and user_agent_type.

payload = {
    "source": "wayfair_search",
    "query": "sofa",
    "start_page": 1,
    "pages": 5,
    "limit": 48,
}

The result will start from the page number mentioned in the start_page parameter. You can retrieve several pages from the search result using the pages parameter and control how many search results per page to fetch using the limit parameter.
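The search payload above can be wrapped in a small helper so the paging parameters are set in one place. This is only a sketch: build_search_payload is a hypothetical function name rather than part of the API, its default values and input checks are assumptions, and the page-range arithmetic assumes that pages counts consecutive pages beginning at start_page.

```python
def build_search_payload(query, start_page=1, pages=1, limit=48):
    """Build a wayfair_search payload (hypothetical helper, not part of the API)."""
    # Assumption: all three paging parameters must be positive integers.
    if start_page < 1 or pages < 1 or limit < 1:
        raise ValueError("start_page, pages, and limit must be positive integers")
    return {
        "source": "wayfair_search",
        "query": query,
        "start_page": start_page,
        "pages": pages,
        "limit": limit,
    }


payload = build_search_payload("sofa", start_page=1, pages=5)

# Assumption: the request covers start_page through start_page + pages - 1.
last_page = payload["start_page"] + payload["pages"] - 1
print(payload)
print(f"Requesting pages {payload['start_page']} to {last_page}")
```

The resulting dictionary can be sent to the API in the same way as the URL-based payload.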

Sending network requests

To start writing your Wayfair scraper, import the libraries and create a payload with the necessary variables:

import requests
from bs4 import BeautifulSoup

product_url = "https://www.wayfair.com/furniture/pdp/wade-logan-freetown-885-wide-reversible-sleeper-sofa-chaise-w010379019.html"
payload = {
    "source": "wayfair",
    "url": product_url,
    "user_agent_type": "desktop",
}
username = "USERNAME"
password = "PASSWORD"

Note the username, password, and product_url variables. Replace USERNAME and PASSWORD with your Oxylabs sub-user’s credentials. If you wish, you can also set product_url to any other Wayfair product URL.

Next, send a POST request using the Requests module to Oxylabs' realtime API endpoint: https://realtime.oxylabs.io/v1/queries.

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=(username, password),  # sub-user credentials defined above
    json=payload,
)
print(response.status_code)

In the code above, the post() function of the Requests module sends a POST request to the API. The sub-user’s credentials are passed for authentication, and the payload is sent in JSON format.

If you run this code, you’ll see 200 as an output which indicates success. If you get any other status code, recheck your credentials and payload.
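The status check described above can be made explicit before the response is parsed. The sketch below assumes the response shape used in this tutorial, where the HTML sits under results[0]["content"]. The extract_content helper and the FakeResponse class are hypothetical names; FakeResponse only stands in for a real requests.Response so that the example runs without network access or credentials.

```python
def extract_content(response):
    """Return the HTML string from an API response or raise a readable error."""
    if response.status_code != 200:
        raise RuntimeError(
            f"API request failed with status {response.status_code}; "
            "recheck your credentials and payload"
        )
    results = response.json().get("results") or []
    if not results:
        raise RuntimeError("API response contains no results")
    return results[0]["content"]


class FakeResponse:
    """Hypothetical stand-in for requests.Response, used only for this demo."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body

    def json(self):
        return self._body


ok = FakeResponse(200, {"results": [{"content": "<html><h1>Sofa</h1></html>"}]})
print(extract_content(ok))

try:
    extract_content(FakeResponse(401, {}))
except RuntimeError as error:
    print(error)
```

With a real request, you would pass the object returned by requests.post() to extract_content instead of a FakeResponse.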

3. Parse HTML data using BeautifulSoup

Now, you can parse the content of the JSON response. The JSON object will have the content of the webpage in HTML format. Use BeautifulSoup to parse the HTML from the response:

content = response.json()["results"][0]["content"]
soup = BeautifulSoup(content, "html.parser")

This uses Python’s built-in html.parser. You can swap in a different parser, such as lxml, if it is installed.

The soup object has the parsed HTML content. Now, parse the title, price, and rating from this object.

Title

Using a browser, inspect the HTML properties of the product title. To open the inspect tab, right-click on the product title and click inspect. You’ll see something similar to the image below:

Inspecting the HTML properties

Based on the element’s HTML attributes, write the following code to extract the title of this product:

title = soup.find("h1", {"data-hb-id": "heading"}).text

Price

Inspect the price element and find the proper class attributes:

Inspecting the price element

price = soup.find("div", {"class": "SFPrice"}).find("span", {"class": "oakhm64z_6101"}).text

Rating

Similarly, you can parse the rating element with the following code:

rating = soup.find("span", {"class": "ProductRatingNumberWithCount-rating"}).text

The class attribute of the span element is used to identify the rating element and extract the text content.
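Because the page layout changes often, any of the lookups above may find nothing, in which case find() returns None and accessing .text raises an AttributeError. One way to guard against that is a small wrapper such as the sketch below. The text_or_none helper is a hypothetical name, and FakeTag is only a stand-in for a Beautiful Soup element so that the example runs without extra dependencies.

```python
def text_or_none(element):
    """Return the stripped text of a parsed element, or None if nothing was found."""
    if element is None:
        return None
    # Beautiful Soup elements provide get_text(); strip=True trims whitespace.
    return element.get_text(strip=True)


class FakeTag:
    """Hypothetical stand-in for a Beautiful Soup element, used only for this demo."""

    def __init__(self, text):
        self._text = text

    def get_text(self, strip=False):
        return self._text.strip() if strip else self._text


print(text_or_none(FakeTag("  4.5  ")))  # element found: prints the trimmed text
print(text_or_none(None))                # element missing: prints None
```

In the scraper, this would be used as text_or_none(soup.find("h1", {"data-hb-id": "heading"})). For the chained price lookup, check that the outer find() result is not None before calling find() on it.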

4. Exporting data

The product data is now parsed. Use Pandas to export the data in CSV and JSON formats. First, create a list of dict objects with the parsed data and build a data frame from it:

import pandas as pd

data = [{
    "Product Title": title,
    "Price": price,
    "Rating": rating,
    "Link": product_url,
}]
df = pd.DataFrame(data)

Exporting data in CSV

Using the data frame object, export the data to a CSV file with a single line of code. Since you don’t need the row index in the file, set the index parameter to False.

df.to_csv("product_data.csv", index=False)

Once you execute this function, the script will create a file named product_data.csv.

Exporting data in JSON

Similarly, use the data frame to export the data in JSON format. Pass an additional parameter, orient, to indicate the need for JSON data in records format.

df.to_json("product_data.json", orient="records")

The script will create another file named product_data.json in the current folder.
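To make the two output shapes concrete, here is a rough equivalent that uses only Python’s standard csv and json modules instead of Pandas. It is a sketch rather than the tutorial’s method: export_records is a hypothetical helper, the record values are placeholders and not real Wayfair data, and the files are written to a temporary folder so nothing in your project is overwritten.

```python
import csv
import json
import os
import tempfile


def export_records(records, csv_path, json_path):
    """Write a list of dicts to CSV (header row plus data rows) and to a JSON array."""
    fieldnames = list(records[0].keys())
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    with open(json_path, "w", encoding="utf-8") as json_file:
        # A JSON array of objects, the same shape as orient="records".
        json.dump(records, json_file, ensure_ascii=False, indent=2)


# Placeholder values for illustration only.
records = [{
    "Product Title": "Example Sofa",
    "Price": "$499.99",
    "Rating": "4.5",
    "Link": "https://www.wayfair.com/",
}]

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "product_data.csv")
    json_path = os.path.join(folder, "product_data.json")
    export_records(records, csv_path, json_path)
    with open(csv_path, encoding="utf-8") as csv_file:
        print(csv_file.read())
    with open(json_path, encoding="utf-8") as json_file:
        print(json_file.read())
```

Each record becomes one CSV row and one JSON object, which mirrors the structure Pandas produces with index=False and orient="records".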

Conclusion

Building a scraper that can send requests as an actual browser and mimic human browsing behavior is quite difficult. Also, you would have to maintain it and keep it up to date with constant changes. Such micromanagement requires in-depth knowledge and extensive scraping experience.

With Wayfair Scraper API, you can shift your focus where it matters most - data analysis - instead of dealing with technicalities.

For code samples showcased above, check our GitHub.

For more e-commerce scraping targets, follow Amazon, Google Shopping, Etsy, and Walmart guides. Oxylabs E-Commerce Scraper API enables you to quickly gather data from the top 50 marketplaces.

If you have questions or face issues, get in touch via the 24/7 live chat on our homepage or email us at support@oxylabs.io.

About the author

Augustas Pelakauskas

Senior Copywriter

Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures - the most recent one being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his fourth best friend.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
