
Tutorials Scrapers

How to Scrape Amazon Reviews With Python

Enrika Pavlovskytė

2023-12-07

4 min read

As sellers pack the digital shelves with goods, customers become fickle and quickly switch between brands and items in search of whatever best meets their expectations. They're also more vocal about product experiences, often sharing feedback to help other consumers decide on their next purchase. For companies, this is an excellent opportunity to tune into customers' needs and improve their products accordingly.

In this blog post, we want to shed more light on scraping reviews from one of the biggest e-commerce sites: Amazon. We've already explored such topics as Amazon scraping and automated Amazon price tracking. This time, we'll present two approaches to capturing customer feedback from reviews: a custom-built Amazon review scraper and an automated solution.

Let's get to it!

Setting up

For this tutorial, you'll be using Python, so make sure you have Python 3.8 or above installed, along with four packages: Requests, Pandas, Beautiful Soup, and lxml (the HTML parser the final script passes to Beautiful Soup). We've detailed the installation process in our previous blog post.

After that, import the necessary libraries and define custom request headers:

import requests
from bs4 import BeautifulSoup
import pandas as pd

custom_headers = {
    "accept-language": "en-GB,en;q=0.9",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15",
}

Custom headers make your requests look like they come from a regular browser, which reduces the chance of getting blocked while scraping Amazon reviews. We've covered this aspect in detail in our product scraping blog post.

Getting the review objects

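Now that you're ready to start scraping, get all the review objects and extract the information you need from them. The snippets in this section assume soup is the parsed reviews page, as returned by the get_soup() function in the final script. First, find a CSS selector for the product reviews, then use the .select() method to extract all of them.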

You can use this selector to identify the Amazon reviews:

div.review

And the following code to collect them:

review_elements = soup.select("div.review")

This leaves you with a list of all the reviews, which you'll iterate over to gather the required information.
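To see what .select() returns without sending a request to Amazon, you can run it on a small HTML sample. The markup below is a hypothetical, heavily simplified stand-in for a reviews page, and it uses Python's built-in html.parser so lxml isn't needed for this experiment:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; real Amazon pages are far larger
# and their structure may differ or change over time.
sample_html = """
<div id="review-list">
  <div class="review"><span class="a-profile-name">Jane D.</span></div>
  <div class="review"><span class="a-profile-name">Sam K.</span></div>
  <div class="summary">Not a review</div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# .select() returns a list of every element matching the CSS selector
review_elements = soup.select("div.review")
print(len(review_elements))  # 2
```

The div with the summary class is ignored because it doesn't match div.review.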

You need a list to which you can add the processed reviews and a for loop to iterate over them:

scraped_reviews = []
for review in review_elements:
    # The field extraction snippets below run inside this loop
    ...

Author name

The first field to extract is the author's name. Use the following CSS selector to select it:

span.a-profile-name

Then, collect the name as plain text with the following snippet:

r_author_element = review.select_one("span.a-profile-name")
r_author = r_author_element.text if r_author_element else None
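Every field in this section follows the same pattern: call select_one() and read the element's text only if it was found. A small helper, text_or_none() (a hypothetical name, not part of Beautiful Soup), can factor that pattern out. It also uses get_text(strip=True), which trims any surrounding whitespace:

```python
from typing import Optional

from bs4 import BeautifulSoup, Tag


def text_or_none(parent: Tag, selector: str) -> Optional[str]:
    """Return the stripped text of the first match for selector, or None."""
    element = parent.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical single-review markup for illustration
review = BeautifulSoup(
    '<div class="review"><span class="a-profile-name"> Jane D. </span></div>',
    "html.parser",
)

r_author = text_or_none(review, "span.a-profile-name")
r_date = text_or_none(review, "span.review-date")  # not present in the sample

print(r_author)  # Jane D.
print(r_date)    # None
```

Inside the scraping loop, each field then becomes a single call, for example r_content = text_or_none(review, "span.review-text").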

Review rating

The next thing to extract is the review rating, which can be located with the following CSS selector:

i.review-rating

The rating string contains some extra text you won't need, so remove it and strip the leftover whitespace:

r_rating_element = review.select_one("i.review-rating")
r_rating = r_rating_element.text.replace("out of 5 stars", "").strip() if r_rating_element else None
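Even after the cleanup, the rating is still a string. If you plan to average or filter ratings, a minimal sketch like the hypothetical parse_rating() below converts it to a float. It assumes the amazon.com text format "X.X out of 5 stars" with a dot as the decimal separator, which other Amazon domains may not use:

```python
import re
from typing import Optional


def parse_rating(rating_text: Optional[str]) -> Optional[float]:
    """Extract the leading number from text such as '4.0 out of 5 stars'."""
    if rating_text is None:
        return None
    # Assumes a dot decimal separator, as on amazon.com
    match = re.search(r"\d+(?:\.\d+)?", rating_text)
    return float(match.group()) if match else None


print(parse_rating("4.0 out of 5 stars"))  # 4.0
print(parse_rating(None))                  # None
```

Because it searches for the first number, the function works both on the raw element text and on the already trimmed value.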

Title

To get the title of the review, use this selector:

a.review-title

To get the actual title text, select the span inside the title link that has no class attribute:

r_title_element = review.select_one("a.review-title")
r_title_span_element = r_title_element.select_one("span:not([class])") if r_title_element else None
r_title = r_title_span_element.text if r_title_span_element else None
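Why the :not([class]) part? The title link can contain several spans, and the selector above relies on the one holding the title text being the only span without a class attribute. The hypothetical markup below, loosely modeled on that structure, shows the selector skipping the classed spans:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the title link also contains classed spans
# (star-rating text, spacing) that should be skipped.
sample_html = """
<a class="review-title" href="#">
  <i class="review-rating"><span class="a-icon-alt">5.0 out of 5 stars</span></i>
  <span class="a-letter-space"></span>
  <span>Great sound for the price</span>
</a>
"""

review = BeautifulSoup(sample_html, "html.parser")
r_title_element = review.select_one("a.review-title")

# span:not([class]) matches only spans that have no class attribute at all
r_title_span_element = r_title_element.select_one("span:not([class])")
print(r_title_span_element.text)  # Great sound for the price
```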

Review text

The review text itself can be found with the following selector:

span.review-text

You can then scrape Amazon review text accordingly:

r_content_element = review.select_one("span.review-text")
r_content = r_content_element.text if r_content_element else None

Date

One more thing to fetch from the review is the date. It can be located using the following CSS selector:

span.review-date

Here’s the code that fetches the date value from the object:

r_date_element = review.select_one("span.review-date")
r_date = r_date_element.text if r_date_element else None
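On amazon.com, this element's text typically combines the reviewer's country and the date, for example "Reviewed in the United States on December 1, 2023". Treat that exact wording as an assumption. If it holds, a sketch like the hypothetical parse_review_date() below turns the text into a proper date for sorting or filtering:

```python
import re
from datetime import date, datetime
from typing import Optional


def parse_review_date(date_text: Optional[str]) -> Optional[date]:
    """Parse text like 'Reviewed in the United States on December 1, 2023'."""
    if date_text is None:
        return None
    # Assumes the English, US-style "Month D, YYYY" date after " on "
    match = re.search(r" on ([A-Z][a-z]+ \d{1,2}, \d{4})", date_text)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%B %d, %Y").date()


print(parse_review_date("Reviewed in the United States on December 1, 2023"))
# 2023-12-01
```

Reviews on other Amazon domains use other languages and date formats, so the regular expression and the strptime() format would need to change accordingly.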

Verification

You can also check whether the review is verified. The element holding this information can be accessed with this selector:

span.a-size-mini

And extracted using the following code:

r_verified_element = review.select_one("span.a-size-mini")
r_verified = r_verified_element.text if r_verified_element else None
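The snippet above stores the badge text itself. Because a-size-mini looks like a general text-size class, it may also match unrelated small text. For that reason, checking for the label is a safer test than checking whether the element exists. The sketch below assumes the English label "Verified Purchase" used on amazon.com:

```python
from typing import Optional


def is_verified(verified_text: Optional[str]) -> bool:
    """Return True if the badge text contains 'Verified Purchase'."""
    # Assumes the English badge label used on amazon.com
    return verified_text is not None and "Verified Purchase" in verified_text


print(is_verified("Verified Purchase"))  # True
print(is_verified(None))                 # False
```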

Images

Finally, if the reviewer added any pictures, you can locate them with this selector:

img.review-image-tile

And then extract the URL of the first image with the following code:

r_image_element = review.select_one("img.review-image-tile")
r_image = r_image_element.attrs["src"] if r_image_element else None
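Because select_one() returns only the first match, reviews with several pictures keep only one URL. To collect every URL, you can use select() instead. The sample markup and example.com URLs below are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical review markup with two attached images
sample_html = """
<div class="review">
  <img class="review-image-tile" src="https://example.com/img1.jpg">
  <img class="review-image-tile" src="https://example.com/img2.jpg">
</div>
"""

review = BeautifulSoup(sample_html, "html.parser")

# .select() returns every match; .get() returns None instead of raising
# a KeyError when an img tag has no src attribute
r_images = [
    img.get("src")
    for img in review.select("img.review-image-tile")
    if img.get("src")
]
print(r_images)
# ['https://example.com/img1.jpg', 'https://example.com/img2.jpg']
```

If you store this list in the review dictionary, to_csv() writes it as its string representation, so you may prefer to join the URLs into one string first, e.g. ", ".join(r_images).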

Now that you've gathered all this information, assemble it into a single dictionary. Then, add it to the scraped_reviews list you created before starting the for loop:

r = {
    "author": r_author,
    "rating": r_rating,
    "title": r_title,
    "content": r_content,
    "date": r_date,
    "verified": r_verified,
    "image_url": r_image,
}

scraped_reviews.append(r)

Exporting data

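Once you've scraped all the data, the last thing to do is export it to a file. The code below fetches the reviews page with get_soup(), collects the reviews with get_reviews() (both functions are defined in the final script), and exports them in CSV format: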

search_url = "https://www.amazon.com/BERIBES-Cancelling-Transparent-Soft-Earpads-Charging-Black/product-reviews/B0CDC4X65Q/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"
soup = get_soup(search_url)
reviews = get_reviews(soup)
df = pd.DataFrame(data=reviews)

df.to_csv("amz.csv")
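One detail worth knowing: by default, DataFrame.to_csv() also writes the DataFrame's numeric index as an unnamed first column. If you don't need that column, pass index=False. The sketch below uses hypothetical rows and an in-memory buffer, so it runs without scraping anything:

```python
import io

import pandas as pd

# Hypothetical rows standing in for the output of get_reviews()
reviews = [
    {"author": "Jane D.", "rating": "5.0"},
    {"author": "Sam K.", "rating": "3.0"},
]
df = pd.DataFrame(data=reviews)

# A StringIO buffer stands in for the file; pass "amz.csv" in real use
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text.splitlines()[0])  # author,rating
```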

After running the script, you'll find your data in the amz.csv file.

Here’s the final script:

import sys

import requests
from bs4 import BeautifulSoup
import pandas as pd

custom_headers = {
    "Accept-language": "en-GB,en;q=0.9",
    # "br" (Brotli) is left out: requests can only decode it when the
    # optional brotli package is installed
    "Accept-Encoding": "gzip, deflate",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "User-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
}


def get_soup(url):
    # Download the page; stop the script if Amazon doesn't return 200 OK
    response = requests.get(url, headers=custom_headers, timeout=30)
    if response.status_code != 200:
        print("Error in getting webpage")
        sys.exit(1)

    # The "lxml" parser requires the lxml package to be installed
    soup = BeautifulSoup(response.text, "lxml")

    return soup


def get_reviews(soup):
    review_elements = soup.select("div.review")

    scraped_reviews = []

    for review in review_elements:
        r_author_element = review.select_one("span.a-profile-name")
        r_author = r_author_element.text if r_author_element else None

        r_rating_element = review.select_one("i.review-rating")
        r_rating = r_rating_element.text.replace("out of 5 stars", "").strip() if r_rating_element else None

        r_title_element = review.select_one("a.review-title")
        r_title_span_element = r_title_element.select_one("span:not([class])") if r_title_element else None
        r_title = r_title_span_element.text if r_title_span_element else None

        r_content_element = review.select_one("span.review-text")
        r_content = r_content_element.text if r_content_element else None

        r_date_element = review.select_one("span.review-date")
        r_date = r_date_element.text if r_date_element else None

        r_verified_element = review.select_one("span.a-size-mini")
        r_verified = r_verified_element.text if r_verified_element else None

        r_image_element = review.select_one("img.review-image-tile")
        r_image = r_image_element.attrs["src"] if r_image_element else None

        r = {
            "author": r_author,
            "rating": r_rating,
            "title": r_title,
            "content": r_content,
            "date": r_date,
            "verified": r_verified,
            "image_url": r_image,
        }

        scraped_reviews.append(r)

    return scraped_reviews


def main():
    search_url = "https://www.amazon.com/BERIBES-Cancelling-Transparent-Soft-Earpads-Charging-Black/product-reviews/B0CDC4X65Q/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"
    soup = get_soup(search_url)
    data = get_reviews(soup)
    df = pd.DataFrame(data=data)

    df.to_csv("amz.csv")


if __name__ == "__main__":
    main()
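The script above scrapes only the first page of reviews. One way to cover more pages is to generate one URL per page. The sketch below makes two assumptions that you should verify in your browser before relying on them: that Amazon's review pages accept a pageNumber query parameter, and that the shortened /product-reviews/<ASIN>/ path (without the product slug) resolves to the same page. review_page_urls() is a hypothetical helper:

```python
from typing import List
from urllib.parse import urlencode


def review_page_urls(asin: str, pages: int) -> List[str]:
    """Build review page URLs for pages 1..pages (hypothetical helper)."""
    base = f"https://www.amazon.com/product-reviews/{asin}/"
    return [
        # Assumption: Amazon selects the review page via pageNumber
        base + "?" + urlencode({"reviewerType": "all_reviews", "pageNumber": n})
        for n in range(1, pages + 1)
    ]


for url in review_page_urls("B0CDC4X65Q", 3):
    print(url)
```

Each URL could then be passed to get_soup() and get_reviews() from the script above, extending one combined list before building the DataFrame. Pausing briefly between requests is also advisable.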

Scrape Amazon product reviews with an API

As an alternative to building your own scraper, you can also look into some ready-made solutions like Amazon Scraper API. For instance, our Scraper API is specifically designed to deal with various Amazon data sources, including Amazon review data. It also boasts additional features like:

Product data localization in 195 locations worldwide;
Results delivered in raw HTML or structured JSON formats;
Convenient automation features like bulk scraping and automated jobs;
Maintenance-free web scraping infrastructure.

Let's check out Amazon Review Scraper API.

Setting up payload

Start by creating a new file and setting up a payload. You can use our amazon_reviews data source and provide the product ASIN in the payload, for example:

import requests
from pprint import pprint

payload = {
    'source': 'amazon_reviews',
    'domain': 'com',
    'query': 'B098FKXT8L',
    'start_page': 1,
    'pages': 3,
    'parse': True
}

The payload above also instructs Amazon Scraper API to start from the first page and scrape three pages in total. Setting parse to True returns structured data instead of raw HTML.

Send a POST request

Once the payload is ready, send it in a POST request, passing your API username and password for authentication:

# Get response
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload,
)

Print the response

Then, simply print the response:

# Print prettified response to stdout.
pprint(response.json())

This is how the full code should look:

import requests
from pprint import pprint

# Structure payload.
payload = {
    'source': 'amazon_reviews',
    'domain': 'com',
    'query': 'B098FKXT8L',
    'start_page': 1,
    'pages': 3,
    'parse': True
}

# Get response
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload,
)

# Print prettified response to stdout.
pprint(response.json())
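With parse set to True, you'll likely want the reviews in a table, as with the custom scraper. The sketch below assumes that the parsed response lists one entry per scraped page under results, with the reviews of each page under content["reviews"]. That structure is an assumption for illustration, so inspect the printed output first and adjust the keys to match. flatten_reviews() is a hypothetical helper:

```python
from typing import Any, Dict, List

import pandas as pd


def flatten_reviews(response_json: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect parsed reviews from every page in the response.

    Assumes each entry in "results" has a parsed "content" dict with a
    "reviews" list; verify this against your actual output.
    """
    rows = []
    for result in response_json.get("results", []):
        content = result.get("content")
        # Raw HTML (parse=False) arrives as a string, so skip non-dicts
        if not isinstance(content, dict):
            continue
        rows.extend(content.get("reviews", []))
    return rows


# Hypothetical response shaped like the assumption above
sample_response = {
    "results": [
        {"page": 1, "content": {"reviews": [{"rating": 5, "title": "Great"}]}},
        {"page": 2, "content": {"reviews": [{"rating": 4, "title": "Good"}]}},
    ]
}

df = pd.DataFrame(flatten_reviews(sample_response))
print(df)
```

In the full script, you would pass response.json() instead of sample_response and could then export df with to_csv(), as with the custom scraper.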

Here, you can see a snapshot of one of the reviews in the output:

Conclusion

There are multiple approaches to scraping Amazon product reviews. While a custom scraper gives you more flexibility, a commercial option like Amazon Scraper API can save significant time and effort. You can also check out our datasets if ready-to-use data is enough for your needs.

If you found this article helpful, be sure to check out our blog for resources on scraping Best Buy, Wayfair, or eBay.

About the author

Enrika Pavlovskytė

Copywriter

Enrika Pavlovskytė is a Copywriter at Oxylabs. With a background in digital heritage research, she became increasingly fascinated with innovative technologies and started transitioning into the tech world. On her days off, you might find her camping in the wilderness and, perhaps, trying to befriend a fox! Even so, she would never pass up a chance to binge-watch old horror movies on the couch.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
