
Scraping Real Estate Data With Python: Step-by-Step

Augustas Pelakauskas

2024-01-11 | 3 min read

Real estate brokerage platforms offer a huge variety of listings that are updated constantly, making them a good target for web scraping. That said, keeping up with the real estate market in real time and at scale is challenging.

What is data scraping in real estate?

Data scraping in real estate is the extraction of data from online real estate listings. The extracted data includes property listings, prices, amenities, images, and more. Data scraping typically uses automated tools that navigate real estate websites and gather data from their pages.

In this guide, you’ll learn how to collect public property data from Redfin using Oxylabs Real Estate Scraper API and Python. You’ll extract prices, sizes, bed and bath counts, and addresses, which can help you spot good deals or understand the market better.

You can find the following code on our GitHub.

1. Prepare environment

You can download the latest version of Python from the official website.

To store your Python code, run the following command to create a new Python file in your current directory.

touch main.py

Install dependencies

Next, run the command below to install the libraries this tutorial uses for web scraping and data processing: Requests, Beautiful Soup, and pandas.

pip install beautifulsoup4 requests pandas

Import libraries

Now, open the previously created Python file and import the installed libraries.

import requests
import pandas as pd
from bs4 import BeautifulSoup

2. Prepare the API request

After importing the libraries, the next step is to prepare the payload for the API request. To authenticate with Real Estate Scraper API, retrieve your API credentials from the Oxylabs dashboard, then replace USERNAME and PASSWORD below with them.

USERNAME = "USERNAME"
PASSWORD = "PASSWORD"

payload = {
    "source": "universal",
    "url": "https://www.redfin.com/city/29470/IL/Chicago",
}
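Hardcoding credentials makes it easy to commit them by accident. One alternative is to read them from environment variables. This is a minimal sketch: load_credentials and the variable names OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical choices, not something the API requires.

```python
import os


def load_credentials(
    user_var: str = "OXYLABS_USERNAME",  # hypothetical variable name
    pass_var: str = "OXYLABS_PASSWORD",  # hypothetical variable name
) -> tuple[str, str]:
    """Read API credentials from environment variables instead of hardcoding them."""
    username = os.environ.get(user_var)
    password = os.environ.get(pass_var)
    if not username or not password:
        # Fail early with a clear message rather than sending an unauthenticated request
        raise RuntimeError(f"Set {user_var} and {pass_var} before running the scraper")
    return username, password
```

With the variables exported in your shell, `USERNAME, PASSWORD = load_credentials()` replaces the two hardcoded lines.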

For this example, let’s scrape real estate listings in Chicago. You can replace the url value with a Redfin home listings page of your choosing. Make sure the source parameter is set to universal.
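If you scrape several cities, a small helper can build the payload for each URL and catch typos before you spend an API request. This is a sketch under assumptions: build_payload is a hypothetical helper, and the host check only illustrates one way to reject non-Redfin URLs.

```python
from urllib.parse import urlparse


def build_payload(url: str) -> dict:
    """Build a Real Estate Scraper API payload for a Redfin listings page."""
    host = urlparse(url).netloc.lower()
    # Only accept redfin.com or one of its subdomains (e.g. www.redfin.com)
    if host != "redfin.com" and not host.endswith(".redfin.com"):
        raise ValueError(f"Expected a Redfin URL, got: {url}")
    return {
        "source": "universal",  # the source value used throughout this tutorial
        "url": url,
    }
```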

NOTE: You can find all the parameters and more samples in our documentation.

3. Send request

Use the declared credentials and payload to send a POST request to the API. Pass the credentials to the auth parameter and the payload to the json parameter.

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    auth=(USERNAME, PASSWORD),
    json=payload,
)
print(response.status_code)

If everything works as expected, you should see a 200 status code printed out in your terminal.

As a best practice, add the line below after your POST request. It raises a requests.exceptions.HTTPError if the API responds with an error status code (4xx or 5xx), so you don’t go on to parse an error response as if it were data.

response.raise_for_status()
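To see what raise_for_status() does without calling the API, you can build Response objects by hand. This sketch uses a hypothetical describe_response helper; the 401 response is fabricated locally to mimic a failed authentication.

```python
import requests


def describe_response(response: requests.Response) -> str:
    """Return "ok" for a successful response, or the HTTPError message otherwise."""
    try:
        response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
    except requests.exceptions.HTTPError as err:
        return str(err)
    return "ok"


# Build responses manually so the example runs without network access
ok = requests.Response()
ok.status_code = 200

unauthorized = requests.Response()
unauthorized.status_code = 401
unauthorized.reason = "Unauthorized"
unauthorized.url = "https://realtime.oxylabs.io/v1/queries"

print(describe_response(ok))  # ok
print(describe_response(unauthorized))
# 401 Client Error: Unauthorized for url: https://realtime.oxylabs.io/v1/queries
```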

Here’s the full code for sending the request:

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    auth=(USERNAME, PASSWORD),
    json=payload,
)
response.raise_for_status()
print(response.status_code)

If you get a requests.exceptions.HTTPError, check that your payload and credentials are correct.
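Transient failures such as rate limiting or a brief server error don’t always mean your request is wrong. A common Requests pattern is to mount an HTTPAdapter with a urllib3 Retry policy on a Session. This is a sketch: make_session is a hypothetical helper, and the retry count, backoff, and status codes are assumptions to tune for your workload.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 3) -> requests.Session:
    """Create a Session that retries transient failures with exponential backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=1,  # the wait between attempts grows exponentially
        status_forcelist=[429, 500, 502, 503, 504],  # retry on these status codes
        allowed_methods=["POST"],  # POST is not retried by default
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Use it in place of requests.post, for example `make_session().post(url=..., auth=(USERNAME, PASSWORD), json=payload, timeout=180)`, where the timeout value is an assumption. If retries run out on one of the listed status codes, Requests raises requests.exceptions.RetryError instead of returning the response.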

4. Extract HTML

The API response comes back as JSON, which response.json() parses into a Python dictionary. Take the HTML content of the first result and pass it to Beautiful Soup:

html = response.json()["results"][0]["content"]
soup = BeautifulSoup(html, "html.parser")

You’ll use the soup object to locate the elements that hold the data you need, selecting them by tag name and class.
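The one-liner above assumes the response always contains at least one result with HTML content. A slightly more defensive version raises a clear error otherwise. This sketch uses a hypothetical extract_html helper and relies only on the results[0]["content"] structure shown in this tutorial.

```python
def extract_html(api_json: dict) -> str:
    """Pull the HTML of the first result out of the API's JSON response."""
    results = api_json.get("results") or []
    if not results:
        raise ValueError("API response contains no results")
    content = results[0].get("content")
    if not isinstance(content, str):
        raise ValueError("First result has no HTML content")
    return content
```

Usage: `html = extract_html(response.json())`.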

5. Parse data from HTML

Start with collecting every home listing found on the page. Navigate to the web page, right-click on the part of the listing that includes the data you need, and click Inspect.

You should see that the parent element has a class called bottomV2. Use it to select each listing from the HTML content.

data = []
for listing in soup.find_all("div", {"class": "bottomV2"}):
    ...  # extraction logic goes here
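If you prefer CSS selectors, Beautiful Soup’s select() method finds the same elements as find_all() with a class filter. The markup below is a hypothetical, heavily simplified fixture that only reuses the bottomV2 class name; Redfin’s real pages are far more complex.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics only the class name used above
sample_html = """
<div class="HomeCard"><div class="bottomV2">Listing A</div></div>
<div class="HomeCard"><div class="bottomV2 extra">Listing B</div></div>
<div class="HomeCard"><div class="other">Not a listing</div></div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# Attribute filter: matches any div whose class list includes "bottomV2"
by_find_all = soup.find_all("div", {"class": "bottomV2"})
# CSS selector: "div.bottomV2" matches the same elements
by_select = soup.select("div.bottomV2")

print([el.get_text(strip=True) for el in by_select])  # ['Listing A', 'Listing B']
```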

For cleaner code, create a function called extract_data_from_listing and write your data extraction code there. The function should accept the HTML content of the listing as an argument and return a dictionary containing the extracted data.

def extract_data_from_listing(listing):
    ...  # extraction logic goes here

Next, fill in the function body.

By inspecting the price and address fields, you can see that they’re both span elements, with the homecardV2Price and collapsedAddress classes, respectively. Use these classes to retrieve the values.

price = listing.find("span", {"class": "homecardV2Price"}).get_text(strip=True)
address = listing.find("span", {"class": "collapsedAddress"}).get_text(strip=True)
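Note that find() returns None when an element is missing, so calling get_text() on it raises AttributeError. Some listings may omit a field. If you’d rather record an empty value than stop the run, a small wrapper helps. safe_text is a hypothetical helper, and the one-span listing below is made-up test markup.

```python
from typing import Optional

from bs4 import BeautifulSoup
from bs4.element import Tag


def safe_text(parent: Tag, name: str, class_: str) -> Optional[str]:
    """Return the stripped text of the first matching element, or None if it's missing."""
    element = parent.find(name, {"class": class_})
    return element.get_text(strip=True) if element is not None else None


# Hypothetical listing markup with the address missing
listing = BeautifulSoup(
    '<div class="bottomV2"><span class="homecardV2Price">$350,000</span></div>',
    "html.parser",
).div

print(safe_text(listing, "span", "homecardV2Price"))  # $350,000
print(safe_text(listing, "span", "collapsedAddress"))  # None
```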

The remaining fields (bed count, bath count, and size) all share the same stats class.

Use this class to select all three elements at once, then parse them separately.

stats = listing.find_all("div", {"class": "stats"})
try:
    bed_count_elem, bath_count_elem, size_elem = stats[0], stats[1], stats[2]
except IndexError:
    raise Exception("Got fewer stats than expected")

bed_count = bed_count_elem.get_text(strip=True)
bath_count = bath_count_elem.get_text(strip=True)
size = size_elem.get_text(strip=True)

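Raising an exception when fewer stats than expected are found means that if Redfin’s page structure changes, you’ll know what went wrong instead of silently collecting incomplete data.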
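The bed, bath, and size values are stored as display text. If you want numbers for analysis, a small helper can pull out the first number in each string. This sketch assumes the stats read like "3 beds", "2.5 baths", or "1,450 sq ft", so check the actual text on the page first; parse_number is a hypothetical helper, not part of the tutorial’s code.

```python
import re
from typing import Optional

# Digits with optional thousands separators and an optional decimal part
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_number(text: str) -> Optional[float]:
    """Extract the first number from a stat string such as "2.5 baths" or "1,450 sq ft"."""
    match = _NUMBER.search(text)
    if match is None:
        return None  # e.g. a placeholder shown when the value isn't listed
    return float(match.group().replace(",", ""))
```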

After parsing each listing, build a dictionary from its fields and append it to the data list declared earlier.

entry = {
    "price": price,
    "address": address,
    "bed_count": bed_count,
    "bath_count": bath_count,
    "size": size,
}
data.append(entry)

Here’s the full code for extracting data from the HTML content:

def extract_data_from_listing(listing):
    price = listing.find("span", {"class": "homecardV2Price"}).get_text(strip=True)
    address = listing.find("span", {"class": "collapsedAddress"}).get_text(strip=True)
    stats = listing.find_all("div", {"class": "stats"})
    try:
        bed_count_elem, bath_count_elem, size_elem = stats[0], stats[1], stats[2]
    except IndexError:
        raise Exception("Got fewer stats than expected")

    bed_count = bed_count_elem.get_text(strip=True)
    bath_count = bath_count_elem.get_text(strip=True)
    size = size_elem.get_text(strip=True)

    return {
        "price": price,
        "address": address,
        "bed_count": bed_count,
        "bath_count": bath_count,
        "size": size,
    }


data = []

for listing in soup.find_all("div", {"class": "bottomV2"}):
    entry = extract_data_from_listing(listing)
    data.append(entry)
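Because Redfin’s markup can change, it helps to check the parser against a small HTML fixture that doesn’t need an API call. The fixture below is hypothetical: it reuses only the class names from this tutorial, and its values are made up. The function is a copy of the one above.

```python
from bs4 import BeautifulSoup


def extract_data_from_listing(listing):
    price = listing.find("span", {"class": "homecardV2Price"}).get_text(strip=True)
    address = listing.find("span", {"class": "collapsedAddress"}).get_text(strip=True)
    stats = listing.find_all("div", {"class": "stats"})
    try:
        bed_count_elem, bath_count_elem, size_elem = stats[0], stats[1], stats[2]
    except IndexError:
        raise Exception("Got fewer stats than expected")

    return {
        "price": price,
        "address": address,
        "bed_count": bed_count_elem.get_text(strip=True),
        "bath_count": bath_count_elem.get_text(strip=True),
        "size": size_elem.get_text(strip=True),
    }


# Hypothetical fixture; real Redfin pages contain much more structure
SAMPLE = """
<div class="bottomV2">
  <span class="homecardV2Price">$425,000</span>
  <div class="stats">3 beds</div>
  <div class="stats">2 baths</div>
  <div class="stats">1,450 sq ft</div>
  <span class="collapsedAddress">123 Example St, Chicago, IL</span>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
data = [extract_data_from_listing(listing) for listing in soup.find_all("div", {"class": "bottomV2"})]
print(data)
```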

6. Save to CSV

Lastly, save the collected data to a CSV file with pandas. Passing index=False leaves out pandas’ row-number column.

df = pd.DataFrame(data)
df.to_csv("real_estate_data.csv", index=False)
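If you plan to sort or filter by price or size, converting the text columns to numbers before saving can help. The sketch below uses hypothetical rows shaped like the tutorial’s entries, and the *_num column names are illustrative; values without a number become NaN.

```python
import pandas as pd

# Hypothetical rows in the shape produced by extract_data_from_listing
data = [
    {"price": "$425,000", "address": "123 Example St", "bed_count": "3 beds",
     "bath_count": "2 baths", "size": "1,450 sq ft"},
    {"price": "$1,100,000", "address": "456 Sample Ave", "bed_count": "4 beds",
     "bath_count": "3.5 baths", "size": "-- sq ft"},
]
df = pd.DataFrame(data)

# Drop thousands separators, extract the first number, and coerce failures to NaN
for column in ["price", "bed_count", "bath_count", "size"]:
    digits = df[column].str.replace(",", "", regex=False).str.extract(
        r"(\d+(?:\.\d+)?)", expand=False
    )
    df[f"{column}_num"] = pd.to_numeric(digits, errors="coerce")

df.to_csv("real_estate_data.csv", index=False)
```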

The complete code

Here’s the full code for scraping real estate data from Redfin with Oxylabs Real Estate Scraper API:

import requests
import pandas as pd
from bs4 import BeautifulSoup


def extract_data_from_listing(listing):
    price = listing.find("span", {"class": "homecardV2Price"}).get_text(strip=True)
    address = listing.find("span", {"class": "collapsedAddress"}).get_text(strip=True)
    stats = listing.find_all("div", {"class": "stats"})
    try:
        bed_count_elem, bath_count_elem, size_elem = stats[0], stats[1], stats[2]
    except IndexError:
        raise Exception("Got fewer stats than expected")

    bed_count = bed_count_elem.get_text(strip=True)
    bath_count = bath_count_elem.get_text(strip=True)
    size = size_elem.get_text(strip=True)

    return {
        "price": price,
        "address": address,
        "bed_count": bed_count,
        "bath_count": bath_count,
        "size": size,
    }


USERNAME = "USERNAME"
PASSWORD = "PASSWORD"

payload = {
    "source": "universal",
    "url": "https://www.redfin.com/city/29470/IL/Chicago",
}

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    auth=(USERNAME, PASSWORD),
    json=payload,
)
response.raise_for_status()

html = response.json()["results"][0]["content"]
soup = BeautifulSoup(html, "html.parser")

data = []

for listing in soup.find_all("div", {"class": "bottomV2"}):
    entry = extract_data_from_listing(listing)
    data.append(entry)


df = pd.DataFrame(data)
df.to_csv("real_estate_data.csv", index=False)

Final word

Combining Python with Oxylabs Real Estate Scraper API is an efficient way to automate the real estate data collection needed for market insights.

Real Estate Scraper API lets you extract data from Redfin while handling the typical challenges of web scraping. Refer to our technical documentation for more on the API parameters and variables discussed in this tutorial.

Under Oxylabs’ real estate umbrella, you can find more target-tailored scrapers: Zillow Data API, Zoopla Scraper, and MLS Scraper API.

For more tutorials on popular targets like Amazon, Zillow, Craigslist, IMDb, and many others, check our blog.

If you have any questions, feel free to reach out by sending a message to support@oxylabs.io or live chat.

About the author

Augustas Pelakauskas

Senior Copywriter

Augustas Pelakauskas is a Senior Copywriter at Oxylabs. Coming from an artistic background, he is deeply invested in various creative ventures, the most recent being writing. After testing his abilities in the field of freelance journalism, he transitioned to tech content creation. When at ease, he enjoys sunny outdoors and active recreation. As it turns out, his bicycle is his fourth best friend.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
