Roberta Aukstikalnyte
Zillow is one of the largest real estate websites in the United States, with 200+ million visits per month. With a number this big, it’s no surprise that this website contains immense amounts of valuable information for real estate professionals. But to take advantage of this data, you’ll require a reliable web scraping solution. In today’s article, we’ll give an in-depth demonstration of how to use the Zillow data API to gather real estate listings data.
Before we get to the actual steps, let’s answer one crucial question: is scraping property listings beneficial? Yes, for several reasons:
Collecting bulk data
Automated web scraping tools let you gather large amounts of data from multiple sources with little effort. You avoid hours of repetitive work, and collecting such volumes manually would be nearly impossible anyway. And, as a real estate professional, you'll definitely need large quantities of data to make informed decisions.
Accessing data from various sources
Certain trends and patterns may not be apparent from a single source of data. That said, it would be wise to scrape data from several sources, including listings sites, property portals, and individual agent or broker websites. This way, you’ll be sure to get a more comprehensive view of the real estate market.
Detecting new opportunities
Scraping real estate data can also help you identify opportunities and make more informed decisions. For example, as an investor, you can use scraped data to identify real estate properties that are undervalued or overvalued in order to make more profitable investment decisions.
Similarly, you can use scraped data to identify properties that are similar to your own listings – this way, you can determine the optimal pricing and marketing strategy.
Nothing good ever comes easy, and the process of scraping real estate websites is no exception. Let’s take a look at some of the common obstacles you may come across during the process:
Sophisticated dynamic layouts
Often, property websites use complex and dynamic web layouts. Because of that, it may be difficult for web scrapers to adapt and extract relevant information. As a result, the extracted data may be inaccurate or incomplete, requiring you to make fixes manually.
Advanced anti-scraping measures
Another common challenge is that many property websites load their content dynamically with JavaScript and AJAX and guard pages with CAPTCHAs. These measures may prevent you from gathering the data or even result in an IP block, so you’ll need specific techniques to get past them.
Questionable data quality
It’s no secret that property prices change rapidly; hence, there’s a risk of receiving outdated information that doesn’t reflect the present state of the real estate market.
Copyrighted data
The legality of web scraping is a widely debated topic, and scraping real estate websites is no exception. The rule of thumb is that if the data is publicly available, you should be able to scrape it. On the other hand, if the data is copyrighted, you should respect the rules and not scrape it. In general, it’s best to consult a legal professional about your specific situation so you can be sure you’re not breaching any rules.
To gather Zillow data, we’ll be using Python to interact with the Real Estate Scraper API; however, you can choose a different programming language if you like.
We’ll begin by installing the latest version of Python. Once that’s done, you’ll need to install the following packages using Python's package manager pip:
python -m pip install requests bs4
The command above will install the `requests` and `bs4` libraries. We’ll use them to interact with the Real Estate Scraper API and parse the HTML it returns.
Before we start writing the code, let’s discuss the parameters of the API. Oxylabs’ Real Estate Scraper API requires only two parameters – source and url; the rest are optional. Let’s take a look at what they do:
source – to scrape Zillow data, this parameter needs to be set to universal;
url – a valid link to any Zillow page;
user_agent_type – sets the device type and browser;
geo_location – allows acquiring data from a specific location;
locale – sets the `Accept-Language` header based on this parameter;
render – enables JavaScript rendering.
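To make the parameter list concrete, here is a minimal sketch of a helper that assembles a payload from the two required parameters plus any optional ones. The `build_payload` function is hypothetical, and the example value passed for `render` is an assumption for illustration; check the API documentation for the values it actually accepts.

```python
def build_payload(url, **optional):
    """Build a request payload; `source` and `url` are the required keys."""
    allowed = {"user_agent_type", "geo_location", "locale", "render"}
    unknown = set(optional) - allowed
    if unknown:
        raise ValueError(f"Unsupported parameters: {sorted(unknown)}")
    payload = {"source": "universal", "url": url}
    # Skip optional parameters that were left as None.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


example = build_payload(
    "https://www.zillow.com/homes/for_sale/_rb/",
    user_agent_type="desktop",
    render="html",       # assumed value; enables JavaScript rendering
    geo_location=None,   # left unset, so it is omitted from the payload
)
print(example)
```

Rejecting unknown keys early is a design choice: a misspelled parameter name fails loudly in your own code instead of being sent to the API.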
In this section, we’ll build a web scraper that will allow us to extract data from Zillow search results. First, let’s import the necessary dependencies:
import requests
from bs4 import BeautifulSoup
Next, we’ll run a search on Zillow and copy the URL. For this example, we’ll search for properties for sale, which gives us this URL: https://www.zillow.com/homes/for_sale/_rb/
Using this URL, we’ll create a payload:
url = "https://www.zillow.com/homes/for_sale/_rb/"
payload = {
'source': 'universal',
'url': url,
'user_agent_type': 'desktop',
}
Now, we’ll use the requests module to make a POST request to the API. We’ll store the result in the response variable:
response = requests.post(
'https://realtime.oxylabs.io/v1/queries',
auth=('USERNAME', 'PASSWORD'),
json=payload,
)
Notice that we’re passing a tuple with `username` and `password`; make sure to replace those with your own Oxylabs credentials. We’re also sending the payload as `json`.
Now, we’ll print the response code to validate that the request was sent successfully:
print(response.status_code)
Here, you should get a 200 status code; if you don’t, make sure your internet connection is working and you’ve entered the correct URL and credentials.
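If the status code isn’t 200, a small lookup can turn it into a hint about what to check. This is only a sketch: `status_hint` is a hypothetical helper, and the mapping relies on standard HTTP status code meanings rather than on anything specific to the API.

```python
def status_hint(status_code):
    """Return a short troubleshooting hint for an HTTP status code."""
    if status_code == 200:
        return "OK: the request succeeded."
    if status_code in (401, 403):
        return "Authentication problem: check your username and password."
    if status_code in (400, 404, 422):
        return "Request problem: check the URL and payload parameters."
    if status_code == 429:
        return "Too many requests: slow down and retry later."
    if 500 <= status_code < 600:
        return "Server-side error: retry the request after a short wait."
    return "Unexpected status: inspect the response body for details."


# With a real response you would call: status_hint(response.status_code)
print(status_hint(401))
```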
Next, we’ll parse the Zillow page’s HTML using the Beautiful Soup library. First, we need to grab the HTML from the JSON output of the API, and then we’ll parse it:
content = response.json()['results'][0].get("content", "")
soup = BeautifulSoup(content, 'html.parser')
data = []
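The chained lookup above raises an error if the response contains no results. As a hedged alternative, the sketch below wraps the same lookup in a hypothetical `extract_html` helper that returns an empty string instead; the sample dictionary only mimics the `results` / `content` shape used in this article.

```python
def extract_html(api_response):
    """Return the HTML from the first result, or "" if it is missing."""
    results = api_response.get("results") or []
    if not results:
        return ""
    return results[0].get("content", "") or ""


# Stand-in for response.json(); a real response contains more fields.
sample = {"results": [{"content": "<html><body>Hello</body></html>"}]}
print(extract_html(sample))
print(repr(extract_html({"results": []})))
```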
The soup object will contain the parsed HTML of the Zillow page. The rest of the task is easy – we’ll parse it just like any other normal HTML page. See below:
# Each property card on the results page shares this class name.
for div in soup.find_all("div", {
        "class": "StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0"}):
    price = div.find("span", {
        "data-test": "property-card-price"
    }).text
    address = div.find("address", {
        "data-test": "property-card-addr"
    }).text
    data.append({
        "price": price,
        "address": address,
    })
Here, we’re simply looping over the search results and parsing each one to grab the price and address. To find the right elements, we inspect the page’s HTML in the browser and select the matching tags and attributes with Beautiful Soup’s search methods. The same technique works for extracting other property details.
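One caveat with the loop above: `find` returns `None` when an element is missing, so calling `.text` on the result raises an `AttributeError`. Below is a minimal sketch of a guard, assuming you’d rather record a missing value than stop the run. `safe_text` is a hypothetical helper, and `FakeTag` only stands in for a Beautiful Soup tag so the snippet runs on its own.

```python
def safe_text(tag, default=None):
    """Return the stripped text of a tag, or `default` if the tag is None."""
    if tag is None:
        return default
    return tag.get_text(strip=True)


class FakeTag:
    """Stand-in for a Beautiful Soup tag; only get_text() is imitated."""

    def __init__(self, text):
        self._text = text

    def get_text(self, strip=False):
        return self._text.strip() if strip else self._text


print(safe_text(FakeTag("  $250,000 ")))  # element found
print(safe_text(None, default="n/a"))     # element missing
```

In the scraper, this would read `price = safe_text(div.find("span", {"data-test": "property-card-price"}))`.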
Scraping individual listings
Now, let’s see how we can extract individual listings from Zillow. For our example, we’ll be using the link below, but feel free to replace it with your own: https://www.zillow.com/homedetails/3789-Conley-Downs-Ln-Decatur-GA-30034/14427531_zpid/
url = "https://www.zillow.com/homedetails/3789-Conley-Downs-Ln-Decatur-GA-30034/14427531_zpid/"
payload = {
'source': 'universal',
'url': url,
'user_agent_type': 'desktop',
}
As before, we’ll inspect the desired elements in a web browser to find their HTML tags and attributes, then send the request and parse the listing page with Beautiful Soup using that information.
# Send the new payload to the API, as in the search results example.
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload,
)
content = response.json()['results'][0].get("content", "")
soup = BeautifulSoup(content, 'html.parser')
price = soup.find("span", {'data-testid': 'price'}).find("span").text
address = soup.find("h1", {'class': 'qxgaF'}).text
number_of_bed, size = [
    elem.find("strong").text
    for elem in soup.find_all("span", {'data-testid': 'bed-bath-item'})[:2]
]
status = soup.find("span", {'class': 'ixkFNb'}).text
property_data = {
"link": url,
"price": price,
"address": address,
"number of bed": number_of_bed,
"size (sqft)": size,
"status": status,
}
print(property_data)
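The extracted values are strings. If you plan to compare or sort listings, converting them to numbers helps. Here is a minimal sketch, assuming prices look like `$250,000` and sizes like `1,850`; `to_int` is a hypothetical helper and the sample record is made up for illustration.

```python
import re


def to_int(text):
    """Extract the digits from text such as "$250,000"; None if there are none."""
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


# Made-up record in the same shape as property_data.
sample = {"price": "$250,000", "size (sqft)": "1,850", "status": "For sale"}
cleaned = {
    **sample,
    "price": to_int(sample["price"]),
    "size (sqft)": to_int(sample["size (sqft)"]),
}
print(cleaned)
```

Shorthand such as `$1.2M` is not handled by this helper and would need separate treatment.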
We can also extract real estate agent data using the same API. For this, we’ll only need to slightly modify the search result scraper we built earlier and supply the correct HTML tags and attributes.
For this example, we’ll use a URL to a list of real estate agents in the Decatur, GA area:
https://www.zillow.com/professionals/real-estate-agent-reviews/decatur-ga/
import requests
from bs4 import BeautifulSoup

url = "https://www.zillow.com/professionals/real-estate-agent-reviews/decatur-ga/"
payload = {
    'source': 'universal',
    'url': url,
    'user_agent_type': 'desktop',
}
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),
    json=payload,
)
print(response.status_code)

content = response.json()['results'][0].get("content", "")
soup = BeautifulSoup(content, 'html.parser')
agents = []
# Each agent is a table row; find_all returns every matching row.
for elem in soup.find_all("tr", {"class": "cUqKEI"}):
    agent_name = elem.find("a", {"class": "jMHzWg"}).text
    agent_link = elem.find("a", {"class": "jMHzWg"}).get("href")
    phone = elem.find("div", {"class": "dlivvk"}).text
    address = elem.find("div", {"class": "bmKuCz"}).text
    agents.append({
        "Name": agent_name,
        "Link": agent_link,
        "Phone": phone,
        "Address": address,
    })
print(agents)
Once you run this code, it’ll use the Real Estate Scraper API and extract the list of real estate agents. Keep in mind that Zillow frequently changes the layout of its website and the HTML attributes. You might have to change a few class names or attributes if the above code stops working; it should be relatively simple though.
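Since class names like these change often, one option is to keep them in a single place so that an update touches one dictionary rather than the parsing logic. The sketch below is one possible arrangement, not part of the API; the `SELECTORS` name and `selector` helper are hypothetical, and the class names are the ones used above, which may already be outdated.

```python
# Tag names and attributes copied from the agent scraper; update them here
# when the page layout changes.
SELECTORS = {
    "agent_row": ("tr", {"class": "cUqKEI"}),
    "agent_link": ("a", {"class": "jMHzWg"}),
    "agent_phone": ("div", {"class": "dlivvk"}),
    "agent_address": ("div", {"class": "bmKuCz"}),
}


def selector(name):
    """Return (tag, attrs) for a named element, copying attrs defensively."""
    tag, attrs = SELECTORS[name]
    return tag, dict(attrs)


tag, attrs = selector("agent_row")
print(tag, attrs)
```

With Beautiful Soup, the pair can be unpacked straight into a call, for example `soup.find_all(*selector("agent_row"))`.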
Due to the frequent layout changes and anti-bot measures, scraping Zillow can be rather challenging. Luckily, Oxylabs’ Zillow data scraper is designed to deal with these obstacles so you can scrape Zillow data successfully.
If you have any questions, don’t hesitate to reach out to our support team via email or live chat on our website. Our professional team will gladly advise you on any matter related to scraping public data from Zillow.
About the author
Roberta Aukstikalnyte
Senior Content Manager
Roberta Aukstikalnyte is a Senior Content Manager at Oxylabs. Having worked various jobs in the tech industry, she especially enjoys finding ways to express complex ideas in simple ways through content. In her free time, Roberta unwinds by reading Ottessa Moshfegh's novels, going to boxing classes, and playing around with makeup.