How to Scrape Indeed Jobs Data

Danielius Radavicius

2023-12-15, 4 min read

In an era where data drives decisions, accessing up-to-date job market information is crucial. Indeed.com, a leading job portal, offers extensive insights into job openings, popular roles, and company hiring trends. However, manually collecting this job data can be tedious and time-consuming. This is where web scraping comes in as a game-changer, and Oxylabs' Web Scraper API makes this task seamless, efficient, and reliable.

Why Scrape Indeed?

Scraping Indeed.com allows businesses, analysts, and job seekers to stay ahead in the competitive job market. From tracking the most popular jobs to understanding industry demands, the insights gained from job postings and job details on Indeed are invaluable. Automated data collection through scraping not only saves time but also provides a more comprehensive view of the job landscape. Job scraping is a technique widely used by HR professionals.

The Tool: Oxylabs’ Web Scraper API

Oxylabs' Web Scraper API is designed to handle complex web scraping tasks with ease. It bypasses anti-bot measures, ensuring you get the job data you need without interruption. Whether you're looking to scrape job titles, company names, or detailed job descriptions, Oxylabs simplifies the process.

This step-by-step tutorial will guide you through scraping job postings from Indeed.com, focusing on extracting key job details like job titles, descriptions, and company names.

Project Setup

You can find the following code on our GitHub.

Prerequisites

Before diving into the code to scrape Indeed, make sure you have Python 3.8 or newer installed on your machine. This guide is written for Python 3.8+, so older versions may not run the examples.

Creating a Virtual Environment

A virtual environment is an isolated space where you can install libraries and dependencies without affecting your global Python setup. It's a good practice to create one for each project. Here's how to set it up on different operating systems:

python -m venv indeed_env #Windows
python3 -m venv indeed_env #Mac and Linux

Replace indeed_env with the name you'd like to give to your virtual environment.

Activating the Virtual Environment

Once the virtual environment is created, you'll need to activate it:

.\indeed_env\Scripts\Activate #Windows
source indeed_env/bin/activate #Mac and Linux

You should see the name of your virtual environment in the terminal, indicating that it's active.
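If you'd like to confirm the setup from inside Python as well, a small check along these lines can help. It is only a sketch and not part of the original project: the function names in_virtualenv and python_is_supported are made up for illustration. It relies on the fact that inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_is_supported(minimum=(3, 8)) -> bool:
    """Return True when the running Python meets the minimum version."""
    return sys.version_info >= minimum


print("Virtual environment active:", in_virtualenv())
print("Python 3.8 or newer:", python_is_supported())
```

If the first line prints False, the environment is most likely not activated in the current terminal.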

Installing Required Libraries

We'll use the requests library to make HTTP requests and pandas to export the results to CSV. Install both by running the following command:

pip install requests pandas

And there you have it! Your project environment is ready for Indeed data scraping using Oxylabs' Web Scraper API. In the following sections, we'll look at how the API works and then at the structure of Indeed pages.

Overview of Web Scraper API

Oxylabs' Web Scraper API allows you to extract data from many complex websites easily.

The following is a simple example that shows how Scraper API works.

# scraper_api_demo.py
import requests

payload = {
    "source": "universal",
    "url": "https://www.indeed.com"
}

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    json=payload,
    auth=("username", "password"),
)

print(response.json())

As you can see, the payload is where you would inform the API what and how you want to scrape.

Save this code in a file scraper_api_demo.py and run it. You will see that the entire HTML of the page will be printed, along with some additional information from Scraper API.
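Because the response wraps the page in additional information, it can be handy to pull out just the page content. The sketch below assumes the layout used by the later examples in this tutorial, where the content sits under results[0]["content"]. The helper name first_content and the sample_body dictionary are invented for illustration and only mimic those two keys; a real response contains more fields.

```python
def first_content(response_body: dict):
    """Return the content of the first result, or None if there is none."""
    results = response_body.get("results") or []
    if not results:
        return None
    return results[0].get("content")


# Hypothetical stand-in for response.json(); only the two keys used in
# this tutorial are reproduced here.
sample_body = {"results": [{"content": "<html>...</html>"}]}

print(first_content(sample_body))
```

In the demo script, you would call first_content(response.json()) instead of passing the sample dictionary.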

In the following section, let's examine various parameters we can send in the payload.

Scraper API Parameters

The most critical parameter is source. For Indeed, set the source as universal, a general-purpose source that can handle all domains.

The parameter url is self-explanatory, a direct link to the page you want to scrape.

The example code in the earlier section has only these two parameters. The result is, however, the entire HTML of the page.

Instead, what we need is parsed data. This is where the parameter parse comes into the picture. When you send parse as True, you must also send one more parameter: parsing_instructions. Combined, these two parameters allow you to get parsed data in any structure you like.

The following allows you to parse the page title and retrieve results in JSON:

"title": {
    "_fns": [
        {
            "_fn": "xpath_one",
            "_args": ["//title/text()"]
        }
    ]
}

The key _fns indicates a list of functions, which can contain one or more functions indicated by the "_fn" key, along with the arguments.

In this example, the function is xpath_one, which takes an XPath and returns one matching element. On the other hand, the function xpath returns all matching elements.

On similar lines are css_one and css functions that use CSS selectors instead of XPath.

For a complete list of available functions, see the Scraper API documentation.
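Since every field in the parsing instructions follows the same _fns, _fn, and _args shape, a small helper can cut down on repetition. The following is a minimal sketch, not part of the Scraper API itself: the helper names field and build_payload are hypothetical, and the payload keys are the ones used elsewhere in this tutorial.

```python
import json


def field(fn_name: str, selector: str) -> dict:
    """Build one parsing instruction that applies a single function."""
    return {"_fns": [{"_fn": fn_name, "_args": [selector]}]}


def build_payload(url: str, parsing_instructions: dict) -> dict:
    """Assemble a payload that asks for parsed output."""
    return {
        "source": "universal",
        "url": url,
        "render": "html",
        "parse": True,
        "parsing_instructions": parsing_instructions,
    }


payload = build_payload(
    "https://www.indeed.com",
    {"title": field("xpath_one", "//title/text()")},
)

# Print the payload to check its structure before sending it.
print(json.dumps(payload, indent=4))
```

Swapping "xpath_one" for "css_one" and passing a CSS selector would produce the equivalent instruction for the CSS-based functions.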

The following code prints the title of the Indeed page:

# indeed_title.py
import requests

payload = {
    "source": "universal",
    "url": "https://www.indeed.com",
    "render": "html",
    "parse": True,
    "parsing_instructions": {
        "title": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//title/text()"]
                }
            ]
        }
    }
}

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    json=payload,
    auth=("username", "password")
)

print(response.json()["results"][0]["content"])

Run this file to get the title of Indeed.

In the next section, we will scrape jobs from a list.

Scraping Indeed Job Postings

Before scraping a page, we need to examine the page structure.

Open the job search results in Chrome, right-click a job listing, and select Inspect.

Move your mouse around until you can precisely select one job list item and its related data.

You can use the following CSS selector to select one job listing:

.job_seen_beacon

We can iterate over each matching item and get the specific job data points such as job title, company name, location, salary range, date posted, and job description.

First, create the placeholder for job listing as follows:

payload = {
    "source": "universal",
    "url": "https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco%2C+CA",
    "render": "html",
    "parse": True,
    "parsing_instructions": {
        "job_listings": {
            "_fns": [
                {
                    "_fn": "css",
                    "_args": [".job_seen_beacon"]
                }
            ],

Note the use of the function css. It means that it will return all matching elements.

Next, we can use the reserved property _items to indicate that we want to iterate over a list, further processing each list item separately.

Inside _items, each selector is applied relative to the list item that was already matched, so its path is effectively concatenated to the one defined above:

 "job_listings": {
            "_fns": [
                {
                    "_fn": "css",
                    "_args": [".job_seen_beacon"]
                }
            ],
            "_items": {
                "job_title": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": [".//h2[contains(@class,'jobTitle')]/a/span/text()"]
                        }
                    ]
                },
                "company_name": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": [".//span[@data-testid='company-name']/text()"]
                        }
                    ]
                },

Similarly, we can add other selectors. After adding other details, here are the job_search_payload.json file contents:

{
    "source": "universal",
    "url": "https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco%2C+CA",
    "render": "html",
    "parse": true,
    "parsing_instructions": {
        "job_listings": {
            "_fns": [
                {
                    "_fn": "css",
                    "_args": [".job_seen_beacon"]
                }
            ],
            "_items": {
                "job_title": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": [".//h2[contains(@class,'jobTitle')]/a/span/text()"]
                        }
                    ]
                },
                "company_name": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": [".//span[@data-testid='company-name']/text()"]
                        }
                    ]
                },
                "location": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": [".//div[@data-testid='text-location']//text()"]
                        }
                    ]
                },
                "salary_range": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": [".//div[contains(@class, 'salary-snippet-container') or contains(@class, 'estimated-salary')]//text()"]
                        }
                    ]
                },
                "date_posted": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": [".//span[@class='date']/text()"]
                        }
                    ]
                },
                "job_description": {
                    "_fns": [
                        {
                            "_fn": "xpath_one",
                            "_args": ["normalize-space(.//div[@class='job-snippet'])"]
                        }
                    ]
                }
            }
        }
    }
}
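If you'd rather generate this file than write it by hand, the sketch below builds the same kind of payload from a mapping of field names to XPath expressions and saves it as JSON. The names JOB_FIELDS, build_job_payload, and save_payload are hypothetical, only two of the fields are listed to keep the example short, and the demo writes to a temporary directory rather than your project folder. Note that the json module writes the Python value True as the JSON literal true, which is what a .json file requires.

```python
import json
import tempfile
from pathlib import Path

SEARCH_URL = "https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco%2C+CA"

# Field name -> XPath evaluated relative to each job listing.
JOB_FIELDS = {
    "job_title": ".//h2[contains(@class,'jobTitle')]/a/span/text()",
    "company_name": ".//span[@data-testid='company-name']/text()",
}


def build_job_payload(url: str, fields: dict) -> dict:
    """Build a payload that parses every job listing into the given fields."""
    items = {
        name: {"_fns": [{"_fn": "xpath_one", "_args": [xpath]}]}
        for name, xpath in fields.items()
    }
    return {
        "source": "universal",
        "url": url,
        "render": "html",
        "parse": True,
        "parsing_instructions": {
            "job_listings": {
                "_fns": [{"_fn": "css", "_args": [".job_seen_beacon"]}],
                "_items": items,
            }
        },
    }


def save_payload(payload: dict, path) -> None:
    """Write the payload to disk as indented JSON."""
    Path(path).write_text(json.dumps(payload, indent=4), encoding="utf-8")


payload = build_job_payload(SEARCH_URL, JOB_FIELDS)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "job_search_payload.json"
    save_payload(payload, target)
    text = target.read_text(encoding="utf-8")
    reloaded = json.loads(text)

# The saved file round-trips back to the same dictionary.
print(reloaded == payload)
```

To produce the file for your project, call save_payload(payload, "job_search_payload.json") and add the remaining fields to JOB_FIELDS.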

A good way to organize your code is to save the payload as a separate JSON file. It allows you to keep your Python file as short as the following:

# parse_jobs.py
import requests
import json

payload = {}
with open("job_search_payload.json") as f:
    payload = json.load(f)

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    json=payload,
    auth=("username", "password"),
)

print(response.status_code)

with open("result.json", "w") as f:
    json.dump(response.json(), f, indent=4)

Exporting to JSON and CSV

Scraper API returns its output as JSON, so you can save the extracted job listings as JSON directly.

You can use a library such as Pandas to save the job data as CSV.

Remember that the parsed data is stored in the content inside results.

As we created the job listings in the key job_listings, we can use the following snippet to save the extracted Indeed data:

# parse_jobs.py
import pandas as pd

# Continues parse_jobs.py: load the parsed job listings from the response
# into a DataFrame and save them as CSV

df = pd.DataFrame(response.json()["results"][0]["content"]["job_listings"])
df.to_csv("job_search_results.csv", index=False)
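If you prefer not to depend on pandas for this step, the standard library's csv module can write the same kind of file. The following is a sketch under a few assumptions: the function name listings_to_csv and the sample_listings data are invented for illustration, and it assumes that a field whose selector matched nothing may arrive as a missing or null value, which is written as an empty cell.

```python
import csv
import io


def listings_to_csv(listings, fieldnames) -> str:
    """Render a list of job dictionaries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in listings:
        # Missing or null fields become empty cells.
        writer.writerow({name: row.get(name) or "" for name in fieldnames})
    return buffer.getvalue()


# Hypothetical sample data standing in for
# response.json()["results"][0]["content"]["job_listings"].
sample_listings = [
    {"job_title": "Example Title", "company_name": "Example Co"},
    {"job_title": "Another Title", "company_name": None},
]

csv_text = listings_to_csv(sample_listings, ["job_title", "company_name"])
print(csv_text)
```

To save the result, write csv_text to a file such as job_search_results.csv.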

Conclusion

Using Web Scraper API simplifies scraping Indeed data, a job that can otherwise be difficult and daunting. Notably, you can even use GUI tools such as Postman or Insomnia to scrape Indeed. You only need to send a POST request to the API with the desired payload.

The detailed documentation on Web Scraper API is available here, and if you’d like to try our Web Scraper API, you can do so for free.

About the author

Danielius Radavicius

Copywriter

Danielius Radavičius is a Copywriter at Oxylabs. Having grown up in films, music, and books and having a keen interest in the defense industry, he decided to move his career toward tech-related subjects and quickly became interested in all things technology. In his free time, you'll probably find Danielius watching films, listening to music, and planning world domination.


All information on Oxylabs Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Oxylabs Blog or any third-party websites that may be linked therein. Before engaging in scraping activities of any kind you should consult your legal advisors and carefully read the particular website's terms of service or receive a scraping license.
