Scraping Amazon Product Data Using Python: Step-by-Step Guide

This comprehensive guide will teach you how to scrape Amazon product data using Python. Whether you’re an eCommerce professional, researcher, or developer, you’ll learn to create a solution to extract valuable insights from Amazon’s marketplace. By following this guide, you’ll acquire practical knowledge on setting up your scraping environment, overcoming common challenges, and efficiently collecting the needed data.

Zilvinas Tamulis

Mar 27, 2025

15 min read

Understanding Amazon scraping

Scraping Amazon product data can be incredibly useful for an eCommerce business. Automated data extraction lets you uncover trends in consumer behavior, gauge product demand, and perform pricing analysis. With the right techniques, you can monitor competitors by analyzing product details and customer reviews, giving you a clear advantage in a competitive market.

That said, scraping Amazon isn’t without its challenges. The website employs various methods, like CAPTCHAs, rate limiting, and even IP bans, to discourage automated access. Overcoming these issues requires a careful approach. Using methods such as rotating user agents, introducing delays between requests, and leveraging advanced tools like Selenium for dynamic content can help you build a more resilient scraper.
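Here's a minimal sketch of the first two techniques: rotating User-Agent headers and adding random delays between requests. The `polite_get` helper, the sample User-Agent strings, and the 2 to 5 second delay range are illustrative assumptions, not values Amazon publishes, so tune them for your own workload. The `sleep` parameter exists only so the delay can be swapped out in tests.

```python
import random
import time

# Illustrative desktop browser User-Agent strings (assumption: any realistic,
# reasonably current strings will do; keep the list up to date in practice)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.3 Safari/605.1.15",
]


def polite_get(session, url, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Hypothetical helper: wait a random interval, then GET url with a random User-Agent.

    `session` is anything with a requests-style .get(url, headers=..., timeout=...)
    method, such as requests.Session().
    """
    # Random pause so requests don't arrive at a fixed, machine-like rate
    sleep(random.uniform(min_delay, max_delay))
    headers = {
        "User-Agent": random.choice(USER_AGENTS),  # different browser identity per request
        "Accept-Language": "en-US,en;q=0.9",
    }
    return session.get(url, headers=headers, timeout=10)
```

With the requests library, you'd call it inside your loop over product URLs, for example `polite_get(requests.Session(), url)`; reusing one session keeps connections open between requests.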

Benefits of Amazon scraping

Scraping Amazon product data with Python provides businesses and researchers with a powerful way to access valuable market insights. By automating data collection, you can gather large amounts of product information efficiently, eliminating the need for manual entry and reducing human errors.

One key advantage is cost efficiency. Automation allows for scalable data extraction without requiring additional labor. Additionally, scraped data can be integrated into internal systems, enabling advanced analytics, machine learning models, and predictive insights for strategic decision-making.

Another significant benefit is real-time monitoring. By continuously tracking product details such as pricing, inventory levels, and customer feedback, businesses can adjust pricing strategies dynamically and respond swiftly to market trends.

Overall, leveraging Python for Amazon scraping streamlines data collection, enhances analytical capabilities, and provides businesses with a competitive edge in eCommerce.

Step-by-step guide to scraping Amazon product data

There are a few steps before starting to collect real-time data from any Amazon product page. So, what are we waiting for? Let's get our hands dirty!

Set up prerequisites for scraping

Before diving into the code, make sure you have the right knowledge and tools to scrape Amazon product data effectively. Here are the basics you should know first:

Familiarity with Python programming;
Understanding the core structure of HTML and the organization of web content;
Insight into HTTP request mechanisms and how browsers communicate with web servers.

And you'll need a few tools in your arsenal:

Python 3.x;
IDE or code editor – use tools like Visual Studio Code, PyCharm, or any editor you prefer;
Libraries, including requests for sending HTTP requests, BeautifulSoup (from the bs4 package) for parsing HTML, pandas for organizing and analyzing data, and, optionally, Selenium if you need to handle dynamic content or more complex scraping tasks;
Browser developer tools – get comfortable using your browser’s Inspect tool to examine the HTML structure of Amazon pages;
Optional tools – a virtual environment (using venv or virtualenv) to manage your project dependencies and headless browser drivers (like ChromeDriver) if you plan to use Selenium.

Step 1: Install Python and set up your environment

Begin by installing and setting up Python:

Download Python. Install the latest version of Python 3.x from python.org.
Add Python to PATH. On Windows, make sure Python is added to your system’s PATH during installation by ticking the "Add python.exe to PATH" checkbox (older installers label it slightly differently).
Verify the installation. Run the command below in your IDE's built-in terminal or a standalone terminal (on macOS and Linux, the command may be python3):

python --version

Upgrade pip. Make sure Python’s package manager (pip) is up to date so libraries install smoothly:

python -m pip install --upgrade pip

(Optional) Set up a virtual environment. A virtual environment helps manage dependencies without interfering with global packages:

python -m venv venv

Activate it with the following command on Windows:

venv\Scripts\activate

On macOS/Linux:

source venv/bin/activate

Step 2: Install required libraries

Python's standard library offers only low-level tools for web scraping, so you'll install a few essential libraries using pip:

Install Requests, Beautiful Soup, and pandas. You'll need these three libraries to make HTTP requests, parse HTML, and analyze the data:

python -m pip install requests beautifulsoup4 pandas

Install Selenium. If you plan to scrape dynamic content, you may also need Selenium:

python -m pip install selenium
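To confirm the installation worked, a short script like the one below prints the installed version of each library. It's a sketch that relies on importlib.metadata from the standard library (Python 3.8 and later) and uses pip's distribution names, so Beautiful Soup appears as beautifulsoup4 even though you import it as bs4. The `installed_versions` name is our own.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used with pip (Beautiful Soup installs as "beautifulsoup4",
# even though you import it as bs4)
PACKAGES = ["requests", "beautifulsoup4", "pandas", "selenium"]


def installed_versions(names):
    """Return a mapping of package name -> installed version string, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Selenium shows as "not installed" if you skipped the optional step, which is fine for the requests-based scraper.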

Step 3: Create your Python script

It's time to put the installed tools to use and write the Python script. The scraper will extract specific elements, such as the product title and price.

Create a new Python file. Open your code editor and create a new file named amazon_scraper.py.
Import libraries. The Requests library retrieves the content of a webpage, while BeautifulSoup processes the HTML structure:

import requests
from bs4 import BeautifulSoup

Set the target URL. Pick a URL for scraping (for example, an Amazon product page):

url = "https://www.amazon.com/dp/B09FT3KWJZ/"

Define headers. These help your script mimic a real browser:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept-Language": "en-US,en;q=0.9",
    # "br" (Brotli) is left out: requests can only decode it if the brotli package is installed
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
}

Send an HTTP request. This line sends a GET request to the Amazon product page with the browser-like headers defined above. The timeout keeps the script from hanging indefinitely if the server doesn't respond:

response = requests.get(url, headers=headers, timeout=10)

Implement error control. Stop the program if the request fails:

import sys

if response.status_code != 200:
    print("Failed to fetch the page. Status code:", response.status_code)
    sys.exit(1)

Parse the HTML content. Use Beautiful Soup to parse the downloaded HTML:

soup = BeautifulSoup(response.content, "html.parser")

Extract the product title and price. Target specific elements by their id and class names:

title = soup.find("span", id="productTitle")
price_whole = soup.find("span", class_="a-price-whole")

Ensure correct price format. The fractional part of the price sits in a separate element with a different class name, so the two parts need to be combined. If only the whole part is found, it's used on its own:

price_fraction = soup.find("span", class_="a-price-fraction")

price = None
if price_whole and price_fraction:
    # Assumption: the whole part already ends with the decimal point (e.g. "19."),
    # so plain concatenation produces "19.99"
    price = f"{price_whole.text.strip()}{price_fraction.text.strip()}"
elif price_whole:
    price = price_whole.text.strip()

Display results. Print the product title and price in the terminal:

print("Product Title:", title.text.strip() if title else "N/A")
print("Price:", price if price else "N/A")

Complete code example:

import sys

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/B09FT3KWJZ/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
}

response = requests.get(url, headers=headers, timeout=10)

# Stop if the page couldn't be fetched
if response.status_code != 200:
    print("Failed to fetch the page. Status code:", response.status_code)
    sys.exit(1)

soup = BeautifulSoup(response.content, "html.parser")

title = soup.find("span", id="productTitle")
price_whole = soup.find("span", class_="a-price-whole")
price_fraction = soup.find("span", class_="a-price-fraction")

# Combine the whole and fractional parts of the price when both are present
price = None
if price_whole and price_fraction:
    price = f"{price_whole.text.strip()}{price_fraction.text.strip()}"
elif price_whole:
    price = price_whole.text.strip()

print("Product Title:", title.text.strip() if title else "N/A")
print("Price:", price if price else "N/A")
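The price the scraper prints is still text, such as "19.99". If you plan to sort, compare, or chart prices, you may want a numeric value instead. The hypothetical `parse_price` helper below is one way to convert it, assuming US-style formatting (a comma as the thousands separator and a dot as the decimal point); other Amazon marketplaces format prices differently. It returns a Decimal, which avoids floating-point rounding surprises with money.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Hypothetical helper: turn a US-formatted price string into a Decimal.

    Returns None when no valid number can be found. Assumes "," is a
    thousands separator and "." is the decimal point.
    """
    if not text:
        return None
    # Keep only digits and the decimal point, e.g. "$1,299.99" -> "1299.99"
    cleaned = re.sub(r"[^\d.]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Empty or malformed strings such as "" or "1.2.3"
        return None


print(parse_price("$1,299.99"))  # 1299.99
print(parse_price("N/A"))        # None
```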


Step 4: Run your scraper

To execute the script, open a terminal, navigate to your project folder, and run it:

cd path/project_folder
python amazon_scraper.py
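If you run the scraper repeatedly, Amazon may start answering with rate-limit errors or a CAPTCHA page instead of the product page. One common response is to retry with exponential backoff, as in the sketch below. It assumes that status codes 429 and 503, or a page containing the string `/errors/validateCaptcha`, mean "blocked for now"; that marker string, the `fetch_with_retries` name, and the delay values are assumptions to check against the responses you actually receive.

```python
import time

RETRY_STATUSES = {429, 503}  # assumption: treat these as "rate limited, try later"
CAPTCHA_MARKER = "/errors/validateCaptcha"  # assumption: appears on Amazon's CAPTCHA page


def looks_blocked(response):
    """Return True if the response looks like a rate limit or a CAPTCHA page."""
    return response.status_code in RETRY_STATUSES or CAPTCHA_MARKER in response.text


def fetch_with_retries(session, url, headers, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Hypothetical helper: GET url, backing off exponentially while blocked.

    Returns the last response; the caller should still check its status code.
    """
    response = None
    for attempt in range(max_attempts):
        response = session.get(url, headers=headers, timeout=10)
        if not looks_blocked(response):
            return response
        if attempt < max_attempts - 1:
            # Wait 2 s, 4 s, 8 s, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
    return response
```

To use it, replace the single `requests.get(...)` call with `fetch_with_retries(requests.Session(), url, headers)`. Backing off only slows your scraper down politely; it doesn't solve a CAPTCHA.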

Advanced techniques in scraping Amazon product data

In this section, we’ll explore more advanced techniques: Beautiful Soup for parsing HTML and XML documents, Selenium for handling dynamic content, and pandas for organizing your data.

Advanced Beautiful Soup techniques

Beyond basic data extraction, BeautifulSoup offers powerful features to help you tackle more complex scraping challenges. Here are some advanced techniques you might find useful when scraping Amazon product data:

CSS selectors

Use the select() method to locate elements using CSS-style selectors. This approach allows for the precise targeting of nested elements. For example, you can quickly find elements by their classes or IDs without chaining multiple find() or find_all() calls.

from bs4 import BeautifulSoup

html = """
<div>
  <div class="product">
    <span id="title">Amazon Echo</span>
    <span class="price">$99.99</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
product_title = soup.select("div.product > span#title")

print(product_title[0].text if product_title else "Title not found")
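When you only need the first match, select_one() returns a single element, or None, instead of a list. The sketch below applies it to a simplified, made-up fragment that reuses the productTitle id and the a-price and a-offscreen class names used elsewhere in this guide; real Amazon pages are far larger, and their markup can change at any time.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup loosely modelled on an Amazon product page
html = """
<div id="centerCol">
  <span id="productTitle"> Amazon Echo Dot </span>
  <div class="price-block">
    <span class="a-price"><span class="a-offscreen">$49.99</span></span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching element or None
title = soup.select_one("#productTitle")
# Descendant selector: an .a-offscreen span nested anywhere inside an .a-price span
price = soup.select_one("span.a-price span.a-offscreen")

print("Title:", title.get_text(strip=True) if title else "N/A")  # Title: Amazon Echo Dot
print("Price:", price.get_text(strip=True) if price else "N/A")  # Price: $49.99
```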

Regular expressions

Sometimes, the attributes or text content you're trying to extract vary dynamically. By combining BeautifulSoup’s search functions with Python’s re module, you can match patterns in element attributes or text. This is particularly useful when dealing with elements that have dynamically generated class names.

import re
from bs4 import BeautifulSoup

html = """
<div class="product">
  <span class="title-123">Product Name</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"title-\d+")
title = soup.find("span", class_=pattern)

print(title.text if title else "Title not found")
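Regular expressions are just as handy on the text itself. Ratings and review counts, for example, usually appear as human-readable strings. The sketch below assumes formats like "4.5 out of 5 stars" and "1,234 ratings"; the exact wording on Amazon varies by page and marketplace, so treat the patterns and the helper names as a starting point.

```python
import re

# Assumed text formats: "4.5 out of 5 stars" and "1,234 ratings"
RATING_RE = re.compile(r"(\d+(?:\.\d+)?) out of 5")
COUNT_RE = re.compile(r"(\d[\d,]*)\s+ratings?")


def parse_rating(text):
    """Return the star rating as a float, or None if the pattern is absent."""
    match = RATING_RE.search(text)
    return float(match.group(1)) if match else None


def parse_rating_count(text):
    """Return the number of ratings as an int, or None if the pattern is absent."""
    match = COUNT_RE.search(text)
    # Drop thousands separators before converting
    return int(match.group(1).replace(",", "")) if match else None


print(parse_rating("4.5 out of 5 stars"))   # 4.5
print(parse_rating_count("1,234 ratings"))  # 1234
```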

Lambda functions for custom filtering

When the built-in filtering options in Beautiful Soup are not sufficient, you can use a lambda function with the find_all() method. A lambda function is an anonymous function that can be defined inline to apply custom filtering logic. This allows you to filter elements based on specific conditions, such as attributes or content, that would be difficult to address with the standard filtering methods.

from bs4 import BeautifulSoup

html = """
<div class="product" data-price="10.99">
  <span class="title">Product 1</span>
</div>
<div class="product" data-price="19.99">
  <span class="title">Product 2</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

expensive_products = soup.find_all(lambda tag: tag.name == "div" and tag.get("data-price") and float(tag.get("data-price")) > 15)

for product in expensive_products:
    title = product.find("span", class_="title")
    print(title.text if title else "No title found")
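Since find_all() accepts any function that takes a tag and returns True or False, you can move longer conditions into a regular named function. The sketch below (the `is_expensive_product` name and the 15.0 threshold are just for illustration) also skips non-numeric data-price values such as "N/A", which would make the lambda above raise a ValueError.

```python
from bs4 import BeautifulSoup

html = """
<div class="product" data-price="10.99"><span class="title">Product 1</span></div>
<div class="product" data-price="19.99"><span class="title">Product 2</span></div>
<div class="product" data-price="N/A"><span class="title">Product 3</span></div>
"""


def is_expensive_product(tag, threshold=15.0):
    """Match <div> tags whose data-price parses as a number above threshold."""
    if tag.name != "div" or not tag.has_attr("data-price"):
        return False
    try:
        return float(tag["data-price"]) > threshold
    except ValueError:
        # Non-numeric prices such as "N/A" are skipped instead of raising
        return False


soup = BeautifulSoup(html, "html.parser")
for product in soup.find_all(is_expensive_product):
    print(product.find("span", class_="title").text)  # Product 2
```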

Using SoupStrainer

If you are working with large HTML documents, you can reduce memory usage and parsing time by passing a SoupStrainer as the parse_only argument. A SoupStrainer tells Beautiful Soup to build only the parts of the document that match it, ignoring everything else. This is particularly useful when you know exactly which elements or sections of the HTML matter. Note that parse_only doesn't work with the html5lib parser.

from bs4 import BeautifulSoup, SoupStrainer

html = """
<html>
  <body>
    <div class="irrelevant">Ignore this content</div>
    <div class="product">
      <span class="title">Product Name</span>
    </div>
  </body>
</html>
"""

only_product = SoupStrainer("div", class_="product")
soup = BeautifulSoup(html, "html.parser", parse_only=only_product)

print(soup.prettify())

Navigating the parse tree

Advanced usage often involves navigating the tree structure of the HTML document. You can move up and down the tree using parent, children, and sibling relationships. This is helpful when data is deeply nested or spread across different parts of the document.

from bs4 import BeautifulSoup

html = """
<div class="product">
  <span class="title">Product Name</span>
  <span class="price">$19.99</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
product_div = soup.find("div", class_="product")

title = product_div.find("span", class_="title")

price = title.find_next_sibling("span", class_="price")

print("Title:", title.text)
print("Price:", price.text)
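The example above only moves sideways between siblings. The sketch below, on another made-up fragment, shows the other two directions: find_parent() climbs up to an enclosing element, and .children iterates over a tag's direct children (including the whitespace strings between tags, which it filters out).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up product feature list for illustration
html = """
<ul class="features">
  <li><span class="label">Color</span> <span class="value">Charcoal</span></li>
  <li><span class="label">Weight</span> <span class="value">340 g</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Go up: from a value span to the list that contains it
first_value = soup.find("span", class_="value")
features_list = first_value.find_parent("ul")

# Go down: iterate over direct children, skipping whitespace-only strings
features = {}
for item in features_list.children:
    if isinstance(item, Tag):
        label = item.find("span", class_="label").get_text(strip=True)
        value = item.find("span", class_="value").get_text(strip=True)
        features[label] = value

print(features)  # {'Color': 'Charcoal', 'Weight': '340 g'}
```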

Parser options

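Besides Python's built-in html.parser, Beautiful Soup supports third-party parsers such as lxml and html5lib, which you install separately (python -m pip install lxml html5lib). lxml is generally the fastest, while html5lib parses pages the same way a web browser does, which can improve accuracy when dealing with malformed HTML.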

from bs4 import BeautifulSoup

html = "<html><head><title>Test</title></head><body><p>Example content</p></body></html>"

soup_lxml = BeautifulSoup(html, "lxml")
print("LXML Parser Title:", soup_lxml.title.text)

soup_html5lib = BeautifulSoup(html, "html5lib")
print("html5lib Parser Title:", soup_html5lib.title.text)
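If a parser isn't installed, BeautifulSoup raises bs4.FeatureNotFound, so the example above fails on a machine without lxml or html5lib. A small fallback like the hypothetical `make_soup` helper below tries each parser in turn and ends with Python's built-in html.parser, which is always available. The parser order is our own choice.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: parse markup with the first available parser."""
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # Parser not installed; try the next one
            continue
    raise FeatureNotFound("None of the requested parsers are available")


soup = make_soup("<html><head><title>Test</title></head><body><p>Example</p></body></html>")
print(soup.title.text)  # Test
```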

By leveraging these advanced techniques, you can tailor BeautifulSoup to handle the complex and often dynamic HTML structures found on Amazon, ensuring that you extract the precise data you need for in-depth analysis.

Advanced Selenium techniques

Selenium is a powerful tool for pages that load content dynamically. It controls a real web browser, allowing you to capture data that isn’t present in the HTML returned by a simple HTTP request. Here's a step-by-step guide to setting up a simple script:

Configure Chrome options for headless browsing:

chrome_options = Options()
chrome_options.add_argument("--headless")

Initialize the WebDriver. Selenium 4.6 and later can set up a matching ChromeDriver automatically through Selenium Manager; with older versions, make sure ChromeDriver is in your PATH:

driver = webdriver.Chrome(options=chrome_options)

Navigate to the Amazon product page and wait for the title element. An explicit wait pauses until the element exists (for up to 10 seconds) instead of relying on a fixed delay:

driver.get("https://www.amazon.com/dp/B09FT3KWJZ/")
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "productTitle"))
)

Get the page source and parse it with BeautifulSoup:

page_source = driver.page_source
soup = BeautifulSoup(page_source, "html.parser")

title = soup.find(id="productTitle")

print("Product Title:", title.text.strip() if title else "N/A")
driver.quit()

Together, these parts start a headless Chrome browser, which runs without opening a visible window. The WebDriver loads the Amazon product page, and WebDriverWait pauses until the element with the productTitle id is present, giving dynamically rendered content time to appear (if it never appears, a TimeoutException is raised). The script then retrieves the rendered HTML, parses it with BeautifulSoup, extracts the product title (printing "N/A" if the element is missing), and finally quits the browser to free up system resources.

Complete code example:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Configure Chrome options for headless browsing
chrome_options = Options()
chrome_options.add_argument("--headless")

# Initialize the WebDriver (Selenium 4.6+ sets up ChromeDriver automatically;
# older versions need it in your PATH)
driver = webdriver.Chrome(options=chrome_options)

try:
    # Navigate to the Amazon product page
    driver.get("https://www.amazon.com/dp/B09FT3KWJZ/")

    # Wait up to 10 seconds for the title element to appear
    try:
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "productTitle"))
        )
    except TimeoutException:
        print("The title element did not appear; the page may be a CAPTCHA or may have changed.")

    # Get the rendered page source and parse it with BeautifulSoup
    soup = BeautifulSoup(driver.page_source, "html.parser")
    title = soup.find(id="productTitle")
    print("Product Title:", title.text.strip() if title else "N/A")
finally:
    # Always close the browser, even if something above fails
    driver.quit()

Organizing data with Pandas

After extracting data, you might want to store it for further analysis. The pandas library is excellent for creating structured data frames and exporting your results, for example, to a CSV file. Here's how:

Parse some sample HTML:

import pandas as pd
from bs4 import BeautifulSoup

html = "<div class='product'><span class='title'>Product 1</span><span class='price'>$10.99</span></div>"
soup = BeautifulSoup(html, 'html.parser')

Extract the title and price:

title = soup.find('span', class_='title')
price = soup.find('span', class_='price')

Organize the extracted data into a dictionary:

data = {
    "Title": [title.text.strip() if title else "N/A"],
    "Price": [price.text.strip() if price else "N/A"]
}

Create a DataFrame using pandas:

df = pd.DataFrame(data)

Export the DataFrame to a CSV file:

df.to_csv("amazon_product_data.csv", index=False)

Together, these snippets parse a small piece of HTML with BeautifulSoup, extract the product title and price, and organize the values with pandas. The extracted text is cleaned with strip() and stored in a dictionary whose keys become the DataFrame's column names; each value is a one-item list, so the DataFrame has a single row. Finally, the DataFrame is exported to a CSV file without the index column, making the data easy to load into other systems or analysis workflows.

Full Python script:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.amazon.com/dp/B09FT3KWJZ/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Upgrade-Insecure-Requests": "1"
}

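response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # Stop with an error on 4xx/5xx responses

soup = BeautifulSoup(response.content, "html.parser")
title = soup.find(id="productTitle")
# Pages can contain several "a-offscreen" spans; find() returns the first one
price = soup.find("span", class_="a-offscreen")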

data = {
    "Title": [title.text.strip() if title else "N/A"],
    "Price": [price.text.strip() if price else "N/A"]
}

df = pd.DataFrame(data)

df.to_csv("amazon_product_data.csv", index=False)

print(df)
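The script above stores a single product. When you scrape several pages, a common pattern is to append one dictionary per product to a list and build the DataFrame once at the end. The sketch below uses hard-coded sample rows in place of real scrape results (the titles and prices are made up) and converts the price strings into numbers so you can sort and filter them. The PriceValue column and the amazon_products.csv file name are our own choices.

```python
import pandas as pd

# Hypothetical results: in practice, append one dict per scraped product page
rows = [
    {"Title": "Product A", "Price": "$1,299.99"},
    {"Title": "Product B", "Price": "$19.99"},
    {"Title": "Product C", "Price": "N/A"},
]

df = pd.DataFrame(rows)

# Strip everything except digits and the decimal point, then convert;
# values that can't be parsed (like "N/A") become NaN
df["PriceValue"] = pd.to_numeric(
    df["Price"].str.replace(r"[^\d.]", "", regex=True), errors="coerce"
)

# Cheapest first, with unparseable prices at the end
df = df.sort_values("PriceValue", na_position="last")
df.to_csv("amazon_products.csv", index=False)
print(df)
```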

Scraping Amazon product data without coding

Extracting product data from Amazon is essential for market analysis, price comparison, and inventory management. For those without programming expertise, no-code tools and scraping APIs provide accessible solutions to gather this information efficiently.

No-code web scraping tools

No-code web scraping platforms enable users to extract data from websites like Amazon without writing any code. These tools offer user-friendly interfaces where you can define the data points to extract, such as product titles, prices, and reviews. For instance, Octoparse provides pre-built templates specifically designed for scraping Amazon product data. By inputting parameters like product categories or keywords, users can quickly gather structured data for analysis.

eCommerce Scraping APIs

eCommerce scraping APIs offer another approach that requires little or no code. These APIs handle the complexities of data extraction, delivering structured information in formats like JSON or CSV. For example, Smartproxy's Amazon Scraper API allows users to retrieve product listings, prices, and offers by sending API requests. This approach delivers up-to-date data without you having to build and maintain your own scraper.

Wrapping up

Scraping Amazon data is a powerful strategy for eCommerce businesses seeking to enhance their market position. By systematically collecting and analyzing product details, pricing, customer reviews, and competitor information, businesses can gain valuable insights that drive informed decision-making. This process enables effective price comparison, accurate demand forecasting, and the identification of emerging market trends. Implementing web scraping techniques, whether through coding or utilizing specialized tools, equips businesses with the data necessary to adapt and thrive in the competitive eCommerce landscape.

About the author

Zilvinas Tamulis

Technical Copywriter

A technical writer with over 4 years of experience, Žilvinas blends his studies in Multimedia & Computer Design with practical expertise in creating user manuals, guides, and technical documentation. His work includes developing web projects used by hundreds daily, drawing from hands-on experience with JavaScript, PHP, and Python.


All information on Smartproxy Blog is provided on an as is basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Smartproxy Blog or any third-party websites that may be linked therein.

Frequently asked questions

Does Amazon allow scraping?

Amazon’s Conditions of Use prohibit the use of data mining, robots, and similar data-gathering and extraction tools. While the data is publicly accessible, scraping Amazon without permission may violate these policies and lead to restrictions such as IP bans or legal consequences. We strongly recommend reviewing and adhering to Amazon’s terms of service before engaging in web scraping activities.
