
JavaScript Web Scraping Tutorial (2025)

Ever wished you could make the web work for you? JavaScript web scraping allows you to gather valuable information from websites in an automated way, unlocking insights that would be difficult to collect manually. In this guide, you'll learn the key tools, techniques, and best practices to scrape data efficiently, whether you're a beginner or a developer looking to streamline data collection.

Zilvinas Tamulis

Mar 28, 2025

13 min read

What is web scraping?

Web scraping is the process of extracting data from websites with the help of scripts or automated tools. Instead of going to the website yourself and scrolling through for information, the computer gets the entire page and finds the content for you. This process allows you to collect large amounts of information effortlessly, making it useful for market research, price tracking, staying on top of the latest news, and more.

Scraping can be done on both static and dynamic websites, using different techniques depending on how the data is displayed. Simple sites can be scraped using basic HTTP requests and HTML parsing, while more complex websites require handling JavaScript-rendered content or interacting with elements like dropdowns and buttons. In this tutorial, you'll learn about both methods and when to use them.

Why use JavaScript for web scraping?

JavaScript is the backbone of the modern Internet, bringing interactive user interfaces and dynamic content updates to life. Running on almost every web browser, it enhances websites with animations, real-time data, and advanced functionality, transforming them into fully fledged applications rather than just static pages.

You might not even know it, but whenever you fill out a form, watch stock prices change, or scroll through social media, JavaScript is always working behind the scenes. Unlike static HTML and CSS, JavaScript enables websites to load and modify content at any moment without requiring a full page refresh.

Over the years, JavaScript has grown into a versatile programming language that can be used for far more than just building websites. Thanks to Node.js, JavaScript can run on servers, making it possible to build backend applications, such as automation scripts and even web scrapers. This means developers can use the same language to create web pages and also extract data from them. How the tables have turned…

JavaScript offers a variety of powerful libraries and frameworks for web scraping. Cheerio, for example, is an excellent tool for quickly and efficiently parsing static HTML. Meanwhile, Puppeteer and Playwright are more advanced alternatives that let developers control headless browsers and mimic real user interactions such as clicking, filling out forms, scrolling, and even moving the mouse. These features make it possible to scrape data that would otherwise be hard to obtain from websites designed to block automated bots. With these libraries, it's possible to scrape both simple and complex sites effortlessly.
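To give a taste of what that looks like in practice, here's a minimal Playwright sketch (covered in more detail later in this guide) that simulates a few user interactions. The selectors and search text are hypothetical placeholders, so you'd swap in ones that actually exist on your target page.

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Type into a search box and submit (placeholder selector)
  await page.fill('input[name="q"]', 'web scraping');
  await page.keyboard.press('Enter');

  // Click a button and scroll down, as a real user would (placeholder selector)
  await page.click('button.load-more');
  await page.mouse.wheel(0, 1000);

  await browser.close();
})();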

Key tools for web scraping in JavaScript

When it comes to web scraping with JavaScript, several key tools can help you extract data from websites. Each tool has its strengths and is suited for different types of scraping tasks. Here's an overview of the most commonly used tools:

  1. Puppeteer. A Node.js library for controlling a headless Chrome or Chromium browser, primarily used for scraping dynamic content. It lets you interact with and extract data from websites that load content using JavaScript. Puppeteer is ideal when you need to scrape pages with complex content rendering or simulate user interactions like clicking buttons or scrolling.
  2. Playwright. A newer alternative to Puppeteer, offering multi-browser support (Chromium, Firefox, and WebKit). Like Puppeteer, Playwright enables you to control browsers for scraping dynamic content, but its multi-browser support makes it a more versatile choice, especially when you need to test across different environments or scrape websites that behave differently in other browsers.
  3. Axios. A promise-based HTTP client for Node.js that simplifies making requests to fetch HTML or API data. Unlike Puppeteer and Playwright, Axios doesn't control a browser, making it best suited for scraping static HTML or fetching data from APIs where JavaScript rendering isn't necessary. It's lightweight and quick for simple web scraping tasks (see the short sketch after this list).
  4. Cheerio. Commonly used in conjunction with Axios to parse and extract data from static HTML content. With Cheerio, you can use familiar jQuery-like syntax to traverse the HTML structure, making it easy to extract information like text, links, and images. Use Cheerio when you want to parse and manipulate simple HTML documents quickly, with no need to render JavaScript.
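As a quick illustration of the Axios use case from point 3, here's a minimal sketch of fetching JSON from an API without touching a browser. The endpoint URL is a placeholder rather than a real service.

const axios = require('axios');

// Placeholder endpoint - swap in an API you're actually allowed to query
axios.get('https://api.example.com/products', { headers: { 'Accept': 'application/json' } })
  .then(response => {
    // response.data is already parsed JSON when the API returns JSON
    console.log(response.data);
  })
  .catch(error => console.error('Request failed:', error.message));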

Getting started with web scraping in JavaScript

The best part about getting started with JavaScript is that you don't have to install or set up anything. As long as you have a browser, you can run your JavaScript files by simply including a link to the script in an HTML file.

However, to run more advanced scripts and include libraries and other external tools, you'll need to set up Node.js and npm (Node Package Manager), which allow you to run JavaScript outside the browser and install necessary tools and libraries. Here's how:

  1. Download and install Node.js with npm. Visit the Node.js official website and download the latest LTS (Long-Term Support) version, then follow the installation instructions for your operating system. npm comes bundled with all recent Node.js versions, so no additional setup is needed.
  2. Verify your installation. Open a terminal or command prompt and run the following commands. If both return a version number, the installation was successful.
node -v
npm -v

3. Create a new project. To keep everything neat and tidy, create a directory for your scraping project and navigate into it:

mkdir web-scraper
cd web-scraper

4. Initialize a new Node.js project. This creates a package.json file to manage dependencies:

npm init -y

5. Install Axios and Cheerio. Axios will be used to fetch HTML from a webpage, while Cheerio will parse and extract data from the fetched HTML.

npm install axios cheerio

6. Create a new JavaScript file. Either manually or through a terminal command, create a new file. Name it anything you want, but it must end with the .js extension.

7. Import the installed libraries. Open the newly created file and write the following lines at the beginning to include both Axios and Cheerio in your script:

const axios = require('axios');
const cheerio = require('cheerio');

8. Create a test request. Below is the simplest request you can make with these two libraries combined. Axios fetches the sample web page, while Cheerio parses the returned HTML, extracts the page title, and prints it to the terminal.

const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://example.com').then(response => {
  const $ = cheerio.load(response.data); // Load HTML into cheerio
  console.log($('title').text()); // Extract and print the page title
});

9. Run the script. To execute it, enter the following command in your terminal, replacing example-script.js with the name of your file:

node example-script.js

Handling dynamic pages

Dynamic web pages rely on JavaScript to load and modify content after the initial HTML is received, making them difficult to scrape with traditional HTTP requests. Unlike static pages, where data is readily available in the source code, dynamic sites often require user interactions, API calls, or delayed loading to display content. This poses a challenge for developers trying to extract data, as tools like Axios and Cheerio can only access the initial HTML response, missing key information rendered later by JavaScript.

To handle this, headless browsers provide a way to interact with web pages just like a real user would. These tools can render JavaScript, wait for elements to appear, and even simulate clicks and form submissions. By mimicking a real browser, headless solutions allow developers to extract data from dynamic pages with precision, overcoming the limitations of static HTML scrapers.

Among headless browser automation tools, Playwright stands out as a powerful and versatile solution. It supports multiple browsers, including Chromium, Firefox, and WebKit, and provides built-in waiting mechanisms to ensure elements are fully loaded before interacting with them. With its robust API and ease of use, Playwright is an excellent choice for handling dynamic web pages, making web scraping and automation more efficient and reliable.

Here's a step-by-step guide on installing Playwright, writing a simple script, and scraping a sample website while ensuring elements are fully loaded before extracting data:

  1. Ensure Node.js is installed. Playwright runs on Node.js and can't be used without it.
  2. Initialize a new Node.js project. Open your terminal, create a new project folder, navigate inside it, and initialize the project.
mkdir playwright-scraper && cd playwright-scraper
npm init -y

3. Install Playwright. Run the following command to add Playwright to your project along with the necessary browser binaries.

npm install playwright

4. Create a new JavaScript file. Create a new file called scraper.js in your project folder.

touch scraper.js

5. Import Playwright in your script. Open the newly created file in your preferred code editor. At the top of the file, add the following line to import Playwright and use the Chromium browser.

const { chromium } = require('playwright');

6. Launch the headless browser. Continue writing the script. The next part initializes and launches a headless browser instance. You can set the headless value to false if you want a visible browser window so you can watch what it's doing.

(async () => {
  const browser = await chromium.launch({ headless: true }); // Starts a browser without a visible UI
  const page = await browser.newPage(); // Creates a new browser tab for navigation
  // The lines from steps 7-10 go here, before the closing })();
})();

7. Navigate to a website. The next line will tell the script to visit a sample website:

await page.goto('https://example.com', { waitUntil: 'domcontentloaded' }); // Ensures the DOM is fully loaded before proceeding

8. Wait for an element to appear. Since the content will load dynamically, add a waiting mechanism:

await page.waitForSelector('h1'); // Waits until the <h1> element is available on the page

9. Extract data from the page. Select and extract the text of the <h1> element:

const heading = await page.textContent('h1'); // Grabs the text inside the <h1> tag
console.log('Page Heading:', heading); // Prints the extracted text to the terminal

10. Close the browser. After the script is done scraping, it should close the browser to save resources:

await browser.close();

Here's the full code:

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'domcontentloaded' });
  await page.waitForSelector('h1');
  const heading = await page.textContent('h1');
  console.log('Page Heading:', heading);
  await browser.close();
})();

Run it through the terminal as usual with node scraper.js. Pay attention to any messages that appear, as a fresh installation of Playwright may ask you to run an extra command such as npx playwright install to download the browser binaries before proceeding.

If you see results in your terminal, you’ve successfully installed Playwright and used it to scrape data from a dynamic webpage while ensuring elements are fully loaded. You can now extend this script to extract more complex data, interact with elements, or explore even more advanced ideas.
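As one example of such an extension, here's a hedged sketch that collects the text and URL of every link on the page instead of a single heading. It assumes the target page actually contains <a> elements worth collecting.

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'domcontentloaded' });

  // $$eval runs the callback in the page against every element matching the selector
  const links = await page.$$eval('a', anchors =>
    anchors.map(a => ({ text: a.textContent.trim(), href: a.href }))
  );
  console.log(links);

  await browser.close();
})();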

Data extraction and processing

Once you’ve gathered raw data from a website, the next step is to extract meaningful information from it. JavaScript provides several powerful methods for parsing and processing the data. Here's how you can go about extracting, cleaning, and structuring it:

Extracting specific elements

To extract specific elements from the scraped HTML, you can use libraries like Cheerio or work directly with the DOM in a headless browser. Cheerio lets you use jQuery-like syntax, which makes it easy to traverse the document and find the data you're interested in. For example, if you want to extract all the product titles from an eCommerce site, you can use Cheerio's $() selector function to target specific HTML elements:

const axios = require('axios');
const cheerio = require('cheerio');

axios.get('https://example.com/products')
  .then(response => {
    const $ = cheerio.load(response.data);
    const productTitles = [];
    $('h2.product-title').each((index, element) => {
      productTitles.push($(element).text().trim());
    });
    console.log(productTitles);
  });

In this example, Cheerio targets all <h2> elements with the class product-title and extracts their text. This gives you an array of product titles that you can process further.

Cleaning and structuring data

Data extracted from websites often needs cleaning before it can be useful. This can involve removing extra whitespace, converting strings to numbers, or formatting dates. JavaScript offers powerful tools like regular expressions and array methods (map(), filter(), reduce()) to clean and structure the data.

For instance, if product prices are scraped as strings with extra characters (like "$" or ","), you can clean the data with a regular expression and convert the prices into numbers:

const cleanedPrices = productPrices.map(price => {
  return parseFloat(price.replace(/[^0-9.-]+/g, ''));
});
console.log(cleanedPrices);

Here, the regular expression removes any characters that aren't digits, a decimal point, or a minus sign before converting the cleaned string into a floating-point number.
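Building on the same idea, here's a short sketch of how cleaned values could be structured and summarized with map(), filter(), and reduce(). The productTitles and productPrices arrays are hypothetical stand-ins for data you'd normally get from your scraper.

// Hypothetical scraped arrays - in practice these would come from Cheerio or Playwright
const productTitles = ['Book A', 'Book B', 'Book C'];
const productPrices = ['$12.99', '$7.50', 'Out of stock'];

// Pair each title with a cleaned numeric price, dropping entries that aren't valid numbers
const products = productTitles
  .map((title, i) => ({
    title,
    price: parseFloat(productPrices[i].replace(/[^0-9.-]+/g, '')),
  }))
  .filter(product => !Number.isNaN(product.price));

// Summarize with reduce(): total and average price
const total = products.reduce((sum, product) => sum + product.price, 0);
console.log(products);
console.log('Average price:', (total / products.length).toFixed(2));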

Practical example: eCommerce website scraping

Scraping eCommerce websites for valuable data such as product information, prices, and reviews is a common use case for web scraping. While traditional methods often require managing headless browsers and complex scraping logic, services like Smartproxy's eCommerce Scraping API make it easy to gather this information with just a few lines of code. Let’s take a closer look at how this can be done using their API.

In this example, you'll see how you can use the service to extract product details from a book listing page. The API handles all the complexities of bypassing anti-scraping measures, such as CAPTCHAs and IP blocks, removing any worries of hitting a wall and allowing you to scrape data without interruptions or the need for additional configuration or workarounds.

  1. Make a POST request to the API. Initialize the function scrape and send a POST request to Smartproxy's eCommerce Scraping API endpoint.
const scrape = async () => {
  const response = await fetch("https://scraper-api.smartproxy.com/v2/scrape", {
    method: "POST",
    body: JSON.stringify({...}),
  });

2. Specify the request body parameters. The request body includes the target URL and tells the API that you're scraping an eCommerce page. It also instructs the API to parse the page content. Enter this in the placeholder ({...}) above:

body: JSON.stringify({
  "target": "ecommerce",
  "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
  "parse": true
}),

3. Include an authorization token. Add an Authorization header to authenticate the request, ensuring only authorized users can access the API. Replace the Basic auth token placeholder with your own API credentials.

headers: {
  "Content-Type": "application/json",
  "Authorization": "Basic auth token"
},

4. Log the API response. After the data is scraped and parsed by the API, log the resulting JSON data to the console.

console.log(await response.json());

5. Execute the scrape function. Finally, call the scrape function to trigger the entire scraping process.

scrape();

Here's the full code:

const scrape = async () => {
  try {
    const response = await fetch("https://scraper-api.smartproxy.com/v2/scrape", {
      method: "POST",
      body: JSON.stringify({
        "target": "ecommerce",
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "parse": true
      }),
      headers: {
        "Content-Type": "application/json",
        "Authorization": "Basic auth token"
      },
    });
    console.log(await response.json());
  } catch (error) {
    console.log(error);
  }
};

scrape();

You can also modify this code using a user-friendly interface in the Smartproxy dashboard. It lets you easily adjust parameters like URL, JavaScript rendering, language, location, and device type, with the changes instantly reflected in the generated code.


Tips for efficient scraping with JavaScript

When scraping websites with JavaScript, efficiency and reliability are key to avoiding detection and ensuring smooth data extraction. Here are some best practices to improve your scraping process:

  1. Use waiting mechanisms. Always wait for elements to load before extracting data using methods like waitForSelector() or waitForLoadState(). This prevents errors caused by missing elements.
  2. Optimize browser sessions. Reuse browser instances instead of launching a new one for each request. This reduces overhead and speeds up scraping, especially when dealing with multiple pages (see the sketch after this list).
  3. Rotate user agents and headers. Mimic real users by randomly selecting user-agent strings and setting appropriate headers like Accept-Language and Referer to reduce the risk of being blocked.
  4. Use proxies to avoid blocks. Many sites detect and block scrapers based on IP addresses. Using rotating proxies or residential proxy services helps distribute requests and avoid bans.
  5. Respect robots.txt and avoid overloading servers. Check a site's robots.txt file to see if scraping is allowed, and implement rate limiting (setTimeout() or page.waitForTimeout()) to prevent excessive requests that may trigger anti-scraping measures.
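To make tips 2-4 concrete, here's a hedged Playwright sketch that reuses a single browser, rotates the user agent per context, sets an Accept-Language header, and leaves room for an optional proxy. The user-agent strings, proxy address, and URLs are placeholders.

const { chromium } = require('playwright');

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
];
const urls = ['https://example.com', 'https://example.org'];

(async () => {
  const browser = await chromium.launch({
    headless: true,
    // proxy: { server: 'http://your-proxy-host:port' }, // optional proxy (placeholder address)
  });

  for (const [i, url] of urls.entries()) {
    // A fresh context per page keeps cookies isolated while reusing the same browser
    const context = await browser.newContext({
      userAgent: userAgents[i % userAgents.length],
      extraHTTPHeaders: { 'Accept-Language': 'en-US,en;q=0.9' },
    });
    const page = await context.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    console.log(url, await page.title());
    await context.close();

    // Simple rate limiting between requests (tip 5)
    await new Promise(resolve => setTimeout(resolve, 1000));
  }

  await browser.close();
})();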

By following these tips, you can improve your scraping efficiency while staying under the radar of anti-bot protections.

Final words

JavaScript web scraping makes data extraction effortless, whether you're pulling info from static pages with Cheerio or tackling dynamic sites with Playwright. While Axios handles simple requests, Playwright’s multi-browser support and smart waiting features make it the go-to for scraping JavaScript-heavy content. To scrape like a pro, use proxies, rotate user agents, and optimize browser sessions to stay undetected. With the right tools and techniques, you can turn the web into your personal data source!

About the author

Zilvinas Tamulis

Technical Copywriter

A technical writer with over 4 years of experience, Žilvinas blends his studies in Multimedia & Computer Design with practical expertise in creating user manuals, guides, and technical documentation. His work includes developing web projects used by hundreds daily, drawing from hands-on experience with JavaScript, PHP, and Python.


Connect with Žilvinas via LinkedIn

All information on Smartproxy Blog is provided on an "as is" basis and for informational purposes only. We make no representation and disclaim all liability with respect to your use of any information contained on Smartproxy Blog or any third-party websites that may be linked therein.

Frequently asked questions

What is web scraping with JavaScript?

Web scraping with JavaScript is the process of extracting data from websites using JavaScript-based tools like Puppeteer, Cheerio, or Playwright. JavaScript is especially handy for scraping dynamic websites that load content on the fly. With Node.js, developers can run web scrapers outside the browser to automate data extraction efficiently.

