Playwright Web Scraping: A Practical Tutorial
Ever feel like extracting data from the web is like trying to direct a play without a script? Enter Playwright – your all-in-one stage manager for seamless web scraping. It handles the browser, the elements, and even the unpredictable plot twists of modern web pages. Follow this tutorial to learn how to use this powerful tool to extract data from any web page.
What is Playwright?
Playwright is a modern web scraping and browser automation framework that simplifies data extraction from web pages. It supports multiple headless browsers, including Chromium, Firefox, and WebKit, making it a convenient tool that covers many popular developer requirements. It also offers a great and simple API that allows developers to interact with dynamic user interfaces, locate elements using CSS selectors, and easily extract structured data.
While Playwright is a relatively new actor on the scene, it stands out from many older tools thanks to its extensive feature list. It excels at handling modern, JavaScript-heavy websites and supports multiple programming languages, including JavaScript, Python, Java, and C#, so developers can write scripts in whichever language they prefer. Playwright can also create isolated browser contexts that enable scraping across multiple pages simultaneously without sharing state, making it both efficient and secure. You can tell it was created by people familiar with the struggles of web scraping, who packed all the best features into one framework.
If you feel like the websites you're trying to get data from are as complex as the intricate schemes of William Shakespeare's Much Ado About Nothing – worry not, as Playwright is built to tackle any web scraping or web automation challenges easily.
Methods for web scraping using Playwright
Playwright provides several powerful methods for web scraping, available across its supported languages, including JavaScript (via Node.js), Python, and Java. Here's a list of a few of them:
- Page navigation. With Playwright, you can navigate to a web page using functions such as page.goto(). This allows you to navigate the website's pages, which is especially useful when content isn't limited to a single page. It's a commonly used method for scraping eCommerce websites that list products across several pages.
- Element selection. Playwright allows you to select elements on the page using CSS selectors or XPath. Whichever you prefer, the framework lets you easily target HTML elements with methods such as page.locator() or page.$(). Once elements are selected, you can extract various types of data, including text, links, images, and attributes.
- Handling dynamic content. Playwright can interact with JavaScript-heavy websites by waiting for elements to load with page.waitForSelector() or page.waitForTimeout(), ensuring the content is fully loaded before scraping.
- Interacting with elements. Playwright allows you to simulate actions like clicking buttons, filling out forms, and scrolling through pages to load more content. Methods such as page.click() are helpful for scraping content behind interactive elements.
- Handling browser contexts. Playwright's support for multiple browser contexts allows you to scrape data from various pages or simulate user sessions without conflicts. Paired with reliable proxies, this feature is a great way to stay anonymous and undetected while browsing. This is useful for multi-tab scraping, multiple account management, or automating several actions simultaneously.
- Network interception. You can intercept network requests and responses using page.route() to gather dynamically loaded data via API calls, providing an advanced method of scraping data directly from the network traffic.
- Browser automation. Playwright enables automating complex workflows, such as logging into websites, submitting forms, and navigating through various pages, making it suitable for scraping data from applications with login mechanisms or multi-step interactions.
Web scraping with Playwright: a step-by-step guide
Now that you know the whole repertoire of Playwright, let's get started with setting it up for web scraping. For this tutorial, we're going to use Node.js, but you can also install the framework using Python. Follow these steps to set up and get started right away:
1. Install Playwright. You can get Playwright using npm, yarn, or pnpm by entering the command below into your terminal. You'll have a few prompts to answer, such as picking between TypeScript and JavaScript, the name of your tests folder, and which browsers to install:
npm:
npm init playwright@latest

yarn:
yarn create playwright

pnpm:
pnpm create playwright
2. Include Playwright in your script. Create a new JavaScript (.js) file and include the line below at the beginning. You can swap chromium for webkit or firefox if you have them installed.
const { chromium } = require('playwright');
3. Navigate to a web page. For this example, we're going to extract data from a website called ScrapeMe, which is ideal for various web scraping tests. With the following code, you'll be able to launch a new browser window and navigate to the web page:
const { chromium } = require('playwright');

(async () => {
  // Launch a new browser instance
  const browser = await chromium.launch({ headless: false });

  // Open a new page
  const page = await browser.newPage();

  // Navigate to the ScrapeMe website
  await page.goto('https://scrapeme.live/shop/');

  // After the above actions are performed, close the browser
  await browser.close();
})();
4. Select and extract a specific element. The website has a list of items similar to those of a regular online shop. While Playwright offers a wide range of features to interact with web pages, for this example, we'll simply select the 3rd product from the list by its class name. Let's expand the previous code:
const { chromium } = require('playwright');

(async () => {
  // Launch a new browser instance
  const browser = await chromium.launch({ headless: false });

  // Open a new page
  const page = await browser.newPage();

  // Navigate to the ScrapeMe website
  await page.goto('https://scrapeme.live/shop/');

  // Select all elements matching the class
  const productElements = await page.$$('.woocommerce-loop-product__title');

  // Access the 3rd element (index 2) and get its text content
  const thirdProductTitle = await productElements[2].textContent();
  console.log(`3rd Product Title: ${thirdProductTitle}`);

  // After the above actions are performed, close the browser
  await browser.close();
})();
The script opens a browser window, navigates to the target website, selects all elements with a defined class, and then prints the text content of the 3rd element from a list of items with that class. If you're unsure how to inspect a website's HTML and find the class name of the titles, check out our comprehensive guide on inspecting elements.
In these examples, we used the headless: false option, which makes the browser visible when performing script actions. You can set it to true to save computer resources and only get the result in your terminal.
Proxy implementation
While Playwright lets automated scripts do the work for you, remember that the requests still come from your IP address. To stay anonymous and keep your web scraping low-risk, it's highly recommended that you use high-quality proxies. Smartproxy offers a great range of affordable and effective proxy solutions with locations in 195+ countries, <0.3s average speed, and 99.99% uptime, helping ensure that your web scraping activities with Playwright go undetected.
To use proxies with Playwright, you can pass proxy settings through the browser's launch or launchPersistentContext options. Playwright supports proxy integration via the proxy object, which accepts the proxy server URL along with optional username and password fields for authenticated proxies.
Here's how you can modify your script to include the proxy with authentication:
const { chromium } = require('playwright');

(async () => {
  // Proxy server
  const proxy = 'gate.smartproxy.com:10001';

  // Launch a new browser instance with proxy settings.
  // Proxy credentials go in the proxy object itself.
  const browser = await chromium.launch({
    headless: false,
    proxy: {
      server: `http://${proxy}`,
      username: 'user',
      password: 'pass',
    },
  });

  // Open a new browser context and a single page
  const context = await browser.newContext();
  const page = await context.newPage();

  // Check your IP by navigating to the IP check URL
  await page.goto('https://ip.smartproxy.com/ip');
  const content = await page.evaluate(() => document.body.innerText);
  console.log(`Your IP: ${content}`);

  // Navigate to the ScrapeMe website
  await page.goto('https://scrapeme.live/shop/');

  // Select all elements matching the class
  const productElements = await page.$$('.woocommerce-loop-product__title');

  // Access the 3rd element (index 2) and get its text content
  const thirdProductTitle = await productElements[2].textContent();
  console.log(`3rd Product Title: ${thirdProductTitle}`);

  // Close the browser
  await browser.close();
})();
This script does several things. First, it connects to a proxy server so that subsequent requests go through a different IP address. Then, it visits the Smartproxy IP-checker website and prints your IP address, letting you verify that the connection comes from an address other than your own. Finally, it makes the same request as before to the ScrapeMe website and prints the title of the 3rd product on the page.
Playwright vs. other frameworks
Playwright isn't the only name in the credits of popular web scraping tools. Two other famous names pop up when searching for the most efficient frameworks: Puppeteer and Selenium. How do these tools differ from Playwright, and when should you choose one over another? Below is a brief comparison table:
| | Playwright | Puppeteer | Selenium |
|---|---|---|---|
| Speed | Fast (supports modern browsers) | Fast (only for Chromium-based browsers) | Slower (supports older browsers) |
| Features | Advanced automation, cross-browser support | Focus on Chromium; fewer features for other browsers | Extensive but less modernized features |
| Efficiency | High (headless browser by default, runs several instances at once) | High (limited to Chromium, suitable for modern setups) | Medium (larger footprint due to legacy support) |
| Ease of use | Easy (developer-friendly API, easy setup) | Easy (simple APIs) | Moderate (steeper learning curve) |
| Community | Small (backed by Microsoft) | Medium (supported by Google) | Large (long-standing veterans of the industry) |
| Documentation | Excellent (detailed and regularly updated) | Good (focused on Chromium use cases) | Extensive (covers legacy and modern use cases) |
| Browser support | Chromium, Firefox, WebKit | Chromium-based browsers only | Chromium, Firefox, Safari, Internet Explorer, Edge |
| Programming language support | Multiple (JavaScript, Python, Java, C#, etc.) | Limited (primarily JavaScript) | Extensive (JavaScript, Python, Java, Ruby, etc.) |
Playwright vs. Puppeteer for scraping
Playwright and Puppeteer offer fast and efficient scraping capabilities but cater to different audiences. Playwright supports multiple browsers, making it ideal for cross-browser scraping tasks, whereas Puppeteer focuses exclusively on Chromium-based browsers. Playwright also provides advanced features like headless mode by default and concurrent sessions, giving it an edge in efficiency for complex workflows. However, Puppeteer’s simplicity and close integration with Chromium make it an excellent choice for more straightforward scraping projects.
Playwright vs. Selenium for scraping
Playwright and Selenium are another pair of excellent frameworks, each performing on a different stage. Playwright offers modern APIs, headless browser mode by default, and superior efficiency, making it ideal for complex workflows. In contrast, Selenium offers extensive support for legacy browsers like Internet Explorer and a wider range of programming languages, making it a better choice for projects that need legacy compatibility. While Selenium boasts a larger community and a more mature ecosystem, Playwright is faster and more efficient for modern browser automation tasks.
Curtain call
The Playwright comes on stage to take the final bow – did you enjoy their performance? With its extensive browser support, modern features, and easy setup and usability, Playwright has undoubtedly earned a standing ovation for its role as one of the best frameworks for web scraping. Whether tackling complex web automation projects or running a simple script to extract data, this tool ensures the show goes on without a hitch.
About the author

Zilvinas Tamulis
Technical Copywriter
Zilvinas is an experienced technical copywriter specializing in web development and network technologies. With extensive proxy and web scraping knowledge, he’s eager to share valuable insights and practical tips for confidently navigating the digital world.
All information on the Smartproxy Blog is provided on an "as is" basis and for informational purposes only. We make no representations and disclaim all liability with respect to your use of any information contained on the Smartproxy Blog or any third-party websites that may be linked therein.