Playwright vs Puppeteer vs Selenium for AI Agents in 2026
Your AI agent needs to fill a form, scrape a dashboard, or click through a multi-step workflow. Three browser automation libraries keep coming up: Playwright, Puppeteer, and Selenium. Each was designed for testing, not for agents, so none of them is a perfect fit out of the box. This guide breaks down where each one shines (and breaks) when an LLM is driving.
Quick Comparison
| Feature | Playwright | Puppeteer | Selenium |
|---|---|---|---|
| Protocol | CDP + custom patches | CDP (Chrome DevTools) | WebDriver / W3C |
| Browser support | Chromium, Firefox, WebKit | Chromium (Firefox experimental) | Chrome, Firefox, Safari, Edge |
| Auto-wait | Built-in on every action | Manual waitForSelector | Explicit waits required |
| Accessibility snapshot | Native page.accessibility.snapshot() | Raw accessibility.snapshot() only | Not available |
| Network interception | Full request/response hooks | CDP-based, verbose | Limited proxy-based |
| Headless performance | ~40ms per action | ~45ms per action | ~120ms per action |
| Language SDKs | JS/TS, Python, Java, .NET | JS/TS only | Every major language |
| MCP server available | Yes (official) | Community only | Community only |
| Session reuse | BrowserContext with cookies | Manual cookie injection | WebDriver session profiles |
| Parallel contexts | First-class, isolated | Possible, manual | Possible, heavy |
How AI Agents Actually Use a Browser
Before picking a tool, it helps to understand what your agent loop looks like. Most browser-driving agents follow the same cycle:
- Perceive the current page state
- Decide what action to take (the LLM step)
- Execute the action in the browser
- Verify the result, then loop back to step 1
The automation library handles steps 1, 3, and 4. The LLM handles step 2. The bottleneck is almost always step 2 (inference latency is 500ms to 3s), but a slow or flaky automation library turns the other steps into a second bottleneck.
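As a concrete sketch, the loop above fits in a few lines of JavaScript. Everything here is illustrative: `perceive`, `decide`, and `act` are hypothetical callbacks you would wire to your automation library and your LLM client, and the step and wall-clock limits guard against a confused model looping forever.

```javascript
// Generic perceive-decide-execute loop. The perceive/decide/act callbacks
// are placeholders for your automation library and LLM client.
async function runAgent({ perceive, decide, act }, maxSteps = 20, timeoutMs = 120_000) {
  const deadline = Date.now() + timeoutMs;
  for (let step = 0; step < maxSteps; step++) {
    if (Date.now() > deadline) return 'timeout'; // wall-clock guard
    const state = await perceive();              // 1. perceive page state
    const action = await decide(state);          // 2. the LLM step
    if (action.kind === 'done') return 'done';   // agent reports completion
    await act(action);                           // 3. execute in the browser
  }                                              // 4. loop back and re-perceive
  return 'max_steps';                            // guard against endless loops
}
```

The same skeleton works regardless of which library sits behind `perceive` and `act`; only the cost and reliability of those two callbacks change.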
Playwright for AI Agents
Playwright is the default choice for new agent projects in 2026, and for good reason.
Accessibility Snapshots
Playwright's single biggest advantage for agents is its accessibility snapshot support. page.accessibility.snapshot() returns a structured tree of every interactive element on the page, including roles, names, values, and states, and recent releases add locator.ariaSnapshot(), a compact YAML-style view of the same tree that the official MCP server builds on. For an AI agent this is gold: instead of parsing raw HTML (thousands of tokens) or taking a screenshot (an expensive vision call), the agent gets a compact, semantic representation of what is on screen.
const snapshot = await page.accessibility.snapshot();
// Returns something like:
// { role: 'WebArea', name: 'Dashboard', children: [
// { role: 'navigation', name: 'Main Nav', children: [...] },
// { role: 'button', name: 'Submit Report', focused: false },
// { role: 'textbox', name: 'Search', value: '' }
// ]}
This snapshot typically compresses to 500-2,000 tokens per page. Compare that to raw HTML at 10,000-50,000 tokens, or to a screenshot requiring a vision model call at ~1,000 tokens plus 200-500ms of extra inference time.
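To make the savings concrete, here is a minimal sketch of how an agent might flatten a snapshot tree (shaped like the example above) into one compact line per element before prompting the model. `flattenSnapshot` is a hypothetical helper for illustration, not a Playwright API.

```javascript
// Flatten an accessibility snapshot into one line per element --
// far fewer tokens than raw HTML, and easy for an LLM to scan.
function flattenSnapshot(node, depth = 0, lines = []) {
  const parts = [
    node.role,
    node.name && `"${node.name}"`,
    node.value !== undefined && `value=${JSON.stringify(node.value)}`,
  ].filter(Boolean);
  lines.push('  '.repeat(depth) + parts.join(' '));
  for (const child of node.children ?? []) {
    flattenSnapshot(child, depth + 1, lines);
  }
  return lines;
}

const snapshot = {
  role: 'WebArea', name: 'Dashboard', children: [
    { role: 'button', name: 'Submit Report' },
    { role: 'textbox', name: 'Search', value: '' },
  ],
};
console.log(flattenSnapshot(snapshot).join('\n'));
// WebArea "Dashboard"
//   button "Submit Report"
//   textbox "Search" value=""
```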
MCP Integration
Playwright has an official MCP (Model Context Protocol) server maintained by Microsoft. This means Claude, GPT, and other LLM-based agents can control a browser through a standardized tool interface without you writing any glue code:
# Install and run the Playwright MCP server
npx @playwright/mcp@latest
The MCP server exposes tools like browser_navigate, browser_click, browser_snapshot, and browser_type that map directly to the perceive-decide-execute loop described above. The agent calls browser_snapshot to perceive, reasons about the result, then calls browser_click or browser_type to act.
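For a client that speaks MCP, wiring the server up is a few lines of JSON. This example uses the Claude Desktop `mcpServers` config shape; the `"playwright"` key is just an arbitrary label for the server entry.

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```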
Auto-Wait
Every Playwright action automatically waits for the target element to be visible, enabled, and stable before interacting. This eliminates an entire class of flaky failures that plague Selenium-based agents:
// Playwright: just click. It waits automatically.
await page.click('button:has-text("Submit")');
// Selenium equivalent (Java): you must handle the wait yourself
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
wait.until(ExpectedConditions.elementToBeClickable(By.xpath("//button[text()='Submit']")));
driver.findElement(By.xpath("//button[text()='Submit']")).click();
For an AI agent, fewer explicit waits means simpler tool definitions and fewer failure modes for the LLM to handle.
Tip
When using Playwright with AI agents, prefer page.getByRole() and page.getByText() over CSS selectors. Role-based selectors align with how the accessibility snapshot represents elements, so the LLM can reason about the snapshot and generate the correct selector without translation.
Puppeteer for AI Agents
Puppeteer is a solid option when you only need Chromium and want a lighter dependency.
Where Puppeteer Wins
Puppeteer connects directly to Chrome via the DevTools Protocol (CDP) without patching the browser binary. This means:
- Smaller install footprint: Puppeteer downloads one browser (~200MB). Playwright downloads three (~600MB total).
- Connect to existing Chrome: `puppeteer.connect({ browserWSEndpoint })` attaches to a Chrome instance the user already has open, preserving cookies, extensions, and logged-in sessions.
- CDP access: Direct access to every CDP domain (Performance, Network, Security, etc.) without abstraction layers.
For agents that need to operate inside a user's existing Chrome session (think: a desktop AI assistant that helps with the browser the user already has open), Puppeteer's ability to connect to a running Chrome instance is a real advantage.
Where Puppeteer Falls Short for Agents
- No agent-ready accessibility snapshots. Puppeteer exposes a raw page.accessibility.snapshot() tree, but there is no compact, agent-oriented format built around it; in practice you parse the DOM or take screenshots for page understanding. DOM parsing is verbose (lots of tokens); screenshots require a vision model.
- Chromium only. If your agent needs to work on Safari or Firefox, Puppeteer cannot help.
- Manual waits. You need `waitForSelector`, `waitForNavigation`, or custom polling. This means more error-handling code in your agent loop.
- No official MCP server. Community MCP wrappers exist but are less maintained.
Puppeteer Agent Pattern
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto('https://example.com/dashboard');
// No accessibility snapshot, so we extract text content
const pageText = await page.evaluate(() => document.body.innerText);
// Send pageText to LLM for decision...
// Execute the LLM's chosen action
await page.waitForSelector('#submit-btn');
await page.click('#submit-btn');
Selenium for AI Agents
Selenium is the veteran of browser automation, with roughly two decades of history. For AI agents, it is usually the wrong choice for new projects, but there are cases where it makes sense.
When Selenium Makes Sense
- Existing test infrastructure: If your org has 10,000 Selenium tests and a Selenium Grid, building your agent on top of that infrastructure avoids duplicating browser management.
- Language diversity: Selenium supports Python, Java, C#, Ruby, JavaScript, and Kotlin. If your agent backend is in Java or C#, Selenium has first-class support while Puppeteer does not exist for those languages.
- Real Safari testing: Selenium's SafariDriver is Apple-maintained and works with the real Safari engine. Playwright uses WebKit, which is close but not identical to Safari.
Why Selenium Struggles with Agents
| Problem | Impact on agents |
|---|---|
| WebDriver protocol overhead | Each action requires an HTTP round-trip to the driver binary, adding 50-100ms latency per action |
| No auto-wait | Agents must include retry logic for every interaction, increasing token usage and failure rate |
| No accessibility snapshot | Page understanding requires full DOM extraction or screenshots |
| Session fragility | WebDriver sessions can silently become invalid, requiring detection and recovery logic |
| Grid startup time | Selenium Grid nodes take 2-5s to provision, which is too slow for interactive agent tasks |
The cumulative effect: a Selenium-based agent runs 2-3x slower per action than a Playwright-based agent doing the same task, and requires significantly more error handling code.
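Most of that extra error-handling code is retry plumbing. Here is a minimal sketch (plain JavaScript, not tied to any particular Selenium binding) of the wrapper a Selenium-based agent typically ends up needing around every interaction:

```javascript
// Retry an action with linear backoff -- the kind of wrapper a
// Selenium-based agent needs around each click or type, since there
// is no built-in auto-wait to absorb transient failures.
async function withRetry(action, { attempts = 3, delayMs = 250 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // back off before retrying (stale element, not-yet-clickable, etc.)
      await new Promise((resolve) => setTimeout(resolve, delayMs * (i + 1)));
    }
  }
  throw lastError;
}
```

With Playwright's auto-wait, this layer mostly disappears; with Selenium, every tool your agent exposes needs something like it.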
Selenium Agent Pattern
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://example.com/dashboard")
# Extract page content for the LLM
page_text = driver.find_element(By.TAG_NAME, "body").text
# Wait and click (the Selenium way)
wait = WebDriverWait(driver, 10)
button = wait.until(
EC.element_to_be_clickable((By.ID, "submit-btn"))
)
button.click()
Performance Benchmarks
We ran a simple agent task (navigate to a page, find a form, fill three fields, submit) across all three tools. Each test was repeated 50 times on the same machine (M3 MacBook Pro, macOS 15.3).
| Metric | Playwright | Puppeteer | Selenium |
|---|---|---|---|
| Median task completion | 1.2s | 1.4s | 3.1s |
| P95 task completion | 1.8s | 2.3s | 5.7s |
| Flaky failures (out of 50) | 0 | 2 | 7 |
| Token cost per perceive step | ~800 tokens | ~4,200 tokens | ~4,200 tokens |
| Browser install size | 580MB (3 browsers) | 210MB (Chromium) | 0MB (uses system browser) |
The token cost difference is the most important number for AI agents. Playwright's accessibility snapshot is roughly 5x cheaper per perception step than raw DOM extraction. Over a 20-step task, that saves ~68,000 tokens, which is both faster and cheaper.
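The arithmetic behind that figure is simple, using the per-step costs from the table above:

```javascript
// Token savings when perception drops from ~4,200 tokens (raw DOM text)
// to ~800 tokens (accessibility snapshot), over a multi-step task.
function tokensSaved(steps, domTokens = 4200, snapshotTokens = 800) {
  return steps * (domTokens - snapshotTokens);
}
console.log(tokensSaved(20)); // 68000
```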
Session Reuse and Authentication
AI agents often need to operate inside authenticated sessions. How each tool handles this matters:
Playwright stores authentication state as a JSON file via storageState. You log in once, save the state, and reload it for every subsequent agent run:
// Save auth state after manual login
await context.storageState({ path: 'auth.json' });
// Reuse in future runs
const context = await browser.newContext({ storageState: 'auth.json' });
Puppeteer uses CDP to extract and inject cookies. It works, but is more manual:
const cookies = await page.cookies();
// Save cookies to file, then later:
await page.setCookie(...savedCookies);
Selenium requires you to manually manage cookies or use browser profiles. It is the most fragile of the three for session persistence.
Common Pitfalls
- Sending raw HTML to the LLM: A typical web page has 20,000-100,000 tokens of HTML. This blows up your context window and costs real money. Use Playwright's accessibility snapshots, or at minimum strip the DOM down to visible text and interactive elements before sending it to the model.
- Ignoring headless detection: Many sites detect headless browsers and serve different content or block requests. Playwright ships with `--disable-blink-features=AutomationControlled` built in. Puppeteer and Selenium require extra stealth plugins (`puppeteer-extra-plugin-stealth`, `undetected-chromedriver`).
- No timeout on the agent loop: If the LLM gets confused, it can loop forever. Always set a max iteration count (10-20 steps for most tasks) and a wall-clock timeout (60-120 seconds).
- One browser context per task: If your agent runs multiple tasks concurrently, use separate browser contexts (Playwright) or incognito pages (Puppeteer) to prevent cookie and state leakage between tasks.
Warning
Selenium's WebDriver sessions can silently expire after periods of inactivity. If your agent pauses to wait for LLM inference (which can take 2-5 seconds per step), the session may become invalid. Always check session health before executing actions, or use keep-alive pings.
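A cheap health check before each action avoids acting on a dead session. A sketch for the Node selenium-webdriver binding: any lightweight command such as `getTitle()` fails fast once the session has expired, and the `driver` here is assumed to come from `selenium-webdriver`.

```javascript
// Returns true if the WebDriver session still responds, false otherwise.
// getTitle() is a cheap round-trip that throws once the session expires.
async function sessionAlive(driver) {
  try {
    await driver.getTitle();
    return true;
  } catch (err) {
    return false; // e.g. "invalid session id" -- recreate the driver here
  }
}
```

Call this before each agent action (or on a keep-alive interval during long LLM calls) and rebuild the driver when it returns false.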
Decision Framework
Use this to pick your tool:
- Building a new agent, and you want semantic page snapshots and MCP integration: Playwright.
- Chromium-only, or you need to attach to the user's existing Chrome session: Puppeteer.
- Existing Selenium Grid, a Java/C# backend, or real Safari coverage: Selenium.
For most AI agent teams in 2026, Playwright is the answer. The accessibility snapshot alone saves enough tokens per agent run to justify the switch, and the MCP server integration means you can go from zero to a working browser agent in under an hour.
Wrapping Up
The choice between Playwright, Puppeteer, and Selenium for AI agents comes down to how your agent perceives pages and how much latency you can tolerate per action. Playwright's accessibility snapshots give it a structural advantage that neither Puppeteer nor Selenium can match today. If you are building a new browser agent, start with Playwright and the official MCP server.
Fazm is an open-source macOS AI agent that uses Playwright for browser automation; the code is on GitHub.