Using Playwright Accessibility Tree Snapshots to Let AI Agents Browse the Web

Matthew Diakonov

Updated March 19, 2026

playwright accessibility-tree browser-automation web-agents no-code ai_agents

AI agents have traditionally interacted with websites in one of two ways: custom selectors written for every page, or screenshots sent to a vision model. Both have serious drawbacks. A third option, the accessibility tree, is simpler and more reliable.

The Selector Problem

Custom CSS or XPath selectors are brittle. A website redesign, an A/B test, or even a minor class name change can break every automation that depends on them. Keeping selectors current for a dynamic web app becomes ongoing maintenance work.

The Screenshot Problem

Sending screenshots to a vision model works but is slow and expensive. Every interaction requires a screenshot capture, an API call to a multimodal model, and parsing the model's response to find coordinates. Latency adds up fast, and accuracy on small or densely packed UI elements is unreliable.

The Accessibility Tree Approach

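Playwright can snapshot a page's accessibility tree: a semantic tree of the elements the page exposes to assistive technology - buttons, links, text fields, headings, and more - each with its role and label. Tools built on Playwright, such as the Playwright MCP server, also attach a reference ID to each element. No selectors. No screenshots. No vision model.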

The snapshot looks something like this:

[button ref=e1] "Sign In"
[textfield ref=e2] "Email address" value=""
[link ref=e3] "Forgot password?"
[heading ref=e4] "Welcome back"
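
To make the format concrete, here is a minimal sketch that parses lines shaped like the illustrative snapshot above into structured records. The `Element` class, the `parse_snapshot` helper, and the regular expression are assumptions made for this example. They match the simplified format shown in this post, not the exact output of any particular Playwright version.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches lines such as: [textfield ref=e2] "Email address" value=""
# This pattern is an assumption based on the illustrative format in this post.
LINE_RE = re.compile(
    r'^\[(?P<role>[\w-]+) ref=(?P<ref>\w+)\]\s+"(?P<name>[^"]*)"'
    r'(?:\s+value="(?P<value>[^"]*)")?\s*$'
)


@dataclass(frozen=True)
class Element:
    """One entry of the snapshot (hypothetical helper type)."""
    role: str
    ref: str
    name: str
    value: Optional[str] = None  # None when the line carries no value


def parse_snapshot(text: str) -> dict[str, Element]:
    """Return a ref -> Element map; lines that do not match are skipped."""
    elements: dict[str, Element] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            element = Element(**match.groupdict())
            elements[element.ref] = element
    return elements


SNAPSHOT = """\
[button ref=e1] "Sign In"
[textfield ref=e2] "Email address" value=""
[link ref=e3] "Forgot password?"
[heading ref=e4] "Welcome back"
"""

elements = parse_snapshot(SNAPSHOT)
for element in elements.values():
    print(element.ref, element.role, element.name)
```

Keying the result by ref makes it cheap to check later whether a ref chosen by the model actually exists on the page.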

You feed this tree to Claude or GPT, tell it what you want to do ("log into the account using these credentials"), and the model picks the right element references to interact with. Then you use Playwright to click or type using those refs.
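
The loop described above can be sketched in a few lines: build a prompt from the snapshot and the goal, then validate the model's reply before anything touches the browser. The JSON reply shape, the `build_prompt`, `known_refs`, and `parse_action` helpers, and the hardcoded reply are all assumptions for illustration. A real agent would get the reply from an LLM API call and use whatever action schema its tooling defines.

```python
import json
import re

ALLOWED_ACTIONS = {"click", "type"}  # hypothetical action vocabulary


def build_prompt(snapshot: str, goal: str) -> str:
    """Combine the snapshot and the goal into one instruction for the model."""
    return (
        "Page elements:\n"
        f"{snapshot.strip()}\n\n"
        f"Goal: {goal}\n"
        "Reply with a single JSON object with keys "
        '"action" ("click" or "type"), "ref", and "text" (only for "type").'
    )


def known_refs(snapshot: str) -> set[str]:
    """Collect every ref that appears in the snapshot text."""
    return set(re.findall(r"\bref=(\w+)\]", snapshot))


def parse_action(reply: str, refs: set[str]) -> dict:
    """Parse the model's reply and reject anything the page cannot satisfy."""
    action = json.loads(reply)
    if not isinstance(action, dict):
        raise ValueError("reply must be a JSON object")
    if action.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action.get('action')!r}")
    ref = action.get("ref")
    if not isinstance(ref, str) or ref not in refs:
        raise ValueError(f"unknown ref: {ref!r}")
    if action["action"] == "type" and not isinstance(action.get("text"), str):
        raise ValueError("a type action needs a text string")
    return action


SNAPSHOT = """\
[button ref=e1] "Sign In"
[textfield ref=e2] "Email address" value=""
[link ref=e3] "Forgot password?"
[heading ref=e4] "Welcome back"
"""

prompt = build_prompt(SNAPSHOT, "log into the account using these credentials")

# Stand-in for the model's reply; no API call is made in this sketch.
reply = '{"action": "type", "ref": "e2", "text": "user@example.com"}'
action = parse_action(reply, known_refs(SNAPSHOT))
print(action)
```

Checking the ref against the current snapshot catches the case where the model invents a ref or reuses one from an earlier page state. How a validated ref is mapped back to a live element depends on the tool that produced the snapshot.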

Why This Works

The accessibility tree is semantic by design - it was built for screen readers, which need to understand what every element does. This is exactly the kind of structured information that LLMs are good at parsing. The model does not need to guess where a button is from pixels. It reads the label and role directly.

This approach is also resilient to visual changes. As long as the button still says "Sign In" in the accessibility tree, the automation works regardless of styling, layout, or theme changes.
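
A small sketch illustrates the resilience claim and its limit. Two hypothetical snapshots of the same page, before and after a redesign, have different ordering and ref numbers, yet a lookup by role and label still finds the button. The `find_ref` helper and both snapshots are invented for this example and use the simplified format shown earlier in this post.

```python
import re
from typing import Optional

# Pattern for the illustrative snapshot format used in this post (an assumption).
ENTRY_RE = re.compile(r'\[(?P<role>[\w-]+) ref=(?P<ref>\w+)\]\s+"(?P<name>[^"]*)"')


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return the ref of the first element with this role and label, if any."""
    for match in ENTRY_RE.finditer(snapshot):
        if match["role"] == role and match["name"] == name:
            return match["ref"]
    return None


# Hypothetical snapshot before a redesign.
BEFORE = """\
[button ref=e1] "Sign In"
[textfield ref=e2] "Email address" value=""
[link ref=e3] "Forgot password?"
[heading ref=e4] "Welcome back"
"""

# Hypothetical snapshot after a redesign: new layout order, new ref numbers.
AFTER = """\
[heading ref=e5] "Welcome back"
[textfield ref=e6] "Email address" value=""
[button ref=e7] "Sign In"
[link ref=e8] "Forgot password?"
"""

print(find_ref(BEFORE, "button", "Sign In"))  # ref in the old layout
print(find_ref(AFTER, "button", "Sign In"))   # ref in the new layout
print(find_ref(AFTER, "button", "Log In"))    # label not present in the tree
```

The refs differ between the two snapshots, so they should be treated as valid only for the snapshot they came from, while the role and label stay stable. The last lookup shows the limit: if the label itself changes, a match on the old label finds nothing. Playwright's own `page.get_by_role("button", name="Sign In")` locator applies the same role-and-name idea directly against a live page.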

This post was inspired by a discussion on r/AI_Agents by u/Tricky-Promotion6784.

Fazm is an open source macOS AI agent, available on GitHub.
