Extracting Structured Data from Webpages for AI Agents - Accessibility Trees vs HTML

Matthew Diakonov

HTML Is Unstable, Accessibility Trees Are Not

When an AI agent needs to extract structured data from a webpage, the obvious approach is parsing the HTML. Find the right CSS selectors, extract text content, follow the DOM structure.

This works until the website updates its layout, changes class names, or restructures its components. Then the selectors that depended on those details break.

The accessibility tree offers a better signal.

What the Accessibility Tree Provides

Browsers build an accessibility tree for every page they render - a structured representation intended for assistive technology such as screen readers. This tree contains:

  • Roles. Button, link, heading, textbox, table, row, cell. These describe what elements do, not how they look.
  • Names. The human-readable label for each element. A button labeled "Submit Order" will always have that name regardless of its CSS class.
  • States. Whether a checkbox is checked, a menu is expanded, or a button is disabled.
  • Relationships. Which elements are children of which, forming a logical hierarchy.

This representation is semantic, not visual. It describes meaning, not presentation.
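
As a rough illustration of those four ingredients, the sketch below models a node as a small Python dataclass. The AXNode class, its field names, and the sample checkout form are hypothetical and chosen for this example; real browsers and automation tools expose their own node formats.

```python
from dataclasses import dataclass, field


@dataclass
class AXNode:
    """One node of an accessibility tree (hypothetical shape for illustration)."""
    role: str                                      # what the element does: "button", "link", ...
    name: str = ""                                 # human-readable label
    states: dict = field(default_factory=dict)     # e.g. {"checked": True}
    children: list = field(default_factory=list)   # child AXNode objects (relationships)


# A small checkout form expressed as roles, names, states, and relationships.
form = AXNode("form", "Checkout", children=[
    AXNode("textbox", "Email"),
    AXNode("checkbox", "Save my details", states={"checked": True}),
    AXNode("button", "Submit Order", states={"disabled": False}),
])


def walk(node, depth=0):
    """Yield (depth, node) pairs in document order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


roles = [node.role for _, node in walk(form)]
```

Nothing in this structure mentions colors, class names, or layout, which is the point: the tree records what each element is and does.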

Why Accessibility Trees Are More Stable

CSS class names often change with a redesign or a build-tool update. Accessibility labels change far less often because they are tied to functionality, not styling. A "Submit Order" button can keep the name "Submit Order" through five redesigns.

This stability makes accessibility-first scraping considerably easier to maintain than selector-based scraping. Your agent targets a role and an accessible name, such as the button named "Submit Order", instead of a class selector like .btn-primary.checkout-submit.v2.
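
A minimal sketch of that kind of lookup, assuming the accessibility tree has already been captured as nested dictionaries with "role", "name", and "children" keys. That shape, the function name find_by_role_and_name, and the two sample snapshots are assumptions made for this example. The "css_class" key is included only to show what changed between versions; a real accessibility tree would not carry class names.

```python
def find_by_role_and_name(node, role, name):
    """Depth-first search for the first node with the given role and name."""
    if node.get("role") == role and node.get("name") == name:
        return node
    for child in node.get("children", []):
        match = find_by_role_and_name(child, role, name)
        if match is not None:
            return match
    return None


# Two hypothetical snapshots of the same page, before and after a redesign.
before = {"role": "main", "name": "", "children": [
    {"role": "button", "name": "Submit Order",
     "css_class": "btn-primary checkout-submit v2"},
]}
after = {"role": "main", "name": "", "children": [
    {"role": "region", "name": "Checkout", "children": [
        {"role": "button", "name": "Submit Order", "css_class": "cta cta--lg"},
    ]},
]}

# The same role-and-name query finds the button in both versions,
# even though the class names and the nesting changed.
found_before = find_by_role_and_name(before, "button", "Submit Order")
found_after = find_by_role_and_name(after, "button", "Submit Order")
```

Browser automation libraries offer comparable lookups. Playwright's get_by_role locator, for example, takes a role and an accessible name.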

Practical Extraction Patterns

For AI agents, the workflow looks like this:

  1. Navigate to the page using browser automation
  2. Capture the accessibility tree instead of the DOM
  3. Send the tree to the language model as structured text
  4. Let the model extract the relevant information
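
Step 3 can be sketched as follows, assuming steps 1 and 2 have already produced the tree as nested dictionaries. The dictionary shape, the serialize function, and the indented output format are choices made for this example, not a standard.

```python
def serialize(node, depth=0):
    """Render a node and its descendants as indented 'role "name" [states]' lines."""
    parts = [node["role"]]
    if node.get("name"):
        parts.append('"' + node["name"] + '"')
    # Only include states that are set, to keep the output short.
    states = [key for key, value in node.get("states", {}).items() if value]
    if states:
        parts.append("[" + ", ".join(sorted(states)) + "]")
    lines = ["  " * depth + " ".join(parts)]
    for child in node.get("children", []):
        lines.extend(serialize(child, depth + 1))
    return lines


# Hypothetical captured tree for a small order page.
tree = {"role": "main", "name": "Order summary", "children": [
    {"role": "heading", "name": "Your cart"},
    {"role": "checkbox", "name": "Gift wrap", "states": {"checked": True}},
    {"role": "button", "name": "Submit Order", "states": {"disabled": False}},
]}

text = "\n".join(serialize(tree))
print(text)
# main "Order summary"
#   heading "Your cart"
#   checkbox "Gift wrap" [checked]
#   button "Submit Order"
```

The resulting text can be placed in the model prompt alongside the extraction instructions. One line per node keeps the hierarchy visible through indentation while staying compact.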

The accessibility tree is typically 10-50x smaller than the full DOM, which means lower token costs and faster processing. A page with 5,000 DOM nodes might have 200 accessibility nodes that capture all the meaningful content, a 25x reduction.

Limitations

Not every website has good accessibility markup. Poorly built sites produce sparse or unhelpful accessibility trees. For these, you still need screenshot-based extraction or DOM parsing as a fallback. But for well-built sites - which include most major web applications - the accessibility tree is the superior signal.
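
One way an agent might decide when to fall back is to check how many interactive elements actually carry a name. The sketch below is a heuristic invented for this example, not an established rule: the set of roles, the 0.5 threshold, and the function names are all assumptions, and the tree is again assumed to be nested dictionaries.

```python
# Roles treated as interactive for this heuristic (an assumed, partial list).
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox"}


def iter_nodes(node):
    """Yield every node in the tree, parents before children."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def named_ratio(tree):
    """Fraction of interactive nodes that carry a non-empty name."""
    interactive = [n for n in iter_nodes(tree) if n.get("role") in INTERACTIVE_ROLES]
    if not interactive:
        return 0.0
    named = [n for n in interactive if (n.get("name") or "").strip()]
    return len(named) / len(interactive)


def choose_strategy(tree, threshold=0.5):
    """Use the accessibility tree when enough controls are labeled, else fall back."""
    return "accessibility_tree" if named_ratio(tree) >= threshold else "fallback"


# Hypothetical trees: one from a well-labeled page, one from a sparse page.
well_built = {"role": "main", "children": [
    {"role": "button", "name": "Submit Order"},
    {"role": "link", "name": "Back to cart"},
]}
sparse = {"role": "main", "children": [
    {"role": "button", "name": ""},
    {"role": "button"},
    {"role": "button", "name": "OK"},
]}

strategy_good = choose_strategy(well_built)
strategy_sparse = choose_strategy(sparse)
```

Here "fallback" stands for whichever alternative the agent supports, such as the screenshot-based extraction or DOM parsing mentioned above.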

Fazm is an open-source macOS AI agent, available on GitHub.
