How Accessibility-Based Desktop Automation Fixes Flaky Browser Tests
If you have spent any time with browser automation, you know the pain. Tests pass locally but fail in CI. Selectors break when a developer changes a class name. Elements exist in the DOM but are not clickable because an overlay has not finished animating. The flake rate on any non-trivial browser automation suite makes you question why you bothered writing the tests.
Why Browser Automation Breaks
Most browser automation tools - Selenium, Playwright, Puppeteer - interact with the DOM directly. They find elements by CSS selectors, XPaths, or test IDs, then simulate clicks and keystrokes. This works until it does not.
DOM changes are the obvious culprit. A frontend refactor renames components, restructures the element hierarchy, or changes how elements are rendered. Every selector-based test that touches those elements breaks. A developer changes class="submit-btn primary" to class="btn btn-primary" and your test suite lights up red.
But the subtler problem is timing. Modern web apps load content asynchronously, render progressively, and animate transitions. The element your automation is looking for might exist in the DOM but not be visible, not be interactable, or not yet be in its final position. Hardcoded sleep(500) statements paper over this temporarily - until the CI runner is slower than expected and the sleep is not enough.
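The usual fix for the sleep problem is to wait for a specific condition instead of a guessed duration. Here is a minimal TypeScript sketch of that idea. `waitFor` and its `timeoutMs`, `intervalMs`, and `description` options are hypothetical names used only for illustration; they are not part of any particular framework. (Playwright, for instance, already auto-waits for an element to become actionable before clicking it.) The sketch assumes a synchronous condition and polls at a fixed interval, which is a deliberate simplification.

```typescript
// Values that mean "not ready yet".
type Falsy = false | null | undefined;

// Hypothetical helper: poll `condition` until it returns a truthy value, or
// reject once `timeoutMs` has elapsed. Unlike a fixed sleep, it returns as soon
// as the condition holds and fails with a descriptive error when it never does.
async function waitFor<T>(
  condition: () => T | Falsy,
  { timeoutMs = 5000, intervalMs = 50, description = "condition" } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = condition();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for ${description}`);
    }
    // Sleep only for the polling interval, never for a guessed total delay.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In practice, the condition would be whatever the next step depends on, such as a result row being present or a spinner being gone. A timeout still exists, but it now caps how long a genuine failure takes to surface. It is no longer a fixed cost paid on every run.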
Five common root causes of flaky browser automation:
- Race conditions between test actions and application responses
- Hardcoded waits that guess at timing instead of waiting for specific conditions
- Shared state between tests - databases, cookies, browser storage
- DOM rendering delays from async frameworks like React or Next.js
- Unstable CSS selectors that break when markup changes
On many teams, chasing flake ends up taking more time than the feature work the tests are supposed to protect.
The Accessibility Layer Alternative
Desktop automation through the macOS Accessibility API (AXUIElement) sidesteps several of these problems, most directly the unstable selectors. Instead of querying the DOM, the automation reads the accessibility tree - a semantic representation of what is on screen. Buttons are identified as buttons with labels. Text fields are identified as text fields. The exact HTML structure stops mattering, as long as the markup still exposes the same roles and labels.
This means a frontend refactor that changes the underlying HTML but keeps the same visible UI does not break the automation. The accessibility tree still shows the same buttons, text fields, and labels because the semantic meaning has not changed - only the implementation details have.
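The toy TypeScript model below makes this concrete. It assumes a simplified node shape (`AXNode` with `role`, `label`, and `children`) that stands in for what the real API exposes through attributes such as `AXRole`, `AXTitle`, `AXDescription`, and `AXChildren`. The `findAll` helper is hypothetical. The two trees differ only in how deeply the button is nested, and both resolve to the same match.

```typescript
// Simplified, hypothetical stand-in for an accessibility element. Real macOS
// elements expose this information through attributes such as AXRole,
// AXTitle / AXDescription, and AXChildren.
interface AXNode {
  role: string;        // e.g. "AXButton", "AXTextField", "AXGroup"
  label?: string;      // accessible name, e.g. "Submit"
  children?: AXNode[];
}

// Depth-first search for every node with the given role and label.
// Intermediate containers are only places to keep searching; role and label
// alone decide whether a node matches.
function findAll(root: AXNode, role: string, label: string): AXNode[] {
  const matches: AXNode[] = [];
  const stack: AXNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.role === role && node.label === label) matches.push(node);
    stack.push(...(node.children ?? []));
  }
  return matches;
}

// The same form before and after a refactor that wraps the button in an
// extra container. The query does not change.
const beforeRefactor: AXNode = {
  role: "AXWebArea",
  children: [{ role: "AXGroup", children: [{ role: "AXButton", label: "Submit" }] }],
};
const afterRefactor: AXNode = {
  role: "AXWebArea",
  children: [
    { role: "AXGroup", children: [{ role: "AXGroup", children: [{ role: "AXButton", label: "Submit" }] }] },
  ],
};

console.log(
  findAll(beforeRefactor, "AXButton", "Submit").length,
  findAll(afterRefactor, "AXButton", "Submit").length,
); // prints "1 1"
```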
Here is a concrete comparison. Finding a submit button:
DOM-based (fragile):
// Breaks if class name changes, element moves in the hierarchy,
// or the button is wrapped in a new container
await page.click('.checkout-form .submit-btn.primary');
Accessibility-based (resilient):
// Works as long as there is a button labeled "Submit" that is interactable
// Does not care about class names, DOM depth, or HTML structure
await agent.click({ role: 'button', label: 'Submit' });
The accessibility query will still find the button after a complete UI redesign - as long as a visible button labeled "Submit" still exists and is enabled. That is the resilience that matters for production automation.
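That claim rests on two conditions: the match must be enabled, and it must be unique. The TypeScript sketch below shows one way a `{ role, label }` query might be resolved under those rules. The `resolve` function and the `Resolution` type are assumptions made for illustration, as is the rule that a missing `enabled` flag counts as enabled. None of them is the API of any specific tool.

```typescript
// Simplified, hypothetical accessibility node.
interface AXNode {
  role: string;
  label?: string;
  enabled?: boolean;
  children?: AXNode[];
}

// Either exactly one interactable node, or the reason resolution failed.
type Resolution =
  | { ok: true; node: AXNode }
  | { ok: false; reason: "not-found" | "disabled" | "ambiguous" };

// Resolve a { role, label } query to a single enabled node. A real agent would
// read these fields from AXUIElement attributes instead of a plain object.
function resolve(root: AXNode, query: { role: string; label: string }): Resolution {
  const matches: AXNode[] = [];
  const visit = (node: AXNode): void => {
    if (node.role === query.role && node.label === query.label) matches.push(node);
    node.children?.forEach(visit);
  };
  visit(root);

  if (matches.length === 0) return { ok: false, reason: "not-found" };
  // Only an explicit `enabled: false` disables a node in this sketch.
  const enabled = matches.filter((n) => n.enabled !== false);
  if (enabled.length === 0) return { ok: false, reason: "disabled" };
  if (enabled.length > 1) return { ok: false, reason: "ambiguous" };
  return { ok: true, node: enabled[0] };
}
```

Returning a reason instead of a bare `undefined` lets the caller choose how to respond. It can wait and retry on `disabled`, narrow the query on `ambiguous`, or fail fast on `not-found`.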
When Accessibility Automation Applies
The accessibility layer works best for:
- Automating native macOS applications where you have no DOM access at all
- Cross-application workflows that span browser, Finder, and native apps
- Long-running automations where UI changes would break fragile selectors
- Agents that need to understand UI semantics rather than just simulate clicks
It is less suited for:
- Fine-grained DOM manipulation that requires JavaScript execution
- Testing specific HTML structure as part of a contract (e.g., verifying a specific aria-label was applied)
- Browsers with unusual accessibility tree implementations
For most day-to-day desktop automation - filling forms, clicking buttons, reading text from applications - the accessibility layer is more reliable than DOM selectors.
Open Source Matters Here
Open source desktop automation frameworks let you inspect exactly how element detection works, understand the fallback behavior when elements are not found, and contribute fixes for edge cases specific to your environment. No vendor lock-in, no waiting for a commercial tool to fix a bug that blocks your workflow.
The Fazm approach uses the macOS accessibility API for all desktop interactions. The element matching logic is in the open repository - you can see exactly how it resolves ambiguous element lookups, handles disabled elements, and deals with accessibility trees that change mid-action.
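Fazm's actual matching code lives in its repository, and the snippet below is not taken from it. It is a generic TypeScript sketch of one common strategy for trees that change mid-action: never hold on to a stale element reference, and repeat the lookup on every attempt instead. The names `actWithFreshLookup`, `attempts`, and `delayMs` are hypothetical.

```typescript
// Hypothetical helper: look the element up again on every attempt instead of
// reusing a reference that may have gone stale. If the lookup misses or the
// action throws (for example because the element disappeared), wait briefly
// and retry, up to `attempts` times.
async function actWithFreshLookup<T>(
  lookup: () => T | undefined,
  act: (element: T) => void | Promise<void>,
  { attempts = 3, delayMs = 100 } = {},
): Promise<void> {
  let lastError: unknown = new Error("element not found");
  for (let i = 0; i < attempts; i++) {
    const element = lookup(); // fresh snapshot on each attempt
    if (element !== undefined) {
      try {
        await act(element);
        return;
      } catch (err) {
        lastError = err; // remember why this attempt failed
      }
    }
    if (i < attempts - 1) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw lastError;
}
```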
The Practical Trade-Off
You give up the precision of DOM-level interaction for the resilience of semantic-level interaction. You cannot use CSS selectors to distinguish two visually identical buttons in different parts of the DOM - you need to use context (which container they are in, what precedes them) to disambiguate.
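Here is a minimal TypeScript sketch of disambiguation by container. It works over a hypothetical, simplified `AXNode` shape: find the enclosing group by its own role and label first, then search for the button only inside it. The `findAll` and `findWithin` names are illustrative. The sketch covers container context only and does not use ordering among siblings.

```typescript
// Simplified, hypothetical accessibility node.
interface AXNode {
  role: string;
  label?: string;
  children?: AXNode[];
}

interface Query {
  role: string;
  label: string;
}

// Every node under `root` (root included) that matches the query.
function findAll(root: AXNode, query: Query): AXNode[] {
  const matches: AXNode[] = root.role === query.role && root.label === query.label ? [root] : [];
  for (const child of root.children ?? []) {
    matches.push(...findAll(child, query));
  }
  return matches;
}

// Scope the search: locate the container first, then the target inside it.
function findWithin(root: AXNode, container: Query, target: Query): AXNode[] {
  const results: AXNode[] = [];
  for (const scope of findAll(root, container)) {
    results.push(...findAll(scope, target));
  }
  return results;
}
```

Two "Remove" buttons, one in a "Shipping address" group and one in a "Billing address" group, are ambiguous to a bare role-and-label query. Scoping the query to the "Billing address" group leaves exactly one match.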
For most automation tasks, giving up DOM-level precision is a trade worth making, because the reduction in flake rate can be significant. An automation suite with a 15-20% flake rate on DOM selectors will typically drop below 5% when converted to accessibility-based targeting. The reason is that most of that flake comes from selector instability and from timing issues around DOM mutations. The accessibility layer removes the first and is much less exposed to the second.
This post was inspired by a discussion on r/AI_Agents.
Fazm is an open source macOS AI agent, available on GitHub.