What a 37% UI Automation Success Rate Teaches About Building Reliable Desktop Agents
When we first wired up Fazm's UI automation, the success rate was about 37%. Roughly six out of every ten actions went wrong: clicks missed their target, text ended up in the wrong field, and buttons got clicked before they had fully rendered.
It was humbling. And extremely educational.
The Failure Taxonomy
Most failures fell into a few categories:
- Coordinate misalignment. macOS accessibility APIs report an element's frame as its top-left position plus its size. If you click that reported position, you are clicking the element's border, or worse, the element above it. Switching to clicking the center of the frame fixed a huge chunk of failures.
- Lazy-loading races. Modern apps load content progressively. The agent would see a button in the accessibility tree, click it, and hit a loading spinner instead because the actual interactive element had not rendered yet.
- Scroll position drift. The agent would locate an element, scroll to reveal it, then click, but the scroll animation kept moving the element, so the click landed on coordinates captured before the element had settled into its final position.
- Stale tree references. Any read of the accessibility tree is only a snapshot. Between reading the tree and acting on it, the UI might have changed completely.
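The center-point fix for coordinate misalignment is a one-line calculation. Here is a minimal sketch, assuming the frame has already been read as a top-left position plus a size (on macOS, typically from an element's `kAXPositionAttribute` and `kAXSizeAttribute`). The `Frame` type and the `click_point` helper are hypothetical names for illustration, not Fazm's code.

```python
from dataclasses import dataclass


# Hypothetical type for illustration; not Fazm's API.
@dataclass(frozen=True)
class Frame:
    """An element frame: top-left position plus size, in screen points."""
    x: float
    y: float
    width: float
    height: float


# Hypothetical helper: click the center of the frame, not its reported origin.
def click_point(frame: Frame) -> tuple[float, float]:
    """Return the center of the frame instead of its top-left corner."""
    if frame.width <= 0 or frame.height <= 0:
        # An empty frame can mean the element is not laid out yet; don't click it.
        raise ValueError("element has an empty frame")
    return (frame.x + frame.width / 2, frame.y + frame.height / 2)


button = Frame(x=100, y=200, width=80, height=24)
print(click_point(button))  # (140.0, 212.0)
```

For an 80 by 24 point button, the center sits 40 points from the left edge and 12 from the top. A small offset error there still lands inside the button, whereas the same error at the corner lands outside it.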
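One common mitigation for the lazy-loading, scroll-drift, and stale-reference failures is to re-read the element's frame from the live UI right before acting and wait until it stops changing. This sketch assumes that a non-empty frame which stays identical across a few consecutive fresh reads means the element has rendered and any scroll animation has finished. That is a heuristic, not a guarantee, and not necessarily how Fazm does it. `read_frame` is a hypothetical callback that re-queries the element each time it is called (for example through `AXUIElementCopyAttributeValue`) and returns `None` if the element is gone.

```python
import time
from typing import Callable, Optional

# (x, y, width, height) in screen points, top-left origin.
Frame = tuple[float, float, float, float]


# Hypothetical helper; `read_frame` stands in for a live accessibility query.
def wait_for_stable_frame(
    read_frame: Callable[[], Optional[Frame]],
    stable_reads: int = 3,
    interval: float = 0.05,
    timeout: float = 2.0,
) -> Frame:
    """Poll the live element until its frame is non-empty and unchanged for
    `stable_reads` consecutive reads; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    last: Optional[Frame] = None
    streak = 0
    while time.monotonic() < deadline:
        frame = read_frame()  # fresh query every time; never act on an old snapshot
        rendered = frame is not None and frame[2] > 0 and frame[3] > 0
        if not rendered:
            streak = 0        # missing or zero-size: still loading (or gone)
        elif frame == last:
            streak += 1       # same position and size as the previous read
        else:
            streak = 1        # moved or resized: scrolling or layout in progress
        last = frame
        if streak >= stable_reads:
            return frame
        time.sleep(interval)
    raise TimeoutError("element frame did not settle before the timeout")


# Usage with scripted reads: not rendered, then mid-scroll, then settled.
reads = iter([None, (10, 500, 80, 24), (10, 300, 80, 24), (10, 120, 80, 24)])
frame = wait_for_stable_frame(lambda: next(reads, (10, 120, 80, 24)), interval=0.0)
print(frame)  # (10, 120, 80, 24)
```

The polling interval and the number of required stable reads trade latency for safety. The defaults here are illustrative guesses, not measured values.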
The Fix - Post-Action Traversal
The single biggest improvement was adding post-action accessibility tree traversal. After every click, type, or scroll action, Fazm re-reads the accessibility tree and compares the new state against the expected state.
This does three things:
- Detects misclicks immediately rather than letting errors compound.
- Provides retry signals - if the expected UI state did not appear, the agent can try again with adjusted coordinates.
- Builds a ground truth dataset of what worked and what did not, which feeds back into improving the action pipeline.
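The post-action check can be sketched as a loop: snapshot, act, re-read, compare, and record the outcome. This minimal sketch rests on several assumptions. `Snapshot` is a hypothetical flattened view of the tree (element identifier to value). `expected` is a caller-supplied predicate describing the state the action should produce. The `attempt` index passed to the action marks where a real implementation might adjust coordinates on a retry. The log schema is likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical flattened view of the accessibility tree: element identifier -> value.
Snapshot = dict[str, str]


@dataclass
class ActionLog:
    """Every attempt and its outcome; raw material for a success/failure dataset."""
    records: list[dict] = field(default_factory=list)


# Hypothetical helper illustrating post-action verification; not Fazm's API.
def act_and_verify(
    name: str,
    action: Callable[[int], None],
    read_tree: Callable[[], Snapshot],
    expected: Callable[[Snapshot, Snapshot], bool],
    log: ActionLog,
    max_attempts: int = 3,
) -> bool:
    """Perform an action, re-read the tree, and retry until the expected state appears."""
    for attempt in range(max_attempts):
        before = read_tree()
        action(attempt)       # attempt > 0 is a retry; the action may adjust coordinates
        after = read_tree()   # post-action traversal: a fresh read, not the cached snapshot
        ok = expected(before, after)
        log.records.append({"action": name, "attempt": attempt, "ok": ok})
        if ok:
            return True       # expected state observed; stop before errors compound
    return False              # surface the failure instead of continuing blindly


# Usage with a fake UI: the first attempt "misses" the search field.
ui = {"search_field": ""}

def type_hello(attempt: int) -> None:
    if attempt >= 1:
        ui["search_field"] = "hello"

log = ActionLog()
succeeded = act_and_verify(
    "type_hello",
    type_hello,
    read_tree=lambda: dict(ui),
    expected=lambda before, after: after.get("search_field") == "hello",
    log=log,
)
print(succeeded, [r["ok"] for r in log.records])  # True [False, True]
```

Every attempt is logged, whether it succeeded or failed, so the log doubles as the kind of ground-truth record the third point describes.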
This single pattern took our success rate from around 40% to 85-90%. The remaining 10-15% of failures come mostly from apps built on non-standard UI frameworks that do not expose clean accessibility trees.
Fazm is an open source macOS AI agent, available on GitHub.