How to Avoid Fragile Automations - Stop Using Screenshots and Coordinates
You build an automation that clicks a button at coordinates (450, 320). It works perfectly. Then the app updates its UI, the button moves 30 pixels down, and your automation clicks empty space. You fix the coordinates. Next month it breaks again.
This is the fundamental problem with pixel- and coordinate-based automation: it targets where things are on screen rather than what they are. Any visual change - a font size update, a new sidebar, a different screen resolution - can move the target and break the automation.
The Accessibility Tree Approach
The accessibility tree represents UI elements by their identity, not their position. A "Submit" button is a button element with the label "Submit" regardless of where it appears on screen. An email field is a text input with a specific role and label whether it is at the top of the form or the bottom.
When you target elements through the accessibility tree, your automation says "click the button labeled Submit" instead of "click at position (450, 320)." The app can completely redesign its layout and your automation still works because the logical element still exists.
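To make the difference concrete, here is a minimal sketch in Python. It does not use a real accessibility API: AXNode, find, and hit_test are hypothetical names invented for this illustration, and the tree is a simplified stand-in for what an operating system would expose. The two layouts mirror the example above, where the button moves 30 pixels down.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class AXNode:
    """Hypothetical, simplified stand-in for an accessibility tree node."""
    role: str
    label: str = ""
    frame: Tuple[int, int, int, int] = (0, 0, 0, 0)  # x, y, width, height
    children: List["AXNode"] = field(default_factory=list)


def walk(node: AXNode) -> Iterator[AXNode]:
    """Yield the node and all of its descendants, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: AXNode, role: str, label: str) -> Optional[AXNode]:
    """Identity-based targeting: match on role and label, ignore position."""
    for node in walk(root):
        if node.role == role and node.label == label:
            return node
    return None


def hit_test(root: AXNode, x: int, y: int) -> Optional[AXNode]:
    """Coordinate-based targeting: last node in traversal order containing the point.

    With non-overlapping siblings this is the innermost element under the point.
    """
    hit = None
    for node in walk(root):
        fx, fy, fw, fh = node.frame
        if fx <= x < fx + fw and fy <= y < fy + fh:
            hit = node
    return hit


# Original layout: the Submit button covers the point (450, 320).
old_layout = AXNode("window", "Signup", (0, 0, 800, 600), [
    AXNode("textfield", "Email", (400, 260, 200, 30)),
    AXNode("button", "Submit", (400, 300, 100, 40)),
])

# After a UI update the button sits 30 pixels lower.
new_layout = AXNode("window", "Signup", (0, 0, 800, 600), [
    AXNode("textfield", "Email", (400, 260, 200, 30)),
    AXNode("button", "Submit", (400, 330, 100, 40)),
])

if __name__ == "__main__":
    for name, layout in (("old", old_layout), ("new", new_layout)):
        by_point = hit_test(layout, 450, 320)
        by_identity = find(layout, "button", "Submit")
        print(name, "| point hits:", by_point.role, "| identity finds:", by_identity.role)
```

In the new layout the fixed point lands on the window background, while the lookup by role and label still returns the button. A real agent would query the platform's accessibility API instead of this toy tree, but the lookup logic has the same shape.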
DOM Works the Same Way for Web
For browser-based tasks, targeting DOM elements with CSS selectors or ARIA labels provides the same resilience as the accessibility tree does for native apps. A selector like button[aria-label="Submit"] survives layout changes that would break coordinate-based clicks.
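The same idea can be checked without a browser. The sketch below uses Python's standard html.parser module to emulate the attribute selector on two versions of a page; the markup, the AttributeFinder class, and find_by_aria_label are made up for this illustration and only approximate what a real selector engine does.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class AttributeFinder(HTMLParser):
    """Collect start tags that match a tag name and one attribute value."""

    def __init__(self, tag: str, attr: str, value: str) -> None:
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.matches: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == self.tag and attributes.get(self.attr) == self.value:
            self.matches.append(attributes)


def find_by_aria_label(html: str, tag: str, label: str) -> List[Dict[str, Optional[str]]]:
    """Rough equivalent of the CSS selector tag[aria-label="label"]."""
    finder = AttributeFinder(tag, "aria-label", label)
    finder.feed(html)
    finder.close()
    return finder.matches


# Hypothetical page before a redesign.
OLD_PAGE = """
<form>
  <input type="email" aria-label="Email">
  <button aria-label="Submit" class="btn">Send</button>
</form>
"""

# Hypothetical page after a redesign: new sidebar, extra wrappers, new classes.
NEW_PAGE = """
<div class="layout">
  <aside class="sidebar">New sidebar</aside>
  <main>
    <form class="stacked">
      <div class="row"><input type="email" aria-label="Email"></div>
      <div class="row footer">
        <button aria-label="Submit" class="btn btn-lg">Send</button>
      </div>
    </form>
  </main>
</div>
"""

if __name__ == "__main__":
    for name, page in (("old", OLD_PAGE), ("new", NEW_PAGE)):
        print(name, "matches:", len(find_by_aria_label(page, "button", "Submit")))
```

Both versions yield exactly one match even though the button's position, wrappers, and classes changed. In a live page you would hand the selector to document.querySelector or to your automation tool's locator instead of parsing HTML yourself.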
When Coordinates Still Make Sense
Canvas-based applications, games, and some design tools do not expose their elements through accessibility APIs. For those specific cases, screenshot analysis with coordinate targeting is often the only practical option. But for standard business applications - which is where most automation value lives - the accessibility tree is the more reliable choice.
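One way to encode that rule is to try identity first and use coordinates only as a last resort. The sketch below is an assumption about how such a resolver could look, not a description of any particular tool: choose_target and the list-of-dicts element format are hypothetical, and an empty list stands in for a canvas app that exposes nothing.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]


def choose_target(
    exposed: List[Dict[str, str]],
    role: str,
    label: str,
    fallback_point: Optional[Point] = None,
) -> Tuple[str, object]:
    """Prefer a match by role and label; use coordinates only if none exists."""
    for element in exposed:
        if element.get("role") == role and element.get("label") == label:
            return ("semantic", element)
    if fallback_point is not None:
        return ("coordinates", fallback_point)
    raise LookupError(f"no {role!r} labeled {label!r} and no fallback point given")


# Hypothetical inputs: a form app exposes its button, a canvas app exposes nothing.
form_app = [{"role": "button", "label": "Submit"}]
canvas_app: List[Dict[str, str]] = []

if __name__ == "__main__":
    print(choose_target(form_app, "button", "Submit", (450, 320))[0])
    print(choose_target(canvas_app, "button", "Submit", (450, 320))[0])
```

Keeping the coordinate path explicit and separate makes it easy to see which automations still depend on pixel positions and are therefore the ones likely to need fixing later.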
The Practical Impact
Automations built on accessibility targeting tend to need very little maintenance after initial setup, because they break only when an element's role or label changes. Automations built on coordinates need fixing after nearly every visual change. Over a period like six months, the maintenance cost of a fragile automation can exceed the time you saved by building it. Build on the right foundation from the start.
Fazm is an open source macOS AI agent, available on GitHub.