The Automation Decision Tree - API First, Accessibility API Second, Skip Everything Else
Here is the most important lesson from building a desktop AI agent: if it has an API, use the API. If it only has a GUI, use the accessibility API. If neither works, do not automate it.
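The rule is a strict priority order, so it fits in a three-way branch. This is a minimal sketch. `Strategy`, `choose_strategy`, and its two boolean inputs are hypothetical names. A real agent would have to detect these capabilities at runtime rather than receive them as flags.

```python
from enum import Enum, auto


class Strategy(Enum):
    API = auto()            # structured calls: MCP server, REST endpoint, SDK
    ACCESSIBILITY = auto()  # structured UI tree from the OS accessibility API
    SKIP = auto()           # leave the task to a human


def choose_strategy(has_api: bool, exposes_accessibility_tree: bool) -> Strategy:
    """Return the most reliable interface available, checked in priority order."""
    if has_api:
        return Strategy.API
    if exposes_accessibility_tree:
        return Strategy.ACCESSIBILITY
    # No structured interface left: do not fall back to screen scraping.
    return Strategy.SKIP


print(choose_strategy(has_api=False, exposes_accessibility_tree=True))  # Strategy.ACCESSIBILITY
```

There is no "screenshot" branch. The point of the rule is that the absence of a structured interface ends the search.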
This sounds obvious. It is not. The temptation to make the agent "do everything through the screen like a human" is strong. It demos well. It fails in production.
Why APIs Come First
When a service exposes an API, you get:
- Structured responses. You know exactly what happened. No parsing screenshots or guessing from visual changes.
- Idempotent operations. Many APIs let you retry safely. Clicking a "Submit" button twice might submit two orders; calling an endpoint that supports idempotency keys twice with the same key does not.
- Speed. An API call takes milliseconds. A GUI interaction - locate element, scroll into view, click, wait for response, verify - takes seconds.
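The following sketch shows what an idempotency key buys you. `OrderAPI` is a hypothetical in-memory stand-in for an endpoint that honors idempotency keys. Real services differ in how they accept the key, often through a request header. The only point here is that a retry with the same key returns the original result instead of creating a second order.

```python
import uuid


class OrderAPI:
    """Hypothetical in-memory stand-in for an endpoint that honors idempotency keys."""

    def __init__(self) -> None:
        self.orders: dict[str, str] = {}   # order id -> item
        self._by_key: dict[str, str] = {}  # idempotency key -> order id

    def create_order(self, item: str, idempotency_key: str) -> str:
        # A key seen before returns the original order instead of creating another.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        order_id = f"order-{len(self.orders) + 1}"
        self.orders[order_id] = item
        self._by_key[idempotency_key] = order_id
        return order_id


api = OrderAPI()
key = str(uuid.uuid4())  # one key per logical operation, reused on every retry
first = api.create_order("book", idempotency_key=key)
retry = api.create_order("book", idempotency_key=key)  # e.g. after a timeout
print(first, retry, len(api.orders))  # order-1 order-1 1
```

A GUI has no equivalent of the key. A second click on "Submit" is just a second click.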
Fazm checks for available MCP servers and API integrations before falling back to GUI automation. If Slack has an API (it does), the agent sends a message through the API, not by opening the Slack app and typing into the message box.
When to Use the Accessibility API
Some apps have no API. Proprietary desktop software, legacy tools, apps that never bothered with integrations. For these, the macOS accessibility API gives you a structured representation of the UI - buttons, text fields, labels, hierarchies - without relying on pixel matching or screenshots.
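The following minimal sketch finds a button by role and title, which shows why a structured tree is easier to work with than pixels. `AXNode` is a hypothetical, simplified snapshot of the tree. On macOS the real data comes from `AXUIElement` attributes such as `AXRole`, `AXTitle`, and `AXChildren`, and this sketch does not call that API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AXNode:
    """Hypothetical simplified snapshot of one accessibility element."""
    role: str                      # e.g. "AXButton", "AXTextField"
    title: str = ""
    children: list["AXNode"] = field(default_factory=list)


def find_element(root: AXNode, role: str, title: str) -> Optional[AXNode]:
    """Depth-first search by role and label; no screen coordinates involved."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.role == role and node.title == title:
            return node
        # Push children reversed so they are visited in their original order.
        stack.extend(reversed(node.children))
    return None


window = AXNode("AXWindow", "Invoice", [
    AXNode("AXGroup", "", [
        AXNode("AXTextField", "Amount"),
        AXNode("AXButton", "Cancel"),
        AXNode("AXButton", "Submit"),
    ]),
])
button = find_element(window, "AXButton", "Submit")
print(button.role, button.title)  # AXButton Submit
```

The lookup still works if the window is moved, resized, or re-themed. Those changes break pixel matching but leave roles and labels intact.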
This is slower and less reliable than an API, but far more reliable than screenshot-based automation, because elements are found by role and label rather than by where they appear on screen.
When to Skip Entirely
Some things cannot be reliably automated through any available interface:
- Apps with custom rendering engines that do not expose accessibility data.
- Workflows that require visual judgment the LLM cannot make reliably.
- Tasks where the cost of a mistake exceeds the cost of doing it manually.
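These skip conditions can be encoded as a pre-flight check that runs before the agent touches anything. The sketch below is minimal. It assumes each task can be described by the hypothetical `TaskProfile` fields below, and that the mistake and manual costs are expressed in the same unit. The names are illustrative and are not taken from Fazm.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a candidate task; all fields are illustrative."""
    name: str
    has_api: bool
    exposes_accessibility_data: bool
    needs_visual_judgment: bool
    mistake_cost: float  # cost of cleaning up a wrong action (same unit as manual_cost)
    manual_cost: float   # cost of a person simply doing the task


def skip_reasons(task: TaskProfile) -> list[str]:
    """Return why a task should stay manual; an empty list means it may be automated."""
    reasons = []
    if not task.has_api and not task.exposes_accessibility_data:
        reasons.append("no structured interface")
    if task.needs_visual_judgment:
        reasons.append("requires visual judgment")
    if task.mistake_cost > task.manual_cost:
        reasons.append("a mistake costs more than doing it by hand")
    return reasons


transfer = TaskProfile("transfer in a canvas-rendered banking app",
                       has_api=False, exposes_accessibility_data=False,
                       needs_visual_judgment=False, mistake_cost=500.0, manual_cost=2.0)
print(skip_reasons(transfer))
# ['no structured interface', 'a mistake costs more than doing it by hand']
```

The check returns reasons rather than a bare boolean, so the agent can tell the user why it declined.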
The hardest skill in building an automation agent is knowing when not to automate.
Fazm is an open source macOS AI agent, available on GitHub.