Desktop Agents Need Native OS APIs, Not Just Terminal Commands
Most AI agents start with terminal access. Give the model a shell and it can run commands, edit files, manage git repos, deploy code. That is genuinely useful, but it is a small slice of what people actually do on their computers.
Try booking a flight from the terminal. Try filling out a form in Salesforce. Try moving tasks between columns in a Notion board. Most knowledge work happens in GUI applications, and many of them have no CLI equivalent.
What Accessibility APIs Unlock
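On macOS, the Accessibility API (built around the AXUIElement type) exposes the interactive elements of any application that supports accessibility, which covers most standard apps once the user has granted the agent Accessibility permission. Buttons, text fields, dropdowns, checkboxes, menu items, and table rows are all available, along with their positions, labels, roles, and current states.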
This means a desktop agent can:
- Click the "Send" button in any email client
- Read the text in any text field in any application
- Navigate menus and select specific items
- Fill out forms across any app, not just web browsers
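The critical difference from terminal commands is that this works with applications the way users actually use them. You do not need the app to have an API or a CLI. If it has a GUI built from standard controls, the accessibility tree usually exposes it. Apps that draw their own interface, such as games or canvas-based tools, may expose much less.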
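To make the idea concrete, here is a minimal Python sketch of an accessibility tree as plain data. It is a toy in-memory model, not the macOS API: the `Element` class and the `find` and `press` helpers are hypothetical names invented for illustration, and the mail window is made up. Only the role and action strings (`AXButton`, `AXTextField`, `AXPress`) mirror the names macOS uses. A real agent would read the tree through AXUIElement instead of building it by hand.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Element:
    """Toy stand-in for one node of an accessibility tree (hypothetical)."""
    role: str                       # e.g. "AXButton", "AXTextField"
    title: str = ""                 # human-readable label
    value: Optional[str] = None     # current contents, if any
    actions: List[str] = field(default_factory=list)
    children: List["Element"] = field(default_factory=list)

    def walk(self) -> Iterator["Element"]:
        """Yield this element and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find(root: Element, role: str, title: str) -> Optional[Element]:
    """Return the first element matching both role and title, or None."""
    for element in root.walk():
        if element.role == role and element.title == title:
            return element
    return None


def press(element: Element) -> bool:
    """Simulate a press; reports success only if the element supports AXPress."""
    return "AXPress" in element.actions


# A hand-built tree for an imaginary mail compose window.
mail_window = Element("AXWindow", "New Message", children=[
    Element("AXTextField", "Subject", value="Quarterly report"),
    Element("AXGroup", "Toolbar", children=[
        Element("AXButton", "Send", actions=["AXPress"]),
    ]),
])

send_button = find(mail_window, "AXButton", "Send")
subject = find(mail_window, "AXTextField", "Subject")
```

The agent never needs to know which email client it is looking at. It asks for a button titled "Send" or a text field titled "Subject" and works with whatever the tree returns.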
Why Not Just Automate the Browser
Browser automation covers web apps, but many professional tools are native applications - Xcode, Figma desktop, Slack, Excel, Adobe Creative Suite. These have no browser equivalent, or the browser version is significantly limited compared to the native app.
Even for web apps, native accessibility APIs can be more reliable than browser automation. You do not need to deal with iframes, shadow DOM, or dynamic class names. The accessibility tree gives you semantic roles and labels, which tend to stay stable across updates because they describe what a control does, not how it is styled.
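The stability argument can be illustrated with a small Python sketch. Everything here is hypothetical: the two snapshots, the generated class names, and the helper functions are invented to show the contrast and do not come from any real app or library.

```python
from typing import Dict, List, Optional

Node = Dict[str, str]

# Two made-up snapshots of the same "Send" button, before and after an
# update that regenerates its build-time class name.
before: List[Node] = [
    {"css_class": "btn-x7f2a", "role": "AXButton", "label": "Send"},
]
after: List[Node] = [
    {"css_class": "btn-q91dc", "role": "AXButton", "label": "Send"},
]


def by_class(nodes: List[Node], css_class: str) -> Optional[Node]:
    """Locate a node by its generated class name (brittle across builds)."""
    return next((n for n in nodes if n["css_class"] == css_class), None)


def by_role_and_label(nodes: List[Node], role: str, label: str) -> Optional[Node]:
    """Locate a node by its semantic role and label."""
    return next(
        (n for n in nodes if n["role"] == role and n["label"] == label), None
    )


class_hit_before = by_class(before, "btn-x7f2a")               # found
class_hit_after = by_class(after, "btn-x7f2a")                 # not found
label_hit_after = by_role_and_label(after, "AXButton", "Send")  # still found
```

Labels are not immune to change. A redesign or a different system language can rename a control, so a careful agent would treat the role and label as a strong hint and still verify what it found.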
The Combination
The most capable desktop agents use all three layers: terminal commands for system operations and developer tools, accessibility APIs for interacting with native applications, and browser automation for web-specific tasks. Each layer covers gaps the others cannot reach.
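One way to picture the three layers working together is a small router that decides which layer should handle a task. The sketch below is a simplified assumption about how such routing could look, not a description of how any particular agent works. The `Layer` enum, the `pick_layer` function, and the target strings are all hypothetical.

```python
from enum import Enum


class Layer(Enum):
    TERMINAL = "terminal"
    ACCESSIBILITY = "accessibility"
    BROWSER = "browser"


# Hypothetical set of targets that are best served by shell commands.
TERMINAL_TARGETS = {"shell", "git", "filesystem"}


def pick_layer(target: str) -> Layer:
    """Choose a layer from a coarse description of what the task touches."""
    if target in TERMINAL_TARGETS:
        return Layer.TERMINAL
    if target.startswith(("http://", "https://")):
        return Layer.BROWSER
    # Anything else is assumed to be a native app with a GUI.
    return Layer.ACCESSIBILITY


choices = {t: pick_layer(t) for t in ("git", "https://example.com", "Xcode")}
```

A real agent would likely let the model choose the layer and fall back to another one when the first attempt fails, but the division of labor is the same.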
Fazm is an open source macOS AI agent, available on GitHub.