
Most AI Agents Are Stuck in Terminal and Browser - Native App Control Is the Gap

Fazm Team · 2 min read
Tags: terminal · browser · native-apps · accessibility-api · gap

The local AI stack has gotten impressive. You can run powerful models on a MacBook with Ollama. You can build agents that write code, search the web, and chain API calls. But try asking one of these agents to rename a batch of files in Finder, update a cell in Numbers, or move an email to a specific folder in Mail. It can't.

Almost every AI agent today lives in two places: the terminal and the browser. That covers developers pretty well. It covers everyone else poorly.

The Native App Blind Spot

Think about the applications most people use daily - Mail, Calendar, Finder, Preview, Notes, Messages, Excel, PowerPoint, Slack, Zoom. These are GUI applications with complex interfaces that have no command-line equivalent. You can't pipe Keynote through grep.

This is a massive gap in the current AI agent landscape. The models are smart enough to handle complex tasks. The local inference is fast enough to feel responsive. But the connection between the model and the actual applications people use is missing.

Accessibility APIs Are the Bridge

macOS has a built-in solution that most agent developers are ignoring. The Accessibility API exposes every UI element in every running application - buttons, menus, text fields, tables, checkboxes. An agent that interfaces with this API can control any native application the same way assistive technologies do.

This means clicking a specific button in Figma, reading the content of a cell in Numbers, or dragging a file between Finder windows. All programmatic, all reliable, no screenshots needed.
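To make this concrete, here is a minimal sketch of what talking to the Accessibility API looks like from Swift. It lists the window titles of the frontmost application; the same `AXUIElementCopyAttributeValue` / `AXUIElementPerformAction` pattern extends to buttons, menus, and text fields. This is an illustrative sketch, not Fazm's actual implementation, and it assumes the process has been granted Accessibility permission in System Settings.

```swift
import AppKit
import ApplicationServices

// Sketch: enumerate the frontmost app's window titles via the Accessibility API.
// Requires Accessibility permission (System Settings > Privacy & Security > Accessibility).
guard let frontApp = NSWorkspace.shared.frontmostApplication else { exit(1) }

// Every running app is addressable as an AXUIElement rooted at its pid.
let appElement = AXUIElementCreateApplication(frontApp.processIdentifier)

var windowsRef: CFTypeRef?
if AXUIElementCopyAttributeValue(appElement, kAXWindowsAttribute as CFString, &windowsRef) == .success,
   let windows = windowsRef as? [AXUIElement] {
    for window in windows {
        var titleRef: CFTypeRef?
        if AXUIElementCopyAttributeValue(window, kAXTitleAttribute as CFString, &titleRef) == .success,
           let title = titleRef as? String {
            print(title)  // e.g. the document name shown in the title bar
        }
    }
}
```

The key point: this is structured data, not pixels. An agent walks the same element tree a screen reader does, so it can target "the Send button" by role and title rather than by guessing at screenshot coordinates.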

The next evolution isn't better models or faster inference. It's connecting those models to the applications where real work happens. Terminal and browser are solved. Native apps are the frontier.

Fazm is an open source macOS AI agent, available on GitHub.
