File Access Is Just the Beginning for Desktop Agents
Most desktop agents start with file access. They can read your documents, list your folders, maybe move things around. It feels like progress because the agent is finally touching your local machine instead of living entirely in the cloud. But reading files is the simplest thing a desktop agent can do.
The interesting question is what comes after file access, and the answer is app control.
From Files to Applications
Think about what you actually do with files. You don't just read them - you open them in specific applications, edit them with specialized tools, and share them through particular workflows. A PDF isn't useful because it exists on your disk. It's useful because you can open it in Preview, annotate it, and send it through Mail.
An agent that can access your files but can't open them in the right app, interact with that app's interface, or chain the result into another app is doing only a small part of the job. Most of the work is application control.
The Accessibility Tree Makes It Possible
macOS exposes the interface of running applications through its accessibility APIs, provided the app supports accessibility and the user has granted the agent Accessibility permission. Buttons, text fields, menus, sliders, and tables are all queryable and actionable programmatically. This means an agent can read what's on screen without taking screenshots, find specific UI elements without pixel matching, and interact with them without simulating mouse movements.
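To make the idea concrete, here is a minimal sketch of querying and acting on a UI tree. It does not call the macOS accessibility APIs; `UIElement`, `walk`, and `find` are hypothetical names modeling the tree in plain Python, and the window contents are invented for illustration. Only the role strings (`AXWindow`, `AXButton`, and so on) mirror the ones macOS uses.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class UIElement:
    """Hypothetical stand-in for one node of an accessibility tree."""
    role: str                      # e.g. "AXButton", "AXTextField"
    title: str = ""
    value: str = ""
    children: List["UIElement"] = field(default_factory=list)
    pressed: int = 0               # counts simulated press actions

    def press(self) -> None:
        """Simulate performing the element's press action."""
        self.pressed += 1


def walk(element: UIElement) -> Iterator[UIElement]:
    """Yield the element and all of its descendants, depth first."""
    yield element
    for child in element.children:
        yield from walk(child)


def find(root: UIElement, role: str, title: str) -> Optional[UIElement]:
    """Return the first element matching role and title, or None."""
    for element in walk(root):
        if element.role == role and element.title == title:
            return element
    return None


# A toy window (assumed layout): a toolbar with two buttons and a search field.
window = UIElement("AXWindow", "contract.pdf", children=[
    UIElement("AXToolbar", children=[
        UIElement("AXButton", "Highlight"),
        UIElement("AXButton", "Share"),
    ]),
    UIElement("AXTextField", "Search"),
])

# Locate the button by role and title, then act on it. No pixels involved.
button = find(window, "AXButton", "Highlight")
if button is not None:
    button.press()
```

The lookup is by role and title rather than by screen position, so it would keep working if the window moved or was resized. A real agent would swap the toy tree for elements returned by the operating system.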
This is how you go from "I can see your files" to "I found the contract in your Downloads folder, opened it in Preview, highlighted the payment terms, and attached it to the email draft I created in Mail." That chain of actions across three applications is where desktop agents become genuinely useful.
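One way to picture that chain is as an ordered list of steps, each tied to an app and passing its result to the next. The sketch below is an assumption about how such a chain could be structured, not a description of any particular agent. The step functions are hypothetical placeholders that only record what a real agent would do, the file name is invented, and "Finder" stands in for whatever locates the file.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Step = Tuple[str, str, Callable[[Context], Context]]


def find_contract(ctx: Context) -> Context:
    # Hypothetical: a real agent would search the Downloads folder here.
    return {**ctx, "file": "~/Downloads/contract.pdf"}


def highlight_terms(ctx: Context) -> Context:
    # Hypothetical: a real agent would drive Preview through accessibility calls.
    return {**ctx, "highlighted": "payment terms"}


def attach_to_draft(ctx: Context) -> Context:
    # Hypothetical: a real agent would create a Mail draft and attach the file.
    return {**ctx, "draft_attachment": ctx["file"]}


def run_chain(steps: List[Step]) -> Tuple[Context, List[str]]:
    """Run steps in order, threading the context through and logging each one."""
    ctx: Context = {}
    log: List[str] = []
    for app, action, step in steps:
        ctx = step(ctx)
        log.append(f"{app}: {action}")
    return ctx, log


steps: List[Step] = [
    ("Finder", "find contract", find_contract),
    ("Preview", "highlight payment terms", highlight_terms),
    ("Mail", "attach to draft", attach_to_draft),
]
result, log = run_chain(steps)
```

Each step reads what the previous one produced (the Mail step needs the file path found in the first step), which is the part that file access alone cannot provide.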
File access is the first rung. Application control is the ladder.
Fazm is an open source macOS AI agent, available on GitHub.