File Access Is Just the Beginning for Desktop Agents

Matthew Diakonov

Updated March 19, 2026

file-access desktop-agent app-control accessibility evolution

Most desktop agents start with file access. They can read your documents, list your folders, maybe move things around. It feels like progress because the agent is finally touching your local machine instead of living entirely in the cloud. But reading files is the simplest thing a desktop agent can do.

The interesting question is what comes after file access. And the answer is app control.

From Files to Applications

Think about what you actually do with files. You don't just read them - you open them in specific applications, edit them with specialized tools, and share them through particular workflows. A PDF isn't useful because it exists on your disk. It's useful because you can open it in Preview, annotate it, and send it through Mail.

An agent that can access your files but can't open them in the right app, can't interact with that app's interface, and can't chain the result into another app is doing maybe 10% of what's needed. The other 90% is application control.
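As a rough illustration of the step from "file on disk" to "file in the right app", the sketch below builds the argument list for the macOS `open -a` command, which hands a file to a named application. The helper name `open_in_app_command` and the example path are hypothetical. The function only constructs the command and launches nothing, so it is safe to run anywhere.

```python
from __future__ import annotations

from pathlib import Path


def open_in_app_command(path: str, app_name: str) -> list[str]:
    """Build the argument list for macOS's `open -a <app> <file>`.

    Hypothetical helper for illustration; it does not launch anything.
    """
    if not app_name:
        raise ValueError("app_name must not be empty")
    # Expand "~" so the command does not depend on shell expansion.
    resolved = Path(path).expanduser()
    return ["open", "-a", app_name, str(resolved)]


# Example path is made up for this sketch.
cmd = open_in_app_command("/tmp/contract.pdf", "Preview")
print(cmd)
# On macOS, an agent could then run: subprocess.run(cmd, check=True)
```

Opening the file is only the first step. Once the app is running, the agent still needs a way to read and drive its interface.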

The Accessibility Tree Makes It Possible

macOS exposes the interface of most running applications through its accessibility APIs, once the user has granted the agent Accessibility permission. Buttons, text fields, menus, sliders, tables - they're all queryable and actionable programmatically. This means an agent can read what's on screen without taking screenshots, find specific UI elements without pixel matching, and interact with them without simulating mouse movements.
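To make "queryable and actionable" concrete, here is a minimal sketch that models an accessibility tree as plain Python objects. It is a toy stand-in and does not use the macOS API: the `UIElement` class, its fields, and the sample window are all assumptions made for illustration. On macOS the real calls live in the ApplicationServices framework (for example `AXUIElementCopyAttributeValue` and `AXUIElementPerformAction`). The role strings mirror the style of real accessibility roles such as `AXButton`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class UIElement:
    """Toy stand-in for one node of an accessibility tree (not a macOS type)."""

    role: str                      # e.g. "AXButton", "AXTextField"
    title: str = ""
    value: str = ""
    children: list[UIElement] = field(default_factory=list)
    on_press: Optional[Callable[[], None]] = None

    def walk(self) -> Iterator[UIElement]:
        """Yield this element and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def find(self, role: str, title: str) -> Optional[UIElement]:
        """Return the first element matching role and title, or None."""
        for element in self.walk():
            if element.role == role and element.title == title:
                return element
        return None

    def press(self) -> None:
        """Invoke the element's action directly, with no pointer movement."""
        if self.on_press is None:
            raise RuntimeError(f"{self.role} {self.title!r} is not pressable")
        self.on_press()


# Hypothetical window: a mail compose sheet with one field and one button.
sent: list[str] = []
window = UIElement(
    role="AXWindow",
    title="New Message",
    children=[
        UIElement(role="AXTextField", title="To", value="legal@example.com"),
        UIElement(role="AXButton", title="Send", on_press=lambda: sent.append("sent")),
    ],
)

# Look the button up by role and title instead of by screen coordinates.
button = window.find("AXButton", "Send")
if button is not None:
    button.press()
print(sent)
```

The lookup is by role and title, so it keeps working when the window moves or is resized. That is the practical difference from screenshot and pixel-matching approaches.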

This is how you go from "I can see your files" to "I found the contract in your Downloads folder, opened it in Preview, highlighted the payment terms, and attached it to the email draft I created in Mail." That chain of actions across three applications is where desktop agents become genuinely useful.
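The chain described above can be sketched as a list of steps that pass shared state from one app to the next. This is a simplified model under stated assumptions, not how any particular agent is implemented. The `Context` class, the step functions, and the file name `contract.pdf` are hypothetical, and each step only records what a real agent would do through the accessibility layer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """State handed from one step to the next (hypothetical structure)."""

    data: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


Step = Callable[[Context], None]


def find_contract(ctx: Context) -> None:
    # File step: a real agent would search the Downloads folder here.
    ctx.data["file"] = "~/Downloads/contract.pdf"
    ctx.log.append("Files: located contract.pdf")


def highlight_terms(ctx: Context) -> None:
    # Preview step: depends on the file found by the previous step.
    ctx.data["annotated"] = ctx.data["file"]
    ctx.log.append("Preview: highlighted payment terms")


def attach_to_draft(ctx: Context) -> None:
    # Mail step: consumes the annotated file produced by the Preview step.
    ctx.data["attachment"] = ctx.data["annotated"]
    ctx.log.append("Mail: attached file to draft")


def run_chain(steps: list[Step]) -> Context:
    """Run steps in order; a KeyError surfaces if a step's input is missing."""
    ctx = Context()
    for step in steps:
        step(ctx)
    return ctx


result = run_chain([find_contract, highlight_terms, attach_to_draft])
print("\n".join(result.log))
```

Each step depends on the output of the one before it, so running them out of order fails immediately instead of silently attaching the wrong file.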

File access is the first rung. Application control is the ladder.

Fazm is an open source macOS AI agent, available on GitHub.

AI Assistants That Control Your Apps vs Ones That Just Chat About Them

Voice plus file support is solid. But actually controlling your apps through the accessibility layer - clicking buttons, filling forms, navigating menus

Mar 17, 2026

Desktop Agents Need Native OS APIs, Not Just Terminal Commands

A CLI is useful but the real unlock for desktop agents is accessibility APIs that let you interact with any app's actual UI - buttons, text fields, menus

Mar 17, 2026

AI Agents vs Copilot: When to Let AI Drive vs Ride Shotgun

Desktop agents, coding agents, and workflow agents all work differently from copilots. Compare autonomy, cost, accuracy, and real use cases to pick the right tool.

Apr 13, 2026
