
macOS Desktop Agent Autonomy, Without Approval Fatigue

Most "autonomous" Mac agents aren't. They stop and ask you to approve every click, or they run headless and hope nothing breaks. Fazm takes a third route: every action lands in a local SQLite table the instant it runs, along with the exact SQL needed to undo it. You dismiss what you don't want. This page walks through that contract, end to end, with the file paths you can audit.

Matthew Diakonov, Fazm

Published April 18, 2026 · 9 min read


Accessibility APIs, not screenshots

$10 built-in spend cap

SQL-level rollback on every action

Autonomy with an undo button

How Fazm's agent runs on your Mac

Read the Accessibility tree, not pixels.

Execute AppleScript and native calls.

Log each action to observer_activity.

Dismiss to run rollback_operations.

$10 cap, then your API key takes over.


The autonomy spectrum, honestly labelled

"Autonomous" is doing a lot of work in product copy right now. Under the hood, every Mac-capable agent picks one of three execution contracts. They are not equivalent and they feel very different at the keyboard.

Approval-gate agents block on each tool call and wait for your click. Fire-and-forget agents run the whole plan and surface a log afterward. Fazm's model, auto-accept with rollback, commits each action immediately but records the inverse so you can reach back and cancel any step you disagree with.

Approval-gate agents versus Fazm, side by side

Feature	Approval-gate agents	Fazm
Do I approve every click?	Yes, modal per action	No, cards appear post-action
What if the agent gets it wrong?	You caught it before it ran	Dismiss card, rollback SQL runs
Element targeting	Screenshot + vision model	Accessibility tree + AppleScript
Cost ceiling	Usually none, bill later	$10 built-in, then your own key
Multi-session concurrency	Usually blocks all sessions	Per-session lock via sendingSessionKeys
Stale OS permission handling	Silent failure after macOS update	isAccessibilityBroken flag surfaces it
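The concurrency row needs the most explanation. A per-session lock can be modeled as a set of keys for the sessions that are currently sending. A second send on a busy session is refused, and other sessions carry on. This is a minimal single-threaded Python sketch. SessionSendGate and its methods are hypothetical names. Only the idea of keying the lock by session comes from sendingSessionKeys.

```python
class SessionSendGate:
    """Hypothetical per-session send lock, modeled on the idea behind sendingSessionKeys."""

    def __init__(self) -> None:
        # Keys of sessions that currently have a message in flight.
        self.sending_session_keys: set[str] = set()

    def try_begin(self, session_key: str) -> bool:
        """Mark a session as sending; refuse only if that same session is busy."""
        if session_key in self.sending_session_keys:
            return False
        self.sending_session_keys.add(session_key)
        return True

    def finish(self, session_key: str) -> None:
        """Release the session so its next message can be sent."""
        self.sending_session_keys.discard(session_key)


gate = SessionSendGate()
print(gate.try_begin("mail"))    # True: first send on this session
print(gate.try_begin("mail"))    # False: same session still busy
print(gate.try_begin("notion"))  # True: other sessions are not blocked
gate.finish("mail")
print(gate.try_begin("mail"))    # True again after release
```

A real client would also need to guard the set against concurrent access, for example by confining it to one thread or actor. This sketch leaves that out.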

How a single action flows through the agent

When you tell Fazm "archive every Notion page older than 90 days" or "reply to this email with the number from the spreadsheet," the request walks through four stages. None of them is the screenshot-and-click-coordinates loop that most computer-use agents run. A window capture may travel with the prompt, but elements are located through the Accessibility tree, not by pixel position.

Inputs, the agent loop, and the rollback log

The anchor fact: observer_activity is a rollback log, not a queue

If you open the Fazm desktop source and jump to Desktop/Sources/Providers/ChatProvider.swift, lines 3200 through 3485 describe a polling loop called pollChatObserverCards(). It does not wait on your click. It fetches rows from a local table called observer_activity, and in the same transaction it updates them to status 'acted' with userResponse 'approve' before the card is even shown to you.

That is the moment where autonomy actually happens. The approval is a default, not a prompt. What makes the pattern safe instead of reckless is the neighbouring column, rollback_operations, which stores the SQL statements needed to undo the action. When you tap dismiss on a card, rollbackChatObserverOperations() (same file, line 3435) reads those statements and runs them.
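The sketch below uses Python's standard-library sqlite3 module to make the contract concrete. It is not Fazm's Swift/GRDB code. Several details are assumptions chosen for illustration: the notes table, the 'pending' starting status, the user_response column name, and the encoding of rollback_operations as a JSON list of [sql, params] pairs. The sketch shows three steps. The action and its inverse are written together. Polling defaults new rows to approved. Dismissing a row replays the stored inverse.

```python
import json
import sqlite3

# Hypothetical schema; the column names echo the text but are assumptions.
SCHEMA = """
CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, archived INTEGER NOT NULL DEFAULT 0);
CREATE TABLE observer_activity (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    user_response TEXT,
    rollback_operations TEXT NOT NULL  -- JSON list of [sql, params] pairs
);
"""


def archive_note(db: sqlite3.Connection, note_id: int) -> None:
    """Apply the action and record its inverse; both commit together or not at all."""
    with db:
        db.execute("UPDATE notes SET archived = 1 WHERE id = ?", (note_id,))
        undo = [["UPDATE notes SET archived = 0 WHERE id = ?", [note_id]]]
        db.execute(
            "INSERT INTO observer_activity (description, rollback_operations) VALUES (?, ?)",
            (f"Archived note {note_id}", json.dumps(undo)),
        )


def poll_cards(db: sqlite3.Connection) -> list[tuple[int, str]]:
    """Fetch new rows and mark them acted/approved before they are shown."""
    with db:
        rows = db.execute(
            "SELECT id, description FROM observer_activity WHERE status = 'pending'"
        ).fetchall()
        db.executemany(
            "UPDATE observer_activity SET status = 'acted', user_response = 'approve' WHERE id = ?",
            [(row_id,) for row_id, _ in rows],
        )
    return rows


def dismiss(db: sqlite3.Connection, activity_id: int) -> None:
    """Replay the stored inverse statements, then record the dismissal."""
    with db:
        (payload,) = db.execute(
            "SELECT rollback_operations FROM observer_activity WHERE id = ?", (activity_id,)
        ).fetchone()
        for sql, params in json.loads(payload):
            db.execute(sql, params)
        db.execute(
            "UPDATE observer_activity SET user_response = 'dismiss' WHERE id = ?", (activity_id,)
        )


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO notes (title) VALUES ('Q1 planning')")
db.commit()
archive_note(db, 1)          # the change is live immediately
cards = poll_cards(db)       # [(1, 'Archived note 1')], already approved
dismiss(db, cards[0][0])     # the stored inverse UPDATE runs
print(db.execute("SELECT archived FROM notes WHERE id = 1").fetchone())  # (0,)
```

The undo is stored in the same transaction as the change, so the undo is recorded state. If either statement fails, neither is committed.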

ChatProvider.swift (excerpt, line ~3200)

Numbers that make the model concrete

$10 USD built-in spend cap

A fixed cap on messages held in memory

A fixed poll interval for pollForNewMessages()

0 approval modals per action

These are not marketing numbers. They are the constants builtinCostCapUsd, maxMessagesInMemory, and the poll interval inside pollForNewMessages(), all in ChatProvider.swift.

Built-in cost cap

$10 of autonomy on us, then your key takes over.

Runaway agent loops are a billing problem first and a safety problem second. Fazm ships with a bundled Anthropic key you can hammer until the cumulative spend on your machine hits builtinCostCapUsd = 10.0. At that point the client switches to your personal OAuth or API key rather than silently eating the cost. Read the switch logic in ChatProvider.swift around line 2272.
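Here is a minimal Python sketch of that switch. It assumes spend is tracked as a running dollar total per machine. SpendTracker and record() are hypothetical names. The cap value and the 'builtin' and 'personal' bridge modes come from the text. The text does not say whether the real client flips at exactly $10.00 or only above it. This sketch flips as soon as the total reaches the cap.

```python
from dataclasses import dataclass

BUILTIN_COST_CAP_USD = 10.0  # mirrors builtinCostCapUsd = 10.0 from the text


@dataclass
class SpendTracker:
    """Hypothetical tracker: bundled key below the cap, the user's own key after."""

    cumulative_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        """Add the cost of one model call to the running total."""
        self.cumulative_usd += cost_usd

    @property
    def bridge_mode(self) -> str:
        # 'builtin' = bundled key, 'personal' = user's own OAuth or API key.
        return "builtin" if self.cumulative_usd < BUILTIN_COST_CAP_USD else "personal"


tracker = SpendTracker()
for _ in range(4):
    tracker.record(3.0)
    print(tracker.cumulative_usd, tracker.bridge_mode)
# prints 3.0 builtin, 6.0 builtin, 9.0 builtin, 12.0 personal (one per line)
```

The mode is derived from the running total, not from a toggle, so a runaway loop cannot keep spending bundled credit past the cap.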

Why Accessibility APIs, not screenshots

Screenshot agents re-discover the UI on every turn. The model stares at a PNG, guesses where the Submit button is, emits pixel coordinates, clicks. If the window moved between frames, it misses. If the button re-styled on hover, it misses. The resulting "oh sorry, let me try again" loop is the main reason approval gates exist in that family of products.

Fazm reads the macOS Accessibility tree directly. That tree is the same data VoiceOver uses to narrate a screen. It contains structured, addressable elements with roles, values, and children. When Fazm says "click the Send button in Messages," it is calling AX to locate the element by role and label, then handing off to NSAppleScript at AppState.swift line 1005 to act on it. Pixel position is irrelevant.
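The sketch below shows why lookup by role and label does not depend on layout. It does not call the real AXUIElement API, which would need Swift or PyObjC. Instead it models a tree snapshot as nested dictionaries keyed by real AX attribute names (AXRole, AXTitle, AXChildren). The snapshot contents and the find_element helper are hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical snapshot of part of a Messages window's Accessibility tree.
# Keys are real AX attribute names; the values are made up for illustration.
messages_window = {
    "AXRole": "AXWindow",
    "AXTitle": "Messages",
    "AXChildren": [
        {"AXRole": "AXTextArea", "AXTitle": "", "AXChildren": []},
        {
            "AXRole": "AXGroup",
            "AXTitle": "",
            "AXChildren": [
                {"AXRole": "AXButton", "AXTitle": "Send", "AXChildren": []},
            ],
        },
    ],
}


def find_element(root: dict, role: str, title: str) -> Optional[dict]:
    """Breadth-first search for the first element whose role and title both match."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.get("AXRole") == role and node.get("AXTitle") == title:
            return node
        queue.extend(node.get("AXChildren", []))
    return None


send_button = find_element(messages_window, "AXButton", "Send")
print(send_button is not None)  # True, wherever the window happens to sit on screen
```

No coordinates appear anywhere in the lookup. Moving or restyling the window leaves the result unchanged. Per the text, Fazm then hands the matched element to NSAppleScript to act on it. This sketch stops at the lookup.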

~/Library/Application Support/Fazm/app.sqlite

The agent loop, in four stages

Capture

ScreenCaptureManager.swift captures the focused app's window, located via that app's PID, with CGWindowListCreateImage(), falling back to CGDisplayCreateImage() on the main display. The Accessibility tree for that window is queried in parallel.

Reason

sendMessage() in ChatProvider.swift packs the AX tree, the screen capture, and your prompt, then calls the model. The bridgeMode is either 'builtin' (Fazm's bundled key) or 'personal' (your own), gated by builtinCostCapUsd.

Act

The model returns a tool call. ChatToolExecutor.swift runs it: NSAppleScript for cross-app control, SQL for local state, Playwright MCP for browser work. Before the card renders, the action is already live.

Record and expose undo

Each action's row in observer_activity carries a rollback_operations payload. pollChatObserverCards() fetches new rows, marks them status 'acted', and surfaces each one in your chat as a card. If you dismiss it, rollbackChatObserverOperations() runs the undo.
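In the Capture stage, CGWindowListCreateImage() takes a window ID, not a PID. A PID-based capture therefore has to map the PID to a window first. The Python sketch below shows that selection step. It assumes the window list has already been fetched with CGWindowListCopyWindowInfo and arrives in front-to-back order. The dictionary keys are the real key names, but the values and the pick_capture_target helper are hypothetical. This is not necessarily how ScreenCaptureManager.swift does it.

```python
from typing import Optional

# Entries shaped like CGWindowListCopyWindowInfo results (values are made up).
window_info = [
    {"kCGWindowOwnerPID": 812, "kCGWindowNumber": 41, "kCGWindowLayer": 25},  # status-bar item
    {"kCGWindowOwnerPID": 1337, "kCGWindowNumber": 97, "kCGWindowLayer": 0},
    {"kCGWindowOwnerPID": 1337, "kCGWindowNumber": 96, "kCGWindowLayer": 0},
]


def pick_capture_target(windows: list[dict], pid: int) -> tuple[str, Optional[int]]:
    """Return ("window", id) for the app's frontmost normal window, else ("display", None)."""
    for info in windows:  # assumed front-to-back order
        # Layer 0 is the normal window level; skip menus, panels, status items.
        if info.get("kCGWindowOwnerPID") == pid and info.get("kCGWindowLayer") == 0:
            return ("window", info["kCGWindowNumber"])
    # No usable window: fall back to the main display, as the text describes.
    return ("display", None)


print(pick_capture_target(window_info, 1337))  # ('window', 97)
print(pick_capture_target(window_info, 4242))  # ('display', None)
```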

What this model gets right, and where it bites

Feels fast because it is

No per-action modal means a 12-step task is 12 steps, not 12 steps plus 12 clicks. The latency floor is the tool call plus the rollback write, not your reaction time.

Undo is state, not a hope

rollback_operations is deterministic SQL. It is not 'the agent tries to remember what it did.' The undo is written at the same instant as the do.

Cost is bounded by default

builtinCostCapUsd is a constant, not a toggle. You cannot accidentally spend $400 on a runaway loop before noticing.

Irreversible actions still need care

A SQL rollback cannot un-send an email or un-delete a row on a remote server. For those, Fazm still falls back to confirmation. The model is auto-accept for local state, not for the whole internet.

You must read the card feed

The tradeoff of zero approval modals is that you are skimming a post-hoc activity feed. If you ignore it entirely, you lose the leverage that made the model safe.

Stale permissions are loud now

After a macOS update, TCC sometimes revokes Screen Recording or Accessibility without telling the app. Fazm surfaces this via isScreenRecordingStale and isAccessibilityBroken so the agent does not silently degrade.
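The caveat about irreversible actions amounts to a small dispatch policy. Reversible local changes run immediately and get an undo card. Irreversible side effects such as sending an email ask first. The Python sketch below assumes each action is tagged with a reversible flag up front. The Action type, the flag, and the confirm callback are hypothetical and not taken from ChatToolExecutor.swift.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    description: str
    reversible: bool  # hypothetical flag: True when a local inverse can be recorded


def execute(action: Action, run: Callable[[], None], confirm: Callable[[str], bool]) -> str:
    """Auto-accept reversible actions; require confirmation for irreversible ones."""
    if not action.reversible and not confirm(action.description):
        return "skipped"
    run()
    return "done (undo card)" if action.reversible else "done (no undo)"


log: list[str] = []
archive = Action("Archive local note 12", reversible=True)
send = Action("Send reply to the quarterly email thread", reversible=False)
print(execute(archive, lambda: log.append("archived"), confirm=lambda d: False))  # done (undo card)
print(execute(send, lambda: log.append("sent"), confirm=lambda d: False))         # skipped
print(log)  # ['archived']
```

The confirm callback is never consulted for reversible actions. That matches the text's description: auto-accept for local state, confirmation only where no rollback exists.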

What ships with the Mac, not the model

The autonomy model depends on macOS primitives you can name. Fazm does not reinvent them; it composes them.

CGWindowListCreateImage, NSAppleScript, AXUIElement, NSWorkspace, TCC (Screen Recording), TCC (Accessibility), GRDB (SQLite), Playwright MCP, CGDisplayCreateImage

Try the execute-first model on your Mac

Fazm is free to start. First $10 of Claude usage on us, then bring your own key. No approval modals, no screenshot guesswork, full rollback log you can inspect.


Frequently asked questions

What does 'macOS desktop agent autonomy' actually mean in 2026?

It means an AI agent runs on your Mac, decides which apps to touch, and executes multi-step tasks without you approving every individual click. The hard part is not the deciding, it is the execution contract: does the agent pause on every action, batch them, or commit first and let you roll back. Fazm takes the third approach.

How is Fazm different from Claude Computer Use or OpenAI Operator?

Computer Use and Operator are screenshot-first. They take a picture of your screen, send it to a vision model, and get back pixel coordinates to click. Fazm reads the real Accessibility tree on macOS and calls AppleScript and native APIs to act, so element selection does not depend on the model re-recognizing a UI from an image each turn.

Does Fazm ask me to approve every action?

No. Every tool call lands in a local SQLite table called observer_activity with status 'acted' and userResponse 'approve'. If you disagree, you tap dismiss and it runs the rollback_operations field, which is SQL that undoes whatever state the agent changed. The code for this is pollChatObserverCards() and rollbackChatObserverOperations() in ChatProvider.swift.

What stops it from burning through Claude credits?

A constant named builtinCostCapUsd, set to 10.0 (US dollars) in ChatProvider.swift. When the cumulative spend on the bundled Anthropic key crosses $10, the app automatically switches to your personal API key (or OAuth-linked Claude account) rather than silently eating the cost.

What macOS permissions do I need to grant?

Screen Recording (for CGWindowListCreateImage to capture individual app windows), Accessibility (for AX queries and keystroke injection), and App Management for installer paths. Fazm detects stale permissions after macOS updates via isScreenRecordingStale and isAccessibilityBroken flags so it tells you when the system quietly revoked a grant.

Is this open source? Can I audit the rollback code myself?

Yes. The Fazm desktop client is open source. The files cited in this page are at Desktop/Sources/Providers/ChatProvider.swift, AppState.swift, and FloatingControlBar/ScreenCaptureManager.swift in the Fazm repository.

What happens when Fazm gets something wrong mid-task?

Because every action is logged before you see it, the recovery model is read-then-undo rather than stop-and-prompt. You dismiss the activity card, rollback_operations runs the inverse SQL, and the agent loop (sendMessage() in ChatProvider.swift) dequeues the next pending message. The retry is user-initiated, not model-initiated.

Read the source

Verify every claim on this page

The file paths cited throughout this guide are real. If you clone the Fazm desktop repo, jump to Desktop/Sources/Providers/ChatProvider.swift and search for builtinCostCapUsd, pollChatObserverCards, and rollbackChatObserverOperations. All three should be in the same file. That triangulation is the whole autonomy contract.
