macOS Desktop Agent Autonomy, Without Approval Fatigue
Most "autonomous" Mac agents aren't. They stop and ask you to approve every click, or they run headless and hope nothing breaks. Fazm takes a third route: every action lands in a local SQLite table the instant it runs, along with the exact SQL needed to undo it. You dismiss what you don't want. This page walks through that contract, end to end, with the file paths you can audit.
The autonomy spectrum, honestly labelled
"Autonomous" is doing a lot of work in product copy right now. Under the hood, every Mac-capable agent picks one of three execution contracts. They are not equivalent and they feel very different at the keyboard.
Approval-gate agents block on each tool call and wait for your click. Fire-and-forget agents run the whole plan and surface a log afterward. Fazm's model, auto-accept with rollback, commits each action immediately but records the inverse so you can reach back and cancel any step you disagree with.
Approval-gate agents vs. Fazm, side by side
| Feature | Approval-gate agents | Fazm |
|---|---|---|
| Do I approve every click? | Yes, modal per action | No, cards appear post-action |
| What if the agent gets it wrong? | You catch it before it runs | Dismiss the card; the rollback SQL runs |
| Element targeting | Screenshot + vision model | Accessibility tree + AppleScript |
| Cost ceiling | Usually none, bill later | $10 built-in, then your own key |
| Multi-session concurrency | Usually blocks all sessions | Per-session lock via sendingSessionKeys |
| Stale OS permission handling | Silent failure after macOS update | isAccessibilityBroken flag surfaces it |
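
The per-session lock in the table can be pictured as a set holding the keys of sessions that are mid-send. The Swift sketch below is written under that assumption. Only the name sendingSessionKeys comes from this page. The SessionSendGate type, its two methods, and the choice of a Set<String> guarded by NSLock are hypothetical.

```swift
import Foundation

/// Hypothetical gate: at most one in-flight send per session; other sessions are unaffected.
final class SessionSendGate {
    // Assumed shape: keys of sessions that currently have a message in flight.
    private var sendingSessionKeys = Set<String>()
    private let lock = NSLock()

    /// Returns true if the caller may start sending for `sessionKey`.
    func tryBeginSending(_ sessionKey: String) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        // `inserted` is false when the key was already present, i.e. the session is busy.
        return sendingSessionKeys.insert(sessionKey).inserted
    }

    /// Releases the session once its reply has finished.
    func endSending(_ sessionKey: String) {
        lock.lock()
        defer { lock.unlock() }
        sendingSessionKeys.remove(sessionKey)
    }
}

let gate = SessionSendGate()
let firstSend = gate.tryBeginSending("session-A")  // true
let duplicate = gate.tryBeginSending("session-A")  // false: A is busy
let other = gate.tryBeginSending("session-B")      // true: B is independent
gate.endSending("session-A")
let again = gate.tryBeginSending("session-A")      // true: A was released
print(firstSend, duplicate, other, again)
```

The design point is that a busy session rejects only its own second send, so one long task does not freeze every other chat.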
How a single action flows through the agent
When you tell Fazm "archive every Notion page older than 90 days" or "reply to this email with the number from the spreadsheet," the request walks through four stages. None of them look like the screenshot-paint-click loop that most computer-use agents run.
Diagram: inputs, the agent loop, and the rollback log
The anchor fact: observer_activity is a rollback log, not a queue
If you open the Fazm desktop source and jump to Desktop/Sources/Providers/ChatProvider.swift, lines 3200 through 3485 describe a polling loop called pollChatObserverCards(). It does not wait on your click. It fetches rows from a local table called observer_activity, and in the same transaction it updates them to status 'acted' with userResponse 'approve' before the card is even shown to you.
That is the moment where autonomy actually happens. The approval is a default, not a prompt. What makes the pattern safe instead of reckless is the neighbouring column, rollback_operations, which stores the SQL statements needed to undo the action. When you tap dismiss on a card, rollbackChatObserverOperations() (same file, line 3435) reads those statements and runs them.
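The commit-then-undo contract can be modelled in a few lines. The Swift sketch below is a simplified, in-memory stand-in for the SQLite table. The following names come from this page: observer_activity, the status 'acted', the userResponse 'approve', and rollback_operations. The rest is assumed for illustration:

- the ActivityRow and ObserverLog types
- the 'pending', 'rolled_back', and 'dismiss' values
- the injected executeSQL closure

A real implementation would read and update the table inside one transaction rather than mutate an array.

```swift
import Foundation

/// One row of a hypothetical in-memory stand-in for observer_activity.
struct ActivityRow {
    let id: Int
    let summary: String
    var status: String               // assumed "pending" until the poller picks it up
    var userResponse: String?        // set to "approve" by default on pickup
    let rollbackOperations: [String] // SQL that undoes the action, stored with it
}

final class ObserverLog {
    private(set) var rows: [ActivityRow] = []
    private let executeSQL: (String) -> Void

    init(executeSQL: @escaping (String) -> Void) {
        self.executeSQL = executeSQL
    }

    /// Called right after an action runs: the undo is stored alongside the record of the do.
    func record(id: Int, summary: String, rollback: [String]) {
        rows.append(ActivityRow(id: id, summary: summary, status: "pending",
                                userResponse: nil, rollbackOperations: rollback))
    }

    /// Analogue of the poller: fetch new rows and mark them acted/approve
    /// in the same step, before any card is shown.
    func pollCards() -> [ActivityRow] {
        var cards: [ActivityRow] = []
        for i in rows.indices where rows[i].status == "pending" {
            rows[i].status = "acted"
            rows[i].userResponse = "approve"
            cards.append(rows[i])
        }
        return cards
    }

    /// Dismissing a card replays its stored rollback SQL.
    func dismiss(id: Int) {
        guard let i = rows.firstIndex(where: { $0.id == id }),
              rows[i].status == "acted" else { return }
        rows[i].rollbackOperations.forEach(executeSQL)
        rows[i].status = "rolled_back"   // hypothetical status value
        rows[i].userResponse = "dismiss" // hypothetical response value
    }
}

var executed: [String] = []
let activityLog = ObserverLog { executed.append($0) }
activityLog.record(id: 1, summary: "Archived 3 pages",
                   rollback: ["UPDATE pages SET archived = 0 WHERE id IN (4, 7, 9);"])
let cards = activityLog.pollCards()
activityLog.dismiss(id: 1)
print(cards.count, executed)
```

Approval here is a state transition the poller applies on its own. The user's only input is the optional dismiss, which replays SQL that was already recorded.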
Numbers that make the model concrete
These are not marketing numbers. They are the constants builtinCostCapUsd, maxMessagesInMemory, and the poll interval inside pollForNewMessages(), all in ChatProvider.swift.
Built-in cost cap
$10 of autonomy on us, then your key takes over.
Runaway agent loops are a billing problem first and a safety problem second. Fazm ships with a bundled Anthropic key you can hammer until the cumulative spend on your machine hits builtinCostCapUsd = 10.0. At that point the client switches to your personal API key or OAuth-linked Claude account instead of silently eating the cost. Read the switch logic in ChatProvider.swift around line 2272.
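The switch can be sketched as a running total compared against the cap. In this minimal Swift example, the value of builtinCostCapUsd and the 'builtin'/'personal' bridge modes are taken from this page. The CostTracker type, the recordSpend method, and the choice of >= for "hits the cap" are assumptions, not Fazm's code. Persisting the total across launches is also left out.

```swift
import Foundation

enum BridgeMode: String {
    case builtin   // Fazm's bundled key
    case personal  // the user's own API key or OAuth-linked account
}

struct CostTracker {
    // Cap value from this page; the surrounding type is hypothetical.
    static let builtinCostCapUsd = 10.0
    var cumulativeSpendUsd = 0.0

    /// The bundled key is used only while spend stays below the cap.
    var bridgeMode: BridgeMode {
        cumulativeSpendUsd >= CostTracker.builtinCostCapUsd ? .personal : .builtin
    }

    mutating func recordSpend(_ usd: Double) {
        cumulativeSpendUsd += usd
    }
}

var tracker = CostTracker()
tracker.recordSpend(6.25)
let modeBefore = tracker.bridgeMode  // .builtin: $6.25 spent
tracker.recordSpend(4.00)
let modeAfter = tracker.bridgeMode   // .personal: $10.25 crosses the $10 cap
print(modeBefore.rawValue, modeAfter.rawValue)
```

Because the mode is computed from the total rather than stored as a toggle, there is no state in which the bundled key keeps running past the cap.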
Why Accessibility APIs, not screenshots
Screenshot agents re-discover the UI on every turn. The model stares at a PNG, guesses where the Submit button is, emits pixel coordinates, clicks. If the window moved between frames, it misses. If the button re-styled on hover, it misses. The resulting "oh sorry, let me try again" loop is the main reason approval gates exist in that family of products.
Fazm reads the macOS Accessibility tree directly. That tree is the same data VoiceOver uses to narrate a screen. It contains structured, addressable elements with roles, values, and children. When Fazm says "click the Send button in Messages," it is calling AX to locate the element by role and label, then handing off to NSAppleScript at AppState.swift line 1005 to act on it. Pixel position is irrelevant.
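To make the contrast concrete, here is a simplified Swift model of a lookup by role and label. It does not call the real Accessibility API; on macOS that would go through AXUIElement functions such as AXUIElementCopyAttributeValue. Instead, a hypothetical AXNode struct stands in for a tree node, with role strings in the AX naming style ("AXButton"). The match depends on the tree's structure, not on where the button is drawn.

```swift
import Foundation

/// Hypothetical stand-in for an Accessibility element: role, label, children.
struct AXNode {
    let role: String        // e.g. "AXButton", following AX role naming
    let label: String?      // title or description exposed to assistive tech
    var children: [AXNode] = []
}

/// Depth-first search for the first element with a matching role and label.
func findElement(role: String, label: String, in node: AXNode) -> AXNode? {
    if node.role == role && node.label == label {
        return node
    }
    for child in node.children {
        if let match = findElement(role: role, label: label, in: child) {
            return match
        }
    }
    return nil
}

// A toy window: the Send button is found the same way wherever it is drawn.
let window = AXNode(role: "AXWindow", label: "Messages", children: [
    AXNode(role: "AXTextArea", label: "Message"),
    AXNode(role: "AXGroup", label: nil, children: [
        AXNode(role: "AXButton", label: "Attach"),
        AXNode(role: "AXButton", label: "Send"),
    ]),
])

let send = findElement(role: "AXButton", label: "Send", in: window)
print(send?.label ?? "not found")
```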
The agent loop, in four stages
Capture
ScreenCaptureManager.swift calls CGWindowListCreateImage() to capture the focused app's window, falling back to CGDisplayCreateImage() on the main display. That API takes a window ID, so the app's PID has to be resolved to its window first. The Accessibility tree for that window is queried in parallel.
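The capture fallback can be sketched as a small decision function. This is a hypothetical model, not the code in ScreenCaptureManager.swift. WindowInfo mirrors three fields that CGWindowListCopyWindowInfo reports per window: window number, owner PID, and layer. CaptureTarget names the two outcomes. The sketch assumes the window list is ordered front to back, as the window server returns on-screen windows.

```swift
import Foundation

/// Hypothetical subset of the per-window info the window server reports.
struct WindowInfo {
    let windowNumber: UInt32  // CGWindowID used to capture a single window
    let ownerPID: Int32       // process that owns the window
    let layer: Int            // 0 is the normal application window layer
}

enum CaptureTarget: Equatable {
    case window(UInt32)  // capture just this window
    case mainDisplay     // fallback: capture the whole main display
}

/// Picks the frontmost normal-layer window owned by `pid`,
/// falling back to the main display if the app has none.
func captureTarget(forPID pid: Int32, windows: [WindowInfo]) -> CaptureTarget {
    // Assumes `windows` is ordered front to back.
    if let front = windows.first(where: { $0.ownerPID == pid && $0.layer == 0 }) {
        return .window(front.windowNumber)
    }
    return .mainDisplay
}

let onScreen = [
    WindowInfo(windowNumber: 12, ownerPID: 300, layer: 25),  // not a normal app window
    WindowInfo(windowNumber: 48, ownerPID: 512, layer: 0),
    WindowInfo(windowNumber: 51, ownerPID: 512, layer: 0),
]
print(captureTarget(forPID: 512, windows: onScreen))  // window(48)
print(captureTarget(forPID: 999, windows: onScreen))  // mainDisplay
```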
Reason
sendMessage() in ChatProvider.swift packs the AX tree, the screen capture, and your prompt, then calls the model. The bridgeMode is either 'builtin' (Fazm's bundled key) or 'personal' (your own), gated by builtinCostCapUsd.
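One way to picture what sendMessage() assembles is a single request value holding the three inputs plus the bridge mode. The following are assumptions for illustration, not the real wire format used by ChatProvider.swift:

- the ModelRequest type and its field names
- base64 encoding of the screenshot
- JSON encoding via Codable
- the capUsd parameter

```swift
import Foundation

/// Hypothetical shape of one model request built by the Reason stage.
struct ModelRequest: Codable {
    let prompt: String
    let accessibilityTree: String    // serialized AX tree of the focused window
    let screenshotPNGBase64: String  // screen capture, base64-encoded
    let bridgeMode: String           // "builtin" or "personal"
}

func packRequest(prompt: String, axTree: String, png: Data,
                 spentUsd: Double, capUsd: Double = 10.0) throws -> Data {
    let request = ModelRequest(
        prompt: prompt,
        accessibilityTree: axTree,
        screenshotPNGBase64: png.base64EncodedString(),
        // Cap rule from this page: bundled key below the cap, own key at or over it.
        bridgeMode: spentUsd < capUsd ? "builtin" : "personal"
    )
    return try JSONEncoder().encode(request)
}

let body = try packRequest(prompt: "Reply with the total from the sheet",
                           axTree: "AXWindow > AXButton(Send)",
                           png: Data([0x89, 0x50, 0x4E, 0x47]),
                           spentUsd: 3.5)
let decoded = try JSONDecoder().decode(ModelRequest.self, from: body)
print(decoded.bridgeMode, decoded.screenshotPNGBase64)
```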
Act
The model returns a tool call. ChatToolExecutor.swift runs it: NSAppleScript for cross-app control, SQL for local state, Playwright MCP for browser work. Before the card renders, the action is already live.
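The Act stage is essentially a dispatcher. In this hypothetical Swift sketch, the three backends named above become enum cases. Each handler is injected as a closure so the example runs anywhere. In the real app, ChatToolExecutor.swift would call NSAppleScript, the local database, or the Playwright MCP server instead. The ToolCall and ToolExecutor names are illustrative only.

```swift
import Foundation

/// Hypothetical tool calls, one case per backend named in the Act stage.
enum ToolCall {
    case appleScript(source: String)  // cross-app control
    case sql(statement: String)       // local state
    case browser(action: String)      // browser work via Playwright MCP
}

struct ToolExecutor {
    // Injected handlers so the sketch runs without macOS or a browser.
    let runAppleScript: (String) -> String
    let runSQL: (String) -> String
    let runBrowser: (String) -> String

    /// Routes the call to its backend and returns that backend's result.
    func execute(_ call: ToolCall) -> String {
        switch call {
        case .appleScript(let source): return runAppleScript(source)
        case .sql(let statement):      return runSQL(statement)
        case .browser(let action):     return runBrowser(action)
        }
    }
}

let executor = ToolExecutor(
    runAppleScript: { "applescript: \($0)" },
    runSQL: { "sql: \($0)" },
    runBrowser: { "browser: \($0)" }
)
let result = executor.execute(.sql(statement: "UPDATE notes SET pinned = 1 WHERE id = 3;"))
print(result)  // sql: UPDATE notes SET pinned = 1 WHERE id = 3;
```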
Record and expose undo
pollChatObserverCards() picks up the action's row in observer_activity, which carries a rollback_operations payload, and marks it status 'acted'. That row is what surfaces in your chat as a card. If you dismiss it, rollbackChatObserverOperations() runs the undo.
What this model gets right, and where it bites
Feels fast because it is
No per-action modal means a 12-step task is 12 steps, not 12 steps plus 12 clicks. The latency floor is the tool call plus the rollback write, not your reaction time.
Undo is state, not a hope
rollback_operations is deterministic SQL. It is not 'the agent tries to remember what it did.' The undo is written at the same instant as the do.
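For a simple column update, "the undo is written at the same instant as the do" can mean this: read the old value first, then emit the forward statement and its inverse together. This Swift helper is hypothetical. It handles only single-row UPDATEs of a text column, splices quoted literals, and trusts the table and column names. It is not necessarily how Fazm builds rollback_operations.

```swift
import Foundation

/// A forward statement paired with the statement that reverses it.
struct ReversibleChange {
    let forward: String
    let rollback: String
}

/// Escapes a text value as a SQL string literal by doubling single quotes.
func sqlLiteral(_ value: String) -> String {
    "'" + value.replacingOccurrences(of: "'", with: "''") + "'"
}

/// Builds an UPDATE and its inverse from the value read before the change.
/// Hypothetical helper: single row, single text column, trusted identifiers.
func reversibleUpdate(table: String, column: String, id: Int,
                      oldValue: String, newValue: String) -> ReversibleChange {
    ReversibleChange(
        forward: "UPDATE \(table) SET \(column) = \(sqlLiteral(newValue)) WHERE id = \(id);",
        rollback: "UPDATE \(table) SET \(column) = \(sqlLiteral(oldValue)) WHERE id = \(id);"
    )
}

let change = reversibleUpdate(table: "tasks", column: "status", id: 42,
                              oldValue: "open", newValue: "done")
print(change.forward)   // UPDATE tasks SET status = 'done' WHERE id = 42;
print(change.rollback)  // UPDATE tasks SET status = 'open' WHERE id = 42;
```

A production version would bind parameters instead of splicing literals. It would also write the forward change and the rollback row in one transaction, so neither can exist without the other.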
Cost is bounded by default
builtinCostCapUsd is a constant, not a toggle. You cannot accidentally spend $400 on a runaway loop before noticing.
Irreversible actions still need care
A SQL rollback cannot un-send an email or un-delete a row on a remote server. For those, Fazm still falls back to confirmation. The model is auto-accept for local state, not for the whole internet.
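The split between auto-accept and confirmation can be expressed as a policy check. In this hypothetical Swift sketch, an action touching a remote system, such as sending an email, is routed to confirmation. A local action auto-runs only if an inverse has been recorded for it. That second condition is an assumption added for illustration. The ActionScope, ProposedAction, and needsConfirmation names are not Fazm's API.

```swift
import Foundation

enum ActionScope {
    case localState  // e.g. a row in a local SQLite database
    case remote      // e.g. sending an email or deleting on a server
}

struct ProposedAction {
    let description: String
    let scope: ActionScope
    let rollbackSQL: [String]  // empty if no inverse could be recorded
}

/// Auto-accept only when the action is local and its undo is already known.
func needsConfirmation(_ action: ProposedAction) -> Bool {
    switch action.scope {
    case .remote:
        return true                        // local SQL cannot reach the remote side
    case .localState:
        return action.rollbackSQL.isEmpty  // no recorded inverse, so ask first
    }
}

let archive = ProposedAction(description: "Archive note 7", scope: .localState,
                             rollbackSQL: ["UPDATE notes SET archived = 0 WHERE id = 7;"])
let email = ProposedAction(description: "Send reply to Dana", scope: .remote,
                           rollbackSQL: [])
print(needsConfirmation(archive), needsConfirmation(email))  // false true
```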
You must read the card feed
The tradeoff of zero approval modals is that you are skimming a post-hoc activity feed. If you ignore it entirely, you lose the leverage that made the model safe.
Stale permissions are loud now
After a macOS update, TCC sometimes revokes Screen Recording or Accessibility without telling the app. Fazm surfaces this via isScreenRecordingStale and isAccessibilityBroken so the agent does not silently degrade.
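One plausible way to catch the "granted but not working" state is to compare what the system reports with what a real probe does. On macOS, the reported state could come from AXIsProcessTrusted() and CGPreflightScreenCaptureAccess(). The probes would be a trial AX query and a trial capture. The Swift sketch below keeps those inputs as plain Booleans so it runs anywhere. The approach is an assumption, not the logic behind Fazm's isScreenRecordingStale and isAccessibilityBroken.

```swift
import Foundation

/// Inputs a permission check might gather (hypothetical shape).
struct PermissionSnapshot {
    let accessibilityReportedGranted: Bool    // e.g. from AXIsProcessTrusted()
    let accessibilityProbeSucceeded: Bool     // a trial AX query actually worked
    let screenRecordingReportedGranted: Bool  // e.g. from CGPreflightScreenCaptureAccess()
    let screenCaptureProbeSucceeded: Bool     // a trial capture returned real content
}

struct PermissionHealth {
    let isAccessibilityBroken: Bool
    let isScreenRecordingStale: Bool
}

/// Flags the case where the system says "granted" but the capability fails.
/// A plain "not granted" is treated as a separate, ordinary state here.
func assess(_ s: PermissionSnapshot) -> PermissionHealth {
    PermissionHealth(
        isAccessibilityBroken: s.accessibilityReportedGranted && !s.accessibilityProbeSucceeded,
        isScreenRecordingStale: s.screenRecordingReportedGranted && !s.screenCaptureProbeSucceeded
    )
}

let afterUpdate = PermissionSnapshot(accessibilityReportedGranted: true,
                                     accessibilityProbeSucceeded: false,
                                     screenRecordingReportedGranted: true,
                                     screenCaptureProbeSucceeded: true)
let health = assess(afterUpdate)
print(health.isAccessibilityBroken, health.isScreenRecordingStale)  // true false
```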
What ships with the Mac, not the model
The autonomy model depends on macOS primitives you can name. Fazm does not reinvent them; it composes them.
Try the execute-first model on your Mac
Fazm is free to start. First $10 of Claude usage on us, then bring your own key. No approval modals, no screenshot guesswork, full rollback log you can inspect.
Frequently asked questions
What does 'macOS desktop agent autonomy' actually mean in 2026?
It means an AI agent runs on your Mac, decides which apps to touch, and executes multi-step tasks without you approving every individual click. The hard part is not the deciding but the execution contract: does the agent pause on every action, batch them, or commit first and let you roll back? Fazm takes the third approach.
How is Fazm different from Claude Computer Use or OpenAI Operator?
Computer Use and Operator are screenshot-first. They take a picture of your screen, send it to a vision model, and get back pixel coordinates to click. Fazm reads the real Accessibility tree on macOS and calls AppleScript and native APIs to act, so element selection does not depend on the model re-recognizing a UI from an image each turn.
Does Fazm ask me to approve every action?
No. Every tool call lands in a local SQLite table called observer_activity with status 'acted' and userResponse 'approve'. If you disagree, you dismiss the card and Fazm runs the SQL stored in its rollback_operations field, which undoes whatever state the agent changed. The code for this is pollChatObserverCards() and rollbackChatObserverOperations() in ChatProvider.swift.
What stops it from burning through Claude credits?
A constant named builtinCostCapUsd set to $10.0 in ChatProvider.swift. When the cumulative spend on the bundled Anthropic key crosses $10, the app auto-switches to your personal API key (or OAuth-linked Claude account) instead of silently eating cost.
What macOS permissions do I need to grant?
Screen Recording (for CGWindowListCreateImage to capture individual app windows), Accessibility (for AX queries and keystroke injection), and App Management (for installer paths). Fazm detects stale permissions after macOS updates via the isScreenRecordingStale and isAccessibilityBroken flags, so it tells you when the system has quietly revoked a grant.
Is this open source? Can I audit the rollback code myself?
Yes. The Fazm desktop client is open source. The files cited in this page are at Desktop/Sources/Providers/ChatProvider.swift, AppState.swift, and FloatingControlBar/ScreenCaptureManager.swift in the Fazm repository.
What happens when Fazm gets something wrong mid-task?
Because every action is logged before you see it, the recovery model is read-then-undo rather than stop-and-prompt. You dismiss the activity card, Fazm runs the inverse SQL stored in rollback_operations, and the agent loop (sendMessage() in ChatProvider.swift) dequeues the next pending message. The retry is user-initiated, not model-initiated.
Read the source
Verify every claim on this page
The file paths cited throughout this guide are real. If you clone the Fazm desktop repo, jump to Desktop/Sources/Providers/ChatProvider.swift and search for builtinCostCapUsd, pollChatObserverCards, and rollbackChatObserverOperations. All three should be in the same file. That triangulation is the whole autonomy contract.