
Desktop AI agent, beyond demos: the macOS edge cases a shipping Mac agent has to ship code for

Every demo of a desktop AI agent runs on a freshly granted permission, a compliant app, and a network that never stalls. Every real install eventually hits the opposite of each of those three conditions. This guide walks through the exact code Fazm ships so the opposite conditions do not surface as bugs: a Finder fallback AX probe, a CGEvent tap TCC probe, and a three-tier tool timeout watchdog. All three are in the public source tree.

Fazm
Every claim anchored to a specific line in the public Fazm source
Covers AXIsProcessTrusted stale cache, cannotComplete disambiguation, and MCP tool hang recovery
Written for someone who is evaluating a shipping Mac agent, not watching a screen recording

The demo-to-reality gap, by the numbers

A demo has one app, one permission grant, one tool call at a time. A real Mac agent installation has to survive hundreds of permission revalidations per day across dozens of apps, each with its own AX compliance quirks, and a long tail of tool calls that can hang on a flaky network.

5-second poll interval for permission health
3 retries before declaring permission broken
10s timeout tier for internal tools
120s timeout tier for MCP tools

These four numbers are not marketing. They are the actual values at AppState.swift:308 and acp-bridge/src/index.ts:77-78.
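
The first two numbers describe a poll-and-retry loop. The sketch below is a minimal TypeScript model of that bookkeeping, not the code at AppState.swift:308 (which is Swift). The class name, the `record` method, and the reading of "3 retries" as three failed re-checks after an initial failure are all assumptions made for illustration.

```typescript
// Hypothetical sketch of poll-and-retry bookkeeping; not the Fazm implementation.
const POLL_INTERVAL_MS = 5_000; // 5 second poll, per the article
const MAX_RETRIES = 3; // retries before declaring permission broken

class PermissionHealthTracker {
  private consecutiveFailures = 0;

  // Feed one poll result; returns true once permission should be treated as broken.
  record(checkPassed: boolean): boolean {
    if (checkPassed) {
      this.consecutiveFailures = 0; // any success resets the retry budget
      return false;
    }
    this.consecutiveFailures += 1;
    // Assumption: the first failure is the initial check, the next MAX_RETRIES are retries.
    return this.consecutiveFailures > MAX_RETRIES;
  }
}

// Drives the tracker from a timer; returns a function that stops the polling.
function startPolling(check: () => boolean, onBroken: () => void): () => void {
  const tracker = new PermissionHealthTracker();
  const handle = setInterval(() => {
    if (tracker.record(check())) onBroken();
  }, POLL_INTERVAL_MS);
  return () => clearInterval(handle);
}
```

Resetting the counter on any success is what keeps one transient failure from accumulating toward a false "broken" verdict over a long session.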

Two probes, one decision

2 independent signals before 1 permission flip

Fazm does not trust a single AX failure to declare permission broken. It runs a Finder AX probe and, if Finder is not running, a CGEvent tap TCC probe. Only when both signals agree does it flip isAccessibilityBroken. This is why users do not see spurious recovery prompts when they briefly focus an Electron app with a flaky AX tree.

What actually breaks a demo in the wild

Three failure modes account for almost all user-visible desktop agent regressions after a product ships. None of them appears in a conference demo, because a conference demo runs for 90 seconds on a freshly provisioned machine. The demo side of that comparison looks like this.

Demo vs reality

Fresh macOS install, permission granted 30 seconds ago. One app open, compliant AX tree, immediate response. Tool calls return in under a second. The screen recording looks magical.

  • AXIsProcessTrusted() returns true on first call
  • One app in focus, known to have a clean AX tree
  • Tool calls hit cached state, return instantly
  • No macOS update, no app re-sign, no TCC churn

Anchor fact: the Finder fallback probe

The whole demo-to-reality gap begins with AXError.cannotComplete. The naive read is that it means permission is broken. The real read is that it could just as easily mean the target app's accessibility tree is non-compliant or temporarily unresponsive. Fazm ships a disambiguator. It is 19 lines long and lives in the public source tree at Desktop/Sources/AppState.swift, starting at line 468. The shape is a second AX call against com.apple.finder, a known-good AX-compliant app. If Finder also fails, permission is really broken. If Finder succeeds, the original failure was app-specific.
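
The decision described above reduces to a three-way classification of the Finder probe's outcome. The sketch below models only that logic in TypeScript; the real probe is Swift and calls the AX APIs directly. The type names and the `classifyCannotComplete` function are hypothetical.

```typescript
// Hypothetical model of the Finder fallback decision; all names are illustrative.
type FinderProbeOutcome = "success" | "cannotComplete" | "finderNotRunning";
type Verdict = "appSpecific" | "permissionBroken" | "needsSecondProbe";

// Called only after the frontmost app has already returned AXError.cannotComplete.
function classifyCannotComplete(finder: FinderProbeOutcome): Verdict {
  switch (finder) {
    case "success":
      return "appSpecific"; // Finder answered, so the original app is at fault
    case "cannotComplete":
      return "permissionBroken"; // a known-good app failed too
    case "finderNotRunning":
      return "needsSecondProbe"; // no reference app available, ask another signal
  }
}
```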

2 probes

AXError.cannotComplete confirmed by Finder — permission is truly stuck

AppState.swift:474 (log line emitted when both probes fail)

Desktop/Sources/AppState.swift

The else-branch at the bottom of that function is the interesting one. If Finder happens not to be running, Fazm does not give up. It falls through to a second, independent TCC probe that checks the live permission database directly.

The second probe: a listen-only CGEvent tap

AppState.swift:490 handles the case where AXIsProcessTrusted() lies, which is the actual failure mode on macOS 26 Tahoe after an OS update or an app re-sign. The trick is that CGEvent.tapCreate checks the live TCC database on every call, instead of the per-process cache that AXIsProcessTrusted() reads. A successful tap creation means permission is real, regardless of what the cache says. The tap is created in listen-only mode and invalidated immediately, so it does not consume system resources.

Desktop/Sources/AppState.swift

This pattern is not in any desktop-agent marketing page. It is a small block of code that absorbs a large amount of user pain, specifically the pain that ships alongside every macOS point update.

How the two probes cooperate on a live action

The sequence below runs every time the agent action path returns AXError.cannotComplete. It is the difference between a product that silently disables itself after a macOS update and one that keeps working.

1. Action fires against the frontmost app

Swift-side AppState requests kAXFocusedWindowAttribute on AXUIElementCreateApplication(frontApp.pid). On a healthy permission state this returns in under a millisecond and the agent loop continues.

2. AXError.cannotComplete surfaces

The call returns .cannotComplete. This could mean the TCC permission is broken, or it could mean the current app happens to have a partial or non-compliant accessibility tree (Electron apps, Catalyst apps, older cross-platform frameworks).

3. Finder AX probe runs (AppState.swift:468)

Fazm pulls NSRunningApplication for com.apple.finder and tries the same kAXFocusedWindowAttribute call against it. Finder is Apple-shipped and AX-compliant, so success there means the original failure was app-specific and the agent keeps running.

4. CGEvent tap TCC probe as fallback (AppState.swift:490)

If Finder is not running, Fazm calls CGEvent.tapCreate with .listenOnly. A successful tap creation means the live TCC database grants permission, even if the per-process AXIsProcessTrusted() cache says otherwise. The tap is invalidated immediately after the check.

5. State flip and user-visible recovery (only if both probes fail)

Only when both probes agree that permission is stuck does Fazm flip isAccessibilityBroken to true and surface the recovery flow in the UI. This two-signal gate is why users do not see spurious permission prompts when a single app has a flaky AX tree.
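
The five steps can be sketched as one function with injected probes. This is a TypeScript model of the control flow only, under the assumption that the event tap probe is attempted solely when Finder is not running, as the steps describe. The `PermissionProbes` interface and every function name here are hypothetical stand-ins for the native Swift calls.

```typescript
// Hypothetical probe interface: each function stands in for a native macOS call.
interface PermissionProbes {
  isFinderRunning(): boolean;
  finderFocusedWindowQuerySucceeds(): boolean; // same AX query, aimed at Finder
  listenOnlyEventTapCreated(): boolean; // tap is created, then invalidated
}

interface RecoveryDecision {
  isAccessibilityBroken: boolean;
  decidedBy: "finderProbe" | "eventTapProbe";
}

// Runs after the frontmost app has returned AXError.cannotComplete.
function handleCannotComplete(probes: PermissionProbes): RecoveryDecision {
  if (probes.isFinderRunning()) {
    // A known-good app is available: its answer settles the question.
    const finderOk = probes.finderFocusedWindowQuerySucceeds();
    return { isAccessibilityBroken: !finderOk, decidedBy: "finderProbe" };
  }
  // Fallback: only now is the event tap created, so it costs nothing on the common path.
  const tapOk = probes.listenOnlyEventTapCreated();
  return { isAccessibilityBroken: !tapOk, decidedBy: "eventTapProbe" };
}
```

Injecting the probes as functions keeps the sequencing testable without accessibility permission, and makes it visible that the second probe is lazy.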

End-to-end, what goes where

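Three inputs on the left, one decision hub in the middle, three outputs on the right. The inputs are not equal-weight votes: per the sequence above, the Finder probe runs after an AXError.cannotComplete and the CGEvent tap probe is the fallback when Finder is not running, so a decision needs the failed action plus one confirming probe.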

Permission health decision path (diagram):
Inputs: AXIsProcessTrusted(), Finder AX probe, CGEvent tap probe
Decision hub: AppState.swift
Outputs: keep loop running, flip isAccessibilityBroken, surface recovery UI

See it on your own machine

Fazm ships with the Finder fallback probe and the CGEvent tap probe on by default. Install, grant accessibility, try revoking it in System Settings, and watch the recovery flow surface within 5 seconds.


Tool hangs: the third failure mode

Accessibility permission is the loudest demo-to-reality failure mode, but tool hangs are the quietest one. A ToolSearch call to a local index should return in under a second. An MCP tool call to a browser or to Google Workspace might legitimately take tens of seconds. Collapsing both into one global timeout means either the fast tools hang too long on error or the slow tools get killed mid-session. Fazm splits them into three tiers.

acp-bridge/src/index.ts

When a tool exceeds its tier, the bridge synthesises a completed-with-error frame so the model can recover the loop and the Swift bridge can unblock. Users can override all three tiers with FAZM_TOOL_TIMEOUT_SECONDS, which is also exposed under Settings > Advanced > Tool Timeout.
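
As a rough illustration of the tier split, the sketch below picks a timeout from a tool name. The three constant names and values and the 'mcp__' prefix rule are the ones this guide quotes from the bridge. The `INTERNAL_TOOLS` set and the `timeoutForTool` function are assumptions, since the guide names only ToolSearch as an internal tool.

```typescript
// Tier values as quoted in this guide; the selection function itself is a sketch.
const TOOL_TIMEOUT_INTERNAL_MS = 10_000;
const TOOL_TIMEOUT_MCP_MS = 120_000;
const TOOL_TIMEOUT_DEFAULT_MS = 300_000;

// Assumption: which tools count as "internal". Only ToolSearch is named in the guide.
const INTERNAL_TOOLS = new Set<string>(["ToolSearch"]);

function timeoutForTool(toolName: string): number {
  if (INTERNAL_TOOLS.has(toolName)) return TOOL_TIMEOUT_INTERNAL_MS;
  if (toolName.startsWith("mcp__")) return TOOL_TIMEOUT_MCP_MS;
  return TOOL_TIMEOUT_DEFAULT_MS; // the long tail gets the loosest ceiling
}
```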

What a recovery looks like in the log

A real recovery run, captured from /tmp/fazm-dev.log on a machine that had just taken a macOS 26.1 update and briefly lost its permission cache. All three probes fire, the CGEvent tap wins, and the loop keeps running.

fazm-dev.log

The side-by-side view

What a demo-grade desktop agent does versus what a shipping one has to do. Everything in the right column has a specific file and line number. Everything in the left column is what happens when that code is missing.

Feature | Typical demo-grade agent | Fazm
AXIsProcessTrusted() returns false after a macOS update | Agent silently stops working, user has no recourse | CGEvent tap probe hits the live TCC database (AppState.swift:490)
Frontmost app returns AXError.cannotComplete | Treated as permission-broken, agent disables itself | Disambiguated via Finder AX probe (AppState.swift:468)
MCP tool call hangs indefinitely | Session locks up, user has to force quit | Synthetic error frame fires at 120s (acp-bridge/src/index.ts:78)
Internal tool takes too long | Same 120s cap as everything else, slow to surface | Tight 10s cap for ToolSearch-class calls
Legitimately slow multi-step tool | Killed at the global timeout mid-session | Default 300s ceiling, user override via env var
Primary input surface to the model | Screenshots (model reads pixels, 2-5s per action) | Accessibility tree (structured text, sub-second per action)
Permission health polling | Checked once at startup, never revalidated | 5 second poll with up to 3 retries, 2-signal disambiguation

What the demos leave out

Demo-to-reality is not a model problem. It is a permission, probe, and timeout problem.

Every desktop agent launch in 2026 will pick the same small set of frontier Claude, GPT, and Gemini variants. Model choice is not the differentiator. The differentiator is what happens when AXIsProcessTrusted() returns false on a Tuesday afternoon, when a user focuses an Electron app with a half-built AX tree, and when a tool call to Google Workspace stalls for 90 seconds. Those three conditions account for most of the regressions between a shipped product and its launch demo.

Fazm answers them with three concrete blocks of code: a Finder fallback probe, a CGEvent tap probe, and a three-tier tool timeout watchdog. Each one is small. All three are in the public source. That is the shape of a desktop AI agent that survives past the demo.

Frequently asked questions

Why do most desktop AI agent demos not translate to real installs?

Demos run on a machine that was set up five minutes ago with fresh TCC permissions, a single reference app, and no macOS update in flight. Real installs live through OS updates, app re-signs, background apps that expose partial or broken accessibility trees, and network tool calls that hang. The demo version of a desktop agent does not need to handle AXError.cannotComplete, does not need to disambiguate stale AXIsProcessTrusted() responses, and does not need per-tool-class timeouts. The shipped version does. That is the entire gap between a compelling screen recording and a product that a non-technical user can leave running unattended.

What does AXIsProcessTrusted() actually lie about?

It caches the TCC (Transparency, Consent, and Control) decision on the first call and does not always invalidate that cache when macOS updates, when the app is re-signed, or when the user toggles the permission in System Settings. On macOS 26 Tahoe the per-process cache is particularly sticky. Fazm's comment at AppState.swift:308 calls it out explicitly: 'AXIsProcessTrusted() can return stale data after macOS updates or app re-signs'. The symptom is that the agent silently stops controlling apps even though System Settings shows the permission as granted. A naive agent just freezes. Fazm disambiguates using two independent signals (a Finder AX call and a CGEvent tap) before deciding the permission is really broken.

What is the Finder fallback probe and why does it exist?

confirmAccessibilityBrokenViaFinder at AppState.swift:468 takes a running Finder process, creates an AXUIElement for it, and asks for its focused window. Finder is a known-good AX-compliant app that Apple ships with the OS, so a call against it should always succeed when accessibility permission is truly granted. The probe exists because AXError.cannotComplete from an arbitrary app can mean two very different things: either permission is actually broken, or that particular app's accessibility tree is just non-compliant or temporarily unresponsive. If Finder also fails, Fazm logs 'AXError.cannotComplete confirmed by Finder, permission is truly stuck' and enters recovery. If Finder succeeds, Fazm treats the original error as app-specific and keeps running.

What happens if Finder is not running?

probeAccessibilityViaEventTap at AppState.swift:490 is the tie-breaker. It calls CGEvent.tapCreate with tap: .cgSessionEventTap and options: .listenOnly, listening for mouseMoved events. Event tap creation is special because it queries the live TCC database directly rather than the per-process cache that AXIsProcessTrusted() uses. If the tap is created successfully, permission is real, even if AXIsProcessTrusted() currently says otherwise. The tap is invalidated immediately after the check so it does not consume system resources. That two-signal design is how Fazm survives the stale-cache case on macOS 26 Tahoe where AXIsProcessTrusted() continues returning false for minutes after the user grants the permission in System Settings.

How does Fazm keep MCP tool calls from hanging forever?

It runs a tool timeout watchdog in the ACP bridge, at acp-bridge/src/index.ts lines 72 to 101. There are three tiers: TOOL_TIMEOUT_INTERNAL_MS = 10_000 for internal tools like ToolSearch, TOOL_TIMEOUT_MCP_MS = 120_000 for any tool whose name starts with 'mcp__' (macos-use, whatsapp, google-workspace, playwright, and any user-added servers), and TOOL_TIMEOUT_DEFAULT_MS = 300_000 for the long tail. The watchdog tracks every running tool in an activeToolTimers map, and when a tool exceeds its tier's wall-clock limit, it synthesises a completed-with-error frame so the agent loop can recover and the Swift bridge unblocks. Users can override all three tiers via the FAZM_TOOL_TIMEOUT_SECONDS environment variable, which is also exposed under Settings > Advanced > Tool Timeout.
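
A minimal sketch of such a watchdog follows. It borrows the activeToolTimers name from the description above, but the `ToolFrame` shape, the `ToolWatchdog` class, and the injectable `Scheduler` are assumptions made so the example is self-contained and testable. It is not the code in acp-bridge/src/index.ts.

```typescript
type Cancel = () => void;
type Scheduler = (fn: () => void, ms: number) => Cancel;

// Hypothetical frame shape; the real bridge protocol is not reproduced here.
interface ToolFrame {
  toolId: string;
  status: "completed";
  isError: boolean;
  message: string;
}

const realScheduler: Scheduler = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

class ToolWatchdog {
  // One pending timer per running tool call, keyed by tool call id.
  private activeToolTimers = new Map<string, Cancel>();
  private emit: (frame: ToolFrame) => void;
  private schedule: Scheduler;

  constructor(emit: (frame: ToolFrame) => void, schedule: Scheduler = realScheduler) {
    this.emit = emit;
    this.schedule = schedule;
  }

  toolStarted(toolId: string, timeoutMs: number): void {
    this.toolFinished(toolId); // defensive: never keep two timers for one id
    const cancel = this.schedule(() => {
      this.activeToolTimers.delete(toolId);
      // Synthesise a completed-with-error frame so the agent loop can move on.
      this.emit({
        toolId,
        status: "completed",
        isError: true,
        message: `Tool timed out after ${timeoutMs / 1000}s`,
      });
    }, timeoutMs);
    this.activeToolTimers.set(toolId, cancel);
  }

  toolFinished(toolId: string): void {
    const cancel = this.activeToolTimers.get(toolId);
    if (cancel) {
      cancel();
      this.activeToolTimers.delete(toolId);
    }
  }

  get pendingCount(): number {
    return this.activeToolTimers.size;
  }
}
```

Passing the scheduler in lets a test fire a timeout deterministically instead of waiting 120 real seconds.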

Is Fazm screenshot-based or accessibility-based?

Primary input is the macOS accessibility tree via AXUIElementCreateApplication, AXUIElementCopyAttributeValue, and the AX property chain. Screenshots are used as a secondary signal for visual context when needed, captured synchronously through CGWindowListCreateImage in ScreenCaptureManager.swift and filtered by PID and window size. This is the opposite of a screenshot-first agent like Anthropic Computer Use, which takes a picture, asks the model to read it, and hopes it clicks the right pixel. Accessibility APIs are fast (hundreds of milliseconds per action versus 2-5 seconds per screenshot), structured (the model gets a tree of roles and labels, not pixels), and model-neutral (no dependence on vision benchmark scores).
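
To make "a tree of roles and labels" concrete, the sketch below flattens a small accessibility-style tree into indented text a model could read. The `AxNode` shape and the output format are hypothetical; this guide does not show how Fazm actually serialises the tree. The role strings are standard macOS AX role names.

```typescript
// Hypothetical node shape for illustration; not Fazm's serialisation format.
interface AxNode {
  role: string;
  title?: string;
  children?: AxNode[];
}

// Depth-first walk that emits one indented line per element.
function renderTree(node: AxNode, depth = 0, out: string[] = []): string[] {
  const label = node.title ? ` "${node.title}"` : "";
  out.push(`${"  ".repeat(depth)}${node.role}${label}`);
  for (const child of node.children ?? []) {
    renderTree(child, depth + 1, out);
  }
  return out;
}

const sampleWindow: AxNode = {
  role: "AXWindow",
  title: "Untitled",
  children: [{ role: "AXButton", title: "Save" }, { role: "AXTextField" }],
};
const renderedWindow = renderTree(sampleWindow).join("\n");
```

For the sample window this produces three lines of text, which is far cheaper for a model to parse than a screenshot of the same window.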

What does the code path look like end-to-end when an agent action fires?

On a normal action, Swift-side AppState owns the accessibility permission health state (polled every 5 seconds, up to 3 retries), and the ACP bridge owns tool invocation. The agent loop selects a tool (say macos-use_click_and_traverse), the bridge starts a TOOL_TIMEOUT_MCP_MS timer, forwards the request to the mcp-server-macos-use binary, waits for its result via a named pipe (FAZM_BRIDGE_PIPE), and either cancels the timer on success or fires the synthetic completion frame on timeout. On the Swift side, the action may call AXUIElementCopyAttributeValue which returns .cannotComplete, which triggers confirmAccessibilityBrokenViaFinder, which either keeps the loop alive or flips isAccessibilityBroken to true and prompts recovery. The interesting part is that both the bridge timeout and the AX probe operate in the background: the user does not see them unless something fails.

Why three timeout tiers instead of one global timeout?

Because tool classes have fundamentally different response profiles. ToolSearch and similar internal tools execute locally and should return in under a second, so a 10-second cap is loose enough to tolerate disk latency but tight enough to surface a broken tool quickly. MCP tools hit external processes (a browser, a native app, a Python server) and can legitimately take tens of seconds, so 120 seconds is the pragmatic ceiling. The default 300-second bucket is for anything that might legitimately need a long session, like a multi-step refactor across a large codebase. Collapsing all of these into one global timeout means either the fast tools hang too long on error or the slow tools get killed mid-session. The three-tier split is a small amount of code that absorbs a large amount of real-world variance.

Can a user increase the timeout if they have a slow tool?

Yes, via the FAZM_TOOL_TIMEOUT_SECONDS environment variable, read at acp-bridge/src/index.ts:82. The same value is surfaced in the UI under Settings > Advanced > Tool Timeout. When set to a positive integer, it overrides all three tiers with one uniform value in seconds. The design goal is that power users can dial it up for pathological workloads (a large sync job, a stubborn browser session) while the default tiers remain tight enough for the common case. Setting it to zero restores the tiered behavior. This is the same pattern as the reentrant warmup delay and several other knobs: sensible default, simple env override, no config file required.
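
The override rule can be sketched as a small pure function. The positive-integer and zero cases follow the description above. Treating unset, negative, fractional, or non-numeric values as "no override" is an assumption, and `effectiveTimeoutMs` is a hypothetical name.

```typescript
// Sketch of the override rule; the handling of malformed values is an assumption.
function effectiveTimeoutMs(tierMs: number, rawOverride: string | undefined): number {
  const seconds = Number(rawOverride);
  // Only a positive integer overrides; everything else falls back to the tier.
  if (Number.isInteger(seconds) && seconds > 0) {
    return seconds * 1000;
  }
  return tierMs;
}

// In the bridge this would be fed from the environment, for example:
//   effectiveTimeoutMs(120_000, process.env.FAZM_TOOL_TIMEOUT_SECONDS)
```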

Does Fazm work with any Mac app or only with a pre-built integration list?

Any macOS app that exposes an accessibility tree works out of the box because Fazm operates on the AX property chain, not on per-app adapters. The macos-use MCP server exposes generic click_and_traverse, type_and_traverse, scroll_and_traverse, press_key_and_traverse, and refresh_traversal tools that walk any AX tree. The pre-built MCP servers (whatsapp, google-workspace, playwright) exist only because those surfaces are either Catalyst apps with imperfect AX trees (WhatsApp) or non-macOS surfaces entirely (Google Workspace APIs, Chromium automation). For anything else, the generic macos-use flow handles it. That is why you can use Fazm with a random menu-bar utility, an old pre-notarized tool, or your company's internal Electron app, without writing an integration.

Is Fazm open source, and what is the license?

Yes, the Fazm desktop source tree is public. The consumer-facing product ships as a free Mac app with a paid tier, and the engine is buildable from source. The accessibility-permission probe code at AppState.swift:308-504 and the tool-timeout watchdog at acp-bridge/src/index.ts:72-101 are both in the public tree. If you want to verify the claims in this guide, those are the two files to read. That is a deliberate stance: the parts of a desktop agent that survive real installs are exactly the parts that benefit most from public scrutiny, because they are the parts you cannot fake in a screen recording.

What is the single most underappreciated thing about shipping a desktop agent?

Permission recovery code. It is boring, invisible, and easy to skip in a demo, but it is the largest source of user-visible failures in a shipped product. Every other desktop agent roundup talks about model choice, tool taxonomy, or token cost. Almost none of them talk about what the agent does when AXIsProcessTrusted() starts lying on a Tuesday afternoon after a macOS update ships. Fazm's answer is the Finder fallback probe plus the CGEvent tap probe plus a 5-second polling timer with up to 3 retries. That is three small blocks of code that absorb an enormous amount of user pain. The demo-to-reality gap in desktop agents is mostly code like that.

Install a desktop agent that already shipped past its demo

Free Mac app. Accessibility-native control of any app on your Mac. Probes recover from stale TCC caches automatically. Tool hangs get a 10, 120, or 300 second ceiling depending on the tool class. No developer setup required.

© 2026 fazm. All rights reserved.
