Field notes from a consumer Mac agent

A computer agent on your Mac is mostly a permission probe with a model attached

The 2026 wave of computer agents (Claude Computer Use, OpenAI Operator, Manus, Grok Computer) all live in cloud sandbox VMs and read the screen with vision models. Fazm runs on your real Mac and reads the macOS accessibility tree as structured text. The interesting engineering is not the model. It is the layer of OS plumbing that keeps the agent alive when the TCC cache lies and AXError.cannotComplete is ambiguous.

3-layer probe · 5s retry, 3 attempts · CGEvent.tapCreate tie-break · 5 bundled MCP servers
Matthew Diakonov
10 min read
Sourced from the Fazm repo: Desktop/Sources, acp-bridge/src, CHANGELOG.json
Three-layer accessibility probe in AppState.swift lines 307-504
CGEvent.tapCreate fallback for stale TCC cache on macOS 26 Tahoe
Bundled mcp-server-macos-use binary at Contents/MacOS

The short version

A computer agent reads your screen and clicks for you. The 2026 generation splits two ways: cloud agents that rent a Linux sandbox and feed screenshots to a vision model, and consumer desktop agents that run on your real machine and read native accessibility data. Cloud agents are easier to build because the host OS is theirs. Desktop agents are harder because the host OS belongs to the user, who has revoked, granted, and re-granted accessibility permission three times this week, and macOS does not always notice.

The most interesting code in Fazm is not the prompt and not the model wiring. It is roughly two hundred lines of Swift in AppState.swift that decides, on every poll, whether the accessibility permission is actually working. Three layers of checks, an ambiguity disambiguator, a retry timer, and a process restart of last resort. None of it is exciting. All of it is load-bearing.

AXUIElement · kAXFocusedWindowAttribute · CGEvent.tapCreate · TCC cache · macOS 26 Tahoe · AXError.cannotComplete · Finder probe · ACP v0.29.2 · mcp-server-macos-use · 5-second retry interval

What the probe layer looks like, end to end

The AX permission state on macOS is binary in the user's mental model and tri-modal in reality: granted-and-working, granted-but-broken, and not-granted. Fazm has to detect all three and react differently to each. The diagram below maps the three input signals to the single "is the agent allowed to do anything" verdict the rest of the app reads off isAccessibilityBroken.

[Diagram: Three signals into one verdict. AXIsProcessTrusted, AXUIElementCopyAttributeValue, and CGEvent.tapCreate feed checkAccessibilityPermission, which resolves to isAccessibilityBroken = false, the 5-second retry timer, or the Quit & Reopen alert.]

The five things every consumer Mac agent has to handle

None of these matter for a cloud computer use agent in a sandbox VM. All of them matter the moment your agent runs on the user's real machine. These are the load-bearing pieces in Fazm's probe code, in execution order.

TCC trust check

AXIsProcessTrusted at AppState.swift line 311. The single API call most macOS automation tutorials stop at. Returns a cached boolean from the user's Privacy & Security database. Fast, but on macOS 26 Tahoe the cache lies after a permission revoke until the next process restart.

Real AX call

testAccessibilityPermission at line 433 makes an AXUIElementCopyAttributeValue call against NSWorkspace.shared.frontmostApplication and inspects every AXError code. .success, .noValue, .notImplemented, and .attributeUnsupported all mean the API is alive.

AXError.cannotComplete is ambiguous

Could be your permission, could be the target app refusing to expose AX. Fazm runs a second call against Finder via confirmAccessibilityBrokenViaFinder at line 468 to disambiguate. Qt apps, PyMOL, and a few OpenGL surfaces would otherwise look like permission failures.

CGEvent tap probe

probeAccessibilityViaEventTap at line 490 calls CGEvent.tapCreate with cgSessionEventTap as the tie-breaker. This bypasses the per-process TCC cache and asks the live macOS TCC database directly. Fazm only runs it when permission was previously granted, to avoid Privacy & Security toast spam during onboarding.

Retry, then prompt restart

If TCC says trusted but AX calls fail, startAccessibilityRetryTimer at line 375 polls every 5 seconds, up to 3 times. On exhaustion, an NSAlert titled 'Accessibility Permission Needs Restart' offers Quit & Reopen, which spawns 'sleep 1 && open <bundlePath>' and terminates the app. The accessibility daemon binds permission state to the process ID, so only a fresh process clears a stuck binding.

The probe, step by step

Fazm runs a checkAccessibilityPermission cycle on launch and every 60 seconds thereafter. The path through the cycle changes based on what each layer returns. Below is the actual decision tree, traced from line numbers in AppState.swift.

1. Standard TCC trust check

Every 60 seconds while the app is running, AppState calls AXIsProcessTrusted. If it returns false and the user previously had permission, Fazm jumps to the event tap probe instead of trusting the false. If it returns true, Fazm continues to the next layer.
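
The branch described above can be sketched as a small pure function wrapped around the real AXIsProcessTrusted call. This is a minimal sketch, not Fazm's code: the NextProbe enum, the function names, and the idea that the app persists a previouslyGranted flag are all assumptions made for illustration.

```swift
import ApplicationServices

// Illustrative names only; none of these come from the Fazm source.
enum NextProbe: Equatable {
    case realAXCall     // TCC says trusted: verify with a live AX call
    case eventTapProbe  // TCC says no, but permission existed before: suspect a stale cache
    case notGranted     // never granted: stay in onboarding and do not probe further
}

/// Decides which layer runs next, given the TCC answer and prior state.
func nextProbe(tccTrusted: Bool, previouslyGranted: Bool) -> NextProbe {
    if tccTrusted { return .realAXCall }
    return previouslyGranted ? .eventTapProbe : .notGranted
}

/// Layer one. AXIsProcessTrusted() does not prompt the user.
/// `previouslyGranted` is assumed to be a flag the app stores itself.
func firstLayer(previouslyGranted: Bool) -> NextProbe {
    nextProbe(tccTrusted: AXIsProcessTrusted(), previouslyGranted: previouslyGranted)
}
```

Keeping the decision separate from the system call makes the branch testable without touching real permissions.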

2. Real AX call against frontmost app

testAccessibilityPermission picks NSWorkspace.shared.frontmostApplication, calls AXUIElementCreateApplication on its PID, then AXUIElementCopyAttributeValue with kAXFocusedWindowAttribute. The return code is mapped: success/noValue/notImplemented/attributeUnsupported pass; apiDisabled fails immediately; cannotComplete branches to the Finder probe.
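
A hedged sketch of that dispatch follows. The AX calls and AXError cases are the real ApplicationServices API, but the AXProbeResult enum and both function names are hypothetical, and the article does not say how Fazm treats error codes outside the six it lists, so the default branch here is an assumption.

```swift
import AppKit
import ApplicationServices

// Hypothetical result type; the names are not from the Fazm source.
enum AXProbeResult: Equatable {
    case alive      // the AX API answered, even if the attribute was absent
    case disabled   // the AX API is disabled for this process
    case ambiguous  // could be our permission, could be the target app
}

/// Maps an AXError to a probe result using the mapping described above.
func classify(_ error: AXError) -> AXProbeResult {
    switch error {
    case .success, .noValue, .notImplemented, .attributeUnsupported:
        return .alive
    case .apiDisabled:
        return .disabled
    case .cannotComplete:
        return .ambiguous
    default:
        // Assumption: unlisted codes are treated as ambiguous, not as failures.
        return .ambiguous
    }
}

/// Asks the frontmost app for its focused window and classifies the reply.
/// Returns nil when no frontmost application is reported.
func probeFrontmostApp() -> AXProbeResult? {
    guard let app = NSWorkspace.shared.frontmostApplication else { return nil }
    let element = AXUIElementCreateApplication(app.processIdentifier)
    var window: CFTypeRef?
    let error = AXUIElementCopyAttributeValue(
        element, kAXFocusedWindowAttribute as CFString, &window)
    return classify(error)
}
```

The point of the mapping is that a missing attribute still proves the API is reachable, which is all this layer needs to know.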

3. Finder probe to disambiguate cannotComplete

confirmAccessibilityBrokenViaFinder calls AXUIElementCopyAttributeValue against Finder, located by its bundle ID com.apple.finder. Finder is the canonical AX-compliant baseline. If Finder also returns cannotComplete, the permission is truly broken. If Finder answers, the suspect app was the problem and the permission is fine.
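
One way to express the Finder cross-check, as a sketch under stated assumptions: the article does not say which attribute Fazm requests from Finder, so kAXFocusedWindowAttribute is a guess, and treating apiDisabled from Finder as broken is also an assumption. FinderVerdict and both function names are hypothetical.

```swift
import AppKit
import ApplicationServices

// Hypothetical verdict type for the Finder cross-check.
enum FinderVerdict: Equatable {
    case permissionBroken   // Finder failed too: the problem is ours
    case appSpecific        // Finder answered: the frontmost app was the problem
    case finderUnavailable  // Finder is not running: another probe has to decide
}

/// Interprets the AXError from a Finder probe (nil means Finder was not found).
func verdict(fromFinder error: AXError?) -> FinderVerdict {
    guard let error = error else { return .finderUnavailable }
    switch error {
    case .cannotComplete, .apiDisabled:
        return .permissionBroken
    default:
        return .appSpecific
    }
}

/// Makes one AX call against Finder, located by bundle identifier.
func probeFinder() -> AXError? {
    guard let finder = NSRunningApplication
        .runningApplications(withBundleIdentifier: "com.apple.finder").first
    else { return nil }
    let element = AXUIElementCreateApplication(finder.processIdentifier)
    var value: CFTypeRef?
    return AXUIElementCopyAttributeValue(
        element, kAXFocusedWindowAttribute as CFString, &value)
}
```

Calling verdict(fromFinder: probeFinder()) then yields one of the three outcomes. A Finder with no open window typically reports noValue, which still counts as an answer.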

4. CGEvent tap as tie-breaker

If Finder is not running (rare on macOS, but possible after a forced login), probeAccessibilityViaEventTap creates a listenOnly tap on cgSessionEventTap watching mouseMoved events. If CGEvent.tapCreate returns a non-nil tap, the live TCC database confirms permission. The tap is invalidated immediately via CFMachPortInvalidate, so no event is ever delivered.
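
The tap probe can be sketched in a few lines of CoreGraphics. The API calls are real; the function name, the mouseMovedMask constant, and the choice of tailAppendEventTap as the placement are assumptions, since the article only specifies the tap location, the listenOnly option, and the event type.

```swift
import CoreGraphics

/// Event mask selecting only mouseMoved events.
let mouseMovedMask: CGEventMask = CGEventMask(1) << CGEventType.mouseMoved.rawValue

/// Returns true if a listen-only session tap can be created right now.
/// The tap is torn down before it is ever attached to a run loop.
func canCreateEventTap() -> Bool {
    // Listen-only taps ignore the return value, so the event is passed through.
    let callback: CGEventTapCallBack = { _, _, event, _ in
        Unmanaged.passUnretained(event)
    }
    guard let tap = CGEvent.tapCreate(
        tap: .cgSessionEventTap,
        place: .tailAppendEventTap,   // assumption: placement is not stated in the article
        options: .listenOnly,
        eventsOfInterest: mouseMovedMask,
        callback: callback,
        userInfo: nil
    ) else {
        return false                  // creation refused: treat as no permission
    }
    CFMachPortInvalidate(tap)         // release immediately; nothing is delivered
    return true
}
```

Because the tap is never added to a run loop, the probe has no observable side effect beyond the permission check itself.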

5. Retry timer or restart prompt

If the result is 'TCC trusted but AX calls fail,' isAccessibilityBroken flips to true and a 5-second-interval Timer.scheduledTimer kicks in. After 3 retries with no recovery, the user sees an NSAlert. Choosing Quit & Reopen relaunches via /bin/sh -c 'sleep 1 && open <bundlePath>' and NSApplication.shared.terminate.
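
The retry budget is easy to get wrong by one, so here is a minimal sketch of it as a value type plus a Timer driver. The 5-second interval and the limit of 3 come from the article; AccessibilityRetryState, RetryOutcome, and AccessibilityRetryTimer are hypothetical names, and the real startAccessibilityRetryTimer may be structured differently.

```swift
import Foundation

enum RetryOutcome: Equatable { case recovered, retryAgain, promptRestart }

/// Counts failed probes and says when the budget is spent.
struct AccessibilityRetryState {
    let maxRetries: Int
    private(set) var attempts = 0

    init(maxRetries: Int = 3) { self.maxRetries = maxRetries }

    mutating func record(probeSucceeded: Bool) -> RetryOutcome {
        if probeSucceeded {
            attempts = 0
            return .recovered
        }
        attempts += 1
        return attempts >= maxRetries ? .promptRestart : .retryAgain
    }
}

/// Drives the state with a repeating timer. Needs a running run loop.
final class AccessibilityRetryTimer {
    private var state = AccessibilityRetryState()
    private var timer: Timer?

    func start(interval: TimeInterval = 5,
               probe: @escaping () -> Bool,
               onFinish: @escaping (RetryOutcome) -> Void) {
        timer?.invalidate()
        state = AccessibilityRetryState()
        timer = Timer.scheduledTimer(withTimeInterval: interval, repeats: true) { [weak self] timer in
            guard let self = self else { timer.invalidate(); return }
            let outcome = self.state.record(probeSucceeded: probe())
            if outcome != .retryAgain {
                timer.invalidate()    // stop on recovery or on exhaustion
                onFinish(outcome)
            }
        }
    }
}
```

With the defaults, the third consecutive failure yields promptRestart, which is where the NSAlert would be shown.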

The actual code

Two functions, both from /Users/matthewdi/fazm/Desktop/Sources/AppState.swift. The first is the real AX call with the AXError dispatch. The second is the event-tap tie-breaker that bypasses the TCC cache.

Desktop/Sources/AppState.swift

What the log stream looks like during a stuck-permission recovery

When a user toggles Accessibility off and then on again in System Settings without restarting Fazm, the log stream below is what the probe layer emits. The recovery happens without the user seeing anything: the event tap probe catches the stale cache, the retry timer confirms AX is alive again, and the agent keeps running.

Console log: stuck-permission auto-recovery
3 probe layers in checkAccessibilityPermission
5 s retry interval, max 3 attempts
5 MCP servers bundled inside the signed app
60 s between AX permission polls
~2 KB

A Gmail inbox sent to Claude as a native macOS accessibility tree, instead of an ~80 KB base64 PNG screenshot. Same task, about one-fortieth of the per-call input.

Fazm v1.5.0 changelog, March 27, 2026

Cloud computer use vs Mac-resident computer agent

Same job, different host OS, different engineering surface.

Feature | Cloud computer use APIs | Fazm (Mac-resident)
Where the agent runs | Cloud sandbox VM (Linux, ephemeral) | Your real Mac, signed and notarized binary
Primary read path | Screenshots fed to a vision model | macOS accessibility tree via AXUIElement
Per-step input size (Gmail inbox) | ~80 KB base64 PNG | ~2 KB structured text
Click target | Pixel coordinates from vision | Stable AX element identifiers
Permission model | Credential vault inside the VM | Native macOS TCC, granted to the parent app
Stale-permission handling | N/A (no host OS to drift) | Three-layer probe + 3-retry timer + restart prompt
Works in your apps | Whatever is installed in the VM | Mail, Calendar, Messages, Cursor, Figma, Notion, anything native
Fallback when AX is missing | N/A | CGWindowListCreateImage with .bestResolution + Screen Recording probe

Comparison reflects shipping behavior of Claude Computer Use, OpenAI Operator, Manus, Grok Computer (announced) versus Fazm v2.4.1 as of 2026-04-22.

Why the probe layer ships at all

The first version of Fazm that read the accessibility tree shipped on March 27, 2026. The first version of the three-layer probe shipped after a week of bug reports that all said roughly the same thing: "the agent stopped working but Settings says it has permission." What was actually happening was the macOS 26 Tahoe TCC cache holding a stale answer for the Fazm process across permission toggles, while the AX calls underneath silently returned .cannotComplete. The user toggled the permission off and back on. Settings said yes. AXIsProcessTrusted said yes. The agent did nothing.

The fix could not be a one-liner because the symptoms have two roots. Sometimes the TCC cache itself is the liar. Sometimes the frontmost app is one that does not implement AX, so cannotComplete is reasonable and the agent is fine. Telling the difference required the Finder probe and the event tap. Without both, the recovery alert would fire on every PyMOL or Qt-based foreground app, which would be the kind of false positive that gets a consumer app uninstalled.

When pixels still earn their keep

The accessibility tree is not the answer for everything. PDFs with embedded images, Figma canvases, WebGL surfaces, and Canva designs all collapse to a sea of generic group roles in the AX dump. For those, Fazm captures the frontmost window through ScreenCaptureManager.captureAppWindow at /Users/matthewdi/fazm/Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift line 14. The capture path uses CGWindowListCreateImage with three flags chosen on purpose: .optionIncludingWindow limits the capture to the target window itself, .boundsIgnoreFraming excludes window framing effects such as the drop shadow from the captured bounds, and .bestResolution preserves retina scaling. The function returns .permissionDenied as a distinct case from .success when window metadata is readable but pixel capture fails, which is the unambiguous signal that Screen Recording was revoked separately from Accessibility.

The default for native Apple apps (Mail, Calendar, Messages, Notes, Reminders) is to skip the screenshot entirely and read the AX tree, because every actionable element is already named in the tree. The default for browser pages is to use Playwright MCP, which exposes the DOM. The default for visual canvases is to fall through to the screenshot. Three read paths, picked per app, all routed through the same Claude session.

Frequently asked questions

What is a computer agent, in plain terms?

A program that drives a computer the way a person would. It reads the screen, decides what to click, and types into apps that were never designed for automation. The 2026 wave of these agents (Claude Computer Use, OpenAI Operator, Manus, Grok Computer, Fazm) all share that definition. Where they split is on how they read the screen and where they run. The cloud-side ones rent a sandboxed Linux VM, take screenshots, and send pixels to a vision model. A consumer Mac agent like Fazm runs on your real machine, reads the macOS accessibility tree as structured text, and uses native AXUIElement calls to click on stable element IDs.

Why does running on the user's real Mac matter?

Because that is where the user's Calendar, Mail, Messages, Cursor, and Figma already live, signed in, with their actual data. A cloud computer use agent in a sandbox VM has none of that. Either you log in inside the VM (sketchy and slow) or you give it OAuth tokens (now you have a credential storage problem). Running on the real Mac avoids both, but the price is engineering against macOS itself: TCC permission prompts, accessibility cache invalidation on macOS 26 Tahoe, screen recording entitlements, and the AXError surface area documented in this article.

What is the accessibility tree and why does Fazm read it?

Every native macOS window exposes a structured tree of its UI through the AX (accessibility) APIs, the same data Apple's VoiceOver screen reader consumes. AXUIElementCreateApplication takes a process ID and returns the app's root element, AXUIElementCopyAttributeValue with kAXFocusedWindowAttribute returns the focused window, and reading each element's children recursively gives you an enumerable list of buttons, rows, text fields, and groups with stable identifiers and bounding rects. A Gmail inbox rendered as an accessibility tree is roughly two kilobytes of text. The same inbox as a base64 PNG screenshot is closer to eighty kilobytes. Sending the tree is faster, cheaper, and gives the model element IDs to act on instead of pixel coordinates that go stale the moment a window resizes.
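
To make the "tree as text" idea concrete, here is a minimal sketch of a recursive dump using kAXRoleAttribute, kAXTitleAttribute, and kAXChildrenAttribute. It is not the serializer Fazm or mcp-server-macos-use ships: the output format, the helper names, and the depth cap are all assumptions for illustration.

```swift
import ApplicationServices

/// Reads a string-valued attribute, or nil if absent or not a string.
func axString(_ element: AXUIElement, _ attribute: String) -> String? {
    var value: CFTypeRef?
    guard AXUIElementCopyAttributeValue(element, attribute as CFString, &value) == .success
    else { return nil }
    return value as? String
}

/// Formats one node as an indented line, e.g. `  AXButton "Send"`.
func formatNode(role: String, title: String?, depth: Int) -> String {
    let indent = String(repeating: "  ", count: depth)
    guard let title = title, !title.isEmpty else { return indent + role }
    return "\(indent)\(role) \"\(title)\""
}

/// Appends a text line per element, walking children depth-first.
/// `maxDepth` is an arbitrary cap to keep the dump small.
func dumpTree(_ element: AXUIElement, depth: Int = 0, maxDepth: Int = 8,
              into lines: inout [String]) {
    guard depth <= maxDepth else { return }
    let role = axString(element, kAXRoleAttribute) ?? "AXUnknown"
    lines.append(formatNode(role: role,
                            title: axString(element, kAXTitleAttribute),
                            depth: depth))

    var children: CFTypeRef?
    guard AXUIElementCopyAttributeValue(
              element, kAXChildrenAttribute as CFString, &children) == .success,
          let kids = children as? [AXUIElement] else { return }
    for kid in kids {
        dumpTree(kid, depth: depth + 1, maxDepth: maxDepth, into: &lines)
    }
}
```

Starting from AXUIElementCreateApplication(pid) and joining the collected lines with newlines gives the kind of compact structured text the size comparison above refers to.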

What is the actual permission probe Fazm runs, and where does it live?

Three layers, all in /Users/matthewdi/fazm/Desktop/Sources/AppState.swift. Layer one is AXIsProcessTrusted at line 311, the standard TCC trust check that almost every macOS automation tutorial stops at. Layer two is testAccessibilityPermission at line 433, which makes a real AXUIElementCopyAttributeValue call against the frontmost app and inspects the AXError result. Layer three is probeAccessibilityViaEventTap at line 490, which calls CGEvent.tapCreate with cgSessionEventTap because on macOS 26 Tahoe, AXIsProcessTrusted maintains a per-process cache that returns true even after the user has revoked permission. The event tap creation API hits the live TCC database directly and is the only reliable way to get past that cache.

What does Fazm do when AXError.cannotComplete comes back?

It runs a confirmation probe against Finder. AXError.cannotComplete is genuinely ambiguous on macOS: it can mean the agent's own permission has gone stale, or that the frontmost app is one of the handful that does not implement AX at all (Qt apps, OpenGL games, PyMOL, and a few others). If Fazm sees cannotComplete from the frontmost app, confirmAccessibilityBrokenViaFinder at line 468 makes a second AX call against Finder, which is guaranteed to be AX-compliant. If Finder also fails, the permission is genuinely broken; if Finder succeeds, the original failure was app-specific and the permission is fine. Without this disambiguation, every Qt-based app would falsely flag the agent as broken.

How does the retry timer work, and when does the user see it?

When testAccessibilityPermission returns false but TCC says trusted, Fazm starts a 5-second-interval retry timer in startAccessibilityRetryTimer at line 375 with a max of 3 attempts (maxAccessibilityRetries at line 304). On each tick it re-runs checkAccessibilityPermission. If three retries fail, showAccessibilityRestartAlert at line 404 surfaces an NSAlert titled 'Accessibility Permission Needs Restart' with two buttons: 'Quit & Reopen' and 'Later.' Choosing 'Quit & Reopen' calls relaunchApp at line 422, which spawns 'sleep 1 && open <bundlePath>' through /bin/sh and terminates the current process. The reason for the spawn-and-terminate dance is that the macOS Accessibility daemon binds permission state to the process ID, and the only way to clear a stuck binding without a reboot is to fully replace the process.
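
The spawn-and-terminate step can be sketched as follows. The shell command matches the one quoted in the article; passing the bundle path as "$0" rather than interpolating it into the command string is a choice made here to avoid quoting problems, and relaunchInvocation and quitAndReopen are hypothetical names, not Fazm's relaunchApp.

```swift
import AppKit

/// Builds the shell invocation that reopens the app after a short delay.
/// With `sh -c script arg`, the first extra argument is available as "$0".
func relaunchInvocation(bundlePath: String) -> (executable: String, arguments: [String]) {
    ("/bin/sh", ["-c", "sleep 1 && open \"$0\"", bundlePath])
}

/// Spawns the delayed `open`, then terminates the current process so the
/// relaunched app starts under a fresh process ID.
func quitAndReopen() throws {
    let invocation = relaunchInvocation(bundlePath: Bundle.main.bundlePath)
    let task = Process()
    task.executableURL = URL(fileURLWithPath: invocation.executable)
    task.arguments = invocation.arguments
    try task.run()                       // the child outlives this process
    NSApplication.shared.terminate(nil)
}
```

The one-second sleep gives the current process time to exit before open runs, so the relaunch does not simply reactivate the instance that is still shutting down.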

Is the screenshot path gone entirely?

No, just demoted. ScreenCaptureManager at /Users/matthewdi/fazm/Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift uses CGWindowListCreateImage with .optionIncludingWindow, .boundsIgnoreFraming, and .bestResolution flags to capture the frontmost app's window when the agent genuinely needs pixel context (a PDF with images, a Figma canvas, a WebGL surface). The capture is gated behind Screen Recording permission, and the function returns .permissionDenied as a distinct case from .success when the window metadata is readable but pixel capture fails, which is the signal that Screen Recording was revoked while Accessibility stayed granted. The default for native Apple apps like Mail, Calendar, and Messages is to skip the screenshot entirely.

Which MCP servers does Fazm bundle, and why does it matter for a computer agent?

Five, hardcoded as BUILTIN_MCP_NAMES at line 1266 of /Users/matthewdi/fazm/acp-bridge/src/index.ts: fazm_tools, playwright, macos-use, whatsapp, and google-workspace. The macos-use binary is the bridge into the AX layer described above; it is bundled at Contents/MacOS/mcp-server-macos-use inside the signed app and resolved at line 63 of the same bridge file. The reason a computer agent ships an MCP server inside the .app instead of asking the user to install one is that the agent layer (Claude, in Fazm's case) speaks the Agent Client Protocol; tool calls flow through MCP; and bundling means the user grants Accessibility once to the parent app and the bundled binary inherits that permission, which it would not if it were installed separately.

Does this approach work on Windows or Linux?

Not yet, on either, because the engineering is platform-specific to the macOS AX surface, the TCC cache behavior, and the CGEvent tap API. Windows has a similar concept (UI Automation, IAccessible2, the Windows Automation API) and Linux has AT-SPI, but the failure modes are different and the permission models are completely different. Fazm is Mac-only at v2.4.1; the cloud-only computer use agents are platform-agnostic by design because their VM is always Linux. If you want a real consumer agent that drives the apps already installed on your real machine, the work is platform-by-platform.

What can I verify in the Fazm repo to confirm everything in this article?

Three source files and one changelog entry. /Users/matthewdi/fazm/Desktop/Sources/AppState.swift lines 300 to 504 for the entire permission probe (timers, three-layer check, retry alert, relaunch). /Users/matthewdi/fazm/Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift lines 14 to 44 for the demoted screenshot path with .permissionDenied disambiguation. /Users/matthewdi/fazm/acp-bridge/src/index.ts line 63 for the bundled macos-use binary path and line 1266 for the BUILTIN_MCP_NAMES constant. The CHANGELOG entry that documents the dynamic MCP server list is at /Users/matthewdi/fazm/CHANGELOG.json line 20, version 2.4.0, dated 2026-04-20.