Where accessibility-API computer use actually breaks on macOS.

Most articles on accessibility-API agents present them as if they just work on every app. They do not. There is a real list of app categories where the AX tree is too thin or absent, and there are two macOS system-state edge cases that look like the agent broke when it did not. This page catalogs both, with the actual error codes you see, and points at the disambiguation logic Fazm ships in the open source repo to keep going when the AX layer goes ambiguous.

Matthew Diakonov
9 min read
Probes Finder as a control on AXError.cannotComplete
Detects the macOS 26 stale-TCC cache via event tap
Five app categories where AX falls short, named

Direct answer (verified 2026-05-01)

Accessibility-API computer use on macOS reaches every app that implements NSAccessibility, which is most native AppKit and SwiftUI apps. It degrades or fails on five app classes: Electron text trees (Slack, Discord, VS Code, Notion), Qt apps without an AT-SPI bridge, OpenGL or Metal canvases, web canvases (Figma, Skia, WebGL), and Python or Tk-based UIs like PyMOL. On top of that, two macOS system-state cases look like a permission failure but are not: the per-process AXIsProcessTrusted cache can go stale on macOS 26 (Tahoe) after an app re-sign or update, and AXError.cannotComplete is returned for both revoked permission and apps that never implemented AX. A useful agent has to disambiguate.

What every other guide on this misses

Read the pages that currently explain accessibility APIs for AI agents and they cluster into two stories. Story one: the screenshot-versus-AX comparison, with a table that says AX is faster and more reliable. Story two: the user-facing tour of VoiceOver, Zoom, and Switch Control, which is about people, not agents. Both are fine for what they are. Neither answers the question a developer or operator asks the day they wire an AX agent into a real workflow, which is “in which apps does this thing silently stop working, and what does it return when it does”.

That gap matters because the failure modes are sneaky. The agent does not crash. It clicks nothing, returns nothing, or worse, clicks the wrong AXGroup because the only nodes in the tree are generic containers with no role or name. The reader needs a list of app classes to expect, the actual error codes the OS hands back, and the trick to tell “the system is broken” from “this app just does not implement accessibility”. That trick is the anchor of this page.

1 error, 3 meanings

The same AXError.cannotComplete is returned for revoked permission and for apps that never implemented AX. A retry loop on that error code will spin forever on a Qt app.

AppState.swift, Fazm open source repo

The five app classes where AX falls short

Sorted by how often a real Mac user runs into them. The first one alone covers most of the apps a knowledge worker opens in a day, which is why “just use AX” is not a complete answer for an agent.

Electron and Chromium-shell apps

Slack, Discord, VS Code, Notion, Linear desktop, GitHub Desktop, Spotify, Figma desktop. The Chromium content view exposes a flat tree of generic AXGroup nodes with few roles and no useful names. Forcing the renderer accessibility flag adds depth at a CPU cost but does not fix sparse ARIA inside the web app.

Qt apps without AT-SPI bridge

PyMOL, FreeCAD, OBS Studio, KeePassXC, qBittorrent, Anki. Qt's accessibility plug-in has to be loaded and wired through to NSAccessibility for AX to work. Many Qt builds on macOS do not. AXError.cannotComplete is the typical response.

OpenGL / Metal canvases

Most game engines, Blender, Unity / Unreal editors, GPU-rendered design tools. The window exposes a single AXGroup at the canvas; the agent cannot reach the controls inside without falling back to vision or coordinate hits.

Web canvases inside browsers

Figma, Excalidraw, Tldraw, Miro, charts rendered with WebGL or Skia, design tools that draw their own UI to a canvas. Agents that consume the AX tree see no nodes inside the canvas. OCR or model-based vision is the only honest approach.

Python / Tk and other non-Cocoa UIs

PyMOL, IDLE, older Tk-based scientific tools, anything wrapping an X11 server via XQuartz. AXError.cannotComplete on every call from the frontmost app, even when the system permission is fine.

The error codes you actually see, and what each one means

AXUIElementCopyAttributeValue and the related calls return an AXError code. The enum defines more than a dozen cases, but the interesting ones for an agent are the four below. The error type is declared in the public HIServices header (AXError.h), which is the canonical reference.

AXError codes a computer-use agent has to disambiguate

AXError.success
Naive agent: same as the useful agent. Easy case.
Useful agent: read the value, dispatch the action.

AXError.apiDisabled (system AX off)
Naive agent: often confuses it with cannotComplete and treats it as transient; spins on retries.
Useful agent: stop, log, prompt the user to enable accessibility in System Settings. Unambiguous.

AXError.cannotComplete (most common)
Naive agent: maps it directly to "permission broken" and re-prompts, or maps it to "transient" and retries forever. Either way it nags the user or stalls.
Useful agent: runs a control AX call against Finder. If Finder fails too, the permission is stuck; ask the user to relaunch the agent. If Finder succeeds, the failure is app-specific; log it and move on (vision fallback if available).

AXError.notImplemented / attributeUnsupported
Naive agent: treats it as a hard failure and gives up.
Useful agent: tries a different attribute or falls back to coordinate-based action. The app exists; the value just is not exposed.

Source: AppState.swift testAccessibilityPermission(), Fazm open source repo, github.com/m13v/fazm.
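The dispatch in the table boils down to a four-way switch. A minimal sketch of that mapping in pure Swift; the enum names (AXErrorCode, AgentAction, nextAction) are illustrative, not from the Fazm repo, and the real code would switch on the AXError values returned by the HIServices calls:

```swift
import Foundation

// Illustrative mirror of the four AXError cases an agent handles distinctly.
enum AXErrorCode { case success, apiDisabled, cannotComplete, notImplemented }

enum AgentAction: Equatable {
    case dispatch               // read the value, act on it
    case promptEnableAX         // unambiguous: system accessibility is off
    case probeFinderControl     // ambiguous: run the same call against Finder
    case fallBackToCoordinates  // value not exposed; try another attribute or coordinates
}

func nextAction(for code: AXErrorCode) -> AgentAction {
    switch code {
    case .success:        return .dispatch
    case .apiDisabled:    return .promptEnableAX
    case .cannotComplete: return .probeFinderControl
    case .notImplemented: return .fallBackToCoordinates
    }
}
```

The point of the sketch: cannotComplete is the only case that maps to another probe rather than to a terminal action.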

The two macOS-26 cache states that look like a bug

These are not bugs in the agent. They are how TCC behaves on modern macOS. An agent that does not handle them will report a broken permission to a user who has the toggle on, and the user will give up.

1. AXIsProcessTrusted cache goes stale after a re-sign or macOS 26 update; the toggle is on, but the API returns false.
2. AXError.cannotComplete is returned for both revoked permission and apps that never implemented AX.

The fix for state 1 is to not trust AXIsProcessTrusted alone. Create a CGEvent listen-only tap. Tap creation reads the live TCC database directly and bypasses the per-process cache. If the tap succeeds while AXIsProcessTrusted returns false, the cache is stale and the user does not need to do anything.

The fix for state 2 is to never trust a single AX call against the frontmost app to tell you the permission state. Always run a control call against a known-good app like Finder. The next section walks through that.
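The stale-cache fix reduces to a pair of booleans. A minimal sketch of the decision, with assumed names (PermissionState, classify); in a real agent the inputs would come from AXIsProcessTrusted() and a CGEvent.tapCreate attempt:

```swift
import Foundation

// axTrusted:  the possibly-stale per-process AXIsProcessTrusted() result.
// tapCreated: whether a listen-only CGEvent tap could be created (a live TCC read).
enum PermissionState { case granted, staleCache, denied }

func classify(axTrusted: Bool, tapCreated: Bool) -> PermissionState {
    switch (axTrusted, tapCreated) {
    case (true, _):      return .granted     // cache agrees with reality; nothing to do
    case (false, true):  return .staleCache  // live TCC says yes, cache says no: relaunch not even needed
    case (false, false): return .denied      // both sources agree: not granted
    }
}
```

Only the (false, true) cell is the stale-cache signature; everything else is the normal granted/denied world.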

The anchor fact: probe Finder as a control

This is the trick that lets a Mac agent stay sane when it is sitting in front of a Qt app or a Python UI that returns AXError.cannotComplete. Run the same AX call against com.apple.finder. Finder is a known-good NSAccessibility client. If Finder fails too, the permission is genuinely stuck, and the agent should ask the user to quit and reopen. If Finder succeeds, the original failure was app-specific, the permission is fine, and the agent should log the app as AX-incompatible and move on.

// Desktop/Sources/AppState.swift, lines 468 to 485

private func confirmAccessibilityBrokenViaFinder(suspectApp: String) -> Bool {
  if let finder = NSRunningApplication.runningApplications(
       withBundleIdentifier: "com.apple.finder"
     ).first {
    let finderElement = AXUIElementCreateApplication(finder.processIdentifier)
    var finderWindow: CFTypeRef?
    let finderResult = AXUIElementCopyAttributeValue(
      finderElement, kAXFocusedWindowAttribute as CFString, &finderWindow
    )
    if finderResult == .cannotComplete || finderResult == .apiDisabled {
      // Finder also fails: permission is truly stuck.
      return false
    } else {
      // Finder works: the original failure was app-specific.
      return true
    }
  } else {
    // Finder not running: fall back to event tap probe as tie-breaker.
    return probeAccessibilityViaEventTap()
  }
}

Finder being the control matters. It is always present on a Mac, it implements NSAccessibility cleanly, and it never gets caught in the AT-SPI / Qt / OpenGL situations that confuse the frontmost-app check. When confirmAccessibilityBrokenViaFinder returns true, the agent silently keeps working on the apps that do expose AX and falls through to vision-based interaction on the apps that do not. The user is not asked to grant a permission they already granted.

The companion routine is probeAccessibilityViaEventTap, in the same file at lines 490 to 504. It creates a CGEvent tap of type cgSessionEventTap with options listenOnly. Tap creation hits the live TCC database, so a tap that succeeds while AXIsProcessTrusted returns false is the canonical signature of the macOS 26 stale-cache state. Together the two routines cover the two-by-two matrix of (app implements AX yes/no) by (process cache fresh yes/no), without nagging the user when the system is actually fine.

Apps where the agent should expect to fall back to vision

Not exhaustive, but representative. If the agent encounters any of these, the probe-Finder check almost always confirms “permission fine, app-specific incompatibility” and a vision-or-coordinate fallback is the right path.

Slack

Electron. Sparse AX tree, mostly AXGroup nodes with no useful names.

Discord

Electron. Same shape as Slack, generic groups dominate.

VS Code

Electron. Better than most Electron apps but still requires --force-renderer-accessibility for full depth.

Notion

Electron. The doc surface is a contenteditable div tree, AX tree is thin.

Figma desktop

Electron + WebGL canvas. AX tree stops at the canvas boundary.

Blender

OpenGL canvas. One AXGroup at the viewport, nothing inside.

PyMOL

Python / Tk. AXError.cannotComplete on every call.

OBS Studio

Qt without AT-SPI bridge wired through. Mostly opaque to AX.
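One way to wire the list above into an agent is a static set of bundle identifiers that skip straight to the vision path. The bundle IDs below are believed correct but should be verified against the installed apps; the set and function names are illustrative, not from the Fazm repo:

```swift
import Foundation

// Bundle IDs (assumed, verify locally) of apps from the list above
// where the AX tree is known to be too thin to act on.
let axWeakBundleIDs: Set<String> = [
    "com.tinyspeck.slackmacgap",      // Slack (Electron)
    "com.hnc.Discord",                // Discord (Electron)
    "com.microsoft.VSCode",           // VS Code (Electron)
    "notion.id",                      // Notion (Electron)
    "com.figma.Desktop",              // Figma (Electron + WebGL canvas)
    "org.blenderfoundation.blender",  // Blender (OpenGL canvas)
    "com.obsproject.obs-studio",      // OBS Studio (Qt)
]

// AX-first everywhere else; vision for the known-thin set.
func preferredPath(forBundleID id: String) -> String {
    axWeakBundleIDs.contains(id) ? "vision" : "ax-first"
}
```

A static list only covers the known offenders; the Finder probe from the previous section is still what catches the apps not on it.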

Where AX still wins

The list of failures is real, but it is also a minority surface. On the apps where AX is well-implemented, an accessibility-API agent is faster, cheaper, and more reliable than a screenshot agent by a wide margin. No frame to capture, no model pass to tokenize pixels into elements, no coordinate prediction to resolve. The agent reads a typed AXTextField, dispatches a kAXPressAction, and gets back a structured response.

The set is large. Finder, Safari, Mail, Calendar, Messages, Notes, Reminders, Preview, System Settings, Keychain Access, Disk Utility, Pages, Numbers, Keynote, Music, TV, Podcasts, Maps, Photos, Things, Fantastical, Bear, MarsEdit, Tot, BBEdit, Xcode, Terminal, iTerm2 with caveats. Most of what a small business owner runs in a day is in this set. The five problem classes above are the ones where the agent has to be honest about a fallback path.

The pattern that holds: if VoiceOver can read the app, an accessibility-API agent can drive it. If you turn on VoiceOver and it falls silent on the app, plan for a vision fallback.

The minimum disambiguation set, by the numbers

Counts from the actual code in AppState.swift, the file an agent on macOS has to ship something equivalent to.

4
AXError codes worth handling distinctly
3
retry attempts before prompting Quit and Reopen
5
seconds between accessibility re-checks while broken
2
control mechanisms (Finder probe, event-tap probe)

The retry-and-probe machinery is roughly 200 lines of Swift. Without it the agent flickers between “permission granted” and “permission broken” every time the user opens a Qt or Electron app, and the support load eats the team. With it, a stuck cache shows up as exactly one alert at startup and an app-specific incompatibility shows up as a log line.
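The schedule behind that machinery is small enough to sketch: re-check on a fixed interval, escalate to a restart alert after the attempt budget is spent. The names (RetryDecision, decide) are illustrative, not from the repo; the 5-second interval and 3-attempt budget are the values the article describes:

```swift
import Foundation

// Re-check every 5 seconds; show "Quit and Reopen" after the 3rd failed attempt.
enum RetryDecision: Equatable {
    case retry(afterSeconds: Int)
    case showRestartAlert
}

func decide(failedAttempts: Int,
            maxAttempts: Int = 3,
            intervalSeconds: Int = 5) -> RetryDecision {
    failedAttempts < maxAttempts
        ? .retry(afterSeconds: intervalSeconds)
        : .showRestartAlert
}
```

The fixed budget is what turns a stuck cache into exactly one alert instead of an infinite prompt loop.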

What this means if you are evaluating a computer-use agent

Two questions answer most of it. First, how does the agent handle the apps you actually use every day? If your stack is Slack plus VS Code plus Figma, an AX-only agent will look great on Mail and Finder and bounce on the apps that matter. Ask whether there is a vision fallback and how it is triggered.

Second, how does the agent behave when the system permission is in a weird state? Re-sign the app, update macOS, deny and re-grant the permission. The agent that handles all three without nagging you to reset something is the one that has the disambiguation shipped. The agent that loops on a permission alert every time you open Blender does not.

Fazm uses real accessibility APIs and screen context for the apps where AX falls short, runs locally, and is fully open source. The accessibility-permission code is in Desktop/Sources/AppState.swift, easy to read and easy to verify.

Want to walk through the AX edge cases live?

Fifteen minutes on a call. Bring the apps you use every day, watch how Fazm handles the AX-thin ones, and see whether the hybrid path covers your real workflow.

Questions developers ask before betting an agent on accessibility APIs

Where exactly does an accessibility-API agent fail on macOS?

Three failure shapes. App-shape: the app implements no NSAccessibility tree, or implements one so sparse that the agent cannot tell a button from a label. Electron is the famous offender: Slack, Discord, VS Code, and Notion all expose minimal AX. Qt apps work only when the platform plug-in is wired through to NSAccessibility, which not every Qt app does. OpenGL and Metal canvases (game engines, GPU-rendered tools) draw pixels with no AX nodes inside. Web canvases, including most charting libraries and design tools rendered with Skia or WebGL, are opaque. Python and Tk apps like PyMOL often return AXError.cannotComplete from every call. Permission-shape: the system has the permission but the per-process cache went stale, common after a macOS 26 (Tahoe) update or an app re-sign. Error-shape: AXError.cannotComplete is the same error code for 'permission revoked' and 'this app never implemented AX', which means an agent that just retries on cannotComplete will loop on a Qt app forever.

What is the actual list of macOS apps that accessibility APIs work well on?

Native AppKit and SwiftUI apps from Apple work well: Finder, Safari, Mail, Calendar, Messages, Notes, Reminders, Preview, Keychain Access, System Settings, Pages, Numbers, Keynote, Music, TV, Podcasts, Maps, Photos. Most third-party native apps that ship as AppKit or SwiftUI on the App Store also expose a usable AX tree: Things, Fantastical, Bear, MarsEdit, Tot, BBEdit, Xcode (mostly), Terminal, iTerm2 (with some quirks). The further from native, the worse it gets. A pattern that holds: if a screen reader user can use the app with VoiceOver, an accessibility-API agent can usually drive it. If VoiceOver is silent on the app, a screenshot-based approach is more reliable.

Why do Electron apps look fine to a human but break for accessibility-API agents?

Electron renders Chromium inside a native shell, so what you see is a web page rendered to a single Chromium content view. Chromium does expose accessibility nodes through the platform AX bridge, but the depth and quality of the tree depends on the web app inside the shell, the Electron version, and whether accessibility is force-enabled with the --force-renderer-accessibility flag. In practice, Slack and Discord expose mostly generic AXGroup nodes with no roles or names that an agent can dispatch on, so 'click the second send button' becomes guesswork. The DOM is rich, the rendered AX tree is thin. Apps using web frameworks via Tauri or WKWebView-based wrappers have similar gaps. The honest workaround is hybrid: AX where it exists, screenshot OCR or coordinate-based fallback where the AX tree is too thin to act on.

What is AXError.cannotComplete and why is it the worst error code?

AXError.cannotComplete is what you get back from AXUIElementCopyAttributeValue when the call cannot be answered, but the API does not tell you why. It is returned for at least three different conditions: the calling process has lost accessibility trust (revoked permission or stale TCC cache), the target process has not implemented NSAccessibility, or the target process is alive but unresponsive. An agent that maps cannotComplete directly to 'permission broken' will nag the user to re-grant access every time the user opens a Qt app. An agent that maps cannotComplete to 'app limitation' will silently give up on real permission failures. The fix is to disambiguate: run the same kind of call against a known-good control app like Finder. If the control fails too, the permission is broken. If it succeeds, the original failure is app-specific.

What does Fazm specifically do when it hits these limits?

The relevant code is in Desktop/Sources/AppState.swift in the open source repo. checkAccessibilityPermission() polls AXIsProcessTrusted, then runs an actual call against the frontmost app via testAccessibilityPermission(). If that returns cannotComplete, the agent does not panic. It calls confirmAccessibilityBrokenViaFinder(), which runs the same call against com.apple.finder. If Finder also fails, the permission is truly stuck and the user is asked to Quit & Reopen. If Finder succeeds, the original failure is logged as 'app-specific AX incompatibility' and the agent moves on. There is also probeAccessibilityViaEventTap(), which creates a CGEvent listen-only tap, because event-tap creation checks the live TCC database directly and bypasses the per-process cache that goes stale on macOS 26 Tahoe. The retry timer at startAccessibilityRetryTimer fires every 5 seconds for 3 attempts before showing a restart alert, which is the only real fix for a stuck cache.

Why does the TCC cache go stale on macOS 26?

macOS caches the result of AXIsProcessTrusted per process, and on Sequoia and Tahoe that cache is not always invalidated when the user grants permission in System Settings or when an app is re-signed. The symptom is: the user clicked the toggle, System Settings shows the app as enabled, but AXIsProcessTrusted keeps returning false. Apple's documented work-around is to relaunch the app, which spawns a new process and gets a fresh TCC read. Fazm detects this case by attempting to create a CGEvent tap; tap creation hits the live TCC, so a tap that succeeds while AXIsProcessTrusted returns false is the signature of a stale cache. The same pattern matters for any computer-use agent shipped on macOS today, because users who reinstall, update, or re-grant the permission will hit it.

Is the limit fundamentally different from Anthropic Computer Use or OpenAI Operator?

Anthropic Computer Use and OpenAI Operator are screenshot-and-coordinate models. They do not consume an accessibility tree at all. Their limits are different: tiny text, dense UIs, multi-monitor setups, dynamic content that has shifted between the screenshot and the click, and per-screenshot token cost. An accessibility-API agent's limits are the ones described on this page: app categories that do not expose AX, ambiguous error codes, stale system caches. The two models are complementary. The honest engineering answer is that a useful agent on macOS today is hybrid: AX-first for native apps where the tree is reliable, fall back to vision and OCR for Electron and canvas-rendered apps, with explicit disambiguation when the AX layer reports an ambiguous error.

Is there a way to force Electron apps to expose better accessibility data?

Sometimes. Chromium and Electron both support a runtime flag that forces the renderer to build a complete accessibility tree. For Chrome the flag is --force-renderer-accessibility, and Electron inherits it. The trade-off is performance: forcing the tree adds memory and CPU per renderer process, and it does not fix sparse roles inside the web app itself. If the web app uses div soup with no aria-label or role attributes, the AX tree is still flat. The reliable fix lives upstream: the apps would need to add ARIA semantics. Until then, an agent on Slack or Discord ends up reading window titles and message text via OCR, then clicking by coordinates derived from the captured frame. This is the gap that pushed Anthropic and OpenAI toward screenshot-first agents in the first place.
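For concreteness, here is what forcing the flag looks like from the outside: relaunching the app through open --args so the flag reaches the Chromium processes. The Swift below only constructs the command; "Slack" is an example target, and this only helps if the app honors the inherited Chromium flag:

```swift
import Foundation

// Build (not run) the relaunch command that passes the Chromium flag through.
// "Slack" is an example; substitute any Electron app name.
let appName = "Slack"
let arguments = ["-a", appName, "--args", "--force-renderer-accessibility"]
let command = (["open"] + arguments).joined(separator: " ")
print(command)  // open -a Slack --args --force-renderer-accessibility
```

To actually run it, the same argument array could be handed to Foundation's Process with /usr/bin/open as the executable; the app must be fully quit first or open will just focus the running instance without the flag.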

Do screen-recording permissions help with the apps that break AX?

Yes, and that is what an honest hybrid agent does. macOS exposes ScreenCaptureKit (the CGWindowList APIs on older systems) for capturing per-window pixel buffers. Combined with a vision model or local OCR, the agent can read what is on screen even when AXUIElementCopyAttributeValue returns nothing useful. Fazm asks for screen-recording permission specifically for this case. The order matters too: macOS dialogs prompt cleanly when accessibility is asked for first and screen-recording last, because screen-recording requires an app restart to take effect, which would interrupt the accessibility grant flow. The Fazm onboarding asks in that order on purpose.

Where in the source can I read the actual handling code?

Three concrete spots. AppState.swift around lines 308 to 503 holds the polling loop, the disambiguation against Finder, and the event-tap probe. The retry timer and restart alert are in the same file just below. PermissionsPage.swift in MainWindow/Pages renders the user-facing flow that prompts for the permission and shows the broken-state hint. ChatPrompts.swift around line 369 documents the order of permission requests for onboarding (microphone, then accessibility, then screen recording). All of it is plain Swift in github.com/m13v/fazm, no obfuscation, you can clone the repo and trace one accessibility check from the timer firing to the AXError code being mapped.