BRIDGE / TREE / DISAMBIGUATION

macOS accessibility APIs and Electron apps: where the bridge actually breaks

The answer is not that Electron is invisible to macOS Accessibility. Chromium ships a real NSAccessibility bridge, and an external AX agent does get a response. The answer is that two layers fail back to back: the bridge is opt-in, and the tree it produces is a translated DOM, so unlabeled <div> soup becomes AXGroup soup. Worse, the failure shares an error code with a broken permission, which is why naive agents nag users every time Slack comes to the front. What follows traces the failure at the API call, in the dump, and in the disambiguation step.

Matthew Diakonov
10 min read

Direct answer · verified 2026-05-20

Why don't macOS Accessibility APIs work on Electron apps?

They do reach Electron apps, but the reach fails in two layers. Chromium's NSAccessibility bridge is opt-in (the Electron app must set app.accessibilitySupportEnabled = true or be started with --force-renderer-accessibility), and even with it on, the tree is a translation of the DOM. So Slack, Discord, Notion, and a stock VS Code all produce mostly empty AXGroup nodes with no role or name an agent can dispatch on. The authoritative reference for the opt-in side is the Electron accessibility doc at electronjs.org/docs/latest/tutorial/accessibility. The translation side is verifiable by diffing the AX dump of any Electron app at /tmp/macos-use/ against the dump of any native AppKit app.

The first layer: the bridge is opt-in

When an external process calls AXUIElementCreateApplication(pid) on an Electron app, what comes back is the Chromium content view's NSAccessibility root. Chromium does not eagerly build the full accessibility tree behind that root. It builds it on demand, when an assistive-tech client opens it, or when the embedder asks for it explicitly. Two ways to ask:

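  • In Electron app code, set app.accessibilitySupportEnabled = true at startup; the Electron docs say to do this after the ready event has been emitted. This is the runtime knob the Electron docs surface, and it is per-app.
  • Append the force-renderer-accessibility switch with app.commandLine.appendSwitch before the app is ready (appendSwitch takes the switch name without the leading dashes), or launch the app with --force-renderer-accessibility on its command line. This is the Chromium-side flag, same effect, exposed for development and forced-on builds.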
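When you do not control the app's source, the Chromium-side flag can be tried from outside by relaunching the app with the switch on its command line. The following is a minimal sketch, not a guaranteed fix: it assumes the packaged app forwards command-line switches to Chromium (not every Electron build does), and the helper name relaunch_command is hypothetical. It only builds the argv for macOS's open tool; running it is left to the caller.

```python
def relaunch_command(app_name, force_ax=True):
    """Build the argv for launching a macOS app via `open`, optionally
    passing the Chromium accessibility switch after --args.

    Note: `open` only forwards --args to a fresh launch, so the app has
    to be quit first; an already-running instance is just activated.
    """
    argv = ["open", "-a", app_name]
    if force_ax:
        # Everything after --args is handed to the launched application.
        argv += ["--args", "--force-renderer-accessibility"]
    return argv


# Example (assumption: the app is not currently running):
#   import subprocess
#   subprocess.run(relaunch_command("Slack"), check=True)
```

Whether the tree gets any deeper afterwards is something to check in the dump rather than assume.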

Without one of those, the AX dump of an Electron app at default settings looks like the dump below: an application root, a window, the menu bar, and one anonymous content group. With one of them, every visible UI region becomes its own group, and the visible static text shows up as AXStaticText nodes. Same Slack window, two different views.

Same Slack window, two different AX dumps. With the bridge off (default settings):

[AXApplication] "Slack"
  [AXWindow] "general | acme"
    [AXMenuBar] "" x:0 y:0 w:1440 h:24
    [AXGroup] "" x:0 y:24 w:1440 h:876
      // content area collapsed to one node

The relevant detail in the dump with the bridge on: hundreds of nested AXGroup nodes carrying no name, no role beyond "group", and no useful descendant roles. The message text shows up as static text, which is enough for an agent to summarize what the user is looking at, but the "send", "react", "reply in thread", and "upload file" controls do not exist as named buttons in the tree. They are styled divs in the underlying DOM, and the translation faithfully reproduces that as anonymous groups. Turning the flag on gets you visibility into the page, not controllability of the page.

The second layer: the tree is a translation of the DOM

This is the part that pages on this topic skip. Even with the bridge forced on, Chromium does not invent semantics. It walks the live document, maps HTML elements and ARIA attributes to platform AX roles, and exposes the result. The mapping is honest: if the source says "this is a div", the AX node is AXGroup. If the source says "this is a div with role=button and aria-label='Send'", the AX node is AXButton with AXTitle of "Send". The whole chain is that simple, and the whole limitation is that simple too.
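The translation rule can be illustrated with a toy model. This is a sketch of the idea only: the two lookup tables are a tiny illustrative subset chosen for this example, not Chromium's real mapping, and the function name translate is hypothetical.

```python
# Toy model of the DOM -> AX translation. Illustrative subset only;
# Chromium's actual role table is far larger and more nuanced.
IMPLICIT_ROLES = {"button": "AXButton", "div": "AXGroup"}
ARIA_ROLES = {"button": "AXButton", "link": "AXLink", "textbox": "AXTextField"}


def translate(tag, attrs=None):
    """Return (ax_role, ax_title) for one DOM element.

    An explicit ARIA role wins over the tag's implicit role, and the
    title comes only from aria-label. Nothing is invented: an
    unlabeled div comes out as an unnamed AXGroup.
    """
    attrs = attrs or {}
    explicit = ARIA_ROLES.get(attrs.get("role", ""))
    role = explicit or IMPLICIT_ROLES.get(tag, "AXGroup")
    title = attrs.get("aria-label", "")
    return role, title


print(translate("div"))
print(translate("div", {"role": "button", "aria-label": "Send"}))
```

The first call yields an anonymous group and the second a named button, which is the whole difference between a tree an agent can read and a tree it can act on.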

A short map of how the major Electron apps fare under this translation, from inspection of dumps with the bridge forced on:

Slack

Channels, messages, and timestamps surface as AXStaticText. The composer, send button, reaction picker, file uploader, and thread controls are anonymous AXGroup nodes. An agent can read the conversation. It cannot click "send" without falling back to vision or a saved coordinate.

Discord

Same shape as Slack. Voice-channel rows carry slightly more semantics because the team added them, but the message-area buttons stay anonymous. The DM list is dense static text with no role.

Notion

The document surface is a contenteditable div tree. AX sees a long sequence of AXStaticText nodes for visible content. The sidebar and the slash-command menu are groups. Cursor positioning has to go through keyboard.

VS Code / Cursor

Better. Monaco exposes line content through an ARIA live region, so the visible buffer is readable line by line. The cursor is rendered on a canvas, so it has no AX position. Agents drive it through keyboard shortcuts instead.

The implication for an agent on macOS: do not assume that flipping the renderer flag is the fix. The flag opens the bridge. The web app inside the shell decides whether the tree is dispatchable. A useful agent has to detect the depth at runtime and pick a path.

The third layer that bites in production: an overloaded error code

AXError.cannotComplete (raw value -25204) is the error an Electron app with no AX support tends to return. It is also the error an app returns when the system Accessibility permission is stuck, and the error you get from a per-process TCC cache that went stale on macOS 26 (Tahoe) after an app re-sign. Three very different problems, one code. A naive agent loops back to the user with a settings dialog every time it fires. A useful agent disambiguates first.

What the disambiguation looks like at runtime

The pattern lives in the Fazm source at Desktop/Sources/AppState.swift. testAccessibilityPermission at line 480 is the call that fires on every front-app change. confirmAccessibilityBrokenViaFinder at line 517 is the probe against com.apple.finder. The third probe, probeAccessibilityViaEventTap at line 539, is the tie-breaker for the rare case where Finder is not running: it tries to create a CGEvent tap, which queries the live TCC database and bypasses the per-process AX permission cache. The whole file is at github.com/mediar-ai/fazm.
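The decision logic behind those three probes can be sketched as a pure function. This is not Fazm's Swift code: the probes are injected as callables so the logic can be read and tested on its own, the names classify_cannot_complete and Verdict are hypothetical, and treating a successful event-tap creation as "permission is live" is this sketch's reading of the tie-breaker.

```python
from enum import Enum

AX_SUCCESS = 0              # raw value of AXError.success
AX_CANNOT_COMPLETE = -25204  # raw value of AXError.cannotComplete


class Verdict(Enum):
    PERMISSION_BROKEN = "permission_broken"  # surface the settings dialog
    APP_INCOMPATIBLE = "app_incompatible"    # fall back to vision


def classify_cannot_complete(probe_finder, probe_event_tap):
    """Call this only after the front app returned cannotComplete.

    probe_finder: returns the AX error code from the same call made
        against Finder, or None if Finder is not running.
    probe_event_tap: returns True if a CGEvent tap could be created.
    """
    finder_error = probe_finder()
    if finder_error is None:
        # Finder unavailable: use the event tap as the tie-breaker.
        tap_ok = probe_event_tap()
        return Verdict.APP_INCOMPATIBLE if tap_ok else Verdict.PERMISSION_BROKEN
    if finder_error == AX_SUCCESS:
        # A known-good app answered, so the permission is fine.
        return Verdict.APP_INCOMPATIBLE
    return Verdict.PERMISSION_BROKEN
```

Keeping the probes lazy means the event tap is only attempted when the Finder probe cannot answer.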

The decision the agent has to make

Put together: an agent that wants to act on a focused window on macOS has to answer three questions in order. Is the system permission live? Does this app expose a usable AX tree? If not, is there a vision fallback that is worth invoking? The sequence below is what Fazm runs the first time it sees a given front app in a session.

Front-app focus change, first time this session

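Participants: Agent, AX, Finder, Vision.

  1. Agent → AX: copyAttr(frontPid, focusedWindow)
  2. AX → Agent: .cannotComplete (ambiguous)
  3. Agent → Finder: copyAttr(finderPid, focusedWindow)
  4. Finder → Agent: .success (permission is fine)
  5. Agent → AX: deep traverse frontPid
  6. AX → Agent: 200 AXGroup nodes, no roles
  7. Agent → Vision: fall back to screenshot + OCR
  8. Vision → Agent: labelled regions + bounding boxes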

What this gets you: an agent that does not nag the user about a permission that is fine, does not waste tokens trying to dispatch against an empty tree, and does not silently fail when the user focuses an Electron app. The cost is one extra round-trip per session per app. The cost is worth it.

The practical shortlist if you are building this

  1. Do not assume Electron means "no AX". Call AXUIElementCreateApplication first and try.
  2. Treat AXError.cannotComplete as ambiguous, not as a permission failure. Probe a known-good control (Finder is the right one on macOS) before deciding.
  3. When the AX tree is reachable but mostly AXGroup, count the named-role descendants. If you find fewer than ten roles you can dispatch on per window, mark the app AX-incompatible for this session and fall back to vision.
  4. Cache the AX-incompatible result by bundle identifier. The shape of the tree does not change between front-app activations. It changes when the app updates Electron, which is rare.
  5. Persist a dump of the tree the model actually received. Fazm writes one per tool call at /tmp/macos-use/<timestamp>_<tool>.txt. When a click fails, you can grep the file to see whether the target was missing or just misidentified.
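Items 3 and 4 can be sketched together as a small path chooser. The threshold of ten comes from the list above; everything else is an assumption of this sketch: the set of roles counted as actionable, the rule that a node needs a non-empty title to count, and the names choose_path and ACTIONABLE_ROLES.

```python
DISPATCHABLE_MIN = 10  # threshold from item 3

# Assumption: roles an agent could plausibly click or type into.
ACTIONABLE_ROLES = {
    "AXButton", "AXLink", "AXTextField", "AXTextArea",
    "AXCheckBox", "AXMenuItem", "AXPopUpButton",
}

# Per-session cache of AX-incompatible apps, keyed by bundle identifier.
_incompatible = set()


def choose_path(bundle_id, nodes):
    """Return "ax" or "vision" for one window.

    nodes: iterable of (role, title) pairs from a deep traversal.
    """
    if bundle_id in _incompatible:
        return "vision"  # cached verdict, skip the recount
    named = sum(
        1 for role, title in nodes
        if role in ACTIONABLE_ROLES and title.strip()
    )
    if named < DISPATCHABLE_MIN:
        _incompatible.add(bundle_id)
        return "vision"
    return "ax"
```

Because the cache lives in process memory, it resets with each session, which matches the "for this session" wording; persisting it across app updates would need an Electron-version check this sketch does not attempt.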


Frequently asked questions

Do macOS Accessibility APIs reach Electron apps at all, or are they fully blocked?

They reach. Chromium implements the NSAccessibility informal protocol, so an Electron window does respond to AXUIElementCopyAttributeValue and friends. The reach is just not useful. Two things have to be true for an external AX agent to act on an element: the element has to exist in the tree, and it has to carry a role and name a model can dispatch on. Electron clears the first bar (the tree exists) but usually fails the second (the tree is mostly empty AXGroup nodes with no AXTitle or AXValue, because the underlying DOM is styled <div> soup with no aria-label or role attributes). So the right framing is not 'blocked', it is 'reached, but at a depth that does not let an agent do anything'.

What is the opt-in part of the bridge?

Chromium does not build a full accessibility tree by default. It builds one when an AT client opens it, or when the embedder asks for it explicitly. Electron exposes this as the app-level property app.accessibilitySupportEnabled, which a developer can set to true to force-on at startup. It can also be forced via the Chromium command-line switch --force-renderer-accessibility, passed by appending it to Electron's commandLine. Without either, your AXUIElementCopyAttributeValue calls return a thin top-level tree (the window chrome, the menu bar) and a single AXGroup that represents the entire content area, no children. The reason most articles tell you to turn the flag on is that it gets you from one node to a few hundred nodes. They stop there. The flag does not promise the few hundred are labelled.

Why is the resulting tree mostly AXGroup nodes even with the flag on?

Because Chromium's accessibility tree is a translation of the DOM. It walks the live document, maps HTML elements and ARIA attributes to platform AX roles, and exposes the result. If the DOM is <div>...</div> nested ten deep with class names but no aria-label, no role, no semantic element, the translation is a stack of generic group nodes. Slack and Discord render most of their interactive surface this way. Notion does too. VS Code's editor surface is a custom canvas-like text widget that the AX bridge can describe at the line level but not at the token level. The conclusion is operational: --force-renderer-accessibility is a necessary toggle for these apps, never a sufficient one. The actual quality of the tree is the responsibility of the web app inside the shell.

Concretely, what does my agent see when it tries to read a Slack window?

A top-level AXApplication, an AXWindow with the workspace name in its title, a menu bar with the standard macOS menu items, and one giant AXGroup containing the entire content view. Inside that group there are a few hundred more AXGroup nodes with no name, occasional AXStaticText nodes that carry the visible message text, and very few buttons or roles. There is no 'Send message' button with role AXButton and title 'Send'. The agent cannot pick a target by role plus name the way it would on Mail or Finder or Apple Music. So the agent ends up reading the AXStaticText nodes for context, then either guessing coordinates from the bounding boxes in the AX dump, or falling back to a screenshot pipeline with OCR. Same problem in Discord, with a slightly thicker tree because their voice-channel widgets are slightly more semantic.

There is a famously overloaded error code here. Which one and why does it matter?

AXError.cannotComplete (raw value -25204). On macOS this single code is returned in three very different situations: the user revoked the system-wide Accessibility permission, the per-process TCC cache went stale after an OS update or app re-sign, or the target app simply did not implement NSAccessibility at the depth the call needs. The first two are permission states the user can fix by toggling Settings. The third is an app-shape problem the user cannot fix. An agent that maps cannotComplete to 'permission broken' will pop a settings dialog every time the user focuses an Electron app that did not opt in. Users hate that, and they will not trust an agent that nags. The fix is to disambiguate before deciding.

What is the disambiguation pattern Fazm ships?

Probe a known-good control. When testAccessibilityPermission in Fazm's AppState.swift (line 480) gets AXError.cannotComplete from the frontmost app, it does not conclude anything yet. It calls confirmAccessibilityBrokenViaFinder at line 517. That function pulls the running com.apple.finder process and runs the same AXUIElementCopyAttributeValue against it. Finder always implements NSAccessibility and is almost always running on a Mac. If Finder also fails, the permission is truly stuck, and the agent surfaces a Quit-and-Reopen dialog. If Finder succeeds, the original failure was app-specific: the agent logs the suspect app, marks it AX-incompatible for this session, and falls back to vision. A third tie-breaker exists for the rare case where Finder is not running: probeAccessibilityViaEventTap at line 539 attempts to create a CGEvent tap, which queries the live TCC database and bypasses the per-process AX permission cache that can go stale on macOS 26 (Tahoe). Three probes, one call each, no nagging.

Will turning on --force-renderer-accessibility fix this on Slack and Discord?

Partially. The flag flips Chromium from 'thin top-level tree' to 'full DOM-derived tree', which usually means the number of nodes in your AX dump goes up by an order of magnitude. The new nodes contain the visible static text in AXStaticText nodes, which helps the agent understand context. What does not change: the buttons inside the web app are still styled divs with no semantic role unless the web app added one. So 'click the third reaction button on the second message' is still guesswork. The fix that actually works lives upstream: Slack would have to add aria-label and role=button to the elements you want to act on. Until they do, the agent has to fall back to a vision pipeline for those interactions even with the flag on. The flag is necessary, not sufficient.

What about Cursor and VS Code, which are Electron but built on Monaco?

Better than Slack or Discord, worse than a native editor. Monaco exposes its line model through an ARIA live region with text content, which means an AX agent can read the visible buffer line by line via AXStaticText. It cannot place the cursor at column 17 of line 200 by addressing an AX element, because Monaco renders its cursor on a canvas. The agent ends up driving the editor via keyboard shortcuts (ctrl-g for go-to-line, cmd-f for find, in VS Code's default macOS keybindings) instead of AX-level positioning. This is fine if the agent is doing high-level operations like 'open this file and run this command'. It is not fine if the agent needs surgical edits, in which case a real LSP or a direct file edit beats the AX path. Cursor inherits VS Code's tree wholesale.

Does the Tauri framework have the same problem since it uses native webviews?

Yes, and the failure mode is slightly different. Tauri renders content in the platform's native webview (WKWebView on macOS), and WKWebView's accessibility bridge to the host AX layer is even more conservative than Chromium's. The DOM is there, but the AT-side exposure is opt-in per webview and the resulting tree tends to be even sparser than Electron's. The same disambiguation pattern (probe Finder, fall back to vision) applies. The thing that changes is the runtime knob: there is no --force-renderer-accessibility equivalent for WKWebView, so the developer has to enable it in their Cocoa setup at app build time.

How can I see what my agent actually sees on an Electron app?

If you run the Fazm agent locally, every accessibility tool call writes a file to /tmp/macos-use/<timestamp>_<tool>.txt with the exact AX tree the model received. Open the file in any editor. Each line is one element, format is [AXRole (subrole)] "title" x:N y:N w:W h:H, and the file header shows the element count and traversal time. Run a tool against Slack with --force-renderer-accessibility off, then run the same tool with the flag on. Diff the two files. You will see exactly where the bridge stopped translating, and you will see how much of the visible UI is still missing from the labelled-element view. That diff is the ground truth.
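The diff is easier to read as a per-role count than as raw text. A minimal sketch, assuming only the line format described above; header lines and anything else that does not match are skipped, and the names role_counts and diff_roles are hypothetical.

```python
import re
from collections import Counter

# Matches: [AXRole (subrole)] "title" ... with the subrole optional.
# Geometry fields after the title are ignored here.
LINE = re.compile(
    r'^\s*\[(?P<role>AX\w+)(?: \((?P<subrole>[^)]*)\))?\]\s+"(?P<title>.*)"'
)


def role_counts(dump_text):
    """Count elements per AX role in one dump file's text."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group("role")] += 1
    return counts


def diff_roles(before_text, after_text):
    """Return {role: after - before} for every role whose count changed."""
    before, after = role_counts(before_text), role_counts(after_text)
    return {
        role: after[role] - before[role]
        for role in sorted(set(before) | set(after))
        if after[role] != before[role]
    }
```

Read the two dump files with open(path).read() and pass the strings in. A large jump in AXGroup and AXStaticText with no change in AXButton is the pattern this article describes.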

Is the honest answer 'screenshot agents are better for Electron'?

It is the honest answer for the apps inside the shell. Screenshot-and-coordinate agents (Anthropic's Computer Use, OpenAI's Operator) do not care about the AX tree. They tokenize a captured frame and predict a click. On Slack and Discord, where the AX tree is thin, that is a tie or a win for the screenshot model. On Mail, Finder, Settings, and most AppKit / SwiftUI apps, where the AX tree is dense, the AX agent is faster, cheaper, deterministic, and immune to layout reflow. The right answer on macOS today is hybrid: AX-first for apps that expose a usable tree, vision fallback for the Electron and canvas-rendered apps, and an explicit disambiguation step so the agent knows which path it is on at any moment. The boundary work in confirmAccessibilityBrokenViaFinder is what makes the hybrid possible without nagging the user.
