AI agents and legacy desktop apps: what the accessibility tree actually returns.
People who have not shipped a desktop agent assume the accessibility tree is empty on anything older than 2020. It is not. On macOS the AX API is from 2002 and most software that ran on a Mac since then registers at least a partial tree. The failure mode worth your time is the one OS error code that overloads three meanings, and the small disambiguation pattern that tells them apart.
Direct answer (verified 2026-05-08)
Yes, an accessibility-tree AI agent can drive a lot of legacy desktop apps on macOS. Most software shipped after Mac OS X 10.2 registers an AX tree, including Carbon and older AppKit apps. The ones that do not (Qt without the AT-SPI bridge, Tk and Tcl apps, OpenGL or Metal canvases, ancient Carbon dialog boxes) all return AXError.cannotComplete from every call, which is also what you get back when the user revoked accessibility permission or the macOS 26 TCC cache went stale. A useful agent disambiguates by re-running the same AX call against a control app like Finder before deciding whether to fall back to vision or ask the user for permission. The Fazm code path that ships this is in Desktop/Sources/AppState.swift.
Why "legacy" is the wrong axis
"Legacy desktop app" usually means "old enough to have no REST API." That tells you almost nothing about whether an accessibility-tree agent can drive it. The thing that actually predicts AX coverage is the UI toolkit the app was built on, not the year it shipped.
A Cocoa app from 2010 that uses standard NSButton, NSTextField, and NSTableView controls will hand back a clean AX tree to a modern agent in 2026. A 2024 Electron app shipped by a YC company will hand back a flat sea of AXGroup nodes that the agent cannot dispatch on. The first is "legacy" by industry vocabulary, the second is "modern," and the AX picture is the opposite of what those labels imply.
The split that matters: did the toolkit hook NSAccessibility, or does it draw its own widgets? AppKit, SwiftUI, Carbon, and Java with the Java Access Bridge hooked in. Qt without the AT-SPI bridge, Tk, OpenGL canvases, and pre-Electron-25 web shells did not. The age of the binary is downstream of that choice.
Legacy app categories: what AX returns, what to do
One row per category, ordered roughly from "drive it directly with AX" at the top to "AX is a dead end, go to vision" at the bottom. "AX tree" is what AXUIElementCopyAttributeValue gives you when you walk the focused window. "Fazm response" is what the agent does when it sees that shape.
| App category | AX tree | Fazm response |
|---|---|---|
| Native AppKit / SwiftUI (any vintage) | Rich tree, every control has AXRole and AXTitle, frames in points | Drive directly via AXPress, AXValue, AXFocused. No vision needed. |
| Older Cocoa with custom NSView subclasses | Mostly rich, sometimes a custom view shows up as opaque AXGroup | AX for everything except the opaque view. That subtree gets a screenshot pass. |
| Carbon (pre-2008) with standard controls | Partial tree, AXButton and AXTextField mostly work, custom dialogs may not | AX where labeled, vision fallback for the unlabeled regions, no nag dialogs. |
| Java / Swing with Java Access Bridge enabled | Usable tree once the JVM is launched with the Access Bridge agent | AX driven, with a one-time check that the bridge is loaded into the target JVM. |
| Java / AWT or Swing without bridge | AXError.cannotComplete from every call | Disambiguate against Finder, log as AX-incompatible, route to vision. |
| Qt without the AT-SPI bridge plug-in | AXError.cannotComplete from every call | Same as above. Most cross-platform scientific tools land here. |
| Tk / Tcl (PyMOL, IDLE, Python tutorial UIs) | AXApplication and AXWindow only, no children | Empty-tree signature triggers vision fallback immediately. |
| OpenGL / Metal canvas (Blender, CAD, games) | AXWindow with no children, all UI is rendered pixels | Vision-only path, no AX calls after the first probe. |
| Electron < v25 without --force-renderer-accessibility | Flat AXGroup nodes, no roles or names you can dispatch on | Hybrid: read window title via AX, click via OCR + coordinates. |
| Web app inside Safari or Chrome | Rich, ARIA-derived, but addressed via the browser's bridged tree | Routed through the Playwright tool, not the desktop AX tool. |
AX coverage on a given app can be improved by the user enabling assistive features (e.g. Java Access Bridge), or by the app's developer adding NSAccessibility roles. The categories above are the default behavior on a fresh macOS 14.x install.
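As a sketch of how an agent might turn those rows into a deterministic routing decision; the enum, inputs, and thresholds below are illustrative, not Fazm's actual types:

```swift
// Hedged sketch of the routing decision the table rows encode.
// DriveRoute and both inputs are invented names for illustration.
enum DriveRoute { case axDirect, hybrid, visionOnly }

func route(elementCount: Int, dispatchableRoles: Int) -> DriveRoute {
    if elementCount <= 2 { return .visionOnly }   // AXApplication + AXWindow only: the Tk/OpenGL signature
    if dispatchableRoles == 0 { return .hybrid }  // flat AXGroup sea: old Electron, title via AX, clicks via OCR
    return .axDirect                              // named roles present: drive via AXPress/AXValue
}
```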
The one error code that overloads three meanings
AXError.cannotComplete is the OS reply you get from AXUIElementCopyAttributeValue when the call cannot be answered. The same code is returned for three conditions, and the agent has to pick the right one before doing anything else.
cannotComplete, three meanings
1. Permission is genuinely revoked. The user toggled it off, or the TCC cache is stuck after a re-sign.
2. The app never implemented AX. Qt, Tk, OpenGL canvases, and very old Carbon return this for every call.
3. The app is alive but unresponsive. Beachballed, blocked on a dialog, or hung mid-launch.
An agent that just retries on cannotComplete will spin on a Qt app forever. An agent that maps it to "permission broken" will nag the user every time they open Blender. The fix is to probe a control app, watch what comes back, and act differently in each case.
The disambiguation, in real Swift
This check runs every time a Fazm tool call touches the frontmost app. Finder is the control because it is always running on a Mac and always implements NSAccessibility correctly. If Finder fails on the same call shape, the permission is the problem. If Finder works, the suspect app is just legacy in the AX-thin sense, and the agent switches to vision on that window without bothering the user.
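A minimal sketch of that probe, with illustrative names (disambiguateCannotComplete, AXProbeVerdict); the shipping function is confirmAccessibilityBrokenViaFinder() in Desktop/Sources/AppState.swift:

```swift
import AppKit
import ApplicationServices

// Sketch of the control-app probe described above, not the Fazm source.
enum AXProbeVerdict {
    case permissionBroken     // Finder fails the same call: trust revoked or cached stale
    case appIsAXIncompatible  // Finder succeeds: the suspect app never implemented AX
    case inconclusive         // Finder not running: fall through to the event-tap probe
}

func disambiguateCannotComplete() -> AXProbeVerdict {
    // Finder is the control: always running, always implements NSAccessibility.
    guard let finder = NSRunningApplication
        .runningApplications(withBundleIdentifier: "com.apple.finder").first
    else { return .inconclusive }

    let element = AXUIElementCreateApplication(finder.processIdentifier)
    var windows: CFTypeRef?
    // Same call shape that just failed on the suspect app.
    let err = AXUIElementCopyAttributeValue(element, kAXWindowsAttribute as CFString, &windows)
    return err == .success ? .appIsAXIncompatible : .permissionBroken
}
```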
The second probe, probeAccessibilityViaEventTap, is the tie-breaker for when Finder is not running. It calls CGEvent.tapCreate with a listen-only mask. Tap creation reads the live TCC database, not the per-process AXIsProcessTrusted cache that goes stale on macOS 26 (Tahoe) after an app re-sign or update. A tap that succeeds while AXIsProcessTrusted returns false is the signature of a stale cache. The retry timer that sits alongside these probes fires every five seconds for three attempts before showing the Quit and Reopen alert, which is the only real fix for a stuck cache.
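Under the same caveat, a sketch of that listen-only tap probe (the real one is probeAccessibilityViaEventTap in AppState.swift):

```swift
import CoreGraphics

// Illustrative version of the tie-breaker: tap creation is checked against
// the live TCC database, sidestepping the stale AXIsProcessTrusted() cache.
func eventTapProbeSucceeds() -> Bool {
    let mask = CGEventMask(1 << CGEventType.keyDown.rawValue)
    guard let tap = CGEvent.tapCreate(
        tap: .cgSessionEventTap,
        place: .headInsertEventTap,
        options: .listenOnly,                  // never modifies events
        eventsOfInterest: mask,
        callback: { _, _, event, _ in Unmanaged.passUnretained(event) },
        userInfo: nil
    ) else { return false }                    // creation denied: trust genuinely revoked
    CGEvent.tapEnable(tap: tap, enable: false) // only the create-success signal was needed
    return true                                // trusted per live TCC, whatever the cache says
}
```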
What the model actually reads on a legacy app
When Fazm calls a desktop tool, the bundled mcp-server-macos-use writes a flat text dump to /tmp/macos-use/<timestamp>_<tool>.txt and the LLM sees that file as context. On a working AppKit app, the dump is hundreds of lines. On a Tk app like PyMOL, the same dump is two or three lines, an AXApplication node and an AXWindow, with no children. The empty-tree signature is the trigger that flips the agent into vision mode for that window.
That file is ground truth. If a click landed, the element was in the dump. If the click failed, the dump tells you whether the model was guessing from a stale tree or whether the element was simply absent. On a legacy app this is how you tell "AX is thin here" from "AX is broken here." The path is logged in the tool-call response, so you can grep it out of the chat transcript after the fact.
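For a sense of scale, here is what those two shapes look like, using the line format described in the FAQ below; the element lines, coordinates, and the PyMOL header are invented for illustration:

```
# Pages — 412 elements (0.21s)
[AXButton] "Bold" x:212 y:74 w:28 h:24
[AXTextField (AXSearchField)] "Search" x:640 y:74 w:180 h:24
... ~400 more element lines ...

# PyMOL — 2 elements (0.02s)
[AXApplication] "PyMOL" x:0 y:0 w:0 h:0
[AXWindow] "PyMOL" x:80 y:60 w:1024 h:768
```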
“The thing nobody tells you about computer-use agents on macOS: the accessibility tree handles way more legacy software than people assume, and the actual hard part is making the agent quiet about the apps where AX is genuinely empty. AXError.cannotComplete looks the same as a permission failure. Fixing that one ambiguity is most of what makes the agent feel reliable.”
What this means in practice
If your reason to run an AI agent is the long tail of business software that has no API (a 2018 desktop CRM, a 2014 invoicing tool, a Java clinic scheduler from a vendor that has not shipped a release in three years), the practical answer on macOS is to try the AX path first. Most of those apps are not actually AX-thin. The dental clinic scheduler that "looks legacy" probably ships an AXTable and AXTextField for every cell.
For the apps that genuinely are AX-thin (the Tk PyMOL, the Qt scientific tool, the Blender plug-in workflow), an accessibility-first agent still needs to do the right thing. That means detecting the empty tree quickly, not nagging the user about permissions, and routing the next action through screen capture and vision. The disambiguation pattern above is what makes that routing decision deterministic instead of a coin flip.
The thing other "AI agent for legacy apps" content gets wrong is framing this as a binary, AX agents versus screenshot agents. On real workloads the right answer is hybrid, and the engineering work is in the seam between the two layers, not in either layer on its own.
Have a legacy app you want to drive?
If you have a specific desktop app and you want to know whether AX, vision, or hybrid is the right path before you commit, jump on a 20-minute call and we will look at the AX tree together.
Frequently asked questions
Can an AI agent drive a desktop app from 2014 with no API?
On macOS, often yes. The accessibility API was introduced in Mac OS X 10.2 (2002), so most apps shipped after that, even ones written in Carbon or older AppKit, register at least a partial NSAccessibility tree. Fazm reaches them by calling AXUIElementCreateApplication(pid) on the app's process and walking kAXChildrenAttribute. Where it fails: apps that opted out of accessibility entirely (some Qt builds without the AT-SPI bridge, most Tk/Python apps, OpenGL or Metal canvases, ancient Carbon dialog boxes that draw their own controls). Those return AXError.cannotComplete from every call. The honest workflow on macOS is AX-first, vision fallback when AX is empty, and explicit disambiguation when the error code is ambiguous.
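A self-contained sketch of the reach pattern this answer describes, using the public AX calls it names; the depth limit and the printing are illustrative:

```swift
import ApplicationServices

// Walk an app's AX tree from AXUIElementCreateApplication(pid) down
// through kAXChildrenAttribute, printing one role per line.
func dumpAXTree(pid: pid_t, element: AXUIElement? = nil, depth: Int = 0) {
    guard depth <= 6 else { return }
    let node = element ?? AXUIElementCreateApplication(pid)

    var role: CFTypeRef?
    AXUIElementCopyAttributeValue(node, kAXRoleAttribute as CFString, &role)
    print(String(repeating: "  ", count: depth) + ((role as? String) ?? "<no role>"))

    var children: CFTypeRef?
    // On a Tk app this returns nothing past AXWindow; on Qt without the
    // bridge the very first copy call already comes back .cannotComplete.
    guard AXUIElementCopyAttributeValue(node, kAXChildrenAttribute as CFString, &children) == .success,
          let kids = children as? [AXUIElement] else { return }
    for kid in kids { dumpAXTree(pid: pid, element: kid, depth: depth + 1) }
}
```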
What is AXError.cannotComplete and why is it the one error code that matters here?
AXError.cannotComplete is what AXUIElementCopyAttributeValue returns when the call cannot be answered, but the OS does not tell you why. The same code is returned for at least three different conditions: the calling process has lost accessibility trust (the user revoked it, or the macOS 26 TCC cache went stale after an app re-sign), the target legacy app never implemented NSAccessibility, or the target app is alive but unresponsive. An agent that retries on cannotComplete will spin on a Tk app forever. An agent that maps it to 'permission broken' will nag the user every time they open Blender or PyMOL. You have to disambiguate before you act.
How does Fazm tell those three cases apart?
It re-runs the same AX call against a control app that is always installed and always implements accessibility correctly: Finder. The function is confirmAccessibilityBrokenViaFinder() in Desktop/Sources/AppState.swift around line 517. If Finder also returns cannotComplete, the permission is genuinely stuck and the agent surfaces a Quit and Reopen dialog. If Finder succeeds, the original failure was app-specific, the permission is fine, and the agent logs the suspect app as AX-incompatible and falls back to vision. There is also a second probe via CGEvent.tapCreate, used when Finder is not running, because event-tap creation reads the live TCC database and bypasses the per-process AXIsProcessTrusted cache.
What kinds of legacy apps does the accessibility tree handle well on macOS?
Anything that ships as a native AppKit or SwiftUI app, including older versions. Apple's own legacy bundles (older Pages, Keynote, Numbers, Aperture from before it was discontinued, classic iWork) all expose usable trees. Most third-party native apps that were on the Mac App Store before 2018 do too: BBEdit, MarsEdit, Things, OmniGraffle, Coda, Transmit, classic Photoshop CS5-era builds. Java apps work if the user installed Java with the Java Access Bridge enabled, which is the default for current Adoptium and Azul builds. Carbon apps built before Apple deprecated it work as long as they did not draw their own control widgets. The general pattern: if VoiceOver can read the app, an accessibility-tree agent can drive it.
What kinds of legacy apps break, even on macOS?
Five categories. First, Qt builds where the developer did not link the QAccessible AT-SPI bridge (most Linux-cross-compiled scientific tools fall here). Second, Tk and Tcl apps including PyMOL, IDLE, and most Python tutorial apps. Third, OpenGL or Metal canvases that draw all their UI as pixels (Blender, many CAD tools, game engines). Fourth, Electron apps below ~v25 without --force-renderer-accessibility, which expose mostly generic AXGroup nodes you cannot dispatch on. Fifth, very old Carbon apps (think pre-2008 shareware) that draw control widgets via QuickDraw and never registered an AX role. For all five, the right move is: detect the AX gap, stop calling AX on this window, capture pixels with ScreenCaptureKit, route through OCR or a vision model.
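A hedged sketch of that capture step, using ScreenCaptureKit's one-shot screenshot API on macOS 14+; the window lookup by pid is illustrative and not necessarily how Fazm selects its target:

```swift
import ScreenCaptureKit

// Capture the first on-screen window owned by a pid as a CGImage,
// ready to hand to OCR or a vision model.
func captureWindowImage(pid: pid_t) async throws -> CGImage? {
    let content = try await SCShareableContent.current
    guard let window = content.windows.first(where: {
        $0.owningApplication?.processID == pid
    }) else { return nil }

    let config = SCStreamConfiguration()
    config.width = Int(window.frame.width)
    config.height = Int(window.frame.height)
    let filter = SCContentFilter(desktopIndependentWindow: window)
    return try await SCScreenshotManager.captureImage(contentFilter: filter,
                                                      configuration: config)
}
```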
Where exactly is the disambiguation code?
Three concrete spots in github.com/m13v/fazm. AppState.swift around lines 480 to 512 is testAccessibilityPermission(): the switch statement that maps each AXError to a concrete decision. AppState.swift around lines 514 to 534 is confirmAccessibilityBrokenViaFinder(): the control-app probe. AppState.swift around lines 536 to 553 is probeAccessibilityViaEventTap(): the live-TCC tie-breaker that bypasses the stale per-process cache. The retry timer that fires every five seconds for three attempts is right above. PermissionsPage.swift in MainWindow renders the user-facing flow. All Swift, no obfuscation, you can clone the repo and trace one accessibility check from the timer firing to the AXError code being mapped.
Is screenshot-based computer use better for legacy apps?
On the apps where AX is empty, yes, but not in the way most blog posts frame it. Screenshot agents (Anthropic Computer Use, OpenAI Operator) read pixels and predict coordinates. They handle Tk and OpenGL apps because every other approach fails on those. They also burn tokens, lag on multi-monitor coordinate math, and break when the app's UI shifts between the screenshot and the click. On the apps where AX is rich, screenshot agents are slower and more expensive than AX agents for no benefit. The right architecture is hybrid: AX-first when the tree has roles you can dispatch on, vision when AX returns nothing useful. The disambiguation pattern is what makes hybrid switching possible without a heuristic that gets it wrong.
What does a real fazm AX dump look like for a legacy app?
Every macos-use traversal writes a flat text file to /tmp/macos-use/<timestamp>_<tool>.txt. For a working native app the file starts with a header like '# Pages — 412 elements (0.21s)' followed by one line per element in the format [AXRole (subrole)] "title" x:N y:N w:W h:H. For a Tk app, the same file is a few lines: an AXApplication node, an AXWindow, and that is it, because Tk never registered children. The empty-tree signature is the trigger to fall back to vision. The full path is logged in the macos-use response so you can read what the model saw on any turn.
Does this work on Windows or Linux too?
Conceptually, yes; the function names change. Windows UI Automation is closer to a single coherent API, and most Win32, WinUI, and WPF apps participate, including very old ones. The big Windows-specific quirk is the elevation asymmetry: an unelevated UIA client cannot read elements in elevated processes. Linux AT-SPI works only when the toolkit (GTK, or Qt with the bridge plug-in) opts in, which makes it the most fragmented surface. Fazm is macOS-only today (14.0+), so the code in this guide is the macOS path. The shape of the problem, three boundaries crossed at once and an ambiguous error code, is the same on all three OSes.
Why not just always use vision on legacy apps to be safe?
Cost and latency. A vision model pass on a 4K screenshot is hundreds of tokens minimum and adds a network round-trip you do not need on a native AppKit app where the AX tree gives you exact frames in points. Running the legacy paths with the screenshot fallback also means inheriting the screenshot model's failure modes (coordinate drift, modal interception, multi-monitor confusion) on apps that did not need any of that. AX-first with explicit fallback is faster, cheaper, and more deterministic on the workloads where it works, which on macOS is most of the apps a small business actually uses.
Related guides
Accessibility tree limits beyond the browser
Four boundaries a browser-AX dev crosses at once: API, trust, addressing, error codes.
Computer-use accessibility limits on macOS, by app category
Where AX-API agents reach and where they degrade on Electron, Qt, and canvas-rendered apps.
Mac automation: what survives system updates
Why the AX tree is the only automation layer that survives a macOS update.