Field notes for agent builders

Codex, cross-platform, and accessibility APIs. There is no single API. There are three.

The question reads like one question and is actually two stacked on top of each other. Yes, Codex reads UI state through an accessibility API. No, that API is not cross-platform. The cross-platform layer most people imagine when they say “accessibility API” is actually three separate stacks with three separate quirks. Below is what Codex ships today, what the three OS APIs each give you, and where Fazm fits, written by someone who maintains the macOS half.

Matthew Diakonov
7 min read

Direct answer (verified 2026-05-14)

No. Codex Computer Use is macOS only as of May 2026 and reads the macOS AXUIElement hierarchy. OpenAI lists Windows and Linux on the roadmap with no public timeline. There is no single cross-platform accessibility API: macOS uses AXUIElement, Windows uses UI Automation (UIA), Linux uses AT-SPI. An agent that genuinely covers all three operating systems ships three readers behind a shared schema, not one API call. Source: developers.openai.com/codex/app/computer-use.

Unpacking the phrase

“Codex cross-platform data accessibility API” packs four assumptions into one phrase. The first is that Codex ships everywhere. It does not. The Codex chat app is on macOS and Windows, the CLI runs where Node runs, and Computer Use (the part that actually clicks inside other apps) is macOS only. The second is that the three operating systems share a data layer. They do not. The third is that an agent reading accessibility data is calling something like a REST API. It is not. AXUIElement is a CoreFoundation C API behind a Cocoa wrapper. UIA is COM. AT-SPI is D-Bus. The fourth, and the most consequential, is that the data shapes are similar enough to be unified. They are similar in spirit, different in detail, and the differences are exactly where agents break.

The three stacks, side by side

Each operating system exposes a logically similar tree of widgets (buttons, text fields, lists), but the calls you make and the names they return are not the same. The W3C's Core Accessibility API Mappings spec exists precisely because browsers have to bridge these three for assistive tech, and they do not bridge cleanly.

What you call, what you get back

The accessibility API surface, per OS

macOS — AXUIElement

Cocoa / CoreFoundation. Calls like AXUIElementCreateApplication and AXUIElementCopyAttributeValue. Roles are kAXButton, kAXWindow, kAXTextField. Permission flows through TCC and the Accessibility privacy pane. What Codex Computer Use and Fazm both use.

Windows — UI Automation

COM-based. IUIAutomation interface, ControlTypeId enums (Button, Window, Edit), AutomationId properties. Inspect.exe is the tooling everyone uses. No equivalent to AXIsProcessTrusted. What our Windows project mediar-ai/terminator uses.

Linux — AT-SPI

D-Bus messages to org.a11y.atspi.Accessible. Same role taxonomy as IAccessible2 (push button, text, frame). GTK and Qt expose it well; Electron apps depend on Chromium's AT-SPI bridge. No major vendor desktop agent ships against it today.
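The naming mismatch above is concrete enough to put in a table. A minimal sketch of role normalization: the platform-side identifiers are the real ones (AX role strings, UIA ControlType enum values, AT-SPI role names), while the unified names on the right are our own choice, and the table is nowhere near complete:

```python
# Minimal role-normalization table across the three accessibility stacks.
# The numeric Windows keys are UIA ControlType ids (COM enum values).
ROLE_MAP = {
    "macos": {
        "AXButton": "button",        # kAXButtonRole
        "AXTextField": "text_field", # kAXTextFieldRole
        "AXWindow": "window",        # kAXWindowRole
    },
    "windows": {
        50000: "button",       # UIA_ButtonControlTypeId
        50004: "text_field",   # UIA_EditControlTypeId
        50032: "window",       # UIA_WindowControlTypeId
    },
    "linux": {
        "push button": "button",  # AT-SPI / IAccessible2 role names
        "text": "text_field",
        "frame": "window",
    },
}

def normalize_role(os_name: str, raw_role) -> str:
    """Map a platform-native role to the shared vocabulary, or 'unknown'."""
    return ROLE_MAP[os_name].get(raw_role, "unknown")
```

The "unknown" fallback is the important part: an agent has to decide per OS what to do when a role falls outside the table, which is exactly where the thin abstraction libraries stop helping.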

What Codex Computer Use ships today

Per the OpenAI page, the feature reads the AX hierarchy of any open macOS app, captures screenshots and visible window content, and can interact with windows, menus, keyboard input, and clipboard state. It refuses to drive terminal apps and Codex itself, refuses to authenticate as an administrator, refuses to approve macOS Privacy and Security prompts, and asks for explicit per-app approval the first time it touches a new app. Screenshots and visible content travel to OpenAI as part of the processing context. The feature is not available in the EEA, the UK, or Switzerland at launch.

1 of 3 OSes

“Computer use is currently available on macOS, except in the European Economic Area, the United Kingdom, and Switzerland at launch.”

OpenAI, Codex Computer Use docs, May 2026

The fanout an agent actually needs

A model that wants to read state from a desktop app does not call some “accessibility API.” It calls one of three readers, and a shaping layer in front of them turns the answer into something the model can reason over. This is the shape of every serious cross-OS agent: per-platform readers, one schema. Drawn from the dispatch logic in our own codebase and in xa11y, AccessKit, and the browser engines.

An accessibility-API agent, drawn honestly

Diagram: a model turn issues a tool call; the agent host dispatches it to one of three per-OS readers (macOS, Windows, Linux) and returns one shaped result.
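That fanout can be sketched as a dispatch table. The reader functions here are hypothetical stubs; real implementations would wrap AXUIElement, COM-based UI Automation, and AT-SPI over D-Bus respectively:

```python
import sys

# Hypothetical per-OS readers. Stubs here just label their origin;
# real ones would make the platform-native calls described above.
def read_focused_title_macos():   return {"os": "macos",   "title": "..."}
def read_focused_title_windows(): return {"os": "windows", "title": "..."}
def read_focused_title_linux():   return {"os": "linux",   "title": "..."}

# Keys follow Python's sys.platform values.
READERS = {
    "darwin": read_focused_title_macos,
    "win32": read_focused_title_windows,
    "linux": read_focused_title_linux,
}

def focused_window_title(platform: str = sys.platform) -> dict:
    """One tool call fans out to exactly one platform reader."""
    reader = READERS.get(platform)
    if reader is None:
        raise RuntimeError(f"no accessibility reader for {platform!r}")
    return reader()
```

The shaping layer lives downstream of this dispatch: each reader returns platform-shaped data, and everything after it speaks the shared schema.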

What “reading data” actually costs in each call

The three readers are not interchangeable in latency or shape. A few numbers that frame the problem:

  • 3 OS-specific stacks to support
  • 1 CoreFoundation round-trip per AX call on macOS
  • Millions of people locked out by the Codex EEA/UK/CH block
  • 0 bytes leaving the Mac when Fazm walks the AX tree

The anchor fact: why one reader, one codebase, one OS

Fazm's permission probe lives in Desktop/Sources/AppState.swift around line 488. It calls AXUIElementCreateApplication on the frontmost app, then AXUIElementCopyAttributeValue for kAXFocusedWindowAttribute. If the call returns .cannotComplete, the probe falls back to Finder as a control, because plenty of macOS apps (Qt, OpenGL, and Python-based apps like PyMOL) under-implement AX and return that error code whether or not the agent's permission is granted. Without the fallback, a Qt app in the foreground would falsely report broken accessibility.
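A language-neutral sketch of that fallback logic (the real implementation is Swift in AppState.swift; the enum and function names here are stand-ins):

```python
from enum import Enum, auto

class AXError(Enum):
    """Stand-ins for the AXError codes the probe cares about."""
    SUCCESS = auto()
    CANNOT_COMPLETE = auto()   # stands in for kAXErrorCannotComplete
    API_DISABLED = auto()      # stands in for kAXErrorAPIDisabled

def permission_probe(read_focused_window) -> bool:
    """Probe the frontmost app; if it answers CANNOT_COMPLETE (which
    Qt/OpenGL apps return whether or not permission is granted),
    retry against Finder as a known-good control."""
    err = read_focused_window("frontmost")
    if err is AXError.CANNOT_COMPLETE:
        err = read_focused_window("com.apple.finder")
    return err is AXError.SUCCESS
```

With a Qt-like app in front, the first call is inconclusive, the Finder control answers, and the probe reports permission correctly. Without the fallback, the same situation reads as "accessibility broken."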

That logic does not translate. Windows' UIAutomationCore.dll has no equivalent to AXIsProcessTrusted; UIA registration is a different ritual. AT-SPI on Linux talks over a system bus and the failure modes are bus-shaped (timeout, missing bridge), not TCC-shaped. Same problem, different shape. Which is why the same team that ships Fazm on macOS ships mediar-ai/terminator for Windows as a separate Rust project. Two repos, one product family. That split is the honest evidence that “cross-platform accessibility API” is not one API.
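One way to see the "same problem, different shape" point is to fold the native failure signals into one taxonomy. The platform error identifiers below are real names, except the Linux bridge entry, which is our own placeholder; the grouping itself is an assumption for illustration, not anyone's shipped design:

```python
from enum import Enum, auto

class A11yFailure(Enum):
    PERMISSION = auto()    # fixable by the user granting access
    APP_OPAQUE = auto()    # the app under-implements the API
    TRANSPORT = auto()     # the channel itself failed

# Platform-native error signals folded into the shared taxonomy.
FAILURE_MAP = {
    ("macos", "kAXErrorAPIDisabled"): A11yFailure.PERMISSION,
    ("macos", "kAXErrorCannotComplete"): A11yFailure.APP_OPAQUE,
    ("windows", "E_ACCESSDENIED"): A11yFailure.PERMISSION,
    ("windows", "UIA_E_ELEMENTNOTAVAILABLE"): A11yFailure.APP_OPAQUE,
    ("linux", "org.freedesktop.DBus.Error.NoReply"): A11yFailure.TRANSPORT,
    ("linux", "missing-atspi-bridge"): A11yFailure.APP_OPAQUE,  # placeholder name
}

def classify(os_name: str, native_error: str) -> A11yFailure:
    """Unknown errors default to TRANSPORT: retryable, not user-fixable."""
    return FAILURE_MAP.get((os_name, native_error), A11yFailure.TRANSPORT)
```

The table is small, but the point is that an agent's recovery policy (ask the user, skip the app, retry) hangs off the unified category, not the raw code.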

What the abstraction libraries do, and where they stop

You will find three libraries when you search for cross-platform accessibility, and they solve different parts of the problem.

  • xa11y exposes a CSS-selector style query against AXUIElement, UI Automation, and AT-SPI2 from Rust, Python, and Node. The closest thing to one API, but the abstraction is shallow: the role and property names are still platform-shaped in the result. Useful for prototypes, less useful when an agent has to handle a missing role differently per OS.
  • AccessKit is the inverse of what most agent builders need. It is for UI toolkit authors who want to push a single accessibility schema into the three OS APIs (so a Rust GUI lib like Druid or Iced can have a screen reader work on day one). Not a reader. Not for agents.
  • acacia from Igalia is an inspector. It exposes the three native APIs through a common interface, but it is built for tooling, not for hot-path production agent calls. Slow and complete, not fast and lossy.

The pattern across all three: they will save you boilerplate, they will not paper over the differences. The agent still has to know it is on macOS when something fails with cannotComplete from a Qt window.

If you need this today, what to actually do

Three honest paths, no marketing.

  1. macOS only: use Codex Computer Use if you are fine with screenshots leaving the Mac and you are not in the EEA, UK, or Switzerland. Use Fazm if you want the same AX-tree approach kept local, with persistent sessions, one-click chat forking, and no auto-compact. Both call into the same underlying API; the difference is where the data goes after.
  2. Windows only: Anthropic's computer use on Claude desktop is the largest vendor option as of May 2026. For an open implementation against UI Automation, mediar-ai/terminator is a Rust project that reads the UIA tree, pitched as a Playwright for Windows.
  3. Linux or all three: no vendor agent ships this end to end. The honest answer is roll your own per-platform reader and unify the schema yourself, or use xa11y for prototypes. Anyone selling you “cross-platform accessibility API” in one product pitch is selling either macOS plus screenshots on Windows, or vapor.

Building an agent that walks the AX tree on macOS?

Happy to compare notes on AXUIElement quirks, TCC weirdness, and what works on Sequoia vs Tahoe. 20 minutes, no pitch.

More questions, same lookup

Does Codex run on Windows or Linux?

The Codex chat app does run on Windows (and the CLI runs on Linux), but the part of Codex that reads UI state from another app and clicks back into it (Computer Use) does not. OpenAI's Computer Use page is direct about this: computer use is currently available on macOS, except in the European Economic Area, the United Kingdom, and Switzerland at launch. Windows and Linux are described as on the roadmap with no published date. So if your question is can Codex read a value out of an open SAP window on Windows by walking the accessibility tree, today the answer is no. The Windows desktop app does not expose that surface yet.

Why does Codex Computer Use only work on macOS today?

Because shipping the same agent on three operating systems is three engineering projects, not one. macOS gives you AXUIElement and a Cocoa-shaped accessibility hierarchy. Windows gives you UI Automation (UIA), a COM-based API with a different tree shape, different role names, and a different permission model. Linux gives you AT-SPI over D-Bus, which is the assistive interface most desktop environments expose but plenty of apps under-implement. There is no library that hides all three behind one function call without losing fidelity. OpenAI shipped the macOS half first; until the other two land, Computer Use is single-platform.

Is there really no cross-platform accessibility API I can call?

There are abstraction layers, but they are thin and they leak. xa11y is an MIT-licensed Rust library that exposes a CSS-selector style query against AXUIElement, UI Automation, and AT-SPI2; it has Python and Node bindings. AccessKit goes the other direction: it is for UI toolkit authors who want to push accessibility data into the three platform APIs, not for agents that want to read it. Igalia's acacia inspects platform accessibility APIs but stops short of a one-line cross-platform call. In practice, any production agent that targets more than one OS ends up with a per-platform reader plus a shared schema, the same shape every browser uses internally. There is no one-line API, and pretending there is one is the fastest way to ship a brittle agent.

What does Codex Computer Use actually read from the AX tree?

Per OpenAI's launch docs, Codex captures screenshots and the visible window content, and it can interact with windows, menus, keyboard input, and clipboard state. The MacStories writeup describes it as reading the accessibility hierarchy of any open macOS app, which lets the model produce precise coordinates instead of guessing from pixels. The data flow point worth knowing: screenshots and visible content become part of Codex's processing context and are subject to ChatGPT's data controls. That means the AX tree dump and the pixels both leave the Mac during a task. Different from a fully local agent that walks the same tree and never uploads it.

How does Fazm relate to all this?

Fazm is macOS only by design, runs the same kind of AXUIElement loop Codex Computer Use does, and ships an open-source implementation you can read. The permission probe lives at Desktop/Sources/AppState.swift around line 488: it calls AXUIElementCreateApplication on the frontmost app, then AXUIElementCopyAttributeValue for kAXFocusedWindowAttribute, and if the call returns cannotComplete it tries again against Finder as a control to rule out Qt or OpenGL apps that don't implement AX. That fallback logic is macOS-specific and does not translate to UI Automation or AT-SPI. The honest version of cross-platform for the same team is shipping a second codebase, in our case the Windows-only Rust project at github.com/mediar-ai/terminator, which uses UI Automation. Two repos, one product family, because there is no shared reader.

I need data from a desktop app on Linux today, what do I actually do?

If the app speaks AT-SPI cleanly (most GTK and Qt apps, browsers, Electron with a recent runtime), you can walk the tree directly with pyatspi2 or the AT-SPI D-Bus interface. If it doesn't, you fall back to one of the older techniques: scrape window state with xdotool, OCR a screenshot, or use the app's network traffic. The honest constraint: no large-vendor desktop agent ships AT-SPI today. Anthropic's computer use targets macOS and Windows. OpenAI's Codex targets macOS. Self-hosted is your option until that changes.
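The walk itself is the same shape on every stack. Below is a minimal depth-first search that runs against any node exposing a role and children; adapting it to pyatspi accessibles (which expose the same information through getRole() and child iteration) would be a thin wrapper, and the stub Node class exists only so the sketch runs without an AT-SPI bus:

```python
def find_by_role(node, role, limit=10000):
    """Pre-order depth-first search over an accessibility tree.
    Works on any node with a `.role` attribute and iterable `.children`.
    The visit limit guards against pathological or cyclic trees."""
    stack = [node]
    seen = 0
    while stack and seen < limit:
        current = stack.pop()
        seen += 1
        if current.role == role:
            yield current
        # reversed() keeps children in left-to-right visit order
        stack.extend(reversed(list(current.children)))

# Minimal stand-in node so the sketch is runnable without a live bus.
class Node:
    def __init__(self, role, children=()):
        self.role, self.children = role, list(children)
```

Swap the stub for real accessibles and the only changes are the attribute accessors, which is the whole argument for a per-platform reader behind one walker.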

Why not just take screenshots and skip accessibility APIs entirely?

Because the per-turn cost is real and the model can hallucinate coordinates. Anthropic publishes a tool-definition cost of around 735 tokens and a system-prompt cost of around 480 tokens for the screenshot-driven Computer Use tool, plus the screenshot itself up to 1568 pixels on the long edge, every turn. A single AXUIElementCopyAttributeValue call returns structured text in one CoreFoundation round trip, and the model gets a string it can reason over instead of a region of an image it has to ground. For tasks that touch a button with a label, accessibility-API agents beat screenshot loops on speed, cost, and reliability. For tasks that touch a canvas (Figma, Photoshop, a video editor), the pixel-grounded approach wins. Pick by app, not by ideology.
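The arithmetic behind that per-turn cost, using the figures quoted above plus Anthropic's published image-token approximation of roughly width times height divided by 750 (exact resizing behavior varies, so treat the result as an estimate):

```python
def screenshot_turn_tokens(width: int, height: int,
                           tool_def: int = 735, system: int = 480) -> int:
    """Rough fixed overhead per turn of a screenshot-driven loop:
    tool definition + system prompt + one screenshot's image tokens,
    estimated via Anthropic's (width * height) / 750 approximation."""
    return tool_def + system + (width * height) // 750

# A screenshot at the 1568px long-edge cap costs a few thousand
# tokens of overhead before any conversation text:
per_turn = screenshot_turn_tokens(1568, 982)
```

Against that, a single structured AX read returns a short string, which is why label-addressable tasks favor the accessibility path and canvas tools do not.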

Will Codex ever ship a unified data layer across the three operating systems?

It would need three readers (AXUIElement, UI Automation, AT-SPI), a unifier (probably the IAccessible2 / Core AAM mapping the W3C maintains, or a homegrown schema), and a per-OS permission flow (TCC on macOS, UIA registration on Windows, AT-SPI bus auth on Linux). The W3C Core Accessibility API Mappings spec exists precisely so the same role and property model maps across all three. Browsers already implement this internally. Whether OpenAI ships that publicly, and when, is unknown. There is no announcement as of mid-May 2026.
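A unifier of the kind described would bottom out in a shared node type. A minimal sketch; the field names are our own, loosely following the role/name/value split in Core-AAM, and real schemas carry far more (bounds, states, actions):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class UINode:
    """One node in a hypothetical unified accessibility tree."""
    role: str                          # normalized role, e.g. "button"
    name: Optional[str] = None         # accessible name / label
    value: Optional[str] = None        # current value, e.g. text contents
    native_role: Optional[str] = None  # platform-shaped role, kept for debugging
    children: list = field(default_factory=list)

    def flatten(self) -> list:
        """Pre-order flattening, handy for feeding a model a linear view."""
        out = [self]
        for child in self.children:
            out.extend(child.flatten())
        return out
```

Keeping the native role alongside the normalized one is the practical concession: when normalization loses fidelity, the agent can still fall back to platform-specific handling.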

Where do I verify these claims about Codex's platform support?

developers.openai.com/codex/app/computer-use is the OpenAI-published page, which contains the macOS at launch wording and the EEA / UK / Switzerland exclusion. The MacStories piece at macstories.net (April 19, 2026) confirms Mac-only at launch and describes the AX hierarchy approach. The Codex on Mac walkthroughs at findskill.ai and eesel.ai cross-check the platform constraint independently. For the three-stack cross-platform reality, xa11y.dev and the AccessKit GitHub repo are the canonical references on how the API abstractions actually work.
