Mac automation: two weeks of building, 12 things that survived, and why the AX tree is the only layer that holds
A post on r/MacOS documented the honest result of two weeks of trying to automate a Mac workflow. Shortcuts, AppleScript, the accessibility API - all tested. Only 12 boring workflows survived day-to-day use. The conclusion: “Shortcuts feels abandoned and the AX tree is the only thing left that survives a system update.” That one sentence is the entire architectural thesis behind Fazm.
“Shortcuts feels abandoned and the AX tree is the only thing left that survives a system update.”
r/MacOS · 16,000 views · April 30, 2026
After two weeks of building, the AX tree is the only macOS automation layer that survives a system update
The original post (old.reddit.com/r/MacOS/comments/1szkekx) tested Shortcuts (fires actions fine, no real third-party integration), AppleScript (better, but app-specific quirks everywhere), and screenshot-based approaches (break the moment a button index changes). The survivor across all scenarios: semantic AX tree reads on native Cocoa apps.
Fazm is built on this conclusion. The automation engine reads the AX tree, finds elements by semantic role and label, and does not depend on pixel coordinates that invalidate on every app update. The 12 workflows that survived in the Reddit post are exactly the patterns Fazm is designed to handle reliably.
What each automation layer actually gives you
- Shortcuts: good for first-party Apple apps, weak for third-party (Notion, Linear, Slack have no meaningful Shortcuts integration)
- AppleScript: broad reach, but every app has its own quirks and native Cocoa apps require app-specific scripts
- AX tree via Accessibility framework: the most stable surface for native Cocoa apps; Electron apps flatten the tree into weak semantic roles
- Browser DOM automation: fully semantic for any web-based app, survives app updates as long as the HTML structure holds
- Screenshot-based agents: high latency per step, breaks any time the UI changes, no structural understanding
- API-level automation: survives everything, but most desktop apps do not expose an API for their core workflows
The wedge in one sentence
The AX tree is the only macOS automation surface that is semantically stable across system updates - and Fazm is the only consumer-grade macOS agent built on it first, with Electron fallback for apps that flatten the tree.
Every automation approach on macOS has a surface it owns and a surface where it fails. Shortcuts owns Apple-first apps. AppleScript owns native Cocoa apps if you are willing to write app-specific scripts. Screenshot agents own nothing that survives an update. The AX tree is the one surface that covers native Cocoa apps reliably and degrades predictably (rather than silently) when you hit an Electron app.
Most computer use agents shipped in 2025 and 2026 are screenshot-based. They look impressive in demos because demos use controlled environments. In production, the first time an app ships an update that moves a button or changes a label, the workflow breaks silently. The user notices two days later when the report did not generate.
The r/MacOS post is the clearest public articulation of this problem we have seen. 16,000 views and 15 comments in a thread where every response confirmed the same experience: most automations break, the survivors are boring, the AX tree is the only mechanism that holds.
The automation that breaks vs the one that survives
You build a workflow that clicks a button in Linear. It works for three weeks. Linear ships an update that changes the button's position in the UI. The automation fires, sends clicks to the wrong coordinates, and corrupts your data or silently fails. You find out when someone notices the task board has not been updated for two days.
- Breaks silently on any UI change
- No way to know an update invalidated your coordinates
- Electron apps (Linear, Notion, Slack) update frequently
What the two approaches look like in practice
A screenshot agent asks the model to find the “Create” button in pixels. An AX tree agent asks the system for the element with that label. One survives a UI update. The other does not.
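The difference can be reduced to a toy model. This sketch is illustrative only (the dicts stand in for real UI state; `click_at` and `find_by_label` are hypothetical helpers, not Fazm's API): the same recorded action is replayed after an app update swaps two buttons.

```python
# Toy model of the same UI before and after an app update moves a button.
# All names and dicts here are illustrative mock data, not Fazm's actual API.

ui_v1 = [
    {"role": "AXButton", "label": "Create", "x": 120, "y": 40},
    {"role": "AXButton", "label": "Filter", "x": 220, "y": 40},
]
# The update swaps the two buttons' positions; roles and labels are unchanged.
ui_v2 = [
    {"role": "AXButton", "label": "Filter", "x": 120, "y": 40},
    {"role": "AXButton", "label": "Create", "x": 220, "y": 40},
]

def click_at(ui, x, y):
    """Screenshot-style targeting: whatever element sits at the coordinates."""
    for el in ui:
        if (el["x"], el["y"]) == (x, y):
            return el["label"]
    return None

def find_by_label(ui, role, label):
    """AX-style targeting: look the element up by semantic role and label."""
    for el in ui:
        if el["role"] == role and el["label"] == label:
            return el

# Coordinates recorded against v1, replayed against v2:
assert click_at(ui_v2, 120, 40) == "Filter"                    # wrong button, silently
assert find_by_label(ui_v2, "AXButton", "Create")["x"] == 220  # still correct
```

The pixel replay hits the wrong button without raising any error, which is exactly the silent-failure mode described above; the semantic lookup follows the element wherever it moves.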
How Fazm reads your screen without sending pixels to a model
The model is invoked at decision points (which element matches the intent, what to do next, when to stop), not on every click. The execution layer runs entirely on-device without cloud round-trips.
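That split between on-device execution and model decision points can be sketched as a loop. This is a minimal sketch under stated assumptions, not Fazm's implementation: `read_ax_tree`, `perform`, and `model_decide` are hypothetical stand-ins.

```python
# Sketch: call a model only at decision points, resolve everything else locally.
# read_ax_tree / perform / model_decide are stand-ins, not Fazm's actual API.

def run_workflow(steps, model_decide, read_ax_tree, perform):
    model_calls = 0
    for step in steps:
        tree = read_ax_tree()            # on-device structured read, no pixels
        if step["needs_decision"]:       # ambiguous intent -> ask the model
            target = model_decide(step, tree)
            model_calls += 1
        else:                            # deterministic step -> resolve locally
            target = next(el for el in tree if el["label"] == step["label"])
        perform(target)
    return model_calls

tree = [{"label": "Send"}, {"label": "Archive"}]
steps = [{"needs_decision": False, "label": "Send"},
         {"needs_decision": True},
         {"needs_decision": False, "label": "Archive"}]
actions = []
calls = run_workflow(steps, lambda s, t: t[0], lambda: tree, actions.append)
assert calls == 1 and len(actions) == 3   # 3 actions, only 1 model round-trip
</|never|>```

Three actions execute but the model is consulted once, which is the shape of the latency and privacy argument: the per-step cost is a local tree read, not a cloud round-trip.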
The conversation that ran in parallel
In the same 48-hour window as the r/MacOS post, a cluster of posts on X expanded on the same diagnosis from different angles. The thread connects the macOS automation failure to the broader computer use agent landscape.
vision-first computer use looks great in a demo but the moment you stack 3 models the latency math collapses. AX tree returns the same structural data in microseconds and skips vision entirely for anything an app already exposes. https://fazm.ai
the gap isnt model availability anymore. qwen 35b on a macbook is fine in a chat window, the moment you want it to drive an app or edit a file the throughput math falls apart. running a local llm as chat vs as an agent are completely different problems
this is the pattern most pricing fights miss. once devs hit a weekly cap mid-refactor a couple of times, they quietly stop trusting the loop and the model quality argument stops mattering. local plus one month behind beats frontier plus rate limits for actual work.
I've been optimizing transcription accuracy (local vs cloud, model size vs latency). Turns out that's not the constraint. The real bottleneck is that voice without friction spirals. Keyboard forces you to pause and think. Voice at your Mac doesn't. Shipped a hold-to-talk interface that helped with this.
The common thread: demos look good, production math does not. Vision agents hit latency walls. Local models hit throughput walls on agentic contexts. Rate limits erode developer trust. And the macOS automation layer itself is fragile unless you build on the AX tree.
AX tree agents vs screenshot and Shortcuts agents
| Feature | Screenshot or Shortcuts agent | AX tree agent (Fazm) |
|---|---|---|
| Survives OS update | Breaks if button index or label changes | Survives because AX roles are semantically defined |
| Works with Electron apps | AX tree is flat, no semantic roles | Falls back to DOM via browser debug protocol |
| Latency per action step | 200-600ms per step (vision model round-trip) | Microseconds for AX read; model only at decision points |
| Breaks on app update | Screenshot coordinates invalidated immediately | Semantic role stays valid unless app redesigns UI |
| Third-party app support | Only apps that implement Shortcuts extensions | Any app with an accessible AX tree or DOM |
| Privacy surface | Pixel stream to cloud vision model on every step | Structured data read on-device; model called at decision points only |
What to actually do this week
Audit your current automations
Identify which ones broke in the last 6 months. The ones that broke were almost certainly targeting pixel coordinates, button indices, or Shortcuts actions in third-party apps.
Separate first-party from third-party
Apple apps (Mail, Calendar, Reminders, Safari, Numbers) have stable Shortcuts and AX surfaces. Third-party apps need a different strategy: AX tree for native Cocoa, DOM for Electron.
Start with data-movement workflows
The 12 survivors documented in the r/MacOS thread were all data-movement tasks. Pick one workflow where you know both the input source and the output target, and build that first.
Validate the AX tree for your target apps
Open Accessibility Inspector (Xcode > Open Developer Tool). Navigate to your target app and inspect the tree. If roles are meaningful (AXButton, AXTextField, AXStaticText with labels), the app is automatable via AX tree.
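The rule of thumb in that step can be written down. This is a pure-logic sketch on mock data (real trees come from Apple's Accessibility framework; the 50% threshold is an illustrative assumption, not a Fazm constant):

```python
# Rule-of-thumb automatability check from the step above, applied to a
# serialized AX tree. The dicts are mock data; real trees come from the
# Accessibility framework via Accessibility Inspector or the AX API.

MEANINGFUL_ROLES = {"AXButton", "AXTextField", "AXStaticText", "AXMenuItem"}

def automatable(tree, threshold=0.5):
    """An app is a good AX target if most elements carry a meaningful role
    AND a non-empty label; flat Electron trees fail this test."""
    if not tree:
        return False
    good = sum(1 for el in tree
               if el.get("role") in MEANINGFUL_ROLES and el.get("label"))
    return good / len(tree) >= threshold

native = [{"role": "AXButton", "label": "Save"},
          {"role": "AXTextField", "label": "Subject"}]
electron = [{"role": "AXGroup", "label": ""},
            {"role": "AXGroup", "label": ""}]
assert automatable(native) and not automatable(electron)
```

If what you see in Accessibility Inspector looks like the `electron` list above (anonymous `AXGroup` nesting, empty labels), plan for the DOM fallback instead.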
Add one voice trigger
The Reddit thread on voice as primary interface found that friction is the constraint, not transcription accuracy. A hold-to-talk shortcut that fires a single well-scoped automation is more likely to stick than a complex multi-step workflow.
What the 12 surviving automations have in common
The post named two examples directly: “pull the latest invoice number from email and stick it in this spreadsheet column” and “rename screenshots to the active app's name.” Both share the same structure:
- A single, well-defined input source with a stable API or AX surface (Mail, file system)
- A single, well-defined output target (Numbers column, file rename)
- No cross-app reasoning required at execution time
- Failure is explicit and visible, not silent
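The invoice example above, reduced to that structure, looks like this. A minimal sketch, assuming hypothetical `read_email` and `write_cell` callables (the real sources would be Mail and a Numbers column):

```python
# The "pull the invoice number from email, write it to a spreadsheet column"
# survivor, reduced to its structure. read_email and write_cell are stand-ins
# for the single input source and single output target described above.
import re

def pull_invoice_number(read_email, write_cell):
    body = read_email()                        # one well-defined input source
    match = re.search(r"Invoice\s+#(\d+)", body)
    if match is None:
        raise ValueError("no invoice number found")  # explicit, never silent
    write_cell(match.group(1))                 # one well-defined output target
    return match.group(1)

written = []
assert pull_invoice_number(lambda: "Invoice #4821 attached",
                           written.append) == "4821"
assert written == ["4821"]
```

No cross-app reasoning happens at execution time, and a missing invoice number raises loudly instead of corrupting the sheet, which is why this shape survives.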
The automations that broke were the ones that depended on the rendered state of third-party apps: UI positions in Slack or Notion, button coordinates in Linear, menu structures in non-first-party apps. Every OS update, every app update, every design refresh invalidated them.
This is not a temporary problem. The macOS app ecosystem is divided between Apple-controlled apps with stable accessibility surfaces and third-party apps, especially Electron-based ones, that have weak or flat AX trees. Until Electron apps implement better accessibility semantics (not happening at scale in 2026), the workaround is DOM access via the browser debugging protocol. That is what Fazm does for apps like Notion, Linear, and Slack.
Honest caveats
- AX tree coverage is not universal. Sandboxed App Store apps can hide elements from the accessibility tree entirely. Game engines, video players, and some drawing apps render into a surface with no AX representation. The 12-survivor pattern holds here too: limit your automations to apps with good accessibility coverage.
- DOM access is not always available. The browser debugging protocol requires the target app to have devtools enabled or to be running in a debug mode. For enterprise Electron apps with locked-down configurations, this path may not be available.
- Semantic stability is not guaranteed. Apps can change accessible labels in redesigns. The AX tree approach is more stable than screenshot-based approaches, but it is not immune. A major app redesign can still break your automation.
- Local LLMs are a real tradeoff. Running a 35B model locally on an M-series Mac works well for short, well-scoped tasks. For long agentic contexts (15+ steps, multi-app workflows), frontier models still outperform local ones on reliability. The relevant thread: x.com/m13v_/status/2049538035130532072.
Fazm is a macOS agent built on the AX tree. Open source on GitHub, free to start.
Download Fazm for macOS
Frequently asked questions
Why do most Mac automations break after an OS update?
Because they depend on surface-level details that change: button labels, menu positions, pixel coordinates, UI element indices. Shortcuts tries to abstract this but only for apps Apple controls. AppleScript and accessibility APIs reach deeper, but Electron apps serialize their entire UI into a flat tree with no semantic roles, so any update to the app's internal structure silently breaks your selectors. The only automations that genuinely survive are ones targeting stable semantic roles in native Cocoa apps, or ones talking to APIs that live below the UI layer entirely.
What is the AX tree and why does it matter for automation?
The AX tree (accessibility tree) is a structured representation of every UI element in a running macOS app, exposed through Apple's Accessibility framework. It gives you elements by their semantic role (button, text field, list row) and label, not by pixel coordinates. A native Cocoa app's AX tree is stable across minor updates because Apple maintains the accessibility surface deliberately. Electron apps are the exception: they render a flat tree with weak semantic roles, so AX-tree automation on Electron is only marginally better than screenshot automation.
What are the 12 types of automations that actually survive?
The patterns that survive are all data-movement workflows between stable surfaces: pulling invoice numbers from email and writing them to a spreadsheet, renaming files based on active app context, posting to first-party Apple apps (Mail, Calendar, Contacts), reading from and writing to native text fields in Cocoa applications. These work because the target elements are named semantically and Apple maintains that surface. Browser-based automations also survive when targeting the DOM directly rather than the rendered pixels.
Why does Shortcuts feel abandoned for third-party apps?
Shortcuts actions for third-party apps require those apps to explicitly implement the SiriKit or App Intents extension. Most third-party apps, especially Electron apps like Notion, Linear, and Slack, do not implement these extensions at a meaningful depth. Notion exposes basic page creation; Slack exposes almost nothing. Apple's own apps (Mail, Calendar, Reminders, Safari) have deep Shortcuts integration because Apple built it. The framework is not abandoned, it just requires investment from app developers who have not prioritized it.
How does Fazm handle the Electron app problem?
Fazm uses a hybrid approach. For native Cocoa apps it reads the AX tree directly, which is fast and stable. For Electron apps it falls back to DOM access via the browser's debugging protocol, which gives it the full semantic structure of the web page rendered inside the Electron shell. This bypasses the flat AX tree problem for apps like Notion and Linear entirely, since the underlying HTML has all the semantic roles the AX tree flattens away.
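The routing decision described in that answer can be sketched in a few lines. This is an illustrative sketch, not Fazm's code: the app list and the `pick_surface` function are assumptions made for the example.

```python
# Sketch of the hybrid routing described above: native Cocoa apps go through
# the AX tree, Electron apps fall back to the DOM. Names are illustrative.

ELECTRON_APPS = {"Notion", "Linear", "Slack"}   # assumption for this sketch

def pick_surface(app_name, ax_tree_has_roles):
    if app_name in ELECTRON_APPS or not ax_tree_has_roles:
        return "dom"   # browser debugging protocol into the Electron shell
    return "ax"        # direct Accessibility framework read

assert pick_surface("Mail", ax_tree_has_roles=True) == "ax"
assert pick_surface("Notion", ax_tree_has_roles=False) == "dom"
```

The second condition matters as much as the app list: even an unrecognized app gets routed to the DOM path when its AX tree turns out to be flat.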
What is vision-first computer use and why does it break the latency budget?
Vision-first agents take a screenshot of your screen on every action step, send the pixels to a vision-capable model, and wait for coordinates and a verb back. Stacking two or three model calls per step creates latency that compounds quickly: 200ms per vision call times 15 steps in a workflow is 3 seconds of pure model latency, before any actual action. The AX tree approach reads the structure in microseconds on-device and only calls the model at decision points, not on every interaction.
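The arithmetic in that answer is worth writing out, because it is the whole latency argument. A back-of-envelope model (the 200ms figure and the step counts are the illustrative numbers from the text, not benchmarks):

```python
# Back-of-envelope model of pure model latency in an agent workflow,
# using the illustrative numbers from the answer above.

def model_latency_s(steps, calls_per_step, ms_per_call):
    return steps * calls_per_step * ms_per_call / 1000

# Vision-first: one 200ms vision call on every one of 15 steps.
assert model_latency_s(15, 1, 200) == 3.0
# Stacking three model calls per step: 9 seconds before any actual action.
assert model_latency_s(15, 3, 200) == 9.0
# AX-first: model only at, say, 3 decision points in the same workflow.
assert model_latency_s(3, 1, 200) == 0.6
```

The compounding is linear in both steps and stacked calls, so a vision pipeline gets worse exactly as workflows get longer, while the AX-first budget scales with decision points rather than actions.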
Does running a local LLM solve the latency problem for agentic tasks?
Partly. A 35B-class local model on an M-series Mac handles chat-window inference well enough. The constraint that changes with agents is throughput over long contexts. A 15-step workflow generates a lot of intermediate tool-call history, and as the context grows the on-device latency increases non-linearly. Local LLMs are a reasonable fallback or privacy solution, but treating them as equivalent to frontier models on extended agentic tasks in 2026 overstates current hardware capability.
What does 'boring office plumbing' mean in the context of automation?
The phrase comes from the viral Reddit post. The 12 automations that survived two weeks of building were all mundane: pulling invoice numbers, renaming screenshots, moving data between known fields. No autonomous research pipelines, no multi-step reasoning chains. The pattern repeats across every automation practitioner who measures survival rates: the automations still running after 12 months are the ones with a single, well-defined input source and a single, well-defined output target.
Where can I read the source code for how Fazm uses the AX tree?
The core accessibility interaction layer is in github.com/m13v/fazm. The AX tree read path is in Desktop/Sources/Accessibility/. The hybrid fallback for Electron apps is in Desktop/Sources/Browser/. Both are MIT-licensed and readable without Xcode for anyone who wants to verify the architecture before trusting it.
Adjacent reading
Accessibility API vs screenshot agents: the honest comparison
Why the AX tree approach carries less vendor dependency than screenshot-based agents, and how Fazm's execution layer survives even when the model endpoint goes down.
Simple automations that actually stick
The automations running for 18 months in production are embarrassingly boring: form filling, data moving, report compiling. The complexity trap and how to avoid it.
AI desktop automation: boring tools that work
What the practitioner community has learned about desktop automation survival rates. The pattern is consistent: simple input-output pipelines outlast multi-agent architectures.
Want to see which of your Mac workflows the AX tree can handle?
Twenty minutes with the team. Bring your current automation stack and we will trace which workflows survive a system update and which ones are one app update away from breaking.