The roundups of local AI apps for Mac all cover the same category of app. Fazm belongs to a different one.
Open any article titled “best local AI apps for Mac” and you will see LM Studio, Jan, Ollama, GPT4All, Enclave, BoltAI, Locally AI. Different UIs around the same idea: run a quantized language model on your hardware, give it a chat window, done. Fazm does not compete in that lane. It is a local Mac app whose job is to read the AXUIElement tree of every other app you have open and take actions inside them, using the same accessibility surface that VoiceOver uses. The proof it belongs to a different category: at boot it runs a three-stage liveness probe on that surface, because if the tree is unreachable the app has nothing to do.
THE PHRASE IS AMBIGUOUS
Two things people mean when they type “local AI apps”
The phrase has two reasonable meanings and the existing guides only cover one of them.
Meaning one: “an AI whose model runs locally on my hardware.” This is the meaning almost every article assumes. The answers are LM Studio, Jan, Ollama, GPT4All, Enclave, BoltAI, Locally AI. They all do the same thing with minor UX variations: download a quantized model, host it on your Mac, put a chat window in front of it. They are real and useful, especially if your bar is privacy or you have no network.
Meaning two: “an AI app that lives on my Mac and acts on the other apps I already have open there.” This is the meaning nobody lists for, and it is the meaning that matters if what you want is something that finishes a task rather than answers a question. The apps in meaning one cannot touch Mail, Finder, Messages, Safari, or Slack. They read what you paste into them; they do not operate anything.
Fazm is meaning two. This page is written from its source tree, not from marketing copy, so everything below is verifiable against files on disk.
THE LANDSCAPE, SORTED HONESTLY
What the popular options are actually for
A side-by-side that reflects what each of these apps does, rather than the marketing claim. None of this is a knock on any of them. They are excellent at what they are for. The point is that what they are for is not what this page is about.
| Feature | Typical local-LLM chat apps | Fazm |
|---|---|---|
| Model runs on your hardware | Yes, that is the whole product | Cloud Claude Sonnet by default, swap is a config change |
| App runs on your Mac | Signed macOS app | Signed macOS app |
| Reads the AX tree of other Mac apps | No, the app only sees its own window | Yes, AXUIElement family via a bundled Swift MCP server |
| Clicks, types, scrolls in other apps | No | Yes, CGEvent synthesized via macos-use |
| Needs Accessibility permission | No, none required | Yes, and probes it three ways at boot |
| Works offline end to end | Yes, that is the point | Not by default, reasoning is cloud |
| Uses screenshots as the grounding signal | N/A, nothing to ground | No, AX tree is the default, screenshot on demand |
| Drives your real Chrome for logged-in sites | No | Yes, via Playwright MCP Bridge extension |
| Bundled MCP servers | None | Five (fazm_tools, playwright, macos-use, whatsapp, google-workspace) |
| Chat with it about a document you paste in | Yes | Yes |
THE ANCHOR FACT
Three stages to check that the Accessibility tree is alive
A local chat app never has to ask this question. Its world is a text box and a model. Fazm’s world is your whole Mac, so the first thing it does on boot is check that the part of macOS it depends on is actually working. There are three stages, and each exists for a specific failure mode that the single-line API AXIsProcessTrusted() is known to miss.
The whole routine is in Desktop/Sources/AppState.swift. If you clone the repo and jump to line 308 you can read it yourself. An abridged version of the real AX round-trip, stage one:
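A minimal sketch of what such a stage-one round-trip looks like, built from the public AX APIs. This is an illustrative reconstruction, not the repo’s actual code; the function name is ours:

```swift
import AppKit
import ApplicationServices

// Illustrative reconstruction of a stage-one probe (not the repo's code):
// ask the cheap cached question first, then force a real AX read against
// the frontmost app to see whether the tree actually answers.
func stageOneProbe() -> Bool {
    // Cached TCC state. Can report true even when the tree is dead.
    guard AXIsProcessTrusted() else { return false }
    guard let front = NSWorkspace.shared.frontmostApplication else { return false }
    let app = AXUIElementCreateApplication(front.processIdentifier)
    var window: CFTypeRef?
    // The real round-trip: this call returns .cannotComplete when the
    // per-process permission cache is stale, even though the guard above passed.
    let err = AXUIElementCopyAttributeValue(app,
                                            kAXFocusedWindowAttribute as CFString,
                                            &window)
    return err == .success
}
```

The point of the second call is that it cannot be answered from a cache: macOS has to walk into the target process and come back.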
The second stage is confirmAccessibilityBrokenViaFinder, which re-runs the same round-trip against Finder specifically. Finder is a known-well-behaved AX citizen, so if the tree of Finder is unreadable, the permission is truly broken. The third stage only fires if Finder is not running: it builds a CGEventTap and checks whether the tap can be created. Event tap creation hits the live TCC database and therefore bypasses the per-process permission cache that can go stale on macOS 26 (Tahoe) after a system update or an app re-sign.
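The escalation order across the three stages condenses to a few lines of control flow. In this sketch each probe is injected as a closure so the cascade can be read (and exercised) without touching the AX APIs; the enum and function names are ours, not the repo’s:

```swift
// Condensed sketch of the three-stage cascade described above (the real
// probes live in Desktop/Sources/AppState.swift; these names are illustrative).
enum ProbeResult { case healthy, cannotComplete, denied }

func accessibilityIsAlive(
    probeFrontmost: () -> ProbeResult,  // stage 1: round-trip vs the frontmost app
    probeFinder: () -> ProbeResult?,    // stage 2: same round-trip vs Finder, nil if not running
    probeEventTap: () -> Bool           // stage 3: can a CGEventTap be created?
) -> Bool {
    switch probeFrontmost() {
    case .healthy:
        return true
    case .denied:
        return false
    case .cannotComplete:
        // Ambiguous: broken permission, or a frontmost app that simply
        // doesn't implement AX. Finder is the well-behaved tie-breaker.
        if let finder = probeFinder() { return finder == .healthy }
        // Finder not running: fall through to the live-TCC event-tap check.
        return probeEventTap()
    }
}
```

Stage three only runs when both cheaper answers are unavailable, which matches the ordering described above.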
WHAT THE BOOT LOG LOOKS LIKE
On a healthy Mac, the probe logs look like this
Every line below is something the app actually writes to its log file. The values change (pid, frontmost app name, model version), but the shape is the same on every boot. If the tree is broken the probe writes the failing line and the onboarding flow takes over.
NUMBERS THAT ARE IN THE SOURCE
Specific values pulled from files on disk
Every number below is a literal in the repo. Nothing is invented. You can clone the public mirror of the Fazm source tree and grep for any of them.
Line numbers are from the current main branch. The three probe stages are testAccessibilityPermission (stage one), confirmAccessibilityBrokenViaFinder (stage two), probeAccessibilityViaEventTap (stage three). The five MCP names are fazm_tools, playwright, macos-use, whatsapp, google-workspace.
THE EXECUTION SURFACE
Where an English sentence ends up on your Mac
From one English sentence to actions across your apps
The hub is a local process, not a server. It is a short-lived Node program that multiplexes between the agent and the five MCP peers. The peers are all either bundled binaries or a local Playwright instance. Nothing in this diagram is a cloud component except the reasoning model at the top, and the reasoning model is swappable.
WHY AX AND NOT PIXELS
The accessibility tree is smaller, faster, and correct by construction
A full accessibility traversal of a mid-size app is a few kilobytes of flat text. A 2x Retina screenshot of the same window, base64-encoded for a model, is on the order of 500 KB to 2 MB. That is a two to three order-of-magnitude gap in per-step cost, and it repeats on every turn of the agent loop. The AX path is also correct in cases where the screenshot path is flaky.
Exact coordinates, not pixel guessing
Every element row is shaped [Role] "text" x:N y:N w:W h:H visible. The click handler auto-centers at (x+w/2, y+h/2) in screen coordinates. Screenshot-based agents routinely confuse window-relative and screen-relative coordinates, because a screenshot’s (0,0) is the corner of the window, not of the screen.
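Assuming rows follow exactly that shape, the parse-and-center step is a few lines. The struct, function name, and regex here are illustrative, not the repo’s:

```swift
import Foundation

// Hypothetical parser for the element-row shape described above:
//   [Role] "text" x:N y:N w:W h:H visible
// Names are ours, not the repo's.
struct AXRow {
    let role: String
    let label: String
    let x: Double, y: Double, w: Double, h: Double

    // Click target: the center of the element's bounds, in screen coordinates.
    var center: (x: Double, y: Double) { (x + w / 2, y + h / 2) }
}

func parseRow(_ line: String) -> AXRow? {
    let pattern = #"^\[(\w+)\] "([^"]*)" x:(-?\d+) y:(-?\d+) w:(\d+) h:(\d+)"#
    guard let re = try? NSRegularExpression(pattern: pattern),
          let m = re.firstMatch(in: line, range: NSRange(line.startIndex..., in: line))
    else { return nil }
    func group(_ i: Int) -> String {
        String(line[Range(m.range(at: i), in: line)!])
    }
    return AXRow(role: group(1), label: group(2),
                 x: Double(group(3))!, y: Double(group(4))!,
                 w: Double(group(5))!, h: Double(group(6))!)
}
```

Because the bounds arrive already in screen coordinates, the center needs no window-origin correction.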
Labels, not OCR
A Confirm button is labeled Confirm in the tree even when it is rendered as a glyph, an emoji, or an icon font. Screenshot agents have to OCR the glyph on every step and can misread custom controls.
Dark mode and RTL survive
Accessibility labels are independent of theme and layout direction. A Mac in dark mode with Hebrew locale returns the same tree structure as a default setup.
The agent does not see the screen
Nothing the agent reads contains pixel data unless a screenshot is explicitly captured. That also means the model cannot be spooked by an off-brand splash screen or a loading spinner that happens to look like a button.
Shared with screen readers
Every app that works with VoiceOver works with Fazm for free, because both consume the same AX surface. When a vendor improves their accessibility, Fazm inherits the improvement on the next launch.
WHAT IT LOOKS LIKE END TO END
One task, from English to a completed click
You type one English sentence
No triggers, no formatting, no slash commands. The floating control bar accepts a plain sentence like: reply to the last email from Anna with a short acknowledgement, then open Calendar and book an hour tomorrow morning.
The agent picks a tool and a target
Claude Sonnet receives the sentence along with the bundled tool schemas for macos-use, playwright, whatsapp, google-workspace, and fazm_tools. It decides which tool to call first. For Mail, it calls macos-use with a traverse of the Mail app.
The Swift server reads the AX tree
The Swift MCP binary bundled at Contents/MacOS/mcp-server-macos-use calls AXUIElementCreateApplication on the Mail pid and walks its tree. It writes a flat .txt file with one element per line and returns the path. The agent greps that text for the correct row.
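Because the traversal is one element per line, “grepping for the correct row” is plain line filtering. A minimal sketch under that assumption (the function name is ours):

```swift
import Foundation

// Illustrative sketch: given the flat traversal text (one element per line),
// return the first row whose quoted label matches. Not the repo's code.
func findRow(label: String, in traversal: String) -> String? {
    traversal
        .split(separator: "\n")
        .map(String.init)
        .first { $0.contains("\"\(label)\"") }
}
```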
A click or keypress is synthesized
The agent replies with a click_and_traverse or type_and_traverse tool call. The server issues CGEvent-based clicks or key presses to the target process. A fresh traversal is written immediately after so the agent can verify the state changed the way it expected.
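The verify step reduces to comparing the traversal text before and after the action. A minimal sketch, with an illustrative function name:

```swift
// Sketch of the post-action check: a row we expect the action to produce
// must be absent in the pre-action traversal and present in the fresh one.
// Name and shape are ours, not the repo's.
func actionTookEffect(before: String, after: String, expect: String) -> Bool {
    !before.contains(expect) && after.contains(expect)
}
```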
The loop continues
For the Calendar leg the agent switches targets and repeats. The transcript lists every tool call, every file path, every outcome. You can read it after the fact, or stop the agent mid-run by closing the bar.
WHAT YOU DO NOT HAVE TO DO
Everything the model-on-device crowd still requires
Model-on-device apps are genuinely private, but they also carry a tax most users do not want to pay. Fazm is a consumer install. None of the setup below applies.
Not required to use Fazm
- Downloading a 4 GB to 40 GB quantized model file
- Choosing between GGUF, MLX, and vanilla safetensors
- Picking a quantization level
- Configuring a context window and a system prompt
- Running a server on localhost and remembering the port
- Installing Xcode, Homebrew, pip, npm, or Docker
- Managing a llama.cpp or MLX backend yourself
- Watching your Mac heat up while a model loads into RAM
This is not a knock on model-on-device apps, which solve a real problem for people who specifically need offline inference. It is a note that the setup they ask of you is not the setup Fazm asks of you, and that difference is part of what places them in a different category.
Want to see the tree probe on your own Mac?
Fifteen minutes on a call, we run Fazm on your machine, walk through the accessibility probe, and show you a task end to end on your own apps.
Frequently asked
Why is Fazm not listed on other 'local AI apps for Mac' roundups?
Because those roundups are almost always a catalog of on-device LLM chat wrappers (LM Studio, Jan, Ollama, GPT4All, Enclave, BoltAI, Locally AI). The sorting variable they use is 'which quantized models can you run on Apple Silicon.' Fazm does not compete on that axis. It is a signed macOS app whose job is to read the accessibility tree of every other app you have open and take actions inside them. The model it uses for reasoning is Claude Sonnet (see DEFAULT_MODEL on line 1245 of acp-bridge/src/index.ts, set to claude-sonnet-4-6), called over the network, so by the strict 'model runs on my hardware' definition it does not qualify. By the more useful definition ('AI that lives on my Mac and acts on my local apps'), it is exactly the thing the phrase should mean.
What does 'reading the accessibility tree' mean in practice on a Mac?
macOS has an API family rooted at AXUIElement that every assistive technology uses. VoiceOver, Switch Control, and third-party screen readers all talk to it. It exposes a typed tree of UI elements for every app: role (button, text field, window), label, visible bounds, children. Fazm calls AXUIElementCreateApplication on the pid of the target app, then AXUIElementCopyAttributeValue with attributes like kAXFocusedWindowAttribute and walks from there. You can see the exact call site in Desktop/Sources/AppState.swift at line 439. The agent receives this tree as structured text rather than pixels, which is why it behaves correctly on dark mode, non-Latin languages, custom fonts, and high-DPI displays where screenshot-based agents tend to fail.
Does this mean my data stays on my Mac, like a local LLM would keep it?
The execution surface is local. Every AX call, every click coordinate, every keystroke is synthesized inside your user session and never leaves the machine. The accessibility rows the agent consults are the same metadata VoiceOver sees (role, label, visible frame), not pixel data. Conversation state, indexed files, and the per-user knowledge graph are all in a GRDB SQLite database at ~/Library/Application Support/Fazm/ on your disk. The reasoning model is cloud-hosted today, but swapping it for an on-device model is a config change, not a rewrite, because the entire agent loop is already structured around local tools rather than around an in-process LLM.
Why does Fazm's startup run three different accessibility checks instead of one?
Because AXIsProcessTrusted, the one-line macOS API that every tutorial points at, returns cached TCC state that can be stale for hours after a macOS update or after an app re-sign. On macOS 26 (Tahoe) the cache is worse. If Fazm trusted that one call, it would tell users 'permission granted' on boots where the permission is actually broken, leading to silent 'agent does nothing' failures. The boot sequence is in Desktop/Sources/AppState.swift. Stage one is AXIsProcessTrusted plus a real AX round trip against the frontmost app. Stage two runs the same round trip against Finder specifically if stage one returned cannotComplete (which could mean either broken permission or an app like PyMOL that simply doesn't implement AX). Stage three, the tie-breaker, creates a CGEventTap because event tap creation checks the live TCC database and bypasses the stale cache entirely. A local chat app never has to do any of this because it never touches another app's tree.
Which apps can Fazm actually drive through this accessibility path?
Any app that exposes an accessibility tree, which is close to all of them on macOS. Native Apple apps (Finder, Mail, Calendar, Notes, Messages, Safari, System Settings, Keynote, Pages, Numbers) all work. Electron apps (Slack, Discord, Notion, Cursor, VS Code) work because Electron forwards through the Chromium accessibility layer. Catalyst apps like WhatsApp work and get a dedicated MCP server on top (see BUILTIN_MCP_NAMES Set at acp-bridge/src/index.ts line 1266). The class of apps where AX is thin or broken (some Qt apps, some OpenGL or Metal full-surface apps, older cross-platform framework apps) is handled by falling back to screenshots, but that is the exception path, not the default.
How is Fazm different from the 'computer-use' screenshot-based agents I keep seeing demos of?
Screenshot-based agents see pixels and guess what they mean. Fazm reads the accessibility tree and knows. This matters for three concrete reasons. First, the tree contains coordinates: every element line looks like [Role] "text" x:N y:N w:W h:H visible, so the click handler auto-centers at (x+w/2, y+h/2) rather than guessing from a screenshot where (0,0) is the window, not the screen. Second, the tree contains labels: a Confirm button is labeled Confirm even if it is rendered as a glyph; screenshot-only agents have to re-OCR on every step. Third, the tree is small: a full traversal of a mid-size app is kilobytes of text, not megabytes of base64 PNG, so the round trip per action is faster and cheaper. Fazm does take screenshots, but only when explicitly needed for visual context, never as the default grounding signal.
If I am already using LM Studio or Ollama, do I also need Fazm?
They answer different questions. LM Studio and Ollama answer 'can I run a large language model without an API key.' Fazm answers 'can I say in English what I want done across the apps on my Mac, and have something actually go do it.' The two are compatible: if you point LM Studio at a model on localhost and then wire Fazm to use that endpoint via a custom provider, you get both shapes at once, a fully-on-device loop where the model is local and the execution surface is local. That combination is a config change, not a rewrite, because Fazm's provider layer is already pluggable (see Desktop/Sources/Providers/). The shipped default is cloud Claude for reasoning quality, with no change required to the rest of the agent loop to swap it.
How do I know the agent is not going to do something unexpected on my Mac?
Every AX-driven action passes through a single executor path that logs the tool call and the resulting tree before the next turn runs, so you can read the transcript. In the browser leg, Fazm injects a full-viewport overlay with id fazm-overlay at z-index 2147483647 and pointer-events:none on every page the agent touches, so the user can see the AI is driving and still click through. You can stop the agent mid-task by closing the floating control bar or by typing a new instruction that interrupts. The permission surface itself is scoped: Fazm can only do what macOS Accessibility allows any assistive technology to do, which specifically excludes direct kernel access, keychain reads without prompts, and other high-privilege operations.
Is this a developer tool I have to configure, or can my non-technical parent use it?
It is a consumer app. There is no pip, no npm, no Docker, no API key to generate, no Homebrew tap. You download the DMG, drag into /Applications, grant Accessibility permission when prompted, and start typing in English. Under the hood five MCP servers are bundled (fazm_tools, playwright, macos-use, whatsapp, google-workspace) and a signed universal binary Swift server at Contents/MacOS/mcp-server-macos-use wraps the AX APIs. The user touches none of that. The model chain, the tool selection, the tree reading, the click synthesis are all below the English bar.
What happens on the first run if I never grant Accessibility permission?
You cannot use the part of the app that acts on other apps. The onboarding flow asks explicitly and shows macOS's own permission sheet via AXIsProcessTrustedWithOptions with kAXTrustedCheckOptionPrompt set to true (see Desktop/Sources/AppState.swift around line 518). If you decline, the chat still works but any tool call into macos-use returns a structured error that the agent reads and then reports to you in the reply. If you grant and it breaks later (common on macOS updates), the three-stage probe catches it and the app shows a Quit & Reopen prompt rather than silently continuing with a dead tree.