Local host AI: Accessibility over screenshots

Source-verified, April 19 2026

A local AI that can use your other apps, not just run on your Mac

Search results for "local host AI" all stop at the same place: here is how to download Ollama, here is an M-series quant, here is your localhost:11434 endpoint. What happens after you have a running model on your host? Can it read the email you have open? Can it click the calendar button? The honest answer in every one of those tutorials is no, not out of the box. Fazm is built around the other half. Its testAccessibilityPermission function does a three-stage probe of the macOS accessibility stack so the model on your local host can actually reach the apps running next to it.

Fazm
11 min read
  • Every file and line number cited is a real symbol in the Fazm open-source repo
  • Three-stage AX probe verified in Desktop/Sources/AppState.swift on 2026-04-19
  • Works around the documented macOS 26 Tahoe TCC per-process cache staleness

testAccessibilityPermission (AppState.swift, lines 431-463) runs AXUIElementCopyAttributeValue against the frontmost app's PID. It treats apiDisabled as unambiguous failure; on cannotComplete it falls back to confirmAccessibilityBrokenViaFinder (lines 468-485), and finally to probeAccessibilityViaEventTap (lines 490-504), which calls CGEvent.tapCreate to bypass the stale per-process TCC cache on macOS 26 Tahoe. schedulAccessibilityRetry re-runs the check every 5 seconds. On confirmed failure, an NSAlert asks the user to Quit & Reopen, and relaunchApp runs /bin/sh -c 'sleep 1 && open <bundlePath>' before terminating.

Desktop/Sources/AppState.swift, Fazm open source

What the top SERP results actually give you

Every one of these is useful. Every one of them stops at model-on-disk. None of them answers the question: can this local AI use the app I have open right now?

  • Ollama (serves a model)
  • LM Studio (serves a model)
  • Jan (serves a model)
  • MLX-LM (serves a model)
  • Foundry Local (serves a model)
  • llama.cpp (serves a model)
  • Fazm (serves a local agent that uses your apps)

The difference a real accessibility tree makes

Screenshot-based agents paint every screen, send it upstream, and ask a vision model to point at the button. Accessibility-based agents ask macOS for the widget tree the OS already maintains for VoiceOver, and target elements by role and description. The difference shows up in latency, cost, and correctness.

What the local host asks the OS

Triggers (your prompt, mic input, a dropped file, a scheduled task) flow into the Fazm local host, which then issues OS-level calls: kAXFocusedWindow reads, AXButton AXPress actions, AXValue reads, CGWindow capture, tccutil reset, and NSWorkspace open.
  • 3 stages in the AX permission probe
  • 5 seconds between retry ticks
  • 4 AXError codes counted as healthy
  • 1 relaunch to defeat TCC cache staleness

The Swift function that decides whether the local host can act

This is the real function. No pseudo-code, no translation. The only thing the model's agent loop needs to know is the boolean that falls out of it. Everything fancy Fazm does (clicking, typing, reading apps) gates on this check returning true.

Desktop/Sources/AppState.swift, lines 431-463
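In sketch form, the probe's decision logic looks like this. This is a minimal reconstruction from the description on this page, not the repo's exact code; the helper name and the optional-Bool return convention are illustrative:

```swift
import AppKit
import ApplicationServices

// Stage 1 of the probe, as described above: make a live AX call against
// the frontmost app. Four codes count as healthy, apiDisabled is an
// unambiguous denial, and anything else (notably cannotComplete) is
// ambiguous and escalates to the Finder and event-tap stages.
func probeFrontmostAppAX() -> Bool? {
    guard let pid = NSWorkspace.shared.frontmostApplication?.processIdentifier else {
        return nil  // nothing to probe against right now
    }
    let appElement = AXUIElementCreateApplication(pid)
    var value: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(
        appElement, kAXFocusedWindowAttribute as CFString, &value)

    switch err {
    case .success, .noValue, .notImplemented, .attributeUnsupported:
        return true          // permission is live
    case .apiDisabled:
        return false         // unambiguously denied
    default:
        return nil           // ambiguous: fall through to Stage 2 and 3
    }
}
```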

The five-stage recovery path when macOS gets confused

A one-shot call to AXIsProcessTrusted is what most tutorials write. That is not enough on modern macOS because TCC lies. Fazm wraps the probe in a five-stage loop that survives the common failure modes the user never thinks about.


Stage 1. Live AX call against the frontmost app

testAccessibilityPermission grabs the current NSWorkspace frontmostApplication, wraps its PID with AXUIElementCreateApplication, and calls AXUIElementCopyAttributeValue(appElement, kAXFocusedWindowAttribute, ...). Four return codes count as healthy: success, noValue, notImplemented, attributeUnsupported. AppState.swift lines 433-447.


Stage 2. Disambiguate cannotComplete against Finder

If the AX call returns cannotComplete, the permission might be broken, or the frontmost app might just not implement AX (Qt apps, OpenGL apps, PyMOL). Fazm re-runs the same AX probe against com.apple.finder, a known AX-compliant app. Only if Finder also fails is the permission declared truly broken. AppState.swift lines 468-485.
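The tie-break can be sketched like this; the function name and return convention are illustrative, and the repo's confirmAccessibilityBrokenViaFinder may differ in detail:

```swift
import AppKit
import ApplicationServices

// Re-run the same AX probe against Finder, a known AX-compliant app.
// nil means Finder is not running (fall through to the event-tap probe);
// true means the permission itself is broken, not just the frontmost app.
func axBrokenAccordingToFinder() -> Bool? {
    guard let finder = NSRunningApplication
        .runningApplications(withBundleIdentifier: "com.apple.finder").first else {
        return nil
    }
    let element = AXUIElementCreateApplication(finder.processIdentifier)
    var value: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(
        element, kAXFocusedWindowAttribute as CFString, &value)
    // If even Finder fails the probe, blame the permission, not the app.
    return err == .cannotComplete || err == .apiDisabled
}
```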


Stage 3. CGEvent tap probe for macOS 26 Tahoe TCC cache

If Finder is not running, or as a final tie-break, Fazm calls CGEvent.tapCreate with .cgSessionEventTap and .listenOnly. Unlike AXIsProcessTrusted, tap creation checks the live TCC database, bypassing the per-process cache that goes stale on macOS 26 when the user toggles the Accessibility switch. AppState.swift lines 490-504.


Stage 4. Retry timer, not a one-shot

The check does not run once and give up. A Timer fires every 5 seconds with a maxAccessibilityRetries budget. On every failed tick the state flips to isAccessibilityBroken, the floating UI shows a hint, and on retry success it clears automatically. AppState.swift lines 301-400.
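The shape of that retry loop can be sketched as follows. The class and closure names here are illustrative, not the repo's symbols; only the 5-second cadence and the bounded budget come from the description above:

```swift
import Foundation

// A repeating timer with a bounded retry budget. On each failed tick the
// caller flips its broken flag and shows a hint; on success the loop
// clears itself.
final class AXRetryLoop {
    private var remaining: Int

    init(maxRetries: Int = 6) { self.remaining = maxRetries }

    func start(checkAX: @escaping () -> Bool,
               onBroken: @escaping () -> Void,
               onRecovered: @escaping () -> Void) {
        Timer.scheduledTimer(withTimeInterval: 5, repeats: true) { [weak self] t in
            guard let self else { return t.invalidate() }
            if checkAX() {
                onRecovered()              // permission came back: clear the hint
                t.invalidate()
            } else {
                onBroken()                 // flip the broken flag, show UI hint
                self.remaining -= 1
                if self.remaining <= 0 { t.invalidate() }  // budget exhausted
            }
        }
    }
}
```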


Stage 5. Forced relaunch when macOS lies

On macOS Sequoia and Tahoe there is a failure mode where AXIsProcessTrustedWithOptions returns true but AX calls still fail. Fazm catches this, pops an NSAlert titled Accessibility Permission Needs Restart, and when the user clicks Quit & Reopen it spawns open "<bundlePath>" via /bin/sh and terminates. AppState.swift lines 403-429.
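The relaunch trick itself is small enough to sketch in full. A minimal version, assuming the same /bin/sh plus open command the page quotes (the function name is illustrative, not the repo's relaunchApp):

```swift
import AppKit

// Spawn a detached shell that outlives this process: it sleeps past our
// termination, then reopens the bundle so the new process reads fresh TCC.
func relaunchForFreshTCC() {
    let bundlePath = Bundle.main.bundlePath
    let task = Process()
    task.executableURL = URL(fileURLWithPath: "/bin/sh")
    task.arguments = ["-c", "sleep 1 && open \"\(bundlePath)\""]
    try? task.run()          // fire and forget; 'open' runs after we quit
    NSApp.terminate(nil)
}
```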

Why a CGEvent tap is the only probe that does not lie

AXIsProcessTrusted uses the per-process TCC cache. Since macOS 26 Tahoe, that cache can report you as trusted while the actual AX calls keep failing. Creating a CGEventTap forces the kernel to re-read live TCC state for the calling process. If the tap handle comes back non-nil, you are genuinely trusted right now. Fazm invalidates it immediately; it only cares about the yes or no.

Desktop/Sources/AppState.swift, lines 490-504
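A sketch of the listen-only tap probe, reconstructed from the description above rather than copied from the repo (the function name and event mask are illustrative):

```swift
import CoreGraphics

// Creating a tap forces a live TCC check. A non-nil port means genuinely
// trusted right now; the tap is torn down immediately and never enabled,
// because only the boolean matters.
func trustedPerLiveTCC() -> Bool {
    let mask = CGEventMask(1 << CGEventType.keyDown.rawValue)
    guard let port = CGEvent.tapCreate(
        tap: .cgSessionEventTap,
        place: .headInsertEventTap,
        options: .listenOnly,
        eventsOfInterest: mask,
        callback: { _, _, event, _ in Unmanaged.passUnretained(event) },
        userInfo: nil
    ) else {
        return false   // tap refused: live TCC says not trusted
    }
    CFMachPortInvalidate(port)  // tear down; we never run the tap
    return true
}
```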

What the local host can actually do once the probe returns true

Every capability below is a direct consequence of reading the Accessibility tree instead of pixels. The OS already maintains this graph for VoiceOver users, so the agent is piggybacking on twenty years of Apple's work rather than asking a vision model to recreate it.

Read the focused window tree

AXUIElementCopyAttributeValue(kAXFocusedWindowAttribute) returns the actual widget hierarchy with text, roles, and bounds. That is not a screenshot; it is the accessibility graph the OS already maintains for VoiceOver.

Click a button by role, not by pixel

A click targets an element with role AXButton and a specific AXDescription. Resolution-independent, layout-change-resilient, works in dark mode, works on a dragged window.
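A role-based click can be sketched as a walk over the AX tree. The recursive traversal below is illustrative (the repo's element resolution may differ), but the role, description, and AXPress calls are the standard Accessibility API shape:

```swift
import ApplicationServices

// Depth-first search for an AXButton whose AXDescription matches, then
// press it. Resolution-independent: no pixels, no coordinates.
func pressButton(in element: AXUIElement, describedAs wanted: String) -> Bool {
    var role: CFTypeRef?
    var desc: CFTypeRef?
    AXUIElementCopyAttributeValue(element, kAXRoleAttribute as CFString, &role)
    AXUIElementCopyAttributeValue(element, kAXDescriptionAttribute as CFString, &desc)
    if role as? String == kAXButtonRole, desc as? String == wanted {
        return AXUIElementPerformAction(element, kAXPressAction as CFString) == .success
    }
    var children: CFTypeRef?
    AXUIElementCopyAttributeValue(element, kAXChildrenAttribute as CFString, &children)
    for child in (children as? [AXUIElement]) ?? [] {
        if pressButton(in: child, describedAs: wanted) { return true }
    }
    return false
}
```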

Read a field without OCR

Text fields expose AXValue directly. No Tesseract, no GPT-4V round trip, no screenshot upload. The OS hands you the string the user is looking at.
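The read is one call. A minimal sketch, assuming the text-field element has already been located:

```swift
import ApplicationServices

// Read a text field's contents straight from AXValue. No OCR, no capture.
func readText(from field: AXUIElement) -> String? {
    var value: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(field, kAXValueAttribute as CFString, &value)
    return err == .success ? value as? String : nil
}
```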

Inspect Finder, Xcode, Numbers

Apple's own apps are fully AX-instrumented. The agent reads cells, rows, selected items, the open folder path, the currently opened file.

Fall back to capture only when useful

If an app is Qt or Electron with a gnarled AX tree, ScreenCaptureManager.captureAppWindow grabs just that window (via CGWindowListCreateImage with .optionIncludingWindow) and hands it to a vision model. That is the exception, not the default.
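The per-window capture can be sketched with the call the page names. CGWindowListCreateImage is deprecated on recent macOS in favor of ScreenCaptureKit, but it still illustrates the shape: one window's bounds, not the whole screen (the function name is illustrative, not the repo's ScreenCaptureManager API):

```swift
import CoreGraphics

// Capture a single window by its CGWindowID, using the window's own bounds.
func captureWindow(id windowID: CGWindowID) -> CGImage? {
    CGWindowListCreateImage(
        .null,                      // .null = use the window's own bounds
        .optionIncludingWindow,     // just this window, nothing behind it
        windowID,
        [.boundsIgnoreFraming]
    )
}
```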

Survive macOS 26 Tahoe's new TCC

The per-process TCC cache in Tahoe can flag you as trusted while actual AX calls return cannotComplete. Fazm catches the mismatch and self-heals with a relaunch instead of silently doing nothing.

What the health check looks like on a real Mac

When Fazm boots, it logs a line per probe stage to /tmp/fazm.log. Grep for ACCESSIBILITY_CHECK and you can watch the state machine make up its mind.

/tmp/fazm.log

Fazm's local host versus a classic local LLM stack

This is not a model benchmark. Both sides can run any model you point them at. The difference is what happens between the model and the rest of your Mac.

Feature | Ollama / LM Studio / Foundry Local | Fazm
What you run on your local host | A Docker stack or a Python venv running an LLM server | A signed consumer Mac app that hosts the agent on your box
How the local AI reads other apps | Typically nothing; the model sits in a REPL or a web UI | macOS Accessibility API (AXUIElement*) reading the real widget tree
Vision fallback | N/A, or a full-screen screenshot on every step | ScreenCaptureKit / CGWindowListCreateImage, only when you ask
Scope of apps the AI can reach | None by default; browser-only via Playwright if wired up | Any running app on your Mac, not just the browser
Permission discovery | AXIsProcessTrusted() one-liner that lies on macOS 26 | 3-stage probe: AX call + Finder tie-break + CGEventTap
Recovery when macOS TCC cache is stale | You restart the terminal and hope | NSAlert + relaunch-via-open so the new process reads fresh TCC
Shape of the deliverable | README, Makefile, brew install, pip install, docker compose up | DMG, signed and notarized, App Store style install

Two failure modes Fazm refuses to swallow

A local host AI is useful only to the degree you can trust it to tell you when it has stopped working. Two states that every tutorial glosses over:

Permission stuck after toggling

How Fazm recovers

  • Retry the AX call every 5 seconds for 30 seconds
  • If still stuck, show an NSAlert with Quit & Reopen
  • On Quit & Reopen, spawn /bin/sh -c 'sleep 1 && open <bundlePath>'
  • New process reads fresh TCC, probe passes, agent works

App does not implement AX at all

How Fazm recovers

  • AX call returns cannotComplete for this app only
  • Finder re-check passes, so permission is fine
  • Fall back to CGWindowListCreateImage to capture the window
  • Route pixels to a vision model for that one action, AX still handles the rest

Put the local host AI on your Mac

Fazm runs on your machine, talks to your apps through the accessibility layer Apple built for VoiceOver, and asks for screen capture only when AX is not enough. Free, open source, signed and notarized.

Download Fazm

Local host AI, answered against the source

What does 'local host AI' actually mean, and how is that different from a local model?

A local model means a neural net running on your hardware (Ollama, LM Studio, MLX, Foundry Local). A local host AI is the model plus everything around it: permissions, IO, and the ability to reach other apps on the same machine. If your model is local but it cannot read what is in Mail or type into Keynote, then your local host AI is a fancy chat window. Fazm's answer is that the local model calls a Mac-native host that talks to the OS through Accessibility APIs.

Why use macOS Accessibility APIs instead of screenshots like computer-use models do?

Three reasons. First, the AX tree is a structured string graph; screenshots are pixels that have to be re-read by a vision model on every turn. Second, the AX tree already has roles (AXButton, AXTextField, AXStaticText) so the agent can target elements semantically instead of guessing what a button looks like. Third, it is cheap: one AX call is a local function, a screenshot is a PNG plus an inference round trip. Fazm uses screenshots only as a fallback for apps that do not expose AX properly (Qt, some Electron builds).

Where in the Fazm source can I see this permission probe?

Desktop/Sources/AppState.swift in the open-source Fazm repo. Function names: testAccessibilityPermission (lines 433-463), confirmAccessibilityBrokenViaFinder (lines 468-485), probeAccessibilityViaEventTap (lines 490-504). The public retry timer that wraps them all is schedulAccessibilityRetry (lines 376-400). Everything is MIT licensed.

What is the macOS 26 Tahoe cache bug Fazm works around?

On Tahoe, each process has a cached view of which TCC permissions it owns. When the user toggles Accessibility for Fazm in System Settings while Fazm is running, AXIsProcessTrustedWithOptions starts returning true immediately, but the live AX calls continue returning cannotComplete or apiDisabled until Fazm restarts. Fazm sidesteps the cache entirely by testing with a real AX call and, if that is ambiguous, a CGEvent.tapCreate call which checks the live TCC database. If both come back bad it asks the user to relaunch.

Does Fazm send anything from my screen to a cloud?

By default, no. The AX tree never leaves your Mac. When you explicitly ask for screen context, ScreenCaptureManager uses CGWindowListCreateImage to capture the target window and feeds it into the model. You can see the captures in /tmp/fazm.log. If you pair Fazm with a local-inference backend like Ollama or LM Studio, even that round trip stays on your host.

Which apps does this actually work with?

Any app that implements AX. That includes Apple's whole family (Finder, Mail, Notes, Calendar, Messages, Xcode, Numbers, Keynote, Pages, System Settings), Safari and Chrome (including extension UIs), native Electron apps when they remember to set accessibility flags, Slack, Discord, Notion, Linear, Figma, Zoom, VS Code, Terminal, iTerm, and so on. The floating control bar detects the frontmost app and pulls its window tree regardless. Qt and raw SDL apps are the usual misses.

How is this different from just running Ollama plus a Python script?

Ollama gives you a model endpoint on localhost:11434. It does not know anything about your running apps. You could wire pyatspi or pyobjc on top, deal with the AX framework in Python, negotiate TCC permissions per process, and build a retry loop for macOS 26 Tahoe. Fazm does all of that in Swift and ships it as a signed DMG. The model half is interchangeable; Fazm talks to Claude by default but it can point at any local endpoint. The host half is the unique work.

What happens if I deny Accessibility permission?

Fazm keeps running but the agent loses its hands. You can still chat and the app can still render screen captures, but it cannot click, type, or read widget text. A floating hint appears in the control bar and a settings pane has a one-click tccutil reset flow that triggers an NSAlert and relaunches the app cleanly. The code path is resetAccessibilityPermissionDirect in AppState.swift line 544.

Is this open source so I can audit the claims?

Yes. Fazm's desktop is MIT-licensed at github.com/mediar-ai/fazm. Every file and line number cited on this page exists in that repo on 2026-04-19. The three functions that form the three-stage permission probe are all in Desktop/Sources/AppState.swift. The screen capture fallback is in Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift. The floating UI that reacts to the isAccessibilityBroken flag is next to it.

fazm: AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
