Local AI chatbot, the second definition every guide skips
The top search results for this keyword agree that "local" means the model weights run on your Mac. Ollama, LM Studio, Jan, Locally AI, Enclave AI, GPT4All. Every one of those is a good answer to the question "can an LLM run on my laptop." None of them answer the more useful question: can a chatbot read what is on my screen right now without bouncing pixels through a vision model. Fazm does. It calls `AXUIElementCopyAttributeValue` on whatever app you have focused and walks the macOS accessibility tree, so the chatbot sees your Mail, your Numbers sheet, your Cursor file, or your Figma canvas as structured text instead of a JPEG. The one line of Swift that does it is pinned below.
“A Fazm query does not start with a screenshot. It starts with AXUIElementCopyAttributeValue(appElement, kAXFocusedWindowAttribute as CFString, &focusedWindow) at AppState.swift line 441. That one Apple framework call returns a handle to the real focused window of whichever app you are in; descending that handle yields a tree of roles, titles, values, and children. The chatbot sees structured UI, not pixels.”
Desktop/Sources/AppState.swift, line 441
The anchor fact: how Fazm reads your screen without looking at it
Every other Mac chatbot that advertises "sees your screen" does one of two things. Either it prompts you to paste text into its window, or it takes a screenshot and feeds the JPEG to a vision-capable model. Fazm does neither by default. It reaches into the focused app through the same OS API that VoiceOver uses. The core of it is two Apple framework calls: `AXUIElementCreateApplication` and `AXUIElementCopyAttributeValue`.
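A minimal reconstruction of that call sequence, built only from the identifiers this guide quotes (it is not Fazm's actual source, just the shape of the probe the article describes):

```swift
import AppKit
import ApplicationServices

// `frontApp` is the app you were in before invoking the chat bar,
// not the chat bar itself.
guard let frontApp = NSWorkspace.shared.frontmostApplication else { exit(1) }

// Build a live AX handle to that app.
let appElement = AXUIElementCreateApplication(frontApp.processIdentifier)

// The one call this article is built around: ask the app for its
// focused window, as an accessibility element rather than pixels.
var focusedWindow: CFTypeRef?
let result = AXUIElementCopyAttributeValue(
    appElement,
    kAXFocusedWindowAttribute as CFString,
    &focusedWindow
)

if result == .success, let window = focusedWindow {
    // From here, kAXChildrenAttribute, kAXTitleAttribute, and
    // kAXValueAttribute give you the tree as structured data.
    print("Got focused window element: \(window)")
}
```

Running this requires macOS and a one-time Accessibility grant in System Settings; without the grant, the call returns an AXError instead of `.success`.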
`frontmostApplication` is the app you were in a moment before you invoked Fazm, not Fazm itself. `AXUIElementCreateApplication` builds a live handle to that app. `AXUIElementCopyAttributeValue` with `kAXFocusedWindowAttribute` returns a handle to the specific window currently focused. From there a chatbot can descend the accessibility hierarchy and pull the value of the text field under the cursor, the title of the selected row in a list, or the label on the button you are hovering over, all as plain strings. Nothing is rendered. Nothing is captured.
The AX read from Swift is half the story. The other half is a native binary called `mcp-server-macos-use` that ships inside `/Applications/Fazm.app/Contents/MacOS/`. It is a Model Context Protocol server that exposes a structured interface over the same Accessibility APIs to Claude. Every tool call prefixed with `mcp__macos-use__` routes through this binary, which walks the AX tree in Rust and returns structured results in milliseconds. You can verify the file with `ls /Applications/Fazm.app/Contents/MacOS` on any Mac with Fazm installed.
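Both claims are checkable from a terminal on any Mac with Fazm installed; `file` simply confirms the server is a compiled executable rather than a script (assuming the standard install location named above):

```shell
# List the native executables shipped inside the app bundle.
ls /Applications/Fazm.app/Contents/MacOS

# Confirm the MCP server is a real Mach-O binary, not a wrapper script.
file /Applications/Fazm.app/Contents/MacOS/mcp-server-macos-use
```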
The two meanings of "local" in a chatbot, side by side
Both are legitimate. Both solve real problems. They are almost always framed as the same thing by the SERP, and they are not. Picking the right one depends on whether your priority is offline privacy or on-machine usefulness.
Local inference vs local context
Local inference: the language model runs on your Mac. Zero network calls, zero cloud bill, full offline operation. The chatbot is a text window; it sees only what you type or paste into it. If you want to ask about a Mail thread you are reading, you copy the text into the chat first. Good fit for notes, brainstorming, sensitive documents, and flights with no WiFi.
- Model weights sit on your disk
- Fully offline, zero cloud round trips
- Chat window cannot see other apps
- You copy and paste what you want it to know
Local context: the chatbot process runs on your Mac and reads your live desktop through OS APIs, while the model itself is remote. It sees the focused app without copy-paste, can act on other apps through a bundled MCP server, and keeps every conversation in a local SQLite file. Good fit for "summarize this thread," "reply to this email," and "click that for me."
- Model is cloud-hosted, reached over your own session
- Focused app read through Accessibility APIs, no paste step
- Bundled MCP server can act on other apps
- Conversations stored in a local SQLite file
How a single Fazm query flows through your Mac
The diagram is literal. Four local inputs converge on the ACP bridge, which is the Node subprocess Fazm spawns. The bridge negotiates with Claude, but every input on the left lives on your machine. The cloud model only ever sees the distilled structured-context packet, not the raw pixels or permissions.
Fazm chat query: local inputs to model output
Six dimensions of "local" in a chatbot
The word hides more than one decision. A chatbot can be local on one axis and remote on another. The SERP currently treats only the first axis as "local," but the other five matter just as much for a real Mac workflow.
Local model weights
Ollama, LM Studio, Jan, Locally AI, Enclave, GPT4All. The model binary sits on your disk and every token is generated on your Mac. Solves privacy, works offline, costs zero marginal per message.
Local screen context
Fazm. A chatbot window that can read the focused app through Accessibility APIs, control other apps through a bundled MCP server, and store conversations in a local SQLite file. The model is remote, everything else is on your machine.
Local conversation history
Both categories store chat logs in a local database and let you search them offline. Ollama and Fazm both write to SQLite-style stores, never to a cloud sync service.
Local tool execution
The chatbot runs commands, reads files, and manipulates apps from your machine, not a cloud sandbox. This is where the Accessibility API story matters most, and where the local-inference crowd has historically been weakest.
Local permissions model
macOS gates Accessibility, Screen Recording, and Automation per-app in System Settings. A proper local chatbot lives inside that permissions model and asks for exactly the grants it needs, nothing more.
Local knowledge graph
Fazm ships an on-disk knowledge graph of files, people, and past observations. The observer agent writes to it continuously; the chatbot queries it without leaving your machine.
Local inference vs local context on specific workflows
Concrete tasks a Mac user actually does, scored against the two definitions of local. Not "Fazm beats Ollama." Ollama wins every offline row. The point is that the shape of the problem decides the shape of the tool.
| Feature | Local inference (Ollama / LM Studio / Locally AI) | Local context (Fazm) |
|---|---|---|
| Works offline with no internet | Yes, model runs on your CPU or M-series GPU | No, Claude is cloud-hosted |
| Reads your focused Mail message without copy-paste | No, chat window cannot see other apps | Yes, via kAXFocusedWindowAttribute |
| Clicks a button in another app on your behalf | No, no tool channel into the OS | Yes, via bundled mcp-server-macos-use |
| Answers 'what is in my clipboard' without paste | No, requires manual paste | Yes, AX-adjacent NSPasteboard read |
| Uses your Claude Pro or Max subscription allowance | No, no Claude integration | Yes, via OAuth bridge (ACPBridge.swift line 350) |
| Zero marginal cost per message | Yes, unconditionally | Yes, if on Claude allowance or Haiku trial |
| Works on a flight with airplane mode on | Full chat, no limits | Stored messages only, no new responses |
| Stores every conversation on disk in SQLite | Varies by app; most also use local stores | Yes, ~/Library/Application Support/Fazm/fazm.db |
| Surfaces conversation history to a screen reader | Varies by app, usually yes | Yes, native SwiftUI views with AX labels |
| Bundled native MCP server inside .app | No (not the same problem space) | Yes, Contents/MacOS/mcp-server-macos-use |
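The clipboard row in the table above refers to a plain AppKit read, adjacent to but separate from the AX tree. A minimal sketch of the idea (an illustration, not Fazm's actual implementation):

```swift
import AppKit

// Read the current clipboard text directly, no paste step required.
let pasteboard = NSPasteboard.general
if let text = pasteboard.string(forType: .string) {
    print("Clipboard contains \(text.count) characters of text")
} else {
    print("Clipboard has no plain-text content")
}
```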
What happens, step by step, when you ask Fazm about your screen
Six steps from keystroke to answer. Every step is either a local API call or a subprocess on your Mac. Only step five leaves your machine, and what leaves is a structured string, not a screenshot.
1. You type a question into the Fazm floating bar
The bar is a tiny always-on-top window. It is not your focused app; your focused app is whatever you were working in a moment ago, tracked via `FloatingControlBarManager.shared.lastActiveAppPID`.
2. Swift builds an AX handle to that app, not to itself
`let appElement = AXUIElementCreateApplication(frontApp.processIdentifier)` (AppState.swift line 439). The PID is the app you were last in, so Fazm reads the real thing you are working on, not its own chat UI.
3. Fazm pulls the focused window element over the AX bridge
`AXUIElementCopyAttributeValue(appElement, kAXFocusedWindowAttribute as CFString, &focusedWindow)` (AppState.swift line 441). This is the one line that lets a chatbot see a Mail message, a Numbers cell, a Figma layer, or a Cursor file tab without a screenshot.
4. The bundled MCP server traverses the AX tree on demand
The `mcp-server-macos-use` binary sitting at `Contents/MacOS/mcp-server-macos-use` is wired as a Claude tool (acp-bridge/src/index.ts line 1063). When the model needs a button coordinate, a text-field value, or a menu path, it calls an MCP tool that walks the accessibility hierarchy in Rust, not a vision model that guesses at pixels.
5. Only the structured snippet ships to Claude
What the model sees is a short JSON blob: role, title, children, coordinates. Not a 1568-pixel JPEG. Token cost drops by an order of magnitude compared to a vision-first path, and the model can reason about the UI as a tree instead of as a picture.
6. Screenshot fallback exists but is the last resort
`Desktop/Sources/Providers/ChatToolExecutor.swift` lines 895-927 implement `capture_screenshot` with mode 'screen' or 'window'. It is reachable, just not the default. AX tree reads happen first because they are faster, cheaper, and more precise for structured UI.
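Steps 3 and 4 both amount to a walk of the accessibility hierarchy. A hedged Swift sketch of that traversal (the shipped binary does the equivalent in Rust; this is an illustration of the pattern, not Fazm's code):

```swift
import ApplicationServices

/// Recursively collect indented role/title lines from an AX element.
/// Depth-limited so a deep window hierarchy cannot blow the stack.
func walkAXTree(_ element: AXUIElement, depth: Int = 0, maxDepth: Int = 4) -> [String] {
    guard depth <= maxDepth else { return [] }

    func copyAttr(_ name: String) -> CFTypeRef? {
        var value: CFTypeRef?
        AXUIElementCopyAttributeValue(element, name as CFString, &value)
        return value
    }

    let role = copyAttr(kAXRoleAttribute) as? String ?? "?"
    let title = copyAttr(kAXTitleAttribute) as? String ?? ""
    var lines = [String(repeating: "  ", count: depth) + "\(role) \(title)"]

    // kAXChildrenAttribute yields the element's subtree.
    if let children = copyAttr(kAXChildrenAttribute) as? [AXUIElement] {
        for child in children {
            lines += walkAXTree(child, depth: depth + 1, maxDepth: maxDepth)
        }
    }
    return lines
}
```

The joined output of those lines is the kind of compact, structured string that ships to the model in step 5, in place of a screenshot.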
The local wiring, in numbers
Four line numbers. One native binary. One AX call. That is the whole local-context stack.
Verify the local-context story with three greps
If any of the above sounds too neat, these three commands prove the wiring exists. Run them against a fresh clone of the Fazm desktop repo or against the bundled source in `/Applications/Fazm.app`.
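The commands are not reproduced inline here; based on the searches and paths this guide cites elsewhere, they are likely:

```shell
# 1. The focused-window probe exists in the Swift source.
rg -n kAXFocusedWindowAttribute Desktop/Sources

# 2. The MCP server is wired up in the ACP bridge.
rg -n macos-use acp-bridge/src

# 3. The native binary ships inside the installed app bundle.
ls /Applications/Fazm.app/Contents/MacOS
```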
Three consequences of choosing local context over local inference
Picking the second definition of local changes the daily experience of the chatbot more than the model choice does. These are the three things a user actually notices within the first week.
Copy-paste disappears
The chatbot already sees the Mail message, the Numbers cell, the Figma frame. You ask "summarize this thread" without any setup. The structured AX read covers what used to be the paste step in a local-inference workflow.
Action replaces suggestion
Because the mcp-server-macos-use binary can click, type, and navigate in any native app, the chatbot moves from "here is a draft" to "I sent the reply, here is the message ID." The offline chatbot has no comparable action channel.
Permissions become the UI
Screen Recording, Accessibility, Automation. Each one is a capability the chatbot unlocks. An offline chatbot has nothing to ask for. A local-context chatbot guides you through a one-time System Settings grant and from there has any-app reach.
How the "local AI chatbot" SERP currently maps
Every product the top results list, tagged by its definition of local. Fazm is the only entry on the context axis.
The chatbot that reads your live Mac, not a screenshot of it
Fazm is a macOS-native AI chatbot that treats your Mac as the context, not as an afterthought. It reaches into the focused app through macOS Accessibility APIs, controls other apps through a bundled native MCP server, and stores every conversation in a local SQLite database on your disk. The cloud model is Claude over your own OAuth session. Everything else is local.
Download Fazm →
Frequently asked questions
What does 'local AI chatbot' actually mean in April 2026?
The SERP treats it as a single thing: a chat app that runs the language model on your Mac instead of calling a cloud API. Ollama, LM Studio, Jan, Enclave AI, Locally AI, Chatbox, and GPT4All all fit that definition. That is local inference. There is a second, underserved meaning: a chatbot that runs locally on your machine and reads your live desktop context through operating-system APIs, so it can actually help with what you are doing in whatever app you have focused. Fazm is the second kind. It uses a cloud model but its desktop integration is deeply local: structured accessibility reads of the focused window, clipboard, running apps, and a bundled native mcp-server-macos-use binary inside the .app bundle.
Is Fazm a local AI chatbot if the model runs in the cloud?
Depends on which definition you use. Model-locality: no, Fazm talks to Claude (or a bundled Anthropic key during the free trial). Context-locality: yes, more aggressively than the on-device-inference apps. The chatbot lives in a floating bar on your Mac, reads the focused app through Accessibility APIs before every query, controls other apps through the bundled mcp-server-macos-use MCP server, and stores every message in a local SQLite database. The screen content never leaves your computer as pixels; it leaves as a tiny structured JSON snippet (role, title, value, children), and only when the model needs it.
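No exact schema for that snippet is published in this guide; a hypothetical example of the shape it describes (role, title, value, children, plus coordinates) might look like:

```json
{
  "role": "AXWindow",
  "title": "Re: Q3 budget",
  "children": [
    { "role": "AXTextField", "title": "To:", "value": "dana@example.com" },
    { "role": "AXButton", "title": "Send", "position": [912, 64] }
  ]
}
```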
How does Fazm read the focused app without a screenshot?
Through macOS Accessibility APIs. The code is in `Desktop/Sources/AppState.swift` line 441: `AXUIElementCopyAttributeValue(appElement, kAXFocusedWindowAttribute as CFString, &focusedWindow)`. `appElement` is built from `NSWorkspace.shared.frontmostApplication`, so the read targets whatever app you are currently in, not Fazm's own window. From the focused window element Fazm can walk the AX tree, pull button labels, menu items, list rows, text-field values, and selected text, all as structured strings. No pixel capture, no OCR, no VLM.
What is the native binary inside Fazm's app bundle that does the UI reads?
A Rust binary called `mcp-server-macos-use`, wired up in `acp-bridge/src/index.ts` line 1063. The TypeScript ACP bridge registers it as an MCP server named `macos-use` when the binary exists in `Contents/MacOS/mcp-server-macos-use`. From there every Claude tool call that opens with `mcp__macos-use__` runs against the live accessibility tree. You can verify the binary yourself with `ls /Applications/Fazm.app/Contents/MacOS` on any machine with Fazm installed.
Why use Accessibility APIs instead of screenshots for a chatbot?
Three reasons, all practical. One, latency: an AX tree read is milliseconds, a screenshot-plus-vision round trip is seconds. Two, precision: AX returns role=Button title='Send' directly, vision has to infer the same thing from pixels and often misreads icon-only buttons. Three, any-app coverage: AX works the same way in Mail, Numbers, Finder, Cursor, Xcode, Slack, and 99 percent of native macOS apps, whereas screenshot parsing depends on visible pixels and breaks on minimized windows, overlays, and off-screen content. The tradeoff is Accessibility permission, which the user grants once in System Settings.
How does Fazm compare to Ollama, LM Studio, Locally AI, and Enclave AI?
Those are all high-quality local-inference apps. They run the model on your Mac, keep conversations in a chat window, and never call a cloud API. Their constraint is that the chatbot only sees what you paste into it. Fazm's constraint is inverted: it uses a cloud model but sees your whole Mac. If your question is 'can I use an LLM without internet,' pick Ollama or Locally AI. If your question is 'can I ask an LLM what is on my screen right now and have it actually do something about it,' Fazm is the short list. Different tools, different problems, both valid.
Does Fazm work with any app or only specific ones?
Any AX-compliant native macOS app, which is the overwhelming majority of apps you use day to day: Finder, Mail, Notes, Safari, Chrome, Firefox, Slack, Discord, Telegram, Messages, WhatsApp, Spotify, Zoom, Numbers, Pages, Keynote, Xcode, Cursor, VS Code, Figma, Excel, Word, Outlook, Notion, Linear, Arc, Affinity, Sketch, Terminal, iTerm2, System Settings. The short list of things that are harder: Qt apps, OpenGL game canvases, and some Python tooling, which is why `Desktop/Sources/AppState.swift` lines 454-458 have a specific `.cannotComplete` fallback that double-checks against Finder before assuming the permission itself is broken.
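That double-check can be sketched like this (an illustration of the pattern the answer describes, not the shipped code; the function name is hypothetical):

```swift
import AppKit
import ApplicationServices

/// Distinguish "this app doesn't speak AX" from "the permission is broken".
/// If the focused app fails with .cannotComplete, probe Finder, which is
/// always AX-compliant: if Finder also fails, the grant itself is the problem.
func axPermissionLooksBroken(afterError error: AXError) -> Bool {
    guard error == .cannotComplete else { return false }
    guard let finder = NSRunningApplication
        .runningApplications(withBundleIdentifier: "com.apple.finder")
        .first else { return false }

    let finderElement = AXUIElementCreateApplication(finder.processIdentifier)
    var window: CFTypeRef?
    let probe = AXUIElementCopyAttributeValue(
        finderElement, kAXFocusedWindowAttribute as CFString, &window)

    // Finder failing too suggests Accessibility was never granted.
    return probe != .success
}
```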
Is Fazm open source and can I verify these file paths?
The Fazm desktop app is fully open source. Every file path in this guide is a real path in the repository. The relevant files: `Desktop/Sources/AppState.swift` for the AX permission and focused-window probe, `Desktop/Sources/Providers/ChatToolExecutor.swift` for the capture_screenshot tool that is used only when a pure AX read is not enough, and `acp-bridge/src/index.ts` for the MCP server wiring. You can `rg -n kAXFocusedWindowAttribute Desktop/Sources` and `rg -n macos-use acp-bridge/src` yourself.
What if I want both a local model and local screen context?
Nothing in Fazm's architecture blocks a local model. The ACP bridge spawns a Node subprocess and hands it a Claude OAuth session today, but the same bridge could hand it an Ollama endpoint instead. The harder problem is the Accessibility integration, which is what Fazm is actually providing. Running a llama on your Mac is a solved problem in 2026. Having a chatbot that understands what is on your screen without bouncing pixels to a vision model is not, and is the part that the top 'local AI chatbot' SERP results skip entirely.
Where is the capture_screenshot tool and when does Fazm fall back to pixels?
`Desktop/Sources/Providers/ChatToolExecutor.swift` lines 895-927. The tool accepts mode='screen' or mode='window' and returns a base64 JPEG. It is wired explicitly as a fallback: when the user asks 'what does this look like' or when Claude decides it needs visual confirmation of a chart, icon, or rendered document, it calls `capture_screenshot`. For 'what is in the to-address field of this email', 'which row is selected', 'what is the subtotal shown here', Fazm stays on the Accessibility path because it is faster and more precise.
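Not Fazm's implementation, but a minimal sketch of what a 'screen'-mode capture returning base64 JPEG could look like (the function name and quality default are assumptions; requires the Screen Recording permission):

```swift
import AppKit
import CoreGraphics

/// Capture the main display and return it as a base64-encoded JPEG,
/// the shape of payload a vision fallback would ship to the model.
func captureMainScreenAsBase64JPEG(quality: Double = 0.7) -> String? {
    guard let cgImage = CGDisplayCreateImage(CGMainDisplayID()) else { return nil }
    let rep = NSBitmapImageRep(cgImage: cgImage)
    guard let jpeg = rep.representation(
        using: .jpeg,
        properties: [.compressionFactor: quality]
    ) else { return nil }
    return jpeg.base64EncodedString()
}
```

Even at moderate quality, that payload is orders of magnitude larger than the structured AX snippet, which is the cost argument for keeping it a last resort.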
How is this different from the accessibility features inside other chatbots?
Most 'accessible chatbot' guides on the web are about making the chatbot itself readable by screen readers (ARIA labels, semantic HTML, keyboard navigation). That is a separate topic. Fazm uses the same operating-system Accessibility APIs in reverse: instead of exposing Fazm to VoiceOver, Fazm uses macOS AX to read every other app on your Mac. Both are valid uses of the AX stack, but the SEO keyword 'local ai chatbot' is about the second kind, even though the SERP does not currently reflect that.
Does Fazm need internet at all?
For chat responses, yes. It calls Claude over HTTPS. For everything else (reading your screen, controlling apps, storing your messages, searching your local knowledge graph, running bundled skills), it is fully local. The conversation database is a SQLite file at `~/Library/Application Support/Fazm/fazm.db`. The AX tree reads go through `/System/Library/Frameworks/ApplicationServices.framework`. The mcp-server-macos-use binary runs as a child process inside the app bundle. An offline Mac will still remember everything, just cannot generate new responses.
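If Fazm is installed, the conversation store can be inspected with the stock sqlite3 CLI. The table layout is not documented in this guide, so `.tables` is the safe first query:

```shell
# Open the local conversation database read-only and list its tables.
sqlite3 -readonly "$HOME/Library/Application Support/Fazm/fazm.db" ".tables"
```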
What every "local AI chatbot" article stops short of saying
The offline inference story is covered exhaustively. There is a good guide for every model format, every quantization, every Apple Silicon trick. If you want to run a llama on your Mac in April 2026, the SERP is ready for you.
What the SERP has not caught up to is that the most useful kind of local chatbot is the one that reads your live desktop. Accessibility APIs have been sitting in macOS since 2001. The one Apple framework call at `AppState.swift` line 441, plus a native `mcp-server-macos-use` binary shipped inside the app bundle, is what turns a chat window into something that knows what you are working on. The model can be anywhere. The context has to be local.
If that framing is right for your workflow, Fazm is free to start, fully open source, and every file path in this guide is something you can verify yourself.