Your locally-hosted model is the easy half. Plugging it into an agent that actually does things is the other half.
Every top result for local hosted AI ends at the same sentence. Pick a 7B, download Ollama, here is your localhost:11434. Great, you have a chat window. You do not have an agent. You do not have tool use. You do not have anything touching Mail or Calendar or Numbers. Fazm has a three-line shim in ACPBridge.swift (lines 379-382) that redirects its whole agent loop to your local endpoint. Paste a URL into Settings and your on-box model becomes the brain of a tool-using Mac agent.
“The entire local hosted AI capability in Fazm is three lines. ACPBridge.swift reads UserDefaults.standard.string(forKey: "customApiEndpoint") and, if non-empty, sets env["ANTHROPIC_BASE_URL"] = customEndpoint on the long-lived Node.js agent subprocess (spawned with --max-old-space-size=256 at line 343). The Node agent reads the env var at HTTP call time, so every /v1/messages request goes to your endpoint instead of api.anthropic.com. The settings UI is one @AppStorage-bound TextField in SettingsPage.swift line 936 with the placeholder https://your-proxy:8766.”
Desktop/Sources/Chat/ACPBridge.swift, Fazm open source
The three lines that make Fazm speak to any local model
No wrapper, no plugin, no model adapter registry. A UserDefaults read, a guard on empty string, and an env-var write. The Node subprocess Fazm spawns for its agent loop inherits that env, and the Anthropic client inside Node reads ANTHROPIC_BASE_URL at HTTP call time. That is it.
What happens between your prompt and your model
Every turn fans out through a Swift actor into a child Node process, talks to your endpoint, receives tool calls, and the Swift side executes them against the macOS Accessibility APIs. All of that sits on your Mac.
The local-hosted turn, end to end
The four-step recipe to get a locally-hosted agent running
Pick any model runner. Drop any Anthropic-shape shim in front of it. Paste the URL into Fazm. Use it.
1. Host a model. Anything that serves chat completions: Ollama, LM Studio, MLX-LM, llama.cpp, Foundry Local. Gets you http://localhost:11434 or :1234.
2. Put an Anthropic shim in front of it. LiteLLM, Claude-Code-Router, ollama-anthropic, or a 40-line Node proxy. It translates /v1/messages into /v1/chat/completions and binds :8766.
3. Paste the URL into Fazm Settings. Settings → Custom API Endpoint → https://your-proxy:8766. Fazm writes it to the UserDefaults key customApiEndpoint, restarts the Node agent subprocess, done.
4. Watch a local model click buttons in your apps. From here on, every agent turn reads the AX tree of your frontmost app, decides, and emits tool calls. None of that traffic leaves your Mac.
One agent turn, drawn by the wire
The important part is that the model round trip and the tool execution are two separate hops, not one. The Node child talks to the model, the Swift parent drives the Mac. That split is why the model can be anywhere while the actuation stays on your host.
prompt to click, with a local model
The hand-off, stage by stage, named after the real symbols
If you want to audit it yourself, every stage below maps to a function and a file in the MIT-licensed Fazm repo. Nothing on this page is aspirational. It is all running in the signed build shipping today.
Stage 1. User types into the floating bar or presses the hotkey
FloatingControlBarManager captures the prompt and the frontmost app context. AppState.swift already confirmed Accessibility works via testAccessibilityPermission. Nothing has left the host yet.
Stage 2. ChatProvider forwards the turn to ACPBridge
ACPBridge is a Swift actor that manages a long-lived Node.js subprocess speaking Agent Client Protocol over stdio. The Swift side never calls an HTTP API directly; the Node child does.
Stage 3. Process spawn reads your custom endpoint
In start() the bridge reads UserDefaults.standard.string(forKey: "customApiEndpoint"). If non-empty, env["ANTHROPIC_BASE_URL"] = customEndpoint. ACPBridge.swift lines 379-382. That is the whole shim.
Stage 4. Node agent calls your proxy, not api.anthropic.com
The bundled agent runtime reads ANTHROPIC_BASE_URL at construct time. Every subsequent /v1/messages POST goes to your shim. The shim translates and hits Ollama. Your Mac, your model, your ports.
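The pattern on the Node side is small enough to sketch. This is a hypothetical illustration, not Fazm source: the real agent uses Anthropic's SDK, which honors the same ANTHROPIC_BASE_URL variable; the helper names and the fallback host here are ours.

```javascript
// Hypothetical sketch of env-driven base URL resolution.
// Not Fazm source; helper names are illustrative.
function resolveBaseUrl(env = process.env) {
  const custom = (env.ANTHROPIC_BASE_URL || "").trim();
  // Empty or unset falls back to the public API host.
  return custom.length > 0 ? custom : "https://api.anthropic.com";
}

function messagesUrl(env = process.env) {
  // Every model round trip is a POST to <base>/v1/messages.
  return resolveBaseUrl(env) + "/v1/messages";
}
```

Because the lookup happens per call rather than at import, respawning the subprocess with a fresh env is all it takes to retarget every subsequent request.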
Stage 5. Tool calls come back and hit AX APIs
The Node child emits tool_use frames back over stdio. Swift decodes them and performs the actual AXUIElementPerformAction, AXSetAttributeValue, or screen capture on your Mac. The model never sees pixels unless you ask.
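The frames themselves are ordinary Anthropic content blocks. A minimal sketch of pulling tool_use blocks out of an assistant message, following Anthropic's public message shape; the actual decoding happens in Swift over stdio, so this is an illustration only:

```javascript
// Extract tool_use blocks from an Anthropic-shaped assistant message.
// Illustrative only; Fazm's real decoder lives on the Swift side.
function extractToolCalls(message) {
  return (message.content || [])
    .filter(block => block.type === "tool_use")
    .map(block => ({ id: block.id, name: block.name, input: block.input }));
}
```

A message carrying one text block and one tool_use block yields exactly one call for the host side to execute against the AX APIs.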
What can sit behind the shim
Anything that speaks Anthropic's message and tool-use protocol, natively or through a translator. Because a shim is a tiny piece of code, the universe of models Fazm can use is the universe of model servers you can run on a Mac today. The agent does not know, or care, which model is under the hood.
Six things that stay on your host when Fazm uses a local model
Local hosted AI is not a privacy slogan. Each of the following is a specific thing happening on your Mac because of a specific component Fazm ships. The headline is that nothing about Fazm is a remote cockpit.
Model runs on your host
Ollama, LM Studio, MLX-LM, llama.cpp, Foundry Local, vLLM, or Claude Code Router. Any endpoint speaking an OpenAI or Anthropic compatible shape is fair game once the shim sits in front.
Agent runs on your host
ACPBridge spawns Node with --max-old-space-size=256 --max-semi-space-size=16. It is a child process, not a daemon. Kill Fazm, the agent dies with it. No hidden always-on service.
Screen context reads on your host
AXUIElementCreateApplication plus AXUIElementCopyAttributeValue on the frontmost PID. The OS already maintains this tree for VoiceOver. Fazm is a consumer of that graph, not a screenshot pipeline.
Clicks and keystrokes fire on your host
AXUIElementPerformAction(kAXPressAction) for buttons, AXUIElementSetAttributeValue for text fields, CGEvent for synthesized key events. No remote actuator, no desktop streaming.
Logs stay on your host
Every tool call, every AX return code, every model round trip goes to /tmp/fazm.log. Grep it. Delete it. Own it. Fazm does not phone home with the session content.
One relaunch, new endpoint live
Change the text field, the Swift Binding fires restartBridgeForEndpointChange. The Node subprocess is torn down and respawned with fresh env on the next query. No app restart needed.
The text field you paste the URL into
Everything above comes out of a single SwiftUI TextField bound to an @AppStorage key. The toggle above it controls visibility. The onSubmit handler restarts the bridge. Four lines of UI, and real users can use it without opening Terminal.
Four shims you can put in front of a local model today
Fazm does not endorse any one of these. All four are open source, and each runs as a single binary. Pick the one that already matches your setup.
LiteLLM
One Python binary, YAML config, speaks 100+ model shapes, exposes /v1/messages on demand. Fits if you have multiple local servers and want one URL.
Claude Code Router
Node.js proxy purpose-built to translate Anthropic traffic onto Ollama, LM Studio, OpenAI-compatible endpoints. The closest thing to a drop-in for Fazm.
ollama-anthropic
Single-file Go binary, zero config, routes /v1/messages to /v1/chat/completions on Ollama. Works for one model on one port.
Your own 40-line Node proxy
If you already run a gateway in front of your LLM, adding a /v1/messages handler that relays to /v1/chat/completions and reshapes tool_use is an afternoon of work. You keep the gateway; Fazm does not care.
What the first agent turn against a local model looks like
Tail the Fazm log and you can watch the bridge read your endpoint, spawn Node, and start a turn. The line to grep for is ACPBridge; the startup log covers the whole env surface.
Fazm's local hosted AI versus a classic self-host stack
Both columns can run the same model. The difference is everything above and below the model.
| Feature | Self-host stack (Ollama + OpenWebUI + DIY agent) | Fazm |
|---|---|---|
| What ships with the product | A Homebrew tap, a pip package, a Python venv, a .service file, and a README with 12 numbered steps. | A signed consumer Mac app. The agent loop, the AX plumbing, the settings UI, the relaunch logic, all in one DMG. |
| How the model gets swapped | Edit a YAML, restart the daemon, hope the model client library reads the env var the way you expect. | One text field in Settings. Paste URL. Swift Binding triggers restartBridgeForEndpointChange. Next turn uses the new endpoint. |
| Where ANTHROPIC_BASE_URL is injected | Your shell rc file, a systemd unit, or a .env your agent happens to source. | ACPBridge.swift line 381, into the child Process environment the Swift actor owns. |
| Who is allowed to see the env | Usually the whole login session, sometimes every child of your terminal. | Only the Node subprocess Fazm spawned. No other app on the Mac, no shell. |
| What the agent can do once connected | Print tokens into a terminal, or feed a vision model screenshots of what the user is staring at. | Read any app's AX tree, click real widgets, type into text fields, open URLs, run bundled skills, trigger Playwright on the user's real Chrome. |
| What you expose to get this working | Often needs a Cloudflare tunnel, a reverse proxy, or a VPN because the model server does not speak the right protocol. | Local TCP port to Fazm only. No remote ingress, no public listener. |
| Cost of being wrong about the endpoint | Model client throws, agent state corrupts, you restart the whole stack. | An error banner, a one-line toggle back to default. Bridge restarts cleanly. |
What to check first when the shim is not taking effect
Nine out of ten failures are one of these. The remaining one is usually the local model returning malformed tool JSON, which is a model problem, not a Fazm problem.
Shim sanity list
- Proxy responds with 200 on POST /v1/messages, not just GET /
- Streaming response uses text/event-stream, not application/json
- tool_use and tool_result blocks survive the translation to and from /v1/chat/completions
- The model behind the shim actually supports tool calling (gpt-oss 20b yes, llama3 8b patchy)
- URL in the Settings field has no trailing slash, no trailing newline, no stray space
- Scheme matches the host: a valid HTTPS cert for remote endpoints, or plain http:// for localhost; Fazm accepts both
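The URL-hygiene items on that list can be checked mechanically before you paste anything. A hypothetical pre-flight helper, not Fazm code; the rules simply mirror the checklist above:

```javascript
// Hypothetical pre-flight check for the endpoint string you paste into
// Settings. Mirrors the sanity list above; not Fazm source.
function validateEndpoint(raw) {
  const problems = [];
  if (raw !== raw.trim()) problems.push("leading/trailing whitespace or newline");
  const value = raw.trim();
  if (value.endsWith("/")) problems.push("trailing slash");
  let url;
  try {
    url = new URL(value);
  } catch {
    return { ok: false, problems: ["not a parseable URL"] };
  }
  const isLocal = url.hostname === "localhost" || url.hostname === "127.0.0.1";
  if (url.protocol === "http:" && !isLocal) {
    problems.push("plain http is only sensible for localhost");
  }
  return { ok: problems.length === 0, problems };
}
```

Running the placeholder value https://your-proxy:8766 through it passes; a pasted value with a stray newline or trailing slash does not.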
Ship a local-hosted AI you can actually use
Fazm is the consumer Mac app that speaks Accessibility APIs and accepts any Anthropic-compatible model endpoint. Point it at your Ollama, your LM Studio, your Mac Studio across the LAN, and you have a real tool-using agent that never leaves your host.
Download Fazm →

Local hosted AI, answered against the source
What does 'local hosted AI' actually mean if my agent still talks to Anthropic?
Two parts. First, 'local' is about where the computation runs. A local-hosted AI runs the model on your hardware (Ollama, LM Studio, MLX, Foundry Local). Second, there is the agent loop: the thing that decides which tool to call, reads the screen, clicks buttons. Even if the model is local, the agent usually lives elsewhere. Fazm's contribution is that the agent is also a local process (Swift plus a sandboxed Node subprocess), and you can redirect its model calls to your locally-hosted endpoint with one line. That is the combination most 'local hosted AI' guides never deliver.
Which three lines are doing all the work?
Desktop/Sources/Chat/ACPBridge.swift, lines 379 through 382. Literally: `if let customEndpoint = defaults.string(forKey: "customApiEndpoint"), !customEndpoint.isEmpty { env["ANTHROPIC_BASE_URL"] = customEndpoint }`. That Swift block sets an environment variable on the Node.js child process that runs the agent. The Node agent's Anthropic client reads ANTHROPIC_BASE_URL at HTTP call time. From that moment, every model request goes to your endpoint instead of api.anthropic.com. The three lines are the shim.
Why go through a proxy? Why not just support Ollama natively?
Because the agent runtime Fazm embeds speaks Anthropic's message and tool-use protocol. Ollama speaks a different shape. Rather than fork the agent to also speak Ollama's shape, Fazm bets on the shim: you put a small translator in front of your local model and Fazm keeps one code path. Practically, LiteLLM, Claude-Code-Router, and ollama-anthropic are all single-binary shims you run once. Net result is that Fazm can ride any model you can host, without Fazm having to know about that model.
Where is the settings UI, and what does it write?
Desktop/Sources/MainWindow/Pages/SettingsPage.swift, the aiChatSection, specifically the Custom API Endpoint card around line 906. The TextField on line 936 has the placeholder 'https://your-proxy:8766'. It is bound to the @AppStorage key 'customApiEndpoint'. When the value changes, the onSubmit handler (line 939) and the enable toggle (lines 920-932) both call chatProvider?.restartBridgeForEndpointChange(), which tears down the Node subprocess and respawns it with the new env.
Is my prompt actually staying on my Mac, or does Fazm still log things?
If your ANTHROPIC_BASE_URL points at a process on your Mac, the model round trip never hits the wire. What Fazm logs locally: every tool call, every AX return code, every bridge lifecycle event, all to /tmp/fazm.log. PostHog analytics fires for coarse events (app launched, chat session started) but never for the message body. You can neuter that in Settings → Privacy.
Which local models are actually good enough to drive a tool-using agent?
The bottleneck is tool calling, not raw text quality. As of April 2026, gpt-oss 20b, Qwen 2.5 32b, and Llama 3.3 70b all handle multi-turn tool use cleanly enough for desktop agent work. Llama 3 8b can write but trips on structured tool calls after about three turns. MLX-LM exposes Qwen cleanly on Apple Silicon. For a Mac Studio M4 Ultra, Qwen 2.5 32b at q4 is the sweet spot: fast first token, reliable tool JSON, fits in 40 GB of unified memory.
Does this work with a remote self-hosted model too, not just localhost?
Yes. ANTHROPIC_BASE_URL is a URL. Point it at a Hetzner box, a home-lab Tailscale address, a corporate GPU cluster behind a reverse proxy. The only constraints are that the endpoint speaks Anthropic's message protocol (or there is a shim that makes it speak it), and that Fazm can reach it from the Mac. Common setups: a colleague runs LiteLLM on a GPU node, exposes Tailscale HTTPS, everyone on the team points Fazm at the Tailscale URL.
What happens if the proxy goes down or the model refuses a tool call?
The agent turn fails cleanly. ChatProvider catches the error in chatAgentError (ChatProvider.swift line 2892) and shows the user a red banner in the floating chat. The bridge stays alive (the subprocess is not killed by a bad upstream), so the next prompt is instant. If you toggle the Custom API Endpoint off, restartBridgeForEndpointChange() respawns Node without the env var set, and you are back on the default within one retry.
How do I verify the shim is actually taking effect without reading source?
Three quick checks. One, tail /tmp/fazm.log and grep for 'ANTHROPIC_BASE_URL'. Fazm logs bridge startup environment. Two, put a real URL in the field, point it at a server that just 404s everything, and prompt the agent. You should see the agent's error contain your proxy hostname, not api.anthropic.com. Three, use the open-source repo: clone mediar-ai/fazm, run from Xcode, add a print("env=\(env)") before line 396 in ACPBridge.swift, and watch your endpoint appear in the console.
Is the source for all of this really open so I can audit it?
Yes. Fazm's desktop app is MIT-licensed at github.com/mediar-ai/fazm. The file and line numbers on this page are real as of 2026-04-19. ACPBridge.swift handles the subprocess and the env shim. SettingsPage.swift holds the text field and the AppStorage key. ChatProvider.swift owns restartBridgeForEndpointChange. AppState.swift holds the three-stage accessibility probe that makes the local-hosted model useful on the Mac once it is wired up.