Your locally-hosted model is the easy half. Plugging it into an agent that actually does things is the other half.
Every top result for local hosted AI ends at the same sentence. Pick a 7B, download Ollama, here is your localhost:11434. Great, you have a chat window. You do not have an agent. You do not have tool use. You do not have anything touching Mail or Calendar or Numbers. Fazm has a three-line shim in ACPBridge.swift (lines 379-382) that redirects its whole agent loop to your local endpoint. Paste a URL into Settings and your on-box model becomes the brain of a tool-using Mac agent.
“The entire local hosted AI capability in Fazm is three lines. ACPBridge.swift reads UserDefaults.standard.string(forKey: "customApiEndpoint") and, if non-empty, sets env["ANTHROPIC_BASE_URL"] = customEndpoint on the long-lived Node.js agent subprocess (spawned with --max-old-space-size=256 at line 343). The Node agent reads the env var at HTTP call time, so every /v1/messages request goes to your endpoint instead of api.anthropic.com. The settings UI is one @AppStorage-bound TextField in SettingsPage.swift line 936 with the placeholder https://your-proxy:8766.”
Desktop/Sources/Chat/ACPBridge.swift, Fazm open source
The three lines that make Fazm speak to any local model
No wrapper, no plugin, no model adapter registry. A UserDefaults read, a guard on empty string, and an env-var write. The Node subprocess Fazm spawns for its agent loop inherits that env, and the Anthropic client inside Node reads ANTHROPIC_BASE_URL at HTTP call time. That is it.
What happens between your prompt and your model
Every turn fans out through a Swift actor into a child Node process, talks to your endpoint, receives tool calls, and the Swift side executes them against the macOS Accessibility APIs. All of that sits on your Mac.
The local-hosted turn, end to end
The four-step recipe to get a locally-hosted agent running
Pick any model runner. Drop any Anthropic-shape shim in front of it. Paste the URL into Fazm. Use it.
1. Host a model. Anything that serves chat completions: Ollama, LM Studio, MLX-LM, llama.cpp, Foundry Local. Gets you http://localhost:11434 or :1234.
2. Put an Anthropic shim in front of it. LiteLLM, Claude-Code-Router, ollama-anthropic, or a 40-line Node proxy. It translates /v1/messages into /v1/chat/completions and binds :8766.
3. Paste the URL into Fazm Settings. Settings → Custom API Endpoint → https://your-proxy:8766. Fazm writes it to the UserDefaults key customApiEndpoint, restarts the Node agent subprocess, done.
4. Watch a local model click buttons in your apps. From here on, every agent turn reads the AX tree of your frontmost app, decides, and emits tool calls. None of that traffic leaves your Mac.
One agent turn, drawn by the wire
The important part is that the model round trip and the tool execution are two separate hops, not one. The Node child talks to the model, the Swift parent drives the Mac. That split is why the model can be anywhere while the actuation stays on your host.
prompt to click, with a local model
The hand-off, stage by stage, named after the real symbols
If you want to audit it yourself, every stage below maps to a function and a file in the MIT-licensed Fazm repo. Nothing on this page is aspirational. It is all running in the signed build shipping today.
Stage 1. User types into the floating bar or presses the hotkey
FloatingControlBarManager captures the prompt and the frontmost app context. AppState.swift already confirmed Accessibility works via testAccessibilityPermission. Nothing has left the host yet.
Stage 2. ChatProvider forwards the turn to ACPBridge
ACPBridge is a Swift actor that manages a long-lived Node.js subprocess speaking Agent Client Protocol over stdio. The Swift side never calls an HTTP API directly; the Node child does.
Stage 3. Process spawn reads your custom endpoint
In start() the bridge reads UserDefaults.standard.string(forKey: "customApiEndpoint"). If non-empty, env["ANTHROPIC_BASE_URL"] = customEndpoint. ACPBridge.swift lines 379-382. That is the whole shim.
Stage 4. Node agent calls your proxy, not api.anthropic.com
The bundled agent runtime reads ANTHROPIC_BASE_URL at construct time. Every subsequent /v1/messages POST goes to your shim. The shim translates and hits Ollama. Your Mac, your model, your ports.
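The pattern on the Node side is small enough to sketch. This is a hypothetical illustration, not Fazm source: the real agent uses Anthropic's SDK, which honors the same ANTHROPIC_BASE_URL variable; the helper names and the fallback host here are ours.

```javascript
// Hypothetical sketch of env-driven base URL resolution.
// Not Fazm source; helper names are illustrative.
function resolveBaseUrl(env = process.env) {
  const custom = (env.ANTHROPIC_BASE_URL || "").trim();
  // Empty or unset falls back to the public API host.
  return custom.length > 0 ? custom : "https://api.anthropic.com";
}

function messagesUrl(env = process.env) {
  // Every model round trip is a POST to <base>/v1/messages.
  return resolveBaseUrl(env) + "/v1/messages";
}
```

Because the lookup happens per call rather than at import, respawning the subprocess with a fresh env is all it takes to retarget every subsequent request.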
Stage 5. Tool calls come back and hit AX APIs
The Node child emits tool_use frames back over stdio. Swift decodes them and performs the actual AXUIElementPerformAction, AXSetAttributeValue, or screen capture on your Mac. The model never sees pixels unless you ask.
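The frames themselves are ordinary Anthropic content blocks. A minimal sketch of pulling tool_use blocks out of an assistant message, following Anthropic's public message shape; the actual decoding happens in Swift over stdio, so this is an illustration only:

```javascript
// Extract tool_use blocks from an Anthropic-shaped assistant message.
// Illustrative only; Fazm's real decoder lives on the Swift side.
function extractToolCalls(message) {
  return (message.content || [])
    .filter(block => block.type === "tool_use")
    .map(block => ({ id: block.id, name: block.name, input: block.input }));
}
```

A message carrying one text block and one tool_use block yields exactly one call for the host side to execute against the AX APIs.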
What can sit behind the shim
Anything that speaks Anthropic's message and tool-use protocol, natively or through a translator. Because a shim is a tiny piece of code, the universe of models Fazm can use is the universe of model servers you can run on a Mac today. The agent does not know, or care, which model is under the hood.
Six things that stay on your host when Fazm uses a local model
Local hosted AI is not a privacy slogan. Each of the following is a specific thing happening on your Mac because of a specific component Fazm ships. The headline is that nothing about Fazm is a remote cockpit.
Model runs on your host
Ollama, LM Studio, MLX-LM, llama.cpp, Foundry Local, vLLM, or Claude Code Router. Any endpoint speaking an OpenAI or Anthropic compatible shape is fair game once the shim sits in front.
Agent runs on your host
ACPBridge spawns Node with --max-old-space-size=256 --max-semi-space-size=16. It is a child process, not a daemon. Kill Fazm, the agent dies with it. No hidden always-on service.
Screen context reads on your host
AXUIElementCreateApplication plus AXUIElementCopyAttributeValue on the frontmost PID. The OS already maintains this tree for VoiceOver. Fazm is a consumer of that graph, not a screenshot pipeline.
Clicks and keystrokes fire on your host
AXUIElementPerformAction(kAXPressAction) for buttons, AXUIElementSetAttributeValue for text fields, CGEvent for synthesized key events. No remote actuator, no desktop streaming.
Logs stay on your host
Every tool call, every AX return code, every model round trip goes to /tmp/fazm.log. Grep it. Delete it. Own it. Fazm does not phone home with the session content.
One relaunch, new endpoint live
Change the text field, the Swift Binding fires restartBridgeForEndpointChange. The Node subprocess is torn down and respawned with fresh env on the next query. No app restart needed.
The text field you paste the URL into
Everything above comes out of a single SwiftUI TextField bound to an @AppStorage key. The toggle above it controls visibility. The onSubmit handler restarts the bridge. Four lines of UI, and real users can use it without opening Terminal.
Four shims you can put in front of a local model today
Fazm does not endorse any one of these. All four are open source, and each runs as a single binary. Pick the one that already matches your setup.
LiteLLM
One Python binary, YAML config, speaks 100+ model shapes, exposes /v1/messages on demand. Fits if you have multiple local servers and want one URL.
Claude Code Router
Node.js proxy purpose-built to translate Anthropic traffic onto Ollama, LM Studio, OpenAI-compatible endpoints. The closest thing to a drop-in for Fazm.
ollama-anthropic
Single-file Go binary, zero config, routes /v1/messages to /v1/chat/completions on Ollama. Works for one model on one port.
Your own 40-line Node proxy
If you already run a gateway in front of your LLM, adding a /v1/messages handler that relays to /v1/chat/completions and reshapes tool_use is an afternoon of work. You keep the gateway; Fazm does not care.
What the first agent turn against a local model looks like
Tail the Fazm log and you can watch the bridge read your endpoint, spawn Node, and start a turn. The line to grep for is ACPBridge; the startup log covers the whole env surface.
Fazm's local hosted AI versus a classic self-host stack
Both columns can run the same model. The difference is everything above and below the model.
| Feature | Self-host stack (Ollama + OpenWebUI + DIY agent) | Fazm |
|---|---|---|
| What ships with the product | A Homebrew tap, a pip package, a Python venv, a .service file, and a README with 12 numbered steps. | A signed consumer Mac app. The agent loop, the AX plumbing, the settings UI, the relaunch logic, all in one DMG. |
| How the model gets swapped | Edit a YAML, restart the daemon, hope the model client library reads the env var the way you expect. | One text field in Settings. Paste URL. Swift Binding triggers restartBridgeForEndpointChange. Next turn uses the new endpoint. |
| Where ANTHROPIC_BASE_URL is injected | Your shell rc file, a systemd unit, or a .env your agent happens to source. | ACPBridge.swift line 381, into the child Process environment the Swift actor owns. |
| Who is allowed to see the env | Usually the whole login session, sometimes every child of your terminal. | Only the Node subprocess Fazm spawned. No other app on the Mac, no shell. |
| What the agent can do once connected | Print tokens into a terminal, or feed a vision model screenshots of what the user is staring at. | Read any app's AX tree, click real widgets, type into text fields, open URLs, run bundled skills, trigger Playwright on the user's real Chrome. |
| What you expose to get this working | Often needs a Cloudflare tunnel, a reverse proxy, or a VPN because the model server does not speak the right protocol. | Local TCP port to Fazm only. No remote ingress, no public listener. |
| Cost of being wrong about the endpoint | Model client throws, agent state corrupts, you restart the whole stack. | An error banner, a one-line toggle back to default. Bridge restarts cleanly. |
What to check first when the shim is not taking effect
Nine out of ten failures are one of these. The remaining one is usually the local model returning malformed tool JSON, which is a model problem, not a Fazm problem.
Shim sanity list
- Proxy responds with 200 on POST /v1/messages, not just GET /
- Streaming response uses text/event-stream, not application/json
- tool_use and tool_result blocks survive the translation to and from /v1/chat/completions
- The model behind the shim actually supports tool calling (gpt-oss 20b yes, llama3 8b patchy)
- URL in the Settings field has no trailing slash, no trailing newline, no stray space
- Scheme matches the host: a valid HTTPS cert for remote endpoints, or plain http:// for localhost; Fazm accepts both
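The URL-hygiene items on that list can be checked mechanically before you paste anything. A hypothetical pre-flight helper, not Fazm code; the rules simply mirror the checklist above:

```javascript
// Hypothetical pre-flight check for the endpoint string you paste into
// Settings. Mirrors the sanity list above; not Fazm source.
function validateEndpoint(raw) {
  const problems = [];
  if (raw !== raw.trim()) problems.push("leading/trailing whitespace or newline");
  const value = raw.trim();
  if (value.endsWith("/")) problems.push("trailing slash");
  let url;
  try {
    url = new URL(value);
  } catch {
    return { ok: false, problems: ["not a parseable URL"] };
  }
  const isLocal = url.hostname === "localhost" || url.hostname === "127.0.0.1";
  if (url.protocol === "http:" && !isLocal) {
    problems.push("plain http is only sensible for localhost");
  }
  return { ok: problems.length === 0, problems };
}
```

Running the placeholder value https://your-proxy:8766 through it passes; a pasted value with a stray newline or trailing slash does not.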
Ship a local-hosted AI you can actually use
Fazm is the consumer Mac app that speaks Accessibility APIs and accepts any Anthropic-compatible model endpoint. Point it at your Ollama, your LM Studio, your Mac Studio across the LAN, and you have a real tool-using agent that never leaves your host.
Download Fazm →

Local hosted AI, answered against the source
What does 'local hosted AI' actually mean if my agent still talks to Anthropic?
Two parts. First, 'local' is about where the computation runs. A local-hosted AI runs the model on your hardware (Ollama, LM Studio, MLX, Foundry Local). Second, there is the agent loop: the thing that decides which tool to call, reads the screen, clicks buttons. Even if the model is local, the agent usually lives elsewhere. Fazm's contribution is that the agent is also a local process (Swift plus a sandboxed Node subprocess), and you can redirect its model calls to your locally-hosted endpoint with one line. That is the combination most 'local hosted AI' guides never deliver.
Which three lines are doing all the work?
Desktop/Sources/Chat/ACPBridge.swift, lines 379 through 382. Literally: `if let customEndpoint = defaults.string(forKey: "customApiEndpoint"), !customEndpoint.isEmpty { env["ANTHROPIC_BASE_URL"] = customEndpoint }`. That Swift block sets an environment variable on the Node.js child process that runs the agent. The Node agent's Anthropic client reads ANTHROPIC_BASE_URL at HTTP call time. From that moment, every model request goes to your endpoint instead of api.anthropic.com. The three lines are the shim.
Why go through a proxy? Why not just support Ollama natively?
Because the agent runtime Fazm embeds speaks Anthropic's message and tool-use protocol. Ollama speaks a different shape. Rather than fork the agent to also speak Ollama's shape, Fazm bets on the shim: you put a small translator in front of your local model and Fazm keeps one code path. Practically, LiteLLM, Claude-Code-Router, and ollama-anthropic are all single-binary shims you run once. Net result is that Fazm can ride any model you can host, without Fazm having to know about that model.
Where is the settings UI, and what does it write?
Desktop/Sources/MainWindow/Pages/SettingsPage.swift, the aiChatSection, specifically the Custom API Endpoint card around line 906. The TextField on line 936 has the placeholder 'https://your-proxy:8766'. It is bound to the @AppStorage key 'customApiEndpoint'. When the value changes, the onSubmit handler (line 939) and the enable toggle (lines 920-932) both call chatProvider?.restartBridgeForEndpointChange(), which tears down the Node subprocess and respawns it with the new env.
Is my prompt actually staying on my Mac, or does Fazm still log things?
If your ANTHROPIC_BASE_URL points at a process on your Mac, the model round trip never hits the wire. What Fazm logs locally: every tool call, every AX return code, every bridge lifecycle event, all to /tmp/fazm.log. PostHog analytics fires for coarse events (app launched, chat session started) but never for the message body. You can neuter that in Settings → Privacy.
Which local models are actually good enough to drive a tool-using agent?
The bottleneck is tool calling, not raw text quality. As of April 2026, gpt-oss 20b, Qwen 2.5 32b, and Llama 3.3 70b all handle multi-turn tool use cleanly enough for desktop agent work. Llama 3 8b can write but trips on structured tool calls after about three turns. MLX-LM exposes Qwen cleanly on Apple Silicon. For a Mac Studio M4 Ultra, Qwen 2.5 32b at q4 is the sweet spot: fast first token, reliable tool JSON, fits in 40 GB of unified memory.
Does this work with a remote self-hosted model too, not just localhost?
Yes. ANTHROPIC_BASE_URL is a URL. Point it at a Hetzner box, a home-lab Tailscale address, a corporate GPU cluster behind a reverse proxy. The only constraints are that the endpoint speaks Anthropic's message protocol (or there is a shim that makes it speak it), and that Fazm can reach it from the Mac. Common setups: a colleague runs LiteLLM on a GPU node, exposes Tailscale HTTPS, everyone on the team points Fazm at the Tailscale URL.
What happens if the proxy goes down or the model refuses a tool call?
The agent turn fails cleanly. ChatProvider catches the error in chatAgentError (ChatProvider.swift line 2892) and shows the user a red banner in the floating chat. The bridge stays alive (the subprocess is not killed by a bad upstream), so the next prompt is instant. If you toggle the Custom API Endpoint off, restartBridgeForEndpointChange() respawns Node without the env var set, and you are back on the default within one retry.
How do I verify the shim is actually taking effect without reading source?
Three quick checks. One, tail /tmp/fazm.log and grep for 'ANTHROPIC_BASE_URL'. Fazm logs bridge startup environment. Two, put a real URL in the field, point it at a server that just 404s everything, and prompt the agent. You should see the agent's error contain your proxy hostname, not api.anthropic.com. Three, use the open-source repo: clone mediar-ai/fazm, run from Xcode, add a print("env=\(env)") before line 396 in ACPBridge.swift, and watch your endpoint appear in the console.
Is the source for all of this really open so I can audit it?
Yes. Fazm's desktop app is MIT-licensed at github.com/mediar-ai/fazm. The file and line numbers on this page are real as of 2026-04-19. ACPBridge.swift handles the subprocess and the env shim. SettingsPage.swift holds the text field and the AppStorage key. ChatProvider.swift owns restartBridgeForEndpointChange. AppState.swift holds the three-stage accessibility probe that makes the local-hosted model useful on the Mac once it is wired up.