Eight surfaces, not one

Local AI privacy beyond inference

Most guides on local AI privacy answer one question. Where does the model run. That answer covers about twelve percent of the data picture for a desktop agent. The other seven surfaces decide whether your microphone audio, your screen pixels, the labels in your accessibility tree, your telemetry events, your crash dumps, your auth tokens, and your logged-in browser sessions stay on your Mac or land somewhere else.

Matthew Diakonov
8 min read

Direct answer, verified 2026-05-10

For a desktop AI agent, "local AI privacy" is at least eight separate questions. Local model inference answers one of them. The other seven are voice (transcription in, speech out), screen capture, accessibility tree extraction, telemetry, crash reports, auth and identity, and any logged-in browser actions the agent takes on your behalf. A truly local-first agent answers each of these with either "stays on this machine" or "goes to a vendor I configured". If any of the eight answers is "goes to a vendor I cannot see", the local claim is incomplete.

The frame people get wrong

The shorthand "local AI" came out of the local-LLM community, where the question was always: did you run the weights on your hardware, or did you call somebody else's API. That frame is correct for an isolated inference workload. It is incomplete for a desktop agent because an agent is not just a model. It is a model plus a microphone, plus a screen reader, plus a clicker, plus a telemetry pipe, plus an auth layer, plus a crash reporter. Each of those is its own data path. Each one has a default destination, and the defaults are almost never "your Mac."

A useful exercise is to grep an open-source agent for every hard-coded URL the binary can reach. For Fazm the count is small enough to enumerate by hand. There are hardcoded endpoints for five external services in the Swift source, plus a sixth (the LLM endpoint) that is read from an environment variable. That is the entire externally-talking surface of the app. The list below is what each of those endpoints does, why it is there, and what you can do about it.
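If you want to run the grep yourself, a minimal version looks like this. It assumes the repository URL and layout named later in this piece (github.com/m13v/fazm, Swift sources under Desktop/Sources); if the layout has moved, point the grep at the repo root instead.

  # clone the source and list every hardcoded external URL in the Swift files
  git clone https://github.com/m13v/fazm && cd fazm
  grep -RInoE '(https|wss)://[A-Za-z0-9._/-]+' --include='*.swift' Desktop/Sources \
      | grep -v '127\.0\.0\.1' | sort -u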

The eight surfaces, by destination

Each row names the surface, the default data path for that surface, and what Fazm does or lets you switch it to. The point is not that any one column is uniformly better; the point is that every row is a separate decision.

Surface | Default | Fazm
Model inference (the LLM call) | Cloud API on every turn (api.anthropic.com) | Pluggable via ANTHROPIC_BASE_URL, point at 127.0.0.1 for fully local
Voice transcription (mic to text) | wss://api.deepgram.com/v1/listen, audio streams continuously | Default Deepgram, swappable for an on-device path or off
Voice output (text to speech) | api.elevenlabs.io/v1/text-to-speech and api.deepgram.com/v1/speak | Cloud-only by default, can be skipped (text replies are local)
Screen capture (pixels of your display) | Sent to a vision model on every step | Opt-in tool, not the default control loop
Accessibility tree (structured elements) | Often not used, or extracted by a screenshot OCR pass | Primary control surface, read on-device via macOS AX APIs
Telemetry (event analytics) | Whatever vendor SDKs the binary includes | PostHog at us.i.posthog.com, single hardcoded host
Crash reports | Often a dedicated SDK with stack traces | Sentry SDK, opt-in, no user content in the payload
Auth and identity | Whatever the sign-in flow happens to use | Firebase (identitytoolkit, securetoken, oauth2.googleapis.com)

Verified against the open MIT source at github.com/m13v/fazm, re-checked 2026-05-10. Hostnames live in TranscriptionService.swift, ChatToolExecutor.swift, PostHogManager.swift, AuthService.swift, and ACPBridge.swift.

Surface 1, model inference

The one most readers think about. Default behavior in most agents is a cloud LLM call on every conversation turn. Fazm is no different out of the box: the chat subprocess talks to api.anthropic.com unless you override it. The override is a single environment variable, ANTHROPIC_BASE_URL, which the agent subprocess reads at launch. The assignment lives in Desktop/Sources/Chat/ACPBridge.swift around line 468. The Settings UI surfaces it as the "Custom API Endpoint" field.

Three useful values to put in that field, in order of how local you want to be. Your corporate proxy if you have one. AWS Bedrock or Vertex AI in your region if you need a specific data residency. http://127.0.0.1:11434 or wherever your Ollama, LM Studio, or llama-server lives if you want fully local inference. Once you set it, nettop should show no api.anthropic.com row while the agent is working.
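As a concrete sketch of the third option: the app bundle path and the port below are assumptions (11434 is Ollama's default; adjust both to your install), and the binary is launched from the terminal because a GUI app started from the Dock never sees your shell's exports.

  # point the agent's LLM traffic at a local server instead of api.anthropic.com
  export ANTHROPIC_BASE_URL="http://127.0.0.1:11434"
  # launch the binary directly so the chat subprocess inherits the variable;
  # the same value can go in Settings > Custom API Endpoint instead
  /Applications/Fazm.app/Contents/MacOS/Fazm &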

Surface 2, voice transcription

The loud one most local-first writeups skip. A voice-first agent has a microphone hot during conversation, and turning audio into text is the work of a transcription service. Fazm uses Deepgram by default. The endpoints are hardcoded at TranscriptionService.swift:276 (wss://api.deepgram.com/v1/listen for streaming) and TranscriptionService.swift:561 (https://api.deepgram.com/v1/listen for batch). When you talk, the audio goes to Deepgram in real time. That is true even if your LLM endpoint is on 127.0.0.1.

Two ways to fix it. Turn voice off and use text input. Or run an on-device transcriber (whisper.cpp with Metal acceleration, MLX-Whisper, or WhisperKit) and route the agent's audio there instead of Deepgram. Either way, the verification is the same: while you talk, nettop should show no Deepgram row.
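The check in command form, assuming the agent's process name is Fazm (an assumption; swap in whatever pgrep finds on your machine):

  # talk to the agent while this samples its open sockets for a few seconds;
  # with voice off or routed to an on-device transcriber, no row's remote end
  # should be Deepgram (if reverse DNS only gives you bare IPs, eyeball the
  # rows rather than trusting a grep for the hostname)
  sudo nettop -p "$(pgrep -x Fazm | head -1)" -t external -L 5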

Surface 3, screen capture

Screenshot agents are the most data-heavy approach to desktop control. Every step sends a full-resolution image of your display (or a focused window) to a vision model. The pixel volume is enormous compared to anything else the agent does, and the bytes contain whatever was on screen at that moment, including content from apps the user did not consciously include in the request.

Fazm has a screen-capture path (Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift, using CGWindowListCreateImage) but it is not the default control loop. It is exposed as the capture_screenshot tool that the model can call when it explicitly needs visual confirmation, e.g. to verify a UI state that is not in the accessibility tree. Most steps do not call it. The default surface for "what is on the user's screen" is the accessibility tree, which is much smaller and much harder to use to reconstruct private content.

Surface 4, accessibility tree

The structured alternative to pixels. macOS exposes every UI element as an AXUIElement with attributes (role, title, value, position, children). The agent walks that tree to find the right button and fires a click at its computed coordinates. The data the model sees is text and structure, not images. The tree is read on-device by the OS, and what you ship to the LLM is whatever subset is relevant to the current step.

The privacy implication is one most readers underweight. A pixel-based agent has to send the full visual context every time. A tree-based agent sends the minimum slice needed to act. If the model is local, that slice never leaves your machine at all. If the model is cloud-hosted, the slice is dramatically smaller and harder to reconstruct screenshots from. Same task, less data on the wire, fewer privacy questions to argue about.
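To get a feel for what the structured surface actually contains, here is a tiny on-device probe. It is an illustration only, not Fazm's code path; it assumes Finder is running and that your terminal has the Accessibility permission.

  # read structured UI data -- element names, no pixels -- from the same macOS
  # accessibility layer an AX-tree agent works from
  osascript -e 'tell application "System Events" to tell process "Finder" to get name of every menu bar item of menu bar 1'

The output is a short list of labels (Apple, Finder, File, Edit, and so on), which is the point: names and roles are enough to find and press the right control, and there is no bitmap anywhere in the exchange.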

Surfaces 5 through 8, the quiet ones

The remaining surfaces ship by default in almost every consumer Mac app. They are individually small but collectively they decide whether a vendor knows you exist, what version you ran, when you opened the app, and which feature you used.

  • Telemetry. Fazm uses PostHog. The host is hardcoded at PostHogManager.swift:14 as us.i.posthog.com. Events fire on app open, agent run, settings change, etc. The property bag never includes user content (chat text, file paths, accessibility tree contents) by design. The destination is a single host you can grep for, watch in nettop, and block at the firewall if you want to (there is a sketch of both checks just after this list).
  • Crash reports. Sentry SDK. SentrySDK.setUser is called in AuthService.swift:735-738 with a user id and email so a crash report can be attributed. No screen content, no audio, no AX tree contents. If you opt out, no crash reports are sent.
  • Auth and identity. Firebase. The endpoints are accounts.google.com/o/oauth2 (sign-in), oauth2.googleapis.com/token (token exchange), identitytoolkit.googleapis.com (account state), securetoken.googleapis.com (refresh), and firestore.googleapis.com (account-scoped state). These fire when you sign in and when your token refreshes, not continuously. If you sign in with Google or Apple, your identity goes to Firebase. The agent itself does not need an account to run.
  • Browser cookies. When the agent drives Chrome via Playwright, it is using your real browser session and your real cookies. That is a feature (the agent can do work in your logged-in apps) and a privacy surface (the model gets to see whatever a logged-in tab shows). Treat the agent's browser actions as actions you took yourself, because from the third-party site's point of view that is exactly what they are.
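For the telemetry, crash, and auth rows the destinations are single nameable hosts, so checking and blocking are one-liners. A sketch, assuming you still have the source checkout from earlier and are happy for analytics to simply stop working; note that the hosts-file trick only catches clients that resolve through the system resolver, and that opting out in Settings is the supported route.

  # confirm the telemetry host is the one the source names...
  grep -RIn "us.i.posthog.com" Desktop/Sources/    # expect a hit at PostHogManager.swift:14
  # ...then null-route it and flush the DNS cache
  echo "0.0.0.0 us.i.posthog.com" | sudo tee -a /etc/hosts
  sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder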

The eight surfaces, what flows where on a default Fazm install

  1. User input (voice or text). Microphone audio streams to api.deepgram.com if voice is on. Text input stays in process.
  2. Agent loop on your Mac. Reads the accessibility tree on-device, runs tools, decides the next step. Pixel capture only when capture_screenshot is called.
  3. LLM call. By default api.anthropic.com. Override via ANTHROPIC_BASE_URL to a corporate proxy, EU region, or 127.0.0.1.
  4. Action executes locally. Click, type, scroll, file write, browser drive. Touches whatever third parties the task requires.
  5. Telemetry, crash, auth. PostHog (us.i.posthog.com), Sentry, Firebase. Event-based, not continuous. Each is a single hardcoded host.

A two-command sanity check

The shortest possible audit. Run nettop scoped to the agent process while you exercise the feature you care about. Then read its TCC row to see which permissions it actually holds. Anything you cannot explain is a surface worth investigating.

audit.sh
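A minimal sketch of what that script can look like. The process name is an assumption, and the second step opens the Privacy & Security pane rather than reading the TCC database, because macOS has no supported CLI for listing another app's permission grants (if the deep link does not land on the right pane on your macOS version, navigate there by hand).

  #!/bin/sh
  # audit.sh -- the two checks described above, nothing else
  APP="Fazm"   # assumption: change to your agent's process name

  # 1. The permissions the app holds: jump to System Settings > Privacy & Security
  open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"

  # 2. Live sockets for the agent. Exercise the feature you care about, then
  #    Ctrl-C. Every external row should map to a vendor you can name.
  sudo nettop -p "$(pgrep -x "$APP" | head -1)" -t external -L 0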

The shape of a "mostly local" agent

You do not need to make every surface fully local to dramatically reduce the data picture. Three changes get you most of the way for a modest capability hit. Set the LLM endpoint to a local server (or to a private gateway in your region). Turn voice off, or repoint it to an on-device transcriber. Opt out of analytics if the build exposes the toggle. After those three, the surfaces that survive are auth (one-time at sign-in), crash (rare, opt-in), and the actions you actually asked the agent to take. That is a defensible answer to "where does my data go," and it is much more useful than a binary local-or-not flag.

The reason this works is that the surfaces are independent. Fixing inference does not fix voice, fixing voice does not fix telemetry, and so on. The corollary is that any agent that lets you fix only one of them and calls itself "private" is using the word loosely. Walk the eight surfaces, decide each one, then ship.

Want a walk-through of the eight surfaces on your machine?

A short call to look at what your agent actually talks to in nettop, and how to lock down the surfaces you care about.

Common questions about the surfaces beyond inference

Is a local LLM the same thing as a private AI agent?

No. Inference is one of eight surfaces a desktop agent touches. If the model runs on your Mac but the agent ships your microphone audio to Deepgram for transcription, captures full-resolution screenshots and sends them to a vision model, fires PostHog events on every action, and signs into your Google account over OAuth, then 'local' is doing a lot of marketing work. Pinning down a private agent means walking each surface, not just the inference one.

Which surface leaks the most bytes if you do nothing about it?

Voice transcription, by a wide margin. A streaming connection to a cloud transcription provider can move 30-100 KB/s of audio for as long as the microphone is hot. A model call sends a few KB of text per turn. A telemetry event is a hundred bytes. If you only have time to fix one surface, fix the audio path: either turn voice off, or run on-device transcription (whisper.cpp, MLX-Whisper, or WhisperKit) and point the agent at that.

How can I tell if a closed-source AI app on macOS is actually doing what it claims?

Two cheap tools. First, sudo nettop -p $(pgrep AppName) -P -t external -L 0 shows live external sockets per process. Launch the app, exercise the feature, watch the rows. Second, the System Settings > Privacy & Security panel itemizes which TCC permissions the app holds (Accessibility, Screen Recording, Microphone, Input Monitoring, Full Disk Access). The combination is the data envelope. An app with Screen Recording, Microphone, and Accessibility plus an external socket to a vendor it did not name in its privacy policy is the case to walk away from.

Why does Fazm read the accessibility tree instead of taking screenshots?

Screenshots send pixel data on every step. The accessibility tree returns structured element data that lives in the OS already (button labels, text-field values, role hierarchy). For an agent, the tree is enough to act in most apps and it is much smaller and much more privacy-friendly than pixels. The screenshot path still exists in Fazm (it is the capture_screenshot tool, source in Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift), but it is opt-in and used only when the model explicitly needs visual confirmation. The default control loop is AX-tree-driven.

What about telemetry? Is it really a privacy issue if it is anonymous?

It depends on what gets sent. Anonymous event names ('app_opened', 'agent_run_started') are low-stakes and useful. Anonymous events that include input strings, file paths, URL fragments, or selected text get less anonymous fast. The right bar for a local-first AI app is: events should be opt-in, the property bag should never include user content, and the destination should be visible in nettop. PostHog's host for Fazm is hardcoded at PostHogManager.swift:14 (us.i.posthog.com), so you can confirm it in five seconds.

Does local inference still send anything over the network?

If your local server is on 127.0.0.1 (Ollama, LM Studio, llama-server, vLLM with --host 127.0.0.1) and the agent's ANTHROPIC_BASE_URL points there, the inference traffic stays on the loopback interface. nettop -t external will not show those rows. The traffic that survives is the non-inference surfaces: any voice path you did not redirect, any analytics/telemetry the app emits, any auth refresh, and any browser-driven action that contacts a third party because that is what the user asked it to do. The model is local; the rest of the agent's I/O is whatever it was before.
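A quick way to see both halves of that at once, assuming an Ollama-style server on port 11434 and a process named Fazm (both assumptions, adjust to your setup):

  # the local model server should be listening on loopback...
  lsof -nP -iTCP:11434 -sTCP:LISTEN
  # ...and while you chat, the agent's external sockets (loopback is excluded
  # by -t external) should contain no row for your model vendor
  sudo nettop -p "$(pgrep -x Fazm | head -1)" -t external -L 5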

I run a small business and I am not paranoid. Why does this matter to me?

Because the question is not 'is the AI private', it is 'where does the data on my screen end up'. If the screen capture path is sending an image of QuickBooks to a vision model in another country, your customer ledger is there too. If the accessibility tree is being snapshotted into a third-party event store, the names of your clients are in it. The eight-surface walk-through is a cheap way to know exactly what your AI is allowed to see and what it is forbidden from sending. Five minutes of verification is worth more than a thousand-word policy page.

What is the practical end-state for a 'mostly local' Fazm setup?

Three changes give you most of the privacy upside without giving up frontier-model quality. One, in Settings > Advanced > AI Chat, set the Custom API Endpoint to a private gateway you control (corporate proxy, AWS Bedrock in your region, or 127.0.0.1 for a fully local model). Two, turn off voice input or point voice at an on-device transcriber. Three, opt out of analytics in Settings if the option is exposed in your build. After that, run nettop while you use the agent. The external rows that survive are the ones the agent literally needs to do the task you asked.