Field notes, not a model shootout
A macOS AI code agent that does not stop at the editor
Search for this phrase and the top pages will rank Cursor, Claude Code CLI, OpenAI Codex CLI, Cline, Aider, and Windsurf, then argue about which model each ships with. They are all real options. They also share one property nobody seems to call out: the loop only ever sees the project tree, the shell, and (sometimes) a sandboxed browser. Everything else on your Mac is invisible to it. This page is about what changes when the agent process is part of a Mac app instead of a terminal program, and is anchored to one shipping desktop agent (Fazm) whose source you can open right now.
Direct answer (verified May 7, 2026)
A macOS AI code agent is an AI process running natively on a Mac that writes and edits code in a loop. The popular ones live in a terminal (Claude Code CLI, OpenAI Codex CLI, Aider) or inside an editor (Cursor, Cline, Windsurf). A smaller group ships as a real Mac app, in which case the agent can also operate native Mac apps and the user's real browser through macOS accessibility APIs, not just the file tree. Fazm is one example of the second kind, with the agent loop spawned at acp-bridge/src/index.ts:855 and five MCP servers wired by default at acp-bridge/src/index.ts:1823.
Sources cross-checked: github.com/m13v/fazm, github.com/browser-use/macOS-use.
Three honest categories, one of them under-covered
If you put every tool that calls itself a macOS AI code agent in a row, they sort cleanly into three groups. The grouping is about where the loop runs and what it is allowed to touch, not about which model is fashionable this month.
1. Terminal-bound
Claude Code CLI, OpenAI Codex CLI, Aider, Cline (CLI mode)
The agent is a process you launch from a shell, in the directory you want it to work in. Access surface: the cwd, your shell, plus whatever MCP servers you wire up by hand in a config file. Strengths: scriptable, scoped to one project, easy to reason about. Limitations: it cannot see anything you are doing outside that terminal session, and it will not learn to either.
2. Editor-bound
Cursor, Windsurf, Cline (VS Code mode), Continue
The agent runs inside an editor. Access surface: every open buffer, the project tree, sometimes a built-in terminal, sometimes a sandboxed Playwright. Strengths: the diff lands where you already read code, the apply/reject loop is one keystroke. Limitations: when the work bleeds out of the editor (and a non-trivial amount of real coding work does, mostly into browser tabs and chat threads), the agent loses sight of it.
3. Mac-app-bound (the under-covered one)
Fazm, and a small handful of others shipping in 2025-2026
The agent process is spawned by a real Mac app. The app is already holding tools the agent can use without any setup: an extension token for the user's real Chrome session, a native binary that reads the macOS accessibility tree, a Whisper-based voice input pipeline, a SQLite store. The consequence is the agent's access surface includes the rest of your machine, on purpose, with the trade-offs that implies. This is the category most write-ups skip, because it is the smallest and the newest.
The anchor: five MCP servers wired into every code session
The thing that makes Fazm a Mac-app-bound agent and not just another CLI wrapped in Electron is one constant in one file. In acp-bridge/src/index.ts, line 1823:
const BUILTIN_MCP_NAMES = new Set([
"fazm_tools",
"playwright",
"macos-use",
"whatsapp",
"google-workspace",
]);
Every coding session the agent starts (the spawn happens at line 855, the per-session wiring in the buildMcpServers function at line 1512) gets all five. There is no opt-in step in the UI. The same agent that just wrote your Stripe webhook handler can, on the next turn, log into your real Stripe dashboard in the browser you already have open, copy the webhook secret out of it, and click the right button. Whether that is the right default for your threat model is a real question (see the safety FAQ at the bottom). The point here is that the choice was made at agent-process bring-up time, not at prompt time.
fazm_tools
bash, file read/write/edit, screen capture, SQLite, plus the framework's internal helpers, all exposed through a Unix socket back to the Swift app.
playwright
drives the user's real Chrome session through the Fazm extension token (not a headless instance), with snapshots written to /tmp/playwright-mcp.
macos-use
native Swift binary that reads and writes the macOS accessibility tree, so the agent can click in any Mac app the way a person does.
whatsapp
controls the WhatsApp Catalyst app via accessibility APIs. Search, open, send, read, scroll, and quit.
google-workspace
bundled Python server for Gmail, Calendar, Docs, Sheets, Drive. Credentials kept under ~/.google_workspace_mcp/credentials/.
Each item above is a real MCP subprocess. You can list them at runtime by tailing the bridge log: tail -f /tmp/fazm.log and watching for buildMcpServers on session start.
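To make the per-session wiring concrete, here is a sketch of what a buildMcpServers-style function looks like. The five names and the playwright --extension flag come from the source; the { command, args } subprocess shape is the standard way stdio MCP servers are declared; every concrete command, path, and argument below is an illustrative placeholder, not Fazm's actual code.

```typescript
// Illustrative MCP subprocess specs keyed by the five built-in names.
// All commands/paths here are placeholders standing in for the real ones.
interface McpServerSpec {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

const BUILTIN_MCP_NAMES = new Set([
  "fazm_tools",
  "playwright",
  "macos-use",
  "whatsapp",
  "google-workspace",
]);

function buildMcpServersSketch(): Record<string, McpServerSpec> {
  const specs: Record<string, McpServerSpec> = {
    fazm_tools: { command: "fazm-tools-server", args: ["--socket", "/tmp/fazm.sock"] },
    playwright: { command: "npx", args: ["@playwright/mcp", "--extension"] },
    "macos-use": { command: "/Applications/Fazm.app/Contents/MacOS/mcp-server-macos-use", args: [] },
    whatsapp: { command: "whatsapp-mcp", args: [] },
    "google-workspace": { command: "python3", args: ["-m", "google_workspace_mcp"] },
  };
  // Every session gets all five; there is no per-server opt-in filter.
  return Object.fromEntries(
    Object.entries(specs).filter(([name]) => BUILTIN_MCP_NAMES.has(name)),
  );
}
```

The point of the sketch is the shape, not the commands: five subprocess specs, built unconditionally, handed to every session.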
What this looks like in practice
One concrete prompt, one session, one boring kind of work. What follows paraphrases a real /tmp/fazm-dev.log capture of the bridge log as the agent boots for a coding turn; the spawn order and server names are verbatim.
The thing to notice is what the agent already has by the time the first model token comes back. It can run shell commands and edit files (fazm_tools). It can drive the browser tab where the failing deploy is open (playwright with --extension, so it attaches to your real Chrome rather than a fresh fingerprint-naive headless one). It can poke around in any other Mac app via macos-use, which is a tiny native Swift binary in the app bundle at Contents/MacOS/mcp-server-macos-use. And it can read your inbox or your Calendar with Workspace credentials it already has on disk.
For a CLI-bound agent, the equivalent setup would mean: install five MCP servers, configure their auth, tell the agent about them in ~/.claude/mcp_servers.json (or equivalent), and remember to do that in every project. Most people never do it, which is why the editor- and CLI-bound agents end up working only on the parts of a problem that fit inside the project tree.
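For comparison, manual wiring for a CLI agent looks roughly like this. The mcpServers object is the common MCP config shape; the exact filename varies by CLI, and the server commands and env keys here are placeholders, not a tested config:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "google-workspace": {
      "command": "python3",
      "args": ["-m", "google_workspace_mcp"],
      "env": { "GOOGLE_OAUTH_CLIENT_ID": "…" }
    }
  }
}
```

Multiply this by five servers, each with its own auth dance, per project, and the adoption gap explains itself.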
The kinds of work this changes
Most of what an agent does on a coding day is still write code, and on that axis the Mac-app and CLI versions of the same model do roughly the same job. The interesting cases are the boundary ones, where the work straddles the editor and something else.
- Reading why a CI run failed. The error is not in the diff, it is in the build log on a tab you already have open. A CLI agent gets the URL pasted in. A Mac-app agent reads the tab.
- Setting an env var on Vercel before the deploy. A CLI agent dictates the steps and waits. A Mac-app agent opens the project, navigates Settings, Environment Variables, and fills the field, while you watch.
- Confirming Stripe webhook delivery during a fix. A CLI agent asks you to grep the response. A Mac-app agent reads the dashboard, finds the most recent event, and pastes the status into the chat.
- Replying to the customer who reported the bug. A CLI agent drafts the email and stops. A Mac-app agent can hand the draft to Gmail (via the google-workspace server) and queue it as a draft for you to send.
- Reading an Xcode build error from inside a SwiftUI project. A CLI agent gets the text. A Mac-app agent can read the accessibility tree of Xcode itself (via macos-use) and click into the failing line.
None of these are exotic. They are the ten-minute interruptions that would otherwise pop the developer out of the loop and into the browser. The Mac-app-bound agent can keep the loop closed, with the obvious caveat that you are also handing it the keys.
Models you can point it at
A common worry with any code agent is that you are locked into one model vendor. Fazm runs Claude by default (DEFAULT_MODEL = claude-sonnet-4-6 at acp-bridge/src/index.ts:1802) and switches to a separate Codex provider when the user picks any model whose id matches /^(gpt-|codex-|o[0-9]-?)/i (CODEX_MODEL_PATTERN at codex-query.ts:61). For everything else, the Custom API Endpoint setting exports ANTHROPIC_BASE_URL into the spawned subprocess, so any endpoint that speaks the Anthropic Messages API works.
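The routing decision is small enough to sketch. The regex and the ANTHROPIC_BASE_URL passthrough are from the source; the helper names pickProvider and buildSpawnEnv are mine, not Fazm's:

```typescript
// Model ids matching this pattern go to the Codex provider; everything
// else runs through the Claude ACP subprocess. Pattern as cited from
// codex-query.ts:61.
const CODEX_MODEL_PATTERN = /^(gpt-|codex-|o[0-9]-?)/i;

function pickProvider(modelId: string): "codex" | "claude" {
  return CODEX_MODEL_PATTERN.test(modelId) ? "codex" : "claude";
}

// Custom API Endpoint passthrough: the setting, when present, is exported
// as ANTHROPIC_BASE_URL into the spawned subprocess's environment.
function buildSpawnEnv(
  base: Record<string, string>,
  customEndpoint?: string,
): Record<string, string> {
  const env = { ...base };
  if (customEndpoint) env.ANTHROPIC_BASE_URL = customEndpoint;
  return env;
}
```

So "bring your own model" reduces to either matching the Codex pattern or pointing ANTHROPIC_BASE_URL at anything Anthropic-shaped.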
Pluggable model targets the agent loop has been pointed at, in production:
- Claude Sonnet 4.6: default coding model via @agentclientprotocol/claude-agent-acp
- Claude Opus 4.7: selectable in the model picker
- OpenAI Codex / GPT-5: spawned via @zed-industries/codex-acp when picked
- Vertex AI: set CLAUDE_CODE_USE_VERTEX in env
- Corporate proxy: Anthropic-compatible gateway via Custom API Endpoint
- GitHub Copilot: Anthropic-compatible bridge from the GH side
- LiteLLM: as a translator in front of Ollama or LM Studio
- Local Ollama / LM Studio: via LiteLLM or any Anthropic-format shim
The setting copy in the app is verbatim: “Route API calls through a custom endpoint (e.g. local LLM bridge, corporate proxy, or GitHub Copilot bridge). Leave empty to use the default Anthropic API.” (Desktop/Sources/MainWindow/Pages/SettingsPage.swift, around line 946.)
Honest trade-offs
The Mac-app category does not strictly dominate the other two. The things it gives up are real, and pretending otherwise is the kind of thing that gets a tool dropped after a week.
No CI usage. A Mac-app-bound agent is a desktop app. It is not what you reach for to run an autonomous code-fix loop on every failing test in a GitHub Action. CLI agents (especially Claude Code in headless mode and Codex CLI) are still the right tool for that.
Single-machine. The session and the access surface live on one Mac. There is no shared sandbox a teammate can join the way they can join a hosted runner.
Bigger blast radius. An agent that can click anywhere on your machine can also click the wrong thing. The mitigations described in the safety FAQ below are the actual surface controls, not platitudes about human-in-the-loop.
Mac only. Linux and Windows users are not served by this category at all today. macOS-use, the accessibility binary, has no equivalent on those platforms; the Linux-side AT-SPI and the Windows UIAutomation stories are different enough that pretending one product covers them is a lie.
Picking one
The honest matrix is short and works for most people:
- If the work is fully inside one repo and you want a scriptable loop, pick a CLI agent. Claude Code CLI and Codex CLI are the two mainstream options in May 2026.
- If the work is fully inside one editor and you want apply-or-reject diffs, pick an editor agent. Cursor and Windsurf are the two mainstream options.
- If the work straddles the editor and the rest of your machine (browser tabs, native apps, inbox, calendar), and you are on a Mac, the Mac-app category is the only one in the running. Fazm is the open source option I happen to maintain; check Apple Intelligence + Shortcuts if you want a vendor-built one.
Want to see this on your own Mac, with your own repo?
A 20-minute call where we run Fazm against the work you actually do. Bring a real failing PR or a real boring task; we will see whether the agent reaches past the editor on it or not.
Frequently asked questions
What is a macOS AI code agent, in one sentence a developer would actually say back?
A process that runs natively on macOS, holds a model in a loop, and is allowed to read and write your files, run your shell commands, and (depending on the tool) operate the rest of your machine. Cursor, Claude Code CLI, OpenAI Codex CLI, Cline, and Aider are all in this category. They differ on where the loop runs (editor process, terminal subprocess, hosted runner), what its access surface is (just the project tree, or the whole laptop), and which model it ships with by default.
How is a macOS AI code agent different from a coding assistant or copilot?
A copilot writes inline suggestions and waits for you to accept them. An agent runs an autonomous loop: it picks the next tool call, runs it, reads the result, decides what to do next, and only stops when the goal is met or it hits a budget. The 'agent' word in macOS AI code agent is specifically claiming that loop, not just autocomplete. Whether the loop is worth handing the keys to is a separate question, and the answer mostly depends on the access surface, not the model.
Why does it matter whether the agent runs on my Mac vs in a hosted sandbox?
Two reasons. First, latency: every tool call that crosses a network adds round-trip cost on top of the model token cost, and code work fires a lot of small tool calls. Second, access: a hosted sandbox, by construction, cannot read your real Chrome session, your local Mac apps, your menubar, your dotfiles, or anything outside the cloned repo. A locally running agent can. Whether you want it to is a different question, but the option only exists for the local one.
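The latency half of that answer is easy to quantify. Both numbers below are assumptions for illustration, not measurements of any particular sandbox:

```typescript
// Back-of-envelope: pure transport overhead added per turn by putting a
// network hop between the agent loop and its tools. Inputs are assumed.
function toolCallOverheadMs(toolCalls: number, roundTripMs: number): number {
  return toolCalls * roundTripMs;
}

// A busy coding turn can fire ~40 small tool calls (reads, greps, edits).
// At an assumed 80 ms round trip to a hosted sandbox, that is 3200 ms of
// transport per turn that a local agent simply does not pay.
const hostedOverheadMs = toolCallOverheadMs(40, 80);
```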
What can a macOS AI code agent reach that a CLI or editor coding agent cannot?
A CLI or editor coding agent's access surface is the project tree, the shell, and (sometimes) a sandboxed browser. A Mac-native shipped-app agent that wires accessibility APIs can additionally read and operate any Mac app the way a person does: Xcode build errors, Linear desktop tickets, Slack threads, the Vercel deploy status in your real browser, the failing GitHub Action you have open in another tab. In Fazm's case the agent process spawns five MCP servers by default, including playwright (your real Chrome session) and macos-use (the native accessibility binary). Re-check the line: acp-bridge/src/index.ts:1823, the constant is BUILTIN_MCP_NAMES = new Set(["fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"]).
Which model does it run? Can I bring my own?
Fazm spawns @agentclientprotocol/claude-agent-acp as a Node subprocess (acp-bridge/src/index.ts:855), so the default coding model is Claude Sonnet 4.6 (the DEFAULT_MODEL constant at line 1802). It also spawns a separate Codex provider (codex-provider.ts) when the user picks any model whose id matches CODEX_MODEL_PATTERN = /^(gpt-|codex-|o[0-9]-?)/i (codex-query.ts:61). For routing through a different gateway, set Custom API Endpoint in Settings; the bridge passes it through as ANTHROPIC_BASE_URL into the spawned process. Anything that speaks the Anthropic Messages API works: corporate proxies, GitHub Copilot's Anthropic-compatible bridge, a small translator in front of Ollama or LM Studio.
Is this an open source project I can read, or is it a marketing claim?
Open. The bridge is at github.com/m13v/fazm under acp-bridge/src/index.ts, MIT-licensed. The line numbers cited above (855 for the spawn, 1512 for buildMcpServers, 1802 for DEFAULT_MODEL, 1823 for BUILTIN_MCP_NAMES) all resolve in that file. The macOS-use accessibility binary lives in github.com/browser-use/macOS-use and is bundled into the app bundle under Contents/MacOS/mcp-server-macos-use. No claim above is true unless those files say so.
How does this compare with Claude Code CLI specifically?
Claude Code CLI is a terminal program. Its access surface is your shell, your file system, and a fixed set of MCP servers you configure in ~/.claude/mcp_servers.json. Fazm wraps the same Claude Agent SDK loop (the patched ACP entry at acp-bridge/src/patched-acp-entry.mjs imports ClaudeAcpAgent from @agentclientprotocol/claude-agent-acp) and ships it inside a Mac app, with five MCP servers pre-wired and an extra path through Codex. The interesting difference is not the model; the model is the same. It is that the agent process is now spawned by a desktop app whose other tools (browser overlay, accessibility daemon, Whisper-based voice input) ship in the same app bundle, so the agent can use them without you wiring them up.
Does the voice-first part actually help for coding work?
Sometimes. Voice is right when the prompt is long-form ("add a Stripe webhook handler that retries on 5xx, has a 30s idempotency window, and writes to the existing fazm_emails table") and your hands are on the keyboard for editing the response. It is wrong for short tight loops where typing is faster (a one-line refactor). The honest answer is that voice-first is a meaningful upgrade for agent prompts, which tend to be a paragraph long, and a wash or worse for individual code edits. Plenty of users keep the chat window open and only press the voice button when they have something paragraph-shaped to say.
What about safety? An agent that can click anywhere on my Mac is a bigger blast radius than one that can only edit files.
Correct, and worth thinking through before you run one. Three controls help. First, every Fazm tool call is logged to /tmp/fazm.log (or fazm-dev.log in dev) before it executes, so a tail -f at startup is enough to see everything. Second, browser pages get a visible 'Browser controlled by Fazm' overlay injected at runtime; the injection, with its deliberately loud failure mode, is in acp-bridge/src/index.ts around lines 1574 to 1597. Third, you can run with FAZM_OBSERVER=true and the agent gets only the fazm_tools server: no browser, no accessibility, no WhatsApp, no Workspace (line 1543). Use observer mode when you want the loop without the keys.
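Observer mode is easiest to picture as a filter over the default set. The env var name and the fazm_tools-only behavior are as described above; the function itself is a sketch, not Fazm's code:

```typescript
const BUILTIN_MCP_NAMES = new Set([
  "fazm_tools",
  "playwright",
  "macos-use",
  "whatsapp",
  "google-workspace",
]);

// In observer mode the agent keeps the loop (shell, files, screen capture,
// SQLite via fazm_tools) but loses the browser, accessibility, WhatsApp,
// and Workspace servers. Sketch of the gating described around line 1543.
function serversForSession(env: Record<string, string | undefined>): Set<string> {
  if (env.FAZM_OBSERVER === "true") return new Set(["fazm_tools"]);
  return new Set(BUILTIN_MCP_NAMES);
}
```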
More from this thread
Adjacent reading
Local first AI coding agent, when local means the agent and not just the model weights
Where the loop runs is the harder half of the local-first conversation. Same agent process, framed as the layer that survives switching the model.
Open source AI agent framework that ships as a Mac app
The same agent process, framed as a framework you can read instead of a SaaS endpoint you can only invoke.
macOS accessibility automation, what works and what does not
Field notes on the AX layer that lets a code agent reach Xcode, Linear, Slack, and the failing CI tab in your real Chrome.