Guide · macOS 14+ · Open source

An AI agent for macOS is four different things

Every roundup on this topic lists a dozen apps and ranks them. None of them tells you what actually matters when you pick one: the categories are not interchangeable, and the model inside the app is rarely what separates them.

Matthew Diakonov
11 min read

Direct answer · verified May 19, 2026

An “AI agent for macOS” is one of four categories: terminal coding agents (Claude Code, Codex CLI), screenshot computer-use agents (OpenAI Operator, Simular), AI chat clients (BoltAI, Raycast AI), and native-UI desktop agents (fazm, Dottie, EnConvo). They look similar in a feature table and behave nothing alike in daily use.

fazm sits in the last category and is unusual inside it: it does not ship its own agent loop. It wraps the real Claude Code and Codex loops over the Agent Client Protocol, pinned at claude-agent-acp 0.29.2 and codex-acp 0.12.0 in its public source. You run a loop you already trust, on a native Mac surface.

The four things people call an AI agent for macOS

If you ask five people for the best AI agent for their Mac, you will get answers from four different categories, and they will argue past each other. Here is the split, plainly.

Terminal coding agents

Claude Code, Codex CLI, Gemini CLI. The agent loop most engineers already trust: real tool use, MCP servers, multi-step edits. It lives inside a terminal window, and it forgets the session the moment you close it.

Screenshot computer-use agents

OpenAI's Operator, Anthropic's computer use, Simular. They read the screen as pixels and click by coordinate. Broad reach, but slow, brittle on layout changes, and often running in a remote VM rather than on your actual Mac.

AI chat clients

BoltAI, Raycast AI. Fast native Mac apps that put many models behind one window. They answer questions and draft text. They do not take actions on your machine, so they are assistants, not agents.

Native-UI desktop agents

fazm, Dottie, EnConvo. Native Mac apps that take real actions across your apps and browser. fazm is the one in this group that does not ship its own agent loop.

Where each category shines, and where it breaks

Terminal coding agents

Claude Code and Codex CLI are, for most engineers, the best agent loops available right now. They do real multi-step work: read a repo, plan, edit files, run tests, call MCP servers. If your work is code and you live in a terminal, the raw CLI is genuinely good and you may not need anything else.

The friction shows up at the edges. Close the terminal tab and the session is gone. Branching a conversation to try a second approach means copying a session id and resuming it by hand. On long runs the context window auto-compacts, and the summary it writes is not the summary you would have written. None of these are model problems. They are surface problems, and they are exactly the problems a native UI can fix without touching the loop.

Screenshot computer-use agents

OpenAI’s Operator, Anthropic’s computer use, and tools like Simular take a screenshot, send it to a vision model, and act on the coordinates the model returns. The appeal is universality: if a human can see it, the agent can in principle click it. The cost is real. A vision round trip per step is slow, the model misreads small UI elements, and a layout that shifts a few pixels breaks the plan. Several of these also run in a remote virtual machine, so the agent is not really operating your Mac; it is operating a separate machine.

Use this category when you genuinely need to drive software that exposes no accessibility information and no API. Outside that case, the screenshot tax is paid on every single action.
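The brittleness is easy to see in a toy model. The sketch below is not any vendor's implementation: `Element`, `click_at`, and `find_by_label` are hypothetical names, and the "screen" is just a list of rectangles. It contrasts a click planned from pixel coordinates with a lookup by role and label, which is the kind of structured information an accessibility tree exposes.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    """A hypothetical on-screen control with a role, a label, and a rectangle."""
    role: str
    label: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def click_at(screen: List[Element], px: int, py: int) -> Optional[Element]:
    """Coordinate click: whatever sits under the pixel gets hit."""
    for element in screen:
        if element.contains(px, py):
            return element
    return None


def find_by_label(screen: List[Element], role: str, label: str) -> Optional[Element]:
    """Structured lookup: match on role and label, ignore position."""
    for element in screen:
        if element.role == role and element.label == label:
            return element
    return None


def shift_vertically(screen: List[Element], dy: int) -> List[Element]:
    """Simulate a layout change, e.g. a toolbar row collapsing."""
    return [replace(element, y=element.y + dy) for element in screen]


before = [
    Element("button", "Send", 100, 200, 80, 30),
    Element("button", "Delete", 100, 240, 80, 30),
]

# The plan was made from an old screenshot: click the middle of "Send".
planned_click: Tuple[int, int] = (140, 215)

# The layout then moves up by 40 pixels before the click lands.
after = shift_vertically(before, -40)

hit_before = click_at(before, *planned_click)
hit_after = click_at(after, *planned_click)
found_after = find_by_label(after, "button", "Send")
```

In this toy, the same coordinates hit "Send" before the shift and "Delete" after it, while the label lookup still finds "Send" wherever it moved. The 40-pixel shift is exaggerated to make the rectangles easy to read.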

AI chat clients

BoltAI and Raycast AI are excellent native Mac apps, and they are often recommended in “AI agent” threads where they do not belong. They put ChatGPT, Claude, Gemini, and local models behind one fast window. That is useful, but it is not an agent: these apps generate text and code for you to copy, and they do not click, type, or run anything on your machine. If what you want is fast multi-model chat, pick one of these and skip the agent question entirely.

Native-UI desktop agents

This is the category that actually takes actions on your Mac inside a native app: fazm, Dottie, EnConvo, and a few others. Most of them build their own agent loop. fazm took the opposite bet, and that bet is the rest of this page.

The distinction every roundup skips: the loop versus the surface

An AI agent has two separable parts. The loop is the thing that reasons, decides which tool to call, reads the result, and decides again. The surface is where you see it and talk to it: a terminal, a remote VM, a native window. Roundups rank apps as if the loop and the surface were one thing. They are not.

Once you separate them, the obvious move is to keep the best loop and swap only the surface. That is what the Agent Client Protocol makes possible. The same Claude Code loop can sit behind a terminal, behind Zed, or behind a native Mac app, with no change to the loop itself.

[Diagram: the agent loop (Claude Code / Codex) at the center, surrounded by three surfaces: Terminal, Screenshot VM, Native Mac UI]

One loop at the center. Three surfaces it can run behind. fazm picks the native Mac UI and leaves the loop alone.
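The separation can be sketched in a few lines. This is a minimal illustration, not Claude Code, Codex, or fazm: `run_loop`, `scripted_policy`, `TerminalSurface`, and `WindowSurface` are hypothetical names, and a scripted function stands in for the model. The point is only that the loop (decide, call a tool, read the result, decide again) never needs to know which surface renders it.

```python
from typing import Callable, Dict, List, Optional, Protocol, Tuple

Step = Tuple[str, str]  # (tool name, argument)


class Surface(Protocol):
    """Anything that can display what the loop is doing."""
    def show(self, text: str) -> None: ...


class TerminalSurface:
    def __init__(self) -> None:
        self.lines: List[str] = []

    def show(self, text: str) -> None:
        self.lines.append(f"$ {text}")


class WindowSurface:
    def __init__(self) -> None:
        self.bubbles: List[Dict[str, str]] = []

    def show(self, text: str) -> None:
        self.bubbles.append({"role": "agent", "text": text})


def scripted_policy(observations: List[str]) -> Optional[Step]:
    """Stand-in for the model: choose the next tool call, or None when done."""
    if len(observations) == 0:
        return ("read_file", "notes.txt")
    if len(observations) == 1:
        return ("count_words", observations[0])
    return None


TOOLS: Dict[str, Callable[[str], str]] = {
    # Toy tools; a real loop would touch the filesystem, MCP servers, and so on.
    "read_file": lambda path: "loop and surface are separable",
    "count_words": lambda text: str(len(text.split())),
}


def run_loop(policy: Callable[[List[str]], Optional[Step]],
             tools: Dict[str, Callable[[str], str]],
             surface: Surface,
             max_steps: int = 10) -> List[str]:
    observations: List[str] = []
    for _ in range(max_steps):
        step = policy(observations)           # reason and decide
        if step is None:
            break
        name, argument = step
        result = tools[name](argument)        # call the tool
        observations.append(result)           # read the result
        surface.show(f"{name} -> {result}")   # the surface only renders
    return observations


terminal = TerminalSurface()
window = WindowSurface()
from_terminal = run_loop(scripted_policy, TOOLS, terminal)
from_window = run_loop(scripted_policy, TOOLS, window)
```

Both runs produce the same observations. Only the rendering differs, which is the property that lets a surface be swapped without touching the loop.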

Where fazm sits: a native surface, not a new loop

fazm is a native Swift and SwiftUI macOS app. Underneath, it does not run a homegrown agent. It runs an ACP bridge that speaks to two adapters: the Claude Code adapter and the Codex adapter. You can check the exact versions yourself. They are dependencies in the public repo.

// acp-bridge/package.json  (excerpt)
"dependencies": {
  "@playwright/mcp": "0.0.73",
  "@agentclientprotocol/claude-agent-acp": "0.29.2",
  "@zed-industries/codex-acp": "0.12.0",
  "ws": "^8.20.0",
  "zod": "^4.0.0"
}

That choice has consequences you can feel. Because the loop is the real Claude Code, your usage hits your own Claude Pro or Max plan, and any improvement upstream lands in fazm without a rewrite. Codex is a swappable backend you select per chat, so a single window can run whichever loop fits the task. And because the loop is untouched, fazm is free to fix terminal frustrations purely at the surface layer.

  • Persistent sessions: every open chat window is written to a UserDefaults registry and restored on the next launch, so a Mac restart does not cost you a conversation.
  • Forking: a chat has a fork button that issues a real session/fork call, opening a new window with the full prior context while the original is left alive on disk and reachable from Conversation History.
  • Reach beyond the terminal: the loop reads the macOS accessibility tree and screen context rather than screenshots, so the same agent reaches past the terminal into your browser, native Mac apps, and Google Workspace.
  • Voice input: hold a hotkey and talk instead of typing.
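To make the fork and restore semantics concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not fazm's Swift code: `SessionStore` and its methods are hypothetical, a JSON string stands in for the UserDefaults registry, and the placement of `fromSessionKey` and `toSessionKey` as top-level fields of the `forkSession` message is assumed from the field names this page mentions.

```python
import copy
import json
from typing import Dict, List


class SessionStore:
    """Hypothetical model: sessions keyed by name, plus a list of open windows."""

    def __init__(self) -> None:
        self.sessions: Dict[str, List[str]] = {}
        self.open_windows: List[str] = []

    def open(self, key: str) -> None:
        self.sessions.setdefault(key, [])
        if key not in self.open_windows:
            self.open_windows.append(key)

    def append(self, key: str, message: str) -> None:
        self.sessions[key].append(message)

    def fork(self, from_key: str, to_key: str) -> Dict[str, str]:
        """Copy the full history into a new session; the source stays untouched."""
        if to_key in self.sessions:
            raise ValueError(f"session {to_key!r} already exists")
        self.sessions[to_key] = copy.deepcopy(self.sessions[from_key])
        self.open(to_key)
        # Assumed message shape, built from the field names the page cites.
        return {
            "type": "forkSession",
            "fromSessionKey": from_key,
            "toSessionKey": to_key,
        }

    def save(self) -> str:
        """Serialize everything a relaunch would need (stand-in for the registry)."""
        return json.dumps({
            "sessions": self.sessions,
            "open_windows": self.open_windows,
        })

    @classmethod
    def restore(cls, blob: str) -> "SessionStore":
        data = json.loads(blob)
        store = cls()
        store.sessions = data["sessions"]
        store.open_windows = data["open_windows"]
        return store


store = SessionStore()
store.open("main")
store.append("main", "plan the refactor")

fork_message = store.fork("main", "try-b")
store.append("try-b", "try approach B")

# Simulate quitting and relaunching the app.
restored = SessionStore.restore(store.save())
```

After the simulated relaunch, both windows come back, the fork carries the full prior context plus its own new message, and the original session is unchanged.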

Claims you can check before you trust any of this

Anything that can click around inside your logged-in apps should be inspectable. Every item below is greppable in the public fazm tree. If one of them is wrong, the page is wrong.

Verifiable in github.com/mediar-ai/fazm

  • acp-bridge/package.json pins @agentclientprotocol/claude-agent-acp at version 0.29.2
  • The same file pins @zed-industries/codex-acp at 0.12.0, a swappable backend you pick per chat
  • patched-acp-entry.mjs wraps ClaudeAcpAgent.createSession, the method ACP calls for newSession, resumeSession, loadSession and forkSession
  • Forking sends a real session/fork JSON-RPC call: ACPBridge.swift posts {"type":"forkSession"} with a fromSessionKey and a toSessionKey
  • ChatProvider.forkSession keeps the source session alive on disk, reachable from Conversation History, so neither branch is destroyed
  • Open chat windows persist to a UserDefaults registry via DetachedChatWindow.saveWindowRegistry and are restored on next launch
  • The whole stack is MIT licensed and buildable from source at github.com/mediar-ai/fazm
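The two version pins are the easiest claims to check mechanically. The sketch below is one way to do it, assuming the excerpt shown earlier on this page reflects the file's `dependencies` block. `check_pins` and `EXPECTED_PINS` are hypothetical helper names, not part of the fazm repo.

```python
import json
import re
from typing import Dict, List

# Versions as stated on this page.
EXPECTED_PINS: Dict[str, str] = {
    "@agentclientprotocol/claude-agent-acp": "0.29.2",
    "@zed-industries/codex-acp": "0.12.0",
}

# An exact pin is a bare x.y.z with no range operator such as ^ or ~.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def check_pins(package_json_text: str,
               expected: Dict[str, str] = EXPECTED_PINS) -> List[str]:
    """Return a list of human-readable problems; an empty list means all pins match."""
    dependencies = json.loads(package_json_text).get("dependencies", {})
    problems: List[str] = []
    for name, version in expected.items():
        found = dependencies.get(name)
        if found is None:
            problems.append(f"{name}: missing")
        elif not EXACT_VERSION.match(found):
            problems.append(f"{name}: {found} is a range, not an exact pin")
        elif found != version:
            problems.append(f"{name}: expected {version}, found {found}")
    return problems


# The dependencies excerpt quoted earlier on this page, as valid JSON.
EXCERPT = """
{
  "dependencies": {
    "@playwright/mcp": "0.0.73",
    "@agentclientprotocol/claude-agent-acp": "0.29.2",
    "@zed-industries/codex-acp": "0.12.0",
    "ws": "^8.20.0",
    "zod": "^4.0.0"
  }
}
"""

problems = check_pins(EXCERPT)
```

Against a local checkout you would pass the contents of acp-bridge/package.json instead of `EXCERPT`. A non-empty result means the page and the repo have drifted apart.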

How to actually pick one

The honest version of a recommendation admits where it loses. Here is when each category wins.

  • Pick raw Claude Code or Codex CLI

    if your work is entirely code, you are comfortable in a terminal, and you do not mind losing sessions on restart or forking by hand. The loop is excellent and the terminal is not in your way.

  • Pick a screenshot computer-use agent

    if you must automate visual software that exposes no accessibility tree and no API. Accept the per-step latency and the brittleness, because nothing else can reach that software.

  • Pick a chat client like BoltAI or Raycast AI

    if you only want fast multi-model chat and you will copy the output yourself. You do not need an agent, and pretending you do just adds permissions you will not use.

  • Pick fazm

    if you want the trusted Claude Code or Codex loop, but with sessions that survive a restart, one-click forking, voice input, and reach across your browser, native Mac apps, and Google Workspace, in an open-source app you can read before you grant it permissions.

Trying to choose an AI agent for your Mac?

Book 20 minutes and we will walk through which of the four categories fits your actual workflow, including the cases where fazm is not the right answer.

Frequently asked questions


What is an AI agent for macOS?

It is software that does not just answer questions; it takes actions on your Mac: running code, editing files, clicking through apps, filling forms in your browser. In practice the term covers four different categories. Terminal coding agents (Claude Code, Codex CLI) run a trusted agent loop inside a terminal. Screenshot computer-use agents (Operator, Simular) read the screen as pixels and click by coordinate. AI chat clients (BoltAI, Raycast AI) are fast native apps that answer questions but do not act. Native-UI desktop agents (fazm, Dottie, EnConvo) are native Mac apps that take real actions across your apps. Picking one starts with deciding which category you actually need.

Is Claude Code an AI agent for macOS?

Yes, and it is one of the better agent loops you can run on a Mac, but it runs in a terminal. That is fine for engineers who live in a terminal, and limiting for everyone else. It also has three rough edges in long use: the session is gone when you close the terminal, branching a conversation means a manual session-id dance, and on long runs the context auto-compacts. fazm wraps that exact Claude Code loop (via the claude-agent-acp adapter, pinned at 0.29.2 in acp-bridge/package.json) in a native Mac window so the loop you trust gets persistent sessions, one-click forking, and reach beyond the terminal.

What is the difference between a screenshot agent and an accessibility-based agent?

A screenshot agent captures the screen as an image, sends it to a vision model, and the model returns pixel coordinates to click. It works on anything visible but is slow and breaks when a layout shifts by a few pixels. An accessibility-based agent reads the macOS accessibility tree instead: a structured list of every button, field, and menu item with its real role and label. That is faster and far more reliable for native Mac apps. fazm uses accessibility APIs and structured screen context rather than screenshot guessing, which is why the same agent reaches into native apps and Google Workspace, not only the browser.

What does ACP mean here, and why does it matter?

ACP is the Agent Client Protocol, a JSON-RPC protocol (from Zed Industries) that lets any user interface that speaks it talk to any agent backend that implements it. It matters because it decouples the agent loop from the surface it runs on. fazm does not write its own agent loop. It speaks ACP to two adapters: claude-agent-acp for Claude Code and codex-acp for Codex. So the loop doing the reasoning and tool calls is the real upstream agent, and fazm is the native Mac surface around it. You can verify both adapter versions in acp-bridge/package.json.
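For readers who have not seen JSON-RPC, the envelope is small. The sketch below builds a JSON-RPC 2.0 request and matches a response to it by id. The `jsonrpc`, `id`, `method`, `params`, `result`, and `error` fields come from the JSON-RPC 2.0 specification. The method name `session/fork` is the one this page mentions. The `sessionId` parameter and the reply contents are assumptions for illustration, not the documented ACP schema, and `make_request` and `match_response` are hypothetical helpers.

```python
import itertools
import json
from typing import Any, Dict

_request_ids = itertools.count(1)


def make_request(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object with a fresh id."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


def match_response(request: Dict[str, Any], raw_response: str) -> Any:
    """Parse a response, check it belongs to the request, and return its result."""
    response = json.loads(raw_response)
    if response.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if response.get("id") != request["id"]:
        raise ValueError("response id does not match request id")
    if "error" in response:
        raise RuntimeError(response["error"]["message"])
    return response["result"]


# Params shape is assumed for illustration only.
fork_request = make_request("session/fork", {"sessionId": "s-1"})
wire_bytes = json.dumps(fork_request)

# A made-up successful reply from an agent backend.
fake_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": fork_request["id"],
    "result": {"sessionId": "s-2"},
})
fork_result = match_response(fork_request, fake_reply)
```

Because every message is a plain JSON object with a method name and an id, the surface and the loop can live in different processes and different languages.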

Can an AI agent for macOS run locally and stay private?

Partly, and it depends on what you mean. The model inference is the only piece that has to leave your machine, and with fazm that call goes to your own Claude Pro or Max plan, or a custom Anthropic-compatible endpoint you choose. The agent loop, the accessibility reads, the screen context, and your chat history all run and stay on your Mac. fazm is also fully open source, which is the only honest way to trust something that can click around inside your logged-in browser.

Do I need to use the terminal to run fazm?

No. fazm is a native Swift and SwiftUI macOS app. You download it from fazm.ai or build it from source, and end users never touch a terminal. The agent loop underneath is the same Claude Code or Codex loop a terminal user would run, but the surface is a normal Mac app with a floating control bar, voice input, and detached chat windows.

Which AI agent for macOS should I pick?

If you only ever code and you are happy in a terminal, raw Claude Code or Codex CLI is fine. If you need an agent to operate arbitrary visual software with no accessibility support, a screenshot computer-use agent is the honest pick despite the speed cost. If you just want fast multi-model chat, a chat client like BoltAI is the right tool and you do not need an agent at all. fazm is the pick when you want the trusted Claude Code or Codex loop, but with persistent sessions, one-click forking, voice input, and reach across your browser, native apps, and Google Workspace.

What macOS version does fazm need?

macOS 14.0 or newer. It is a native macOS-only app. There is no Windows or Linux build, because the whole approach depends on the macOS accessibility APIs.
