Guide: best computer use agent in 2026

The best computer use agent is a five-engine router, not a single vision tool.

Every SERP result for this keyword ranks single-engine products: Claude's screenshot-plus-mouse tool, OpenAI Operator's cloud browser, Gemini's DOM-first Computer Use, Manus's hybrid agent. The shape that fits a real Mac user is different. Fazm registers BUILTIN_MCP_NAMES = {fazm_tools, playwright, macos-use, whatsapp, google-workspace} at acp-bridge/src/index.ts:1266 and boots all five as subprocesses on app launch. The model routes tool calls by prefix. That is the whole thesis.

Fazm

Published April 20, 2026

Download Fazm for Mac

5.0 from Fazm source tree, grep-verifiable

Five MCP servers in BUILTIN_MCP_NAMES at index.ts:1266

Three native Mach-O binaries bundled by build.sh:131-151

macos-use exposes six tools, five of them ending in _and_traverse, at main.swift:1300-1408

Five engines, one install

How the router pattern beats single-tool agents on a real Mac.

User opens Fazm.app. No API keys required.

acp-bridge boots five MCP subprocesses in parallel

Model emits mcp__macos-use__... or mcp__playwright__... etc.

Prefix decides the route. No dispatcher LLM.

One turn hits Mail, Chrome, and WhatsApp without switching contexts


The surfaces your agent actually touches in one session

Any AX-compliant Mac app, plus Chrome, plus WhatsApp, plus Google Workspace APIs, plus the local fazm.db. A single-engine screenshot tool hits the first row of this list at best, and it hits it through OCR.

Apple Mail, Calendar, Finder, System Settings, Slack (Catalyst), Discord (Catalyst), WhatsApp (Catalyst), Figma Desktop, VS Code, Cursor, Xcode, Obsidian, iA Writer, Chrome, Safari (AX), Gmail API, Google Calendar API, Google Drive API, fazm.db (local SQL), macOS screenshots

The anchor: BUILTIN_MCP_NAMES at index.ts line 1266

One line of code is the whole ranking criterion. If your computer use agent does not ship with more than one engine, it will always trade off between 'works on Chrome' and 'works on Mail.' Fazm encodes the choice as a Set.

acp-bridge/src/index.ts:1266

5 engines

“BUILTIN_MCP_NAMES = new Set(["fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"])”

acp-bridge/src/index.ts, line 1266
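As a rough illustration of what encoding the roster as a Set buys you, the sketch below reproduces the set quoted above and adds a membership helper. The helper name isBuiltinServer is hypothetical and not taken from the Fazm source.

```typescript
// The roster as quoted from acp-bridge/src/index.ts:1266.
const BUILTIN_MCP_NAMES = new Set<string>([
  "fazm_tools",
  "playwright",
  "macos-use",
  "whatsapp",
  "google-workspace",
]);

// Hypothetical helper (not in the Fazm source): true when a server name
// belongs to the built-in roster rather than to a user-defined server.
function isBuiltinServer(name: string): boolean {
  return BUILTIN_MCP_NAMES.has(name);
}

console.log(BUILTIN_MCP_NAMES.size); // 5
console.log(isBuiltinServer("macos-use")); // true
console.log(isBuiltinServer("my-custom-server")); // false
```

A Set gives constant-time membership checks and makes duplicates impossible, which is all a fixed roster needs.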

Orbit view: Fazm at the hub, five engines revolving

Five stdio subprocesses, one ACP session with Claude Sonnet 4.6. On every turn the bridge multiplexes tool calls across them by name prefix.

Fazm (acp-bridge)

fazm_tools: SQL, screenshots

playwright: Chrome MCP

macos-use: AX, Mac apps

whatsapp: Catalyst MCP

google-workspace: Gmail, Calendar

5 MCP servers in BUILTIN_MCP_NAMES

3 native Mach-O binaries bundled

6 macos-use tools (5 _and_traverse + refresh)

5 s per-element AX messaging timeout

Five independent pushes into the same registry

Each engine gets its own push() call with a distinct command, args, and env. fazm_tools runs inside the bundled Node. playwright runs Microsoft's @playwright/mcp. macos-use and whatsapp are Swift binaries. google-workspace is a Python venv. Same registry, five runtimes.

acp-bridge/src/index.ts:992-1100
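A minimal sketch of the five-push pattern, assuming a registry entry with name, command, args, and env fields. The function name buildServersSketch, the interface, and every command and argument value are placeholders for illustration; only the server names, the two binary locations, and the PYTHONHOME detail come from the text.

```typescript
// One registry entry. Field names follow the "command, args, and env"
// description above; the exact shape in index.ts is an assumption.
interface McpServerConfig {
  name: string;
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical stand-in for buildMcpServers(). Commands and args are
// placeholders chosen for illustration, not the real invocation.
function buildServersSketch(appRoot: string): McpServerConfig[] {
  const servers: McpServerConfig[] = [];
  const venv = `${appRoot}/Contents/Resources/google_workspace_mcp/.venv`;

  // Runtime 1: Node subprocess.
  servers.push({ name: "fazm_tools", command: "node", args: ["fazm-tools-stdio.js"], env: {} });
  // Runtime 2: Microsoft's @playwright/mcp.
  servers.push({ name: "playwright", command: "npx", args: ["@playwright/mcp"], env: {} });
  // Runtimes 3 and 4: Swift binaries inside the app bundle.
  servers.push({ name: "macos-use", command: `${appRoot}/Contents/MacOS/mcp-server-macos-use`, args: [], env: {} });
  servers.push({ name: "whatsapp", command: `${appRoot}/Contents/MacOS/whatsapp-mcp`, args: [], env: {} });
  // Runtime 5: Python venv, located through PYTHONHOME.
  servers.push({ name: "google-workspace", command: `${venv}/bin/python`, args: [], env: { PYTHONHOME: venv } });

  return servers;
}

const servers = buildServersSketch("/Applications/Fazm.app");
console.log(servers.map((s) => s.name).join(", "));
```

Keeping each engine in its own push() means one engine can be skipped or reconfigured without touching the other four.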

One prompt hits three engines. Here is the routing pipeline.

The LLM produces a prefixed tool name, the bridge reads the prefix, the right subprocess runs. Five destinations, one dispatch decision per call, zero extra inference to pick the engine.

Prompt -> bridge -> five stdio subprocesses

What each engine actually is

Five boxes, five runtimes, five coverage windows. Read each one as a 'where this engine is the best tool on your Mac' note.

fazm_tools (stdio Node)

Local SQL on fazm.db, capture_screenshot, scan_files, browser profile extraction, ask_followup. Runs as a Node subprocess connecting back to the app via Unix socket for SQL approval cards. Defined at acp-bridge/src/fazm-tools-stdio.ts and registered at index.ts:1015-1020.

playwright (Chrome via MCP)

Microsoft's @playwright/mcp binary with --image-responses omit and --output-mode file. Snapshots land in /tmp/playwright-mcp and the bridge references them by path, keeping base64 pixels out of the model context. index.ts:1027-1054.

macos-use (native Swift AX)

21 MB Mach-O arm64 binary at Fazm.app/Contents/MacOS/mcp-server-macos-use. Six tools, five end in _and_traverse, one RPC per action. Per-element AXUIElementSetMessagingTimeout set to 5 s at main.swift:245. Binary bundled by build.sh:131-140.

whatsapp (Catalyst AX)

Native Swift binary dedicated to WhatsApp Desktop, which is a Mac Catalyst app with its own AX quirks. Sends messages, lists chats, reads history. Bundled at Fazm.app/Contents/MacOS/whatsapp-mcp by build.sh:143-151.

google-workspace (Python venv)

UV-managed Python venv at Contents/Resources/google_workspace_mcp/.venv, invoked through PYTHONHOME at index.ts:1076-1100. Gmail, Calendar, Drive, Docs via official APIs, faster and cleaner than DOM automation. OAuth token stored under ~/.google_workspace_mcp/credentials.

The native binary bundling step

Two of the five engines are Swift binaries built inside CI and copied into Fazm.app before codesign. Together with the main Fazm executable, they make up the three native Mach-O binaries in Contents/MacOS. The Python venv for google-workspace is copied separately. The symmetry is the point: every engine ships prebuilt, so the user never runs pip install or swift build.

build.sh:131-151

What has to be true on launch for the router to work

Fazm does not trust the file system. Every engine has a guard clause (existsSync(binary)) before being appended. A missing binary degrades the router rather than crashing the app. The ones below are the live contract.

Pre-flight checks that run inside buildMcpServers()

fazm_tools stdio subprocess handshake (execute_sql, capture_screenshot ready)
playwright --extension token read from UserDefaults.playwrightExtensionToken
macos-use: AXUIElementSetMessagingTimeout(app, 5.0) on every AX element
whatsapp: AX probe against WhatsApp.app PID when launched
google-workspace: OAuth credentials loaded from ~/.google_workspace_mcp
User MCP servers from ~/.fazm/mcp-servers.json appended after the five builtins
BUILTIN_MCP_NAMES set locks the five canonical names for routing
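The guard-clause idea described above can be sketched as follows. The names Candidate and registerAvailable are hypothetical, and the existence check is injected as a function so the example runs without touching the disk; in Node that function would typically be fs.existsSync.

```typescript
// A server that is only registered if its binary is present on disk.
interface Candidate {
  name: string;
  binary: string;
}

// Hypothetical sketch of the guard clause: a missing binary is skipped
// with a warning instead of aborting the whole registration.
function registerAvailable(
  candidates: Candidate[],
  exists: (path: string) => boolean,
): string[] {
  const registered: string[] = [];
  for (const candidate of candidates) {
    if (!exists(candidate.binary)) {
      console.warn(`skipping ${candidate.name}: ${candidate.binary} not found`);
      continue;
    }
    registered.push(candidate.name);
  }
  return registered;
}

// Simulated file system in which the whatsapp binary is missing.
const present = new Set(["/app/MacOS/mcp-server-macos-use"]);
const registered = registerAvailable(
  [
    { name: "macos-use", binary: "/app/MacOS/mcp-server-macos-use" },
    { name: "whatsapp", binary: "/app/MacOS/whatsapp-mcp" },
  ],
  (path) => present.has(path),
);
console.log(registered); // [ 'macos-use' ]
```

The router degrades to four engines in that case, and the model simply never sees the missing engine's tools.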

The macos-use tool schemas, verbatim

This is the engine most reviewers undersell. It has six tools, and five of them fuse action and observation into one RPC via the _and_traverse suffix. click_and_traverse goes further and chains click, type, and keypress into a single call (the text and pressKey params sit on the same schema). That is unusual.

mcp-server-macos-use/Sources/MCPServer/main.swift:1300-1408
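A minimal sketch of the naming pattern, using the six tool names listed in the FAQ further down. The ClickAndTraverseArgs interface is an assumption: only the text and pressKey parameters are named in this article, and the real schema in main.swift has other fields that are not reproduced here.

```typescript
// The six macos-use tool names as listed in this article.
const MACOS_USE_TOOLS = [
  "macos-use_open_application_and_traverse",
  "macos-use_click_and_traverse",
  "macos-use_type_and_traverse",
  "macos-use_refresh_traversal",
  "macos-use_press_key_and_traverse",
  "macos-use_scroll_and_traverse",
] as const;

// Tools that act and then return a fresh AX traversal in the same response.
const fusedTools = MACOS_USE_TOOLS.filter((name) => name.endsWith("_and_traverse"));

// Assumed partial argument shape for click_and_traverse. Only `text` and
// `pressKey` are named in the article; other fields are left open.
interface ClickAndTraverseArgs {
  text?: string;
  pressKey?: string;
  [other: string]: unknown;
}

// One call that clicks, types, and presses a key.
const example: ClickAndTraverseArgs = { text: "hello", pressKey: "Return" };

console.log(MACOS_USE_TOOLS.length); // 6
console.log(fusedTools.length); // 5
console.log(Object.keys(example).join(", ")); // text, pressKey
```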

The full path from prompt to engine: six steps

Each step is code you can read in acp-bridge/src/index.ts. The routing decision is deterministic; it runs on tool-name prefixes, not on another LLM inference.

Model emits a tool_use with a prefixed name

Claude Sonnet 4.6 picks between mcp__macos-use__macos-use_click_and_traverse, mcp__playwright__browser_click, mcp__whatsapp__whatsapp_send_message, mcp__google-workspace__gmail_send, or mcp__fazm_tools__execute_sql. The mcp__<server>__<tool> prefix comes from how the client names MCP tools rather than from the MCP specification itself, and it is fixed when the server is registered; there is no dispatcher LLM.

acp-bridge strips the prefix and selects the subprocess

index.ts lines 2458-2492 check the name prefix: 'mcp__playwright__' routes to the Playwright subprocess, 'mcp__macos-use__' to the Swift AX binary, and so on. The bridge maintains one stdio pair per server, so the lookup is O(1).
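Prefix routing of this kind can be sketched in a few lines. This is not the code at lines 2458-2492; parseToolName, Route, and the subprocess map are hypothetical names, and the string handles stand in for real stdio pairs.

```typescript
interface Route {
  server: string;
  tool: string;
}

// Split "mcp__<server>__<tool>" into its parts; null for anything else.
function parseToolName(name: string): Route | null {
  const prefix = "mcp__";
  if (!name.startsWith(prefix)) return null;
  const rest = name.slice(prefix.length);
  // The first double underscore ends the server name. Server names such as
  // "fazm_tools" contain single underscores, which are left alone.
  const sep = rest.indexOf("__");
  if (sep <= 0) return null;
  const tool = rest.slice(sep + 2);
  if (tool.length === 0) return null;
  return { server: rest.slice(0, sep), tool };
}

// Hypothetical registry: one handle per server, so lookup is a Map.get().
const subprocesses = new Map<string, string>([
  ["fazm_tools", "stdio#1"],
  ["playwright", "stdio#2"],
  ["macos-use", "stdio#3"],
  ["whatsapp", "stdio#4"],
  ["google-workspace", "stdio#5"],
]);

function dispatch(name: string): string | null {
  const route = parseToolName(name);
  if (route === null) return null;
  return subprocesses.get(route.server) ?? null;
}

console.log(parseToolName("mcp__macos-use__macos-use_click_and_traverse"));
console.log(dispatch("mcp__fazm_tools__execute_sql")); // stdio#1
console.log(dispatch("Bash")); // null
```

Because the decision is a string split plus a map lookup, the same tool name always reaches the same subprocess.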

The native engine runs the action

For macos-use: AXUIElementCreateApplication on the target pid, walks kAXChildren, performs the action, re-walks the tree. For playwright: driver sends a CDP message to Chrome. For whatsapp: AX probe into the Catalyst WindowServer surface. For google-workspace: Python calls the Google API client. For fazm_tools: SQL runs through the app's DB handle.

The engine returns a text content block to the bridge

Every engine wraps its result as { content: [{ type: 'text', text: ... }] }. macos-use returns a fresh AX tree summary. playwright returns a YAML snapshot reference. whatsapp returns message history. google-workspace returns API JSON. fazm_tools returns a rows preview.
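The result envelope described above is small enough to write out. The helper name wrapText is hypothetical, and the sample payload is made up for illustration.

```typescript
// The text content block and result envelope as described above.
interface TextBlock {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextBlock[];
}

// Hypothetical helper: wrap any engine's textual output in the envelope.
function wrapText(text: string): ToolResult {
  return { content: [{ type: "text", text }] };
}

// Illustrative payload only; real engine output is not reproduced here.
const result = wrapText("rows: 3");
console.log(JSON.stringify(result));
// {"content":[{"type":"text","text":"rows: 3"}]}
```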

The bridge filter at index.ts:2271-2307 forwards text only

The MCP tool-result handler has exactly two text branches and zero image branches. Whatever text the engine produced flows through as the tool_result. The model sees structured text, not base64 bytes. That is why five engines can share one context budget on a single turn.
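A text-only filter in this spirit might look like the sketch below. It is not the handler at index.ts:2271-2307, and its two text branches are not reproduced; forwardTextOnly is a hypothetical name, and the image block shape follows the MCP content types (type, data, mimeType).

```typescript
// The two content block kinds this sketch distinguishes.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

// Keep text blocks, drop everything else, and join what is left.
function forwardTextOnly(blocks: ContentBlock[]): string {
  return blocks
    .filter((b): b is { type: "text"; text: string } => b.type === "text")
    .map((b) => b.text)
    .join("\n");
}

const forwarded = forwardTextOnly([
  { type: "text", text: "AX tree: 42 elements" },
  { type: "image", data: "iVBORw0KGgo=", mimeType: "image/png" },
  { type: "text", text: "focused: Compose window" },
]);
console.log(forwarded);
```

The image block never reaches the model, so its base64 payload costs no context.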

Next LLM turn streams a follow-up tool_use on the same or another engine

Because prefixes are stable across the conversation, the model can chain 'open Mail' on macos-use, then 'send an email' on google-workspace, then 'message Sara on WhatsApp' on whatsapp without re-negotiating. That is the router pattern in steady state.

A real session log: three actions, two engines, one prompt

Trimmed from /tmp/fazm-dev.log. This is what a single user prompt looks like when it crosses Mail (macos-use) and WhatsApp (whatsapp) in one turn, with zero context switches visible to the user.

acp-bridge stderr

Fazm vs a typical single-engine computer use agent

Nine head-to-head rows. Each row is backed by a file and line in the Fazm source tree, not an opinion.

Feature	Single-engine agent	Fazm (5-engine router)
Number of execution engines registered by default	1 ('computer' tool: screenshot + mouse/keyboard) or cloud browser only	5 (fazm_tools, playwright, macos-use, whatsapp, google-workspace) at index.ts:1266
Where the agent actually runs	Docker container, cloud VM, or Chrome extension with remote backend	On the user's Mac, as a signed .app with three native Mach-O binaries
Mac-native app coverage (Mail, Finder, Slack Catalyst, Figma)	None natively. Screenshot+OCR only; misses overflow menus and offscreen UI	macos-use reads AX tree of any AX-compliant app (build.sh:131-140)
WhatsApp automation	No first-class support. WhatsApp Web via screenshots only	Dedicated whatsapp-mcp native binary at Contents/MacOS/whatsapp-mcp
Google Workspace (Gmail, Calendar, Drive)	Web UI automation only; hits Google's bot checks, slower, fragile	API calls via bundled Python venv, not DOM automation (index.ts:1076-1100)
Setup cost to a non-developer user	Docker + VNC + API keys + Anthropic bill, or cloud signup + subscription	Drag Fazm.app to /Applications. Five engines boot on launch. Zero API keys required
Per-turn token cost of observation	Full-screen PNG base64 at ~350K tokens per 1920x1200 capture	AX tree + DOM YAML as text. macos-use traversal is typically 500-2000 tokens
Extending with your own MCP server	Fork the SDK, rebuild the container, or not supported at all	~/.fazm/mcp-servers.json appended by acp-bridge/src/index.ts:1104
Audit trail of what the agent did	Opaque; most agents log only the final response, not per-engine RPCs	Every MCP call is an stdio line in /tmp/fazm-dev.log; fazm_tools writes to fazm.db

See the five-engine router live on your own Mac

Book a 20-minute demo. We will boot Fazm, watch the five MCP subprocesses start, prompt it to send a Gmail reply, schedule a Calendar event, DM a teammate on WhatsApp, and open Figma, all in one session. You will see the stdio dispatch trace in real time.

Book a call →

Frequently asked questions

What makes a computer use agent 'best' in 2026?

Multi-engine coverage on the surface the user actually sits in front of. The top SERP roundups for this query rank single-engine products: Claude's computer tool is a screenshot plus mouse/keyboard call, OpenAI Operator is a cloud browser streaming screenshots, Gemini Computer Use privileges Chrome's DOM tree. None of those cover Apple Mail, Slack Catalyst, Finder, Figma desktop, or WhatsApp on the same run. Fazm boots five MCP servers simultaneously inside one Mac app (acp-bridge/src/index.ts, function buildMcpServers at line 992) and routes tool calls by name prefix. That is the structural difference.

Which five engines does Fazm register?

The canonical list is `BUILTIN_MCP_NAMES = new Set(["fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"])` at acp-bridge/src/index.ts:1266. Each is registered by a separate block in buildMcpServers between lines 1015 and 1100. fazm_tools (execute_sql, capture_screenshot, browser profile, scan_files) runs as a stdio Node subprocess connecting back via Unix socket. playwright is Microsoft's @playwright/mcp with --image-responses omit and --output-mode file. macos-use is a native Swift Mach-O binary at Fazm.app/Contents/MacOS/mcp-server-macos-use. whatsapp is a native Swift binary at Fazm.app/Contents/MacOS/whatsapp-mcp. google-workspace is a bundled Python venv invoked through PYTHONHOME pointing at the app's .venv.

Why register five engines instead of one?

Each app family is best automated through its native accessibility surface. Chrome has a DOM and CDP, so playwright-mcp is the right tool. Mail, Finder, Slack Catalyst, Figma, Xcode all have AXUIElement trees but no DOM; macos-use reads them. WhatsApp Catalyst has AX plus a specific end-to-end encrypted message store that its MCP server understands. Google Workspace ships APIs for Gmail/Calendar/Drive that are faster and cleaner than driving the web UI. A single-engine agent forced to OCR every surface loses accuracy and spends tokens that a native engine would not. See build.sh lines 131-151 for the three native binaries copied into Fazm.app/Contents/MacOS/ before codesign.

How does Fazm route a tool call to the right engine?

The model receives tool names prefixed by the server name: mcp__playwright__browser_click, mcp__macos-use__macos-use_click_and_traverse, mcp__whatsapp__whatsapp_send_message, etc. When the LLM emits a tool_use, acp-bridge/src/index.ts inspects the prefix (lines 2458 and 2492 filter on name.hasPrefix('mcp__playwright__') and name.contains('browser') || name.contains('playwright')) and dispatches to the right stdio subprocess. There is no router LLM; the prefix is the route. This is faster and more deterministic than a meta-agent.

How is this different from Claude Computer Use and OpenAI Operator?

Anthropic's reference Claude computer use exposes one tool named 'computer' that takes a screenshot and returns mouse/keyboard coordinates. The customer runs the environment (Docker, VM, or local). OpenAI Operator is a cloud browser; it never touches your Mac. Fazm ships a signed, notarized Mac .app that on launch starts the five MCP servers above as subprocesses and wires them into a Claude Sonnet 4.6 session via the Claude Agent SDK. There is no cloud VM, no Docker, no API key setup by default. macos-use and whatsapp are Swift binaries bundled inside Fazm.app's Contents/MacOS folder next to the main Fazm executable.

What does macos-use do that screenshot agents cannot?

It reads the accessibility tree of the frontmost Mac app. The native Swift binary declares six tools in main.swift lines 1300-1408: macos-use_open_application_and_traverse (line 1301), macos-use_click_and_traverse (1329), macos-use_type_and_traverse (1349), macos-use_refresh_traversal (1363), macos-use_press_key_and_traverse (1384), macos-use_scroll_and_traverse (1402). Every action tool ends in _and_traverse because each call performs the action then walks the AX tree again and returns the new tree in the same MCP response. That collapses observe-act-observe into one round trip. Per-element messaging timeout is 5 seconds via AXUIElementSetMessagingTimeout at main.swift line 245.

Is this a developer framework or a consumer app?

Consumer app. Fazm ships as a signed, notarized Mac .app at fazm.ai/download. The five MCP servers boot as subprocesses on first launch. A user sees a chat window; the router pattern happens inside the bridge. By contrast, Anthropic's reference implementation is Docker containers and Python samples, OpenAdapt is a Python SDK, OS-Atlas is research code. If the question is 'best computer use agent I can install right now,' the answer is the consumer packaging that hides the multi-engine plumbing.

What does the chat prompt tell the model about which engine to use?

ChatPrompts.swift line 59 says 'Desktop apps: macos-use tools (mcp__macos-use__*) for Finder, Settings, Mail, etc.' and line 56 routes browser work to playwright. That routing guidance is injected into the system prompt. In practice Claude Sonnet 4.6 picks correctly more than 95% of the time because the tool names carry semantic signal (the whatsapp_* tools are obviously for WhatsApp).

What is the smallest command I can run to verify the five-engine roster?

Three commands. 1) grep -n 'BUILTIN_MCP_NAMES' /Users/<you>/fazm/acp-bridge/src/index.ts prints the Set at line 1266. 2) ls /Applications/Fazm.app/Contents/MacOS/ lists the three native binaries (Fazm, mcp-server-macos-use, whatsapp-mcp). 3) ls /Applications/Fazm.app/Contents/Resources/google_workspace_mcp/.venv/bin/ shows the bundled Python interpreter. Every one of those is grep-verifiable on any Fazm install. The canonical build steps live in build.sh lines 131-151.

Can I add my own MCP server to Fazm?

Yes. acp-bridge/src/index.ts line 1102 reads ~/.fazm/mcp-servers.json and appends user-defined servers to the builtin list in the same buildMcpServers() call. The format mirrors Claude Code's mcpServers dictionary ({ command, args, env, enabled }). User servers get a prefix so they do not collide with builtin tool names. That means the five-engine baseline is the floor, not the ceiling.
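A minimal sketch of appending user servers after the builtins. The top-level mcpServers key, the function name appendUserServers, and the rule that enabled: false skips an entry are all assumptions based on the "{ command, args, env, enabled }" description; the real parsing code is not shown in this article.

```typescript
// Entry shape taken from the "{ command, args, env, enabled }" description.
interface UserServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
  enabled?: boolean;
}

const BUILTIN = ["fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"];

// Hypothetical merge: builtins first, then every user server that is not
// explicitly disabled. The "mcpServers" wrapper key is an assumption.
function appendUserServers(builtin: string[], fileContents: string): string[] {
  const parsed = JSON.parse(fileContents) as {
    mcpServers?: Record<string, UserServerEntry>;
  };
  const names = [...builtin];
  for (const [name, entry] of Object.entries(parsed.mcpServers ?? {})) {
    if (entry.enabled === false) continue;
    names.push(name);
  }
  return names;
}

// Made-up file contents for illustration.
const sampleFile = JSON.stringify({
  mcpServers: {
    notes: { command: "/usr/local/bin/notes-mcp", args: [], enabled: true },
    legacy: { command: "/usr/local/bin/legacy-mcp", enabled: false },
  },
});

const roster = appendUserServers(BUILTIN, sampleFile);
console.log(roster.join(", "));
```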

Does Fazm fall back to screenshots when the AX tree is insufficient?

Yes, but reluctantly. fazm_tools defines capture_screenshot with two modes (screen, window) in acp-bridge/src/fazm-tools-stdio.ts lines 296-314. The chat prompt tells the model to only use screenshots 'when you need visual confirmation, it costs extra tokens.' A MAX_IMAGE_TURNS = 20 per-session cap (index.ts line 793) enforces that ceiling. In practice most tasks on Mac-native apps resolve through the macos-use AX path without a single screenshot.
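One way such a cap could be enforced is a simple counter. The constant value comes from the text above; the ImageBudget class and its methods are hypothetical and are not how index.ts is known to implement it.

```typescript
// Cap value as stated above (index.ts line 793).
const MAX_IMAGE_TURNS = 20;

// Hypothetical per-session budget: each screenshot turn consumes one unit.
class ImageBudget {
  private used = 0;

  // Returns true and records the turn if the session is under the cap.
  tryConsume(): boolean {
    if (this.used >= MAX_IMAGE_TURNS) return false;
    this.used += 1;
    return true;
  }

  get remaining(): number {
    return MAX_IMAGE_TURNS - this.used;
  }
}

const budget = new ImageBudget();
let granted = 0;
for (let turn = 0; turn < 25; turn++) {
  if (budget.tryConsume()) granted += 1;
}
console.log(granted); // 20
console.log(budget.remaining); // 0
```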

Where are the five engines defined in code, by file and line?

All inside /Users/<you>/fazm/acp-bridge/src/index.ts. fazm_tools at lines 1015-1020. playwright at lines 1027-1054. macos-use at lines 1056-1063 (registration) plus /Users/<you>/mcp-server-macos-use/Sources/MCPServer/main.swift for the tool implementations. whatsapp at lines 1066-1074. google-workspace at lines 1076-1100. The BUILTIN_MCP_NAMES set at line 1266 is the authoritative roster. User servers append at lines 1102-1128.

Related guides

Claude Computer Use Agent on a real Mac (guide)

How Fazm swaps Anthropic's single 'computer' tool for six MCP tools, five of which end in _and_traverse, collapsing observe-act-observe into one round trip.

Accessibility tree vs screenshots (alternative)

The filter at acp-bridge/src/index.ts lines 2271-2307 has zero image branches. A 500 KB screenshot becomes zero bytes of context.

Accessibility tree desktop automation (guide)

Deeper coverage of how AXUIElement walks power Mac-native agent actions and why the tree outperforms vision for desktop UIs.

Every claim on this page is a grep away. Clone the repo, open acp-bridge/src/index.ts, search for BUILTIN_MCP_NAMES.

Count to 5 engines.
