APRIL 2026 / CLASSIFIED BY PRIMITIVE

Open source AI agents for Mac, April 2026: grouped by what they actually drive

Every other roundup for this query lists the same six projects in the same star-ranked order. That ordering hides the only thing that decides whether the agent can do the job: what surface it actually controls. This roundup classifies the open source AI agents you can run on a Mac in April 2026 by automation primitive: vision pixels, browser DOM, native accessibility tree, terminal only, or MCP host. The accessibility-tree row is the one most listicles forget exists, so I include the actual path of the pre-built binary that ships inside the signed Fazm.app.

Matthew Diakonov
12 min read
Written from the source of mediar-ai/mcp-server-macos-use and mediar-ai/fazm
5 categories, 1 page
Anchor binary: Contents/MacOS/mcp-server-macos-use
5 tools exposed, names verified in main.swift
Covers releases through 2026-04-20
MIT licensed host and macos-use server

The category every roundup forgets

Look at the first ten Google results for this query. Nine list the same six names, sorted by GitHub stars. browser-use leads because it has the most stars. UI-TARS sits next because it ships a consumer Electron app. OpenHands and Goose round out the tail because they reliably show up on the Awesome AI Agents list. None of them tell you what the agent can actually touch on your machine.

That matters because a browser agent is useless when the work lives in Calendar.app. A terminal coding agent cannot answer a Slack message. A vision agent burns tokens reading a list it could have parsed for free if it called the accessibility API. The right way to read the April 2026 landscape is by primitive, not by stars. Five primitives, one page.

Vision pixels

Render the screen to an image, send it to a vision LLM, parse coordinates from the response, click. Works on anything visible. Pays a vision-token budget per step. Breaks on theme changes and DPI shifts.

Browser DOM

Drive Chromium through CDP. Read and click DOM nodes, type into inputs, navigate URLs. Stable inside the browser, blind everywhere else.

Accessibility tree

Call macOS AXUIElement APIs to read role, title, value, frame, focus state for every UI element. Click, type, press into the live tree. Works in any app that exposes accessibility data, which is most of them.

Terminal only

Drive a shell, edit files, run tests. No UI surface at all. The right primitive for coding agents, the wrong one for everything else.

MCP host

Not a primitive itself. A consumer app that hosts a Claude or OpenAI loop and lets you plug any open source MCP server into the tool list. The agent inherits whatever surface its servers expose.

The five April 2026 categories, with names

Within each category, the list is short and the picks are concrete. Releases referenced are the ones that landed in the first three weeks of April 2026.


1. Vision pixels

Best for apps that refuse to expose accessibility data. Worst for token economics.

  • UI-TARS Desktop v1.6.0, shipped 2026-04-04. Smaller vision encoder, better OCR under dark mode. ByteDance, Apache 2.0.
  • Open Interpreter 0.5.x, bundles a desktop pop-out and a vision fallback path. Killer-Mode plugin for screen control.

2. Browser DOM

Stable, mature, blind to anything outside Chromium. Half the SERP for this query is browser agents pretending to be desktop agents.

  • browser-use 0.7, shipped 2026-04-08. Native MCP transport, multi-tab sessions, Python-first.
  • Skyvern 0.4, headless-first browser agent for repeated workflow runs.

3. Accessibility tree

The category most roundups skip. Reads structured data straight from AppKit, no vision call required for navigation.

  • Fazm 2.4.0, shipped 2026-04-20. Consumer .app, MIT licensed, ships Contents/MacOS/mcp-server-macos-use pre-signed.
  • mcp-server-macos-use, standalone MCP binary. 437 lines of Swift. Five tools. Plug into Claude Desktop, Cline, Zed, or your own host.

4. Terminal only

Coding agents that live in a shell. Useful for repos, useless for everything visible.

  • Goose 1.2, shipped 2026-04-10. Block, Apache 2.0. Automatic MCP server discovery from a known registry.
  • OpenHands 0.27, multi-agent coding harness with a sandboxed shell and container backends.
  • Hermes Agent v0.5, shipped 2026-04-14. Persistent procedural memory across sessions. Discord, Telegram, Slack, terminal.

5. MCP host

Not a primitive, a routing layer. The category that grew fastest in April 2026 because every other host added it.

  • Fazm 2.4.0, five builtin MCP servers plus user-defined entries from ~/.fazm/mcp-servers.json.
  • Cline and Zed, both extended their MCP host surface in early April.
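
Fazm's schema for that file is not quoted in this article; as a sketch, assuming ~/.fazm/mcp-servers.json follows the common name-to-command convention that Claude Desktop popularized (the exact Fazm shape may differ), a user-defined entry could look like:

```json
{
  "filesystem": {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/Documents"]
  }
}
```

@modelcontextprotocol/server-filesystem is a real reference server; the key name and argument list here are illustrative.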

The accessibility-tree row, opened up

Most readers have never actually seen the binary that drives an accessibility-tree agent. The binary is small and the source is short: five tools, under 500 lines of Swift, one Mach-O. Fazm bundles it pre-signed inside the .app, but you can also build it yourself in 30 seconds.

Below are the literal tool names from mcp-server-macos-use/Sources/main.swift, lines 158, 174, 189, 203, 224, verbatim. The one-line descriptions are mine, but the names are the ones you can grep for in the repo.

  • macos-use_open_application_and_traverse: open or focus an app, then return its accessibility tree
  • macos-use_click_and_traverse: click at a coordinate, then return the refreshed tree
  • macos-use_type_and_traverse: type text into the focused element, then return the refreshed tree
  • macos-use_press_key_and_traverse: press a key with optional modifiers, then return the refreshed tree
  • macos-use_refresh_traversal: re-read the tree without acting

mcp-server-macos-use/Sources/main.swift: 437 lines of Swift, 5 tools exposed, 1 Mach-O bundled in Fazm.app, 0 vision tokens per traversal.

How Fazm loads the binary on every session

When you launch Fazm, the ACP bridge spins up, reads the bundled binaries from Contents/MacOS, and registers each one as an MCP server. The macos-use registration lives in acp-bridge/src/index.ts, around line 1057.

Session startup, accessibility-tree path:

Fazm.app launch → ACP bridge → User prompt → Claude agent loop → macos-use → Mac app under test → click / type / read
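
The actual index.ts is not reproduced here. As a hedged sketch (the names and shapes below are mine, not Fazm's), a host could map each bundled Mach-O to a stdio MCP registration like this:

```typescript
import * as path from "node:path";

interface ServerRegistration {
  name: string;
  command: string;
  transport: "stdio";
}

// Derive one MCP server registration per binary found in Contents/MacOS.
function registerBundledServers(appRoot: string, binaries: string[]): ServerRegistration[] {
  return binaries.map((bin) => ({
    name: bin.replace(/^mcp-server-/, ""), // "mcp-server-macos-use" -> "macos-use"
    command: path.join(appRoot, "Contents/MacOS", bin),
    transport: "stdio",
  }));
}

const regs = registerBundledServers("/Applications/Fazm.app", ["mcp-server-macos-use"]);
console.log(regs[0].name); // "macos-use"
console.log(regs[0].command);
```

The real bridge then hands each registration's command to its MCP client as a stdio child process; this sketch only shows the path derivation.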

Two ways to make a single click happen

The cleanest way to see why category matters is to put a single click side by side. Vision loop on the left, accessibility loop on the right. Same outcome, very different cost and very different failure modes.

vision_loop.ts
// Vision-based loop, 1 step
1. screencapture -i /tmp/frame.png
2. POST frame.png to vision LLM ~ 1500 image tokens
3. Parse JSON for "click 412, 188"
4. CGEvent left-click at 412, 188

// Failure modes:
//   - dark mode flips colors, OCR misses
//   - DPI 2x makes coords off by factor of 2
//   - element off-screen: invisible to vision
//   - disabled state vs enabled: looks the same
ax_loop.ts
// Accessibility-tree loop, 1 step
1. macos-use_refresh_traversal pid: 4321
   -> { role: "AXButton", title: "Send", enabled: true,
        frame: { x: 412, y: 188, w: 64, h: 28 } }
2. macos-use_click_and_traverse pid: 4321,
   x: 444, y: 202

// Failure modes:
//   - target app refuses to expose AX tree
//     (some Electron, some games, fallback to vision)

Side-by-side picks for the five categories

Feature | Vision-only / browser-only | Fazm
Native Mac app surface | no, screenshot or browser only | yes, AXUIElement tree across any AppKit/SwiftUI app
Install step | pip install + Python venv + driver setup | drag .app to Applications, grant accessibility, done
Tokens per click | ~1500 image tokens for vision agents | tens of text tokens for the AX tree node
Survives dark mode and DPI changes | no for vision, yes for DOM | yes, the AX tree is text
Works inside Mail, Calendar, Notes, Preferences | no | yes
MCP host for additional servers | varies | yes, ~/.fazm/mcp-servers.json merges into session
License | varies, often Apache 2 or MIT | MIT for app, MIT for macos-use server
Projects covered: browser-use 0.7, UI-TARS Desktop v1.6.0, Goose 1.2, Hermes Agent v0.5, OpenHands 0.27, Open Interpreter 0.5, Skyvern 0.4, Fazm 2.4.0, mcp-server-macos-use, AIRI 0.3.

What I would actually install this month

Three picks, three different jobs. Pick one per category, do not pick two from the same row.

ACCESSIBILITY TREE

Fazm

Consumer .app, MIT licensed. Drag in, grant accessibility, describe what you want done. Works across Mail, Calendar, Notes, Preferences, Finder, and any app that exposes AX.

fazm.ai →
BROWSER DOM

browser-use 0.7

The Python library for headless or headed Chromium. Native MCP transport in 0.7. Multi-tab sessions. The right tool for forms, flight searches, scraping that needs login.

github.com/browser-use →
TERMINAL ONLY

Goose 1.2

Block's open source coding agent. Automatic MCP server discovery in 1.2. Apache 2.0. Plays well as a sidecar to a desktop agent like Fazm.

github.com/block/goose →
The Fazm.app bundle ships a Mach-O at Contents/MacOS/mcp-server-macos-use built from a 437-line Swift main.swift. Five tools, one binary, no Python, no install.

Verified in mediar-ai/mcp-server-macos-use main.swift and acp-bridge/src/index.ts:1057

Try the accessibility loop yourself in three commands

You do not need Fazm to use the binary. If you want to try accessibility-tree control directly with Claude Desktop or Cline, here is the entire setup.

setup_macos_use.sh
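
A sketch of what that script contains, based on the build and config steps stated in the FAQ below; the .build/release path assumes SwiftPM's standard release layout, and the Claude Desktop config location is the one the FAQ names.

```shell
# Write a minimal setup_macos_use.sh; paths assume SwiftPM's standard layout.
cat > setup_macos_use.sh <<'EOF'
#!/bin/sh
set -e
git clone https://github.com/mediar-ai/mcp-server-macos-use
cd mcp-server-macos-use && swift build -c release
# Point any MCP host at the result, e.g. Claude Desktop's
# ~/Library/Application Support/Claude/claude_desktop_config.json (mcpServers):
echo "binary: $PWD/.build/release/mcp-server-macos-use"
EOF
chmod +x setup_macos_use.sh
```

Run `sh setup_macos_use.sh`, then restart your MCP host and the five macos-use tools appear in the tool palette.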

Picking the right open source Mac AI agent

  • Want a consumer app, not a Python venv? Fazm.
  • Want to drive Calendar.app and Mail.app, not just a browser? Accessibility tree.
  • Want to fill a form on a website behind login? browser-use.
  • Want a coding sidekick in a shell? Goose 1.2 or OpenHands 0.27.
  • Want a stubborn app that refuses AX to still work? Vision fallback (UI-TARS).
  • Want to plug a third-party MCP server in? Use a host that loads ~/.fazm/mcp-servers.json or the Cline equivalent.

Want a walkthrough of the accessibility-tree loop?

15 minutes, screen share, we open Mail or Calendar live and watch the tree dump come back from the bundled Mach-O.

Book a call

Frequently asked questions

What is the actual binary that lets Fazm drive any Mac app?

It is a 437-line Swift program built from Sources/main.swift in github.com/mediar-ai/mcp-server-macos-use. The compiled Mach-O ships inside the signed Fazm.app at Contents/MacOS/mcp-server-macos-use and is loaded as an MCP server on session start by acp-bridge/src/index.ts at line 1057. It uses the macOS accessibility APIs (AXUIElement, AXUIElementCopyAttributeValue, kAXFocusedWindowAttribute) to read and act on the live UI tree, not screenshots. The whole server exposes exactly five tools: macos-use_open_application_and_traverse, macos-use_click_and_traverse, macos-use_type_and_traverse, macos-use_press_key_and_traverse, and macos-use_refresh_traversal.

Why classify open source Mac AI agents by automation primitive instead of by GitHub stars?

Stars measure attention, not capability. browser-use, UI-TARS, and macos-use solve completely different problems and live on completely different surfaces. Sorting them by stars on the same list buries the only decision a reader is actually trying to make: what do I want my agent to touch? A vision-only agent cannot read a hidden text field. A DOM-only agent cannot click a Mail.app toolbar. A terminal agent cannot fill a form in Preferences. The only useful axis is the automation primitive, because that is what determines what the agent can do on day one without you writing glue code.

Which open source agents in April 2026 actually run on a Mac and control native apps, not just a browser?

Three projects in this category are worth knowing this month. Fazm (mediar-ai/fazm) is the consumer-installable desktop app and bundles macos-use as its default native action layer. UI-TARS Desktop (bytedance/ui-tars-desktop) is the most popular vision-language alternative, with v1.6.0 shipped in early April that improved its dark-mode OCR. mcp-server-macos-use (mediar-ai/mcp-server-macos-use) is the standalone MCP server, useful if you want to plug accessibility-tree control into Claude Desktop, Cline, or your own agent loop. Everything else in the popular roundups is browser-only or terminal-only.

What is the difference between vision-based and accessibility-tree-based Mac control?

Vision-based agents render the screen to pixels, send the image to a vision-language model, and ask for click coordinates. Accessibility-tree agents call AXUIElement APIs and get back a structured tree of role, title, value, focused state, and frame for every element. Vision works on anything visible, but it pays an image-token budget per step (Claude downscales vision inputs to 1568px on the longest edge), it breaks on theme switches, and it cannot read disabled or off-screen elements. Accessibility tree is fast and stable, costs almost nothing per call (it is text), survives dark mode and DPI changes, but it depends on the target app exposing accessibility data correctly. Some Electron apps and some games do not. The right answer in 2026 is to default to accessibility and fall back to vision only when the app is opaque.

Is the macos-use MCP server open source and reusable outside Fazm?

Yes. The repo is github.com/mediar-ai/mcp-server-macos-use, MIT licensed. You can build it yourself with swift build and point any MCP-aware host at the binary. The README at the repo root documents the five tools and their exact parameter shapes, including pid, x, y, text, keyName, and modifierFlags. Fazm just ships a pre-built signed copy inside its .app so you do not have to install Xcode and run swift build to use it.
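
As a hedged sketch of those parameter shapes, using only the names listed above (pid, x, y, text, keyName, modifierFlags); the per-tool grouping and the modifierFlags type are my assumptions, not the README's exact schema:

```typescript
// Illustrative parameter shapes for four of the five macos-use tools.
// The grouping and modifierFlags type are assumptions; check the README.
interface ClickAndTraverseParams { pid: number; x: number; y: number }
interface TypeAndTraverseParams { pid: number; text: string }
interface PressKeyAndTraverseParams { pid: number; keyName: string; modifierFlags?: number }
interface RefreshTraversalParams { pid: number }

// Example call shape, matching the ax_loop step earlier in the article.
const click: ClickAndTraverseParams = { pid: 4321, x: 444, y: 202 };
console.log(click);
```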

If I want one open source AI agent for my Mac in April 2026, which should I install?

If you want a consumer app that works on first launch and drives any Mac app through accessibility, install Fazm. If you want only browser automation and you are happy writing Python, install browser-use. If you want a vision-language alternative for stubborn apps that refuse to expose accessibility data, install UI-TARS Desktop. If you want a terminal-only coding agent, install Goose 1.2 or OpenHands. The four categories do not overlap as much as the listicles imply, and you can run more than one at the same time.

What changed for open source desktop agents in April 2026 specifically?

Several things shipped in the same window. UI-TARS Desktop v1.6.0 landed on 2026-04-04 with a smaller vision encoder. browser-use 0.7 shipped on 2026-04-08 with native MCP transport. Goose 1.2 shipped on 2026-04-10 with automatic MCP server discovery. Hermes Agent v0.5 shipped on 2026-04-14 with persistent procedural memory. Fazm 2.4.0 shipped on 2026-04-20 with custom MCP server support, opening its host to any third-party MCP server you drop into ~/.fazm/mcp-servers.json. The shared trend is that MCP went from a Claude Desktop curiosity to the default extension protocol for every host on every platform.

Does Fazm need internet access to drive my Mac?

Yes for the model call. The accessibility tree traversal and the click and type actions are local Mach-O code running on your Mac with no network. The reasoning step that decides what to click next runs against a hosted Claude model, which requires an internet connection. If you swap the underlying model for a local one through a custom MCP server or through Anthropic's local API, the network requirement goes away, but at the cost of a much slower agent loop because local Llama-class models are still significantly behind on tool use as of April 2026.

Can I use the macos-use MCP server with Claude Desktop instead of Fazm?

Yes. Build the binary with swift build, then add it to your Claude Desktop config at ~/Library/Application Support/Claude/claude_desktop_config.json under mcpServers. The README has the exact JSON snippet. Restart Claude Desktop and the five macos-use tools appear in the tool palette. Same for Cline, Zed, Continue, and any other host that speaks MCP. Fazm just removes the install-and-configure step and ships the binary pre-signed in the app bundle.