Open source AI desktop agents, April 2026: host-extensible MCP is the real story
Nearly every SERP result this month ranks open source desktop agents by GitHub stars: browser-use, UI-TARS, UFO, OpenHands, and whichever Python framework shipped a changelog in the last two weeks. That ranking skips the news. April 2026 is the month the desktop agent story quietly moved off framework leaderboards and onto host clients that let you plug any open source MCP server into the agent loop. This guide walks through the shift, points at the 36 lines of code in Fazm 2.4.0 that made it concrete, and shows what to put in your own ~/.fazm/mcp-servers.json this month.
The real April 2026 news
SERP coverage for this keyword is mostly framework listicles and a handful of news digests. Nine of the ten top-ranked pages talk about the same six names (browser-use, UI-TARS, UFO, OpenHands, Open Interpreter, AutoGPT). The tenth is a weekly AI news roundup. None of them name the change that actually happened this month: three agent hosts, on three separate dates, shipped MCP support that lets users bring their own servers. That is a bigger deal for open source desktop agents than any single framework release.
When a host is extensible, the open source agent community stops spending its energy on framework glue and starts spending it on MCP servers that do one thing well. The same MCP server now runs under Claude Desktop, Cline, Zed, Continue, Google ADK, OpenAI Agents SDK, and Fazm with zero changes. Forking Aider to add a Jira tool is a dead end; writing a Jira MCP server is work that compounds across every host.
Fazm 2.4.0 session startup
What the 2.4.0 release actually ships
Here is the relevant entry, verbatim, from /Users/matthewdi/fazm/CHANGELOG.json (trimmed to the lines that matter for desktop agents this month):
Three lines. The first one, custom MCP server support, is the shift. The second one means the host stops gatekeeping which Claude model you use. The third one updates the protocol the host speaks to its agent, to v0.29.2. Put together, they change what it means to pick a desktop agent in April 2026.
The 36 lines in acp-bridge that made it concrete
The feature lives in one function in acp-bridge/src/index.ts. Every time the Fazm app opens a new chat session, it calls buildMcpServers(). That function starts with the five hardcoded builtins and then appends anything the user put in ~/.fazm/mcp-servers.json. Here is the exact block that does the appending.
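The append step described above can be sketched as follows. This is an illustrative reconstruction, not the code from acp-bridge/src/index.ts: the type names, the mergeUserServers function, the log callback, and the handling of invalid JSON are all assumptions. Only the rules themselves (missing enabled means true, enabled set to false is skipped silently, a missing command is logged and skipped) come from this article.

```typescript
// Hypothetical shapes for illustration; the real types in acp-bridge may differ.
interface McpServerEntry {
  command?: string;
  args?: string[];
  env?: Record<string, string>;
  enabled?: boolean;
}

interface LoadedServer {
  name: string;
  command: string;
  args: string[];
  env: Record<string, string>;
  source: "builtin" | "user";
}

// Append user-declared servers to the builtin list.
// `userConfigText` is the raw text of ~/.fazm/mcp-servers.json, or null if the file is absent.
function mergeUserServers(
  builtins: LoadedServer[],
  userConfigText: string | null,
  log: (msg: string) => void = () => {},
): LoadedServer[] {
  const servers = [...builtins];
  if (userConfigText === null) return servers;

  let parsed: unknown;
  try {
    parsed = JSON.parse(userConfigText);
  } catch (err) {
    // Assumption: a malformed file leaves the builtins untouched.
    log(`mcp-servers.json is not valid JSON: ${String(err)}`);
    return servers;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    log("mcp-servers.json must be an object keyed by server name");
    return servers;
  }

  for (const [name, raw] of Object.entries(parsed as Record<string, McpServerEntry>)) {
    if (raw?.enabled === false) continue; // parked entry; a missing `enabled` means true
    if (typeof raw?.command !== "string" || raw.command.length === 0) {
      log(`skipping "${name}": missing command`);
      continue;
    }
    servers.push({
      name,
      command: raw.command,
      args: raw.args ?? [],
      env: raw.env ?? {},
      source: "user",
    });
  }
  return servers;
}
```

Keeping the function pure (text in, list out) makes the skip rules easy to test without touching the real file; a caller would read the file and pass its contents, or null when it does not exist.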
The five servers Fazm already ships
Before writing your own MCP entry, know what is already on the session. Missing this is the single most common mistake people make on their first day of extending a host: they add a server that duplicates a builtin and wonder why two tools with the same name are showing up. Line 1266 declares the set.
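A quick pre-flight check for the duplicate-builtin mistake could look like the sketch below. The builtin names are the five this article lists for Fazm 2.4.0; findBuiltinCollisions is a hypothetical helper, not part of Fazm, and it only compares server names, so it will not catch a differently named entry that launches the same underlying server.

```typescript
// Builtin server names as listed in this article for Fazm 2.4.0.
const BUILTIN_NAMES = new Set([
  "fazm_tools",
  "playwright",
  "macos-use",
  "whatsapp",
  "google-workspace",
]);

// Return the user-declared server names that shadow a builtin.
function findBuiltinCollisions(userConfig: Record<string, unknown>): string[] {
  return Object.keys(userConfig).filter((name) => BUILTIN_NAMES.has(name));
}

// Example: the first entry redeclares a builtin, the second does not.
const collisions = findBuiltinCollisions({
  playwright: { command: "npx", args: ["@playwright/mcp"] },
  fs: {
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/projects/current"],
  },
});
// collisions is ["playwright"]
```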
fazm_tools
In-process Swift dispatch. Opens pop-out windows, queries the local file index, runs the chat observer, writes observer cards, and routes write-intent SQL through a confirmation prompt in ChatToolExecutor.swift.
playwright
Launches @playwright/mcp with --output-mode file --image-responses omit --output-dir /tmp/playwright-mcp. Bridge-side extraction in acp-bridge skips image payloads to keep token budgets sane.
macos-use
Native Mach-O at Contents/MacOS/mcp-server-macos-use, bundled in the signed .app. Drives any Mac app through AXUIElement, not screenshots. MIT licensed at github.com/mediar-ai/mcp-server-macos-use.
whatsapp
Second native Mach-O at Contents/MacOS/whatsapp-mcp. Talks to the WhatsApp Catalyst app via accessibility APIs: list chats, open a chat, read messages, send a message.
google-workspace
Python MCP server bundled under Contents/Resources/google-workspace-mcp/. Gmail threads and messages, Calendar events, Drive files. Uses OAuth with a token stored under ~/.fazm/.
What to put in your ~/.fazm/mcp-servers.json on day one
The file format mirrors Claude Code's mcpServers dictionary. Each key is a server name, each value is a shape with command, args, env, and an optional enabled flag. Missing enabled means true. Anything with enabled set to false is silently skipped on load, which is useful for parking a config while you test something else.
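As a sketch of that shape, the snippet below types the file and builds a two-entry example. The McpServersFile type name is hypothetical, the project path is a placeholder, and the github entry uses a placeholder package name and token variable purely to show a parked entry; only the filesystem package name is taken from this article.

```typescript
// Hypothetical type name; the field names come from the format described above.
type McpServersFile = Record<
  string,
  {
    command: string;
    args?: string[];
    env?: Record<string, string>;
    enabled?: boolean; // omitted means true
  }
>;

const config: McpServersFile = {
  // Filesystem server scoped to one project directory (placeholder path).
  fs: {
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/projects/current"],
  },
  // Parked entry: stays in the file but is skipped on load.
  // Package name and env variable are placeholders, not real recommendations.
  github: {
    command: "npx",
    args: ["-y", "your-github-mcp-package"],
    env: { GITHUB_TOKEN: "replace-me" },
    enabled: false,
  },
};

// The text to save as ~/.fazm/mcp-servers.json.
const fileText = JSON.stringify(config, null, 2);
```

Writing the config through a typed object first is optional, but it catches a misspelled field name before the bridge ever reads the file.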
After saving, open or re-open a chat. The bridge emits an mcp_servers_available event containing every active server tagged builtin or user, and the Settings panel reflects the same list. If a server fails to load, the reason lands in the app log at /tmp/fazm.log, which is the fastest way to diagnose a missing binary or a typo in args.
Why accessibility still wins over screenshots this month
Host extensibility matters most when the host already has something strong in the box. Fazm's macos-use server has been the single most undervalued open source desktop agent of the last three months precisely because accessibility APIs read structured data, not pixels. You pay a few hundred tokens to describe a window, not an image budget. The tree survives dark mode, high-DPI, theme changes, and localization. Screenshots only earn their keep on canvas-heavy surfaces (PDFs with figures, Figma, drawing apps) where the app does not publish semantics to AX.
“Typical token cost to describe a focused macOS window as an accessibility tree, versus a screenshot that gets downscaled to 1568px on the longest edge before upload and charged as an image.”
ScreenCaptureManager.swift line 153, Claude image-size constraint
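To make the comparison concrete, here is a rough estimator. It rests on two assumptions: the 1568px long-edge downscale quoted above, and the approximate width × height / 750 tokens-per-image rule of thumb from Anthropic's vision documentation. The text side uses a crude four-characters-per-token heuristic rather than a real tokenizer, so treat both numbers as order-of-magnitude guides.

```typescript
// Assumptions: 1568px long-edge limit (quoted above) and the approximate
// width * height / 750 tokens-per-image estimate. Not an exact billing formula.
const MAX_LONG_EDGE = 1568;
const PIXELS_PER_TOKEN = 750;

// Estimate image tokens after downscaling so the longest edge fits the limit.
function estimateImageTokens(width: number, height: number): number {
  const scale = Math.min(1, MAX_LONG_EDGE / Math.max(width, height));
  const w = Math.round(width * scale);
  const h = Math.round(height * scale);
  return Math.ceil((w * h) / PIXELS_PER_TOKEN);
}

// Crude text estimate: about 4 characters per token (a heuristic, not a tokenizer).
function estimateTextTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

Under these assumptions a 2880×1800 capture is downscaled to 1568×980 and comes out at about 2,049 tokens, while a 1,200-character accessibility dump is about 300.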
Picking an open source desktop agent in April 2026
The decision tree flipped this month. In March, the question was which framework to fork. In April, with three major hosts speaking MCP, the question is which host to run and which MCP servers to load into it. Here is how the pieces fit.
The April 2026 decision path
Pick a host, not a framework
Claude Desktop, Cline, Zed, Continue, or Fazm. Each is an MCP-aware client, and all but Claude Desktop are open source or source-available. Fazm is the only one that is a signed consumer macOS app with bundled MCP servers on first launch.
Audit what the host ships with
Fazm 2.4.0 ships five: fazm_tools, playwright, macos-use, whatsapp, google-workspace. Claude Desktop ships none; you add them yourself. Cline is dev-focused, so it assumes you will wire filesystem and terminal servers.
Drop user MCP servers into the host's config path
For Fazm, that is ~/.fazm/mcp-servers.json. For Claude Desktop, it is ~/Library/Application Support/Claude/claude_desktop_config.json. The shape is nearly identical because both derive from the Claude Code mcpServers schema.
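Porting entries between the two files can be sketched as below. It assumes Fazm's file is a bare name-to-entry map (as in the example config in the FAQ) and that Claude Desktop expects the same map under a top-level mcpServers key. Dropping parked entries and stripping the enabled flag is also an assumption, made because this article only describes that flag for Fazm. toClaudeDesktopConfig is a hypothetical helper.

```typescript
interface ServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
  enabled?: boolean;
}

// Wrap a Fazm-style name -> entry map in a top-level `mcpServers` key,
// dropping parked entries and the Fazm-specific `enabled` flag.
function toClaudeDesktopConfig(fazmServers: Record<string, ServerEntry>) {
  const mcpServers: Record<string, Omit<ServerEntry, "enabled">> = {};
  for (const [name, entry] of Object.entries(fazmServers)) {
    if (entry.enabled === false) continue;
    mcpServers[name] = {
      command: entry.command,
      ...(entry.args ? { args: entry.args } : {}),
      ...(entry.env ? { env: entry.env } : {}),
    };
  }
  return { mcpServers };
}
```

If you already have other settings in claude_desktop_config.json, merge the returned mcpServers object into the existing file rather than overwriting it.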
Verify the merge
Open a new session, check the tool list, and confirm the user servers landed without duplicating a builtin name. Fazm logs every server it loads in /tmp/fazm.log. Claude Desktop logs to ~/Library/Logs/Claude/mcp.log.
Rebuild only the MCP layer when you want new behavior
Need a Jira tool? Write or fork a Jira MCP server and point your host at it. You never need to touch the host source, the agent loop, or any framework to get the new capability.
Old path vs the April 2026 path
| Feature | Fork a Python framework | Host + user MCP servers |
|---|---|---|
| Adding a new capability | Fork the framework, write a tool in its tool schema, rebuild, redeploy, hope the host still works. | Add one entry to ~/.fazm/mcp-servers.json. No rebuild, and no restart beyond opening a new session. |
| Reusing work across hosts | A CrewAI tool does not run under LangGraph without a rewrite. A LangGraph tool does not run under ADK without a rewrite. | Same MCP server runs under Fazm, Claude Desktop, Cline, Zed, Continue, Google ADK, OpenAI Agents SDK 0.4+. |
| Distribution | Tool ships inside the agent codebase. Distribution means maintaining a fork for every host. | MCP server is a separate repo. Users install by pasting a config block. |
| Desktop control surface | pyautogui bindings, framework-specific browser tools, everything reinvented per framework. | macos-use ships native in the host; you add filesystem, github, postgres, notion, etc. as MCP entries. |
| Permissioning | Framework has to replicate every permission flow itself, usually badly. | Host owns the permission prompts (accessibility, screen recording, OAuth). MCP servers see a scoped capability. |
The anchor fact, in one sentence
On 2026-04-20, Fazm 2.4.0 started reading ~/.fazm/mcp-servers.json in every session and merging whatever is there with the five builtins declared at acp-bridge/src/index.ts line 1266. That is the single change that pulls open source desktop agents out of the framework-leaderboard era and into the host-extensible MCP era. The exact code is 36 lines at acp-bridge/src/index.ts 1102-1137.
A 10-minute April 2026 setup
- Install Fazm 2.4.0+ from fazm.ai or build from source at github.com/mediar-ai/fazm
- Grant Accessibility and Screen Recording permission on first launch
- Confirm the five builtins load: tail -f /tmp/fazm.log | grep mcp_servers_available
- Create ~/.fazm/mcp-servers.json with at least a filesystem entry pointed at your active project directory
- Open a new session and verify the user server shows up tagged user in the logged mcp_servers_available event
- Iterate: add a github, postgres, or notion MCP, toggle with the enabled flag
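The last step, toggling a server with the enabled flag, can be scripted instead of hand-edited. The sketch below works on the file's text so it stays independent of where the file lives; setServerEnabled is a hypothetical helper, and it assumes the bare name-to-entry layout described earlier.

```typescript
// Flip the `enabled` flag for one server and return the updated file text.
// Throws if the named server is not declared, so typos surface immediately.
function setServerEnabled(configText: string, name: string, enabled: boolean): string {
  const config = JSON.parse(configText) as Record<string, { enabled?: boolean }>;
  if (!(name in config)) {
    throw new Error(`no server named "${name}" in config`);
  }
  config[name] = { ...config[name], enabled };
  return JSON.stringify(config, null, 2);
}
```

Write the returned text back to ~/.fazm/mcp-servers.json and open a new session for the change to take effect.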
The quiet win: 0 framework forks
Under the old path, adding a Jira capability to your agent meant forking whichever framework you used, writing a tool in its schema, rebuilding the app, and hoping upstream updates did not break the fork. Under the April 2026 path, you write (or install) a Jira MCP server in its own repo, paste a 5-line block into ~/.fazm/mcp-servers.json, and open a new session. Zero framework forks, zero host rebuilds, zero agent-loop changes. That is the actual productivity delta this month, and it is buried under a pile of star-count listicles that do not mention it.
Frequently asked questions
What changed for open source AI desktop agents in April 2026?
Two things shifted at the same time. First, Fazm 2.4.0 shipped on 2026-04-20 with custom MCP server support: the desktop app now reads ~/.fazm/mcp-servers.json on every session and merges whatever the user declares there with the five MCP servers bundled in the signed .app. Second, Fazm's ACP bridge was upgraded to Claude agent protocol v0.29.2 on the same date. Together those two changes mean the question 'which open source desktop agent should I pick' is no longer 'which Python framework do I want to write glue code against'; it is 'which MCP servers do I want my host client to load'.
Which five MCP servers does Fazm bundle by default?
Line 1266 of acp-bridge/src/index.ts declares the set: fazm_tools, playwright, macos-use, whatsapp, google-workspace. fazm_tools is the in-process dispatch layer for actions that live in the Swift app (send message, open pop-out, query the local index, run the chat observer). playwright is the Playwright MCP bridge extension for browser control. macos-use is the native Mach-O binary at Contents/MacOS/mcp-server-macos-use inside the signed bundle, driving any Mac app through AXUIElement accessibility calls. whatsapp is a second native binary (Contents/MacOS/whatsapp-mcp) that talks to the WhatsApp Catalyst app through accessibility APIs. google-workspace is a Python MCP server bundled under Contents/Resources/google-workspace-mcp/ that exposes Gmail, Calendar, and Drive. All five ship in the notarized .app, so they work on first launch without any install step.
How does custom MCP server support actually work under the hood?
Every session the ACP bridge calls buildMcpServers() in acp-bridge/src/index.ts. That function starts with the five builtins, then at lines 1102 to 1137 it reads ~/.fazm/mcp-servers.json if the file exists. The format mirrors Claude Code's mcpServers dictionary: each key is a server name, and the value is an object with command (required), optional args, optional env, and an optional enabled boolean. Entries with enabled:false are skipped, and entries missing a command are logged and skipped. Everything else is appended to the server list and handed to the Claude agent for the session. The Settings UI writes to the same JSON file, so adding a server from the UI and pasting one in by hand are equivalent.
Which open source MCP servers are worth plugging in on day one?
The ones that do something you cannot already do through the five builtins. macos-use, playwright, and google-workspace are already there, so you do not need to redeclare them. Good additions in April 2026: filesystem MCP for controlled project access, a GitHub MCP for repo browsing, a postgres MCP for local dev databases, a Notion MCP if you keep docs there, a Linear MCP for tickets, a Figma MCP for design lookups, and any of the vertical MCP servers from the MCP ecosystem catalog. The rule is the same as picking any open source tool: prefer ones that are in active maintenance, license-clean, and scoped to one domain.
Why is MCP the right seam for open source desktop agents, not a Python framework?
A Python framework binds your agent to a specific runtime, a specific harness, and a specific client. If you write a tool for CrewAI you still need to rewrite it to run under LangGraph or OpenAI Agents SDK or Claude Code. MCP is a protocol, not a framework: the same MCP server runs under Claude Desktop, Cline, Zed, Continue, and Fazm without modification. April 2026 confirms the direction: Google's Agent Development Kit shipped with MCP support out of the box on 2026-04-09, the OpenAI Agents SDK added MCP tool-use in 0.4 on 2026-04-05, and Fazm 2.4.0 opened its own host to third-party MCP servers on 2026-04-20. When every major agent host speaks MCP, the open source work that keeps its value is the MCP server layer, not framework-specific glue.
How is this different from screenshot-based desktop agents like browser-use or UI-TARS?
Screenshot-based agents render the screen to pixels and then pass them to a vision model. That costs an image budget per step (Claude downscales images to 1568px on the longest edge before upload, and the cost per frame is much higher than a text payload), and reliability depends on vision OCR holding up to dark mode, high-DPI, and app-specific theme quirks. Accessibility-API agents read structured data directly: AXUIElement returns role, title, value, focused state, frame, and children as text. Fazm's macos-use server defaults to the accessibility path and only captures screenshots when visual reasoning is actually required (a PDF figure, a design canvas). That means lower token cost per step, faster responses, and survival through theme changes. The open source desktop agents worth betting on this month are the ones that either default to AX/ATK trees or can fall back to them when vision is overkill.
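To illustrate why the accessibility path is cheap, the sketch below flattens a tree of nodes carrying the fields named above (role, title, value, focused state, frame, children) into indented text lines. The AxNode shape and the output format are hypothetical and chosen for readability; they are not the wire format macos-use actually emits.

```typescript
// Hypothetical node shape mirroring the fields named above;
// not the actual macos-use output format.
interface AxNode {
  role: string;
  title?: string;
  value?: string;
  focused?: boolean;
  frame: { x: number; y: number; width: number; height: number };
  children?: AxNode[];
}

// Render one line per element, indented two spaces per level of depth.
function renderAxTree(node: AxNode, depth = 0): string[] {
  const parts = [node.role];
  if (node.title) parts.push(`"${node.title}"`);
  if (node.value) parts.push(`value=${node.value}`);
  if (node.focused) parts.push("focused");
  const { x, y, width, height } = node.frame;
  parts.push(`@${x},${y} ${width}x${height}`);

  const lines = ["  ".repeat(depth) + parts.join(" ")];
  for (const child of node.children ?? []) {
    lines.push(...renderAxTree(child, depth + 1));
  }
  return lines;
}
```

Each element costs one short line of text, and nothing in that line changes when the user switches to dark mode or a different display scale, which is the property the answer above relies on.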
Is Fazm itself open source?
Yes. The Fazm macOS app is MIT licensed at github.com/mediar-ai/fazm. The macos-use MCP server is MIT licensed at github.com/mediar-ai/mcp-server-macos-use under the same org. The ACP bridge is in the fazm repo under acp-bridge/. You can build Fazm yourself from source using build.sh and distribute.sh at the repo root, or use the signed and notarized release builds. The only parts that are not source-available are the Claude model weights (closed source) and the Codemagic signing certificates (revocable secrets).
If I only add one custom MCP server to ~/.fazm/mcp-servers.json this month, which should it be?
Depends on your workflow, but the highest-leverage pick for most people is a filesystem MCP pointed at your active project directory. It gives the agent a bounded, inspectable view of a folder without granting full disk access, which pairs well with macos-use for UI-level actions and playwright for browser-level actions. A config example: {"fs": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/projects/current"], "enabled": true}}. Drop that at ~/.fazm/mcp-servers.json and the server is available on the next session. If you already have that, the next best picks in April 2026 are a GitHub MCP for repo navigation and a postgres MCP for local dev databases.