AI agent harness vs model: what the desktop case looks like when the model is one env var
Every harness-vs-model essay published in the last six weeks argues the harness wins, then points at coding agents. Phil Schmid frames it as the operating system around the CPU. Aakash Gupta declares 2026 the year of harnesses. Adnan Masood calls it the AI control plane. Atlan and Martin Fowler list frameworks. Every one of those pieces is a developer essay about Claude Code or LangGraph.
Fazm is the desktop case in code. The model is one Settings field. The harness is everything else, and most of the work the agent does on a Mac happens in the harness, not in the weights. This page walks through the file paths.
The thesis, applied to a Mac
A computer-use agent has to do many things at once. It has to know that Mail is open. It has to know which row in the inbox table is selected. It has to click a button labeled Reply, then type into a focused text field, then press Cmd-Shift-D to send. It has to do all of that without burning a Claude context window on PNG bytes, without losing AX permission halfway through, and without forgetting which user it is acting on behalf of.
None of those concerns are model concerns. The model picks the next tool. The harness has to make every other decision: which subprocess owns this tool name, which env var routes through which provider, what happens to the screenshot bytes when they come back, whether the AX permission is still live, whether the user's skill library is mounted. The model gets the cleanest possible view of the world. The harness produces that view.
This is the same argument the essays make. The difference is where it lands. They land it on coding agents that ship as developer libraries. Fazm lands it on a signed Mac app that runs end-to-end in front of a user who never opened Terminal. The same harness layer doing all the work; a different deliverable.
Who owns each responsibility on a turn
Read this as a property table on a single tool-use round trip. The right column is what stays the same when you change the model.
| Feature | Model layer | Harness layer (Fazm) |
|---|---|---|
| Decides which tool to call next | Model. Same. | Model. Issues a tool_use block with name and args. Harness routes it. |
| Knows kAXRole, kAXTitle, kAXValue exist on a macOS window | Model never sees this; only the harness can produce a text accessibility tree. | Harness. macos-use binary registered at index.ts:1058 traverses the AX tree. |
| Strips a 500 KB base64 PNG before it bloats context | Model would happily ingest the PNG and burn ~350K tokens per turn. | Harness. index.ts:2271-2307 has zero type:'image' branches; bytes drop on the floor. |
| Routes the call through GitHub Copilot, a corporate proxy, or DeepSeek | Model has no concept of where it is hosted; that is a wiring decision. | Harness. ACPBridge.swift:391-393 reads UserDefault customApiEndpoint into ANTHROPIC_BASE_URL. |
| Caps per-session screenshots at 20 to dodge Anthropic's image-size policy | Model cannot self-cap; it would hit the per-session image limit and error. | Harness. MAX_IMAGE_TURNS = 20 at index.ts:793; counter at imageTurnCounts map. |
| Asks for macOS Accessibility permission and re-probes Finder if ambiguous | Model has no OS handle; it cannot probe permissions. | Harness. AppState.swift:433-463 disambiguates apiDisabled from cannotComplete. |
| Ships 17 reusable skills (.md files) for repeat workflows | Model only knows what the harness includes in the system prompt that turn. | Harness. Desktop/Sources/BundledSkills/ copies into ~/.claude/skills on first launch. |
| Plans 5 tool calls into the future to write an invoice in Numbers | Model. Same. (The one place it still earns its keep.) | Model. Tool-use durability over long sessions is a model-side property. |
| Detects MCP-vs-ACP envelope shape and forwards only text | Model has no idea which envelope shape the SDK chose this version. | Harness. index.ts:2282 catches direct MCP, 2287 catches ACP-wrapped. |
Only one row in the table is exclusively the model's job. The rest of the agent is the harness.
What "swap the model" actually looks like in code
Here is the section of Fazm's Swift code that prepares the environment for the acp-bridge child process, side by side with what the same swap looks like in a typical harness-as-framework world. The point is not that the framework version is bad; it is that one of these ships to a non-developer and the other one is an SDK.
Two ways to swap the model
// A typical "harness framework" approach (LangGraph / CrewAI / AutoGen sketch)
import { ChatAnthropic } from "langchain/chat_models/anthropic";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { Tool } from "langchain/tools";

class MyAgent {
  model: ChatAnthropic | ChatOpenAI;
  tools: Tool[];

  constructor(provider: "anthropic" | "openai") {
    // Provider is baked into the agent class. Swap = redeploy.
    this.model = provider === "anthropic"
      ? new ChatAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY })
      : new ChatOpenAI({ apiKey: process.env.OPENAI_API_KEY });
    // Tools defined in Python/TS, registered at construction time.
    this.tools = [new BrowserTool(), new ShellTool(), new ScreenshotTool()];
  }

  async run(prompt: string) {
    // The user runs THIS code. They are the harness author.
    return await this.model.call(prompt, { tools: this.tools });
  }
}

// To swap providers: change the import and redeploy. The user is a developer.
// To add a tool: edit code. The user is still a developer.

The Fazm side is real. ACPBridge.swift is in the public repo. The customApiEndpoint UserDefault is wired to a single text field in Settings > Advanced. The user pastes a URL; the next chat session spawns with ANTHROPIC_BASE_URL pointed at it; the model on the other end of the wire is whatever the URL resolves to.
One turn through the harness
Tracing a single user request, "reply to the latest invoice in Mail." Five actors. The model only enters the diagram twice; the harness enters it on every other line.
[Sequence diagram: one turn of "reply to the latest invoice in Mail" through the harness]
Notice the self-message on acp-bridge: "filter at index.ts:2271-2307 (text only; image branches do not exist)." That line runs whether the model is Claude Sonnet, DeepSeek-V3.2, or a corporate proxy that forwards to GPT-class weights. The screenshot bytes from any of them land in the same dead branch.
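The filter's model-agnostic behavior can be sketched as a pure function over tool-result content items. This is a sketch under stated assumptions: the `ContentItem` shape is simplified, and the real filter at index.ts:2271-2307 also handles the ACP-wrapped envelope, but the invariant is the same: only text survives.

```typescript
// Simplified MCP tool-result content shape (hypothetical; the real
// union has more variants and an ACP-wrapped form).
type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

// Sketch of the index.ts:2271-2307 invariant: text items are forwarded
// to the model; image items never reach the context window.
function keepTextOnly(items: ContentItem[]): ContentItem[] {
  return items.filter((item) => item.type === "text");
}
```

A 500 KB base64 screenshot is just another dropped item here, which is why the filter behaves identically no matter which model sits behind ANTHROPIC_BASE_URL.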
The six layers of the Fazm harness
Each card is a real file path you can open. None of these layers know what model they are talking to. The model is a sink that receives a system prompt and emits tool_use blocks; everything in the cards below is shaped before the prompt goes out and after the tool result comes back.
Process spawn + env injection
Desktop/Sources/Chat/ACPBridge.swift launches the acp-bridge Node child with a custom env. Lines 391-393 inject ANTHROPIC_BASE_URL from the customApiEndpoint UserDefault. Line 384 sets PLAYWRIGHT_USE_EXTENSION. This is the surface the user controls from Settings.
MCP server registration
acp-bridge/src/index.ts lines 1015-1141 registers fazm_tools, playwright, macos-use, whatsapp, google-workspace, then appends user MCPs from ~/.fazm/mcp-servers.json. Five processes are running before the model issues its first tool_use.
Image-byte filter
index.ts:2271-2307. Two text branches, no image branch. A 500 KB screenshot from a tool result is dropped on the floor before it touches the model context. Per-session image cap MAX_IMAGE_TURNS = 20 at line 793 backs it up.
AX permission probe
AppState.swift lines 433-463. AXUIElementCopyAttributeValue on kAXFocusedWindowAttribute. If the result is .cannotComplete, re-probe Finder, then a CGEvent.tapCreate fallback. The harness owns this; the model never sees it.
BundledSkills copy-on-launch
Desktop/Sources/BundledSkills contains 17 .skill.md files. They get copied into ~/.claude/skills the first time the app runs. ChatProvider.swift:1566-1576 lists them in the prompt as available_skills. The model loads one with the Skill tool when it needs to.
Tool-use prompt shaping
ChatPrompts.swift line 61 says 'only call browser_take_screenshot when you need visual confirmation, it costs extra tokens'. ChatPrompts.swift line 56 routes capture_screenshot for visual verification only. The harness writes the rules; the model follows them.
What the harness handles regardless of model
A flat list, in case the bento grid felt loose. Every item below keeps running when ANTHROPIC_BASE_URL points somewhere new.
Harness-side responsibilities
- Spawning the Node child process with the right env vars
- Injecting ANTHROPIC_BASE_URL from a single UserDefault key
- Registering five MCP servers and forwarding user-defined ones
- Routing tool_use blocks to the right MCP subprocess by name prefix
- Walking the macOS Accessibility tree on demand via AXUIElementCopyAttributeValue
- Handling both ACP-wrapped and direct MCP envelope shapes in tool results
- Stripping every type:'image' content item before context bloat
- Counting image turns per session and capping at 20 to dodge size policies
- Probing macOS Accessibility permission and re-probing Finder on ambiguity
- Copying 17 bundled skills into ~/.claude/skills on first launch
- Surfacing the skill list as available_skills inside the system prompt
- Emitting MCP server startup events to the chat UI for diagnostics
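The per-session image cap in the list above is small enough to sketch whole. Names are taken from the source (MAX_IMAGE_TURNS at index.ts:793, the imageTurnCounts map), but the gate function itself is a hypothetical reconstruction of the pattern, not the literal code.

```typescript
const MAX_IMAGE_TURNS = 20;
const imageTurnCounts = new Map<string, number>();

// Returns true if this session may still spend an image-bearing turn,
// incrementing its counter when it does. Sessions are independent.
function allowImageTurn(sessionId: string): boolean {
  const used = imageTurnCounts.get(sessionId) ?? 0;
  if (used >= MAX_IMAGE_TURNS) return false;
  imageTurnCounts.set(sessionId, used + 1);
  return true;
}
```

Like the byte filter, this runs entirely on the harness side: the model never sees the counter, only the refusal when the budget is spent.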
Where the model still earns its keep
This page is not the argument that models do not matter. They do. Two specific ways. First, tool-use format stability: a model that hallucinates malformed tool_use JSON breaks the round trip regardless of what the harness does. DeepSeek-V3.2 and Claude 3.5 Sonnet both pass this bar in early 2026; many smaller open models do not. The harness can validate, retry, and surface the error, but it cannot fix a model that emits invalid JSON.
Second, durability across long sessions. Fazm's observer at index.ts:1023-1024 isolates a separate session for chat-history summarization, but the main agent session can run hundreds of tool calls deep on a single task. How well the model holds the plan across that span is a model-side property. There is no way to shape it from the harness; it is in the weights.
Everything else, the part that decides whether your Mac actually does the thing you asked for, is the harness. Most of the work is the harness. That is the point.
The recommendation
If you are building an agent product, treat the model as a swappable component from day one. Hard-code one provider and you ship a new app every time a better one lands; that is a six-to-ten week treadmill in 2026. Injecting one env var is a day of developer work that turns the swap into a Settings field for the user.
If you are a non-developer running a small business and the question is "which Mac agent should I install," the question to ask is not "which model does it use." It is "what does the harness handle for me." Permission probing, AX-tree traversal, image-byte gating, MCP wiring, skill bundling, prompt shaping. Those are the layers you do not want to build yourself, and they are the layers that decide whether the agent actually drives Mail or quietly fails.
Fazm ships those layers as a signed Mac app and lets you point at any Anthropic-compatible endpoint with a Settings field. If another harness ships a better version of those layers tomorrow, install it. The argument here is structural, not brand-loyal.
Want to see the harness drive your actual workflow?
Bring a workflow you do every week. We will run it through Fazm on a call and show you which layer of the harness owns each step.
Frequently asked questions
What does 'the harness matters more than the model' actually mean for a desktop agent like Fazm?
It means the moving parts that decide whether the agent can drive Mail, Finder, Slack, or Chrome do not live inside Anthropic's Claude weights. They live in the ~2,914 lines of acp-bridge/src/index.ts and the Swift code under Desktop/Sources. The model issues tool_use calls; the harness routes them, traverses the macOS Accessibility tree, strips screenshot bytes from MCP tool results, caps per-session image turns at 20, and ships 17 .skill.md files to ~/.claude/skills on first launch. Swap the model and every one of those layers keeps working.
Where in Fazm's source does the 'model is a swappable slot' claim become real?
Desktop/Sources/Chat/ACPBridge.swift lines 391-393. The Swift code reads UserDefaults for the key 'customApiEndpoint'. If the user has filled in the Custom API Endpoint field in Settings, the harness writes it to env['ANTHROPIC_BASE_URL'] before spawning the acp-bridge child process. That single env-var injection is how a user routes through GitHub Copilot, a corporate gateway, a self-hosted DeepSeek-V3.2, or any Anthropic-compatible endpoint. Nothing else in the binary changes when the model changes.
How many MCP servers does the harness register, and where?
Five first-party servers, registered between index.ts lines 1015 and 1100. fazm_tools at line 1015 (the bridge between the chat UI and the Swift app). playwright at line 1049 (browser automation, with --image-responses omit at line 1033 and --output-mode file). macos-use at line 1058 (native AX-tree traversal binary). whatsapp at line 1068 (WhatsApp Catalyst control). google-workspace at line 1081 (Python stdio transport for Gmail, Calendar, Drive). Then a user MCP loader at lines 1102-1137 reads ~/.fazm/mcp-servers.json and appends arbitrary user-defined servers. The model on the other end of the wire never sees any of this; it sees a tool surface and decides which tool to call.
Why does this matter for a non-developer running a small business?
Because the next 'better' frontier model lands every six to ten weeks now, and a hosted agent product that hard-coded one provider has to ship a new app to use it. With Fazm the model is one Settings field. A user who got better tool-use stability out of DeepSeek-V3.2 in April 2026 typed the endpoint URL into the Custom API Endpoint field and the same Mac app, the same skills, the same MCP wiring, started routing through DeepSeek instead of Claude. Zero developer work.
What does the harness handle that no model does?
Six concrete things, all of them in the source. (1) Permission probing: AppState.swift lines 433-463 calls AXUIElementCopyAttributeValue on kAXFocusedWindowAttribute and disambiguates apiDisabled vs cannotComplete by re-probing Finder. (2) Image-byte gating: index.ts:2271-2307 drops every type:'image' content item from MCP tool results before they touch the model context. (3) Per-session image cap: MAX_IMAGE_TURNS = 20 at index.ts:793 prevents the per-session 2000px-image limit from firing. (4) Skill discovery and copy-on-launch: Desktop/Sources/BundledSkills/ ships 17 .skill.md files that get copied into ~/.claude/skills the first time the app runs. (5) MCP routing: index.ts:1015-1141 wires five processes plus user-defined ones. (6) Per-app tool-use prompting: ChatPrompts.swift line 61 tells the model 'only call browser_take_screenshot when you need visual confirmation, it costs extra tokens'.
How is Fazm different from a 'harness framework' like LangGraph, CrewAI, or AutoGen?
Those are Python libraries you build a harness with. Fazm is a harness shipped as a signed macOS app. You do not pip install anything. You download a .app, click through the AX permission prompt, optionally paste an API endpoint into Settings, and start talking. The harness ships compiled, the MCP servers are already wired, the skills are already bundled. A small business owner is not building a graph; they are running one. That is the difference between a developer framework and a consumer harness. The argument 'the harness matters more than the model' is the same argument; the deliverable is different.
Does the model still matter at all?
Yes, in two specific ways. First, tool-use format stability: a model that hallucinates malformed tool_use JSON breaks the round trip regardless of how good the harness is. DeepSeek-V3.2, Qwen 3, and Claude 3.5 Sonnet are reliable here in early 2026; smaller open-source models are not always. Second, durability over long sessions: how well the model follows instructions across hundreds of tool calls without losing the plot. The harness can shape the prompt, cap image turns, and gate context, but the planning between tool calls is the model. Where the harness wins is the rest of the stack, and the rest of the stack is most of the stack.
How do I verify all of this in the source tree?
Six anchors, all in the public Fazm repo. (1) Model swap: /Users/matthewdi/fazm/Desktop/Sources/Chat/ACPBridge.swift lines 391-393. (2) MCP registrations: /Users/matthewdi/fazm/acp-bridge/src/index.ts lines 1015-1141. (3) Image-stripping filter: index.ts lines 2271-2307. (4) Image-turn cap: index.ts line 793. (5) Playwright omit flag: index.ts line 1033. (6) Bundled skills: /Users/matthewdi/fazm/Desktop/Sources/BundledSkills/ contains 17 .skill.md files (ai-browser-profile, canvas-design, deep-research, doc-coauthoring, docx, find-skills, frontend-design, google-workspace-setup, pdf, pptx, social-autoposter, social-autoposter-setup, telegram, travel-planner, video-edit, web-scraping, xlsx).
Is the harness open source?
Yes. Fazm is on GitHub at github.com/m13v/fazm. Every line referenced on this page is in that repo. The chat bridge (acp-bridge), the macOS app (Desktop/Sources), the bundled MCP servers, and the skill library are all readable. The model is whatever Anthropic-compatible endpoint a user configures, so 'open source harness, bring your own model' is a literal description of the deployment.
Does swapping the model break the skills or the MCP servers?
No. Skills are .md files that get included in the prompt as a list of available_skills (ChatProvider.swift lines 1566-1576). The model decides whether to call the Skill tool to load one. MCP servers are subprocesses spawned by the harness; the model issues tool_use blocks with a name and args, the harness dispatches. As long as the model emits valid tool_use, the wiring is identical across providers. The only thing that breaks is when a model is bad at tool-use format stability, which is a model-side property, not a harness-side one.
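The dispatch step in that answer is pure string work, which is why it survives a model swap. A minimal sketch, assuming a `server__tool` naming convention (the separator and helper name here are hypothetical; the real routing lives in acp-bridge/src/index.ts):

```typescript
// Map a tool_use name like "playwright__browser_take_screenshot" to the
// MCP subprocess that owns it. Any model that emits a valid name gets
// identical routing; the harness never asks which weights produced it.
function routeTool(
  toolName: string,
  servers: string[]
): { server: string; tool: string } | undefined {
  for (const server of servers) {
    const prefix = `${server}__`;
    if (toolName.startsWith(prefix)) {
      return { server, tool: toolName.slice(prefix.length) };
    }
  }
  return undefined; // unknown prefix: surface an error, never guess
}
```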
Related reading
Accessibility Tree vs Screenshots
The image-stripping filter at index.ts:2271-2307 explained in detail, with the Mail and Slack flows that depend on it.
AI Agent Custom API Endpoint
The Custom API Endpoint Settings field shipped April 11, 2026. Why a single UserDefault is enough to swap the model.
AI Agent Tooling Beyond Models
What the harness layer actually contains in a desktop computer-use agent: skills, MCP servers, AX trees, prompt shaping.