Open-source LLM releases, April 2026: one env var swaps any of them into a shipping Mac agent
Every roundup for this keyword stacks DeepSeek-V3.2, Qwen 3.5, Llama 4, Gemma 4, and Mistral Large 3 by benchmark. None show the surface area a real consumer app uses to route through them. Fazm's surface area is two things: one text field in Settings and one line that writes ANTHROPIC_BASE_URL. And because the control loop feeds the model the macOS accessibility tree as text, the multimodal leaderboard everyone is excited about is not actually the right chart.
The April 2026 open-source chart, and the two lines that turn it into swaps
DeepSeek-V3.2, Qwen 3.5, Meta Llama 4, Gemma 4, and Mistral Large 3 are the five releases the top SERP keeps ranking. One Settings field and one env-var write are the entire surface area a real consumer Mac app needs to swap any of them in.
“Custom API endpoint (allows proxying through Copilot, corporate gateways, etc.)”
Desktop/Sources/Chat/ACPBridge.swift line 379 comment, April 2026
The anchor: one UserDefault, one env var
Every April 2026 open-source LLM that can be wrapped in an Anthropic-shape messages endpoint becomes a candidate for Fazm's main agent loop, because the whole routing story fits in three lines of Swift. The UI exposes a single text field. The bridge process reads one UserDefault. One line writes the env var. That is the surface area.
When the ACP bridge subprocess spawns, it inherits this env var, and the Claude Code runtime inside the bridge treats the endpoint as its base URL for every message. Put a proxy there that speaks Anthropic messages on the ingress and DeepSeek-V3.2 on the egress, and the rest of Fazm (the tools, the macos-use MCP server, the accessibility tree capture) does not know or care.
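That ingress-to-egress translation is, at its core, one request rewrite. A minimal sketch, assuming the egress is an OpenAI-compatible chat endpoint (what hosted DeepSeek-V3.2 providers typically expose); the function name and default model string are illustrative, and a real shim must also map tool definitions and tool_result blocks:

```python
def anthropic_to_openai(body: dict) -> dict:
    """Rewrite an Anthropic-shape /v1/messages body into an
    OpenAI-compatible /chat/completions body. Sketch only: tool
    definitions and tool_result blocks need the same treatment."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-compatible endpoints expect it as the first message.
    if "system" in body:
        messages.append({"role": "system", "content": body["system"]})
    for msg in body["messages"]:
        content = msg["content"]
        if isinstance(content, list):
            # Flatten content blocks to plain text; a real shim must
            # also translate tool_use / tool_result blocks here.
            content = "\n".join(
                block["text"] for block in content
                if block.get("type") == "text"
            )
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": body.get("model", "deepseek-v3.2"),  # backend model name, illustrative
        "messages": messages,
        "max_tokens": body.get("max_tokens", 4096),
        "stream": body.get("stream", False),
    }
```

The inverse direction (backend response back into Anthropic content blocks) is the same idea run backwards, plus the streaming event sequence.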
Why the multimodal leaderboard is not the relevant chart here
Most competitor desktop agents feed a screenshot to the model on every turn. That forces them to pick from the multimodal column of the April 2026 open-source leaderboard, which is a much thinner list than the text column. Fazm sends the macOS accessibility tree instead, captured through the AXUIElementCreateApplication family of APIs: role, label, value, and frame for each on-screen element as structured text. The model never sees a pixel.
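What the model actually receives is easy to picture. A sketch of that serialization, with a hypothetical dict shape standing in for the real AXUIElement capture:

```python
def serialize_ax_tree(node: dict, depth: int = 0) -> str:
    """Flatten a (hypothetical) accessibility-tree node into the kind of
    indented role/label/value/frame text a text-only model can read.
    The dict shape is illustrative; the real capture goes through the
    AXUIElementCreateApplication family of C APIs."""
    x, y, w, h = node["frame"]
    line = "  " * depth + (
        f'{node["role"]} "{node.get("label", "")}"'
        + (f' value={node["value"]!r}' if node.get("value") is not None else "")
        + f" @({x},{y} {w}x{h})"
    )
    children = [serialize_ax_tree(c, depth + 1) for c in node.get("children", [])]
    return "\n".join([line] + children)
```

A few hundred bytes of this text replaces the megabyte-scale screenshot a pixel-based agent ships on every turn.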
Screenshot-based agent vs accessibility-tree agent, on the same April 2026 open-source chart
# Input to the model on every turn
POST /v1/messages
{
  "model": "llama-4-maverick",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image", ... 1.2 MB base64 ... },
      { "type": "text", "text": "click the Save button" }
    ]
  }]
}
# You need:
# - a multimodal open-source model
# - vision tokens on every turn
# - a model that can OCR small UI text
# Your April 2026 pick list shrinks to:
# Llama 4 Maverick (native multimodal)
# ... and not much else in the open pack

This is the part that most April 2026 open-source LLM roundups cannot frame for you, because they are not written from the seat of a product whose input format changes which column of the leaderboard is relevant.
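For contrast, the accessibility-tree turn is a text payload measured in hundreds of bytes, not megabytes. A sketch (tree text, helper name, and model string all illustrative):

```python
import json

def text_turn(tree_text: str, instruction: str,
              model: str = "deepseek-v3.2") -> dict:
    """Build the text-only Anthropic-shape request for one agent turn.
    No image block anywhere, so any text model on the chart qualifies."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": f"UI state:\n{tree_text}"},
                {"type": "text", "text": instruction},
            ],
        }],
    }

tree = 'AXWindow "Untitled"\n  AXButton "Save" @(712,48 64x24)'
req = text_turn(tree, "click the Save button")
# The whole payload is a few hundred bytes, vs ~1.2 MB for the screenshot turn.
payload_bytes = len(json.dumps(req).encode())
```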
What the routing actually looks like
Every April 2026 open-source LLM on the left can drive the Fazm loop on the right, provided an Anthropic-shape shim sits in the middle. The shim is the entire compatibility layer.
April 2026 open-source LLMs → Anthropic-shape shim → Fazm
Which April 2026 open-source releases are actually usable as a Mac agent brain
Benchmark rank is not the answer. Tool-use format stability, licensing, local hardware fit, and honest hosted-token cost are. These are the practical notes per release, ordered by how close to "just works through the Custom API Endpoint field" they land today.
DeepSeek-V3.2
MIT-licensed. Strongest tool-use format stability in the April 2026 open-source pack. Token cost about one tenth of frontier Claude on hosted inference (Together, Fireworks, DeepInfra). Currently the best practical swap for Fazm's text control loop.
Qwen 3.5
122B total parameters, 10B active (MoE). Apache-licensed. The one you actually run fully offline on a 64 GB MacBook through LM Studio or Ollama. Multilingual coverage spans 119 languages, useful when accessibility-tree labels are not in English.
Meta Llama 4 Maverick + Scout
Native multimodal, MoE. Maverick is the bigger sibling. Fazm's main loop is text-only, so Llama 4's multimodal capability is unused in that loop. Still a strong text-and-tool-use candidate if you prefer Meta's license terms.
Gemma 4
Google's new MoE flagship. 26B parameters, roughly 14 GB on disk, around 85 tokens per second on consumer hardware per Google's own posts. Tool-use format tends to drift more than DeepSeek or Qwen, so shim defensively.
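"Shim defensively" concretely means normalizing arguments before they reach the runtime. A sketch of the kind of guard a shim can apply to a drifting model's tool_use input (the repair heuristics are illustrative, not exhaustive):

```python
import json

def parse_tool_args(raw):
    """Defensively normalize a model's tool_use `input`. Models with
    format drift sometimes emit arguments as a JSON string, or wrap
    them in a markdown fence, instead of a plain object. Returns a
    dict, or None if unrecoverable (a real shim would log and retry)."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        s = raw.strip()
        if s.startswith("```"):
            # Strip a ```json ... ``` fence, then drop the optional
            # language-tag line before the payload.
            s = s.strip("`")
            if "\n" in s:
                s = s.split("\n", 1)[1]
        try:
            parsed = json.loads(s)
            return parsed if isinstance(parsed, dict) else None
        except json.JSONDecodeError:
            return None
    return None
```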
Mistral Large 3
Mistral's EU-data-residency-positioned flagship. Practical pick when data residency matters more than the absolute top benchmark. Routes through the same ANTHROPIC_BASE_URL field as every other open-source swap-in.
Kimi K2.5 / Nemotron Cascade 2
Both landed in the April 2026 window and both sit below DeepSeek-V3.2 on tool-use reliability in real agent loops. Good to know about, not the first thing to try.
Doing the swap, in the terminal
This is the local-first version. Qwen 3.5 under Ollama on the same Mac, an Anthropic-shape shim on port 8766, and Fazm's Custom API Endpoint pointed at it. The shim is whichever one-file proxy you prefer; claude-relay and litellm's anthropic adapter both work.
The five-step swap, concretely
Same swap as the terminal version above, expanded into what you actually change and where. Follow top to bottom and the next time you open Ask Fazm, the agent loop runs on the April 2026 open-source model of your choice.
1. Run the open-source model locally or on a cheap VPS
Pick one. Qwen 3.5 via LM Studio or Ollama on your own Mac for offline. DeepSeek-V3.2 via a hosted inference provider (Together, Fireworks, DeepInfra) if you want frontier reasoning without a 200 GB download. Meta Llama 4 or Gemma 4 if you prefer their licenses.
2. Put an Anthropic-shape shim in front of it
litellm's anthropic adapter, claude-relay, or any one-file Python FastAPI proxy that exposes POST /v1/messages and returns the Anthropic streaming event sequence. The shim translates Anthropic messages into whatever your model actually speaks (Ollama-shape JSON, OpenAI-compatible, vLLM endpoint, etc.). Bind it to localhost:8766.
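The streaming half of that contract is a fixed event sequence. A sketch of the minimal Anthropic-shape SSE stream for a plain text reply, following the publicly documented event names (message_start through message_stop); tool_use streaming and usage accounting are omitted:

```python
import json

def anthropic_sse(text: str, model: str = "shim-backed-model"):
    """Yield the minimal Anthropic-shape SSE event sequence for a plain
    text reply. Sketch only: a real shim also needs tool_use blocks,
    usage accounting, and error events."""
    def ev(name, data):
        return f"event: {name}\ndata: {json.dumps(data)}\n\n"
    yield ev("message_start", {"type": "message_start",
             "message": {"role": "assistant", "model": model, "content": []}})
    yield ev("content_block_start", {"type": "content_block_start", "index": 0,
             "content_block": {"type": "text", "text": ""}})
    # A real shim streams deltas as the backend produces tokens;
    # here we chunk the finished string for illustration.
    for i in range(0, len(text), 16):
        yield ev("content_block_delta", {"type": "content_block_delta",
                 "index": 0,
                 "delta": {"type": "text_delta", "text": text[i:i + 16]}})
    yield ev("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield ev("message_delta", {"type": "message_delta",
             "delta": {"stop_reason": "end_turn"}})
    yield ev("message_stop", {"type": "message_stop"})
```

Shims that collapse this into a single JSON blob are what break the client-side typing indicator.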
3. Open Fazm Settings and flip the Custom API Endpoint toggle
SettingsPage.swift line 840 stores it as UserDefaults key customApiEndpoint. Paste http://localhost:8766 into the field. The UI placeholder is https://your-proxy:8766. The toggle lives next to the Claude Account selector in the AI Chat settings card.
4. Let Fazm restart the ACP bridge
ChatProvider.restartBridgeForEndpointChange (ChatProvider.swift around line 2100) stops the bridge so the new endpoint is picked up on next query. Next time you hit Ask Fazm, ACPBridge.swift line 381 writes your endpoint into ANTHROPIC_BASE_URL and the bridge subprocess spawns with your proxy in the path.
5. Sanity-check against a small task first
Ask Fazm to read the current selection from the foreground app. That exercises the macos-use MCP server and one round of tool calls. If the shim mangles the tool_use JSON, you will see it fail here before you get to anything destructive. Once this works, move up to multi-step tasks.
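The steps above can be smoke-tested offline before the first real query. A sketch of the kind of check the runtime effectively performs on each tool_use block, against a simplified input_schema (the schema and helper are illustrative):

```python
def check_tool_use(block: dict, schema: dict) -> list:
    """Return a list of problems with a tool_use content block, judged
    against a simplified JSON-Schema-style input_schema. An empty list
    means the block would likely survive the runtime's parsing."""
    problems = []
    if block.get("type") != "tool_use":
        problems.append("block type is not tool_use")
    args = block.get("input")
    if not isinstance(args, dict):
        problems.append("input is not a JSON object")
        return problems
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, spec in schema.get("properties", {}).items():
        if (key in args and spec.get("type") == "string"
                and not isinstance(args[key], str)):
            problems.append(f"argument {key} should be a string")
    return problems
```

Run your shim's first few tool_use blocks through a check like this and you see drift before the agent loop does.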
What the Anthropic-shape shim has to do
The shim is the whole compatibility story. If it gets any of these wrong, Fazm's symptom is either a stuck session or a silently truncated context, both of which surface in the ACPBridge stderr as visible breadcrumbs in Sentry.
Minimum viable shim surface
- Serve POST /v1/messages with the Anthropic message format
- Accept system prompt, messages array, tools array with JSON Schema input_schema
- Emit tool_use content blocks with structured JSON arguments
- Handle tool_result content blocks coming back from the client
- Stream responses as content_block_delta events, not a single JSON blob
- Respect max_tokens and the 200k Claude-shaped context envelope or flag truncation loudly
- Return proper HTTP 429 on rate limits so ACPBridge can retry rather than stall
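The last bullet is worth a sketch, because stalls are the symptom users actually see. A transport-agnostic retry loop that honors Retry-After on 429 (the callable-injection shape is illustrative, not Fazm's actual code):

```python
import time

def post_with_retry(send, max_attempts: int = 4, base_delay: float = 0.5,
                    sleep=time.sleep):
    """Retry a request on HTTP 429 with exponential backoff, honoring a
    Retry-After header when the server sends one. `send` is any callable
    returning (status, headers, body); injecting it keeps this sketch
    transport-agnostic and testable."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Prefer the server's hint, else back off exponentially.
        delay = float(headers.get("Retry-After", base_delay * (2 ** attempt)))
        sleep(delay)
    return status, body
```

A shim that swallows 429s and returns a 500 instead turns a transient rate limit into a stuck session.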
Fazm's open-source path vs typical screenshot-based agents
The point of the table below is not that Fazm is better. It is that the keyword "open-source LLM releases April 2026" means something different from each seat, because the input format to the model is different.
| Feature | Typical screenshot agent | Fazm (accessibility tree) |
|---|---|---|
| Works with any April 2026 open-source LLM | Usually no, tied to a vendor's hosted API | Yes, through ANTHROPIC_BASE_URL + Anthropic-shape shim |
| Needs a multimodal model | Yes, screenshot agents need vision SKUs | No, accessibility tree is text |
| Runs fully offline with Qwen 3.5 / Ollama | Rare, most require a cloud round trip | Yes, with a local shim on port 8766 |
| Swap model without an app release | Usually requires a new build | Yes, one text field change |
| Passes tool_use JSON through unchanged | Often rewritten by the vendor middleware | Depends on the shim, but the app-side path is unchanged |
| Open-source weights required for privacy tier | Hosted-only in practice | Yes, DeepSeek-V3.2, Qwen 3.5, Gemma 4 all qualify |
Drive your Mac with any April 2026 open-source LLM
Install Fazm, flip the Custom API Endpoint toggle, paste your proxy URL, and the next Ask Fazm query runs on the open-source model of your choice. No new build, no rewrite, no pixel piping.
Download Fazm →

Frequently asked questions
What were the notable open-source LLM releases in April 2026?
The short list is DeepSeek-V3.2 (MIT, strong reasoning and tool-use), Qwen 3.5 (Apache, 122B total parameters with 10B active via MoE, runs on a 64 GB MacBook), Meta Llama 4 Maverick and Scout (native multimodal, MoE), Google Gemma 4 (26B MoE flagship, about 14 GB on disk, roughly 85 tokens per second on consumer hardware), and Mistral Large 3 (the company's EU-residency-positioned flagship). Most recap pages stop at the benchmark table. The question this guide covers is what any of those releases actually looks like when you try to pipe it into a shipping consumer Mac app.
Can I drive a Mac agent with an April 2026 open-source LLM through Fazm?
Yes, as long as you put an Anthropic-shape messages API in front of it. Fazm's main control loop reaches the model through the ACP bridge, which reads the customApiEndpoint UserDefault (see Desktop/Sources/MainWindow/Pages/SettingsPage.swift line 840) and injects it as ANTHROPIC_BASE_URL at Desktop/Sources/Chat/ACPBridge.swift line 381. Point that URL at a local proxy that translates Anthropic messages requests into whatever protocol your open-source model speaks (Ollama, vLLM, TGI, LM Studio, llama.cpp), and the rest of the app (tool calls, the macos-use MCP server, the accessibility tree capture) keeps working unchanged.
Why does accessibility-tree input change which April 2026 open-source model you should pick?
Most other desktop agents send screenshots to their LLM, so they are forced to run on a multimodal model. That rules out most April 2026 open-source releases unless you pick Llama 4 Maverick or another native-multimodal SKU. Fazm sends the macOS accessibility tree as structured text (role, label, value, coordinates for each element, captured through AXUIElementCreateApplication). The model never sees a pixel. That means any strong text-and-tool-use open-source model, including DeepSeek-V3.2 and Qwen 3.5 which are text-first, becomes a first-class candidate. The most hyped multimodal benchmarks on the April 2026 open-source charts are not the relevant axis for this kind of agent.
What exactly does the Anthropic-shape shim need to expose?
At minimum it needs to serve POST /v1/messages, accept the Anthropic message format (system prompt, messages array with role/content, tools array with JSON Schema input_schema, tool_use and tool_result content blocks), and return either a final message or a stream of events that the Claude Code runtime understands. The shim translates that into whatever the underlying open-source model speaks (OpenAI-shape JSON for Ollama, Qwen's native format, vLLM's OpenAI-compatible endpoint, etc.). Projects like Ollama with the claude-relay pattern, litellm's anthropic adapter, and any of the one-file Python proxies on GitHub are enough for a weekend experiment. Production use is a different story because Fazm's agent loop fires many tool calls per minute and shim failures surface as stuck sessions.
Does Fazm currently ship with DeepSeek or Qwen wired in by default?
No. ShortcutSettings.defaultModels (Desktop/Sources/FloatingControlBar/ShortcutSettings.swift lines 151-155) hardcodes the three Claude tiers: Haiku, Sonnet, Opus. Gemini 2.5 Pro is wired into a separate passive screen-observer loop (GeminiAnalysisService.swift line 67: private let model = 'gemini-pro-latest'). Any April 2026 open-source model gets in through the Custom API Endpoint field in Settings, which the SettingsPage UI labels as an advanced setting for proxies, corporate gateways, and custom shims. The moment you flip the toggle on and paste your proxy URL, Fazm restarts the bridge (ChatProvider.restartBridgeForEndpointChange) and the next query runs against whatever open-source model sits behind that URL.
Which April 2026 open-source release is the most practical for a Mac agent right now?
DeepSeek-V3.2 is the current working answer, for three concrete reasons. First, its tool-use format stability is the best of the open-source pack, which matters more than raw reasoning score when the model spends its life emitting tool_use blocks with structured JSON arguments. Second, it is released under MIT, so a self-hosted shim is legally simple for commercial use. Third, its token cost on a hosted inference provider (Together, Fireworks, DeepInfra) is roughly an order of magnitude below frontier Claude, which directly addresses the real-world rate-limit problem Fazm hit in April 2026 with Opus 4.6 and that agent loops hit generally. Qwen 3.5 is second, mainly because at 10B active parameters it runs on a 64 GB MacBook locally, which matters for the privacy-first user segment.
What breaks when you swap in an open-source model through a shim?
Three things, in order of annoyance. Tool-use format drift: open-source models occasionally emit malformed tool_use arguments, and the Claude Code runtime rejects those as non-retryable, which shows up as a stuck session in the ACPBridge stderr log. Context-window mismatches: Claude Sonnet 4.6 expects a 200k context window for long sessions, and if your shim silently truncates to the open-source model's limit, the agent starts forgetting mid-task. Streaming event shape differences: Anthropic-shape streaming uses a specific sequence of content_block_delta events. Shims that approximate this break the UI's typing indicator. None of these are fatal, but all three are why open-source models are not currently Fazm's default.
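The context-window mismatch in particular is cheap to flag instead of hitting silently. A sketch of a pre-flight estimate a shim could run before forwarding (the chars-per-token heuristic is a rough illustration, not a tokenizer):

```python
def flag_truncation(messages: list, model_ctx_tokens: int,
                    chars_per_token: int = 4):
    """Rough pre-flight check a shim can run before forwarding: estimate
    the prompt's token count (chars/4 heuristic, illustrative only) and
    return a loud warning string rather than silently truncating."""
    total_chars = 0
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):
            total_chars += len(content)
        else:
            for block in content:
                total_chars += len(block.get("text", ""))
    est_tokens = total_chars // chars_per_token
    if est_tokens > model_ctx_tokens:
        return (f"context overflow: ~{est_tokens} tokens estimated, "
                f"backend model holds {model_ctx_tokens}")
    return None
```

Surfacing that string as an error beats the alternative, which is the agent quietly forgetting the start of its task.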
Can I run an April 2026 open-source LLM fully offline on my Mac with Fazm?
Yes, with the right pair of tools. Run Qwen 3.5 via LM Studio or Ollama locally, put a shim like claude-relay or litellm (configured with the anthropic adapter) in front of it on localhost:8766, and paste http://localhost:8766 into Fazm's Custom API Endpoint field. The main agent loop now runs fully offline against Qwen. The separate Gemini screen-observer loop still reaches out to Google for multimodal analysis, but that loop is opt-in through Session Recording and can be left disabled. On a 64 GB M-series MacBook, Qwen 3.5 at 10B active parameters plus Fazm's accessibility-tree input keeps the agent loop interactive for the kinds of tasks where you previously used Sonnet.
Does Fazm support the April 2026 Llama 4 native-multimodal release any differently from text-only open-source models?
Not by default. Fazm's main agent loop does not send images to the model, so Llama 4 Maverick's multimodal capability is unused in that loop. If you wanted Llama 4 to drive a screen-observer loop comparable to the Gemini 2.5 Pro one in GeminiAnalysisService.swift, you would need to build that loop yourself, because the screen-observer path does not flow through ANTHROPIC_BASE_URL, it calls Google's generative AI SDK directly. For the text-only control loop, picking Llama 4 over Qwen 3.5 or DeepSeek-V3.2 is a tradeoff between token cost, tool-use format reliability, and licensing terms, not a multimodal question.
Why should I trust this more than the benchmark roundups on the top of the SERP?
Because everything on this page is traced to a file and a line in Fazm's shipping desktop codebase, not to a press release. The Custom API Endpoint toggle is at SettingsPage.swift line 840 with the UI at lines 906-950. The env-var injection that routes every Claude-shaped request through the proxy is one line, ACPBridge.swift line 381. The default model list that the picker ships with is ShortcutSettings.swift lines 151-155. The passive screen observer that stays on Gemini and does not touch the open-source path is GeminiAnalysisService.swift line 67. Those references are the difference between what the April 2026 leaderboards say and what a real product actually does with those releases.
Every source reference in this guide points at a real file and line in the Fazm desktop codebase as of April 2026. If you want to verify any of them in your own checkout: Desktop/Sources/Chat/ACPBridge.swift, Desktop/Sources/MainWindow/Pages/SettingsPage.swift, and Desktop/Sources/FloatingControlBar/ShortcutSettings.swift.