A view from the consumer-surface layer

The weights got loud. The consumer agents stayed quiet.

April 2026 brought a fresh wave of open weight releases: Llama, Qwen, DeepSeek, Gemma, Mistral, every one of them a real improvement. And yet the fastest way for a non-developer to use any of them on a Mac is still a terminal command, a Python virtualenv, or a closed-source desktop wrapper. This piece is about that gap, written from inside an MIT-licensed consumer Mac agent.

Matthew Diakonov · 11 min read · from Shipping on macOS, MIT-licensed
  • Fazm is MIT-licensed, source on GitHub
  • The model list is dynamic, not a hard-coded Swift array
  • The family map has 4 rows today, substring-matched
  • An unknown model ID is still selectable, at sort order 99

The anchor fact

Fazm’s model catalog is 0 rows of Swift

Not a hard-coded list of IDs. A substring-matched table in ShortcutSettings.swift at line 159. Any model ID that does not match one of the four substrings falls to the else branch at line 189 and arrives in the picker at sort order 99. That is the seam where an open weight model would one day sit, the day a second bridge exists.

A tight recap of the month’s open weight side

The pattern has held since mid-2025. Every major lab now treats open weights as a standing release track. April 2026 continued it. Meta pushed another point release of Llama. Alibaba shipped a fresh Qwen3 checkpoint. DeepSeek shipped incremental V3 weights. Google pushed a new Gemma drop aimed at on-device. Mistral shipped a small reasoner. The benchmark gap between the best open weight model and the best closed model is the narrowest it has ever been.

That is the part every other article about this topic covers, usually with bigger benchmark tables. There is no reason to repeat it here. What no one covers is the second layer.

The open weight families in play this month

Family | Lab
Llama | Meta
Qwen | Alibaba
DeepSeek | DeepSeek
Gemma | Google
Mistral | Mistral AI
Phi | Microsoft
Yi | 01.AI
Command R | Cohere

Open source is a two-layer word

When a founder says “open source LLM,” they almost always mean the weights. That is one layer. There is a second: the consumer surface that lets a human who does not write code actually hold the model. Runtime. Daemon. Picker. Screen grounding. Update channel. Most open weight news skips all five.

1. Weights

The download on HuggingFace. April 2026 has several fresh entries. Llama, Qwen, DeepSeek, Gemma, Mistral. This is the only layer most coverage names.

2. Runtime

llama.cpp, MLX, vLLM, Ollama. What actually turns a .safetensors file into a running endpoint on a Mac. Mature. Open source. Mostly done.

3. Daemon

The process that stays alive. Accepts prompts. Returns tokens. For Fazm this is the ACP bridge, a Node subprocess. An open weight equivalent would be a second bridge to a local runtime.

4. Model picker

The UI a non-developer clicks to choose a model. Fazm ships one. It reads a @Published availableModels array. It substring-matches IDs against a 4-row table.

5. Screen grounding

How the agent sees your screen. Fazm uses accessibility APIs, not screenshots. This is the part no open weight model ships with, because it is not a model problem.

6. Update channel

How new code, new skills, new bridges reach end users. App Store or Sparkle, plus a GitHub releases feed for those who want to build from source. MIT license means you can.

The 10-line function that absorbs a new model

Every new Claude model that Fazm learned about in April 2026 came through this function. The same function would serve an open weight model on the day a second bridge exists. It lives in acp-bridge/src/index.ts and begins at line 1271.

acp-bridge/src/index.ts

Three moving parts. A filter. A diff. A single IPC message. The Swift side subscribes to models_available and rebuilds the picker. This shape does not care whether the upstream is Anthropic’s SDK or a local runtime. It just carries a list of model IDs with names.
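As a sketch, the ten-line shape that paragraph describes looks like this in TypeScript. This is a reconstruction from the description, not the repo's code; everything beyond the names emitModelsIfChanged and 'models_available' is an assumption.

```typescript
type ModelInfo = { modelId: string; name: string; description?: string };

// Memoized previous list, stringified once per emit.
let lastModelsJson = '';

// The three moving parts the article names: a filter, a diff, one IPC message.
function emitModelsIfChanged(
  models: ModelInfo[],
  send: (msg: { type: string; models: ModelInfo[] }) => void,
): boolean {
  // Filter: drop the 'default' pseudo-ID if present.
  const filtered = models.filter((m) => m.modelId !== 'default');
  // Diff: compare against the memoized previous string.
  const json = JSON.stringify(filtered);
  if (json === lastModelsJson) return false; // no real change, no IPC
  lastModelsJson = json;
  // One IPC message; the Swift side subscribes and rebuilds the picker.
  send({ type: 'models_available', models: filtered });
  return true;
}
```

Calling it twice with the same effective list fires exactly one message, which is what keeps the picker from re-rendering on every poll.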

Where a second bridge would plug in

Today only the Claude bridge feeds the picker. The diagram below shows how a second bridge, one that talks to a local runtime holding an open weight model, would fan into the same hub. No new Swift code. No new picker. No new render path.

The one seam that would accept an open weight model

Claude bridge / Local runtime bridge / Hosted open weight bridge
    → emitModelsIfChanged
    → Floating bar picker / Settings picker / Shortcut list

The 4-row family map, and the fifth row that would name an open weight

Fazm does not call Sonnet "Sonnet." It calls it Fast. Opus is Smart. Haiku is Scary. That relabeling is not hard-coded against version strings. It is a substring match against a 4-row table. An open weight family would take exactly one new row to get a clean label.

Desktop/Sources/FloatingControlBar/ShortcutSettings.swift

The detail that matters for shipping: the else branch at line 189 assigns sort order 99 to any model ID whose substring is not in the map. The model is not hidden. It is not behind a flag. It is in the picker, at the bottom, under its raw name. A fifth-row label is polish, not a gate. The agent works on day zero with an unknown family.
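The real table is Swift, in ShortcutSettings.swift. For illustration only, here is the same match-then-fallback shape modeled in TypeScript; the four rows are the ones this article names, and labelFor is a hypothetical helper, not a name from the repo.

```typescript
// Each row: [substring, shortLabel, familyLabel, sortOrder].
const modelFamilyMap: [string, string, string, number][] = [
  ['haiku', 'Scary', 'Haiku', 0],
  ['sonnet', 'Fast', 'Sonnet', 1],
  ['opus', 'Smart', 'Opus', 2],
  ['default', 'Smart', 'Opus', 2],
];

type PickerEntry = { label: string; sortOrder: number };

// Substring match against the table; unknown IDs fall through to
// the raw-name, sort-order-99 branch the article describes.
function labelFor(modelId: string): PickerEntry {
  for (const [substring, shortLabel, familyLabel, sortOrder] of modelFamilyMap) {
    if (modelId.includes(substring)) {
      return { label: `${shortLabel} (${familyLabel}, latest)`, sortOrder };
    }
  }
  return { label: modelId, sortOrder: 99 }; // still selectable, just last
}
```

So labelFor('claude-sonnet-4-5') yields a Fast label at sort order 1, while an unmatched ID keeps its raw name at sort order 99.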

4 rows in Fazm's model family map
5 rows needed to cover Llama + Qwen + DeepSeek + Gemma + Mistral
99, the sort order the picker assigns to an unknown model family
1 IPC message per real model change

Lifecycle: from HuggingFace to your picker

Six steps between an open weight lab publishing new weights and a user clicking the model in a floating bar. Steps 4, 5, and 6 are what the bridge-plus-family-map architecture already solves. Step 3 is the one piece the team has not built yet.

The route a new open weight model would take

1. An open weight lab publishes new weights

Say Qwen3-Next or Llama 4.1. The .safetensors and tokenizer files land on HuggingFace. The runtime community (llama.cpp, MLX) ships a compatible build within days.

2. A local runtime exposes the model over a socket

Ollama, LM Studio, or a raw llama.cpp server binds to a local port and serves /v1/chat/completions or a native protocol. The model is now a process on your machine.

3. A second ACP-shape bridge speaks to that runtime

This is the piece Fazm does not ship today. The shape is known: a subprocess that, on session/new, returns result.models.availableModels. Everything downstream of that is already built.

4. emitModelsIfChanged fires once

Same code path as Claude today: acp-bridge/src/index.ts line 1271. Drops the 'default' pseudo-ID if present. Stringifies. Diffs. Sends 'models_available' over stdout. Every surface in Swift subscribes.

5. updateModels routes through the family map

ShortcutSettings.swift line 180. A new row in the map, one line like ('llama', 'Local', 'Llama', 3), turns a llama-4-70b-mlx ID into a picker entry labeled 'Local (Llama, latest)'. No hard-coded version string.

6. The picker shows the open weight model

Floating bar. Settings panel. Keyboard shortcut list. All three surfaces read the same @Published array. One publish. One render. A user clicks. Their next prompt routes through the new bridge and the new runtime.
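Step 3 is the unbuilt piece, but its core translation is small. Here is a sketch of that translation, assuming the local runtime exposes an OpenAI-compatible /v1/models response; the endpoint shape and field names are assumptions, not Fazm code.

```typescript
// What a /v1/models response from an OpenAI-compatible local runtime
// (Ollama, LM Studio, llama.cpp server) roughly looks like.
type RuntimeModel = { id: string };
type ModelInfo = { modelId: string; name: string };

// Hypothetical core of a second bridge: translate the runtime's model
// list into the models_available shape the Swift side already reads.
function toModelsAvailable(resp: { data: RuntimeModel[] }): {
  type: 'models_available';
  models: ModelInfo[];
} {
  return {
    type: 'models_available',
    models: resp.data.map((m) => ({ modelId: m.id, name: m.id })),
  };
}
```

Everything after this translation, the diff, the IPC send, the picker rebuild, is the pipeline the article says is already shipped.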

Two ways a consumer app can model its catalog

Most desktop AI apps hard-code their model list. A new weight lands, users wait for the next app update. The other approach, the one Fazm uses, is a bridge that reports whatever the SDK knows, and a substring-matched family map. The consumer-facing difference shows up the first Tuesday a lab ships a new model.

Hard-coded catalog vs. dynamic absorption

The model list is a static enum in Swift. A new release means a new app build. The user either waits or installs a TestFlight build.

  • Hard-coded Swift enum of allowed model IDs
  • New weight = new app version = new review or Sparkle update
  • Unknown IDs are filtered out of the picker
  • A non-developer cannot select a just-released model day zero

Dynamic absorption inverts every point:

  • The bridge reports whatever the upstream SDK returns
  • New weight = one 'models_available' IPC message, no app build
  • Unknown IDs land in the picker at sort order 99
  • A non-developer can select a just-released model on day zero

Verify any of this yourself

None of the claims above depend on being inside the codebase. The repo is public. The files are named. The line numbers resolve. Here is the check anyone can run on their own machine in under three minutes.

verify.sh
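A sketch of such a script. The repo URL, file paths, and line numbers come from this article and are assumed correct here, not re-verified.

```shell
# verify.sh - the three-minute check, using the paths and line
# numbers named in this article (assumptions, not re-verified).
REPO=fazm
if [ ! -d "$REPO" ]; then
  HINT="repo not found; run: git clone https://github.com/m13v/fazm"
  echo "$HINT"
else
  # 1. the MIT license line in the README
  grep -n "MIT" "$REPO/README.md" | head -n 3
  # 2. the 4-row family map and the sort-order-99 fallback
  sed -n '155,195p' "$REPO/Desktop/Sources/FloatingControlBar/ShortcutSettings.swift"
  # 3. emitModelsIfChanged and its call sites
  grep -n "emitModelsIfChanged" "$REPO/acp-bridge/src/index.ts"
fi
```

Clone, run, read. If the line numbers have drifted since this was written, the grep in step 3 still lands you on the function.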

The consumer surface, side by side

The table below frames Fazm against the shape of most desktop AI products shipping in April 2026. The point is not to call anyone out. It is to show which choices make an open weight model easy to plug into, and which make it hard.

Feature | Typical closed Mac AI app | Fazm
License | Closed, desktop only | MIT, source on GitHub
Model list | Hard-coded dropdown, updated per release | Dynamic via ACP bridge, 4-row family map
Unknown model ID | Hidden until a new build ships | Still selectable at sort order 99
Screen grounding | Screenshots, browser only | Accessibility APIs, any Mac app
Open weight support | Not exposed to end users | Second bridge plugs into the existing pipeline

What actually moves through memory when a new model arrives

A short sequence, for people who want to trace the specific hops. Each scene below is a real line of code in the repo.

From bridge warmup to a selectable row


Bridge warmup

acp-bridge/src/index.ts preWarmSession fires on boot. Three sessions: main, floating, observer. Each calls session/new.
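The warmup scene can be sketched as follows. The three session names come from the text; the call signature and parameter name are assumptions, not the repo's code.

```typescript
// The three sessions the article names, warmed on boot.
const SESSIONS = ['main', 'floating', 'observer'] as const;

// Hypothetical shape of preWarmSession: each named session issues
// one session/new request over the bridge IPC.
function preWarmSession(
  request: (method: string, params: object) => void,
): number {
  for (const name of SESSIONS) {
    request('session/new', { sessionName: name });
  }
  return SESSIONS.length;
}
```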

The honest delta

What still has to happen for an open weight model to ride this pipeline

Everything in this article is true about the shape of the codebase. The delta between today and an open weight model in the Fazm picker is one thing: a second bridge subprocess that speaks the ACP-shape IPC and reports availableModels against a local runtime. The bridge is the work. The picker, the family map, the IPC, the filtering, the sorting, the re-render, the migration of saved selections, that is all already shipped. The product team has not committed to a specific open weight model because the consumer-surface work dwarfs the inference call, not because the architecture resists it.

What you could verify in the repo right now

  • The MIT license line in README.md
  • The 4-row modelFamilyMap at ShortcutSettings.swift:159
  • The unknown-family fallback at ShortcutSettings.swift:189 (sort order 99)
  • The 10-line emitModelsIfChanged at acp-bridge/src/index.ts:1271
  • The two call sites at acp-bridge/src/index.ts:1347 and 1361
  • The @Published availableModels in ShortcutSettings.swift that three pickers read

Want to see the second bridge before it ships?

30 minutes on Zoom. I will walk you through the exact lines in the repo, show you the picker in the running app, and answer the parts this guide could not.

Frequently asked questions

What open source LLMs got attention in April 2026?

The month kept the pattern that has held for most of 2025 and 2026: Meta shipped refresh builds of the Llama family, Alibaba shipped a new Qwen3 checkpoint, DeepSeek shipped incremental V3 weights, Mistral shipped a smaller reasoner tuned for local inference, and Google pushed a fresh Gemma drop tuned for on-device. The specific version strings matter less than the shape: every major lab now treats open weights as a standing release track, and the benchmark gap between the best open weight and the best closed model keeps tightening. The actual story, though, is not on the weights side. It is that a non-developer consumer still has very few ways to hold any of these models day zero in a product they would pay for.

Is Fazm an open source LLM?

Fazm is not a model. Fazm is an MIT-licensed consumer Mac agent. The model it talks to today is Claude, through the Claude agent SDK bundled as an ACP bridge subprocess. The relevant open source story is on the client side: the Swift app source, the bridge source, the skills, the installer, all of it lives on GitHub under an MIT license. That is the layer most of April 2026's open source LLM coverage skips. A list of open weight models is not a product. A product that can hold them is.

Could Fazm actually run a Llama or a Qwen model instead of Claude?

Not today, honestly. The whole model list pipeline is bound to one bridge subprocess that speaks ACP and talks to the Claude agent SDK. What exists today is the seam: emitModelsIfChanged in acp-bridge/src/index.ts at line 1271, plus the 4-row modelFamilyMap in Desktop/Sources/FloatingControlBar/ShortcutSettings.swift at line 159. A second bridge subprocess that spoke the same IPC shape against a local llama.cpp or MLX runtime would flow through the same code. The picker already tolerates unknown model IDs: the else branch at ShortcutSettings.swift line 189 assigns sort order 99 to any ID that does not contain haiku, sonnet, opus, or default. So the day a second bridge existed, an open weight model ID would show up at the bottom of the picker the same launch.

What does 'consumer surface' mean and why does it matter more than which model won?

A model on HuggingFace is a download, not a workflow. Getting from a 40 GB weight file to a grandmother-uses-it agent requires: a runtime that can serve the model on consumer hardware, a daemon that stays alive, a UI that lets a non-developer pick the model, an accessibility layer that lets the agent see and act on a real screen, and an update channel that can keep the whole stack current. The model is one of six layers. In April 2026 the model layer gained five new entries. The other five layers gained roughly zero new MIT-licensed, consumer-ready entries. That is the gap this piece is about.

Where exactly is the 4-row family map and what does it do?

It lives in Desktop/Sources/FloatingControlBar/ShortcutSettings.swift at line 159 and has exactly four rows: (haiku, Scary, Haiku, 0), (sonnet, Fast, Sonnet, 1), (opus, Smart, Opus, 2), (default, Smart, Opus, 2). Each row is (substring, shortLabel, familyLabel, sortOrder). When a new model ID arrives from the ACP bridge, updateModels at line 180 substring-matches it against the table. A match yields a human label like 'Fast (Sonnet, latest)' and a known sort order. A miss lands at the else branch at line 189, which uses the raw API name and assigns sort order 99. Everything in the picker, from the floating bar to the Settings panel, reads off the same availableModels array.

Why would a 5th row in that map be enough to label an open weight model?

Because the map is substring-matched, not full-string-matched. A row like ('llama', 'Local', 'Llama', 3) would cover llama-3, llama-4, llama-4.1-instruct, llama-4-70b-mlx, and any future naming variant the community settles on, in one line. The picker would put it at sort order 3, right after Smart. The relabel would read 'Local (Llama, latest)'. Qwen, DeepSeek, Gemma, Mistral each become another single row. The map's 4 rows expanding to 9 would give the picker coverage of every major open weight family without a single hard-coded version string.
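The one-row coverage claim is easy to check. A sketch, using the hypothetical fifth row from the answer above:

```typescript
// Hypothetical 5th row from the text: ('llama', 'Local', 'Llama', 3).
const row = { substring: 'llama', shortLabel: 'Local', familyLabel: 'Llama', sortOrder: 3 };

// Naming variants the article lists; substring matching covers all of
// them with the single row above.
const variants = ['llama-3', 'llama-4', 'llama-4.1-instruct', 'llama-4-70b-mlx'];
const coveredByOneRow = variants.every((id) => id.includes(row.substring));
```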

Why is emitModelsIfChanged ten lines and not a class?

Because the problem it solves is very small. Take a list, drop one entry with a known pseudo-ID, JSON-stringify it, compare against the memoized previous string, fire one IPC message if different. Everything else (the UI rebuild, the picker order, the keyboard shortcut list) is downstream of that single 'models_available' message. Wrapping this in a class would add surface without removing anything. The 10-line function also caps context cost for every LLM that later reads it, which is a real consideration when your own product is driven by one.

Where can I verify these claims in the source?

The repo is at github.com/m13v/fazm, MIT-licensed. Three files matter: Desktop/Sources/FloatingControlBar/ShortcutSettings.swift for the 4-row modelFamilyMap (line 159) and the updateModels sorter (line 180), acp-bridge/src/index.ts for emitModelsIfChanged (line 1271) and its two call sites at lines 1347 and 1361, and README.md for the MIT license line and the 'Fully open source. Fully local.' tagline. Clone, grep, read. Nothing in this piece is abstract: every line number resolves to code you can hold.

What about privacy and local-only inference — does that land this year?

Architecturally, there is nothing blocking it. The same IPC shape ('models_available' carrying a list of modelId, name, description) works equally well for a local runtime. The real constraints are device, not software: a useful open weight agent model wants RAM and a fast GPU, and the experience is very different on a 16 GB Air versus a 64 GB Studio. The honest answer is that the bridge design already supports it, and the product team has not yet committed to a specific open weight model because the consumer-surface work is the larger cost, not the inference call.