Inside a shipping consumer Mac app
April 2026 shipped twelve frontier LLMs. Inside our binary, two of them route a token.
Every other guide on the April release wave is a market timeline. This one is a parts list. It walks through the four model roles inside the Fazm Mac app, names the exact API string each role is bound to, points at the Swift line that holds the binding, and lists which of April’s new launches would have to wait for new code before any of them could run.
The parts list, role by role
April’s headline number is the count of frontier LLMs that landed in two weeks: Claude Opus 4.7, Claude Mythos Preview, GPT-5 Turbo, GPT-5.5, DeepSeek V4-Pro, DeepSeek V4-Flash, Gemini 2.5 Pro, Gemma 4 31B, Llama 4 Scout, Llama 4 Maverick, GLM-5.1, Qwen 3.6-Plus. The longer that list grows, the easier it is to forget that a consumer Mac app has a small, finite number of model roles and that each one has to be bound to a specific API string.
Fazm has four roles. Three of them speak Anthropic’s Claude agent SDK, one speaks Google’s Gemini API. Below is what each one does, what it is bound to in the source today, and which April release affected it.
Main chat session
Pre-warmed at startup. Model: claude-sonnet-4-6. Hardcoded at ChatProvider.swift line 1048. Carries the user's primary chat history and runs every long-form question they ask in the main window.
Floating bar session
Pre-warmed at startup. Model: claude-sonnet-4-6. Hardcoded at ChatProvider.swift line 1049. Powers the cmd+J quick-question bar that floats above whatever app is focused.
Observer session
Pre-warmed at startup. Model: claude-sonnet-4-6. Hardcoded at ChatProvider.swift line 1050. Runs in parallel and writes long-term memory entries while the user chats. Largest token consumer in the binary.
Hourly screen analysis
Background loop. Model: gemini-pro-latest, currently resolving to gemini-2.5-pro after April 1, 2026 GA. Pinned at GeminiAnalysisService.swift line 67. Drains 60 minutes of session recordings into one batch verdict.
Smart picker override
Per-query, on demand. Model: whatever the agent SDK reports as the current opus alias (claude-opus-4-7 after April 22). The Fast and Scary picker buttons stay on Sonnet and Haiku respectively. The Smart button is the only path Opus 4.7 takes inside Fazm today.
The anchor fact
Four roles. Two providers. The catalog fits in two API strings.
The conversational layer is bound to claude-sonnet-4-6 at three call sites in ChatProvider.swift, lines 1048 through 1050. The screen-analysis loop is bound to gemini-pro-latest at one call site in GeminiAnalysisService.swift, line 67. That is the entire model catalog Fazm shipped through April 2026.
The three Claude sessions, all on Sonnet 4.6
Fazm pre-warms three Anthropic sessions when the app starts. Each one has its own system prompt, its own conversation thread, and its own callback wiring; none of them pre-warms with Opus.
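The warmup code itself is not reproduced here, but its shape is simple enough to sketch. Everything below except the model string is illustrative: `ChatSession`, `warmSession`, and the session names are stand-ins, not the real ChatProvider API.

```swift
// Hypothetical shape of the three pre-warm calls. Only the model ID
// "claude-sonnet-4-6" is taken from the source; the rest is a sketch.
struct ChatSession {
    let name: String
    let model: String
}

func warmSession(named name: String, model: String) -> ChatSession {
    // Real code would open an agent-SDK session, attach the role's
    // system prompt, and wire its callbacks here.
    ChatSession(name: name, model: model)
}

// All three conversational roles bind the same hardcoded Sonnet ID.
let sessions = ["main", "floating", "observer"].map {
    warmSession(named: $0, model: "claude-sonnet-4-6")
}
```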
Why all three on Sonnet: cost, plus diminishing returns. The observer alone runs through more tokens than any other session because it watches every conversation batch and writes memory entries. Promoting that loop from Sonnet to Opus would raise its bill by roughly two-thirds (Sonnet is 3 dollars input and 15 dollars output per million tokens; Opus is 5 and 25, a 5/3 ratio on both sides) for output that does not measurably improve at the structured-extraction work the observer does.
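A back-of-envelope check on that pricing gap. The token volumes below are invented for illustration; only the per-million prices come from the text above.

```swift
// Prices per million tokens, from the article.
let sonnet = (input: 3.0, output: 15.0)
let opus   = (input: 5.0, output: 25.0)

// Cost of a month's traffic at a given price sheet.
// inputM/outputM are millions of tokens (hypothetical volumes).
func monthlyCost(inputM: Double, outputM: Double,
                 price: (input: Double, output: Double)) -> Double {
    inputM * price.input + outputM * price.output
}

// Example: an observer that reads 40M tokens and writes 5M per month.
let onSonnet = monthlyCost(inputM: 40, outputM: 5, price: sonnet) // 195.0
let onOpus   = monthlyCost(inputM: 40, outputM: 5, price: opus)   // 325.0
// The ratio is 5/3 regardless of volume, because both rates scale equally.
```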
The April release wave did not change this. Opus 4.7’s GA on April 22 is reachable through the Smart picker on a per-query basis, but the always-on Sonnet sessions stay where they are.
The hourly Gemini loop, pinned to a moving alias
The fourth role is structurally different. It does not run on Anthropic at all. It is a 60-minute background loop that drains buffered session-recording chunks into Google’s File API, lets Gemini watch them, and asks for a NO_TASK / TASK_FOUND / UNCLEAR verdict. The analyzer is bound by literal alias, not version.
Why Gemini and not Claude: native video. Anthropic’s API does not accept multi-clip video uploads at the latency or cost Fazm needs for an hourly background task, so the loop runs on the Google side. The two providers never share a session; the conversational layer never sees Gemini, the analysis layer never sees Claude.
Why a moving alias: the analyzer fires once an hour in the background, so a silent server-side upgrade has a small blast radius. If the alias resolves to a different Gemini Pro version from one hour to the next, the worst case is that a single batch comes back with a slightly different verdict shape, which the parsing code already tolerates with the UNCLEAR fallback. That tolerance is what made the April 1 Gemini 2.5 Pro GA a non-event for Fazm: the next analysis after the cutover fired against 2.5 Pro automatically, and the trace at line 264 simply recorded the resolved name as gemini-2.5-pro instead of the prior version.
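That fallback behavior fits in a few lines. The enum and parser below are an illustrative sketch, not the real GeminiAnalysisService code; only the three verdict strings come from the article.

```swift
import Foundation

// The three verdicts the hourly batch can return, per the article.
enum AnalysisVerdict: String {
    case noTask = "NO_TASK"
    case taskFound = "TASK_FOUND"
    case unclear = "UNCLEAR"
}

// Any response shape the parser does not recognize collapses to UNCLEAR,
// which is what makes a silent model upgrade low-risk for this loop.
func parseVerdict(_ raw: String) -> AnalysisVerdict {
    let cleaned = raw.trimmingCharacters(in: .whitespacesAndNewlines).uppercased()
    return AnalysisVerdict(rawValue: cleaned) ?? .unclear
}
```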
Two API strings, four model roles, and the path each user request takes
Release by release: routed, aliased, or unwired
The table below is the honest version of where every April 2026 release lands inside the binary. The middle column is what you would see if you treated the release wave as a scorecard and made every model a picker row. The right column is what the binary actually does.
| Feature | Scorecard picker (every model = a row) | What Fazm actually does |
|---|---|---|
| Claude Opus 4.7 (Anthropic, April 22 GA) | Picker row labeled 'Opus 4.7' | Routed via Smart button per-query |
| Claude Sonnet 4.6 (Anthropic, established) | Picker row labeled 'Sonnet 4.6' | Hardcoded across all 3 pre-warmed sessions |
| Gemini 2.5 Pro (Google, April 1 GA) | Picker row labeled 'Gemini 2.5 Pro' | Resolved automatically via gemini-pro-latest alias |
| GPT-5 Turbo (OpenAI, April 7) | Picker row labeled 'GPT-5 Turbo' | Not wired. Requires a new ACP-style bridge |
| DeepSeek V4-Pro / V4-Flash (open MoE) | Picker row labeled 'DeepSeek V4-Pro' | Not wired. Requires a new bridge (hosted or local) |
| Gemma 4 31B Dense (Google, Apache 2.0) | Picker row labeled 'Gemma 4 31B' | Not wired. Local-inference bridge target |
| Llama 4 Scout / Maverick (Meta, MoE) | Picker row labeled 'Llama 4 Scout' | Not wired. Local-inference bridge target |
| GLM-5.1 (Zhipu, MIT) / Qwen 3.6-Plus (Alibaba) | Picker row labeled with each model name | Not wired. Hosted-API bridge target |
The pattern: when a new model is in the same provider catalog Fazm already speaks, absorption is automatic or one-line. When it is a different provider, a different protocol, or a local-inference target, it has to wait for a bridge. April’s wave produced two of the first kind and ten of the second.
What April 2026 shipped that does not run in Fazm today
Honest accounting matters here. The list below is every notable April 2026 release that is not currently bound to any role in the binary. Not because they are bad; the binary just does not have a bridge for them. Each one represents a discrete piece of work, not a config flag.
Releases waiting on a bridge
- GPT-5 Turbo (April 7) — would need an OpenAI responses-API bridge
- GPT-5.5 / GPT-5.5 Pro (April 24) — same bridge target as GPT-5 Turbo
- DeepSeek V4-Pro and V4-Flash (April 24) — open MoE, hosted or local bridge
- Gemma 4 31B Dense (April 2, Apache 2.0) — local-inference bridge target
- Llama 4 Scout / Maverick (April 5) — local-inference bridge target
- GLM-5.1 (744B MoE, MIT) — hosted-API bridge
- Qwen 3.6-Plus (1M context, agentic coding) — hosted-API bridge
- Claude Mythos (Project Glasswing, ~50 partners) — gated, not generally available
The April 2026 timeline as the binary saw it
Same fortnight, but viewed through the lens of what actually changed inside an installed copy of Fazm. Most of the wave was background news; only two days mattered.
Five dates that mattered to the binary
April 1: Gemini 2.5 Pro GA
Google flips the gemini-pro-latest alias to 2.5 Pro. Inside Fazm, the next 60-minute analysis batch fires against 2.5 Pro automatically because GeminiAnalysisService.swift line 67 hardcodes the alias, not the version. The user-visible signal is one row in the local external_llm_traces table where the resolved model goes from gemini-2.0-pro to gemini-2.5-pro.
April 7: GPT-5 Turbo, Claude Mythos Preview
OpenAI ships GPT-5 Turbo with native multimodal generation. Anthropic announces Claude Mythos through Project Glasswing for ~50 partner organizations. Inside Fazm, neither of these moves a single token. The conversational layer talks to the Claude agent SDK; GPT-5 Turbo would need a second bridge, and Mythos is gated behind a partner program Fazm is not in.
April 20: Fazm 2.4.0 ships
The model picker stops hard-coding IDs. availableModels becomes a @Published array fed by the Claude agent SDK at runtime. This is the change that makes Opus 4.7 absorbable two days later. The Gemini side does not need this change because its alias was already dynamic from day one.
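A minimal sketch of what that 2.4.0 change amounts to. To keep the example self-contained, @Published is swapped for a plain stored property; only the names availableModels and emitModelsIfChanged appear in the article, and everything else is assumed.

```swift
// Sketch of the dynamic picker: the model list is no longer hardcoded
// but replaced wholesale whenever the SDK reports a new catalog.
// (In the app this would be a @Published property on an ObservableObject.)
final class ModelPicker {
    private(set) var availableModels: [String] = []

    // Called with whatever the agent SDK reports at runtime. Returning
    // true signals the UI to refresh; a newly GA'd model ID shows up
    // on the next app launch with no rebuild.
    @discardableResult
    func emitModelsIfChanged(_ models: [String]) -> Bool {
        guard models != availableModels else { return false }
        availableModels = models
        return true
    }
}
```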
April 22: Claude Opus 4.7 GA
Anthropic pushes the new Opus ID into the agent SDK. Inside Fazm, the dynamic picker at SettingsPage.swift line 2025 absorbs it on the next app launch. The user's Smart button now routes to Opus 4.7 instead of Opus 4.6. The hardcoded Sonnet sessions are unaffected. No build, no migration step.
April 24: GPT-5.5, DeepSeek V4, more open weights
OpenAI ships GPT-5.5. DeepSeek drops V4-Pro and V4-Flash open-source. The pace at other labs accelerates. Inside Fazm, none of these change the binary's behavior. The architecture is open to a second bridge but no second bridge has been written, so the picker stays at three labels and the analysis loop stays on Gemini.
How to verify which model handled your last query
You do not have to take any of this on faith. The Fazm dev log writes the resolved model ID on every chat query and on every background analysis batch. Tail the log and watch.
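Here is a hypothetical excerpt of what that tail might show. The chat lines follow the Chat query started format the audit section quotes; the timestamps are invented, and the Gemini lines are invented wholesale since the article only quotes the chat format.

```
12:04:11 Chat query started (session=floating, mode=fast, model=claude-sonnet-4-6, cwd=/Users/me)
12:05:38 Chat query started (session=main, mode=fast, model=claude-sonnet-4-6, cwd=/Users/me)
12:07:02 Chat query started (session=main, mode=smart, model=claude-opus-4-7, cwd=/Users/me)

13:00:00 Analysis batch started (chunks=60, model=gemini-pro-latest)
13:00:41 Analysis batch finished (resolved=gemini-2.5-pro, verdict=NO_TASK)
```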
The first three lines are the conversational layer. Notice that floating and main both show claude-sonnet-4-6 by default; the third only shows claude-opus-4-7 because the user explicitly clicked the Smart picker for that turn. The bottom block is the hourly Gemini loop, which logs the resolved model name (gemini-2.5-pro, because that is what gemini-pro-latest resolves to today).
“The April 2026 release wave brought 12 frontier LLMs in two weeks. Inside the Fazm binary, two of them moved a token: Gemini 2.5 Pro on April 1 (via the gemini-pro-latest alias) and Claude Opus 4.7 on April 22 (via the dynamic Smart picker).”
Source-line audit, Fazm Desktop, April 24, 2026
Why a consumer Mac app picks this shape
A consumer downloads Fazm because they want a smart friend on their Mac, not because they have an opinion about which provider trained the better model on which benchmark. That single fact dictates the shape of the parts list:
- The conversational layer should be one provider with a known voice. Sonnet 4.6 is the everyday default because Opus 4.7 costs more without speaking measurably better at three-sentence chat answers.
- The expensive model is reachable but not default. The Smart button exists for the rare hard question, not for every short follow-up.
- The background analysis loop should be the cheapest viable multimodal model that can watch video. Gemini Pro fits; Claude does not currently take video at the latency or cost Fazm needs for an hourly batch.
- Routing to a new provider is real engineering work, not a config flag. That is why GPT-5 Turbo, DeepSeek V4, Gemma 4, and Llama 4 are not in the binary even though every blog this month says they should be.
Want a tour of which model handles which role on your Mac?
Book 20 minutes. I will screenshare the dev log live so you can see claude-sonnet-4-6 on the chat sessions and gemini-2.5-pro on the hourly analysis loop, then flip the Smart picker and watch the model ID change in real time.
Questions readers actually ask
Which new LLMs released in April 2026 actually run inside the Fazm Mac app today?
Two providers, four roles. On the Anthropic side, three Claude sessions are pre-warmed at startup (main, floating, observer) and all three are hardcoded to claude-sonnet-4-6 at ChatProvider.swift lines 1048 through 1050. The picker can route a single user query to claude-opus-4-7 via the Smart button, but the always-on sessions stay on Sonnet. On the Google side, GeminiAnalysisService.swift line 67 pins a fourth role to the alias gemini-pro-latest, which currently resolves to Gemini 2.5 Pro after Google's April 1 GA. That is it. GPT-5 Turbo, DeepSeek V4, Gemma 4, Llama 4 Scout, Llama 4 Maverick, GLM-5.1, and Qwen 3.6-Plus do not route a single token through the shipping binary today.
What does the screen analysis loop actually do, and why is it on Gemini instead of Claude?
The buffer is set at GeminiAnalysisService.swift line 69 to targetDurationSeconds: 3600. After every 60 minutes of session recordings, the loop concatenates 60 video clips of whatever app was focused at each moment, sends them to Gemini's File API, and asks for a verdict: NO_TASK, TASK_FOUND, or UNCLEAR. The reason this lives on Gemini and not on Claude is multimodal video. Anthropic's API does not accept native video uploads at the latency or cost Fazm needs for an hourly background task, so the screen-watching loop runs on Gemini and the conversational layer runs on Claude. They never share a session. The split is by media type, not by quality.
How did the binary pick up Gemini 2.5 Pro on April 1 with zero changes?
Because the model field at GeminiAnalysisService.swift line 67 is the literal string gemini-pro-latest, not gemini-2.0-pro or gemini-1.5-pro. Google's API resolves that alias server-side to whichever Gemini Pro is the current GA model. April 1 was the GA cutover for 2.5 Pro, and from that morning every Fazm install in the wild started pulling 2.5 Pro for screen analysis without a single download, sparkle update, or App Store review. The trace at line 264 records the resolved name as gemini-2.5-pro, which is what shows up in the user's external_llm_traces table after April 1.
Why is the conversational layer pinned to Sonnet 4.6 across all three sessions instead of Opus 4.7?
Cost. The observer session is the largest token consumer in the binary: it watches every conversation batch and writes memory entries to local files. If the observer ran on Opus 4.7 (5 dollars input, 25 dollars output per million tokens) instead of Sonnet 4.6 (3 dollars, 15 dollars), the bill for an active user would rise by roughly two-thirds, with no measurable benefit on the kind of structured-extraction work the observer does. The same logic holds for the floating bar's default queries: most of them are short follow-ups where Sonnet's quality is already at the ceiling. Opus 4.7 is reachable, but only when the user explicitly asks for it by selecting the Smart label in the picker.
Which April 2026 releases would have to be wired up before the binary could route a token to them?
Every non-Anthropic, non-Google release. GPT-5 Turbo would need a second ACP-style bridge that talks to OpenAI's responses API, with its own emitModelsIfChanged so the dynamic picker can absorb it the same way it absorbed Opus 4.7. DeepSeek V4-Pro and V4-Flash would need either a hosted-API bridge or a local-inference bridge, depending on whether you want cloud or on-device. Gemma 4 31B and the Llama 4 family are open-weight, so a bridge to llama.cpp or vLLM running locally is the natural target. Each of these is a discrete piece of work, not a config flag — until that bridge exists, the picker simply cannot show those models.
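To make "a discrete piece of work" concrete, here is a hypothetical sketch of the minimum surface such a bridge would have to cover before the picker could show a non-Anthropic, non-Google model. Nothing below exists in the Fazm source; the protocol, its methods, and the stub are invented for illustration.

```swift
// Hypothetical minimal bridge surface. Every real responsibility
// (auth, streaming, tool calls, usage tracking) lives behind these
// two calls, and that is the work the article says is unwritten.
protocol ModelBridge {
    /// Model IDs this bridge can currently serve (feeds the dynamic picker).
    func listModels() -> [String]
    /// Open a conversation-shaped session against the provider's API.
    func openSession(model: String) throws -> BridgeSession
}

struct BridgeSession {
    let model: String
}

// A stub OpenAI-shaped bridge: the type compiles, but does nothing real.
struct StubOpenAIBridge: ModelBridge {
    func listModels() -> [String] { ["gpt-5-turbo", "gpt-5.5"] }
    func openSession(model: String) throws -> BridgeSession {
        BridgeSession(model: model)
    }
}
```

Until something with this shape exists and is wired into the picker's model feed, those releases stay rows in a news timeline, not rows in the app.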
What changes for an existing Fazm user when a new Anthropic model goes GA mid-month?
Almost nothing visible. The dynamic picker absorbs the new ID at the next session/new call without a Fazm build. If the user had a stored selection like claude-opus-4-6, the upgrade chain at ShortcutSettings.swift lines 208 through 210 finds a model whose ID contains opus and silently migrates the selection forward. The label they see is still Smart. The keyboard shortcut they bound is still bound. The Settings preference pane is still consistent. The only thing that changed is the model ID in the underlying selectedModel string and the API surface the next query lands on.
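The migration rule reads like a one-liner, and a sketch shows why. The function name and last-resort fallback below are assumed; the "ID contains opus" matching is the behavior described above.

```swift
// Sketch of the stored-selection upgrade: when a saved model ID is no
// longer in the catalog, migrate forward to the current model in the
// same family, keyed on the "opus" substring per the article.
func migrateSelection(stored: String, available: [String]) -> String {
    if available.contains(stored) { return stored } // still valid, keep it
    // e.g. stored "claude-opus-4-6" migrates to the current opus ID
    if stored.contains("opus"),
       let next = available.first(where: { $0.contains("opus") }) {
        return next
    }
    return available.first ?? stored // hypothetical last-resort fallback
}
```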
Is the screen analysis loop optional, and what happens if I disable it?
Yes. ShortcutSettings.swift lines 299 through 301 hold @Published var screenObserverEnabled, defaulting to true. With it off, no clips are buffered, no 60-minute analysis fires, no chunks accumulate at Application Support/Fazm/gemini-analysis. Gemini 2.5 Pro's daily token contribution drops to zero for that install, and the cost shifts entirely onto the Anthropic side. Disabling the observer is also the cleanest way to verify how much of your token spend was background analysis versus interactive chat, because the recordLlmUsage call at GeminiAnalysisService.swift line 254 tags those tokens with account observer_gemini.
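A sketch of how that account tagging makes the split auditable. Everything below except the observer_gemini account string and the screenObserverEnabled default is invented for illustration.

```swift
// Illustrative usage ledger: every LLM call is tagged with an account,
// so summing one account isolates background spend from chat spend.
struct UsageRecord {
    let account: String
    let tokens: Int
}

final class UsageLedger {
    var screenObserverEnabled = true   // default per the source
    private(set) var records: [UsageRecord] = []

    func recordLlmUsage(account: String, tokens: Int) {
        records.append(UsageRecord(account: account, tokens: tokens))
    }

    // Sum one account's tokens, e.g. "observer_gemini" for the
    // background analysis loop.
    func tokens(for account: String) -> Int {
        records.filter { $0.account == account }.reduce(0) { $0 + $1.tokens }
    }
}
```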
Why does Fazm pin one model with a 'latest' alias and another with a hardcoded version?
Different blast radii. Gemini's analyzer runs once an hour in the background: if the alias quietly upgrades from 2.0 Pro to 2.5 Pro, the worst case is one analysis comes back differently shaped, which the prompt at lines 56 through 64 and the UNCLEAR parsing fallback already tolerate. The conversational layer is real-time and user-facing, so a silent upgrade from Sonnet 4.6 to Sonnet 5.0 mid-conversation could change the voice in the middle of someone's question. Pinning the conversational sessions to claude-sonnet-4-6 keeps the upgrade path explicit: Anthropic ships a new model, the dynamic picker shows it, the user picks Smart or Fast deliberately. Background loop gets the latest alias; foreground UI gets the pinned version.
Has the Fazm architecture changed because of the April 2026 release wave?
One deliberate change, shipped in the middle of the wave. April 20, 2026 shipped Fazm 2.4.0 with the dynamic picker that reads availableModels from the Claude agent SDK at runtime, which is what made absorbing Opus 4.7 on April 22 a non-event. The Gemini analyzer was already pinned to gemini-pro-latest before April, so the April 1 cutover happened on its own. Nothing about GPT-5 Turbo, DeepSeek V4, Gemma 4, or Llama 4 changed the Fazm code path because none of them run in the binary today. The architectural answer to those releases is a new bridge, and that has not been written yet.
Where can I see exactly which model handled a specific request inside the app?
Two places. For Gemini-handled background analysis, recordExternalLlmTrace at GeminiAnalysisService.swift line 263 writes the resolved model name (gemini-2.5-pro after April 1) into the local trace store with source fazm_observer. For Claude-handled chat queries, the log line at ChatProvider.swift line 2559 emits Chat query started (session=..., mode=..., model=..., cwd=...). Tail dev log from the Fazm menu, or read /tmp/fazm-dev.log directly, and you will see the exact model ID that handled each query. The model field is whatever the picker resolved at click time — claude-sonnet-4-6 for Fast, the current opus alias for Smart.
Adjacent pieces, written from inside a shipping consumer Mac app
More on the April 2026 release wave
Large language model releases, April 2026
The other half of the picker story: how the three-label UI absorbed five frontier launches without a single new row.
Open source LLM, April 2026 news
Where Gemma 4, Llama 4, and DeepSeek V4 fit on the bridge map, and what would need to change before they could run on a Fazm install.
Anthropic Claude latest news, April 2026
Why Sonnet 4.6 is the everyday default, why Opus 4.7 is the per-query override, and why the observer never gets either.