LLM updates, April 2026 - consumer Mac app view

April 2026 LLM news, read from seven lines of Swift that moved every Opus user to Sonnet

Every April 2026 LLM roundup leads with the same names: Gemini 3.1 Pro, GPT-5 Turbo, Llama 4 MoE, DeepSeek R2, and whichever Claude 4.x point release shipped that week. Benchmarks, context windows, pricing per million tokens. None of them describe what a shipping consumer Mac app actually changed when those models landed. This page does. The change is seven lines of Swift in a Fazm file called ShortcutSettings.swift, and it is the most honest April 2026 LLM decision I can point at.

Gemini 3.1 Pro, GPT-5 Turbo, Llama 4, DeepSeek R2
Opus to Sonnet migration, one-time
ShortcutSettings.swift:310-316
accessibility-API first
Matthew Diakonov
11 min read
Written from the source; line numbers verified 2026-04-20

What every April 2026 LLM roundup covered

The first-page search results for "llm updates april 2026 news" point the same way: a table of flagship releases, benchmarks, context windows, and per-million-tokens pricing. Useful work. None of it describes the shape of a real loop running in a real consumer app against a real account.

llm-stats.com: April 2026 model table
TokenCalculator: AI News April 2026
PricePerToken: new model releases
Mean.CEO: LLM news April 2026
Google Blog: Gemini 3.1 Pro rollout
Anthropic: Claude 4.x updates
Meta: Llama 4 MoE announcement
DeepSeek: R2 model card

What those roundups do not mention

The consumer-app gap

  • None describe a shipping consumer app's actual model-picker change in response to the April releases.
  • None separate tool-calling loop workloads from benchmark-shaped single-prompt workloads, which is where the rate-limit ceiling matters.
  • None distinguish accessibility-text agents from screenshot-and-vision agents, and the two populations benefit from totally different April releases.
  • None cite the cross-loop decision pattern (Claude for chat, Gemini for passive video analysis) that most shipping apps actually use.
  • None explain why short-alias model IDs (haiku, sonnet, opus) beat hard-coded version strings when the ACP SDK's dynamic model list changes mid-month.

The anchor fact

Seven lines of Swift silently moved every Opus default to Sonnet in April 2026.

The file is Desktop/Sources/FloatingControlBar/ShortcutSettings.swift. The migration block is lines 310-316. It reads the UserDefaults key shortcut_selectedModel, checks whether it contains opus, writes sonnet back, and flips shortcut_didMigrateFromOpus so it never runs again. The comment on line 310 names the reason: Opus to Sonnet rate limit fix. This is the real April 2026 LLM decision in a shipping Mac agent, and it is not in any roundup.

  • ShortcutSettings.swift:310-316 - the seven lines of Swift that migrate Opus users to Sonnet on first launch after April update.
  • ShortcutSettings.swift:151-155 - the three-tier defaultModels array (haiku=Scary, sonnet=Fast, opus=Smart).
  • ShortcutSettings.swift:169-175 - normalizeModelId() that collapses claude-opus-4-6 style IDs to the short aliases the ACP SDK expects.
  • ShortcutSettings.swift:178 - updateModels() receives the dynamic model list from the ACP SDK; new Claude versions land without an app release.
  • GeminiAnalysisService.swift:67 - private let model = "gemini-pro-latest" for the passive screen-observer loop.
  • GeminiAnalysisService.swift:264 - analytics tracer logs the call as "gemini-2.5-pro" against the observer_gemini account.
  • AppState.swift:439 and :470 - AXUIElementCreateApplication, called once per frontmost app and once for Finder; the entry points for accessibility-tree perception.

The migration, verbatim

The code below is the actual source, not paraphrased. It runs inside the ShortcutSettings singleton's private initializer on first launch after the April update. After it runs once, the flag is set and the branch is dead code on every subsequent launch.

Desktop/Sources/FloatingControlBar/ShortcutSettings.swift (lines 310-316)
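The code viewer does not carry over here, so below is a sketch reconstructed from the behavior described above. The key names (shortcut_selectedModel, shortcut_didMigrateFromOpus) and the comment are taken from this page; the exact formatting of the shipped block at lines 310-316 may differ.

```swift
import Foundation

// Opus -> Sonnet rate limit fix (one-time migration, April 2026).
// Sketch reconstructed from this page's description; not the verbatim source.
let didMigrateKey = "shortcut_didMigrateFromOpus"
if !UserDefaults.standard.bool(forKey: didMigrateKey) {
    if let selected = UserDefaults.standard.string(forKey: "shortcut_selectedModel"),
       selected.contains("opus") {
        // Only the stored default is touched; the picker still offers Opus.
        UserDefaults.standard.set("sonnet", forKey: "shortcut_selectedModel")
    }
    // Flip the flag so this branch is dead code on every later launch.
    UserDefaults.standard.set(true, forKey: didMigrateKey)
}
```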

What Fazm looks like after April, in four numbers

Four counts that tell the whole shape of the app's model strategy.

3 Claude tiers in Fazm's picker
7 lines of Swift for the Opus-to-Sonnet migration
60 minutes of buffered video before a Gemini analysis fires
2 separate LLM loops in a single app

For context, the passive screen-observer waits for 60 minutes of buffered recordings before it fires a single Gemini call. That ratio (many quick Claude chat calls per minute versus one Gemini call per hour) is why the two loops end up on two different model families.

April 2026 flagships, routed to actual loops

Five prominent April releases on the left. Four loops on the right. Only one source-to-sink mapping matters for a given workload. Everything else is news you do not consume.

April 2026 LLMs -> Fazm loops

  • Claude 4.x -> Fazm router -> Chat loop (Sonnet) and Quick bar (Haiku)
  • Gemini 3.1 Pro -> Passive loop (currently Gemini 2.5 Pro, reached via the 'gemini-pro-latest' alias)
  • GPT-5 Turbo -> Unused in chat
  • Llama 4 MoE -> Unused in chat
  • DeepSeek R2 -> Unused in chat

The four model slots Fazm actually runs

Three visible in the picker, one invisible and passive. Each is picked for the shape of its loop, not for its benchmark score.

Scary (Haiku)

Short-alias id 'haiku'. The fastest, cheapest tier. Used by the floating bar's reflexive one-shot replies. Its per-call cost is low enough that the rate-limit ceiling rarely bites.

Fast (Sonnet)

Short-alias id 'sonnet'. The default for the main agent loop. Sustained tool-call workloads survive a per-hour cap better here than on Opus. Post-April, this is the default for every new install and for every Opus user the migration touched.

Smart (Opus)

Short-alias id 'opus'. Still in the picker, still selectable. Best on hard plans and multi-step reasoning, but it burns through per-hour quota much faster under a tool-calling loop. Pre-April it was a reasonable default; post-April it is opt-in.

Gemini 2.5 Pro (separate loop)

Not in the picker. Lives in GeminiAnalysisService.swift. Fazm calls it passively every 60 minutes of buffered screen recordings to surface task ideas. Long context, multimodal video, and bulk pricing are the right tradeoffs for that loop, not the chat one.

Short aliases instead of version strings

The picker stores haiku, sonnet, and opus, never full version IDs like claude-sonnet-4-6. The ACP SDK maps short aliases to whichever version is current on Anthropic's side. When Claude 4.7 shipped, the picker picked it up without a Fazm release. This is the pattern that makes mid-month LLM updates a non-event for users.

Desktop/Sources/FloatingControlBar/ShortcutSettings.swift (lines 144-163)
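Since the block itself is not reproduced here, the following is a sketch of the three-tier array and the normalizer described above. The ids and display names come from this page; the ModelOption type and loop structure are illustrative, not the verbatim source.

```swift
import Foundation

// Illustrative shape; the repo's actual type at lines 144-163 may differ.
struct ModelOption {
    let id: String          // short alias the ACP SDK expects
    let displayName: String // what the picker shows
}

let defaultModels = [
    ModelOption(id: "haiku",  displayName: "Scary"),  // fastest, cheapest tier
    ModelOption(id: "sonnet", displayName: "Fast"),   // post-April default
    ModelOption(id: "opus",   displayName: "Smart"),  // opt-in after the migration
]

// Collapse a legacy full version ID (e.g. "claude-opus-4-6")
// to the short alias, so the UI never hard-codes a version string.
func normalizeModelId(_ id: String) -> String {
    for alias in ["haiku", "sonnet", "opus"] where id.contains(alias) {
        return alias
    }
    return id
}
```

The point of the normalizer is that nothing downstream of the picker ever sees a pinned version; when the ACP SDK's dynamic list moves to a new Claude release, the same alias resolves to it.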

April 2026, inside a shipping consumer app

Five moments from the month. Only one of them required a Fazm source change. The other four flowed through the picker and the ACP SDK without touching the app binary.

1

Early April - Gemini 3.1 Pro rolls out globally

2M-token context, pricing around $2 input / $12 output per 1M tokens under 200K, ARC-AGI-2 at roughly 77.1 percent. Fazm does not put it into the chat loop. The chat loop uses accessibility-tree text, which is small enough that a 2M context is overkill, and the pricing is still higher than Sonnet for the sustained-tool-calls shape. But Fazm does pay attention for one reason: the passive video-analysis loop already uses Gemini, and Google's model-card page signals the upgrade path. The string 'gemini-pro-latest' at GeminiAnalysisService.swift:67 is what picks up the change.

2

Early-mid April - GPT-5 Turbo ships with native multimodal

Strong on vision benchmarks. For a browser-agent or screenshot-first app this is the headline. For an accessibility-API Mac agent it is a second-tier change. Fazm does not send pixels to a model in the chat loop; it sends AX tree text. A stronger vision model does not make a text-first loop faster or cheaper. Noted, not consumed.

3

Mid April - Llama 4 MoE and DeepSeek R2

Open-weights news for self-hosters. Fazm runs in-process on the user's Mac and talks to Anthropic via the ACP SDK, not to a locally hosted model. If a user wants to route through a custom endpoint (ollama, OpenRouter, a proxy) the bridge supports ANTHROPIC_BASE_URL, but the default path does not require it. These releases move the needle for self-hosted agent stacks, not the Fazm default.

4

Mid-late April - Claude 4.x picker reshuffle

The ACP SDK's dynamic model list picks up new Claude versions without an app release. Fazm's updateModels() at ShortcutSettings.swift line 178 receives the new list and rewrites the three-tier picker. The short aliases haiku, sonnet, opus stay stable; the underlying version floats. Users see 'Fast (Sonnet, latest)' without ever being told a version number.

5

April - the one-time Opus-to-Sonnet migration fires

On first launch after the April build, every existing user whose default was Opus gets moved to Sonnet. It runs once per install (gated on the shortcut_didMigrateFromOpus boolean). The user is free to switch back from the picker; only the default is touched. This is the actual April 2026 LLM decision a Fazm user feels, and it is a production-shape decision, not a benchmark decision.

Why the same April release list reads differently for two apps

A tool-calling loop on accessibility-tree text and a single-shot multimodal prompt are different workloads. They pay different costs. They benefit from different April releases. Pretending otherwise is how roundup-readers end up defaulting to the wrong model.

| Feature | Typical multimodal chat | Fazm (AX-tree, tool calls) |
| --- | --- | --- |
| Workload shape | Single large prompt with image or long document | Many small tool calls per minute over accessibility-tree text |
| Dominant cost | Per-million-tokens input/output price | Per-hour account rate limit, not per-million-tokens price |
| Right April 2026 model | Gemini 3.1 Pro or GPT-5 Turbo for single-shot multimodal power | Sonnet (post-migration), Haiku for reflexes, Gemini for passive video |
| Why Opus gets deprioritized | Not a factor; one prompt does not strain a per-hour cap | Burns per-hour quota fast under sustained tool calls |
| Why vision flagships matter less | Screenshot-based loops need vision; vision gains pay off directly | AX tree is text, not pixels; vision is wasted on it |

The second LLM loop, on a different model family

Not every loop in a consumer app should run on the same model. Fazm's passive screen observer buffers chunks for up to an hour, then sends them to Gemini for multimodal video analysis. Long context, multimodal-native, and cheap bulk pricing are the right tradeoffs for that loop. They are the wrong tradeoffs for chat. So Fazm runs two loops on two families, unapologetically.

Desktop/Sources/GeminiAnalysisService.swift (lines 67, 264)
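The two referenced lines are not reproduced on this page, so here is a sketch of how they sit together, based on the descriptions above. The `model` string, the 60-minute buffer, and the logged names are from this page; the surrounding class shape is illustrative.

```swift
// Sketch of the observer loop's model wiring; not the verbatim source.
final class GeminiAnalysisService {
    // Line 67: a floating alias, so Google-side upgrades
    // (e.g. a Gemini 3.1 Pro rollout) land without a Fazm release.
    private let model = "gemini-pro-latest"

    // Line 69: one analysis call per 60 minutes of buffered recordings.
    private let bufferMinutesBeforeAnalysis = 60

    func traceAnalysisCall() {
        // Line 264: analytics logs the resolved version
        // against the observer_gemini account.
        print("model=gemini-2.5-pro account=observer_gemini")
    }
}
```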

Why the April vision flagships matter less here

The chat loop never sends pixels to a model. It sends text from the macOS accessibility tree, which is what AXUIElementCreateApplication and AXUIElementCopyAttributeValue return. A stronger vision model (GPT-5 Turbo, Gemini 3.1 Pro) is a huge win for a screenshot-first agent and an irrelevance for a text-first one. This is the single most important thing the April 2026 SERP does not explain: which release benefits which architecture.

Desktop/Sources/AppState.swift (lines 439-472, condensed)
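The condensed listing is not reproduced here, so below is a minimal sketch of the two AX entry points named above. The function name and the title read are illustrative; only the two `AXUIElement*` calls and their roles are taken from this page, and real code would need the user to have granted Accessibility permission.

```swift
import ApplicationServices

// Illustrative sketch of the accessibility-tree entry points;
// error handling and the full tree walk are elided.
func focusedWindowTitle(for pid: pid_t) -> String? {
    // AppState.swift:439 / :470 - one element per target app (frontmost, Finder).
    let app = AXUIElementCreateApplication(pid)

    var window: CFTypeRef?
    guard AXUIElementCopyAttributeValue(app, kAXFocusedWindowAttribute as CFString, &window) == .success,
          let window else { return nil }

    var title: CFTypeRef?
    AXUIElementCopyAttributeValue(window as! AXUIElement, kAXTitleAttribute as CFString, &title)
    return title as? String
}
```

A few KB of text like this, per tool call, is the whole perception payload; no pixels ever leave the machine in the chat loop.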

Verify the line numbers yourself

A repo checkout and three shell commands. If any line number on this page has drifted, this is how you catch it. Captured on 2026-04-20 against the main branch.

sed and grep over the Fazm source
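The command block itself is not rendered here; these are the four commands from the FAQ at the bottom of this page, run from the Fazm repo root.

```shell
# One-time Opus migration:
sed -n '310,326p' Desktop/Sources/FloatingControlBar/ShortcutSettings.swift
# Three-tier picker:
sed -n '151,163p' Desktop/Sources/FloatingControlBar/ShortcutSettings.swift
# Gemini model strings:
grep -n 'gemini-pro-latest\|gemini-2.5-pro' Desktop/Sources/GeminiAnalysisService.swift
# Accessibility-API entry points:
grep -n 'AXUIElementCreateApplication\|AXUIElementCopyAttributeValue' Desktop/Sources/AppState.swift
```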

The honest takeaway

Read April 2026 roundups for the release dates and prices. Do not read them for a default-model recommendation. The right default for a tool-calling accessibility-API loop is not the highest-benchmark model; it is the model that survives a per-hour rate limit under realistic traffic. Fazm spent April discovering that the hard way, shipped seven lines of Swift to fix it, and left the Opus option in the picker for anyone who wants it. That is the only April 2026 LLM decision I can point at with a line number.

There is a second lesson in here. Two loops, two families. Claude for perception-and-action, Gemini for long-context passive video. An app that forces a single model family to do every job pays for it. The April release list is more useful when you read it as a menu of specialized tools, not a leaderboard. The flagship that wins the benchmark may lose the loop, and the reverse is just as likely. Pick per loop, not per app.

If any of this affects your own app, the public Fazm source is the clearest place I know to read the pattern end to end. Clone the repo, grep the file names on this page, and watch the normalizer at line 169 map a legacy model ID to a short alias the ACP SDK understands. That is five minutes of reading that will save you a release.

Want the three-tier model picker pattern in your own agent app?

I will walk through Fazm's ShortcutSettings.swift migration, the ACP SDK dynamic list, and the two-loop Claude-plus-Gemini split on a 20-minute call.


LLM updates April 2026 news FAQ

What are the major LLM updates in April 2026?

The April 2026 flagships that filled the SERP: Google rolled out Gemini 3.1 Pro globally in early April with a 2M context window, under-200K pricing at about $2 input and $12 output per 1M tokens, and an ARC-AGI-2 score around 77.1 percent. OpenAI shipped GPT-5 Turbo with native multimodal and tool-use upgrades. Meta pushed Llama 4 in a Mixture-of-Experts configuration. DeepSeek released R2 with strong math scores. Anthropic was mid-stream on the Claude 4.x line (4.6 to 4.7 transitions in the picker). Every public roundup leads with those five releases. This page is about what actually changed inside a shipping Mac agent in April when those models landed, which is not what the benchmarks predicted.

What did Fazm change in April 2026 in response to the new LLM lineup?

Fazm added a one-time silent migration that moves every user who had Opus as their default model back to Sonnet on first launch after update. It is seven lines of Swift in Desktop/Sources/FloatingControlBar/ShortcutSettings.swift at lines 310 to 316. It reads the UserDefaults key shortcut_selectedModel, checks whether the string contains 'opus', writes 'sonnet' back, and flips a boolean flag shortcut_didMigrateFromOpus so the migration runs exactly once and is never re-applied. The comment on line 310 names the reason: 'Opus -> Sonnet rate limit fix'. Opus is still in the picker and still selectable; Fazm only overrides the default for pre-April defaults that chose it.

Why would a Mac agent prefer Sonnet over Opus even though Opus scores higher on benchmarks?

Because the workload shape of a desktop agent loop is not the workload shape benchmark suites measure. A benchmark run sends one large prompt and waits. A Fazm chat fires many small tool calls per minute: read the accessibility tree, plan, click, verify, plan again. Under a per-hour Anthropic account cap, Opus hits the cap faster than Sonnet for the same workload because each Opus call counts more. Sonnet, with a lower per-token weighting, lets the same loop keep running longer before it is throttled. For a user, the practical difference is whether the chat keeps working past lunch. That is the regression the April migration corrected.

What are the three Claude tiers in Fazm's model picker?

Fazm presents three options, declared in ShortcutSettings.swift at lines 151 to 155. 'Scary' maps to Haiku (latest). 'Fast' maps to Sonnet (latest). 'Smart' maps to Opus (latest). The IDs are the short aliases haiku, sonnet, opus, and a normalizer at lines 169 to 175 collapses any legacy full model ID (for example claude-opus-4-6 or claude-sonnet-4-6) to the short alias before it reaches the ACP SDK. That design is why Fazm can ride new Claude versions the day they ship: the app never hard-codes a version string in the UI. When Anthropic rolls Claude 4.7 into the ACP SDK's dynamic model list, Fazm's picker updates automatically via updateModels() at line 178.

Does Fazm use any of the new non-Anthropic April 2026 models?

Yes. Gemini 2.5 Pro runs a completely separate loop in Desktop/Sources/GeminiAnalysisService.swift. The model string is declared at line 67 as 'gemini-pro-latest', and the analytics tracer at line 264 logs it as 'gemini-2.5-pro'. This is the passive screen observer: it buffers screen-recording chunks and, every 60 minutes of recorded content (line 69), sends them to Gemini for multi-minute video analysis to surface suggested tasks. It is decoupled from the chat loop for a specific reason: multimodal video analysis rewards long context and cheap bulk pricing. Chat rewards latency. Different workload, different model, different loop.

Why does Fazm use macOS accessibility APIs instead of screenshots for the chat loop?

Because text is smaller, cheaper, and faster than pixels. A screenshot of a typical Mac window is a few hundred KB of JPEG; the corresponding accessibility tree is a few KB of structured text (role, label, coordinates, visible state, value). For a chat loop that fires tool calls every second, sending pixels to a multimodal model is the expensive path. The accessibility approach lets Fazm use the cheaper, faster text-only Claude tiers (Haiku, Sonnet) for perception and action, and reserve the multimodal Gemini call for the passive video-analysis loop that already assumed long context. The relevant code lives in Desktop/Sources/AppState.swift around lines 439 to 472: AXUIElementCreateApplication gets the frontmost app, AXUIElementCopyAttributeValue reads the focused window, and similar calls walk the tree for Finder and any other AX-enabled app.

How does this change what the April 2026 LLM roundups are useful for?

It reframes which numbers in the roundup actually matter for a given workload. For a cloud chatbot with a big context, Gemini 3.1 Pro's 2M window and pricing change the math. For a vision-first browser agent, GPT-5 Turbo's native multimodal matters. For a Mac agent that reads accessibility trees and fires tool calls, the most relevant April change is not a flagship release at all: it is the fact that Anthropic's rate-limit behavior in production made Opus the wrong default and Sonnet the right one. None of the roundups will tell you that because they cannot; they can only read the public release notes. You have to watch a tool-calling loop run against real accounts to see it.

Can I verify the line numbers in this page?

Yes. Clone the Fazm repository and run these commands from the repo root. To see the one-time Opus migration: sed -n '310,326p' Desktop/Sources/FloatingControlBar/ShortcutSettings.swift. To see the three-tier picker: sed -n '151,163p' Desktop/Sources/FloatingControlBar/ShortcutSettings.swift. To see the Gemini model strings: grep -n 'gemini-pro-latest\|gemini-2.5-pro' Desktop/Sources/GeminiAnalysisService.swift. To see the accessibility-API calls: grep -n 'AXUIElementCreateApplication\|AXUIElementCopyAttributeValue' Desktop/Sources/AppState.swift. All line numbers are current as of 2026-04-20.

What should I take away if I am building a similar app?

Three things. First, a benchmark score does not predict tool-call loop survival under a per-hour rate limit; measure the thing you actually do. Second, a three-tier picker with short aliases (haiku, sonnet, opus) and a normalizer lets the ACP SDK swap in new model versions without an app release. Third, the perception layer and the action layer should be decoupled: Fazm's accessibility reads keep working through Anthropic outages because they never call Anthropic, and the Gemini video-analysis loop is independent of the Claude chat loop for the same reason. Pick models per loop, not per app.

fazm.AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
