
AI LLM news, April 2026: which model is actually shipping inside a Mac automation app

Every April 2026 LLM news recap catalogs the same benchmark scores and the same per-token price tables. None of them show you what a real shipping product does with those numbers. Fazm runs three Claude 4.x tiers side by side, plus Gemini 2.5 Pro on a separate loop, and in April 2026 shipped a one-time migration that silently moves Opus-default users back to Sonnet. The story behind that migration is the story the leaderboards cannot tell.

Fazm · 11 min read
Every model fact traced to a real file and line in the Fazm desktop source tree
Includes the one-time Opus to Sonnet migration shipped in April 2026
Covers why Gemini 2.5 Pro runs a separate loop from the Claude agent

The April 2026 numbers, and the four that shipped in a real app

3 — Claude tiers in Fazm's picker (Haiku 4.5, Sonnet 4.6, Opus 4.6)
150 — Line in ShortcutSettings.swift where the picker is defined
315 — Line where the Opus to Sonnet migration runs
2 — LLM providers wired into the default config (Anthropic + Google)

The headline SERPs around this query all lead with Claude Opus 4.6's SWE-bench 65.3%, Gemini 3.1 Pro's 2M context, and DeepSeek R2's AIME 92.7%. Those are the loud facts. The four numbers above are the quiet ones, and they live in Fazm's shipping binary.

Opus to Sonnet

One-time migration: switch existing Opus users to Sonnet (Opus burns through rate limits too fast)

ShortcutSettings.swift line 315 comment, April 2026

The anchor fact: a production decision the SERP cannot show you

When Claude Opus 4.6 landed in April 2026, every news recap led with its SWE-bench Verified score and its Arena ranking. Inside Fazm, a different signal dominated: users who had set Opus as their default were hitting their per-hour rate cap well inside normal usage. The April 2026 update ships a one-time migration that flips them back to Sonnet 4.6. It runs exactly once per user, guarded by a separate UserDefaults flag so it never fires again. Here is the code.

Desktop/Sources/FloatingControlBar/ShortcutSettings.swift, lines 315-324
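From the description above (a UserDefaults check guarded by a run-once flag), the migration plausibly looks like the sketch below. The key names shortcut_selectedModel and shortcut_didMigrateFromOpus are the ones this article cites; the in-memory Defaults class stands in for UserDefaults so the sketch is self-contained, and everything else is illustrative rather than a copy of Fazm's source.

```swift
// In-memory stand-in for UserDefaults, so the sketch runs on its own.
final class Defaults {
    private var store: [String: Any] = [:]
    func string(forKey key: String) -> String? { store[key] as? String }
    func bool(forKey key: String) -> Bool { store[key] as? Bool ?? false }
    func set(_ value: Any, forKey key: String) { store[key] = value }
}

let defaults = Defaults()

// Simulate a user who set Opus as their default before the April 2026 update.
defaults.set("claude-opus-4-6", forKey: "shortcut_selectedModel")

// One-time migration: move Opus-default users back to Sonnet, guarded by a
// separate flag so it never fires twice.
func migrateOpusDefaultIfNeeded(_ defaults: Defaults) {
    guard !defaults.bool(forKey: "shortcut_didMigrateFromOpus") else { return }
    if defaults.string(forKey: "shortcut_selectedModel") == "claude-opus-4-6" {
        defaults.set("claude-sonnet-4-6", forKey: "shortcut_selectedModel")
    }
    defaults.set(true, forKey: "shortcut_didMigrateFromOpus")
}

migrateOpusDefaultIfNeeded(defaults)
print(defaults.string(forKey: "shortcut_selectedModel")!)  // migrated to Sonnet

// If the user later re-picks Opus, the flag guards re-entry: nothing moves.
defaults.set("claude-opus-4-6", forKey: "shortcut_selectedModel")
migrateOpusDefaultIfNeeded(defaults)
print(defaults.string(forKey: "shortcut_selectedModel")!)  // stays on Opus
```

The important property is the second run: an explicit re-selection of Opus survives every later launch, because the guard flag, not the model ID, decides whether the migration fires.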

This is the part the leaderboards cannot capture. The benchmark delta between Opus 4.6 and Sonnet 4.6 is real. The rate-limit delta is also real, and in the actual usage pattern of a desktop agent that fires many tool calls per minute, the rate-limit delta wins.

The three April 2026 Claude tiers, as they appear in the settings screen

Fazm's model picker ships a static default list in ShortcutSettings.defaultModels that gets overridden at runtime by the model list the Claude Code runtime publishes through ACP. The labels ('Scary,' 'Fast,' 'Smart') are editorial choices designed to steer users toward the right tier for their task without forcing them to read benchmark tables.

Desktop/Sources/FloatingControlBar/ShortcutSettings.swift, lines 150-155
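A minimal sketch of what that static default list could look like, using the ModelOption type name, model IDs, and labels this article cites elsewhere; the exact field shape and ordering code are assumptions, not a copy of Fazm's source.

```swift
// Hypothetical shape of the static default list. IDs and labels are the ones
// cited in this article; field names are assumptions.
struct ModelOption {
    let id: String
    let label: String
}

let defaultModels: [ModelOption] = [
    ModelOption(id: "claude-haiku-4-5-20251001", label: "Scary (Haiku)"),
    ModelOption(id: "claude-sonnet-4-6",         label: "Fast (Sonnet)"),
    ModelOption(id: "claude-opus-4-6",           label: "Smart (Opus)"),
]

// Sonnet is the default; the 'Smart' model deliberately is not.
let defaultSelection = defaultModels[1]
print(defaultSelection.label)
```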

Three tiers, three jobs. Haiku 4.5 serves reflexive questions from the floating bar. Sonnet 4.6 is the default for the main agent loop. Opus 4.6 is there for users who explicitly want it on hard multi-step plans. The editorial labels reveal what the leaderboards do not: the 'Smart' model is not the default.

How an April 2026 request actually gets routed

The April 2026 news cycle makes it sound like you pick a model and then that is your agent. In practice a shipping product routes different request types to different models. Here is the Fazm view: different Mac apps feed the same accessibility tree capture, and the capture is then routed to whichever LLM fits the task.

[Diagram: apps on the left, models on the right, accessibility-tree hub in the middle. Mail, Safari, Xcode, Notes, and Terminal all feed the same Fazm accessibility-tree capture, which routes each request to Haiku 4.5, Sonnet 4.6, Opus 4.6, or Gemini 2.5 Pro.]

The accessibility-tree hub matters because it is why text-only Claude models work great for the control loop. The multimodal-heavy April 2026 news (Gemini 3.1 Pro 2M context, GPT-5 Turbo native multimodal) is less load-bearing for this path than for a screenshot-based agent, which needs pixels in the prompt.
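To make "text payloads" concrete, here is a sketch of what a single captured element might reduce to before it reaches the model. The AXNode field names are assumptions, loosely mirroring the role, label, value, and coordinate attributes an accessibility capture exposes; this is not Fazm's actual serialization.

```swift
import Foundation

// One on-screen element as structured text: no pixels anywhere in the payload,
// which is why a text-only model can drive the control loop.
struct AXNode: Codable {
    let role: String
    let label: String?
    let value: String?
    let x: Double, y: Double, width: Double, height: Double
}

let button = AXNode(role: "AXButton", label: "Send", value: nil,
                    x: 812, y: 640, width: 72, height: 28)

let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
let payload = String(data: try! encoder.encode(button), encoding: .utf8)!
print(payload)  // a plain-text record any non-vision model can consume
```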

Why Gemini 2.5 Pro lives on a different loop

Not every task inside Fazm is an accessibility-tree agent call. A separate actor buffers session recording chunks, waits a few minutes, and ships the whole batch to Gemini for vision-based analysis plus function calling into the local database. Different input shape, different length, different model. The filename and model string are visible in the source.

Desktop/Sources/GeminiAnalysisService.swift, line 67 and line 264
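The buffering behavior can be sketched as follows. This is a simplified, synchronous stand-in: the real service is described as an actor on a timer, and every name here (SessionChunk, AnalysisBatcher, flushInterval) is invented for illustration.

```swift
import Foundation

// Chunks accumulate until a time window elapses, then the whole batch ships
// to the vision model in one call instead of one call per chunk.
struct SessionChunk {
    let capturedAt: Date
    let bytes: Int
}

final class AnalysisBatcher {
    private var buffer: [SessionChunk] = []
    let flushInterval: TimeInterval  // "waits a few minutes"

    init(flushInterval: TimeInterval = 180) { self.flushInterval = flushInterval }

    /// Returns the batch to ship if the window has elapsed, else nil.
    func append(_ chunk: SessionChunk, now: Date = Date()) -> [SessionChunk]? {
        buffer.append(chunk)
        guard let first = buffer.first,
              now.timeIntervalSince(first.capturedAt) >= flushInterval else {
            return nil
        }
        let batch = buffer
        buffer.removeAll()
        return batch  // caller sends this to the Gemini endpoint
    }
}

let batcher = AnalysisBatcher(flushInterval: 180)
let start = Date()
// A chunk inside the window: nothing ships yet.
let early = batcher.append(SessionChunk(capturedAt: start, bytes: 1_000), now: start)
// Three minutes later, the next chunk flushes both as one batch.
let late = batcher.append(
    SessionChunk(capturedAt: start.addingTimeInterval(180), bytes: 2_000),
    now: start.addingTimeInterval(180))
print(early == nil, late?.count ?? 0)
```

Batching is the whole design choice: bulk multimodal pricing rewards one large call over many small ones, which is exactly the axis this loop cares about.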

This loop runs independently of whichever Claude tier the user has selected for the chat. They are two pipes, picked for two different strengths. The April 2026 news that matters here is not Gemini 3.1 Pro's 2M context (which this loop does not need), it is the steady improvements in bulk multimodal pricing and function-calling reliability that Gemini 2.5 Pro already has and that the loop relies on.

The April 2026 lineup, rated by whether they actually ship in Fazm

This is not a benchmark ranking. It is a rating by what percentage of Fazm users' traffic each model actually handles today. Four of them do. The rest are reachable through the Custom API Endpoint field if a user sets up a shim, but they are not the default path.

Claude Sonnet 4.6

Label in Fazm: 'Fast (Sonnet).' Default for the main agent loop. April 2026's working consensus pick for desktop automation that has to chain many tool calls and not burn through per-hour rate limits. ShortcutSettings.swift:153.

Claude Opus 4.6

Label in Fazm: 'Smart (Opus).' Still available; still the smartest on complex multi-step plans. Auto-migrated away from as default in April 2026 because it chews through user rate limits during normal agent use.

Claude Haiku 4.5

Label in Fazm: 'Scary (Haiku).' Full ID claude-haiku-4-5-20251001. For reflexive one-shot queries in the floating bar where speed beats reasoning depth.

Gemini 2.5 Pro

Not in the agent loop. Used by GeminiAnalysisService for passive screen analysis: multi-minute video chunks, function calling into the local database, long-context reasoning. Different job, picked for multimodal strength.

GPT-5 Turbo

April 2026 OpenAI refresh with native multimodal. Not wired into Fazm's default loop. Reachable through Custom API Endpoint with an Anthropic-shape shim in front, like any other non-Claude frontier model.

Gemini 3.1 Pro

2M context, Vertex GA. Bigger context than Fazm's accessibility-tree capture typically needs. Same shim story as GPT-5 Turbo if you want to route the main loop through it.

What happens to a user request, step by step

Most 'ai llm news april 2026' recaps show you a pricing table. Here is the alternative: what actually happens inside a shipping Mac app when a user presses a hotkey.

1. User opens the floating bar and asks a quick question

One-shot, no tool calls expected. Latency matters more than reasoning depth. Haiku 4.5 is labeled 'Scary' in the picker for this tier.

2. User sends a normal Ask Fazm request

The default path. Sonnet 4.6 serves the main chat session and drives tool calls against the accessibility tree. Fast, reliable, does not torch the user's hourly cap.

3. User explicitly picks Opus for a hard task

Opus 4.6 is the 'Smart' option in the picker. Still available for complex multi-step plans. The one-time migration does not remove it; it only stops it from being the default for users who were on it.

4. The screen observer wakes up in the background

GeminiAnalysisService buffers up to a few minutes of session recording and sends it to Gemini 2.5 Pro with function calls to the local database. This is a completely separate loop from the Claude chat, running on a different model for a different reason.

5. If the user has a Custom API Endpoint set

ACPBridge.swift reads the customApiEndpoint UserDefaults key and injects it as ANTHROPIC_BASE_URL before the ACP subprocess spawns. That rewires the entire Claude-shaped path through a shim, which can sit in front of GPT-5 Turbo, Gemini 3.1 Pro, DeepSeek R2, or any April 2026 release.
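A hedged sketch of that injection step, assuming a dictionary-shaped settings lookup; the function name is invented, but customApiEndpoint and ANTHROPIC_BASE_URL are the names cited above.

```swift
import Foundation

// Build the environment for the ACP subprocess. If the user set a custom
// endpoint, every Claude-shaped request flows through the shim at that URL.
func subprocessEnvironment(settings: [String: String],
                           base: [String: String] = ProcessInfo.processInfo.environment)
    -> [String: String] {
    var env = base
    if let endpoint = settings["customApiEndpoint"], !endpoint.isEmpty {
        env["ANTHROPIC_BASE_URL"] = endpoint
    }
    return env
}

let env = subprocessEnvironment(settings: ["customApiEndpoint": "http://localhost:8787"])
print(env["ANTHROPIC_BASE_URL"] ?? "unset")
```

Because the override happens before the subprocess spawns, nothing downstream needs to know a shim exists: the whole Claude-shaped path is rewired with one settings key.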

Verify the April 2026 migration yourself

If you do not trust the claim that Fazm silently moves Opus users back to Sonnet, here is the check: grep the Fazm desktop source tree for the migration comment ('switch existing Opus users to Sonnet'), the guard flag ('shortcut_didMigrateFromOpus'), and the model ID users are moved off ('claude-opus-4-6'). All three land in ShortcutSettings.swift, lines 315-324.

Three greps, one migration.

Opus 4.6 versus Sonnet 4.6, through a shipping-product lens

The April 2026 news cycle mostly compares these two on benchmarks. For a desktop agent that drives Mac apps through accessibility APIs, the relevant axis is not raw benchmark deltas. It is cost, latency, and rate-limit friendliness over sustained usage.

Feature | Opus 4.6 (Fazm 'Smart' opt-in) | Sonnet 4.6 (Fazm default)
Fazm label | Smart | Fast
Default for new users in April 2026 | No | Yes
Existing Opus users migrated here on update | They get moved off | Yes (from Opus)
Tool-call format stability in agent loops | Highest | High
Per-hour rate limit headroom during normal use | Gets hit during normal use | Plenty
Cost per multi-step agent session | High | Low
Right choice for reflexive one-shot queries | No | No, use Haiku 4.5 (Scary)
Right choice for complex multi-app plans | Yes, explicitly | Usually fine

April 2026 LLM news, condensed

The headlines every recap covers. Fazm defaults wire up two of these. The rest are reachable through Custom API Endpoint plus a shim.

Claude Sonnet 4.6 · Claude Opus 4.6 · Claude Haiku 4.5 · Gemini 3.1 Pro (2M ctx) · Gemini 2.5 Pro · GPT-5 Turbo · DeepSeek R2 · Mistral Large 3 · Llama 4 Maverick · Llama 4 Scout · Qwen 3.5 · Gemma 4 · Codestral 2

Three things the shipping-product view adds to the April 2026 benchmark tables

Not instead of the benchmark tables. Beside them. The benchmarks give you the upper bound. A shipping product gives you the actual operating point.

Rate-limit reality

Opus 4.6 tops SWE-bench. It also tops the speed at which your account hits its per-hour cap. Sonnet 4.6 wins the default because the cap does not care about benchmarks.

Multi-model routing is the real answer

One model for reflexive queries, one for the agent loop, one for vision-heavy background analysis. The April 2026 releases matter in relation to each other, not as a single winner.

Accessibility-tree input changes the math

Text payloads make the vision SKUs of the April 2026 wave optional on the control loop. That shifts which benchmark columns matter and which ones are marketing.

See the April 2026 model picker on your own Mac

Fazm ships with Claude Haiku 4.5, Sonnet 4.6, and Opus 4.6 in the picker, plus Gemini 2.5 Pro running a separate screen-analysis loop in the background. The main agent uses the macOS accessibility tree, not screenshots, which is why text-first Claude tiers work so well. Sonnet 4.6 is the default. Opus is one click away when you actually need it.

Download Fazm

Frequently asked questions

What were the headline LLM releases in April 2026?

Anthropic's Claude Sonnet 4.6 and Opus 4.6 refreshed the frontier with SWE-bench Verified scores in the mid-60s and near-zero tool-call hallucination rates in structured workflows. OpenAI released GPT-5 Turbo with native multimodal. Google shipped Gemini 3.1 Pro with a 2M token context window, general availability on Vertex. Mistral announced Large 3 with EU data-residency positioning. DeepSeek R2 pushed AIME 92.7% with a ~70% price cut relative to frontier cloud offerings. Meta's Llama 4 Maverick and Scout moved Mixture of Experts into mainstream open weights. Anthropic also shipped Claude Haiku 4.5 as the tier below Sonnet: fast, small, cheap.

Which April 2026 model should I actually use for a desktop agent?

Depends on the job inside the agent. For the main accessibility-driven control loop where reliability matters more than per-token cost, Sonnet 4.6 is the real-world pick, not Opus. In April 2026 Fazm shipped a one-time migration at Desktop/Sources/FloatingControlBar/ShortcutSettings.swift lines 315-324 that moves Opus-default users back to Sonnet, because Opus burns through rate limits too fast in agent loops that fire many tool calls per minute. Opus is still in the picker (labeled 'Smart') for users who want it on specific hard tasks, and Haiku 4.5 is labeled 'Scary' for fast reflexive queries. For passive screen analysis where you feed back-to-back minutes of video and need multimodal, Gemini 2.5 Pro is used instead of any Claude model.

Why does a shipping product run three different Claude tiers at the same time?

Because the benchmark leaderboard is not the decision. Haiku 4.5 (ModelOption.id 'claude-haiku-4-5-20251001' in ShortcutSettings.swift line 152) is the cheapest and fastest, used for one-shot reflexive answers from the floating bar. Sonnet 4.6 (line 153) is the default for the main agent loop: fast enough to feel interactive, reliable enough for chained tool calls, cheap enough to run through a day of real usage without hitting your rate limit. Opus 4.6 (line 154) is reserved for hard multi-step planning where the user explicitly opts in. The three-model picker is the practical translation of the April 2026 benchmark chart into actual product defaults.

What does 'Opus burns through rate limits' mean exactly?

Opus is slower and more expensive per token, so sustained agent use against a user's Claude subscription chews through the account-level throttle faster than Sonnet does at the same task success rate. A user opening Fazm, asking Ask Fazm four questions, and running one multi-step agent loop on Opus 4.6 can hit their subscription's per-hour Opus cap before lunch. On Sonnet 4.6 the same workload runs all day and the agent still finishes tasks. That is the practical gap that turned up in production and prompted the one-time migration at ShortcutSettings.swift:318-321.

Why does Fazm use Gemini 2.5 Pro for screen analysis instead of a Claude vision model?

Different job, different requirements. The passive screen observer in GeminiAnalysisService.swift (line 67 uses 'gemini-pro-latest', the analyzed records at line 264 tag model='gemini-2.5-pro') accumulates multi-minute video chunks of what happened on your screen and feeds them through an agentic loop with function calls into the on-device database. It needs a strong multimodal model that handles long video context and cheap bulk vision inference, which is exactly where Gemini 2.5 Pro is strongest. The main control loop does not need vision because it uses the macOS accessibility tree as the payload, which is structured text. Two loops, two models, picked for what they are actually best at rather than one model stretched across both.

How does accessibility-tree input change which April 2026 model you need?

Most agents that compete with Fazm use screenshots, which forces a multimodal model. That biases them toward the top of the April 2026 benchmark chart because the vision-capable SKUs are the most expensive ones. Fazm sends text (role, label, value, coordinates for each on-screen element, captured through AXUIElementCreateApplication) so tool-call reliability and planning depth matter more than vision. Sonnet 4.6 is a strong fit for that because it nails tool-call format stability and multi-step planning without needing to understand pixels. That is why the April 2026 'best model' question looks different from the Fazm seat than from a screenshot-based agent's seat.

Did Fazm change its default model in April 2026?

Yes, partially. Users who had explicitly set Opus 4.6 as their default model before April 2026 were migrated back to Sonnet 4.6 at first launch after the update, with a flag to make sure it only runs once. The relevant code is in the init() of ShortcutSettings at Desktop/Sources/FloatingControlBar/ShortcutSettings.swift lines 315-324: it checks UserDefaults for shortcut_selectedModel == claude-opus-4-6 and a separate shortcut_didMigrateFromOpus flag, flips the selection to Sonnet, and sets the flag so it never triggers again. New users start on Sonnet 4.6 already.

What does the ACP model list look like in practice?

ACPBridge.swift exposes an onModelsAvailable callback (lines 220-234 of Desktop/Sources/Chat/ACPBridge.swift) that receives short-alias model IDs from the Claude Code runtime (haiku, sonnet, opus). ShortcutSettings.normalizeModelId (lines 170-180) maps those aliases to the full IDs (claude-haiku-4-5-20251001, claude-sonnet-4-6, claude-opus-4-6) and the picker shows them in a fixed order: Scary, Fast, Smart. When Anthropic publishes a new alias, the list updates without a Fazm release.
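Assuming the mapping just described, normalizeModelId plausibly reduces to a switch like this; the body is a sketch using the aliases and full IDs this article cites, not Fazm's actual code.

```swift
// Expand the short aliases the Claude Code runtime publishes over ACP into
// the full model IDs the picker stores.
func normalizeModelId(_ raw: String) -> String {
    switch raw.lowercased() {
    case "haiku":  return "claude-haiku-4-5-20251001"
    case "sonnet": return "claude-sonnet-4-6"
    case "opus":   return "claude-opus-4-6"
    default:       return raw  // unknown IDs pass through untouched
    }
}

print(["haiku", "sonnet", "opus"].map(normalizeModelId))
```

The pass-through default is what keeps the picker forward-compatible: an alias the app has never seen still round-trips as a usable ID.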

Which April 2026 models are off the table for Fazm's default loop?

Anything that does not expose an Anthropic-shape messages endpoint out of the box, unless the user points the Custom API Endpoint field at a shim. GPT-5 Turbo, Gemini 3.1 Pro, DeepSeek R2, Mistral Large 3, and Llama 4 Maverick are all routable through Fazm only if a translation layer is in front of them. The default loop uses Claude Sonnet 4.6 directly. The vision loop uses Gemini 2.5 Pro directly. Other April 2026 frontier releases are one settings change away but not wired in by default.

Is Fazm keeping up with the April 2026 release cadence on the codebase side?

Yes, by design. The three-model picker is a static default list in ShortcutSettings.defaultModels (lines 151-155) that ACP can override at runtime via updateModels (lines 183-217). When Anthropic ships Haiku 5 or Sonnet 5 later in 2026, the picker picks it up from ACP and re-sorts according to the modelFamilyMap (lines 159-163: haiku=Scary=0, sonnet=Fast=1, opus=Smart=2). The point is to treat the April 2026 news as a stream that flows through the settings layer, not a release event that forces a new build.
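The re-sort can be illustrated like this, using the haiku=0 / sonnet=1 / opus=2 ranks cited above. familyRank and the sample ACP payload, including the hypothetical later-2026 successors, are assumptions for the sake of the example.

```swift
// Family ranks from the article: haiku=Scary=0, sonnet=Fast=1, opus=Smart=2.
let modelFamilyMap: [String: Int] = ["haiku": 0, "sonnet": 1, "opus": 2]

func familyRank(_ id: String) -> Int {
    for (family, rank) in modelFamilyMap where id.contains(family) {
        return rank
    }
    return Int.max  // unknown families sort last
}

// A future ACP payload can arrive in any order, with new version suffixes,
// and still slot into the Scary/Fast/Smart positions without a new build.
let fromACP = ["claude-opus-4-6", "claude-haiku-5", "claude-sonnet-5"]
let ordered = fromACP.sorted { familyRank($0) < familyRank($1) }
print(ordered)
```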

The part of April 2026 that was not in the recaps

Claude Opus 4.6 is a better model than Sonnet 4.6 on most benchmarks that April 2026 cares about. The Fazm migration is not a statement that it is not. It is a statement that in the specific shape of a desktop automation loop that fires tool calls against accessibility trees, with real users and real per-hour rate caps, Sonnet finishes more tasks per day. The leaderboards do not model that constraint because they do not have to.

The April 2026 news, read through a shipping product, is a routing problem more than a picking problem. Haiku 4.5 for the quick thing. Sonnet 4.6 for the working loop. Opus 4.6 for the hard plan. Gemini 2.5 Pro for the vision job. Whichever April 2026 Anthropic-shape model you want, via the Custom API Endpoint, for everything else. That is what is actually in the binary, and it is visible in the source at the line numbers above.

