Two pipelines, not one. Runtime view, not a roundup.

New AI model releases or LLM launches, April 2026: which ones actually move a shipping Mac agent

Every April 2026 roundup lists the same launches. GPT-5 Turbo on April 7, Claude Opus 4.6 and Sonnet 4.6 on April 10, GPT-6 Spud on April 14, Gemini 2.5 Pro at 2M context, Llama 4 Scout at 10M context, GLM-5.1 under MIT, Mistral Medium 3 with open weights. None of them maps each launch to a feature in a specific Mac app. From Fazm's runtime, the interesting question is not the list; it is the graph. Fazm runs two separate LLM pipelines. A Claude launch moves one of them. A Gemini launch moves the other. A GPT-5 Turbo launch moves neither. Here is the exact map, pinned to files and line numbers.

Fazm · 14 min read · 4.9 from 200+
Dual-pipeline architecture pinned to GeminiAnalysisService.swift and acp-bridge/src/index.ts
Maps every April 2026 launch to its actual effect on the product
Exact buffer sizes: 3600s trigger, 120 chunk cap, 1.5 MB upload cutoff

The two-pipeline claim

Most April 2026 roundups quietly assume a single LLM sits at the center of any product. That is not how this one is built. Fazm carries two independent pipelines, each with its own provider, context format, trigger cadence, and billing account. A one-sentence summary of each:

Pipeline A

Conversational agent (Claude, text-first)

The floating bar and the main chat window talk to Anthropic through the ACP SDK. Context is the macOS accessibility tree, not screenshots. Latency budget: sub-second.

  • Transport: acp-bridge/src/index.ts
  • Discovery: availableModels on session/new
  • Default seed: claude-sonnet-4-6 (DEFAULT_MODEL, line 1245)
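
The discovery step can be sketched in TypeScript. `emitModelsIfChanged` and `DEFAULT_MODEL` are real names from the bridge; the record shape and the emit callback here are illustrative assumptions, not the real acp-bridge types:

```typescript
// Sketch: surface newly launched Claude models without an app update.
// Shapes are assumptions for illustration; names mirror acp-bridge/src/index.ts.

type ModelInfo = { modelId: string; name: string };

const DEFAULT_MODEL = "claude-sonnet-4-6"; // pre-warm seed (index.ts:1245)

let lastEmitted = "";

/** Emit a models_available JSON line only when the model set actually changes. */
function emitModelsIfChanged(models: ModelInfo[], emit: (line: string) => void): void {
  const ids = models.map((m) => m.modelId).sort().join(",");
  if (ids === lastEmitted) return; // same launch set as the last session/new
  lastEmitted = ids;
  emit(JSON.stringify({ type: "models_available", models }));
}
```

The point of the dedup is that when Anthropic adds a model to availableModels, exactly one new JSON line is written and the picker grows a row; every later session/new with the same set is silent.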
Pipeline B

Screen observer (Gemini, vision-first)

The observer reads finalized MP4 chunks from the session recorder, buffers 60 minutes, then hits the Gemini File API. Output is a TASK_FOUND verdict and a markdown document.

  • Transport: GeminiAnalysisService.swift
  • Trigger: targetDurationSeconds = 3600 (line 69)
  • Model alias: gemini-pro-latest (line 67)
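
A minimal TypeScript sketch of that trigger logic (the real implementation is Swift in GeminiAnalysisService.swift; the constants are the real ones, the types and helper are illustrative):

```typescript
// Sketch of the observer's buffering rules: accumulate finalized MP4 chunks,
// fire one analysis once an hour of video is buffered, cap the ring at 120.

const targetDurationSeconds = 3600; // line 69: one Gemini call per hour of video
const maxChunks = 120;              // line 68: FIFO safety bound

interface Chunk { path: string; seconds: number }

const buffer: Chunk[] = [];

/** Append a finalized chunk; report whether analysis should fire. */
function addChunk(chunk: Chunk): boolean {
  buffer.push(chunk);
  while (buffer.length > maxChunks) buffer.shift(); // rotate oldest chunk out
  const total = buffer.reduce((sum, c) => sum + c.seconds, 0);
  return total >= targetDurationSeconds;
}
```

With 60-second chunks, the 60th chunk crosses the 3600-second line and triggers the single hourly Gemini call; anything past 120 chunks rotates FIFO.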

Because these are two pipelines, an LLM launch does not move the whole product. It moves at most one side of it. The rest of this page walks through which April 2026 launch moves which side, with exact references.

Inputs and outputs at a glance

Every launch is an edge added somewhere in this graph. The trick to reading a roundup is figuring out which edge it modifies.

[Diagram] Fazm runtime: two LLM families in, four product surfaces out. Inputs: the Anthropic Claude family (context from macOS Accessibility) and the Google Gemini family (chunks from SessionRecordingManager). Surfaces: floating bar picker, observer_gemini ledger, TASK_FOUND overlay, per-window model.

April 2026 launches, as they hit Fazm's runtime

Same month, very different product effects. Watch what each launch does, or does not do.

A roundup drops

The press cycle lists GPT-5 Turbo, Opus 4.6, Gemini 2.5 Pro, Llama 4 Scout, GLM-5.1. Every article looks the same.

April 2026 launch set

Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5, Claude Mythos (partner-only), GPT-5 Turbo, GPT-6 Spud, Gemini 2.5 Pro (2M ctx), Gemini 2.5 Flash, Gemma 4, Llama 4 Scout (10M ctx), Llama 4 Maverick, Muse Spark (Meta proprietary), GLM-5.1 (MIT, 744B MoE), Mistral Medium 3, DeepSeek R2, Qwen 3.6-Plus

What each pipeline is, in four tiles

Each tile is a fact about the shape of the pipeline, not a claim about a model.

Pipeline A: Conversational agent (Claude via ACP)

Input is text from the macOS accessibility tree plus the user's chat. Context flows through acp-bridge to the Anthropic Agent Client Protocol SDK. Discovery of new models is automatic through the availableModels array on every session/new response.

Pipeline B: Screen observer (Gemini on MP4)

Input is finalized MP4 chunks from SessionRecordingManager, buffered for 60 minutes. Upload goes to the Gemini File API, and one generateContent call produces a TASK_FOUND / NO_TASK / UNCLEAR verdict plus a markdown document.

Latency budget

Pipeline A: sub-second tool turns. Pipeline B: runs at most once an hour on finalized chunks. Very different constraints, very different model choices.

Billing split

Observer tokens post under account: observer_gemini. Agent tokens post through the ACP bridge's own usage accounting. Two ledgers, one app.
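
The split is easy to picture as a tally over tagged usage records. The account string "observer_gemini" is real (GeminiAnalysisService.swift:261); the "acp_bridge" label, the record shape, and the tally helper are assumptions for illustration, since the bridge keeps its own accounting:

```typescript
// Sketch: two billing ledgers in one app, keyed by an account tag.
// "observer_gemini" is the real account name; "acp_bridge" is a stand-in
// for the bridge's own usage accounting.

interface UsageRecord {
  account: "observer_gemini" | "acp_bridge";
  model: string;
  tokens: number;
}

/** Sum token usage per ledger. */
function tallyByAccount(records: UsageRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.account, (totals.get(r.account) ?? 0) + r.tokens);
  }
  return totals;
}
```

A new Gemini launch only ever moves the observer_gemini total; a new Claude launch only ever moves the bridge's total. That is the whole diagnostic.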

Anchor fact: the exact observer constants

The number that tells you a Gemini launch matters more to Fazm than a GPT-6 launch is this one: targetDurationSeconds = 3600. Sixty minutes of buffered screen recording before a single Gemini call. Which means a longer-context Gemini model is product-relevant in a way a longer-context Claude model is not. These are the actual constants from the service, copied verbatim, not paraphrased:

Desktop/Sources/GeminiAnalysisService.swift
  • 3600 s: observer analysis trigger (targetDurationSeconds, line 69)
  • 120: max buffered chunks before FIFO rotation (line 68)
  • 1.5 MB: inline vs resumable upload cutoff (line 71)
  • 300 s: post-failure retry cooldown (line 78)
  • 2: independent LLM pipelines in the app
  • Pre-warmed ACP sessions seeded with claude-sonnet-4-6
  • 2 ledgers, 1 app
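
Of those constants, the upload cutoff is the simplest to sketch. The 1,500,000-byte limit is the real constant (inlineSizeLimit, line 71); the helper function and its boundary handling are illustrative assumptions:

```typescript
// Sketch of the 1.5 MB inline-vs-resumable decision. Small chunks ride
// inline as base64 in the request; larger ones go through a resumable
// upload to the Gemini File API. Boundary handling here is assumed.

const inlineSizeLimit = 1_500_000; // bytes (GeminiAnalysisService.swift:71)

function uploadStrategy(fileSizeBytes: number): "inline_base64" | "resumable" {
  return fileSizeBytes <= inlineSizeLimit ? "inline_base64" : "resumable";
}
```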

A new Gemini launch shows up in observer_gemini. A new Claude launch shows up in the ACP usage. You can see which launch is moving your product from the billing ledger alone.

GeminiAnalysisService.swift:261 and acp-bridge usage accounting

Separate accounts: observer_gemini vs ACP bridge usage (2 independent ledgers)
Desktop/Sources/GeminiAnalysisService.swift (billing split)

Five greps that prove the dual-pipeline shape

Do not trust the paragraphs. Run these on a local checkout of the Fazm desktop source and read the matched lines. Each one pins a claim from this page to a specific line of Swift or TypeScript.

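These are the five greps, copied from the FAQ below; they assume a local checkout of the Fazm desktop source with ripgrep installed:

```shell
# Each match pins a claim on this page to a line of Swift or TypeScript.
rg -n 'private let model' Desktop/Sources/GeminiAnalysisService.swift      # line 67: gemini-pro-latest
rg -n 'targetDurationSeconds' Desktop/Sources/GeminiAnalysisService.swift  # line 69: 3600
rg -n 'observer_gemini' Desktop/Sources/GeminiAnalysisService.swift        # line 261: billing account
rg -n 'models_available' acp-bridge/src/index.ts                           # line 1279: emitModelsIfChanged
rg -n 'rate_limit' acp-bridge/src/index.ts                                 # ~line 2443: rate_limit case
```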

Every April 2026 launch, mapped

Column one is what every roundup says about the launch. Column two is what it actually does inside Fazm today.

  • Claude Opus 4.6 (Apr 10). Press: Arena #1, new reasoning mode. Inside Fazm: appears in the floating bar picker the same day, labeled 'Smart (Opus, latest)' via ShortcutSettings.swift:162.
  • Claude Sonnet 4.6 + Haiku 4.5. Press: faster 'fast mode', cheaper Haiku. Inside Fazm: Sonnet 4.6 is the pre-warm seed (DEFAULT_MODEL at index.ts:1245); Haiku 4.5 auto-labels as 'Scary'.
  • Claude Mythos (partner-only). Press: step change above Opus 4.6. Inside Fazm: zero effect; not returned by availableModels behind the 50-company firewall.
  • GPT-5 Turbo (Apr 7, native multimodal). Press: text, image, audio in one call. Inside Fazm: zero effect on either pipeline; ACP is Anthropic-only, the observer is Gemini-only.
  • GPT-6 Spud (Apr 14). Press: new OpenAI frontier. Inside Fazm: zero effect, for the same reason as GPT-5 Turbo.
  • Gemini 2.5 Pro (2M context). Press: longest hosted context on the market. Inside Fazm: directly upgrades the observer; the 3600 s targetDurationSeconds can eventually extend because one analyze call covers more buffered video.
  • Gemini 2.5 Flash. Press: cheaper vision tier. Inside Fazm: candidate for a cheap observer mode if cost matters more than depth; swap at GeminiAnalysisService.swift:67.
  • Gemma 4 (Apache 2.0). Press: open-weight Google family. Inside Fazm: no integration today; would need a new local pipeline.
  • Llama 4 Scout (10M context). Press: open-weight, MoE, long context. Inside Fazm: no integration today; hosted-only architecture.
  • GLM-5.1 (MIT, 744B MoE). Press: SWE-Bench Pro top tier. Inside Fazm: no integration today, same reason.
  • Mistral Medium 3 (Apr 9, open weights). Press: EU AI Act metadata. Inside Fazm: no integration today.

The month, one Fazm-relevant step at a time

Seven moments in April 2026 that actually touch this app's runtime, in order. Press releases that do not show up here did not move the product.

1

Apr 7: GPT-5 Turbo launches

Native multimodal in one API call. No effect on Fazm's two pipelines today. Moves OpenAI-first apps, not this one.

2

Apr 9: Mistral Medium 3, open weights

EU AI Act compliance metadata baked in. No integration with Fazm yet; hosted-provider architecture only.

3

Apr 10: Claude Opus 4.6 and Sonnet 4.6

ACP SDK starts returning claude-opus-4-6 and claude-sonnet-4-6 in availableModels. emitModelsIfChanged at acp-bridge/src/index.ts:1279 writes a models_available JSON line. Picker grows a row automatically.

4

Apr 14: GPT-6 Spud

OpenAI frontier refresh. Still no effect on Fazm's pipelines for the same architectural reason as GPT-5 Turbo.

5

Apr 15: Claude Haiku 4.5

Substring match on 'haiku' in ShortcutSettings.swift:160 auto-labels it 'Scary', order 0, so it sorts first in the picker.

6

Apr 17: Gemini 2.5 Pro at 2M tokens (long-context rollout)

The observer's gemini-pro-latest alias picks it up. Larger context means the 3600-second ring buffer can eventually grow without truncation risk.

7

Apr 18: GLM-5.1 under MIT

Real news, real benchmark wins. Zero effect on Fazm today. Open-weight models are strictly potential until a pipeline is built for them.

Why run two pipelines and not one big vision model for everything?

Because the agent and the observer have opposite constraints. The agent needs to answer in hundreds of milliseconds when the user presses a hotkey, so pushing a PNG screenshot through a multimodal model every turn is too slow and too expensive. The agent instead reads AXUIElementCopyAttributeValue on the focused window (AppState.swift:441 and 472), which returns structured text about titles, roles, values, and child elements. A text-only Claude model on that input reads the screen just fine.

The observer, on the other hand, is allowed to be heavy. It fires at most once an hour on a buffer of finalized MP4 chunks. Vision is cheap at that cadence, and the whole point is to notice patterns across sixty minutes of screen time that no accessibility snapshot could ever capture.

Fusing them into one model would either make the agent prohibitively slow or make the observer too shallow to spot anything interesting. Two pipelines with different latency budgets was the right shape.

What the April 2026 roundups quietly assume

Every top SERP result for this keyword shares the same implicit model. The table below is why the same roundup can be simultaneously accurate and irrelevant.

The roundup assumes
  • One product, one model
  • More benchmark points equals better product
  • Multimodal always matters
  • Bigger context always matters
  • An open-weight release helps every app
What is actually true here
  • One product, two pipelines
  • Product-relevant gains are per-pipeline
  • Multimodal is dead weight on the agent path
  • Context length helps the observer, not the agent
  • Open-weight models are unused until a pipeline exists

Want to see which April 2026 launch is actually moving your agent?

Thirty minutes on a call and I'll walk your team through both pipelines, the split billing, and which launches are worth reacting to.

Book a call

Frequently asked questions

Which AI models and LLMs actually launched in April 2026?

The April 2026 cycle brought Claude Opus 4.6 (Arena #1 for most of the month), Claude Sonnet 4.6, Claude Haiku 4.5, Claude Mythos (partner-only, locked behind a 50-company firewall), GPT-5 Turbo on April 7 with native multimodal input, GPT-6 (codename Spud) on April 14, Gemini 2.5 Pro with a 2M token context, Gemini 2.5 Flash, Gemma 4 (Apache 2.0), Llama 4 Scout at 10M context, Llama 4 Maverick, Meta Muse Spark (first proprietary Meta model), GLM-5.1 (MIT license, 744B MoE, 40B active), Mistral Medium 3 on April 9 (open weights), Qwen 3.6-Plus, and DeepSeek R2. This guide is not a roundup. It is about which of those launches actually moves a shipping Mac agent's behavior and which ones are press releases from that agent's point of view.

Why does a shipping Mac agent care about a model launch at all? Isn't it just an API call?

Because routes matter. Fazm has two independent LLM pipelines wired to two different providers. The conversational agent talks to Anthropic through the ACP SDK, using text and the macOS accessibility tree as context. The screen observer talks to Google through the Gemini File API, uploading finalized MP4 chunks. A GPT-5 Turbo launch touches neither today. A Claude Opus 4.6 launch auto-surfaces in the agent picker. A Gemini 2.5 Pro launch directly upgrades how much video the observer can reason over in a single call. Same month, three different product effects.

What exactly is the screen observer pipeline and which April 2026 launch matters to it most?

It lives in Desktop/Sources/GeminiAnalysisService.swift. The service maintains a ring buffer of finalized MP4 chunks handed to it by SessionRecordingManager. Line 69 sets targetDurationSeconds = 3600, so analysis fires after 60 minutes of buffered recording. Line 68 caps the buffer at maxChunks = 120 as a safety bound. Line 71 flips between inline base64 and resumable upload at inlineSizeLimit = 1_500_000 (1.5 MB). Line 78 sets retryCooldown = 300, a five-minute backoff after any failed analysis. The provider is configured at line 67 with `private let model = "gemini-pro-latest"`, which currently resolves to gemini-2.5-pro. That resolution is why the Gemini 2.5 Pro launch matters most: a longer-context model directly lengthens how much of a user's hour the observer can reason over in one pass.

How is billing split between the two pipelines?

GeminiAnalysisService.swift line 261 calls APIClient.shared.recordLlmUsage with account: "observer_gemini". Line 264 calls recordExternalLlmTrace with model: "gemini-2.5-pro" and source: "fazm_observer". The conversational agent, in contrast, runs through the ACP bridge and is billed through the bridge's own usage accounting, not observer_gemini. The split means you can see in your own usage ledger which LLM launch is pushing your spend. A new Gemini family model shows up in observer_gemini. A new Claude family model shows up in your ACP usage.

Does a GPT-5 Turbo or GPT-6 launch do anything for Fazm today?

Not in the conversational path. The ACP SDK returns availableModels from Anthropic only, so GPT-5 Turbo and GPT-6 do not surface in the floating bar picker at acp-bridge/src/index.ts:1279 today. Not in the observer path either, since GeminiAnalysisService is pinned to the Google generativelanguage.googleapis.com endpoint (line 752 and line 957) and expects a Gemini-shape response. A hypothetical future provider would need its own service, its own file uploader, and its own billing account. That is deliberate. Each pipeline has one provider and a narrow contract, which keeps failure domains small.

Why run two pipelines at all, instead of one vision model for everything?

Different cost and latency profiles. The conversational agent must respond in hundreds of milliseconds to feel interactive, so it reads structured text from the macOS accessibility tree rather than pushing screenshots through a vision model every turn. The observer runs on a 60-minute cadence on finalized MP4 chunks and is allowed to use a heavyweight vision model, because it fires once an hour at most. Fusing them would either make the agent slow or make the observer cheap and useless. So Fazm runs them separately, with two separate launch dependency graphs.

What happens in the buffer if analysis fails?

GeminiAnalysisService.swift line 214 checks lastFailedAnalysis against retryCooldown (300s) before retrying. On success (lines 232-234 onward) the buffer is cleared and the chunk files are deleted. On failure the chunks stay on disk so the next attempt can replay them. Buffer state is persisted to buffer-index.json in Application Support (line 138), so the ring survives app restarts, OS updates, and Gemini endpoint outages. A bad afternoon for the Gemini API means the buffer grows up to maxChunks = 120 and then rotates FIFO.
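
A TypeScript sketch of that backoff check; retryCooldown = 300 is the real constant (line 78), while the clock plumbing and function names here are illustrative:

```typescript
// Sketch: skip analysis attempts for 300s after a failure, otherwise proceed.

const retryCooldownSeconds = 300; // line 78 in the real service

let lastFailedAnalysis: number | null = null; // epoch seconds of the last failure

/** True when no failure is on record or the cooldown has elapsed. */
function shouldAttemptAnalysis(nowSeconds: number): boolean {
  if (lastFailedAnalysis === null) return true;
  return nowSeconds - lastFailedAnalysis >= retryCooldownSeconds;
}

function recordFailure(nowSeconds: number): void {
  lastFailedAnalysis = nowSeconds;
}
```

Because the chunks stay on disk on failure, a retry after the cooldown replays exactly the same buffer rather than losing the hour.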

Where does the accessibility-tree vs. screenshots choice come in?

On the conversational path. When the floating bar agent needs to know what the user is looking at, it reads AXUIElement attributes from the focused window in AppState.swift (lines 439 and 470 touch AXUIElementCopyAttributeValue for kAXFocusedWindowAttribute). Those attributes are real structured text, not pixels. A text-only Claude model therefore gets the same fidelity as a multimodal model would get from a screenshot of the same window, except cheaper and faster. That is why the GPT-5 Turbo native-multimodal headline does not move the agent path: the agent path was never blocked on vision.

What about Llama 4 Scout's 10M context, GLM-5.1's MIT license, or Mistral Medium 3's open weights? Are those useful?

Not to the product today. Fazm's two pipelines are pinned to Anthropic (conversational) and Google (observer), and both providers are hosted, not local. A 10M-context open-weight model is relevant only if the product grew a third pipeline that ran locally, or if a hosted provider integrated it. The April 2026 open-weight launches are real news, but at the runtime layer of a shipping macOS agent they are strictly potential, not current.

What is the smallest thing a launch has to do to actually change behavior in Fazm?

For the agent path: the Anthropic ACP SDK has to return the new model ID in the availableModels array of a session/new response (acp-bridge/src/index.ts:1345 and 1488). Everything downstream is automatic, including the picker row, family label (Scary/Fast/Smart), and per-window persistence. For the observer path: Google has to make it the default model behind gemini-pro-latest, or Fazm bumps the constant at GeminiAnalysisService.swift:67 and ships a new build. Anything less than those two changes is a press release.

How can I verify all of this on a checkout of the Fazm desktop repo?

Run `rg -n 'private let model' Desktop/Sources/GeminiAnalysisService.swift` and you will hit line 67 with gemini-pro-latest. Run `rg -n 'targetDurationSeconds' Desktop/Sources/GeminiAnalysisService.swift` and you will hit line 69 with the 3600-second value. Run `rg -n 'observer_gemini' Desktop/Sources/GeminiAnalysisService.swift` and you will hit line 261. Run `rg -n 'models_available' acp-bridge/src/index.ts` and you will hit line 1279 where emitModelsIfChanged writes the JSON line. Run `rg -n 'rate_limit' acp-bridge/src/index.ts` and you will hit the rate_limit case around line 2443. Those five greps prove the dual-pipeline shape.

fazm.AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
