April 14-15, 2026 model wave · Ten days of router plumbing · ACP v0.25.0

The April 14-15, 2026 model releases, and the ten days of routing code a Mac app had to ship first

Every AI news page for the week is a release list. GPT-5.4-Cyber. NVIDIA Ising. Llama 4 Maverick. OLMo 2 32B. Codestral 2. None of them covers the piece a Mac user actually needs: the routing layer that picks which of those models answers, what happens when the chosen one errors, and how a new endpoint gets wired in without shipping a new app build. This guide covers that layer, with version numbers, dates, CHANGELOG lines, and a named ANTHROPIC_BASE_URL. Six Fazm releases in the ten days before the wave, one consumer app ready before the models landed.

Fazm · 12 min read · 4.9 from 200+
Every version number and date maps to a line in /Users/matthewdi/fazm/CHANGELOG.json
Smart/Fast toggle, ACP v0.25.0, Custom Endpoint, auto-fallback, 600ms debounce, all quoted
Routing targets open weights (Codestral 2, OLMo 2, Llama 4) via ANTHROPIC_BASE_URL

The ten-day ship log

Seven Fazm releases, thirteen calendar days, bracketing the news cycle

Sequence matters. Each release below lands a specific piece of the router and the error surface. By the time the April 14-15 model wave hit, every failure path a new model could trip was wired.

CHANGELOG.json, 2026-04-04 through 2026-04-16

1. v2.0.6 on 2026-04-04 - Smart/Fast toggle

Added Smart/Fast model toggle in the chat header bar for quick switching between Opus and Sonnet. Single-click model switch, mid-conversation, no Settings trip.

2. v2.0.7 on 2026-04-05 - auto-fallback on modelUnavailable

Fixed chat failing silently when personal Claude account lacks access to the selected model, now auto-falls back to built-in account. The sideways leg of the router.
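The fallback decision is simple enough to sketch. A minimal, hypothetical Swift version, assuming an `AccountRoute` type and a `RouteError.modelUnavailable` error name that are illustrative rather than Fazm's actual internals:

```swift
// Hypothetical sketch of the v2.0.7 sideways fallback; the type and
// error names are assumptions, not Fazm's actual source.
enum AccountRoute { case personal, builtIn }

enum RouteError: Error { case modelUnavailable }

// Try the personal Claude account first; when it lacks access to the
// selected model, reroute through the built-in account instead of
// failing silently. Any other error still propagates to the caller.
func send(_ message: String,
          via transport: (AccountRoute, String) throws -> String) throws -> String {
    do {
        return try transport(.personal, message)
    } catch RouteError.modelUnavailable {
        // The sideways leg: personal credentials cannot reach the model.
        return try transport(.builtIn, message)
    }
}
```

The point of the shape is that the caller never sees `modelUnavailable`; it either gets a reply or a genuinely unrecoverable error.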

3. v2.1.2 on 2026-04-07 - ACP protocol v0.25.0

Upgraded ACP protocol to v0.25.0 with improved error handling for credit exhaustion and rate limits. Fixed onboarding silently failing when credits run out, now shows an error with options to connect Claude or skip.

4. v2.1.3 on 2026-04-09 - error surfacing in pop-out

Fixed error messages not showing in pop-out chat window. Fixed follow-up messages replacing previous user messages during a stuck loading state. The streaming layer gets hardened.

5. v2.2.0 on 2026-04-11 - Custom API Endpoint and 600ms debounce

Added custom API endpoint setting for proxies and corporate gateways. Fixed typing indicator flickering during network retries by adding a 600ms debounce. This is the seam open-weights models plug into.
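The debounce idea can be sketched deterministically. A hypothetical `IndicatorDebouncer` type for illustration; the shipping implementation presumably uses timers on the main queue rather than explicit timestamps:

```swift
import Foundation

// Illustrative sketch of a 600ms debounce for the typing indicator;
// the type name and API are assumptions, not Fazm's actual source.
final class IndicatorDebouncer {
    private let interval: TimeInterval
    private var lastEvent: Date?

    init(interval: TimeInterval = 0.6) { self.interval = interval }

    // Returns true only when retry churn has been quiet for the full
    // interval, so a burst of rapid retries cannot flicker the
    // indicator off and on.
    func shouldFlip(at now: Date) -> Bool {
        defer { lastEvent = now }
        guard let last = lastEvent else { return true }
        return now.timeIntervalSince(last) >= interval
    }
}
```

Events arriving within 600ms of the previous one are absorbed; only a quiet gap lets the indicator change state.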

6. v2.2.1 on 2026-04-12 - duplicate-response fix

Fixed duplicate AI response appearing in pop-out and floating bar when sending follow-up messages. The last streaming edge case before the model wave.

7. v2.3.2 on 2026-04-16 - 'local-first' language

Tightened privacy language in onboarding and system prompts to accurately say 'local-first' instead of 'nothing leaves your device.' A router that can reach a hosted endpoint cannot honestly claim otherwise.

What actually happens when you tap Smart/Fast

A model swap, frame by frame

The toggle shipped April 4, 2026 in v2.0.6. It lives in the chat header bar. Here is what it triggers under the hood.

Smart/Fast toggle, one turn later


Turn 1 - Claude Opus (Smart)

User is mid-conversation on a reasoning-heavy task. Model is claude-opus-4-7. Skill library loaded as system context. Accessibility-tree tools available.
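The swap itself is a pure tier flip that leaves conversation state alone. A hypothetical Swift sketch: the Smart model ID comes from the turn above, while the Fast ID and the enum shape are assumptions, not Fazm's actual constants:

```swift
// Illustrative sketch of the Smart/Fast routing decision; only
// "claude-opus-4-7" appears in the article, the rest is assumed.
enum ModelTier {
    case smart   // reasoning-heavy turns
    case fast    // low-latency conversational turns

    // Map the tier to a model identifier.
    var modelID: String {
        switch self {
        case .smart: return "claude-opus-4-7"
        case .fast:  return "claude-sonnet-4-7"  // assumed Fast-tier ID
        }
    }

    // Tapping the pill toggles the tier; skills, tools, and the
    // transcript are untouched.
    func toggled() -> ModelTier {
        self == .smart ? .fast : .smart
    }
}
```

Because the toggle only changes which ID the next request carries, the skill library and accessibility-tree tools survive the swap unchanged.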

Every April 14-15 release lands on the same router

Whichever of the week's models you point Fazm at, the path runs through the same four pieces of code. Hosted (GPT-5.4-Cyber, Claude Opus) or local (Codestral 2 via vLLM, Llama 4 Maverick via llama.cpp), the seam is identical.

Release sources -> routing layer shipped Apr 4-11 -> Mac app

Sources: GPT-5.4-Cyber, NVIDIA Ising, Llama 4 Maverick, OLMo 2 32B, Codestral 2
Fazm router: Smart/Fast toggle, ACP v0.25.0 errors, Custom API endpoint, Auto-fallback account

The anchor fact

Four error types, four recovery paths, one protocol bump

ACP v0.25.0 introduced explicit error frames for credit exhaustion, rate limits, unavailable models, and aborted streams. Fazm v2.1.2 on April 7, 2026 wired each one to a concrete UI behavior. Before the bump, they all looked like generic stream errors and onboarding silently stalled.
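The frame-to-behavior mapping can be sketched as a single exhaustive switch. A hypothetical Swift version; the case and action names are assumptions, since the protocol's actual Swift types are not public:

```swift
// Illustrative mapping of ACP v0.25.0 typed error frames to UI
// behavior; names are assumptions, not Fazm's actual source.
enum ACPErrorFrame {
    case creditExhausted
    case rateLimited(retryAfterSeconds: Int)
    case modelUnavailable
    case streamAborted
}

enum UIAction: Equatable {
    case showConnectOrSkip          // onboarding CTA instead of a silent stall
    case backOffAndRetry(seconds: Int)
    case fallBackToBuiltInAccount   // the v2.0.7 sideways leg
    case resumeStream
}

// Every typed frame gets a concrete recovery path; before the bump,
// all four collapsed into one generic stream error.
func handle(_ frame: ACPErrorFrame) -> UIAction {
    switch frame {
    case .creditExhausted:          return .showConnectOrSkip
    case .rateLimited(let seconds): return .backOffAndRetry(seconds: seconds)
    case .modelUnavailable:         return .fallBackToBuiltInAccount
    case .streamAborted:            return .resumeStream
    }
}
```

The exhaustive switch is the design point: adding a fifth frame type in a future protocol bump would fail to compile until a recovery path exists.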

Figures in the original page: Desktop/Sources/Chat/ACPBridge.swift showing credit exhaustion before vs after April 7 (Console.app, filtered to ACPBridge); the Smart/Fast toggle under the chat header (Console.app, filtered to ChatProvider); the Custom API Endpoint pointing at an open-weights backend (Settings > Chat > Custom API Endpoint).
7 Fazm releases in 13 days (Apr 4-16)
600ms typing-indicator debounce
v0.25.0 ACP protocol version
0 app updates needed for a new endpoint

Between April 4 and April 15, 2026

4 new routing capabilities landed in Fazm before the weights did

Smart/Fast toggle. Auto-fallback on modelUnavailable. ACP v0.25.0 credit and rate-limit frames. Custom API Endpoint (ANTHROPIC_BASE_URL). By April 14-15 the app was ready to route to whichever of the week's models the user pointed it at.

The routing layer most computer-use agents skip

Most desktop agents this week are shipping vision-based computer-use layers pointed at a single hosted model. Here is what Fazm's router ships instead.

Feature | Typical screenshot-based agent | Fazm
Model switch mid-conversation | Restart with env var, lose context | Smart/Fast pill in chat header (v2.0.6, Apr 4)
Credit exhaustion handling | Generic error, onboarding stalls | ACP v0.25.0 credit frame -> CTA with Connect Claude / Skip (v2.1.2, Apr 7)
Point at a local open-weights backend | Requires fork or plugin | Settings > Custom API Endpoint (ANTHROPIC_BASE_URL, v2.2.0, Apr 11)
Unavailable-model fallback | Hard error, user stuck | Auto-fallback to built-in account (v2.0.7, Apr 5)
Tool surface across models | Pixel screenshots, per-model drift | Native AX tree, identical for every model
Streaming across mid-stream retries | Indicator flickers, UI resets | 600ms debounce absorbs retry churn (v2.2.0, Apr 11)
Privacy claim accuracy | Marketing boilerplate | 'local-first' replaces 'nothing leaves your device' (v2.3.2, Apr 16)

One environment variable, every open-weights model on the news cycle

ANTHROPIC_BASE_URL

CHANGELOG v2.2.0, April 11, 2026: "Added custom API endpoint setting for proxies and corporate gateways." The setting lives in Settings > Chat > Custom API Endpoint. Internally it sets ANTHROPIC_BASE_URL for the chat provider.

A local vLLM server serving Codestral 2 speaks the Anthropic API shape. A LiteLLM proxy in front of Llama 4 Maverick or OLMo 2 32B does too. Point Fazm's endpoint at http://localhost:8080 and the chat routes through your local open-weights model, with the same skill library and the same AX tool surface as a Claude Opus session. The router does not care which model answers. The model does not care which skill ran.
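The resolution step is small enough to show. A sketch of how a chat provider might read ANTHROPIC_BASE_URL, assuming the default host is Anthropic's public API; the function and its fallback behavior are assumptions about Fazm's internals, not its actual source:

```swift
import Foundation

// Illustrative sketch: resolve the chat endpoint from the environment,
// falling back to the hosted API when the variable is unset or invalid.
func resolveBaseURL(environment: [String: String]) -> URL {
    let fallback = "https://api.anthropic.com"
    let raw = environment["ANTHROPIC_BASE_URL"] ?? fallback
    // An unparseable override falls back rather than crashing the app.
    return URL(string: raw) ?? URL(string: fallback)!
}
```

With the variable unset, traffic goes to the hosted API; set it to http://localhost:8080 and the same code path reaches a local vLLM or LiteLLM backend.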

v0.25.0

Upgraded ACP protocol to v0.25.0 with improved error handling for credit exhaustion and rate limits. Fixed onboarding silently failing when credits run out.

Fazm CHANGELOG.json, v2.1.2, 2026-04-07

What every other April 14-15 roundup is missing

The Mac-side routing plumbing the news cycle skips

  • Why a model-picker UI in the chat header matters when two tiers have different latency and cost
  • How ACP v0.25.0 typed error frames turn silent failures into recoverable UX
  • How ANTHROPIC_BASE_URL makes every Anthropic-compatible endpoint a valid routing target
  • Why auto-fallback to a built-in account is the third leg of a router, not a bug fix
  • Why a 600ms debounce lets streaming survive mid-turn model swaps
  • Why 'local-first' language is the correct disclosure frame for a dual-model router
  • Why accessibility-tree tools survive a model swap that screenshot tools would not

Frequently asked questions


What were the headline AI releases on April 14-15, 2026?

The window covered NVIDIA's Ising family of open AI models for quantum error correction (up to 2.5x faster, 3x more accurate decoding), OpenAI's GPT-5.4-Cyber variant released alongside the Trusted Access for Cyber program on April 14, NVIDIA's open-source models for next-gen autonomous driving (Alpamayo line), continued uptake of Llama 4 Maverick and OLMo 2 32B in the open-weights community, Codestral 2 (Apache 2.0) for local coding, and a batch of agent frameworks landing in Google ADK and Goose. Per llm-stats's running ledger, April 2026 is tracking as one of the densest open-source release months on record.

Why does a news roundup page talk about a consumer Mac app's routing code?

Because the news roundup is the SERP-crowded half of the story. The half nobody writes is: once GPT-5.4-Cyber or Llama 4 Maverick exists, what does it take to actually route to it from a Mac, fall back when it errors, and stream its response without UI flicker? Fazm shipped all three pieces of that plumbing in the ten days before April 14-15: Smart/Fast Opus/Sonnet toggle on April 4 (v2.0.6), ACP protocol v0.25.0 with credit-exhaustion and rate-limit error handling on April 7 (v2.1.2), and a Custom API Endpoint setting plus 600ms typing-indicator debounce on April 11 (v2.2.0). All three land inside /Users/matthewdi/fazm/CHANGELOG.json with an exact date and version.

What exactly does the Smart/Fast toggle do and where is it in the source?

It is the chat-header control added in v2.0.6 on April 4, 2026, that lets the user flip between Claude Opus (Smart) and Claude Sonnet (Fast) in one click, mid-conversation, without opening Settings. The CHANGELOG line reads: 'Added Smart/Fast model toggle in the chat header bar for quick switching between Opus and Sonnet.' The reason it matters for an April 14-15 news page is that the two tiers cost and respond differently per query, so routing becomes a first-class UI decision when you are juggling a frontier reasoning model and a fast conversational one for the same agent session.

What does the ACP v0.25.0 upgrade do that the previous version did not?

v0.25.0 adds explicit error types for credit exhaustion and rate limits inside the Agent Client Protocol frames Fazm uses to talk to the Claude agent backend. Before that, the same conditions looked like generic stream errors, and onboarding would silently fail when credits ran out. The April 7, 2026 CHANGELOG entry for v2.1.2 makes this explicit: 'Upgraded ACP protocol to v0.25.0 with improved error handling for credit exhaustion and rate limits' and 'Fixed onboarding silently failing when credits run out, now shows an error with options to connect Claude or skip.' This is the error-handling half of a router: not just which model to pick, but what to do when the pick fails.

Where does the Custom API Endpoint setting live and why was it added before April 14?

The line in the April 11, 2026 v2.2.0 CHANGELOG entry reads: 'Added custom API endpoint setting for proxies and corporate gateways.' In practice it sets the ANTHROPIC_BASE_URL that the chat agent speaks against, so a user or admin can route to a proxy, a corporate gateway, or an open-model backend that speaks the Anthropic API shape (vLLM, llama.cpp server, LiteLLM proxy, etc.). That is the hook that connects a page about April 14-15 open-source releases to the app on your Mac: the endpoint is the seam where a newly-released open-weight model plugs in.

What is the 600ms typing-indicator debounce and why does it show up on a model-release page?

CHANGELOG v2.2.0, April 11, 2026, third line: 'Fixed typing indicator flickering during network retries by adding a 600ms debounce.' It looks cosmetic, but it is a symptom of a streaming layer that has to tolerate mid-stream retries (which is what you get when the backend ACP frames hit a credit or rate-limit error and the client recovers). A 600ms debounce means the indicator waits until 600ms have passed with no further retry event before flipping off and on again, so a burst of rapid retries reads as one continuous stream. That is exactly the sort of thing you harden right before a week of model switching, not as decoration.

What happens when the model I pick is not available on my plan?

Auto-fallback to the built-in Fazm-hosted account. CHANGELOG v2.0.7, April 5, 2026: 'Fixed chat failing silently when personal Claude account lacks access to the selected model, now auto-falls back to built-in account.' So if you point Fazm at your own Anthropic API key and try to call a model your billing does not include, the chat quietly reroutes through the bundled account rather than leaving you staring at a silent failure. That is the third leg of the routing story: you do not only switch up and down the model ladder, you also fall over sideways when your credentials cannot reach the model.

Does any of this involve screenshots or vision-based computer-use?

No. Fazm's desktop control goes through native macOS accessibility APIs (AXUIElementCreateApplication, AXUIElementCopyAttributeValue) via the bundled mcp-server-macos-use binary, not screenshots piped into a vision model. The April 14-15 news cycle has a lot of 'computer-use' framing because GPT-5.4-Cyber and the new agent frameworks are partly aimed at that use case. Fazm's position in that space is opposite by design: any model you route to inherits a text-shaped accessibility-tree tool surface, not a pixel-coordinate one. The model changes, the input format does not.
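The text-shaped tool surface can be illustrated with a toy serializer. In the real binary the node data would come from AXUIElementCopyAttributeValue; the `AXNode` struct and the rendering format below are illustrative assumptions, not Fazm's actual code:

```swift
// Illustrative sketch: an accessibility-tree node rendered as indented
// text, the same input format for every routed model. Role strings
// mirror the AX role constants (e.g. "AXButton").
struct AXNode {
    let role: String
    let title: String
    let children: [AXNode]

    // Depth-first, indentation-encoded rendering; no pixels involved.
    func rendered(indent: Int = 0) -> String {
        let line = String(repeating: "  ", count: indent) + "\(role) \"\(title)\""
        return ([line] + children.map { $0.rendered(indent: indent + 1) })
            .joined(separator: "\n")
    }
}
```

Because every model receives the same serialized tree, swapping Opus for a local Codestral 2 changes the answers, not the tool-call format.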

Can I verify the ten-day ship window myself?

Yes. Open /Users/matthewdi/fazm/CHANGELOG.json. The relevant entries are 2.0.6 (2026-04-04, Smart/Fast toggle), 2.0.7 (2026-04-05, auto-fallback to built-in account), 2.1.2 (2026-04-07, ACP v0.25.0, credit-exhaustion error handling), 2.1.3 (2026-04-09, Manage Subscription button, pop-out error surfacing), 2.2.0 (2026-04-11, Custom API Endpoint, 600ms debounce, pop-out shortcut), 2.2.1 (2026-04-12), and 2.3.2 (2026-04-16, tightened privacy language from 'nothing leaves your device' to 'local-first'). Seven numbered releases in thirteen calendar days. Every claim in this guide maps to a version and a date in that file.

What is the 'local-first' language change about on April 16?

CHANGELOG v2.3.2, April 16, 2026, first line: 'Tightened privacy language in onboarding and system prompts to accurately say "local-first" instead of "nothing leaves your device."' It is a quiet but meaningful edit: a router that can hit a hosted endpoint (GPT-5.4-Cyber, Claude Opus) cannot honestly claim nothing leaves the device. 'Local-first' is the honest frame: the default path is local, the model-routing layer can reach out when you opt into a hosted model, and the copy reflects that. For a news-cycle page about hosted-vs-open releases, this is the disclosure primitive that makes the rest of the story coherent.

How is this page different from every other April 14-15 AI news roundup?

Every other page lists releases (what shipped). This page lists routing infrastructure (what shipped on the Mac side to use it). The difference is concrete: this page names versions (v2.0.6, v2.1.2, v2.2.0, v2.3.2), dates (April 4, 7, 11, 16), exact CHANGELOG lines, environment variables (ANTHROPIC_BASE_URL), timing constants (600ms), and file paths (/Users/matthewdi/fazm/CHANGELOG.json). A reader who installs Fazm can verify every single one by opening Settings, opening the chat header, or opening the JSON file. That is the uncopyable half of the story.

Point Fazm at whichever April 14-15 model you want

Download Fazm, open Settings, paste your ANTHROPIC_BASE_URL, and the same chat window, the same skills, and the same accessibility-tree tools route to the model you picked. Smart/Fast toggle is in the chat header. Fallback is automatic. The router was ready before the weights were.

Download Fazm
fazm · AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
