Model wave, May 24-25 2026For Claude Code users

New AI model releases and announcements, May 24-25 2026

Every recap of this week stops at the same place: a leaderboard, a per-token price, a context-window number. This one keeps going. If you already run an AI coding agent, the question that actually matters is not which model is fastest, it is how you point your existing loop at a brand-new, non-Anthropic model without rebuilding your workflow or losing the thread you are in. There is a single setting for that, and one constraint that gates it.

Matthew Diakonov, Written with AI

Published May 27, 20268 min read

Try Fazm

Direct answer, verified May 27, 2026

No single brand-new frontier model launched on the exact 24th or 25th. The wave live that week was the adoption of three things that landed days earlier:

Google Gemini 3.5 Flash shipped at Google I/O on May 19-20, pitched as agent-first, roughly 4x faster on output tokens than other frontier models, and posting 76.2% on Terminal-Bench 2.1.
OpenAI GPT-5.5 Instant became ChatGPT's default model on May 5, with about 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts.
The OpenRouter crossover, reported May 25-26: Chinese open-weights families (DeepSeek, Qwen, Kimi, GLM, MiniMax) passed roughly 60% of token usage, with DeepSeek-V4-Flash topping the global weekly ranking.

Sources: Google (Gemini 3.5), TechCrunch (GPT-5.5 Instant), OpenRouter rankings.

The wave, in order

01 / 04

Gemini 3.5 Flash

Google I/O, May 19-20. Agent-first, ~4x faster output tokens, 76.2% on Terminal-Bench 2.1. Not an Anthropic model.

The gap every recap leaves open

Read the week's coverage and you will know that Gemini 3.5 Flash is fast and that Chinese open-weights models now move more tokens through OpenRouter than the US frontier labs. What none of it tells you is the thing you would actually do next: try one of these in the agent you already use, today, without throwing away your current session.

Here is the friction. The bare Claude Code CLI talks to Anthropic. Two of the three headline models this week (Gemini 3.5 Flash and the Chinese open-weights families) are not Anthropic models, so the CLI cannot reach them. The usual answer is to go install a second tool, wire up a second config, and lose whatever context you had built in the first one. That is the part the leaderboards never mention, because it is not a property of the model. It is a property of the harness you run the model in.

Fazm wraps the same Claude Code agent loop in a native macOS app, and it exposes one setting that closes this gap: a Custom API Endpoint. The mechanism is small enough to read in full, so the rest of this page is the actual code, the exact constraint, and the steps to use it. For the wider argument about why this layer matters more than which model wins a given month, see why agent tooling beats model upgrades.

The anchor fact: 12 lines that decide what your agent talks to

When you fill in the Custom API Endpoint, Fazm does not blindly trust it. Before the value is handed to the agent subprocess as ANTHROPIC_BASE_URL, it has to survive a guard. This is the whole of it, from Desktop/Sources/Chat/ACPBridge.swift lines 633 to 650.

ACPBridge.swift

// Custom API endpoint (allows proxying through Copilot, corporate gateways, etc.)
// Only forward it when it's an absolute http(s) URL with a host. A malformed value
// (missing scheme, "localhost:8766", stray text) otherwise lands in ANTHROPIC_BASE_URL
// and makes the Anthropic SDK throw "API Error: Invalid URL" on every query, silently
// bricking built-in chat. Falling back to the default keeps chat working.
if let customEndpoint = defaults.string(forKey: "customApiEndpoint")?
  .trimmingCharacters(in: .whitespacesAndNewlines), !customEndpoint.isEmpty {
  if let url = URL(string: customEndpoint),
    let scheme = url.scheme?.lowercased(), scheme == "http" || scheme == "https",
    let host = url.host, !host.isEmpty {
    env["ANTHROPIC_BASE_URL"] = customEndpoint
  } else {
    log("ACPBridge: ignoring malformed customApiEndpoint '\(customEndpoint)' " +
        "(expected an http(s) URL with a host); falling back to default endpoint")
  }
}

The guard is the interesting part. It requires the string to parse as a URL with an http or https scheme and a non-empty host. That is why "localhost:8766" is rejected (no scheme) while "http://localhost:8766" is accepted. The reason is spelled out in the comment: a malformed value would otherwise reach the Anthropic SDK, throw "API Error: Invalid URL" on every query, and silently brick chat. So the failure mode is a clean fallback to the default endpoint, not a dead window. The setting can only ever break loudly into the log, never quietly into your workflow.

Pointing your loop at one of this week's models

Five steps, none of which require a second tool or a fresh session. The model behind the endpoint can be Gemini 3.5 Flash, a Chinese open-weights model, or anything else, as long as the endpoint itself speaks the Anthropic Messages format.

Open Settings, AI Chat, Custom API Endpoint

It lives in the AI Chat section under an Advanced toggle. The field placeholder shows a sample proxy URL on port 8766, a hint that it wants a full http(s) URL, not a bare host:port pair.

Point it at an Anthropic-compatible URL

This is the bridge or proxy that translates the Anthropic Messages shape to whatever you are actually running this week: a local server in front of a Chinese open-weights model, an OpenRouter-to-Anthropic proxy for Gemini 3.5 Flash, or a corporate gateway. The endpoint speaks Anthropic; the model behind it can be anything.

Fazm validates the URL before trusting it

On save it checks the value parses as an http(s) URL with a host (ACPBridge.swift:639-649). A valid value is assigned to ANTHROPIC_BASE_URL in the subprocess environment. A malformed one is logged and ignored, and Fazm quietly falls back to the default Anthropic endpoint so you are never left with a dead chat.

The bridge restarts, your session does not

Saving fires restartBridgeForEndpointChange(), which relaunches the agent subprocess against the new endpoint. The window, its full uncompacted history, and its fork lineage carry over untouched. Only the thing answering your prompts changed.

Errors are translated, not dumped

If your endpoint returns a recognizable failure, like a local server with no model loaded, Fazm rewrites it into an actionable message rather than surfacing a raw 400 from the upstream. You get told to load a model, not blamed for the app misbehaving.

What the launch log prints

The guard is not theoretical. Here is what the bridge logs on two saves: one valid endpoint pointed at a local Anthropic-compatible bridge, and one malformed value that gets rejected and falls back.

fazm-dev.log

The second case is the one most tools get wrong. A missing scheme is an easy paste mistake, and the lazy implementation lets it through and dies on every query. Here it is caught, logged, and survived.

Which of this week's models you can actually run, and how

Honest accounting. "Native" means the bare Claude Code CLI reaches it without help. The last column is what Fazm adds.

Release	Native in Claude Code?	Reachable in Fazm
Claude 4.x (Opus, Sonnet, Haiku)	Yes	Default backend, no setup
Gemini 3.5 Flash	No	Via Custom API Endpoint pointed at an Anthropic-shape bridge or proxy
GPT-5.5 Instant / GPT-5 family	No	Codex backend (ChatGPT account, text-first) or an Anthropic-shape shim
DeepSeek, Qwen, Kimi, GLM (open weights)	No	Local server plus an Anthropic-compatible bridge, then the Custom API Endpoint

The pattern is the same for the bottom three rows: put something that speaks Anthropic in front of the model, point the endpoint at it. The Settings help text says it plainly, a raw Gemini or OpenAI key will not work in that field. For the deeper story on how Fazm absorbs an entire month of releases without an app update, see the April 2026 LLM release breakdown.

Why the week's real story is the harness, not the model

Look at what the labs themselves emphasized. Gemini 3.5 Flash was sold as an agent model, benchmarked on Terminal-Bench and MCP Atlas rather than on chat quality. The OpenRouter data showed programming now drives more than half of all token usage on the platform. Both signals say the same thing: the value is shifting from the raw model to the loop you run it in, the tools it can call, and whether your context survives long enough to finish a real task.

That is exactly the part a Custom API Endpoint protects. Swap the model behind it as often as the news cycle changes, and the loop you built, the persistent window, the forks, the full uncompacted history, stays put. The model is the part that churns weekly. The harness is the part you keep.

Want to point your agent at this week's model?

Walk through the Custom API Endpoint setup live and we'll get your existing Claude Code loop talking to Gemini 3.5 Flash or a local open-weights model.

Frequently asked questions

What new AI models were announced around May 24-25, 2026?

Three things dominated the wave that week. Google's Gemini 3.5 Flash, shipped at Google I/O on May 19-20, pitched explicitly as an agent-first model (TechCrunch's framing: Google bets its next AI wave on agents, not chatbots), roughly 4x faster on output tokens than other frontier models and posting 76.2% on Terminal-Bench 2.1. OpenAI's GPT-5.5 Instant, which became ChatGPT's default model on May 5 with about 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts. And the milestone reported May 25-26 that Chinese open-weights families (DeepSeek, Qwen, Kimi, GLM, MiniMax) crossed roughly 60% of token usage on OpenRouter, with DeepSeek-V4-Flash topping the global weekly ranking. No single brand-new frontier model launched on the exact 24th or 25th; the story that week was adoption of what shipped days earlier.

Can I run Gemini 3.5 Flash inside Claude Code?

Not natively. Claude Code speaks to the Anthropic Messages API. Gemini 3.5 Flash is a Google model on a different API surface, so the bare Claude Code CLI will not call it. In Fazm, which wraps the same Claude Code agent loop in a native macOS app, you reach it by pointing the Custom API Endpoint at a bridge or proxy that translates the Anthropic Messages shape to Gemini. The constraint is real: the endpoint itself must speak the Anthropic format. A raw Gemini API key dropped into the field will not work.

What does 'the endpoint must speak the Anthropic API format' actually mean?

Fazm runs the Claude Agent SDK, which sends requests shaped like Anthropic's /v1/messages endpoint. When you set a Custom API Endpoint, Fazm injects it as ANTHROPIC_BASE_URL into the agent subprocess environment, so every request goes to your URL instead of api.anthropic.com. Your server has to accept Anthropic-shaped requests and return Anthropic-shaped responses. That is why a local LLM bridge, an OpenRouter-to-Anthropic proxy, a corporate gateway, or a GitHub Copilot bridge all work, while pasting a raw Gemini or OpenAI key does not. The Settings help text states this directly: the endpoint must speak the Anthropic API format.

What happens to my session when I switch the backend to a new model?

It survives. Changing the Custom API Endpoint calls restartBridgeForEndpointChange(), which relaunches the agent subprocess with the new ANTHROPIC_BASE_URL. The window itself, its full conversation history, and its fork lineage are persisted by Fazm independently of which backend is answering. So you can read the week's announcements, point the loop at a fresh model, and keep the exact thread you were working in. Nothing gets auto-compacted to make room.

Why does Fazm reject a value like 'localhost:8766' as a custom endpoint?

Because a value with no scheme is not a usable base URL. ACPBridge.swift lines 639-649 require the string to parse as a URL with an http or https scheme and a non-empty host before it is assigned to ANTHROPIC_BASE_URL. A bare 'localhost:8766' fails that check. The reason for the guard is in the code comment: a malformed value would otherwise land in ANTHROPIC_BASE_URL and make the Anthropic SDK throw 'API Error: Invalid URL' on every query, silently bricking chat. When the value is malformed, Fazm logs it and falls back to the default Anthropic endpoint so chat keeps working. Use 'http://localhost:8766' instead.

Can I run the Chinese open-weights models that took over OpenRouter?

Yes, through the same path. DeepSeek, Qwen, Kimi, and GLM models are open weights, so you can serve them locally (for example via a local server) or reach them through a hosted router. Either way you place an Anthropic-compatible bridge in front, then point Fazm's Custom API Endpoint at it. Fazm even detects a common local-server failure: if the upstream returns an error like 'No models loaded ... use the lms load command', it surfaces an actionable message instead of a raw upstream error, so you know to load a model rather than blaming the app.

Does GPT-5.5 Instant work in Fazm?

Two ways, with caveats. Fazm ships a Codex backend you can switch to: it uses your ChatGPT subscription via OpenAI's Codex, covers the GPT-5 family, and is text-first today (tool and MCP support is described as coming). For the broader OpenAI catalog you can also stand up an Anthropic-shape shim in front of the OpenAI API and point the Custom API Endpoint at it. The honest tradeoff is that going through a shim means you depend on that shim's fidelity to the Anthropic Messages contract for tool calls.

Run any of this week's models, keep your session

Fazm wraps the Claude Code agent loop in a native Mac app with persistent sessions, one-click forking, no auto-compacting, and one setting that points the whole thing at any Anthropic-compatible endpoint.

Download Fazm