Field notes from a macOS AI agent

Three labs shipped new models in 2026. Which ones can your Mac app actually run?

Gemini, Claude, and Qwen each had a busy 2026. Every guide on this is a release calendar or a benchmark board: a list of version numbers and scores. That tells you a model exists. It does not tell you whether the tool you run can use it. For a Mac app built on Claude Code, the answer splits three ways, and the split has nothing to do with which model is best. It has to do with one protocol.

ANTHROPIC_BASE_URLACP bridgemacOS 14+verified 2026-05-17

Matthew Diakonov, Written with AI

Published May 17, 202610 min read

Direct answer · verified May 2026

In 2026 Anthropic shipped Claude Opus 4.6 and Sonnet 4.6 in February and Claude Opus 4.7 on April 16. Google shipped Gemini 3.1 and the open-weight Gemma 4, with another Gemini model expected at I/O in May. Alibaba shipped the Qwen 3.5 and Qwen 3.6 families. If you run a Claude Code based Mac app, only Claude models drop in for free. GPT runs through a bundled Codex backend. Gemini and Qwen need an Anthropic-API-compatible gateway in front of them, because a raw Gemini or OpenAI key will not work directly.

Sources for the release facts: anthropic.com/news, the Gemini API changelog, and the Qwen 3.6 repository.

The roundup, and the column the calendars leave out

Here is the 2026 release picture for the three labs people search for together. The first two columns are what every release calendar gives you. The third is the one that decides whether you can use any of it from a desktop agent today.

Lab and family	What shipped in 2026	Reaches a Claude Code Mac app via
Claude Anthropic	Claude Opus 4.6 and Claude Sonnet 4.6 in February; Claude Opus 4.7 on April 16. A 1 million token context window is in beta on Opus and Sonnet.	Native It is the agent loop. New Claude tiers appear in the picker on the next session, no app update.
Gemini Google	Gemini 3.1 Pro in February; Gemini 3.1 Flash-Lite to developers in March; the open-weight Gemma 4 in April. A further Gemini model is expected at Google I/O in May 2026.	Gateway only Reachable, but only behind a proxy that translates the Anthropic API format. A raw Gemini key does not work.
Qwen Alibaba	The Qwen 3.5 family in February; the Qwen 3.6 family, including Qwen 3.6-Plus and Qwen 3.6-Max-Preview, in April. Parts of the lineup are Apache 2.0 open weights.	Gateway, or local Same gateway requirement, or serve the open weights on your own Mac behind an Anthropic-format shim.

The release facts move week to week, so a static page is not where you track them. Public registries stay current. The durable part of this page is the third column: why a model that exists is not automatically a model your tool can run.

Why the answer splits three ways

Fazm is a native macOS app, and its agent is Claude Code, wrapped through the agent client protocol. Claude Code talks to a model in exactly one wire format: the Anthropic API. That one fact is the whole story. A model is easy to reach if it already speaks that format, and it takes a translation layer if it does not. The four model families people care about in 2026 fall on different sides of that line.

Every model funnels through one protocol

Read the diagram left to right. Whatever model you put on the left, it has to arrive at the hub in the Anthropic API format. Claude is already there. The other three need help getting there. What sits on the right, the loop and the session behavior, never changes regardless of which model fed it.

Lane 1

Claude: a new release costs you nothing

Claude models are the native case. Claude Code is the agent loop, so a new Claude tier is not an integration, it is just the next value in a list. Fazm does not bake its model list into the app binary. The ACP bridge reports the models the agent currently exposes, the Swift side ingests that list, and the per-window model picker updates on the next session.

When Claude Opus 4.7 went generally available on April 16, 2026, a Fazm install pointed at the default Anthropic API picked it up with no App Store download. That is the entire upgrade path for a Claude release: open a window, open the model dropdown, the new tier is in it. The same is true for any future Claude model. There is nothing on the Fazm side to ship and nothing on your side to wait for.

Lane 2

GPT: a second backend, bundled and swappable

OpenAI's GPT family is the second special case. Fazm bundles a separate backend, codex-acp, an isolated wrapper around OpenAI's Codex CLI. It is not a gateway trick. It is a real second agent process that exposes the same agent client protocol surface as the Claude bridge, so it can run side by side with it. You toggle the Codex backend on, and a GPT-family model becomes selectable per chat.

This matters for 2026 specifically. OpenAI shipped its own run of releases this year, and because the Codex backend is a maintained part of the app rather than a bring-your-own proxy, GPT releases land in Fazm closer to the Claude experience than to the Gemini or Qwen one. Two of the four families, Claude and GPT, are first-class. The other two are not, and that is lane three.

Lane 3 · the part no release calendar tells you

Gemini and Qwen: reachable, but only through a gateway

Here is the anchor fact for this whole page. Fazm has a Custom API Endpoint setting under Settings, Advanced, AI Chat. It looks like the field where you would paste a Gemini or Qwen key. It is not. When you fill it in, Fazm reads that value in ACPBridge.swift and sets one environment variable before spawning the bridge:

// Desktop/Sources/Chat/ACPBridge.swift
// Custom API endpoint (allows proxying through
// Copilot, corporate gateways, etc.)
if let customEndpoint = defaults.string(forKey: "customApiEndpoint"),
   !customEndpoint.isEmpty {
  env["ANTHROPIC_BASE_URL"] = customEndpoint
}

ANTHROPIC_BASE_URL is the override Claude Code respects for routing through a proxy. The agent still makes Anthropic-shaped API calls. It just sends them somewhere else. That is why the setting's own in-app help text is blunt about what it is not:

Verbatim, from the Custom API Endpoint setting

"Route API calls through an Anthropic-API-compatible endpoint (e.g. local LLM bridge, corporate proxy, or GitHub Copilot bridge). The endpoint must speak the Anthropic API format; a raw Gemini or OpenAI key will not work here. Leave empty to use the default Anthropic API."

The team thought this was confusing enough to call out in the changelog too: a recent entry reads "Clarified Custom API Endpoint setting requires an Anthropic-API-compatible endpoint." The setting is honest about its own limits, which is rarer than it should be.

So a Gemini 3.1 or Qwen 3.6 release is not unreachable. It is one layer of indirection away. You run a gateway that accepts Anthropic-format requests and translates them to Gemini or Qwen calls. Open-source proxies like LiteLLM and claude-code-router do exactly this. You point the Custom API Endpoint at the gateway, not at Google or Alibaba directly, and from that moment Gemini or Qwen answers every turn while the Fazm window behaves exactly as it would on Claude.

What the field should and should not contain

Will not work

AIzaSyD...your-gemini-key

A raw key, or a bare Gemini or Qwen API base URL. The bridge sends Anthropic-shaped requests; the endpoint answers in a different shape; nothing connects.

Works

http://localhost:4000

A gateway that accepts the Anthropic API format and forwards to Gemini or Qwen. The gateway holds the real vendor key; Fazm only ever sees an Anthropic-shaped endpoint.

How a Qwen request actually travels

Put the gateway in place and the round trip for a non-Claude model is straightforward. Every turn follows the same path. Nothing in the Fazm window knows or cares that the model on the far end is not a Claude tier.

One turn, routed at a non-Claude model

For an open-weight Qwen model the gateway and the model can both live on your Mac, served by a local runtime like LM Studio or Ollama. Fazm has error handling tuned for that case: if the local server has no model loaded, the bridge surfaces a specific message telling you to load one rather than a generic failure. That is the whole appeal of running an open-weight release locally, and the loop never leaves your machine.

What stays constant when the model changes

The reason the protocol distinction matters less than it sounds is that the model is the cheap part to swap. The expensive part, the harness, does not move. When you point a Fazm window at Gemini through a gateway, or switch a forked window to Qwen, you keep everything that made the window useful. The session persists across a Mac restart. The one-click fork still branches a live conversation into a new window with the full prior context. The chat history stays live in context for the lifetime of the window, with no auto-compacting quietly dropping decisions you made an hour ago.

That is the practical takeaway for a year with this many releases. You do not need a new app every time a lab ships a model. You need a tool where the model is a per-window dropdown value and the protocol is handled once. If you want to compare two of these releases on a task you actually care about, the companion guide on testing a new model release on your own real work walks through the fork-and-compare flow step by step. For the open-weight side of 2026, the May 2026 open-source release notes cover what is worth serving locally.

Want to see a non-Claude model routed into Fazm live?

A 20-minute call: set up an Anthropic-format gateway, point the Custom API Endpoint at it, and run a turn on Gemini or Qwen inside a real Fazm window.

Questions people ask about the 2026 model releases

Frequently asked questions

What new models did Gemini, Claude, and Qwen release in 2026?

Through the first half of 2026, Anthropic shipped Claude Opus 4.6 and Claude Sonnet 4.6 in February and Claude Opus 4.7 on April 16, with a 1 million token context window in beta. Google shipped Gemini 3.1 Pro in February, Gemini 3.1 Flash-Lite to developers in March, and the open-weight Gemma 4 in April, with a further Gemini model expected at its I/O conference in May. Alibaba shipped the Qwen 3.5 family in February and the Qwen 3.6 family, including Qwen 3.6-Plus and Qwen 3.6-Max-Preview, in April, with parts of the lineup released under the Apache 2.0 license. The cadence is roughly one notable release a week across these three labs plus the rest of the field.

Can I run Gemini or Qwen inside a Claude Code based app like Fazm?

Yes, but not by pasting a Gemini or Qwen key into the app. Fazm's agent loop is Claude Code, which speaks the Anthropic API protocol. To route it at a non-Claude model you point Fazm's Custom API Endpoint (Settings, Advanced, AI Chat) at a gateway that translates the Anthropic API format into whatever the target model expects. LiteLLM, claude-code-router, and similar proxies do this. Once the gateway is in front, Gemini or Qwen answers every turn while the Fazm window, its persistent session, and the one-click fork all behave exactly as they do on Claude.

Why does a raw Gemini API key not work in Fazm's Custom API Endpoint?

The Custom API Endpoint field feeds the ANTHROPIC_BASE_URL environment variable into the ACP bridge that runs Claude Code. Claude Code only knows how to make Anthropic-shaped API calls. A raw Google endpoint expects Gemini-shaped requests, so the two never agree on the wire format. The setting's own in-app help text says this directly: the endpoint must speak the Anthropic API format, and a raw Gemini or OpenAI key will not work there. The fix is a translation gateway, not a different key.

Do new Claude models work in Fazm without an app update?

For Claude tiers, yes. Fazm does not compile its model list into the app binary. The ACP bridge reports the models the agent currently exposes, and the Swift side ingests that list, so a Claude tier that goes generally available shows up in the per-window model picker on the next session. When Claude Opus 4.7 shipped on April 16, a Fazm install pointed at the default Anthropic API picked it up with no download from the App Store.

How do I run a new Qwen model locally with Fazm?

Qwen ships open weights, so you can serve a Qwen 3.6 checkpoint on your own Mac with a local runtime such as LM Studio or Ollama, then put an Anthropic-API-compatible shim in front of it and point Fazm's Custom API Endpoint at that shim. The bridge already has error handling tuned for this path: if the local server has no model loaded, Fazm surfaces a specific message telling you to load a model rather than a generic failure. The whole loop stays on your machine, which is the reason to run an open-weight model in the first place.

Will the Gemini model expected at Google I/O 2026 be usable in Fazm?

It follows the same rule as every other Gemini release. The model itself does not change Fazm's protocol surface. Whatever Google announces at I/O in May 2026 becomes reachable from Fazm the moment a gateway exposes it behind an Anthropic-API-compatible endpoint. There is nothing to update in the app and nothing to wait for from Fazm. The work, if any, is on the gateway side.

What environment variable does Fazm's Custom API Endpoint actually set?

ANTHROPIC_BASE_URL. In ACPBridge.swift the app reads the customApiEndpoint value from user defaults and, when it is not empty, sets env["ANTHROPIC_BASE_URL"] before spawning the bridge. That is the standard override Claude Code respects for routing through a proxy. Changing the field stops and restarts the bridge so the new endpoint takes effect on the next query. Leaving it empty routes to the default Anthropic API.