LLM updates, June 2026: what shipped, and which ones your Mac agent can run

Four frontier-class models landed in a two-week window around the start of June 2026. The news pages all list them. None of them answers the question a developer actually has on a Mac: can I drive this one inside my agent, and what happens to my Anthropic key when I do? This page lists the drops with real ship dates, then walks the exact code path that decides the answer.

Matthew Diakonov, Written with AI

Published June 22, 20269 min read

June 2026, in order

The drops that mattered

May 19 Gemini 3.5 Flash GA

May 28 Claude Opus 4.8

Jun 1 MiniMax M3 (open weights)

Jun 4 NVIDIA Nemotron 3 Ultra

0:00 / 0:05

Direct answer, verified June 22, 2026

The headline LLM updates around June 2026 were Claude Opus 4.8 (Anthropic, May 28), Gemini 3.5 Flash going generally available at Google I/O (May 19), MiniMax M3 (open weights, June 1, on Hugging Face by June 7), and NVIDIA Nemotron 3 Ultra (open weights, June 4, unveiled at Computex on June 1). Opus 4.8 is first-party Anthropic; MiniMax M3 and Nemotron 3 Ultra ship with open weights you can self-host; Gemini 3.5 Flash is API-only. The dated table is below.

Claude Opus 4.8Gemini 3.5 FlashMiniMax M3Nemotron 3 UltraOpenMDW-1.11M contextOSWorld-VerifiedACP / Messages API

The June 2026 wave, with ship dates

Dates are the public ship dates I re-checked on June 22, 2026. The last column is the one the news pages leave out: how the model reaches a persistent agent loop on your Mac.

Model	Shipped	What landed	In your agent
Claude Opus 4.8 Anthropic	May 28, 2026	Dynamic workflows in Claude Code (research preview), effort controls, cheaper fast mode. $5 / $25 per million tokens.	Default in Fazm
Gemini 3.5 Flash Google	May 19, 2026	GA at Google I/O 2026. ~1M input window, $1.50 / $9.00 per million tokens, roughly 4x the speed of the prior Flash tier.	Proxy + custom endpoint
MiniMax M3 MiniMax	June 1, 2026	Open-weight, 1M-token context, MiniMax Sparse Attention, 70.06% on OSWorld-Verified for computer use. Weights on Hugging Face by June 7.	Proxy + custom endpoint
Nemotron 3 Ultra NVIDIA	June 4, 2026	550B Mixture-of-Experts (~55B active), 1M context, OpenMDW-1.1 license, 48 on the Artificial Analysis Intelligence Index.	Proxy + custom endpoint

Opus 4.8 pricing and the dynamic-workflow preview are from Anthropic's release post. Nemotron 3 Ultra parameter counts, license, and the index score are from NVIDIA's model card and Artificial Analysis. MiniMax M3 figures (1M context, OSWorld-Verified 70.06%) are from MiniMax's launch coverage. Gemini 3.5 Flash pricing is from Google's I/O 2026 announcement.

The whole "can I run it" question is one Settings field

Fazm does not reimplement an agent. It runs the real Claude Code loop over the Agent Client Protocol, and ACP talks the Anthropic Messages API. That single fact is what makes the June 2026 wave reachable: anything you can put behind an Anthropic-compatible gateway becomes a backend the loop can address, with the persistence, forking, and no-compact behavior unchanged.

The control surface for that is a single text field in Settings, with the placeholder https://{your-proxy}:8766. What it does at launch time is narrower, and more careful, than "swap the model." It changes exactly one environment variable, and it deliberately keeps your real key off the wire.

What the custom endpoint changes in the agent's environment

# Raw Claude Code, talking to Anthropic
ANTHROPIC_BASE_URL=(unset -> Anthropic default)
ANTHROPIC_API_KEY=sk-ant-...your real key...
# Point this at a third-party proxy and your
# real key rides along on every request.

0% fewer lines

The right-hand column is not a marketing simplification. In ACPBridge.swift the custom-endpoint branch (around line 2417) validates the URL, then assigns it to env["ANTHROPIC_BASE_URL"], sets a FAZM_CUSTOM_API_ENDPOINT flag, and then does the part that matters for your account safety: it overwrites the key with a placeholder.

env["ANTHROPIC_BASE_URL"] = customEndpoint
env["FAZM_CUSTOM_API_ENDPOINT"] = "true"
// Never send Fazm's bundled Anthropic key to that proxy.
env["ANTHROPIC_API_KEY"] = "sk-fazm-custom-endpoint"

That placeholder, sk-fazm-custom-endpoint, is the anchor of this whole approach. The moment you route the agent through your own gateway, your real Claude Pro or Max credential stops being forwarded. Local Anthropic-compatible gateways accept any non-empty key, so the placeholder keeps the agent on the API-key path instead of accidentally kicking off a Claude OAuth flow against a server that is not Anthropic. You get the new model, your key stays home.

Why a typo here does not brick your chat

There is a sharp edge hidden in this design, and the code guards it explicitly. If a malformed value (a bare localhost:8766, a missing scheme, stray pasted text) reached ANTHROPIC_BASE_URL, the SDK would throw "Invalid URL" on every single query and the retry-with-resume path could swallow it into an empty turn, so you would see no error at all. The validator exists to make that impossible: it requires an http or https scheme and a non-empty host, and falls back to the default Anthropic endpoint otherwise. Here are the two log lines you actually get.

ACPBridge launch log

Wiring a June 2026 model into a persistent session

Concretely, here is the path for an open-weight drop like Nemotron 3 Ultra or MiniMax M3. The same four steps work for Gemini 3.5 Flash behind a shim, or for a corporate gateway.

Serve the weights, or the API

Run the open-weight model (vLLM, or the vendor's reference server), or get an API key for the hosted model. Note the host and port it answers on.

Put an Anthropic-compatible gateway in front

Stand up LiteLLM (or any proxy that re-emits requests as the Anthropic Messages API) so the agent's ACP traffic is translated into calls your backend understands. This is the layer that makes a non-Anthropic model look like Claude to the loop.

Paste the gateway URL into Settings

In Fazm, open Settings, enable the custom endpoint, and paste the gateway's http(s) URL. Fazm validates it, assigns it to ANTHROPIC_BASE_URL, and disables the bundled key for that session.

Open a chat and fork it freely

The new model now answers inside a normal Fazm window. It survives a restart, forks in one click with full prior context, and never auto-compacts. Swap the endpoint next month for the next drop without losing your workflow.

What this does not do

A custom endpoint does not magically give a weaker open-weight model Claude Code's tool fluency. The agent loop still issues tool calls in the Anthropic format, and a model that is poor at structured tool use will fumble them no matter how clean your proxy is. The honest read of the June 2026 wave is that Opus 4.8 remains the strongest fit for the loop because the loop was built around it; the open-weight models are most interesting for local-first, privacy, or cost reasons, and for tasks where a 1M-token window or self-hosting matters more than the last few points of tool-following accuracy. The custom endpoint is the mechanism that lets you make that tradeoff yourself, per session, instead of being locked to one backend.

Want to point your agent at a specific June 2026 model?

Bring your gateway setup and we'll walk through wiring it into a persistent Fazm session live.

Common questions

Frequently asked questions

What LLM models updated around June 2026?

The headline drops were Claude Opus 4.8 (Anthropic, May 28), Gemini 3.5 Flash going GA at Google I/O (May 19), MiniMax M3 (open weights, June 1, with weights on Hugging Face by June 7), and NVIDIA Nemotron 3 Ultra (open weights, June 4, unveiled at Computex on June 1). Opus 4.8 is a first-party Anthropic model; MiniMax M3 and Nemotron 3 Ultra are open-weight and self-hostable; Gemini 3.5 Flash is API-only.

Can I run an open-weight model like Nemotron 3 Ultra or MiniMax M3 inside Fazm?

Yes, indirectly. Fazm runs the real Claude Code agent loop over the Agent Client Protocol, which speaks the Anthropic Messages API. To drive a different model you stand up an Anthropic-compatible gateway in front of it (LiteLLM, vLLM with an Anthropic shim, or any proxy that re-emits Messages-API requests) and paste that gateway's URL into Settings. Fazm then routes the agent's traffic there instead of to Anthropic.

Does Fazm send my Claude Pro or Max key to my proxy when I use a custom endpoint?

No. In ACPBridge.swift the custom-endpoint branch sets ANTHROPIC_API_KEY to the literal placeholder string sk-fazm-custom-endpoint before launching the agent, and logs 'using custom API endpoint ... with bundled API key disabled'. Your real Anthropic credential is never forwarded to a third-party gateway. Most local Anthropic-compatible gateways accept any non-empty key, so the placeholder keeps the agent on the API-key path instead of triggering a Claude OAuth flow.

What does the custom endpoint field actually change?

Exactly one environment variable: ANTHROPIC_BASE_URL. The validated URL is assigned to env['ANTHROPIC_BASE_URL'] (ACPBridge.swift, around line 2426). It does not change the protocol, the tools, the persistence layer, or the UI. The same persistent session, one-click forking, and no-auto-compact behavior apply regardless of which backend answers.

What happens if I paste a malformed URL into the custom endpoint field?

Nothing breaks. The validator requires an http or https scheme and a non-empty host. A value like 'localhost:8766' or stray text fails validation, Fazm logs 'ignoring malformed customApiEndpoint ... falling back to default Anthropic endpoint', and the agent keeps talking to Anthropic. The code comment explains why: a bad value landing in ANTHROPIC_BASE_URL would make the SDK throw 'Invalid URL' on every query and silently brick chat, so the validation gate exists specifically to avoid that.

Is Opus 4.8 available in Fazm without a proxy?

Yes. Opus 4.8 is a first-party Anthropic model, so it arrives through the normal Claude Code path with your own Claude Pro or Max account. You only need a custom endpoint when you want a non-Anthropic backend (an open-weight model you host, Gemini behind a shim, a corporate gateway, or GitHub Copilot routing).

Why route a June 2026 model through Fazm instead of a terminal or a hosted chat?

Because the wrapper fixes the three things raw Claude Code does not: sessions survive a Mac restart and auto-restore, every chat forks in one click into a new window with full prior context, and nothing auto-compacts the history. Those properties hold no matter which backend you point ANTHROPIC_BASE_URL at, so you can swap the June 2026 model of the week under a stable workflow.

Download Fazm for macOSOpen source, runs locally, bring your own Claude account.