Large language model releases in May 2026

Here is the actual May 2026 calendar, proprietary and open weight in one list, followed by the question every other roundup skips: can you point your Mac at the model that just shipped, and which path actually gets you there. Verified May 29, 2026.

Matthew Diakonov, Written with AI

Published May 29, 20268 min read

Direct answer, verified May 29, 2026

The large language models released in May 2026

Four proprietary releases (OpenAI GPT-5.5 Instant on May 5, xAI Grok 4.3 in early May, Google Gemini 3.5 Flash around May 19, Anthropic Claude Opus 4.8 on May 28) and two open-weight releases on Hugging Face (OpenBMB MiniCPM-V 4.6 1.3B on May 11, Cohere Command A+ on May 20). The four late-April open-weight frontier drops are included below because, as of late May, they are still the models everything new is measured against.

Model	Date	Type	Context
GPT-5.5 Instant	May 5, 2026	Proprietary (OpenAI)	API
Grok 4.3	Early May 2026	Proprietary (xAI)	1M
Gemini 3.5 Flash	~May 19, 2026	Proprietary (Google)	1M
Claude Opus 4.8	May 28, 2026	Proprietary (Anthropic)	API
MiniCPM-V 4.6 1.3B	May 11, 2026	Apache 2.0	262K
Command A+	May 20, 2026	Apache 2.0	128K
MiMo-V2.5-Pro	Apr 28, 2026	MIT	256K
Nemotron 3 Nano Omni	Apr 28, 2026	NVIDIA Open Model	128K
IBM Granite 4.1	Apr 29, 2026	Apache 2.0	up to 512K
Mistral Medium 3.5	Apr 29, 2026	Modified MIT	256K

Open-weight sources, verified May 29, 2026: Command A+, MiniCPM-V 4.6, MiMo-V2.5-Pro, Granite 4.1. For the open-weight-only deep dive, see the open source LLM releases May 2026 guide.

The thing every release has in common, and the catch

Read the rows above and one feature repeats in nearly every launch note: a large context window. Grok 4.3 and Gemini 3.5 Flash lead with 1M tokens, Command A+ with 128K input, MiniCPM-V 4.6 with 262K, Granite 4.1 with up to 512K, Mistral Medium 3.5 and MiMo-V2.5-Pro with 256K. The 2026 release race is, to a first approximation, a context-window race.

Here is the catch nobody puts in the launch note. A context window is a ceiling, not a guarantee. Whether you ever fill it depends on the tool you run the model through. The raw Claude Code CLI auto-compacts: as the live context approaches its limit, it summarizes the earlier turns and discards the originals. That happens precisely when a long agent run is most valuable, an hour in, with a stack of decisions behind it. A 512K window does you no good if your agent quietly rewrote the first 200K into a paragraph.

So the useful way to read a May 2026 release is not the benchmark in isolation. It is two questions a desktop user can act on: can I point my agent at this model, and will the agent keep the full conversation that the model's context window is supposed to hold. The rest of this page answers both from a shipping, open-source Mac agent rather than in the abstract.

The part the other roundups never show: the three ways a Mac agent reaches one of these

Fazm is open source at github.com/m13v/fazm, so the mechanism is readable rather than asserted. There is no single "use any model" magic switch; there are three concrete paths, and which one you take depends on who made the model.

1. Claude, including Opus 4.8: bring your own account

The bridge runs in BridgeMode.personalOAuth (defined in Desktop/Sources/Chat/ACPBridge.swift), which strips the bundled API key and authenticates with your own Claude account. Usage hits your existing Claude Pro or Max plan, so the May 28 Opus 4.8 release is reachable the moment your plan can see it.

2. GPT-5.5: the bundled Codex backend on your ChatGPT plan

Fazm ships a Codex backend, managed by CodexBackendManager in Desktop/Sources/CodexBackendManager.swift behind an enableCodexBackend toggle. It runs GPT models through your existing ChatGPT subscription via OpenAI's Codex backend, and its model picker lists the gpt-5.5 variants directly. So GPT-5.5 Instant, the May 5 default-model release, runs without a weights file and without a second bill.

3. Everything else: one custom-endpoint string

For Grok 4.3, Gemini 3.5 Flash, or any open weight, the bridge reads a customApiEndpoint string and assigns it to ANTHROPIC_BASE_URL on the spawned agent subprocess. The in-code comment says it plainly: it allows proxying through Copilot, corporate gateways, and the like. Point that one field at an Anthropic-compatible gateway (a LiteLLM or vLLM proxy in front of the model) and the same agent loop runs against it with no per-vendor adapter to maintain.

The table below maps each kind of May 2026 release to the path you take. The right column is what tends to be true of an agent hardcoded to one vendor.

Feature	Vendor-locked agent	Fazm
Run GPT-5.5 Instant (OpenAI, May 5)	Only if the agent ships an OpenAI adapter	Codex backend on your ChatGPT subscription, or a custom endpoint
Run Claude Opus 4.8 (Anthropic, May 28)	Often metered through the vendor's key	Bring your own Claude Pro or Max account (personalOAuth)
Run Grok 4.3 or Gemini 3.5 Flash	Usually not reachable	Custom endpoint to an Anthropic-compatible gateway
Run an open weight (Command A+, Granite 4.1)	Rarely supported	Custom endpoint to a LiteLLM or vLLM proxy

Keeping the conversation the new context window is supposed to hold

Reaching the model is half the answer. The other half is whether the agent throws your history away. Fazm does not auto-compact: the full chat history stays live in context for the lifetime of the window. So when you point it at Command A+ at 128K input or Granite 4.1 at up to 512K, the window can actually use that headroom instead of summarizing the early turns out from under you.

Two source-level behaviors make a release-comparison week survivable. Forking a conversation is one call: forkSession(fromKey:toKey:cwd:model:) in ACPBridge.swift branches the active session into a new window, leaves the source session intact and resumable, and lets the branch carry a different model than the source. That is the natural way to A/B a new release: branch a real working thread, switch the branch to the new model, run the same prompt.

And the chats persist across a restart. Fazm stores chat history in a local SQLite database (the ChatMessageStore layer in Desktop/Sources/ChatMessageStore.swift) and reloads the session on launch. Comparing models is a multi-day activity; a reboot should not wipe the threads you built to evaluate them.

The late-April cluster still setting the bar

A May roundup that omits the April 28 to 29 cluster is incomplete, because those four open-weight releases are what every May model is being benchmarked against. The shortest honest read: Xiaomi MiMo-V2.5-Pro is the largest by total parameters (1.02T MoE, 42B active) and the most permissive frontier-class license (MIT). NVIDIA Nemotron 3 Nano Omni unifies vision, speech, and language, activates only 3B parameters per token, and runs in 25GB of RAM. IBM Granite 4.1 is the clean Apache 2.0 dense pick (3B / 8B / 30B, up to 512K context, twelve languages). Mistral Medium 3.5 is the 128B dense coder (256K context, 77.6% on SWE-Bench Verified) under a Modified MIT license you must read before deploying inside a large company.

For a Mac agent the routing question is identical for all of them: stand up an Anthropic-compatible proxy in front of the inference server, then put that URL in the custom endpoint field. The open-weight-only version of this calendar, with the proxy targets spelled out, is in the open source LLM releases May 2026 guide.

Want to run the model that just shipped against your own Mac

Book 15 minutes. I will walk through pointing the agent loop at a new proprietary or open-weight model without losing your conversation.

Frequently asked questions

Which large language models were released in May 2026?

Verified as of May 29, 2026, the month's notable releases split into proprietary and open weight. Proprietary: OpenAI made GPT-5.5 Instant the default ChatGPT model on May 5; xAI's Grok 4.3 reached general availability in early May (its API model-retirement notice went out May 6); Google made Gemini 3.5 Flash generally available around May 19 at Google I/O; and Anthropic shipped Claude Opus 4.8 on May 28. Open weight, both on Hugging Face: OpenBMB MiniCPM-V 4.6 1.3B on May 11 (Apache 2.0, 262K context), and Cohere Command A+ on May 20 (Apache 2.0, 218B total / 25B active MoE, 128K input). The wider conversation is still carried by four frontier-class open-weight models that landed within 48 hours of each other on April 28 and 29 (Xiaomi MiMo-V2.5-Pro, NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1, Mistral Medium 3.5), which is why an honest May roundup includes them.

Is GPT-5.5 Instant open weight?

No. GPT-5.5 Instant, which OpenAI made the ChatGPT default on May 5, 2026, is a proprietary model with no weights file. For a Mac agent there are two practical paths. Fazm bundles a Codex backend (the CodexBackendManager in Desktop/Sources/CodexBackendManager.swift, behind an enableCodexBackend toggle) that runs GPT models through your existing ChatGPT subscription via OpenAI's Codex backend; its model picker lists gpt-5.5 variants directly. The other path, for any proprietary model, is an Anthropic-compatible gateway plugged into the custom endpoint field. Either way you reach the model without a weights file.

Can I run the model that just shipped against my real Mac apps?

Yes, through one of three paths, all visible in the open repo at github.com/m13v/fazm. For Claude (including Opus 4.8 from May 28) you bring your own Claude Pro or Max account: the bridge runs in personalOAuth mode (BridgeMode.personalOAuth in Desktop/Sources/Chat/ACPBridge.swift) and usage hits your existing plan. For GPT-5.5 you use the bundled Codex backend and your ChatGPT subscription. For anything else (Grok 4.3, Gemini 3.5 Flash, or an open weight) you point the customApiEndpoint setting at an Anthropic-compatible gateway, which the bridge assigns to ANTHROPIC_BASE_URL on the spawned agent subprocess. The same agent loop then drives your real macOS accessibility tree.

What is the single most important thing about these releases if I work on a Mac?

Almost every May 2026 release leads with a bigger context window. That number only helps if the tool you run the model through does not silently discard your conversation when it fills up. The raw Claude Code CLI auto-compacts: as the live context nears its limit it summarizes earlier turns and drops the originals, which is exactly when a long agent run starts losing the decisions you made an hour ago. Fazm does not auto-compact; the full chat history stays live in context for the lifetime of the window. So when you point it at Command A+ at 128K input or Granite 4.1 at up to 512K, the window can actually use that headroom.

Why does a roundup of May releases keep talking about late April?

Because the four-release cluster on April 28 and 29, 2026 is genuinely load-bearing for the May conversation. Xiaomi MiMo-V2.5-Pro (MIT, 1.02T total / 42B active MoE) is the largest by raw parameter count and the most permissive frontier-class license. NVIDIA Nemotron 3 Nano Omni (30B / 3B active multimodal, runs in 25GB of RAM) is the efficiency story. IBM Granite 4.1 (Apache 2.0, dense 3B / 8B / 30B, up to 512K context) is the practical enterprise pick. Mistral Medium 3.5 (Modified MIT, 128B dense, 256K context, 77.6% on SWE-Bench Verified) is the EU-friendly coder. As of late May these four are still the bar every new release is measured against.

Which May 2026 open-weight model is the best fit to drive a desktop agent?

Of the two open-weight May releases, Cohere Command A+ is the credible agent-planning target and MiniCPM-V 4.6 1.3B is not. Command A+ is a 218B / 25B-active Sparse MoE with 128K input context under Apache 2.0, shipped on Hugging Face in BF16, FP8, and W4A4 quantizations on day one, so it can run on as few as two H100s. MiniCPM-V 4.6 1.3B is a 1.3B multimodal model optimized for on-device inference; on a Mac it is a fit for an always-on screen-understanding pass, not the multi-step planning loop. For planning, the late-April frontier drops (MiMo-V2.5-Pro, Granite 4.1, Mistral Medium 3.5) are the realistic open-weight targets.

How do I A/B a new release against an existing task without losing my work?

Fazm forks a conversation in one step. The forkSession(fromKey:toKey:cwd:model:) call in Desktop/Sources/Chat/ACPBridge.swift branches the active session into a new window, leaves the source session intact and resumable, and lets the branch carry a different model than the source. So when a new model ships you can branch a real working thread, switch the branch to the new model (or a different backend), and run the same prompt, without rebuilding context or touching the original.

Do my chats survive a Mac restart while I am testing new models?

Yes. Fazm persists chat history to a local SQLite database (the ChatMessageStore layer in Desktop/Sources/ChatMessageStore.swift) and reloads the session on launch, so windows come back with their history intact. This matters during a release cycle specifically: comparing models is a multi-day activity, and a restart should not wipe the threads you set up to evaluate GPT-5.5, Grok 4.3, Command A+, or Opus 4.8.

Is the May 2026 list final?

No. The pattern across 2026 has been one to three open-weight releases of any size per week, plus proprietary frontier drops on their own cadence (Claude Opus 4.8 landed as late as May 28). The list on this page is a snapshot verified as of May 29, 2026. Treat it as a moving picture, not a final accounting of the month.

Where can I verify these dates and specs myself?

The open-weight models have public pages: huggingface.co/CohereLabs/command-a-plus-05-2026-bf16 for Command A+, huggingface.co/openbmb/MiniCPM-V-4.6 for MiniCPM-V 4.6, huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro for MiMo-V2.5-Pro, and research.ibm.com/blog/granite-4-1-ai-foundation-models for Granite 4.1. The proprietary releases are documented on the OpenAI, xAI, Google, and Anthropic announcement channels (for example anthropic.com/news for Claude Opus 4.8 and openai.com for GPT-5.5 Instant). The Fazm behaviors described here are readable in the open repo at github.com/m13v/fazm.

The open-weight cut of this calendar, and the context behaviors behind it.

Related guides

Open weights

Open source LLM releases in May 2026: the Hugging Face calendar and the Swift block that points a Mac agent at any of them

The open-weight-only cut of the same calendar, with the endpoint switch in Fazm's ACPBridge.swift spelled out.

Read

Sessions

Claude Code persistent sessions on macOS: how Fazm keeps a conversation alive across a restart

The local persistence path that survives a reboot, and why it matters during a model-comparison week.

Read

Context

Claude Code auto-compacting and token waste: what it drops and how to keep full context

Why a bigger advertised context window is wasted if the agent summarizes your conversation away at 80% full.

Read

The large language models released in May 2026

The thing every release has in common, and the catch

The part the other roundups never show: the three ways a Mac agent reaches one of these

1. Claude, including Opus 4.8: bring your own account

2. GPT-5.5: the bundled Codex backend on your ChatGPT plan

3. Everything else: one custom-endpoint string

Keeping the conversation the new context window is supposed to hold

The late-April cluster still setting the bar

Want to run the model that just shipped against your own Mac

Frequently asked questions

Related guides

Open source LLM releases in May 2026: the Hugging Face calendar and the Swift block that points a Mac agent at any of them

Claude Code persistent sessions on macOS: how Fazm keeps a conversation alive across a restart

Claude Code auto-compacting and token waste: what it drops and how to keep full context

Comments (••)

Comments ()