Llama 4 release date in 2026: there isn't one

Matthew Diakonov, Written with AI

Published May 26, 20266 min read

Direct answer (verified May 26, 2026)

Llama 4 has no 2026 release. Meta shipped Llama 4 Scout and Llama 4 Maverick on April 5, 2025. The largest planned variant, Behemoth, was previewed as still in training, was delayed through 2025, and was never publicly released. On April 8, 2026 Meta changed course and launched a closed-weight model, Muse Spark, under Meta Superintelligence Labs. The 2025 Llama 4 weights are still downloadable; there is simply no new Llama 4 model carrying a 2026 date.

Sources: Meta's Llama 4 launch post, The New Stack on Muse Spark.

The Llama 4 timeline, at a glance

If you searched for a 2026 date, you were probably trying to reconcile two facts: Llama 4 was announced a while ago, yet Meta's frontier news in 2026 has not used the Llama name. Here is the full lineage, so the gap makes sense.

Model	Date	Status
Llama 4 Scout	April 5, 2025	Released, open weights
Llama 4 Maverick	April 5, 2025	Released, open weights
Llama 4 Behemoth	Previewed April 2025	Delayed, never publicly released
Muse Spark	April 8, 2026	Released, closed weights (not Llama)

Behemoth was framed as a ~2-trillion-parameter teacher model at the April 2025 announcement. Meta delayed it amid internal concerns and ultimately moved its frontier effort to the closed-weight Muse Spark. The Scout and Maverick weights from 2025 remain available for download.

What actually changed in 2026

The headline is not a new Llama. It is a reversal. For four generations Meta was the company that handed you the weights. In 2026 that stopped: Muse Spark is closed, it ships from a new division called Meta Superintelligence Labs, and the weights are not published. If your plan was “wait for the next open Llama and drop it into my stack,” that plan no longer has an obvious next step from Meta.

For most people reading a release-date question, the deeper concern is not trivia about Meta's org chart. It is the thing sitting one layer up: you built a workflow on an assumption about a vendor, and the vendor moved. That is the part worth planning around, whichever model you run.

“Meta abandoned its largest open-weight variant and shipped a closed model instead. The lesson for tooling: do not bet a workflow on a single vendor staying open.”

Verified against The New Stack and Meta's own posts, May 2026

The practical takeaway: keep the model swappable

I build Fazm, a native macOS app that wraps Claude Code and Codex via ACP. So I think about this question from the tooling side: when a model you like goes closed, or a model you ignored suddenly gets good, how much work is it to switch? The honest answer for most setups is “a lot,” because the agent is hard-wired to one provider.

Fazm deliberately separates the agent loop from the model behind it. There is a setting in the app, found at Settings → AI Chat → Custom API Endpoint, defined in SettingsPage.swift (the field is backed by an @AppStorage("customApiEndpoint") string). Toggle it on and you can route every call through any gateway that speaks the Anthropic API format.

Custom API Endpoint (verbatim from the app)

The constraint is the honest part: the help text says outright that the endpoint must speak Anthropic format, and that “a raw Gemini or OpenAI key will not work here.” Llama 4 does not speak that format either. So this is not a one-click “use Llama” button. What it is, is an escape hatch: front the model with a translating proxy that exposes an Anthropic-compatible endpoint, and the agent never knows the difference.

point-fazm-at-llama4.sh

The same separation is why Fazm bundles Codex (codex-acp) as a swappable backend per chat, and routes Gemini and GPT-class model IDs to their own backends. The point is not which model wins this month. The point is that a model release, or a model going closed, becomes a config change rather than a rebuild. Fazm is open source, so you can read exactly how the endpoint override and backend routing work before you trust them with your keys.

Want a model-agnostic agent on your Mac?

Tell me your stack and I'll show you how to keep your coding agent decoupled from any single model vendor.

Frequently asked questions

Is there a Llama 4 release in 2026?

No. Meta released Llama 4 Scout and Llama 4 Maverick on April 5, 2025. It has not shipped a new Llama 4 model in 2026. The largest planned variant, codenamed Behemoth, was previewed in April 2025 as still in training, was repeatedly delayed through 2025, and was never publicly released. On April 8, 2026 Meta changed direction entirely and launched Muse Spark, a closed-weight model from the newly formed Meta Superintelligence Labs. So the 2026 answer to 'when does Llama 4 release' is: it already released in 2025, and there is no 2026 follow-up under the Llama 4 name.

What happened to Llama 4 Behemoth?

Behemoth was introduced alongside Scout and Maverick in April 2025 as a roughly 2-trillion-parameter teacher model that was still in training. Meta delayed it amid internal concerns about its capabilities, and it never received a public release. By April 2026 Meta's frontier work had moved to the closed-weight Muse Spark instead, so Behemoth as a shipped open-weight model is effectively off the table.

Are the existing Llama 4 weights still available to download?

Yes. Meta has not pulled the Llama 4 Scout and Maverick weights. They remain downloadable under the Llama community license and continue to run in the usual local stacks (llama.cpp, vLLM, Ollama, and the like). What changed in 2026 is the direction of Meta's new frontier models, not the availability of the 2025 weights.

What is Muse Spark and how is it different from Llama 4?

Muse Spark is Meta's first proprietary, closed-weight model, launched April 8, 2026 as the debut release of Meta Superintelligence Labs. Unlike the Llama 4 line, the weights are not published for download. It marks a strategic shift away from the open-weight approach that defined Llama 1 through 4.

Can Fazm run a Llama 4 model behind Claude Code?

Indirectly, yes, through a translation layer. Fazm wraps Claude Code and Codex via ACP and talks the Anthropic API format. Its Settings have a Custom API Endpoint field that routes every call through any Anthropic-API-compatible gateway (the in-app help text names a local LLM bridge, a corporate proxy, or a GitHub Copilot bridge as examples). Llama 4 itself does not speak the Anthropic format, so you front it with a translating proxy such as LiteLLM that exposes an Anthropic-compatible endpoint, then point Fazm at that URL. The agent loop, session persistence, and forking stay the same; only the model behind the endpoint changes.

Why does a model going closed-source matter for my coding tool?

Because if your workflow is welded to one vendor's hosted model, that vendor's strategy becomes your strategy. Meta spent years as the open-weight standard-bearer, then pivoted to a closed model in 2026. If your agent only knows how to call one endpoint, a pivot like that strands you. A tool that lets you change the backend per chat, or repoint it at a compatible gateway, converts a vendor decision into a one-line config change instead of a migration.

Llama 4 release date in 2026: there isn't one

The Llama 4 timeline, at a glance

What actually changed in 2026

The practical takeaway: keep the model swappable

Want a model-agnostic agent on your Mac?

Frequently asked questions

Frequently asked questions

Related reading

Every major 2026 AI model release, year to date

Claude Code vs Cursor vs Copilot in 2026

An AI agent for macOS, beyond the terminal

Comments ()

The Llama 4 timeline, at a glance

What actually changed in 2026

The practical takeaway: keep the model swappable

Want a model-agnostic agent on your Mac?

Frequently asked questions

Frequently asked questions

Related reading

Every major 2026 AI model release, year to date

Claude Code vs Cursor vs Copilot in 2026

An AI agent for macOS, beyond the terminal

Comments (••)

Comments ()