Notes from inside a shipping Mac AI agent
The version of Meta's Llama 2026 roadmap that nobody else writes: there is no Llama 5, and Behemoth is a teacher model.
The query "meta llama 4 llama 5 release date roadmap 2026" asks two questions stacked on top of each other. When does the next public Llama ship, and what slots in between Llama 4 and whatever comes after. As of May 11, 2026 the honest answers are: Llama 4 Scout and Maverick from April 2025 are still the latest open-weights frontier models in the Llama herd, Llama 4 Behemoth never shipped publicly and is used as an internal teacher model, and there is no Llama 5 on the public roadmap. On April 8, 2026 Meta Superintelligence Labs introduced Muse Spark instead, proprietary and closed-weights, framed as a ground-up overhaul of Meta's AI efforts. This page walks through the actual timeline and what it changes for anyone routing a desktop computer-use agent through a local Llama on a Mac.
Direct answer (verified 2026-05-11)
- Llama 4 Scout and Maverick: released April 5, 2025. Open weights, still downloadable.
- Llama 4 Behemoth: still unreleased as of May 11, 2026. Used internally as a teacher model for codistillation into Scout and Maverick. Never formally cancelled, also never given a public release date.
- Llama 5: not on the public roadmap. Meta has not published a Llama 5 product page, paper, or release date.
- What replaced it: Muse Spark, released April 8, 2026 by Meta Superintelligence Labs. Proprietary, closed-weights, available at meta.ai and through a private API preview. Wikipedia's Llama article now describes Muse Spark, verbatim, as "a replacement for Llama."
- Authoritative source: ai.meta.com/blog/introducing-muse-spark-msl.
The 13 months from Llama 4 Maverick to today, in order
Most of the pages that try to answer "when does Llama 5 come out" were written before April 8, 2026 and forecast forward from rumour. None of that matters now. The timeline below is the actual list of shipped events from the last frontier Llama release to today, drawn from Meta's own blog, Meta's About newsroom, Wikipedia's Llama and Muse Spark entries, and Wall Street Journal reporting on the Behemoth slip. If a date is missing from this list, no public announcement landed for it.
What actually shipped, April 2025 through May 2026
April 5, 2025: Llama 4 Scout + Maverick ship with open weights
17B active parameters, 16 and 128 experts respectively, 10M and 1M context windows. Base and instruct variants on Hugging Face and llama.com. Behemoth (~288B active, ~2T total) announced as still training the same day.
Source: ai.meta.com/blog/llama-4-multimodal-intelligence. This turned out to be the last frontier open-weights launch from Meta in 2025.
May 15, 2025: Behemoth release slips from summer to fall
Wall Street Journal reports Meta is struggling to lift Behemoth's capability enough to justify the announced specs. An internal follow-up is floated under the name 4.X or 4.5. No public weights ship in 2025.
Source: SiliconANGLE summary of the WSJ report. The pattern from this point onward is that the herd's flagship becomes a teacher model for codistillation, not a public release target.
Summer 2025: Meta Superintelligence Labs spins up under Alexandr Wang
Mark Zuckerberg reorganises Meta's AI work, brings in Alexandr Wang (former Scale AI CEO) as Chief AI Officer. Reported capex guidance climbs into the ~$48B range for the 2025 to 2026 fiscal window.
August 28, 2025: 'Llama 4.X or Llama 4.5 by year end' report surfaces
Business Insider reports an internal target to ship a new Llama before end of 2025. The model never materialises in public under either name. By December 2025 the cadence of Llama frontier launches has effectively paused.
April 8, 2026: Muse Spark replaces the Llama frontier line
MSL's first public model. Proprietary weights. Natively multimodal. Tool use, visual chain of thought, multi-agent orchestration. Meta reports 'over an order of magnitude less compute than Llama 4 Maverick' for equivalent reasoning. The Llama 5 product page does not exist.
Sources: ai.meta.com/blog/introducing-muse-spark-msl, about.fb.com. Wikipedia's Llama article (English) now reads, verbatim, "In April 2026, Meta Superintelligence Labs released Muse Spark as a replacement for Llama."
May 11, 2026: still no Behemoth, still no Llama 5
Existing Llama 2, 3, and 4 weights remain downloadable. Frontier compute at Meta is on the Muse path. The 'when does Llama 5 ship' question resolves to 'it does not, the next frontier from Meta is not in the Llama line and is not open-weights.'
“In April 2026, Meta Superintelligence Labs released Muse Spark as a replacement for Llama.”
Wikipedia, 'Llama (language model)' summary, retrieved 2026-05-11
Two practical readings from the timeline. First, the question "when does Llama 5 ship" resolves negatively. As of the cutoff date on this page, there is no Llama 5. The internal codename that some August 2025 reporting referred to as Llama 4.X or Llama 4.5 also never appeared under those labels. The release-date table for 2026 has exactly one entry from Meta's frontier line, and it is Muse Spark. Second, the question "when does Behemoth ship" resolves as "in practice, it doesn't." Behemoth was announced as still in training on April 5, 2025. The WSJ slip pushed it from early summer 2025 to fall 2025. After fall, the public communications stop. The model's job in 2026 is to teach the smaller herd members via codistillation, not to be downloaded.
Why this question really gets asked: where does my local Llama plug into the agent
People who type this query into Google are rarely looking for a corporate strategy memo. They are usually trying to decide what model to download next, whether to wait for a bigger Llama, and how it slots into whatever they actually run. For a real worked example, here is how that decision lands inside Fazm, a shipping Mac computer-use agent. Fazm's default backend is Claude through the Anthropic Agent Protocol. The product also supports routing every chat turn through a custom Anthropic-compatible base URL. That single base URL is the seam where any local Llama lives, regardless of whether it is Llama 3 8B on Ollama, Llama 3 70B quantised on llama.cpp, or Llama 4 Scout via a transformer-serving stack.
The wiring is concentrated in two files. The settings field is in Desktop/Sources/MainWindow/Pages/SettingsPage.swift. Around line 962 the label reads "Custom API Endpoint." Line 984 is a TextField bound to an @AppStorage key named "customApiEndpoint." The agent subprocess reads that key back at startup, in Desktop/Sources/Chat/ACPBridge.swift. Lines 467 through 470 are the actual injection.
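As a sketch of what that field amounts to (the @AppStorage key matches the one named above; the surrounding view structure is assumed, not copied from the file):

```swift
import SwiftUI

// Sketch of the settings field described above. The @AppStorage key matches
// the one named in SettingsPage.swift; the surrounding layout is assumed.
struct CustomEndpointField: View {
    @AppStorage("customApiEndpoint") private var customApiEndpoint: String = ""

    var body: some View {
        LabeledContent("Custom API Endpoint") {
            TextField("https://api.anthropic.com", text: $customApiEndpoint)
                .textFieldStyle(.roundedBorder)
                .autocorrectionDisabled()
        }
    }
}
```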
The agent process inherits ANTHROPIC_BASE_URL. The Anthropic SDK inside the subprocess uses whatever URL it sees in that variable. If the URL points at a local proxy that translates Anthropic's message shape to a Llama-serving backend, the rest of the agent loop is none the wiser. The tool-use frames go out, the tool results come back, the agent applies them to the user's screen via the macOS accessibility tree the way it always does. The Llama choice is downstream of one env var.
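A minimal sketch of that injection step, assuming illustrative names for everything except the AppStorage key and the env var:

```swift
import Foundation

// Sketch of the injection step: read the stored endpoint back out of
// UserDefaults and hand it to the agent subprocess as ANTHROPIC_BASE_URL.
// Only the AppStorage key and the env var name come from the text above;
// the function and everything else is illustrative.
func launchAgentProcess(executable: URL) throws -> Process {
    let process = Process()
    process.executableURL = executable

    var environment = ProcessInfo.processInfo.environment
    if let endpoint = UserDefaults.standard.string(forKey: "customApiEndpoint"),
       !endpoint.isEmpty {
        // The Anthropic SDK inside the subprocess uses whatever URL it sees
        // here; the rest of the agent loop never inspects it.
        environment["ANTHROPIC_BASE_URL"] = endpoint
    }
    process.environment = environment

    try process.run()
    return process
}
```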
The Llama integration is not theoretical. Fazm has shipped explicit error-message handling for it. When a user's local server (LM Studio, Ollama, llama.cpp behind a proxy) reports an upstream error, the agent surfaces an actionable hint instead of a raw 500. Lines 2045 to 2058 of the same file detect the "No models loaded" case from LM Studio and the "connection refused" case from a stopped Ollama process.
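An illustrative version of that substring matching, with the hint copy paraphrased rather than quoted from the file:

```swift
// Illustrative version of the substring matching described above. The hint
// copy is paraphrased; the two cases mirror the LM Studio and Ollama
// failures named in the text.
func friendlyHint(for rawError: String) -> String? {
    if rawError.contains("No models loaded") {
        // LM Studio is up but has nothing loaded into memory.
        return "Your local server reports no model loaded. Load one in LM Studio and retry."
    }
    if rawError.lowercased().contains("connection refused") {
        // Nothing is listening on the configured port, e.g. Ollama is stopped.
        return "Could not reach your local endpoint. Is Ollama (or your proxy) running?"
    }
    return nil // fall back to surfacing the raw error
}
```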
Two lines of comment, one substring match, one actionable error. The Llama path is a first-class user surface, not a hidden setting. Which is also why the 2026 Meta pivot matters here: the people who installed this product specifically to keep their inference local are the people most exposed to a frontier-line freeze. The Llama 4 weights they already have are still good; the question of what to upgrade to next is suddenly less obvious.
The end-to-end path from a user typing a chat turn to a Llama running on the same MacBook
Spelled out as a sequence, the trip from a chat input to a local Llama 3 70B response takes five actors and roughly ten frames. The agent never knows the response came from Llama. The Llama proxy never knows it is running behind a desktop computer-use agent. The seam stays at one env var, and everything else is the standard Anthropic message shape.
Custom endpoint flow, one chat turn
The same diagram applies whether the "Local LLM" column is Llama 3 8B, Llama 3 70B, Llama 4 Scout, or some non-Llama open-weights model. Nothing about the upstream agent code changes when the underlying weights change. Which is also the answer to the underlying question this page's query implies, "what happens to my workflow if Llama 5 never ships." The workflow does not break. The integration was always at the protocol layer, not the model layer. What you lose is the upside that a bigger open-weights Llama would have brought, not the existing capability.
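To make "the standard Anthropic message shape" concrete, here is roughly what one chat turn looks like on the wire when the base URL points at a local proxy. The /v1/messages path, headers, and payload shape follow Anthropic's public Messages API; the model name is a placeholder for whatever the proxy maps to local weights:

```swift
import Foundation

// One chat turn against a local Anthropic-compatible proxy. The /v1/messages
// path, headers, and payload shape follow Anthropic's public Messages API;
// the model name is a placeholder for whatever the proxy maps to local weights.
func sendChatTurn(baseURL: URL, prompt: String) async throws -> Data {
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/messages"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")

    let body: [String: Any] = [
        "model": "llama-3-70b-q4",  // placeholder
        "max_tokens": 1024,
        "messages": [["role": "user", "content": prompt]]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    // The response is the same Anthropic-shaped JSON whether the backend
    // is Claude or a quantised Llama behind the proxy.
    let (data, _) = try await URLSession.shared.data(for: request)
    return data
}
```

Swap the base URL and nothing above this function changes, which is the whole point of the seam.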
What is true, what is not true, and what is still pending
A clean status block for the four model labels people actually type into search bars, plus the rumour names that turn up in mid-2025 reporting and never resolve. Items with a check are confirmed against Meta's own posts. Items without are either unreleased or unconfirmed.
Llama herd and 2026 successors, status
- ✅ Llama 4 Scout (17B active, 109B total, 10M context) - released 2025-04-05
- ✅ Llama 4 Maverick (17B active, 400B total, 1M context) - released 2025-04-05
- ✅ Muse Spark (proprietary, MSL) - released 2026-04-08
- Llama 4 Behemoth (~288B active, ~2T total) - never publicly released as of 2026-05-11
- Llama 5 - not on the public roadmap as of 2026-05-11
- Llama 4.5 / Llama 4.X - reported as an internal target in Aug 2025, never shipped under those labels
- Mango / Avocado image and video models - reported in 2026, no Meta product page
The two negatives at the bottom of that list are the part you cannot get from a content-mill page. Llama 4.X / 4.5 / Mango / Avocado are names that show up in second-hand reporting and have no Meta product page behind them as of this date. That does not mean they are vapour; it means they are not the answer to the literal query. The literal query has exactly one positive 2026 answer, and that is Muse Spark.
The honest counterargument: why this might still be wrong by the end of 2026
A few caveats that should make a reader less confident in the framing above. Meta has not formally discontinued Llama. The community license is still active, the back-catalogue is still on Hugging Face, and existing fine-tunes and quantisations are not going anywhere. Nothing in the Muse Spark launch post says "no more Llama frontier ever." A face-saving 4.X or 5 release later in 2026, perhaps with a smaller compute envelope and a tighter scope, is not ruled out. Meta has been here before with Llama 3.x point releases that arrived without fanfare.
What is ruled out, by the post itself, is the "open-weights frontier from Meta" expectation that defined 2023 to early 2025. Muse Spark is closed. If the open-weights story matters to your stack, Meta is not the place to plan around anymore; the relevant horizon is whatever Mistral, DeepSeek, Qwen, and the other open-weights labs ship next, plus whatever 4.X-style update Meta might or might not bring out of the back catalogue. The Llama 5 specific question is still "no" on the public record either way.
For a Mac computer-use agent, the planning move that survives both outcomes is the one already in Fazm's code: keep the integration seam at one env var, keep the agent loop model-agnostic, and treat the model picker as something that follows the SDK's available-models payload at runtime. If Llama 5 ships next week, the same custom-endpoint field absorbs it. If it never ships, nothing changes downstream. That is the engineering value of having no opinion on the roadmap question while still being correct about today's status.
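A sketch of what "follows the SDK's available-models payload" could look like against the same custom base URL, assuming the endpoint or your proxy answers an Anthropic-style GET /v1/models; the response shape below is an assumption, adjust to what your proxy actually returns:

```swift
import Foundation

// Sketch of a runtime model picker: ask the configured endpoint what it
// serves instead of hardcoding a roadmap. Assumes the endpoint or proxy
// answers an Anthropic-style GET /v1/models; the response shape here is
// an assumption.
struct ModelList: Decodable {
    struct Model: Decodable { let id: String }
    let data: [Model]
}

func availableModels(baseURL: URL) async throws -> [String] {
    let url = baseURL.appendingPathComponent("v1/models")
    let (data, _) = try await URLSession.shared.data(from: url)
    return try JSONDecoder().decode(ModelList.self, from: data).data.map(\.id)
}
```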
Want to wire your local Llama into a shipping Mac agent?
Bring whatever you already run locally. We'll walk through the custom-endpoint seam in your real workflow and where the tool-use reliability cliff actually sits for your model.
Frequently asked questions
When was Llama 4 released?
Llama 4 Scout (17B active parameters, 109B total, 16 experts, 10M context window) and Llama 4 Maverick (17B active, 400B total, 128 experts, 1M context window) were released on April 5, 2025 as base and instruction-tuned variants. Both have open weights under Meta's community license. Llama 4 Behemoth, the third member of the herd, was announced as still in training the same day and has not shipped publicly since. See https://ai.meta.com/blog/llama-4-multimodal-intelligence/ for the launch post.
Has Llama 4 Behemoth been released as of May 2026?
No. Behemoth was announced on April 5, 2025 with roughly 288B active parameters across 16 experts and around 2T total parameters. Meta did not release public weights for it during 2025. Reporting from the Wall Street Journal in May 2025 said the launch had slipped from early summer to fall of that year, and as of May 11, 2026 there is still no public release. Meta has not formally cancelled it, but in practice Behemoth functions as an internal teacher model used to codistill into Scout and Maverick, not as a downloadable frontier model.
Is Llama 5 coming out in 2026?
Not under that name. As of May 11, 2026, Meta has not published a Llama 5 product page, release date, or paper. The thing some August 2025 reporting called Llama 4.X or Llama 4.5 also never shipped under those labels. On April 8, 2026, Meta Superintelligence Labs introduced Muse Spark instead, described in the official launch post as a 'ground-up overhaul' of the company's AI efforts. Muse Spark is proprietary, available at meta.ai and through a private API preview to select users. The Llama family has not been formally discontinued and existing Llama models remain downloadable, but the frontier investment has moved to the Muse line. See https://ai.meta.com/blog/introducing-muse-spark-msl/.
What did Meta Superintelligence Labs ship in April 2026?
Muse Spark, announced April 8, 2026, the first model release out of Meta Superintelligence Labs (MSL). MSL was formed in summer 2025 after a reorganisation that brought in Alexandr Wang, co-founder and former CEO of Scale AI, as Chief AI Officer. The Muse Spark blog post describes it as 'natively multimodal' with support for tool use, visual chain of thought, and multi-agent orchestration, and calls it 'the first step on our scaling ladder' toward personal superintelligence. Meta reports that Muse Spark uses 'over an order of magnitude less compute than Llama 4 Maverick' for the equivalent reasoning load. Unlike every prior Llama herd model, the weights are not open.
Is the Llama family still open source?
Today's Llama models (Llama 2, Llama 3 family, Llama 4 Scout, Llama 4 Maverick) remain available under Meta's community license, which is open-weights but not OSI-approved open source. Those models are still on the Hugging Face and llama.com download paths. What changed in April 2026 is that the next frontier model out of Meta is not part of the Llama line at all, and is not open-weights. The practical reading from outside is that Meta is treating Muse Spark as the proprietary frontier and Llama as the back catalogue.
What does this mean for a Mac user who wants to run Llama locally with a computer-use agent?
Pragmatically, your local-LLM ceiling on a MacBook is whatever fits in unified memory at usable quantisation. Llama 3 8B and Llama 3 70B (with heavy quantisation) are the realistic floor and ceiling on most consumer Macs. Llama 4 Scout, at 109B total parameters across a 16-expert MoE layout, is technically smaller than Maverick, but the expert routing makes inference latency unpredictable on Apple Silicon. Llama 4 Maverick at 400B and the unreleased Behemoth at ~2T are not consumer-Mac targets. Muse Spark closes the door on a near-term open-weights frontier from Meta, so the local-Llama story on Mac is now a back-catalogue story, not a roadmap one.
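A back-of-envelope way to check the "fits in unified memory" claim for any of those models, rule of thumb only:

```swift
// Rule of thumb only: total parameters x bits per weight, plus ~20% headroom
// for KV cache and runtime buffers. Real footprints vary by quant format.
func approxMemoryGB(totalParamsBillions: Double, bitsPerWeight: Double) -> Double {
    let weightGB = totalParamsBillions * bitsPerWeight / 8
    return weightGB * 1.2
}

// approxMemoryGB(totalParamsBillions: 8,   bitsPerWeight: 4)  ~  4.8 GB -> any Apple Silicon Mac
// approxMemoryGB(totalParamsBillions: 70,  bitsPerWeight: 4)  ~ 42 GB   -> needs a 64 GB machine
// approxMemoryGB(totalParamsBillions: 109, bitsPerWeight: 4)  ~ 65 GB   -> 96 GB territory, despite 17B active
```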
Where does Fazm wire in a local Llama, if you wanted to?
One env var, one settings field, three lines of Swift. The settings UI lives at Desktop/Sources/MainWindow/Pages/SettingsPage.swift around line 962 ('Custom API Endpoint') and line 984 (a TextField bound to the AppStorage key 'customApiEndpoint'). The agent process reads it back at Desktop/Sources/Chat/ACPBridge.swift lines 467 to 470, where the value is injected into the agent subprocess environment as ANTHROPIC_BASE_URL. Stand up any Anthropic-API-compatible proxy in front of an Ollama or llama.cpp instance serving a Llama 3 or Llama 4 quantised build, paste its URL into Settings -> Advanced -> AI Chat -> Custom API Endpoint, and Fazm routes the next chat turn through it. There is no separate Llama provider type. The agent loop is the same. The seam is one base URL.
Why route a computer-use agent through a local Llama at all, if Claude is the default?
Three honest reasons. Privacy, where screen-context payloads stay on the box. Cost, where heavy automation runs cease to be metered. Sovereignty, where a corporate proxy or air-gapped Ollama is the only allowed path. The honest tradeoff is tool-use reliability. Llama 3 and Llama 4 tool calls have improved across 2025 and 2026, but the failure modes (refusal to emit a tool call, malformed JSON, hallucinated arguments) still happen more often than with the frontier proprietary models. For a desktop agent that depends on tool calls being well-formed every turn, that is the real cost of moving off Claude, not the throughput number. Pick the route that matches your workload.
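A minimal guard against those failure modes, with illustrative types rather than Fazm's actual ones:

```swift
import Foundation

// Minimal guard against the failure modes named above: before a tool call
// touches the user's screen, check that it parses and names a tool the
// agent actually exposes. Types here are illustrative, not Fazm's.
struct ToolCall: Decodable {
    let name: String
    let arguments: [String: String]  // real schemas are richer; sketch only
}

func validatedToolCall(fromModelOutput raw: String, knownTools: Set<String>) -> ToolCall? {
    guard let data = raw.data(using: .utf8),
          let call = try? JSONDecoder().decode(ToolCall.self, from: data) else {
        return nil  // malformed JSON: reprompt rather than act on garbage
    }
    guard knownTools.contains(call.name) else {
        return nil  // hallucinated tool name
    }
    return call
}
```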
What about Llama 4.5, Llama 4.X, Mango, Avocado, and the other 2026 names floating around?
Reporting from Business Insider in August 2025 said Meta planned to ship an internal Llama 4.X or Llama 4.5 before the end of 2025. Nothing public shipped under either label. Separately, an mlq.ai item described two image and video models targeted for 2026 codenamed Mango and Avocado, sourced from internal reporting rather than from a Meta blog post. As of May 11, 2026, neither has a Meta product page. The credible release-date facts in 2026 are exactly two: Muse Spark on April 8, 2026 (Meta), and Llama 4 Scout plus Maverick still being the latest open-weights Llama models, from the previous year.
How likely is a Llama 5 announcement later in 2026?
On a literal reading of public Meta posts, low. Muse Spark is described as the start of a new scaling ladder, not as a side project alongside Llama. The Wikipedia summary of the Llama page now ends with 'In April 2026, Meta Superintelligence Labs released Muse Spark as a replacement for Llama.' If a frontier Llama-branded model does ship later in 2026, it would have to coexist with Muse Spark in Meta's product line, which would be unusual given how MSL is positioned. The honest answer from outside is to treat Llama as the long tail and Muse as the frontier path, and to plan local-LLM infrastructure around the models that already exist rather than the model the search query asks about.
More on what the open-weights frontier looks like in 2026
Adjacent reading
New LLM models released April 2026
The April 2026 frontier launches in one place, including the Muse Spark pivot from Meta. The other side of the same news cycle.
Best Hugging Face models for Mac AI agents, April 2026
Which downloadable models actually fit on Apple Silicon and tool-call reliably. The practical companion to this roadmap reality check.
Local AI hardware: bandwidth vs memory tradeoffs
Why MoE flagships like Llama 4 Maverick struggle on consumer Macs even when the weights technically fit. The hardware side of the local-Llama story.