Llama release tracker
Llama 3 in 2026: what actually shipped, and what did not
Short version: there is no Llama 3 release in 2026. The whole Llama 3 line shipped in 2024, and the newest open-weights Llama you can run is Llama 4 from April 2025. Below is the exact timeline, why this question keeps coming up, and the part nobody else covers, which is how to run any Llama model as the brain of a native Mac agent.
Direct answer - verified May 31, 2026
No. There was no Llama 3 release in 2026.
Every Llama 3 version shipped in 2024: Llama 3 in April, Llama 3.1 (with the 405B) in July, Llama 3.2 (small text plus vision) in September, and Llama 3.3 70B in December. The current open-weights generation is Llama 4 (Scout and Maverick), released April 2025. As of this writing, no new Llama model has shipped in 2026. You can confirm the live model list on Meta's official site at llama.com and the model cards at github.com/meta-llama/llama-models.
The actual Llama release timeline
Here is every Llama release that bears on this question, in order. Note that the four releases people might call "Llama 3" all landed in 2024, and the only thing after them is the Llama 4 generation in 2025.
Llama 3 to Llama 4, dated
April 2024 - Llama 3 (8B, 70B)
Meta's first Llama 3 models. Text-only, open weights, and the start of the 3.x line that most local tooling was built around.
July 2024 - Llama 3.1 (8B, 70B, 405B)
Added the 405B parameter model, the largest open-weights Llama at the time, plus longer context across the family.
September 2024 - Llama 3.2 (1B, 3B, 11B, 90B)
Small on-device text models (1B and 3B) plus the first Llama vision models (11B and 90B).
December 2024 - Llama 3.3 70B
A refreshed 70B that closed much of the gap to the 405B while staying far cheaper to run. The last release in the 3.x line.
April 2025 - Llama 4 Scout and Maverick
The next generation. Mixture-of-experts, multimodal, and the current open-weights Llama you would reach for in 2026.
2026 so far - no new Llama 3
As of May 31, 2026, no new Llama model has been released this year. The newest open-weights Llama you can run is still Llama 4 from April 2025.
Why people keep searching for a 2026 Llama 3
There are a few honest reasons this question shows up so often. Llama versioning moved fast: four point releases in 2024 alone, then a jump to a different major number in 2025. If you onboarded onto Llama 3.1 or 3.3 in 2024 and only checked back in 2026, it is reasonable to assume there must be a newer Llama 3 by now. There is not. The line stopped at 3.3, and the next step was Llama 4.
The other reason is that "Llama 3" became a generic shorthand for "the open Meta model" the way people say "Kleenex" for tissue. So a search for the latest Llama 3 news in 2026 usually means one of two things: either you want the newest model overall (that is Llama 4 Scout or Maverick), or you want the most recent 3.x specifically (that is Llama 3.3 70B from December 2024). Both are answerable, neither is a 2026 release.
If you landed here trying to decide what to actually pull and run, the practical answer is below. The version you download matters less than where the model lives and what drives it.
The part no Llama roundup covers: running it as an agent on your Mac
Every other guide on this stops at specs and download links. The question that actually changes your day is different: once you have a Llama model, how do you put it behind a real agent loop, locally, on a Mac, without losing your work every time you restart?
That is the thing fazm is built around. fazm is a native macOS app that wraps Claude Code (via the Agent Client Protocol package @agentclientprotocol/claude-agent-acp) and Codex (codex-acp) so you get the same agent loop in a window instead of a terminal. The detail that matters for Llama: fazm exposes a custom API endpoint setting. Point it at a local inference server behind an Anthropic-compatible gateway and the agent drives Llama instead of Anthropic's models. Nothing about the loop changes.
# 1. Serve a Llama model locally (example: Ollama)
ollama pull llama3.3:70b
ollama serve # exposes http://localhost:11434
# 2. Put an Anthropic-compatible gateway in front (LiteLLM)
litellm --model ollama/llama3.3:70b --port 4000
# -> http://localhost:4000 now speaks the Anthropic messages API
# 3. In fazm, set the custom API endpoint for the chat:
# Base URL: http://localhost:4000
# Model: llama3.3:70b
# The Claude Code / Codex agent loop now drives Llama,
# while fazm keeps the session, the fork button, and full context.The gateway exists because the agent harness expects an Anthropic-shaped API while Ollama and vLLM speak their own. LiteLLM translates between them, which is what lets the unchanged Claude Code loop talk to Llama.
Raw terminal Llama agent vs the same loop in fazm
You can run a Llama-backed agent from a bare terminal today. The model works. What breaks is everything around it: close the window or restart the Mac and the conversation is gone, branching an experiment means a session-id dance, and long runs quietly compact context to stay under a limit, dropping decisions you made earlier.
Same Llama model, different harness
Llama behind a raw terminal agent
- Sessions vanish on restart
- Forking a run is a manual session-id copy
- Context auto-compacts and silently drops earlier decisions
- No window-level history you can scroll back through
What you need to drive Llama from fazm
Nothing here is fazm-specific except the last step. If you already run a local Llama, you are most of the way there.
Local Llama agent on macOS
- A Mac on macOS 14 or newer
- A Llama model pulled locally (Llama 3.3 70B or Llama 4, depending on your hardware)
- An inference server: Ollama, vLLM, or LM Studio
- An Anthropic-compatible gateway in front, such as LiteLLM
- fazm, with the chat's custom API endpoint pointed at the gateway
Want a Llama model running in a real Mac agent loop?
Walk through pointing fazm at your local Llama, with persistent sessions and no context compacting.
Llama 3 in 2026, answered
Was there a Llama 3 release in 2026?
No. Every Llama 3 version shipped in 2024: Llama 3 in April 2024, Llama 3.1 (including the 405B) in July 2024, Llama 3.2 (small text models plus 11B and 90B vision) in September 2024, and Llama 3.3 70B in December 2024. The newest open-weights generation is Llama 4, released April 2025. As of May 31, 2026, no new Llama 3 model has shipped in 2026.
If I want the latest Llama, which version should I download?
For most local work the current generation is Llama 4 Scout or Maverick (April 2025). If you specifically want a dense, well supported model that runs in nearly every local runtime, Llama 3.3 70B (December 2024) and Llama 3.1 8B remain popular because the tooling around them is mature. Check llama.com for the live list before you pull weights.
Did Llama 4 Behemoth ever release?
Meta previewed Llama 4 Behemoth alongside Scout and Maverick in April 2025, but it was described as still in training and was not made available as a general download. Reports through 2025 said its launch slipped from spring to fall and beyond. Scout and Maverick were the two open-weights models you could actually run from that announcement.
Can I run a Llama model inside fazm instead of Claude?
Yes, indirectly. fazm wraps Claude Code and Codex via the Agent Client Protocol, and it exposes a custom API endpoint setting. Point that at a local inference server (Ollama, vLLM, or LM Studio) sitting behind an Anthropic-compatible gateway such as LiteLLM, and the agent loop drives a Llama model instead of Anthropic's. You keep fazm's persistent sessions, one-click forking, and no auto-compacting.
Why does running Llama through a gateway matter for an agent?
Agent loops like Claude Code expect an Anthropic-shaped API (messages, tool calls, streaming). Llama weights served by Ollama or vLLM speak their own or an OpenAI-shaped API. A translation gateway (LiteLLM is the common one) maps between them so the same agent harness works unchanged. Without it, the agent cannot parse the model's tool calls.
Is Llama 3 still worth using in 2026?
For many tasks, yes. Llama 3.1 8B is small enough to run on a laptop and Llama 3.3 70B is a capable dense model with broad runtime support. Llama 4 is newer and multimodal, but the 3.x line is battle-tested and the quantizations and integrations are everywhere. The right pick depends on your hardware and whether you need vision.
Comments (••)
Leave a comment to see what others are saying.Public and anonymous. No signup.