Llama release tracker

Llama 3 in 2026: what actually shipped, and what did not

Short version: there is no Llama 3 release in 2026. The whole Llama 3 line shipped in 2024, and the newest open-weights Llama you can run is Llama 4 from April 2025. Below is the exact timeline, why this question keeps coming up, and the part nobody else covers, which is how to run any Llama model as the brain of a native Mac agent.

Matthew Diakonov, Written with AI

Published May 31, 20267 min read

Run any model in a native Mac agent

Llama 3 - Apr 2024Llama 3.1 405B - Jul 2024Llama 3.2 vision - Sep 2024Llama 3.3 70B - Dec 2024Llama 4 Scout - Apr 2025Llama 4 Maverick - Apr 2025Llama 4 Behemoth - preview

Direct answer - verified May 31, 2026

No. There was no Llama 3 release in 2026.

Every Llama 3 version shipped in 2024: Llama 3 in April, Llama 3.1 (with the 405B) in July, Llama 3.2 (small text plus vision) in September, and Llama 3.3 70B in December. The current open-weights generation is Llama 4 (Scout and Maverick), released April 2025. As of this writing, no new Llama model has shipped in 2026. You can confirm the live model list on Meta's official site at llama.com and the model cards at github.com/meta-llama/llama-models.

The actual Llama release timeline

Here is every Llama release that bears on this question, in order. Note that the four releases people might call "Llama 3" all landed in 2024, and the only thing after them is the Llama 4 generation in 2025.

Llama 3 to Llama 4, dated

April 2024 - Llama 3 (8B, 70B)

Meta's first Llama 3 models. Text-only, open weights, and the start of the 3.x line that most local tooling was built around.

This is the release people mean when they say 'Llama 3'. It is a 2024 event, not a 2026 one.

July 2024 - Llama 3.1 (8B, 70B, 405B)

Added the 405B parameter model, the largest open-weights Llama at the time, plus longer context across the family.

The 405B is the one that made headlines for matching frontier closed models on several benchmarks.

September 2024 - Llama 3.2 (1B, 3B, 11B, 90B)

Small on-device text models (1B and 3B) plus the first Llama vision models (11B and 90B).

The 1B and 3B were aimed squarely at laptops and phones, which is why they show up in so many local setups.

December 2024 - Llama 3.3 70B

A refreshed 70B that closed much of the gap to the 405B while staying far cheaper to run. The last release in the 3.x line.

If you are looking for 'the latest Llama 3', this is it. It shipped in 2024, not 2026.

April 2025 - Llama 4 Scout and Maverick

The next generation. Mixture-of-experts, multimodal, and the current open-weights Llama you would reach for in 2026.

Behemoth was previewed at the same time but was described as still in training and not released as a general download.

2026 so far - no new Llama 3

As of May 31, 2026, no new Llama model has been released this year. The newest open-weights Llama you can run is still Llama 4 from April 2025.

Anyone searching for a 'Llama 3 2026 release' is almost always thinking of one of the 2024 point releases, or of Llama 4.

Why people keep searching for a 2026 Llama 3

There are a few honest reasons this question shows up so often. Llama versioning moved fast: four point releases in 2024 alone, then a jump to a different major number in 2025. If you onboarded onto Llama 3.1 or 3.3 in 2024 and only checked back in 2026, it is reasonable to assume there must be a newer Llama 3 by now. There is not. The line stopped at 3.3, and the next step was Llama 4.

The other reason is that "Llama 3" became a generic shorthand for "the open Meta model" the way people say "Kleenex" for tissue. So a search for the latest Llama 3 news in 2026 usually means one of two things: either you want the newest model overall (that is Llama 4 Scout or Maverick), or you want the most recent 3.x specifically (that is Llama 3.3 70B from December 2024). Both are answerable, neither is a 2026 release.

If you landed here trying to decide what to actually pull and run, the practical answer is below. The version you download matters less than where the model lives and what drives it.

The part no Llama roundup covers: running it as an agent on your Mac

Every other guide on this stops at specs and download links. The question that actually changes your day is different: once you have a Llama model, how do you put it behind a real agent loop, locally, on a Mac, without losing your work every time you restart?

That is the thing fazm is built around. fazm is a native macOS app that wraps Claude Code (via the Agent Client Protocol package @agentclientprotocol/claude-agent-acp) and Codex (codex-acp) so you get the same agent loop in a window instead of a terminal. The detail that matters for Llama: fazm exposes a custom API endpoint setting. Point it at a local inference server behind an Anthropic-compatible gateway and the agent drives Llama instead of Anthropic's models. Nothing about the loop changes.

point fazm at a local Llama

# 1. Serve a Llama model locally (example: Ollama)
ollama pull llama3.3:70b
ollama serve            # exposes http://localhost:11434

# 2. Put an Anthropic-compatible gateway in front (LiteLLM)
litellm --model ollama/llama3.3:70b --port 4000
# -> http://localhost:4000 now speaks the Anthropic messages API

# 3. In fazm, set the custom API endpoint for the chat:
#    Base URL: http://localhost:4000
#    Model:    llama3.3:70b
# The Claude Code / Codex agent loop now drives Llama,
# while fazm keeps the session, the fork button, and full context.

The gateway exists because the agent harness expects an Anthropic-shaped API while Ollama and vLLM speak their own. LiteLLM translates between them, which is what lets the unchanged Claude Code loop talk to Llama.

Raw terminal Llama agent vs the same loop in fazm

You can run a Llama-backed agent from a bare terminal today. The model works. What breaks is everything around it: close the window or restart the Mac and the conversation is gone, branching an experiment means a session-id dance, and long runs quietly compact context to stay under a limit, dropping decisions you made earlier.

Same Llama model, different harness

Llama behind a raw terminal agent

Sessions vanish on restart
Forking a run is a manual session-id copy
Context auto-compacts and silently drops earlier decisions
No window-level history you can scroll back through

What you need to drive Llama from fazm

Nothing here is fazm-specific except the last step. If you already run a local Llama, you are most of the way there.

Local Llama agent on macOS

A Mac on macOS 14 or newer
A Llama model pulled locally (Llama 3.3 70B or Llama 4, depending on your hardware)
An inference server: Ollama, vLLM, or LM Studio
An Anthropic-compatible gateway in front, such as LiteLLM
fazm, with the chat's custom API endpoint pointed at the gateway

Want a Llama model running in a real Mac agent loop?

Walk through pointing fazm at your local Llama, with persistent sessions and no context compacting.

Llama 3 in 2026, answered

Was there a Llama 3 release in 2026?

No. Every Llama 3 version shipped in 2024: Llama 3 in April 2024, Llama 3.1 (including the 405B) in July 2024, Llama 3.2 (small text models plus 11B and 90B vision) in September 2024, and Llama 3.3 70B in December 2024. The newest open-weights generation is Llama 4, released April 2025. As of May 31, 2026, no new Llama 3 model has shipped in 2026.

If I want the latest Llama, which version should I download?

For most local work the current generation is Llama 4 Scout or Maverick (April 2025). If you specifically want a dense, well supported model that runs in nearly every local runtime, Llama 3.3 70B (December 2024) and Llama 3.1 8B remain popular because the tooling around them is mature. Check llama.com for the live list before you pull weights.

Did Llama 4 Behemoth ever release?

Meta previewed Llama 4 Behemoth alongside Scout and Maverick in April 2025, but it was described as still in training and was not made available as a general download. Reports through 2025 said its launch slipped from spring to fall and beyond. Scout and Maverick were the two open-weights models you could actually run from that announcement.

Can I run a Llama model inside fazm instead of Claude?

Yes, indirectly. fazm wraps Claude Code and Codex via the Agent Client Protocol, and it exposes a custom API endpoint setting. Point that at a local inference server (Ollama, vLLM, or LM Studio) sitting behind an Anthropic-compatible gateway such as LiteLLM, and the agent loop drives a Llama model instead of Anthropic's. You keep fazm's persistent sessions, one-click forking, and no auto-compacting.

Why does running Llama through a gateway matter for an agent?

Agent loops like Claude Code expect an Anthropic-shaped API (messages, tool calls, streaming). Llama weights served by Ollama or vLLM speak their own or an OpenAI-shaped API. A translation gateway (LiteLLM is the common one) maps between them so the same agent harness works unchanged. Without it, the agent cannot parse the model's tool calls.

Is Llama 3 still worth using in 2026?

For many tasks, yes. Llama 3.1 8B is small enough to run on a laptop and Llama 3.3 70B is a capable dense model with broad runtime support. Llama 4 is newer and multimodal, but the 3.x line is battle-tested and the quantizations and integrations are everywhere. The right pick depends on your hardware and whether you need vision.