New LLM Releases April 2026: Every Major Model Launch This Month
April 2026 was supposed to be the month GPT-6 landed. It did not. The launch slipped, and yet the month still produced one of the densest stretches of frontier-model releases on record: Anthropic previewed Claude Mythos to a small partner cohort, Google shipped four Gemma 4 variants under Apache 2.0, Zhipu published GLM-5.1 under MIT, Alibaba shipped Qwen 3.6-Plus, Meta shipped Llama 4 Scout and Maverick, and Arcee dropped Trinity at 400B parameters. With the month winding down, here is what actually shipped, what slipped, and how each release stacks up if you are picking a model to build on right now.
Quick Reference: April 2026 LLM Releases
| Model | Company | Release Date | Parameters | Context Window | License | Pricing (per 1M tokens) |
|---|---|---|---|---|---|---|
| GPT-6 | OpenAI | slipped, late Apr/May | Undisclosed | 2M tokens (provisional) | Proprietary | $2.50 in / $12 out (provisional) |
| Claude Mythos | Anthropic | Apr 7 (preview) | Undisclosed | TBA | Proprietary (gated) | $25 in / $125 out |
| Gemma 4 31B | Google | Apr 2 | 31B dense | 256K tokens | Apache 2.0 | Free (open weights) |
| Gemma 4 26B MoE | Google | Apr 2 | 26B MoE | 256K tokens | Apache 2.0 | Free (open weights) |
| Gemma 4 E4B | Google | Apr 2 | ~4B effective | 256K tokens | Apache 2.0 | Free (open weights) |
| Gemma 4 E2B | Google | Apr 2 | ~2B effective | 256K tokens | Apache 2.0 | Free (open weights) |
| GLM-5.1 | Zhipu AI | Early Apr | 744B MoE (40B active) | 200K tokens | MIT | Free (open weights) |
| Qwen 3.6-Plus | Alibaba | Early Apr | Undisclosed | 1M tokens | Open | Free (open weights) |
| Llama 4 Scout | Meta | Apr (rolling) | Undisclosed | 10M tokens | Llama License | Free (open weights) |
| Llama 4 Maverick | Meta | Apr (rolling) | 400B | 1M tokens | Llama License | Free (open weights) |
| Arcee Trinity | Arcee AI | Early Apr | 400B | TBA | Apache 2.0 | Free (open weights) |
GPT-6 ("Spud"): Slipped Past April 14, Still Pending
OpenAI confirmed on April 7 that GPT-6 (internally codenamed "Spud") would ship globally on April 14, 2026. The date came and went. As of April 28, there has been no public launch, no API rollout, and no developer keys. OpenAI has not published an official reason. The current line from Altman's recent talks is "a few more weeks," and partners in the early-access program have been told to expect a quiet ramp rather than a single launch event.
What we still know about Spud, sourced from partner briefings in early April:
- Pre-training was reportedly complete on March 17, with post-training wrapping in early April. The slip is widely believed to be a safety/alignment review, not a capabilities issue.
- Claimed performance: roughly +40% over GPT-5.4 across coding, reasoning, and agent tasks. HumanEval pushing past 95%, MATH around 85%, agent task completion climbing from 62% to ~87%. None of these numbers have been independently verified.
- 2M token context window. Double GPT-5.4 and Claude Opus 4.6, roughly 1.5 million words in a single context. Useful in theory; recall accuracy at 2M is the open question.
- Dual-tier reasoning. A two-tier inference framework where System-1 handles rapid responses and System-2 performs internal logic verification and multi-step deduction. OpenAI partner decks claim this pushes hallucination rates below 0.1%, though that figure is unverified externally.
- Super-app integration. GPT-6 is the engine designed to merge ChatGPT, Codex, and the Atlas browser into a single desktop application: one agent that browses, codes, and converses without breaking context.
- Pricing held flat. $2.50 input / $12 output per million tokens, materially the same as GPT-5.4. This is provisional until OpenAI publishes the official tier sheet.
What to do in the meantime: anything you build on the prerelease GPT-6 numbers is speculation. If your roadmap depends on a launched, priced, rate-limit-known frontier model right now, Claude Opus 4.6 and GLM-5.1 are the two safest bets. We will update this section the day Spud actually ships.
Claude Mythos: Anthropic's Gated Preview
Anthropic announced Claude Mythos Preview on April 7, available exclusively through Project Glasswing to roughly 50 partner organizations. The focus is on cybersecurity vulnerability detection, reasoning, and coding.
Mythos is described as a "step change" above Claude Opus 4.6, which has been the top-performing model on many benchmarks since its February 2026 release. Preview pricing is steep at $25/$125 per million input/output tokens, reflecting the gated early-access nature of the program.
No public release date has been announced. For most developers, Claude Opus 4.6 and Sonnet 4.6 remain the current Anthropic options.
Google Gemma 4: Open-Source Gets Serious
Google released the Gemma 4 family on April 2 under Apache 2.0, delivering four models purpose-built for different deployment scenarios:
- Gemma 4 31B Dense - the flagship, with benchmark scores that Google claims outperform models 20 times its size
- Gemma 4 26B MoE - mixture-of-experts variant for efficient inference
- Gemma 4 E4B - consumer GPU and edge deployment
- Gemma 4 E2B - smartphones and Raspberry Pi devices
All four models support 256K context windows, native vision and audio processing, and fluency in over 140 languages. They are purpose-built for advanced reasoning and agentic workflows.
With over 400 million cumulative Gemma downloads, this release under Apache 2.0 (upgraded from earlier, more restrictive licenses) represents a strategic shift in Google's open model approach.
Zhipu GLM-5.1: China's MIT-Licensed Giant
Zhipu AI released GLM-5.1 under the MIT license, a 744-billion-parameter mixture-of-experts model with 40 billion parameters active per forward pass and a 200K context window.
The headline claim: on SWE-Bench Pro, GLM-5.1 reportedly beat both Claude Opus 4.6 and GPT-5.4. Alongside GLM-5.1, Zhipu also released GLM-5V-Turbo, a multimodal variant optimized for coding tasks.
The MIT license makes this one of the most permissive releases of a frontier-scale model to date. No usage restrictions, no registration required.
Alibaba Qwen 3.6-Plus: 1M Context for Agents
Alibaba's Qwen 3.6-Plus targets agentic coding workflows with a 1 million token context window. The model is designed for tasks that require understanding and modifying large codebases in a single pass.
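A "whole codebase in a single pass" workflow still needs a packing step on your side. The sketch below is a minimal, hypothetical version: it walks a repository, concatenates source files with path headers, and stops at a rough token budget using the common ~4-characters-per-token heuristic (the function name, file-extension filter, and heuristic are assumptions, not anything Qwen-specific).

```python
import os

def pack_codebase(root: str, max_tokens: int = 1_000_000) -> str:
    """Concatenate source files into one prompt string, stopping
    before a rough token budget (chars / 4 heuristic) is exceeded."""
    parts, used = [], 0
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith((".py", ".ts", ".go", ".md")):
                continue  # skip binaries, lockfiles, etc.
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            est = len(text) // 4 + 20  # ~4 chars/token, plus header overhead
            if used + est > max_tokens:
                return "\n".join(parts)  # budget hit: stop packing
            parts.append(f"### FILE: {path}\n{text}")
            used += est
    return "\n".join(parts)
```

For a real deployment you would use the provider's own tokenizer instead of the chars/4 estimate, since the error compounds at 1M-token scale.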
This positions Qwen 3.6-Plus as a direct competitor to Claude Opus 4.6 and GPT-5.4 for the growing market of AI-powered coding agents.
Meta Llama 4: The 10M Token Context Window
Meta's Llama 4 family includes two headline models:
- Llama 4 Scout with a 10 million token context window, the largest of any model released this month
- Llama 4 Maverick with 400 billion parameters, 1 million token context, and native multimodal capabilities
Both models use a mixture-of-experts architecture and are natively multimodal from training (not bolted-on vision after the fact). Meta is using controlled licensing agreements for Llama 4, distinguishing its approach from fully permissive open-source releases.
Arcee Trinity: 400B Under Apache 2.0
Arcee AI released Trinity, a 400 billion parameter model under Apache 2.0. Trinity is designed for enterprise use cases where teams need a large, capable model they can run and modify without licensing restrictions.
Which Model Should You Actually Pick This Month?
With Spud delayed and the open-weight side of the field crowded, "which one do I use today" is the question that matters more than the headline benchmark numbers. Practical picks by use case, as of April 28:
Long-context coding (whole-codebase analysis, multi-file refactors). Claude Opus 4.6 if budget allows. GLM-5.1 if you want frontier-level capability at open-weight cost; the reported SWE-Bench Pro numbers are strong, and the MIT license removes most procurement headaches. Qwen 3.6-Plus is a strong tool-calling alternative if you want a 1M context with a different token bias.
Local model on a Mac. Gemma 4 31B Dense for any machine with 64GB+ unified memory (M3 Max, M4 Pro/Max). Drop to Gemma 4 26B MoE on 36-48GB. E4B and E2B are the picks for 16-24GB MacBook Airs; both run at usable speeds via mlx and llama.cpp. Llama 4 Scout is the one to load when you need the 10M context for a single document and can absorb the disk footprint.
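The memory tiers above follow from simple arithmetic: parameter count times bits per weight, plus runtime overhead. Here is a back-of-envelope estimator (the 4-bit default and 20% overhead factor are assumptions; real usage varies with context length and backend):

```python
def est_memory_gb(params_b: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough resident-memory estimate for running a model locally:
    params * bits-per-weight, plus ~20% for KV cache and runtime.
    Heuristic only; not a substitute for measuring your backend."""
    bytes_total = params_b * 1e9 * bits / 8
    return round(bytes_total * overhead / 1e9, 1)

# Gemma 4 31B at 4-bit quantization -> ~18.6 GB, comfortable on 64GB
# Llama 4 Maverick at 400B, 4-bit    -> ~240 GB, not a laptop model
```

This is why the 31B dense model is the 64GB pick while the E2B/E4B variants are the only realistic options at 16-24GB.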
Agentic browser and desktop work. Tool-call reliability and latency matter more than raw IQ here. Claude Opus 4.6 and GPT-5.4 still lead on reliability for production agents; GLM-5.1 is closing fast and is now competitive enough that it is worth A/B-ing on your real workflows. The harness matters more than the model. Fazm's custom API endpoint accepts any of these, so you can swap GPT-5.4 for GLM-5.1 mid-task and watch the failure rate change in place.
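A/B-ing models on real workflows is easiest when the backend is just configuration. A minimal sketch, assuming OpenAI-compatible chat endpoints (the base URLs and model identifiers below are illustrative placeholders, not confirmed API names):

```python
# Hypothetical endpoints and model names, for illustration only.
BACKENDS = {
    "claude-opus-4.6": {"base_url": "https://api.anthropic.com/v1", "model": "claude-opus-4-6"},
    "gpt-5.4":         {"base_url": "https://api.openai.com/v1",    "model": "gpt-5.4"},
    "glm-5.1-local":   {"base_url": "http://localhost:8000/v1",     "model": "glm-5.1"},
}

def build_request(backend: str, prompt: str) -> dict:
    """Assemble a chat-completions request for the chosen backend,
    so swapping models mid-task is a one-key change."""
    cfg = BACKENDS[backend]
    return {
        "url": f"{cfg['base_url']}/chat/completions",
        "json": {
            "model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Keeping the harness constant and varying only this dictionary is what makes the failure-rate comparison meaningful.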
Privacy-sensitive workflows (legal, medical, internal HR data). Local Gemma 4 31B Dense via Ollama or LM Studio, or Llama 4 Maverick if you have workstation-class hardware to host it. The cost stops scaling with usage the moment you stop paying per million tokens, which makes large-volume document review and inbox triage economically reasonable for the first time.
Frontier reasoning, "I just need the smartest thing in the room." Claude Opus 4.6 today, with Claude Mythos available only if you have an existing Anthropic relationship. Mythos pricing at $25/$125 is steep enough that it is not the everyday-default model.
ChatGPT-Atlas-style "browse + code + converse" super-app. Wait for Spud to actually ship. Building on prerelease pricing or rate limits is a recipe for re-architecting the moment OpenAI publishes the real numbers.
What This Means for Developers
The open-source gap is closing fast
Three months ago, proprietary models held a clear lead on reasoning and coding benchmarks. In April 2026, GLM-5.1 claims to beat the best proprietary models on SWE-Bench Pro, and Gemma 4's 31B dense model reportedly outperforms models 20x its size. The cost advantage of running open weights on your own infrastructure keeps growing.
Context windows are no longer a differentiator
When the smallest context window in this list is 200K tokens and the largest is 10M, context length alone is not a selling point. The question shifts to how well models actually use long contexts. Retrieval accuracy at 1M+ tokens matters more than the raw number.
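One cheap way to measure "how well a model actually uses long context" is a needle-in-a-haystack probe: bury one fact at a random depth in filler text and ask the model to retrieve it. A minimal prompt builder (the filler sentence and question are arbitrary choices for illustration):

```python
import random

def make_needle_prompt(needle: str, filler_tokens: int, seed: int = 0) -> str:
    """Bury one fact at a random depth in filler text, then ask the
    model to retrieve it. Sweep filler_tokens and depth to map recall."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    filler = ["The quarterly audit of routine logs continued as planned."] * (filler_tokens // 8)
    pos = rng.randrange(len(filler))
    filler.insert(pos, needle)
    return " ".join(filler) + "\n\nQuestion: what is the passcode mentioned above?"

prompt = make_needle_prompt("The passcode is 7341.", filler_tokens=100_000)
```

Run the sweep at 100K, 500K, and 1M tokens and plot recall; the curve tells you far more than the advertised window size.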
Agent capabilities are the new battleground
Every release this month emphasizes agent workflows: GPT-6's super-app integration, Gemma 4's agentic design, Qwen 3.6-Plus's coding agent focus. If you are building AI products, agent reliability (tool calling accuracy, multi-step planning, error recovery) is now the primary differentiator between models.
The catch is that none of these models do anything useful sitting in a chat window. They need an agent loop, a tool layer, and a surface to actually act on. For anything that touches a real desktop (opening a browser tab, editing a Google Doc, filling a CRM, moving files between apps), that layer has to live on the machine. Browser-only agents and cloud sandboxes cover a small slice of what most small businesses actually do in a day.
Pricing compression continues
GPT-6's provisional pricing holds flat despite a claimed 40%+ capability jump (input $2.50, output $12). Open-weight models are free to run; the only cost is the GPU or the Mac you already own. The cost of intelligence per token continues to fall, making previously expensive workflows (whole-codebase analysis, document processing at scale, batch inbox triage) economically viable for smaller teams that could not have afforded them last year.
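The per-workflow math is straightforward. At per-million-token prices, a one-off whole-codebase review is cheaper than most people expect (the 1.5M-in / 50K-out run below is a made-up example workload):

```python
def workflow_cost_usd(in_tokens: int, out_tokens: int,
                      in_price: float, out_price: float) -> float:
    """Cost of one run given per-million-token input/output prices."""
    return round(in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price, 2)

# Example: 1.5M input tokens + 50K output at GPT-6's provisional
# $2.50 / $12 rates -> $4.35 for an entire codebase review.
cost = workflow_cost_usd(1_500_000, 50_000, 2.50, 12.0)
```

Multiply by runs per day to decide whether an API model or local open weights wins for your volume.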
Running Any Of These Models On Your Mac
If you are reading this list trying to pick which model to actually use for day-to-day work, the blocker is almost never the model. It is the harness. A 1M token context window and 95% HumanEval do not matter if the thing cannot open your browser, read your screen, and click the right button.
Fazm is the macOS-side answer to that harness problem. It is a local computer-use agent that drives your actual Mac through the accessibility APIs (not screenshots), and it is model-agnostic:
- Point it at any of the models in this post. Fazm supports custom API endpoints, so you can route through the GPT-6 API once it ships, Claude Opus 4.6, a local Gemma 4 31B, or a GLM-5.1 instance hosted behind a corporate proxy. Same agent, different backend.
- It works on the apps you already use. Browser, Google Docs, Sheets, Calendar, your CRM, your invoicing tool, Mail, the Finder. Not a headless Chromium in a data center.
- Voice-first. You describe what you want and it does it. No prompt engineering chat loop.
- Fully open source, runs locally. Source is at github.com/m13v/fazm. Your screen and mic never leave your machine unless you explicitly point it at a hosted model.
It runs on macOS 14 or newer. Native Swift/SwiftUI, no terminal required.
Because it reads the accessibility tree directly instead of screenshots, it skips the vision round-trip entirely, which makes it roughly 10x faster than screenshot-based agents.
Why the harness matters more than the model
Every new benchmark in the tables above is measured in a sanitized sandbox. The moment you point a model at a messy real desktop, the failure modes change. Latency shifts to accessibility-tree queries and app focus transitions, not token generation. Errors cluster around stale UI state, modal dialogs, and half-loaded pages, not reasoning. A model that wins SWE-Bench Pro can still fail to close a Zoom notification blocking the "Send" button.
The practical bottleneck for small-business automation right now is whichever agent loop plus accessibility plumbing handles those failure modes most gracefully. That is what Fazm optimizes for. The model is a swappable component.
Related reading
- Accessibility APIs vs screenshots for computer control
- AI agents vs copilots: when each wins
- 1M token context windows in practice
- Does a 3-tool-call problem still matter in 2026?
Looking Ahead: What Is Still Pending
With two days left in April, the open items:
- GPT-6 public launch. Slipped past the original April 14 date; current guidance is "a few more weeks," likely a quiet API ramp rather than a launch event.
- Grok 5 from xAI is still expected this quarter; no date but late April or May is realistic.
- Claude Mythos public availability timeline unknown. Anthropic has not committed to a wider release window beyond the Project Glasswing partner cohort.
- The 1M token context window beta for Claude Sonnet 4.5 and Sonnet 4 retires on April 30. If you are running production load against it, plan the migration this week.
Even with Spud absent, April 2026 is a turning point. The sheer volume of high-quality open-weight models, plus a credible open-weight challenger to the proprietary leaders on coding benchmarks (GLM-5.1), means developers have more frontier-tier options at lower cost than at any prior point. The best time to evaluate which model fits your use case is now, not when GPT-6 finally drops.
If you want to evaluate them actually doing things on your Mac (not in a chat window), grab Fazm for free. Plug in whichever of these models you are curious about and point it at a real workflow you would normally do by hand: invoicing, CRM updates, inbox triage, scheduling. That is the honest benchmark.