New LLM Releases April 2026: Every Major Model Launch This Month
April 2026 was supposed to be the month GPT-6 landed. It did not. The launch slipped, and yet the month still produced one of the densest stretches of frontier-model releases on record: Anthropic previewed Claude Mythos to a small partner cohort, Google shipped four Gemma 4 variants under Apache 2.0, Zhipu published GLM-5.1 under MIT, Alibaba shipped Qwen 3.6-Plus, Meta shipped Llama 4 Scout and Maverick, and Arcee dropped Trinity at 400B parameters. With the month winding down, here is what actually shipped, what slipped, and how each release stacks up if you are picking a model to build on right now.
Quick Reference: April 2026 LLM Releases
| Model | Company | Release Date | Parameters | Context Window | License | Pricing (per 1M tokens) |
|---|---|---|---|---|---|---|
| GPT-6 | OpenAI | slipped, late Apr/May | Undisclosed | 2M tokens (provisional) | Proprietary | $2.50 in / $12 out (provisional) |
| Claude Mythos | Anthropic | Apr 7 (preview) | Undisclosed | TBA | Proprietary (gated) | $25 in / $125 out |
| Gemma 4 31B | Google | Apr 2 | 31B dense | 256K tokens | Apache 2.0 | Free (open weights) |
| Gemma 4 26B MoE | Google | Apr 2 | 26B MoE | 256K tokens | Apache 2.0 | Free (open weights) |
| Gemma 4 E4B | Google | Apr 2 | ~4B effective | 256K tokens | Apache 2.0 | Free (open weights) |
| Gemma 4 E2B | Google | Apr 2 | ~2B effective | 256K tokens | Apache 2.0 | Free (open weights) |
| GLM-5.1 | Zhipu AI | Early Apr | 744B MoE (40B active) | 200K tokens | MIT | Free (open weights) |
| Qwen 3.6-Plus | Alibaba | Early Apr | Undisclosed | 1M tokens | Open | Free (open weights) |
| Llama 4 Scout | Meta | Apr (rolling) | Undisclosed | 10M tokens | Llama License | Free (open weights) |
| Llama 4 Maverick | Meta | Apr (rolling) | 400B | 1M tokens | Llama License | Free (open weights) |
| Arcee Trinity | Arcee AI | Early Apr | 400B | TBA | Apache 2.0 | Free (open weights) |
GPT-6 ("Spud"): Slipped Past April 14, Still Pending
OpenAI confirmed on April 7 that GPT-6 (internally codenamed "Spud") would ship globally on April 14, 2026. The date came and went. As of April 28, there has been no public launch, no API rollout, and no developer keys. OpenAI has not published an official reason. The current line from Altman's recent talks is "a few more weeks," and partners in the early-access program have been told to expect a quiet ramp rather than a single launch event.
What we still know about Spud, sourced from partner briefings in early April:
- Pre-training was reportedly complete on March 17, with post-training wrapping in early April. The slip is widely believed to be a safety/alignment review, not a capabilities issue.
- Claimed performance: roughly +40% over GPT-5.4 across coding, reasoning, and agent tasks. HumanEval pushing past 95%, MATH around 85%, agent task completion climbing from 62% to ~87%. None of these numbers have been independently verified.
- 2M token context window. Double GPT-5.4 and Claude Opus 4.6, roughly 1.5 million words in a single context. Useful in theory; recall accuracy at 2M is the open question.
- Dual-tier reasoning. A two-tier inference framework where System-1 handles rapid responses and System-2 performs internal logic verification and multi-step deduction. OpenAI partner decks claim this pushes hallucination rates below 0.1%, though that figure is unverified externally.
- Super-app integration. GPT-6 is the engine designed to merge ChatGPT, Codex, and the Atlas browser into a single desktop application: one agent that browses, codes, and converses without breaking context.
- Pricing held flat. $2.50 input / $12 output per million tokens, materially the same as GPT-5.4. This is provisional until OpenAI publishes the official tier sheet.
What to do in the meantime: anything you build on the prerelease GPT-6 numbers is speculation. If your roadmap depends on a launched, priced, rate-limit-known frontier model right now, Claude Opus 4.6 and GLM-5.1 are the two safest bets. We will update this section the day Spud actually ships.
Claude Mythos: Anthropic's Gated Preview
Anthropic announced Claude Mythos Preview on April 7, available exclusively through Project Glasswing to roughly 50 partner organizations. The focus is on cybersecurity vulnerability detection, reasoning, and coding.
Mythos is described as a "step change" above Claude Opus 4.6, which has been the top-performing model on many benchmarks since its February 2026 release. Preview pricing is steep at $25/$125 per million input/output tokens, reflecting the gated early-access nature of the program.
No public release date has been announced. For most developers, Claude Opus 4.6 and Sonnet 4.6 remain the current Anthropic options.
Google Gemma 4: Open-Source Gets Serious
Google released the Gemma 4 family on April 2 under Apache 2.0, delivering four models purpose-built for different deployment scenarios:
- Gemma 4 31B Dense - the flagship, with benchmark scores that Google claims outperform models 20 times its size
- Gemma 4 26B MoE - mixture-of-experts variant for efficient inference
- Gemma 4 E4B - consumer GPU and edge deployment
- Gemma 4 E2B - smartphones and Raspberry Pi devices
All four models support 256K context windows, native vision and audio processing, and fluency in over 140 languages. They are purpose-built for advanced reasoning and agentic workflows.
With over 400 million cumulative Gemma downloads, this release under Apache 2.0 (upgraded from earlier, more restrictive licenses) represents a strategic shift in Google's open model approach.
Zhipu GLM-5.1: China's MIT-Licensed Giant
Zhipu AI released GLM-5.1 under the MIT license, a 744-billion-parameter mixture-of-experts model with 40 billion parameters active per forward pass and a 200K context window.
The headline claim: on SWE-Bench Pro, GLM-5.1 reportedly beat both Claude Opus 4.6 and GPT-5.4. Alongside GLM-5.1, Zhipu also released GLM-5V-Turbo, a multimodal variant optimized for coding tasks.
The MIT license makes this one of the most permissive releases of a frontier-scale model to date. No usage restrictions, no registration required.
Alibaba Qwen 3.6-Plus: 1M Context for Agents
Alibaba's Qwen 3.6-Plus targets agentic coding workflows with a 1 million token context window. The model is designed for tasks that require understanding and modifying large codebases in a single pass.
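A "whole codebase in a single pass" workflow still needs a packing step on your side. The sketch below is a minimal, hypothetical version: it walks a repository, concatenates source files with path headers, and stops at a rough token budget using the common ~4-characters-per-token heuristic (the function name, file-extension filter, and heuristic are assumptions, not anything Qwen-specific).

```python
import os

def pack_codebase(root: str, max_tokens: int = 1_000_000) -> str:
    """Concatenate source files into one prompt string, stopping
    before a rough token budget (chars / 4 heuristic) is exceeded."""
    parts, used = [], 0
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith((".py", ".ts", ".go", ".md")):
                continue  # skip binaries, lockfiles, etc.
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            est = len(text) // 4 + 20  # ~4 chars/token, plus header overhead
            if used + est > max_tokens:
                return "\n".join(parts)  # budget hit: stop packing
            parts.append(f"### FILE: {path}\n{text}")
            used += est
    return "\n".join(parts)
```

For a real deployment you would use the provider's own tokenizer instead of the chars/4 estimate, since the error compounds at 1M-token scale.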
This positions Qwen 3.6-Plus as a direct competitor to Claude Opus 4.6 and GPT-5.4 for the growing market of AI-powered coding agents.
Meta Llama 4: The 10M Token Context Window
Meta's Llama 4 family includes two headline models:
- Llama 4 Scout with a 10 million token context window, the largest of any model released this month
- Llama 4 Maverick with 400 billion parameters, 1 million token context, and native multimodal capabilities
Both models use a mixture-of-experts architecture and are natively multimodal from training (not bolted-on vision after the fact). Meta is using controlled licensing agreements for Llama 4, distinguishing its approach from fully permissive open-source releases.
Arcee Trinity: 400B Under Apache 2.0
Arcee AI released Trinity, a 400 billion parameter model under Apache 2.0. Trinity is designed for enterprise use cases where teams need a large, capable model they can run and modify without licensing restrictions.
Which Model Should You Actually Pick This Month?
With Spud delayed and the open-weight side of the field crowded, "which one do I use today" is the question that matters more than the headline benchmark numbers. Practical picks by use case, as of April 28:
Long-context coding (whole-codebase analysis, multi-file refactors). Claude Opus 4.6 if budget allows. GLM-5.1 if you want frontier-level capability at open-weight cost; the reported SWE-Bench Pro numbers are strong, and the MIT license removes most procurement headaches. Qwen 3.6-Plus is a strong tool-calling alternative if you want a 1M context with a different token bias.
Local model on a Mac. Gemma 4 31B Dense for any machine with 64GB+ unified memory (M3 Max, M4 Pro/Max). Drop to Gemma 4 26B MoE on 36-48GB. E4B and E2B are the picks for 16-24GB MacBook Airs; both run at usable speeds via mlx and llama.cpp. Llama 4 Scout is the one to load when you need the 10M context for a single document and can absorb the disk footprint.
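The memory tiers above follow from simple arithmetic: parameter count times bits per weight, plus runtime overhead. Here is a back-of-envelope estimator (the 4-bit default and 20% overhead factor are assumptions; real usage varies with context length and backend):

```python
def est_memory_gb(params_b: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough resident-memory estimate for running a model locally:
    params * bits-per-weight, plus ~20% for KV cache and runtime.
    Heuristic only; not a substitute for measuring your backend."""
    bytes_total = params_b * 1e9 * bits / 8
    return round(bytes_total * overhead / 1e9, 1)

# Gemma 4 31B at 4-bit quantization -> ~18.6 GB, comfortable on 64GB
# Llama 4 Maverick at 400B, 4-bit    -> ~240 GB, not a laptop model
```

This is why the 31B dense model is the 64GB pick while the E2B/E4B variants are the only realistic options at 16-24GB.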
Agentic browser and desktop work. Tool-call reliability and latency matter more than raw IQ here. Claude Opus 4.6 and GPT-5.4 still lead on reliability for production agents; GLM-5.1 is closing fast and is now competitive enough that it is worth A/B-ing on your real workflows. The harness matters more than the model. Fazm's custom API endpoint accepts any of these, so you can swap GPT-5.4 for GLM-5.1 mid-task and watch the failure rate change in place.
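A/B-ing models on real workflows is easiest when the backend is just configuration. A minimal sketch, assuming OpenAI-compatible chat endpoints (the base URLs and model identifiers below are illustrative placeholders, not confirmed API names):

```python
# Hypothetical endpoints and model names, for illustration only.
BACKENDS = {
    "claude-opus-4.6": {"base_url": "https://api.anthropic.com/v1", "model": "claude-opus-4-6"},
    "gpt-5.4":         {"base_url": "https://api.openai.com/v1",    "model": "gpt-5.4"},
    "glm-5.1-local":   {"base_url": "http://localhost:8000/v1",     "model": "glm-5.1"},
}

def build_request(backend: str, prompt: str) -> dict:
    """Assemble a chat-completions request for the chosen backend,
    so swapping models mid-task is a one-key change."""
    cfg = BACKENDS[backend]
    return {
        "url": f"{cfg['base_url']}/chat/completions",
        "json": {
            "model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Keeping the harness constant and varying only this dictionary is what makes the failure-rate comparison meaningful.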
Privacy-sensitive workflows (legal, medical, internal HR data). Local Gemma 4 31B Dense via Ollama or LM Studio, or Llama 4 Maverick if you have workstation-class hardware to host it. The cost stops scaling with usage the moment you stop paying per million tokens, which makes large-volume document review and inbox triage economically reasonable for the first time.
Frontier reasoning, "I just need the smartest thing in the room." Claude Opus 4.6 today, with Claude Mythos available only if you have an existing Anthropic relationship. Mythos pricing at $25/$125 is steep enough that it is not the everyday-default model.
ChatGPT-Atlas-style "browse + code + converse" super-app. Wait for Spud to actually ship. Building on prerelease pricing or rate limits is a recipe for re-architecting the moment OpenAI publishes the real numbers.
What This Means for Developers
The open-source gap is closing fast
Three months ago, proprietary models held a clear lead on reasoning and coding benchmarks. In April 2026, GLM-5.1 claims to beat the best proprietary models on SWE-Bench Pro, and Gemma 4's 31B dense model reportedly outperforms models 20x its size. The cost advantage of running open weights on your own infrastructure keeps growing.
Context windows are no longer a differentiator
When the smallest context window in this list is 200K tokens and the largest is 10M, context length alone is not a selling point. The question shifts to how well models actually use long contexts. Retrieval accuracy at 1M+ tokens matters more than the raw number.
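One cheap way to measure "how well a model actually uses long context" is a needle-in-a-haystack probe: bury one fact at a random depth in filler text and ask the model to retrieve it. A minimal prompt builder (the filler sentence and question are arbitrary choices for illustration):

```python
import random

def make_needle_prompt(needle: str, filler_tokens: int, seed: int = 0) -> str:
    """Bury one fact at a random depth in filler text, then ask the
    model to retrieve it. Sweep filler_tokens and depth to map recall."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    filler = ["The quarterly audit of routine logs continued as planned."] * (filler_tokens // 8)
    pos = rng.randrange(len(filler))
    filler.insert(pos, needle)
    return " ".join(filler) + "\n\nQuestion: what is the passcode mentioned above?"

prompt = make_needle_prompt("The passcode is 7341.", filler_tokens=100_000)
```

Run the sweep at 100K, 500K, and 1M tokens and plot recall; the curve tells you far more than the advertised window size.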
Agent capabilities are the new battleground
Every release this month emphasizes agent workflows: GPT-6's super-app integration, Gemma 4's agentic design, Qwen 3.6-Plus's coding agent focus. If you are building AI products, agent reliability (tool calling accuracy, multi-step planning, error recovery) is now the primary differentiator between models.
The catch is that none of these models do anything useful sitting in a chat window. They need an agent loop, a tool layer, and a surface to actually act on. For anything that touches a real desktop (opening a browser tab, editing a Google Doc, filling a CRM, moving files between apps), that layer has to live on the machine. Browser-only agents and cloud sandboxes cover a small slice of what most small businesses actually do in a day.
Pricing compression continues
GPT-6's provisional pricing holds flat despite a claimed 40%+ capability jump (input $2.50, output $12). Open-weight models are free to run; the only cost is the GPU or the Mac you already own. The cost of intelligence per token continues to fall, making previously expensive workflows (whole-codebase analysis, document processing at scale, batch inbox triage) economically viable for smaller teams that could not have afforded them last year.
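The per-workflow math is straightforward. At per-million-token prices, a one-off whole-codebase review is cheaper than most people expect (the 1.5M-in / 50K-out run below is a made-up example workload):

```python
def workflow_cost_usd(in_tokens: int, out_tokens: int,
                      in_price: float, out_price: float) -> float:
    """Cost of one run given per-million-token input/output prices."""
    return round(in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price, 2)

# Example: 1.5M input tokens + 50K output at GPT-6's provisional
# $2.50 / $12 rates -> $4.35 for an entire codebase review.
cost = workflow_cost_usd(1_500_000, 50_000, 2.50, 12.0)
```

Multiply by runs per day to decide whether an API model or local open weights wins for your volume.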
Running Any Of These Models On Your Mac
If you are reading this list trying to pick which model to actually use for day-to-day work, the blocker is almost never the model. It is the harness. A 1M token context window and 95% HumanEval do not matter if the thing cannot open your browser, read your screen, and click the right button.
Fazm is the macOS-side answer to that harness problem. It is a local computer-use agent that drives your actual Mac through the accessibility APIs (not screenshots), and it is model-agnostic:
- Point it at any of the models in this post. Fazm supports custom API endpoints, so you can route through the GPT-6 API once it ships, Claude Opus 4.6, a local Gemma 4 31B, or a GLM-5.1 instance hosted behind a corporate proxy. Same agent, different backend.
- It works on the apps you already use. Browser, Google Docs, Sheets, Calendar, your CRM, your invoicing tool, Mail, the Finder. Not a headless Chromium in a data center.
- Voice-first. You describe what you want and it does it. No prompt engineering chat loop.
- Fully open source, runs locally. Source is at github.com/m13v/fazm. Your screen and mic never leave your machine unless you explicitly point it at a hosted model.
It runs on macOS 14 or newer. Native Swift/SwiftUI, no terminal required.
Because it reads the accessibility tree directly instead of screenshots, it skips the vision round-trip entirely, which makes it roughly 10x faster than screenshot-based agents.
Why the harness matters more than the model
Every new benchmark in the tables above is measured in a sanitized sandbox. The moment you point a model at a messy real desktop, the failure modes change. Latency shifts to accessibility-tree queries and app focus transitions, not token generation. Errors cluster around stale UI state, modal dialogs, and half-loaded pages, not reasoning. A model that wins SWE-Bench Pro can still fail to close a Zoom notification blocking the "Send" button.
The practical bottleneck for small-business automation right now is whichever agent loop plus accessibility plumbing handles those failure modes most gracefully. That is what Fazm optimizes for. The model is a swappable component.
Related reading
- Accessibility APIs vs screenshots for computer control
- AI agents vs copilots: when each wins
- 1M token context windows in practice
- Does a 3-tool-call problem still matter in 2026?
Looking Ahead: What Is Still Pending
With two days left in April, the open items:
- GPT-6 public launch. Slipped past the original April 14 date; current guidance is "a few more weeks," likely a quiet API ramp rather than a launch event.
- Grok 5 from xAI is still expected this quarter; no date but late April or May is realistic.
- Claude Mythos public availability timeline unknown. Anthropic has not committed to a wider release window beyond the Project Glasswing partner cohort.
- The 1M token context window beta for Claude Sonnet 4.5 and Sonnet 4 retires on April 30. If you are running production load against it, plan the migration this week.
Even with Spud absent, April 2026 is a turning point. The sheer volume of high-quality open-weight models, plus a credible open-weight challenger to the proprietary leaders on coding benchmarks (GLM-5.1), means developers have more frontier-tier options at lower cost than at any prior point. The best time to evaluate which model fits your use case is now, not when GPT-6 finally drops.
If you want to evaluate them actually doing things on your Mac (not in a chat window), grab Fazm for free. Plug in whichever of these models you are curious about and point it at a real workflow you would normally do by hand: invoicing, CRM updates, inbox triage, scheduling. That is the honest benchmark.