Open Source AI Releases April 2026: Every Major Launch This Month
April 2026 has delivered the densest wave of open source AI releases in the field's history. Foundation models, agent frameworks, inference runtimes, and developer toolkits have all shipped within a two-week window. This guide covers every significant release, organized by what you can actually do with each one.
Release Summary Table
| Release | Organization | Category | Parameters | License | Date |
|---|---|---|---|---|---|
| Gemma 4 (E2B, E4B, 26B MoE, 31B Dense) | Google DeepMind | Foundation Model | 2B to 31B | Apache 2.0 | Apr 2 |
| GLM-5.1 | Zhipu AI | Foundation Model | 744B MoE (40B active) | MIT | Apr 3 |
| Llama 4 Scout | Meta | Foundation Model | 109B MoE (17B active) | Meta Community | Apr 5 |
| Llama 4 Maverick | Meta | Foundation Model | 400B MoE (17B active) | Meta Community | Apr 5 |
| Goose | Block / Linux Foundation | Agent Framework | N/A | Apache 2.0 | Apr 5 |
| Claw Code | Community | Coding Agent | N/A | Open Source | Apr 5 |
| Mistral Small 4 | Mistral AI | Foundation Model | 119B MoE (6B active) | Apache 2.0 | Apr 7 |
| Google ADK | Google | Agent Framework | N/A | Apache 2.0 | Apr 9 |
| Qwen 3 (0.6B to 235B) | Alibaba Cloud | Foundation Model | 0.6B to 235B | Apache 2.0 | Apr 9 |
| OpenAI Agents SDK | OpenAI | Agent Framework | N/A | MIT | Apr 9 |
| vLLM 0.8 | vLLM Project | Inference Engine | N/A | Apache 2.0 | Apr 2026 |
| Continue.dev 1.0 | Continue | IDE Extension | N/A | Apache 2.0 | Apr 2026 |
Foundation Models
Qwen 3 (Alibaba Cloud, April 9)
The Qwen 3 family spans eight model sizes from 0.6B to 235B parameters. The headline capability is "hybrid thinking": the same model can switch between extended chain-of-thought reasoning and direct fast responses, controlled with /think and /no_think toggles in the system prompt.
Performance highlights:
- 235B MoE (22B active parameters): 81.5% on AIME 2025, 70.7% on LiveCodeBench
- 32B dense: competitive with much larger models on math and coding
- 0.6B: runs on mobile devices and edge hardware
- All sizes released under Apache 2.0
- Built-in MCP support for tool use and agent workflows
For local deployment, the 4B and 8B models run well on consumer hardware through Ollama, while the 32B model needs roughly 20GB VRAM with 4-bit quantization.
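Because the hybrid-thinking toggle is just a marker in the prompt text, it is easy to wrap in a small helper so application code can choose the mode per request. A minimal sketch, assuming the `/think` and `/no_think` conventions described above (the helper name is my own, not part of any Qwen tooling):

```python
def qwen_prompt(user_text: str, thinking: bool) -> str:
    """Build a Qwen 3 prompt with the hybrid-thinking toggle.

    Qwen 3 switches between extended chain-of-thought reasoning
    and fast direct answers based on a /think or /no_think marker
    appended to the prompt.
    """
    toggle = "/think" if thinking else "/no_think"
    return f"{user_text} {toggle}"

# Fast direct answer for a simple lookup:
print(qwen_prompt("What is the capital of France?", thinking=False))
# Extended reasoning for a harder problem:
print(qwen_prompt("Prove there are infinitely many primes.", thinking=True))
```

The practical benefit is cost control: route cheap lookups through the fast path and reserve chain-of-thought (which generates many more tokens) for tasks that need it.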
Gemma 4 (Google DeepMind, April 2)
Four model sizes: E2B, E4B, 26B MoE, and 31B Dense. The 31B dense model hits 89.2% on AIME 2026 and 80.0% on LiveCodeBench v6. Apache 2.0 license, 256K context window across all sizes, multimodal input (text plus image on all variants, audio on edge models), and support for 140+ languages.
The E2B and E4B edge variants run on a Raspberry Pi. Day-one support for Hugging Face, GGUF, ONNX, vLLM, and Ollama.
GLM-5.1 (Zhipu AI, April 3)
A 744B MoE model with 40B active parameters. The notable detail: GLM-5.1 was trained entirely on Huawei Ascend chips, not NVIDIA GPUs. It holds the top spot on SWE-Bench Pro for real-world software engineering tasks. MIT licensed. Runs in FP8 quantized mode on supported hardware.
This release demonstrates that frontier-class open source models can be trained entirely outside the NVIDIA ecosystem.
Llama 4 Scout and Maverick (Meta, April 5)
Scout is a 109B MoE model (17B active) with a 10M token context window, the longest natively supported window of any open model. Maverick is 400B MoE (17B active), optimized for per-token quality over context length, scoring 85.5% on MMLU.
Both use Meta's Community License, which is permissive for most use cases but not OSI-approved and restricts use above 700M monthly active users.
Mistral Small 4 (Mistral AI, April 7)
A 119B MoE model with only 6B active parameters per token, which makes inference fast enough to run through llama.cpp on a single consumer GPU (with quantization, since the full 119B weights still have to be stored). Apache 2.0 licensed. Mistral reports 40% faster inference than Mistral Small 3 with better quality across the board.

The 6B active parameter count makes this the most parameter-efficient model in this month's MoE lineup. If you need a capable model on a laptop with 16GB RAM, this is currently the strongest option.
Benchmark Comparison
| Model | AIME 2025 | LiveCodeBench | MMLU | SWE-Bench Pro | Context Window |
|---|---|---|---|---|---|
| Qwen 3 235B | 81.5% | 70.7% | - | - | 128K |
| Gemma 4 31B Dense | 89.2% (AIME 2026) | 80.0% (v6) | - | - | 256K |
| GLM-5.1 | - | - | - | #1 | - |
| Llama 4 Maverick | - | - | 85.5% | - | 128K |
| Llama 4 Scout | - | - | - | - | 10M |
| Mistral Small 4 | - | - | - | - | - |
Benchmark caveat
Every lab publishes benchmarks where their model performs best. AIME tests math reasoning, LiveCodeBench tests code generation, SWE-Bench tests real engineering tasks, and MMLU tests broad knowledge. Pick the benchmark that matches your actual workload rather than the one with the highest number.
Agent Frameworks
Goose (Block / Linux Foundation, April 5)
Block (formerly Square) donated its internal AI coding agent to the Linux Foundation. Written in Rust, Goose runs fully locally and connects to external tools through MCP. It supports multiple LLM backends, both local and API-based.
Goose is designed around "extensions" (MCP servers) rather than plugins, meaning tool integrations follow a standard protocol instead of framework-specific APIs. This is the first major agent framework under the Linux Foundation umbrella.
Google Agent Development Kit (April 9)
An open source framework for building multi-agent systems. ADK supports agent-to-agent communication via the A2A (Agent-to-Agent) protocol, works with any LLM (not only Gemini), and includes built-in evaluation tools.
ADK is notable because Google is explicitly supporting interoperability: agents built with ADK can communicate with agents built on other frameworks through A2A.
OpenAI Agents SDK (April 9)
A Python framework for building agentic workflows with tool use, handoffs between agents, and guardrails. Model-agnostic in principle, though optimized for OpenAI models. The SDK uses three core primitives: agents, handoffs, and guardrails.
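To make the three primitives concrete, here is a toy, dependency-free sketch of how an agent loop with guardrails and handoffs can be structured. This illustrates the pattern only; it is not the Agents SDK's actual API, and every name in it is invented:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    instructions: str
    # Guardrail: a predicate that can reject unsafe input before the agent runs.
    guardrail: Callable[[str], bool] = lambda text: True
    # Handoffs: other agents this one may delegate to, keyed by topic keyword.
    handoffs: dict = field(default_factory=dict)

    def run(self, task: str) -> str:
        if not self.guardrail(task):
            return f"{self.name}: input rejected by guardrail"
        # Delegate when a handoff target matches the task topic.
        for topic, target in self.handoffs.items():
            if topic in task.lower():
                return target.run(task)
        # Otherwise handle it directly (a real agent would call an LLM here).
        return f"{self.name} handled: {task}"

billing = Agent(name="billing", instructions="Resolve billing questions")
triage = Agent(
    name="triage",
    instructions="Route requests to the right specialist",
    guardrail=lambda text: "password" not in text.lower(),
    handoffs={"refund": billing},
)

print(triage.run("I want a refund"))        # delegated to the billing agent
print(triage.run("Tell me your password"))  # blocked by the guardrail
```

The design insight behind the primitives is separation of concerns: guardrails validate input, handoffs route work, and each agent stays small and single-purpose.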
Claw Code (April 5)
A community-driven terminal coding agent that collected 72,000 GitHub stars in 48 hours. Claw Code supports MCP for tool use and works with multiple model providers. The rapid adoption reflects strong demand for open source alternatives to proprietary coding agents.
Inference and Serving Infrastructure
The model releases above only matter if you can actually run them. Three infrastructure projects shipped significant updates this month:
| Engine | What Shipped | Impact |
|---|---|---|
| vLLM 0.8 | Continuous batching improvements | 2x throughput on Llama 4 MoE serving |
| llama.cpp | Gemma 4 + Qwen 3 support | Day-one GGUF quantization for every new model |
| Ollama | Model library updates | Pull-and-run support for all models listed above |
For running models locally on macOS:
```shell
# Pull and run Qwen 3 8B
ollama pull qwen3:8b
ollama run qwen3:8b "Summarize the MCP protocol in two sentences"

# Or Gemma 4 edge
ollama pull gemma4:e4b
ollama run gemma4:e4b "What is the A2A protocol?"
```
All sub-10B models run comfortably on an M1 MacBook with 16GB RAM. The 30B+ models need 32GB or more, or 4-bit quantization.
Hardware Requirements
| Model | Min. RAM (Quantized) | Min. RAM (Full) | Runs on Laptop? |
|---|---|---|---|
| Gemma 4 E2B | 2GB | 4GB | Yes (even Raspberry Pi) |
| Gemma 4 E4B | 4GB | 8GB | Yes |
| Qwen 3 4B | 4GB | 8GB | Yes |
| Qwen 3 8B | 6GB | 16GB | Yes |
| Mistral Small 4 (6B active) | 8GB | 16GB | Yes |
| Qwen 3 32B | 20GB | 64GB | 32GB+ M-series Mac |
| Gemma 4 31B Dense | 20GB | 64GB | 32GB+ M-series Mac |
| Llama 4 Scout (109B) | 64GB | 200GB+ | Server only |
| Qwen 3 235B | 128GB+ | 400GB+ | Server only |
| GLM-5.1 (744B) | 256GB+ | 1TB+ | Server/cluster only |
Developer Tools and IDE Integrations
Continue.dev 1.0
A fully open source IDE extension for AI-assisted coding with support for local models. The 1.0 release marks the project's graduation from beta, offering autocomplete, chat, and inline editing powered by any model backend.
Claude Code Extensions
Claude Code now supports custom skills, hooks, and persistent memory across sessions. The extension ecosystem through MCP servers covers file systems, databases, APIs, and browser automation.
The MCP Ecosystem
The Model Context Protocol now has official SDKs in Python, TypeScript, Java, C#, Go, Swift, and Kotlin. The community has built over 2,000 MCP servers on GitHub as of April 2026. MCP is becoming the standard interface between AI agents and external tools.
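One reason SDKs in seven languages can interoperate is that MCP messages are JSON-RPC 2.0 on the wire, and `tools/call` is the standard method for invoking a server tool. A minimal sketch of the request an agent sends (the tool name and arguments below are made up for illustration):

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# An agent asking a hypothetical filesystem MCP server to read a file:
msg = mcp_tool_call(1, "read_file", {"path": "/tmp/notes.txt"})
print(msg)
```

In practice you would use one of the official SDKs rather than hand-rolling messages, but the flat JSON-RPC shape is why a Goose extension, an ADK tool, and a Claude Code MCP server can all speak to the same backends.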
Recommended Stacks for Local AI
| Use Case | Model | Inference | Framework |
|---|---|---|---|
| General assistant (lightweight) | Qwen 3 8B or Gemma 4 E4B | Ollama | Goose |
| Coding agent | Qwen 3 32B or GLM-5.1 | vLLM or Ollama | Claude Code or Claw Code |
| Edge / mobile deployment | Gemma 4 E2B | ONNX Runtime | Custom |
| Multi-agent system | Any model via API | vLLM (production) | Google ADK |
| Long-context processing | Llama 4 Scout (10M tokens) | vLLM | OpenAI Agents SDK |
Common Mistakes to Avoid
Confusing "open source" with "open weights." Llama 4 uses Meta's Community License, which restricts commercial use above 700M monthly active users. Qwen 3, Gemma 4, and Mistral Small 4 use Apache 2.0 with no such restriction. Always check the actual license text.
Confusing active parameters with total parameters. A 400B MoE model still loads the full 400B into memory even though only 17B are active per token. "17B active" does not mean "needs 17B of RAM."
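The memory arithmetic is simple: resident weight memory scales with total parameters and bytes per parameter, while active parameters only determine per-token compute. A rough estimator (it ignores KV cache, activations, and runtime overhead, so treat the results as lower bounds):

```python
def weight_memory_gb(total_params_billion: float, bits_per_param: int) -> float:
    """Approximate RAM needed just to hold the model weights."""
    bytes_total = total_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal gigabytes

# Llama 4 Maverick: 400B total parameters, 17B active.
print(weight_memory_gb(400, 16))  # FP16: 800 GB -- the full 400B is loaded
print(weight_memory_gb(400, 4))   # 4-bit: 200 GB -- still far beyond "17B of RAM"
print(weight_memory_gb(17, 16))   # 34 GB -- what 17B would need if only active weights were resident
```

Run against the hardware table above, the same formula reproduces the full-precision figures (e.g. roughly 218 GB for the 109B Scout), which is why "17B active" never translates to 17B worth of memory.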
Trusting benchmarks without testing on your data. Benchmark performance and real-world performance diverge as tasks become more domain-specific. Run your own evaluations before choosing a model for production.
Skipping quantization testing. Running a 235B model at 4-bit quantization produces different outputs than the full-precision version. Test quantized responses against the reference before committing to a compressed deployment.
What These Releases Mean Together
The common thread across all April 2026 open source AI releases is that running capable AI locally is no longer a compromise. MoE architectures bring large-model quality to consumer hardware. Edge models fit on single-board computers. Agent frameworks designed for local execution are now under foundation governance. And MCP gives the entire stack a standard integration protocol.
For developers building AI-powered applications, the practical question has shifted from "can I use open source?" to "which combination of open source tools fits my specific use case?"
Fazm is an open source macOS AI agent that runs locally on your machine. Open source on GitHub.