Open Source AI Projects Releases in April 2026: The Complete Tracker

Matthew Diakonov · 14 min read

April 2026 has been the most active month for open source AI releases since the field took off. Models, agent frameworks, developer tooling, and inference engines are all shipping at the same time, from labs of every size and from independent contributors. This post tracks every significant open source AI project release this month, organized by category and updated as new launches land.

Monthly Overview

The releases this month fall into four buckets: foundation models, agent frameworks, inference and serving infrastructure, and developer tools. Several of these overlap (a new model ships alongside inference optimizations for it), so we have grouped them by primary function.

| Category | Key Releases | Trend |
|---|---|---|
| Foundation Models | Qwen 3, Gemma 4, GLM-5.1, Llama 4 Scout/Maverick, Mistral Small 4 | MoE architectures dominate; Apache 2.0 is the new default |
| Agent Frameworks | Goose (Linux Foundation), ADK (Google), OpenAI Agents SDK, Claw Code | Local-first execution, MCP protocol adoption |
| Inference Engines | vLLM 0.8, llama.cpp (Gemma 4 day-one), Ollama updates | FP8/FP4 quantization, 10M+ context serving |
| Developer Tools | Claude Code extensions, Cursor agent mode updates, Continue.dev 1.0 | IDE-native agent loops with local model backends |

Timeline of April 2026 releases: Gemma 4 (Apr 2), GLM-5.1 (Apr 3), Llama 4 (Apr 5), Goose (Apr 5), Mistral Small 4 (Apr 7), Google ADK (Apr 9), Qwen 3 (Apr 9). More releases are expected through April 30.

Foundation Models

Qwen 3 (Alibaba Cloud)

Alibaba released the Qwen 3 family on April 9, spanning eight model sizes from 0.6B to 235B. The headline feature is "hybrid thinking," a mode that lets the same model switch between extended chain-of-thought reasoning and fast direct responses. You control this with a /think and /no_think toggle in the system prompt.
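The soft switch rides along in the prompt text itself. As a minimal sketch (the `/think` and `/no_think` tags are the ones Qwen documents; the `with_thinking` helper is hypothetical):

```python
# Toggle Qwen 3's hybrid thinking mode by appending the soft-switch
# tag to a user turn. The tags come from the Qwen 3 release; the
# helper function itself is just an illustration.

def with_thinking(prompt: str, think: bool) -> str:
    """Append Qwen 3's /think or /no_think soft switch to a prompt."""
    tag = "/think" if think else "/no_think"
    return f"{prompt} {tag}"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",
     "content": with_thinking("Solve 17 * 23 step by step.", think=True)},
]
print(messages[1]["content"])  # → Solve 17 * 23 step by step. /think
```

The same message list works against any OpenAI-compatible endpoint serving Qwen 3, including a local Ollama instance.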

The numbers worth knowing:

  • 235B MoE (22B active parameters) tops multiple benchmarks including AIME 2025 at 81.5% and LiveCodeBench at 70.7%
  • 32B dense model competes with much larger models on math and coding tasks
  • 0.6B fits on mobile devices and edge hardware
  • All sizes use Apache 2.0 licensing
  • MCP (Model Context Protocol) support is built in for tool use and agent workflows

For local use, the 4B and 8B models run well on consumer hardware through Ollama. The 32B model needs ~20GB of VRAM with 4-bit quantization.
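The ~20GB figure falls out of simple arithmetic. A rough back-of-the-envelope estimator, assuming weights at the stated bit width plus roughly 20% overhead for KV cache and activations (a coarse rule of thumb, not a precise model):

```python
def vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights at the given bit width,
    plus ~20% for KV cache and activations."""
    weight_gb = params_b * bits / 8  # 1B params at 8-bit ~= 1 GB
    return round(weight_gb * overhead, 1)

print(vram_gb(32, 4))  # 19.2 -- in line with the ~20GB figure above
print(vram_gb(8, 4))   # 4.8  -- comfortable on a 16GB laptop
```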

Gemma 4 (Google DeepMind)

Released April 2. Four model sizes (E2B, E4B, 26B MoE, 31B Dense). The 31B dense model hits 89.2% on AIME 2026 and 80.0% on LiveCodeBench v6. Full Apache 2.0 license, 256K context window across all sizes, multimodal input (text plus image on all, audio on edge models), and support for 140+ languages.

The E2B and E4B edge models run on a Raspberry Pi. Day-one support for Hugging Face, GGUF, ONNX, vLLM, and Ollama.

GLM-5.1 (Zhipu AI / Z.ai)

Released April 3. A 744B MoE model (40B active parameters) trained entirely on Huawei Ascend chips, not NVIDIA. Hits #1 on SWE-Bench Pro, which tests real-world software engineering tasks. MIT licensed. Runs in FP8 quantized mode on supported hardware.

The significance here is the hardware story: GLM-5.1 proves that frontier-class open models can be trained outside the NVIDIA ecosystem.

Llama 4 Scout and Maverick (Meta)

Released April 5. Scout is a 109B MoE model (17B active) with a 10M token context window. Maverick is 400B MoE (17B active), tuned for quality over length. Both use Meta's Community License, which is permissive for most uses but not OSI-approved.

Scout's 10M token context is the longest natively supported window of any open model. Maverick trades that for better per-token quality, scoring 85.5% on MMLU.

Mistral Small 4 (Mistral AI)

Released April 7. A 119B MoE model with only 6B active parameters, which means it runs through llama.cpp on a single consumer GPU. Apache 2.0 licensed. Mistral claims 40% faster inference than Mistral Small 3 with better quality across the board.

The 6B active parameter count makes this the most efficient model in the MoE lineup this month. If you need a capable model that can run on a laptop with 16GB RAM, this is the current best option.

Agent Frameworks and Tools

Goose (Block / Linux Foundation)

Block (formerly Square) donated its internal AI coding agent "Goose" to the Linux Foundation on April 5. Written in Rust, Goose is an open agent framework that runs fully locally and connects to external tools through MCP. It supports multiple LLM backends (both local and API-based).

Key differentiator: Goose is designed around "extensions" (MCP servers) rather than plugins, meaning tool integrations follow a standard protocol instead of framework-specific APIs. This is the first major agent framework under the Linux Foundation umbrella.

Google Agent Development Kit (ADK)

Google released ADK on April 9, an open source framework for building multi-agent systems. It supports agent-to-agent communication via the A2A (Agent-to-Agent) protocol, works with any LLM (not just Gemini), and includes built-in evaluation tools.

ADK is notable because Google is explicitly supporting interoperability: agents built with ADK can talk to agents built with other frameworks through A2A.

OpenAI Agents SDK

OpenAI's open source Agents SDK (Python) provides a minimal framework for building agentic workflows with tool use, handoffs between agents, and guardrails. It is model-agnostic in principle, though optimized for OpenAI models. The SDK emphasizes simplicity: agents, handoffs, and guardrails are the three core primitives.
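To make the three primitives concrete, here is a toy plain-Python sketch of the control flow they imply. This is not the SDK's actual API; the agent names, the keyword-based handoff, and the guardrail check are all made up for illustration:

```python
# Toy illustration of agents, handoffs, and guardrails as a control
# flow. NOT the Agents SDK API -- a plain-Python sketch only.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]           # stands in for the model call
    handoffs: dict = field(default_factory=dict)  # keyword -> Agent

def guardrail(text: str) -> str:
    # Reject obviously unsafe input before any agent sees it.
    if "rm -rf" in text:
        raise ValueError("blocked by guardrail")
    return text

def run(agent: Agent, user_input: str) -> str:
    text = guardrail(user_input)
    # Hand off to a specialist agent if a keyword matches.
    for keyword, target in agent.handoffs.items():
        if keyword in text:
            return run(target, text)
    return agent.handle(text)

math_agent = Agent("math", lambda t: "math: 42")
triage = Agent("triage", lambda t: "triage: hello",
               handoffs={"calculate": math_agent})

print(run(triage, "please calculate something"))  # → math: 42
```

The real SDK wraps the same shape (entry agent, conditional handoff, input guardrail) around actual model calls.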

Claw Code

A community-driven coding agent that went viral in early April, collecting 72,000 GitHub stars in 48 hours. Claw Code runs in the terminal, supports MCP for tool use, and works with multiple model providers. Its rapid adoption signals strong demand for open source alternatives to proprietary coding agents.

Note

MCP (Model Context Protocol) has emerged as the standard integration layer across nearly all agent frameworks released this month. If you are building tools for AI agents, implementing an MCP server is now the highest-leverage investment you can make.
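MCP is built on JSON-RPC 2.0, so the wire format is small. A sketch of the request a client sends to invoke a tool on an MCP server; the `get_weather` tool and its arguments are made-up placeholders:

```python
# Build a JSON-RPC 2.0 tools/call request in the shape MCP uses.
# The tool name and arguments here are placeholders for illustration.
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP-style tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = tools_call(1, "get_weather", {"city": "Berlin"})
print(msg)
```

Because every framework above speaks this framing, one server implementation reaches all of them.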

Inference and Serving

The model releases above would be academic without inference infrastructure. Three projects shipped significant updates in April:

| Engine | Update | What Changed |
|---|---|---|
| vLLM 0.8.x | Continuous batching improvements | 2x throughput on Llama 4 MoE serving |
| llama.cpp | Gemma 4 + Qwen 3 support | Day-one GGUF quantization for all new models |
| Ollama | Model library updates | Pull-and-run support for every model above |

For developers running models locally on macOS, the typical workflow is:

```bash
# Pull a model
ollama pull qwen3:8b

# Or for Gemma 4 edge
ollama pull gemma4:e4b

# Test it
ollama run qwen3:8b "Explain MCP protocol in two sentences"
```

All of the sub-10B models listed above run comfortably on an M1 MacBook with 16GB RAM. The 30B+ models need 32GB or more (or quantization down to 4-bit).

Developer Tools

Coding Agents

The competitive landscape for open source coding agents has shifted dramatically this month:

Claude Code now supports custom skills, hooks, and persistent memory across sessions. The extension ecosystem (MCP servers) has grown to cover file systems, databases, APIs, and browser automation.
Goose (covered above) entered the coding agent space with Rust performance and Linux Foundation governance.
Continue.dev hit 1.0, offering a fully open source IDE extension for AI-assisted coding with support for local models.
Claw Code exploded in popularity as a terminal-native alternative with MCP support and multi-provider flexibility.

The MCP Ecosystem

The Model Context Protocol now has official SDKs in Python, TypeScript, Java, C#, Go, Swift, and Kotlin. The number of community-built MCP servers passed 2,000 on GitHub in April. This protocol is becoming the USB-C of AI tooling: one standard connector for everything.

What This Means for Local AI

The common thread across all April releases is that running capable AI locally is no longer a compromise. Between MoE models that need 6-17B active parameters, edge models that fit on a Raspberry Pi, and agent frameworks designed for local execution, the stack is complete.

If you are building a local-first AI application, the practical stack as of April 2026 looks like this:

| Layer | Recommended | Why |
|---|---|---|
| Foundation model (general) | Qwen 3 8B or Gemma 4 E4B | Best quality-per-GB for local inference |
| Foundation model (coding) | Qwen 3 32B or GLM-5.1 | Top SWE-Bench scores |
| Foundation model (edge) | Gemma 4 E2B | Runs on Raspberry Pi |
| Inference engine | Ollama (simple) or vLLM (production) | Ollama for dev, vLLM for serving |
| Agent framework | Goose or ADK | MCP-native, model-agnostic |
| Tool integration | MCP servers | Standard protocol, largest ecosystem |

Common Pitfalls

  • License confusion: "Open source" and "open weights" are not the same. Llama 4 uses Meta's Community License, which restricts use above 700M monthly active users. Qwen 3 and Gemma 4 use Apache 2.0 with no such restriction. If licensing matters for your use case, check the actual license text, not the press release.

  • Benchmark cherry-picking: Every lab publishes the benchmarks where their model wins. AIME scores, MMLU, SWE-Bench, and LiveCodeBench test very different capabilities. Pick the benchmark that matches your actual workload, not the one with the highest number.

  • Quantization quality loss: Running a 235B model at 4-bit quantization on consumer hardware gives you a different model than the full-precision version. For critical tasks, test quantized outputs against the reference before committing to a compressed deployment.

  • MoE memory vs. active parameters: A 400B MoE model still needs the full 400B loaded in memory, even though only 17B are active per token. Do not confuse "17B active" with "needs 17B of RAM." You need space for the full model weights.
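The last pitfall is worth spelling out with numbers. Taking Maverick's figures from above (400B total, 17B active) at FP8, i.e. one byte per parameter:

```python
# Why "17B active" does not mean "17B of RAM": all expert weights must
# be resident even though only a fraction is used per token.

def weight_memory_gb(total_params_b: float, bytes_per_param: float = 1.0) -> float:
    """Memory for model weights at FP8 (1 byte/param by default)."""
    return total_params_b * bytes_per_param

resident = weight_memory_gb(400)  # all experts loaded: 400 GB
active = weight_memory_gb(17)     # parameters touched per token: 17 GB
print(f"resident weights: {resident:.0f} GB, active per token: {active:.0f} GB")
```

The active count governs per-token compute (and thus speed), while the total count governs how much memory the deployment needs.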

Warning

If you are evaluating these models for production use, test on your actual data before trusting any published benchmark. Benchmark performance and real-world performance diverge more as tasks get domain-specific.

Quick Reference: All April 2026 Open Source AI Releases

| Project | Type | Org | License | Release Date | Link |
|---|---|---|---|---|---|
| Gemma 4 | Model | Google DeepMind | Apache 2.0 | Apr 2 | Blog |
| GLM-5.1 | Model | Zhipu AI | MIT | Apr 3 | GitHub |
| Llama 4 Scout | Model | Meta | Meta Community | Apr 5 | llama.meta.com |
| Llama 4 Maverick | Model | Meta | Meta Community | Apr 5 | llama.meta.com |
| Goose | Agent Framework | Block / LF | Apache 2.0 | Apr 5 | GitHub |
| Claw Code | Agent Framework | Community | Open Source | Apr 5 | GitHub |
| Mistral Small 4 | Model | Mistral AI | Apache 2.0 | Apr 7 | mistral.ai |
| Google ADK | Agent Framework | Google | Apache 2.0 | Apr 9 | GitHub |
| Qwen 3 | Model | Alibaba Cloud | Apache 2.0 | Apr 9 | GitHub |
| OpenAI Agents SDK | Agent Framework | OpenAI | MIT | Apr 2026 | GitHub |

This list will be updated as more releases land through the rest of April.

Wrapping Up

April 2026 marks the point where the open source AI stack became genuinely complete: foundation models that compete with proprietary alternatives, agent frameworks built for local execution, inference engines that run these models on consumer hardware, and a standard protocol (MCP) tying it all together. If you are building AI-powered software, the question is no longer "can I use open source?" but "which combination of open source tools fits my use case best?"

Fazm is an open source macOS AI agent that runs locally on your machine. Open source on GitHub.