Open Source AI Projects: Releases and Updates in April 2026

Matthew Diakonov · 12 min read

Open source AI moves fast, and April 2026 has been no exception. Beyond the headline model drops, dozens of projects shipped point releases, backported fixes, added new features, and adjusted their roadmaps. This post tracks both the initial releases and the ongoing updates across models, agent frameworks, inference engines, and developer tools throughout the month.

Release and Update Timeline

The pace of activity this month has made it hard to keep up. Here is a chronological view of the most significant events.

| Date | Project | Type | What Changed |
|---|---|---|---|
| Apr 1 | vLLM 0.8.1 | Patch | Fixed FP8 quantization regression on A100, added Gemma 4 MoE support |
| Apr 2 | Ollama 0.6.2 | Minor | Added Qwen 3 and Gemma 4 model manifests, ~15% faster cold start |
| Apr 3 | Qwen 3 | Release | Apache 2.0 MoE family (0.6B to 235B), hybrid thinking mode |
| Apr 4 | llama.cpp b5120 | Update | Day-one Qwen 3 GGUF support, IQ2_XXS quantization for 235B variant |
| Apr 5 | OpenAI Agents SDK 0.4 | Minor | Added MCP tool-use protocol, streaming handoffs between agents |
| Apr 7 | Gemma 4 31B Dense | Release | Google's Apache 2.0 dense model, 128K context, fits on one H100 |
| Apr 7 | Gemma 4 26B MoE | Release | Mixture-of-experts variant, same license and context window |
| Apr 8 | GLM-5.1 | Release | Zhipu's 744B MoE (40B active), MIT license, beat proprietary models on SWE-Bench Pro |
| Apr 8 | Continue.dev 1.0 | Major | Stable release with local model backends, context providers, autocomplete |
| Apr 9 | DeepSeek-V3.2 | Release | Frontier reasoning model with native tool-use, 128K context |
| Apr 9 | ADK (Google) | Release | Agent Development Kit, open source orchestration for multi-agent systems |
| Apr 10 | Goose 1.2 | Update | Linux Foundation agent added MCP server discovery, local-first execution |
| Apr 10 | MiniMax M2.7 | Release | Self-evolving training, 3x faster inference than predecessor |
| Apr 11 | vLLM 0.8.2 | Patch | GLM-5.1 serving support, chunked prefill for 200K+ context |

*Figure: April 2026 open source AI update timeline (Apr 1 to Apr 11), marking new releases and updates/patches for vLLM, Ollama, Qwen 3, llama.cpp, the Agents SDK, Gemma 4, GLM-5.1, DeepSeek-V3.2, ADK, and Goose. Update velocity by category: Models, 8 releases and updates; Inference, 5 updates; Agents, 4 releases; Dev Tools, 3 updates.*

Foundation Model Updates

The model releases get the headlines, but the follow-up updates matter just as much for practitioners who need to actually run these models in production.

Qwen 3 Family

Alibaba released the Qwen 3 MoE family on April 3 under Apache 2.0. The lineup spans from 0.6B to 235B parameters. The standout feature is hybrid thinking mode, where the model can switch between fast generation and step-by-step reasoning within a single conversation.
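
Hybrid thinking mode is typically toggled per request through the chat template. Below is a minimal sketch of what that might look like against an OpenAI-compatible server: the `chat_template_kwargs` / `enable_thinking` names and the model tag are assumptions for illustration, so check your serving stack's docs for the exact knob.

```python
import json

def build_chat_request(prompt: str, thinking: bool) -> str:
    """Build a chat-completion payload for an OpenAI-compatible server.

    The `enable_thinking` flag name is an assumption; verify it against
    your server's documentation.
    """
    payload = {
        "model": "qwen3-235b",  # hypothetical model tag
        "messages": [{"role": "user", "content": prompt}],
        # Hybrid thinking mode: step-by-step reasoning on or off per request.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
    return json.dumps(payload, indent=2)

# Fast generation for a simple lookup, reasoning for a harder problem.
print(build_chat_request("What is the capital of France?", thinking=False))
print(build_chat_request("Prove that sqrt(2) is irrational.", thinking=True))
```

The appeal of a per-request toggle is cost control: you pay the latency of chain-of-thought only on the prompts that need it.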

Post-launch updates during the first week:

  • Apr 4: llama.cpp added GGUF conversion scripts and IQ2_XXS quantization, letting the 235B model run on consumer hardware with 48GB VRAM
  • Apr 5: Ollama shipped Qwen 3 manifests across all sizes
  • Apr 6: vLLM merged a PR for tensor-parallel Qwen 3 serving with FP8

Gemma 4

Google released two variants on April 7: a 31B dense model and a 26B MoE model, both under Apache 2.0 with 128K context. The dense variant fits on a single H100 and matches models 20x its size on several benchmarks.

Updates since release:

  • Apr 8: llama.cpp added day-one GGUF support with Q4_K_M and Q5_K_M presets
  • Apr 9: ExLlamaV3 shipped 4-bit EXL2 quantization with reported ~40 tok/s on RTX 4090
  • Apr 10: Hugging Face Transformers merged Gemma 4 support in a point release

GLM-5.1

Zhipu AI's GLM-5.1 dropped on April 8: 744B total parameters with 40B active (MoE), MIT licensed, 200K context. It scored higher than several proprietary models on SWE-Bench Pro, which measures real-world software engineering tasks.

Tip

GLM-5.1 uses the MIT license, which is more permissive than most open source model licenses. You can use it commercially, modify it, and redistribute it; the only obligation is keeping the copyright and license notice with copies of the software.

DeepSeek-V3.2 and MiniMax M2.7

DeepSeek-V3.2 launched April 9 with native tool-use support and 128K context. MiniMax M2.7 followed on April 10 with a self-evolving training approach that claims 3x inference speedup over M2.5.
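
Native tool use generally means the model was trained to emit calls in the OpenAI-style function-calling schema, which OpenAI-compatible servers pass through. Here is a hedged sketch of such a request; the model tag and the weather tool are made up for illustration.

```python
import json

# One tool definition in the OpenAI-style function-calling schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

request = {
    "model": "deepseek-v3.2",  # hypothetical model tag
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(request, indent=2))
```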

Both models had inference engine support within 48 hours of release, following the pattern we have seen all month: the inference toolchain now moves faster than the model labs.

Agent Framework Updates

April saw several agent frameworks either launch or ship major updates. The common thread: MCP (Model Context Protocol) adoption and local-first execution.

Google ADK

Google open-sourced the Agent Development Kit on April 9. ADK provides multi-agent orchestration with built-in tool calling, memory management, and structured output. It ships with connectors for Vertex AI, Gemini, and any OpenAI-compatible API.

OpenAI Agents SDK 0.4

The Agents SDK added MCP tool-use support and streaming handoffs between agents in version 0.4 (April 5). This means agents built with the OpenAI SDK can now consume MCP servers, which is the same protocol that Claude, Cursor, and other tools use.
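
To make the handoff idea concrete, here is a toy dispatcher in plain Python. This illustrates the pattern only, not the Agents SDK's actual API: each agent either produces an answer or names another agent to take over.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    name: str
    # handle(msg) -> (answer, handoff_target); exactly one is non-None
    handle: Callable[[str], tuple[Optional[str], Optional[str]]]

def triage(msg):
    if "refund" in msg.lower():
        return None, "billing"  # hand off to the billing agent
    return "General support here: " + msg, None

def billing(msg):
    return "Billing agent: refund request logged.", None

AGENTS = {"triage": Agent("triage", triage), "billing": Agent("billing", billing)}

def run(msg, start="triage", max_hops=5):
    """Route a message through handoffs until some agent answers."""
    current = start
    for _ in range(max_hops):
        answer, target = AGENTS[current].handle(msg)
        if answer is not None:
            return current, answer
        current = target
    raise RuntimeError("handoff loop exceeded max_hops")

print(run("I want a refund"))  # answered by billing after one handoff
```

The "streaming" part of the SDK feature means the receiving agent starts responding before the full handoff context has finished transferring, which this toy version does not model.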

Goose 1.2 (Linux Foundation)

Goose, the Linux Foundation's open source AI agent, shipped version 1.2 on April 10 with automatic MCP server discovery and improved local-first execution. The update reduced setup friction significantly: you point Goose at a project directory and it auto-detects available MCP servers.

| Framework | Version | MCP Support | Local Models | Multi-Agent | License |
|---|---|---|---|---|---|
| Google ADK | 1.0 | Yes | Via connectors | Yes (orchestrated) | Apache 2.0 |
| OpenAI Agents SDK | 0.4 | Yes (new) | Via compatible API | Yes (handoffs) | MIT |
| Goose | 1.2 | Yes (auto-discover) | Yes (native) | Planned | Apache 2.0 |
| LangGraph | 0.3.x | Via adapters | Yes | Yes (graph-based) | MIT |

Inference Engine Updates

The inference layer had to keep up with the model releases, and it did. Both vLLM and llama.cpp shipped multiple updates within the first ten days of April.

vLLM

vLLM shipped two patch releases:

  • 0.8.1 (Apr 1): Fixed an FP8 quantization regression on A100 GPUs that was causing ~10% throughput loss, added Gemma 4 MoE tensor-parallel support
  • 0.8.2 (Apr 11): Added GLM-5.1 serving support and chunked prefill for contexts over 200K tokens
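
Chunked prefill is easy to picture with a toy sketch: rather than pushing a 200K-token prompt through attention in one pass, the server splits it into fixed-size chunks and prefills the KV cache one chunk at a time, interleaving decode work for other requests between chunks. This is the concept only, not vLLM's implementation.

```python
def prefill_chunks(prompt_tokens, chunk_size=2048):
    """Yield (start, end) slices covering the prompt in order.

    Each slice would be one prefill step; decode iterations for other
    requests can be scheduled between steps.
    """
    for start in range(0, len(prompt_tokens), chunk_size):
        yield start, min(start + chunk_size, len(prompt_tokens))

tokens = list(range(5000))  # stand-in for a tokenized long prompt
chunks = list(prefill_chunks(tokens))
print(chunks)  # [(0, 2048), (2048, 4096), (4096, 5000)]
```

The practical payoff is latency fairness: one huge prompt no longer stalls every other request on the server for the duration of its prefill.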

llama.cpp

The llama.cpp project shipped build b5120 through b5135 this month, averaging more than one update per day. Key changes:

  • Day-one GGUF support for every major model release (Qwen 3, Gemma 4, GLM-5.1)
  • IQ2_XXS quantization for running 200B+ models on consumer hardware
  • Flash attention v2 integration for ~30% speedup on long contexts
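
A quick back-of-envelope for whether a quantized model fits in memory: parameters times bits-per-weight divided by 8, ignoring KV cache and runtime overhead. The bits-per-weight figures below are approximate community numbers, not official specs.

```python
def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

for name, params, bpw in [
    ("Gemma 4 31B @ Q4_K_M", 31, 4.85),   # ~4.85 bpw is a common estimate
    ("Qwen 3 235B @ IQ2_XXS", 235, 2.06), # ~2.06 bpw is a common estimate
]:
    print(f"{name}: ~{weight_gib(params, bpw):.1f} GiB of weights")
```

Note that even at ~2 bpw the 235B weights exceed 48GB, which is why running it on a 48GB card relies on llama.cpp's ability to offload some layers to CPU RAM.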

Note

If you are running llama.cpp from source, pull and rebuild frequently this month. The project is merging model support PRs within hours of new releases, but these changes sometimes require rebuilding from scratch rather than incremental compilation.

Ollama

Ollama 0.6.2 shipped on April 2 with updated model manifests for Qwen 3 and Gemma 4. Cold start time improved by approximately 15%, which matters for agent workflows that spin up model instances on demand.

Developer Tools Updates

Continue.dev 1.0

Continue.dev reached its stable 1.0 release on April 8. The IDE extension (VS Code and JetBrains) now ships with:

  • Local model backends (Ollama, llama.cpp, LM Studio)
  • Context providers that pull from files, docs, and web
  • Tab autocomplete with configurable models
  • MCP tool integration for agent-style workflows
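
As a sketch, a Continue-style configuration wiring a local Ollama backend might look like the following. The key names here are illustrative assumptions, so consult Continue's own configuration reference for the real schema.

```python
import json

# Hypothetical Continue-style config: one local chat model plus a
# smaller model dedicated to tab autocomplete. Field names are assumed.
config = {
    "models": [
        {
            "title": "Qwen 3 (local)",
            "provider": "ollama",
            "model": "qwen3",  # hypothetical Ollama model tag
        }
    ],
    "tabAutocompleteModel": {
        "title": "Autocomplete",
        "provider": "ollama",
        "model": "qwen3:0.6b",  # small model keeps completions snappy
    },
}
print(json.dumps(config, indent=2))
```

Using a sub-1B model for autocomplete while routing chat to a larger model is a common split, since completion latency matters far more than completion depth.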

Claude Code Extensions

Claude Code shipped several updates to its extension ecosystem this month, including improved MCP server management, background agent support, and worktree isolation for parallel tasks.

Common Pitfalls When Tracking These Updates

  • Quantization compatibility: Not every quantization format works with every model architecture on day one. IQ2 quantizations for MoE models (Qwen 3 235B, GLM-5.1) may produce slightly different quality than for dense models. Always benchmark on your specific use case before deploying.

  • License confusion: "Open source" means different things to different labs. Qwen 3 and Gemma 4 use Apache 2.0 (truly permissive). GLM-5.1 uses MIT (also permissive). DeepSeek-V3.2 and MiniMax M2.7 use custom licenses labeled "open" that have usage restrictions. Read the actual license file before building a product on top of a model.

  • Context window vs. effective context: A model advertising 200K context does not mean it performs equally well at 200K tokens. Real-world testing consistently shows quality degradation in the upper 30-40% of the advertised window. For production use, plan for ~60-70% of the listed maximum.

  • Update cadence and stability: llama.cpp's rapid update pace this month means you might hit regressions if you pull main at the wrong time. Pin to a specific build tag for production workloads and update on your own schedule.
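
The effective-context caveat above reduces to a one-liner when sizing prompts. Here is a sketch of the ~60-70% rule of thumb, using 65% as the default.

```python
def effective_context(advertised: int, safety: float = 0.65) -> int:
    """Plan for `safety` fraction of the advertised context window."""
    return int(advertised * safety)

for model, window in [("GLM-5.1", 200_000), ("Qwen 3", 128_000)]:
    print(f"{model}: advertised {window:,}, plan for ~{effective_context(window):,}")
```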

How to Stay on Top of Updates

If you work with open source AI models and tools, here is a practical approach to tracking the update stream:

```bash
# Watch GitHub releases for your key projects
gh api repos/vllm-project/vllm/releases --jq '.[0:3] | .[] | "\(.tag_name) - \(.published_at[:10]) - \(.name)"'

# Check Hugging Face for new model uploads
curl -s "https://huggingface.co/api/models?sort=lastModified&limit=10&search=april-2026" | python3 -c "
import sys, json
for m in json.load(sys.stdin):
    print(f\"{m['id']:50s} {m.get('lastModified','')[:10]}\")
"

# Monitor llama.cpp build tags
git -C /path/to/llama.cpp log --oneline -10 --tags
```

What to Watch for the Rest of April

Several projects have announced or hinted at upcoming releases:

  • Llama 4 Behemoth: Meta's largest open model in the Llama 4 family. Expected to be a 2T+ MoE model
  • Mistral Medium 4: An update to Mistral's mid-tier open model
  • vLLM 0.9: Major version with speculative decoding improvements and better multi-GPU scheduling
  • Kimi K2.5 open weights: Moonshot AI has indicated open weights are coming

The rest of the month will likely bring more inference engine patches as the community benchmarks and profiles the new models at scale.

Wrapping Up

April 2026 is shaping up to be a landmark month for open source AI. The pattern is clear: model labs release, the inference community adapts within hours, and agent frameworks wire everything together within days. Tracking both the initial releases and the subsequent updates gives you a realistic picture of when a model is actually usable in production, not just when it was announced.

Fazm is an open source AI agent for macOS. Check out the GitHub repo to try it yourself.
