Open Source AI Projects: Releases and Updates in April 2026

Matthew Diakonov · 12 min read

Open source AI moves fast, and April 2026 has been no exception. Beyond the headline model drops, dozens of projects shipped point releases, backported fixes, added new features, and adjusted their roadmaps. This post tracks both the initial releases and the ongoing updates across models, agent frameworks, inference engines, and developer tools throughout the month.

Release and Update Timeline

The pace of activity this month has made it hard to keep up. Here is a chronological view of the most significant events.

| Date | Project | Type | What Changed |
|---|---|---|---|
| Apr 1 | vLLM 0.8.1 | Patch | Fixed FP8 quantization regression on A100, added Gemma 4 MoE support |
| Apr 2 | Ollama 0.6.2 | Minor | Added Qwen 3 and Gemma 4 model manifests, ~15% faster cold start |
| Apr 3 | Qwen 3 | Release | Apache 2.0 MoE family (0.6B to 235B), hybrid thinking mode |
| Apr 4 | llama.cpp b5120 | Update | Day-one Qwen 3 GGUF support, IQ2_XXS quantization for 235B variant |
| Apr 5 | OpenAI Agents SDK 0.4 | Minor | Added MCP tool-use protocol, streaming handoffs between agents |
| Apr 7 | Gemma 4 31B Dense | Release | Google's Apache 2.0 dense model, 128K context, fits on one H100 |
| Apr 7 | Gemma 4 26B MoE | Release | Mixture-of-experts variant, same license and context window |
| Apr 8 | GLM-5.1 | Release | Zhipu's 744B MoE (40B active), MIT license, beat proprietary models on SWE-Bench Pro |
| Apr 8 | Continue.dev 1.0 | Major | Stable release with local model backends, context providers, autocomplete |
| Apr 9 | DeepSeek-V3.2 | Release | Frontier reasoning model with native tool-use, 128K context |
| Apr 9 | ADK (Google) | Release | Agent Development Kit, open source orchestration for multi-agent systems |
| Apr 10 | Goose 1.2 | Update | Linux Foundation agent added MCP server discovery, local-first execution |
| Apr 10 | MiniMax M2.7 | Release | Self-evolving training, 3x faster inference than predecessor |
| Apr 11 | vLLM 0.8.2 | Patch | GLM-5.1 serving support, chunked prefill for 200K+ context |

*Figure: April 2026 open source AI update timeline (Apr 1 to Apr 11), marking new releases and updates/patches for vLLM, Ollama, Qwen 3, llama.cpp, the Agents SDK, Gemma 4, GLM-5.1, DeepSeek-V3.2, ADK, and Goose. Update velocity by category: Models, 8 releases and updates; Inference, 5 updates; Agents, 4 releases; Dev Tools, 3 updates.*

Foundation Model Updates

The model releases get the headlines, but the follow-up updates matter just as much for practitioners who need to actually run these models in production.

Qwen 3 Family

Alibaba released the Qwen 3 MoE family on April 3 under Apache 2.0. The lineup spans from 0.6B to 235B parameters. The standout feature is hybrid thinking mode, where the model can switch between fast generation and step-by-step reasoning within a single conversation.
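
Hybrid thinking mode is typically toggled per request through the chat template. Below is a minimal sketch of what that might look like against an OpenAI-compatible server: the `chat_template_kwargs` / `enable_thinking` names and the model tag are assumptions for illustration, so check your serving stack's docs for the exact knob.

```python
import json

def build_chat_request(prompt: str, thinking: bool) -> str:
    """Build a chat-completion payload for an OpenAI-compatible server.

    The `enable_thinking` flag name is an assumption; verify it against
    your server's documentation.
    """
    payload = {
        "model": "qwen3-235b",  # hypothetical model tag
        "messages": [{"role": "user", "content": prompt}],
        # Hybrid thinking mode: step-by-step reasoning on or off per request.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
    return json.dumps(payload, indent=2)

# Fast generation for a simple lookup, reasoning for a harder problem.
print(build_chat_request("What is the capital of France?", thinking=False))
print(build_chat_request("Prove that sqrt(2) is irrational.", thinking=True))
```

The appeal of a per-request toggle is cost control: you pay the latency of chain-of-thought only on the prompts that need it.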

Post-launch updates during the first week:

  • Apr 4: llama.cpp added GGUF conversion scripts and IQ2_XXS quantization, letting the 235B model run on consumer hardware with 48GB VRAM
  • Apr 5: Ollama shipped Qwen 3 manifests across all sizes
  • Apr 6: vLLM merged a PR for tensor-parallel Qwen 3 serving with FP8

Gemma 4

Google released two variants on April 7: a 31B dense model and a 26B MoE model, both under Apache 2.0 with 128K context. The dense variant fits on a single H100 and matches models 20x its size on several benchmarks.

Updates since release:

  • Apr 8: llama.cpp added day-one GGUF support with Q4_K_M and Q5_K_M presets
  • Apr 9: ExLlamaV3 shipped 4-bit EXL2 quantization with reported ~40 tok/s on RTX 4090
  • Apr 10: Hugging Face Transformers merged Gemma 4 support in a point release

GLM-5.1

Zhipu AI's GLM-5.1 dropped on April 8: 744B total parameters with 40B active (MoE), MIT licensed, 200K context. It scored higher than several proprietary models on SWE-Bench Pro, which measures real-world software engineering tasks.

Tip

GLM-5.1 uses the MIT license, which is more permissive than most open source model licenses. You can use it commercially, modify it, and redistribute it; the only obligation is keeping the copyright and license notice with copies of the software.

DeepSeek-V3.2 and MiniMax M2.7

DeepSeek-V3.2 launched April 9 with native tool-use support and 128K context. MiniMax M2.7 followed on April 10 with a self-evolving training approach that claims 3x inference speedup over M2.5.
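
Native tool use generally means the model was trained to emit calls in the OpenAI-style function-calling schema, which OpenAI-compatible servers pass through. Here is a hedged sketch of such a request; the model tag and the weather tool are made up for illustration.

```python
import json

# One tool definition in the OpenAI-style function-calling schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

request = {
    "model": "deepseek-v3.2",  # hypothetical model tag
    "messages": [{"role": "user", "content": "Weather in Oslo?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(request, indent=2))
```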

Both models had inference engine support within 48 hours of release, following the pattern we have seen all month: the inference toolchain now moves faster than the model labs.

Agent Framework Updates

April saw several agent frameworks either launch or ship major updates. The common thread: MCP (Model Context Protocol) adoption and local-first execution.

Google ADK

Google open-sourced the Agent Development Kit on April 9. ADK provides multi-agent orchestration with built-in tool calling, memory management, and structured output. It ships with connectors for Vertex AI, Gemini, and any OpenAI-compatible API.

OpenAI Agents SDK 0.4

The Agents SDK added MCP tool-use support and streaming handoffs between agents in version 0.4 (April 5). This means agents built with the OpenAI SDK can now consume MCP servers, which is the same protocol that Claude, Cursor, and other tools use.
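
To make the handoff idea concrete, here is a toy dispatcher in plain Python. This illustrates the pattern only, not the Agents SDK's actual API: each agent either produces an answer or names another agent to take over.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    name: str
    # handle(msg) -> (answer, handoff_target); exactly one is non-None
    handle: Callable[[str], tuple[Optional[str], Optional[str]]]

def triage(msg):
    if "refund" in msg.lower():
        return None, "billing"  # hand off to the billing agent
    return "General support here: " + msg, None

def billing(msg):
    return "Billing agent: refund request logged.", None

AGENTS = {"triage": Agent("triage", triage), "billing": Agent("billing", billing)}

def run(msg, start="triage", max_hops=5):
    """Route a message through handoffs until some agent answers."""
    current = start
    for _ in range(max_hops):
        answer, target = AGENTS[current].handle(msg)
        if answer is not None:
            return current, answer
        current = target
    raise RuntimeError("handoff loop exceeded max_hops")

print(run("I want a refund"))  # answered by billing after one handoff
```

The "streaming" part of the SDK feature means the receiving agent starts responding before the full handoff context has finished transferring, which this toy version does not model.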

Goose 1.2 (Linux Foundation)

Goose, the Linux Foundation's open source AI agent, shipped version 1.2 on April 10 with automatic MCP server discovery and improved local-first execution. The update reduced setup friction significantly: you point Goose at a project directory and it auto-detects available MCP servers.

| Framework | Version | MCP Support | Local Models | Multi-Agent | License |
|---|---|---|---|---|---|
| Google ADK | 1.0 | Yes | Via connectors | Yes (orchestrated) | Apache 2.0 |
| OpenAI Agents SDK | 0.4 | Yes (new) | Via compatible API | Yes (handoffs) | MIT |
| Goose | 1.2 | Yes (auto-discover) | Yes (native) | Planned | Apache 2.0 |
| LangGraph | 0.3.x | Via adapters | Yes | Yes (graph-based) | MIT |

Inference Engine Updates

The inference layer had to keep up with the model releases, and it did. Both vLLM and llama.cpp shipped multiple updates within the first ten days of April.

vLLM

vLLM shipped two patch releases:

  • 0.8.1 (Apr 1): Fixed an FP8 quantization regression on A100 GPUs that was causing ~10% throughput loss, added Gemma 4 MoE tensor-parallel support
  • 0.8.2 (Apr 11): Added GLM-5.1 serving support and chunked prefill for contexts over 200K tokens
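
Chunked prefill is easy to picture with a toy sketch: rather than pushing a 200K-token prompt through attention in one pass, the server splits it into fixed-size chunks and prefills the KV cache one chunk at a time, interleaving decode work for other requests between chunks. This is the concept only, not vLLM's implementation.

```python
def prefill_chunks(prompt_tokens, chunk_size=2048):
    """Yield (start, end) slices covering the prompt in order.

    Each slice would be one prefill step; decode iterations for other
    requests can be scheduled between steps.
    """
    for start in range(0, len(prompt_tokens), chunk_size):
        yield start, min(start + chunk_size, len(prompt_tokens))

tokens = list(range(5000))  # stand-in for a tokenized long prompt
chunks = list(prefill_chunks(tokens))
print(chunks)  # [(0, 2048), (2048, 4096), (4096, 5000)]
```

The practical payoff is latency fairness: one huge prompt no longer stalls every other request on the server for the duration of its prefill.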

llama.cpp

The llama.cpp project shipped build b5120 through b5135 this month, averaging more than one update per day. Key changes:

  • Day-one GGUF support for every major model release (Qwen 3, Gemma 4, GLM-5.1)
  • IQ2_XXS quantization for running 200B+ models on consumer hardware
  • Flash attention v2 integration for ~30% speedup on long contexts
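
A quick back-of-envelope for whether a quantized model fits in memory: parameters times bits-per-weight divided by 8, ignoring KV cache and runtime overhead. The bits-per-weight figures below are approximate community numbers, not official specs.

```python
def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

for name, params, bpw in [
    ("Gemma 4 31B @ Q4_K_M", 31, 4.85),   # ~4.85 bpw is a common estimate
    ("Qwen 3 235B @ IQ2_XXS", 235, 2.06), # ~2.06 bpw is a common estimate
]:
    print(f"{name}: ~{weight_gib(params, bpw):.1f} GiB of weights")
```

Note that even at ~2 bpw the 235B weights exceed 48GB, which is why running it on a 48GB card relies on llama.cpp's ability to offload some layers to CPU RAM.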

Note

If you are running llama.cpp from source, pull and rebuild frequently this month. The project is merging model support PRs within hours of new releases, but these changes sometimes require rebuilding from scratch rather than incremental compilation.

Ollama

Ollama 0.6.2 shipped on April 2 with updated model manifests for Qwen 3 and Gemma 4. Cold start time improved by approximately 15%, which matters for agent workflows that spin up model instances on demand.

Developer Tools Updates

Continue.dev 1.0

Continue.dev reached its stable 1.0 release on April 8. The IDE extension (VS Code and JetBrains) now ships with:

  • Local model backends (Ollama, llama.cpp, LM Studio)
  • Context providers that pull from files, docs, and web
  • Tab autocomplete with configurable models
  • MCP tool integration for agent-style workflows
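
As a sketch, a Continue-style configuration wiring a local Ollama backend might look like the following. The key names here are illustrative assumptions, so consult Continue's own configuration reference for the real schema.

```python
import json

# Hypothetical Continue-style config: one local chat model plus a
# smaller model dedicated to tab autocomplete. Field names are assumed.
config = {
    "models": [
        {
            "title": "Qwen 3 (local)",
            "provider": "ollama",
            "model": "qwen3",  # hypothetical Ollama model tag
        }
    ],
    "tabAutocompleteModel": {
        "title": "Autocomplete",
        "provider": "ollama",
        "model": "qwen3:0.6b",  # small model keeps completions snappy
    },
}
print(json.dumps(config, indent=2))
```

Using a sub-1B model for autocomplete while routing chat to a larger model is a common split, since completion latency matters far more than completion depth.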

Claude Code Extensions

Claude Code shipped several updates to its extension ecosystem this month, including improved MCP server management, background agent support, and worktree isolation for parallel tasks.

Common Pitfalls When Tracking These Updates

  • Quantization compatibility: Not every quantization format works with every model architecture on day one. IQ2 quantizations for MoE models (Qwen 3 235B, GLM-5.1) may produce slightly different quality than for dense models. Always benchmark on your specific use case before deploying.

  • License confusion: "Open source" means different things to different labs. Qwen 3 and Gemma 4 use Apache 2.0 (truly permissive). GLM-5.1 uses MIT (also permissive). DeepSeek-V3.2 and MiniMax M2.7 use custom licenses labeled "open" that have usage restrictions. Read the actual license file before building a product on top of a model.

  • Context window vs. effective context: A model advertising 200K context does not mean it performs equally well at 200K tokens. Real-world testing consistently shows quality degradation in the upper 30-40% of the advertised window. For production use, plan for ~60-70% of the listed maximum.

  • Update cadence and stability: llama.cpp's rapid update pace this month means you might hit regressions if you pull main at the wrong time. Pin to a specific build tag for production workloads and update on your own schedule.
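
The effective-context caveat above reduces to a one-liner when sizing prompts. Here is a sketch of the ~60-70% rule of thumb, using 65% as the default.

```python
def effective_context(advertised: int, safety: float = 0.65) -> int:
    """Plan for `safety` fraction of the advertised context window."""
    return int(advertised * safety)

for model, window in [("GLM-5.1", 200_000), ("Qwen 3", 128_000)]:
    print(f"{model}: advertised {window:,}, plan for ~{effective_context(window):,}")
```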

How to Stay on Top of Updates

If you work with open source AI models and tools, here is a practical approach to tracking the update stream:

```bash
# Watch GitHub releases for your key projects
gh api repos/vllm-project/vllm/releases --jq '.[0:3] | .[] | "\(.tag_name) - \(.published_at[:10]) - \(.name)"'

# Check Hugging Face for new model uploads
curl -s "https://huggingface.co/api/models?sort=lastModified&limit=10&search=april-2026" | python3 -c "
import sys, json
for m in json.load(sys.stdin):
    print(f\"{m['id']:50s} {m.get('lastModified','')[:10]}\")
"

# Monitor llama.cpp build tags
git -C /path/to/llama.cpp log --oneline -10 --tags
```

What to Watch for the Rest of April

Several projects have announced or hinted at upcoming releases:

  • Llama 4 Behemoth: Meta's largest open model in the Llama 4 family. Expected to be a 2T+ MoE model
  • Mistral Medium 4: An update to Mistral's mid-tier open model
  • vLLM 0.9: Major version with speculative decoding improvements and better multi-GPU scheduling
  • Kimi K2.5 open weights: Moonshot AI has indicated open weights are coming

The rest of the month will likely bring more inference engine patches as the community benchmarks and profiles the new models at scale.

Wrapping Up

April 2026 is shaping up to be a landmark month for open source AI. The pattern is clear: model labs release, the inference community adapts within hours, and agent frameworks wire everything together within days. Tracking both the initial releases and the subsequent updates gives you a realistic picture of when a model is actually usable in production, not just when it was announced.

Fazm is an open source AI agent for macOS. Check out the GitHub repo to try it yourself.
