New AI Model Releases and Open Source Projects in April 2026
April 2026 has been one of the densest months for AI releases in recent memory. Multiple foundation models shipped from Meta, Alibaba, Google, Mistral, and Ai2, while GitHub and Hugging Face saw a wave of new open source tools, agent frameworks, and deployment stacks. This post covers both sides: the models themselves and the projects built around them.
AI Model Releases: What Shipped in April 2026
Seven major open source models launched in the first nine days of April. The table below captures every production-relevant release with parameters, architecture, and licensing.
| Date | Model | Organization | Parameters (Total / Active) | Architecture | License | Key Strength |
|---|---|---|---|---|---|---|
| Apr 2 | Llama 4 Scout | Meta | 109B / 17B | MoE (16 experts) | Llama 4 Community | 10M token context window |
| Apr 3 | OLMo 2 32B | Ai2 | 32B / 32B | Dense | Apache 2.0 | Fully open training data and code |
| Apr 5 | Llama 4 Maverick | Meta | 400B / 17B | MoE (128 experts) | Llama 4 Community | Best multilingual MoE performance |
| Apr 5 | Qwen 3 72B | Alibaba | 72B / 72B | Dense | Apache 2.0 | Top dense model on reasoning tasks |
| Apr 8 | Qwen 3 MoE 235B | Alibaba | 235B / 22B | MoE | Apache 2.0 | Near-frontier quality at low active parameter count |
| Apr 8 | Codestral 2 | Mistral | 22B / 22B | Dense | Apache 2.0 | Code generation, fill-in-the-middle |
| Apr 9 | Gemma 3n | Google | 4B effective / 2B footprint | Dense multimodal | Gemma License | Runs on-device (phone, tablet) |
Benchmark Snapshot
The MoE models (Llama 4 Scout, Maverick, Qwen 3 MoE 235B) are the headline story. They hit near-frontier benchmark scores while keeping active parameter counts low enough for multi-GPU consumer setups. Gemma 3n is the opposite extreme: a 2B-footprint multimodal model designed to run on phones.
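As a back-of-the-envelope check on why active parameter count matters, the sketch below estimates weight-only memory for three of this month's releases at different quantization levels, derived purely from the parameter counts in the table above (KV cache and activations add more on top):

```python
def weight_memory_gib(total_params_b: float, bits_per_param: float) -> float:
    """Rough weight-only memory footprint. Note: for MoE models the TOTAL
    parameter count must be resident in memory, even though only the
    active experts run per token."""
    bytes_total = total_params_b * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

models = {
    "Llama 4 Scout (109B total / 17B active)": 109,
    "Qwen 3 72B (dense)": 72,
    "Qwen 3 MoE 235B (22B active)": 235,
}

for name, total_b in models.items():
    fp16 = weight_memory_gib(total_b, 16)
    q4 = weight_memory_gib(total_b, 4)  # idealized 4-bit quantization
    print(f"{name}: ~{fp16:.0f} GiB fp16, ~{q4:.0f} GiB at 4-bit")
```

The takeaway: MoE cuts per-token compute (only the active experts run), not resident weight memory, which is why quantization and offloading still matter for fitting these models on single-GPU setups.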
Open Source Projects: Tools, Frameworks, and Deployment Stacks
The model releases triggered a parallel wave of tooling. These are the open source projects that gained the most traction on GitHub and Hugging Face during April 2026.
| Project | Platform | Stars / Downloads | Language | Purpose |
|---|---|---|---|---|
| google/adk-python | GitHub | 8,200+ stars | Python | Multi-agent orchestration framework |
| meta-llama/llama-stack | GitHub | 6,400+ stars | Python | Unified Llama 4 deployment and inference |
| openai/codex-cli | GitHub | 5,800+ stars | TypeScript | Terminal coding agent with sandboxed execution |
| block/goose | GitHub | 4,900+ stars | Rust | Local-first AI agent with MCP support |
| huggingface/smolagents | GitHub | 4,100+ stars | Python | Lightweight agent library with tool use |
| microsoft/markitdown | GitHub | 3,600+ stars | Python | Converts any document to Markdown for LLM ingestion |
| unsloth/unsloth | GitHub | 2,100+ monthly gain | Python | 2x faster fine-tuning with 70% less memory |
| qwen-ai/qwen3-coder | GitHub | 2,800+ stars | Python | Code-specialized Qwen 3 with 128K context |
| mistralai/codestral-2 | GitHub | 2,400+ stars | Python | Reference code for the Apache 2.0 Codestral 2 model |
| Llama-4-Scout-GGUF | Hugging Face | 180K+ downloads | GGUF | Quantized Llama 4 Scout for local inference |
How the Models and Projects Connect
The pattern is clear: every major model release now ships alongside an official deployment project, and the community responds with quantization packs, fine-tuning integrations, and agent wrappers within days.
Highlights by Category
Best for Coding
Codestral 2 from Mistral is the standout code model this month. It ships under Apache 2.0 (a shift from Mistral's previous restrictive licensing for code models), supports fill-in-the-middle completions, and benchmarks above GPT-4o on HumanEval and MBPP. OpenAI's Codex CLI, while not a model itself, pairs with any API-accessible model to create a sandboxed terminal coding agent.
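Fill-in-the-middle means the model completes code between a given prefix and suffix rather than only continuing left to right. The sketch below shows how a FIM prompt is typically assembled; the control-token names here are illustrative placeholders, not Codestral 2's actual tokens, so check the model card for the real format before using it.

```python
def build_fim_prompt(prefix: str, suffix: str,
                     pre_tok: str = "<PREFIX>",
                     suf_tok: str = "<SUFFIX>",
                     mid_tok: str = "<MIDDLE>") -> str:
    """Assemble a fill-in-the-middle prompt: the model generates the code
    that belongs between `prefix` and `suffix`, following the mid token."""
    return f"{pre_tok}{prefix}{suf_tok}{suffix}{mid_tok}"

# The cursor sits between the return expression and the division:
prefix = "def mean(xs):\n    return "
suffix = " / len(xs)\n"
print(build_fim_prompt(prefix, suffix))
# A FIM-trained model would be expected to emit something like "sum(xs)" here.
```

This is why FIM models are better editor companions than plain completion models: they can use the code after the cursor as context, not just the code before it.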
Best for Running Locally
Gemma 3n fits in 4 GB of VRAM and handles text, images, and audio. For heavier workloads, Llama 4 Scout's MoE architecture means only 17B parameters are active at inference time despite 109B total, making it feasible on a single 48 GB GPU with quantization. Unsloth's updated fine-tuning support for these models means you can adapt them to your use case without renting a cluster.
Best for Agent Workflows
Google's Agent Development Kit (adk-python) is the most complete multi-agent framework to ship this month. Block's Goose takes a different approach: local-first, with native MCP (Model Context Protocol) support for tool integration. Hugging Face's smolagents is the lightweight option if you want tool-use without the overhead of a full orchestration framework.
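Under the hood, all three frameworks implement some version of the same loop: the model picks a tool, the runtime executes it, and the result is fed back. Here is a toy, stdlib-only sketch of that dispatch core, with a stubbed "model" in place of a real LLM call; every name is illustrative and not any framework's actual API.

```python
import json

# Tool registry: the runtime exposes these callables to the model.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

def fake_model(task: str) -> str:
    """Stand-in for an LLM: returns a JSON tool call for the task."""
    if "sum" in task:
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    return json.dumps({"tool": "upper", "args": {"text": task}})

def run_agent_step(task: str):
    call = json.loads(fake_model(task))           # 1. model chooses a tool
    result = TOOLS[call["tool"]](**call["args"])  # 2. runtime executes it
    return result                                 # 3. result goes back to the model

print(run_agent_step("sum two numbers"))  # 5
print(run_agent_step("hello"))            # HELLO
```

Real frameworks add planning, multi-step loops, sandboxing, and protocol-level tool discovery (MCP, in Goose's case) on top, but the tool-registry-plus-dispatch core is the same.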
Best for Data and Documents
Microsoft's markitdown converts PDFs, DOCX, PPTX, HTML, and other formats into clean Markdown suitable for LLM context windows. It gained 3,600+ stars in its first week because it solves a problem every RAG pipeline hits: getting messy documents into a format models can actually use well.
Quick-Start: Running a New Model Locally
If you want to try the most impactful model from this month right now, here is the fastest path to running Llama 4 Scout locally:
```bash
# Install Ollama if you haven't already
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the quantized Scout model
ollama pull llama4-scout:q4_K_M

# Run it
ollama run llama4-scout:q4_K_M
```
For Qwen 3 72B using vLLM:
```bash
pip install vllm
vllm serve Qwen/Qwen3-72B-AWQ --tensor-parallel-size 2
```
For Gemma 3n on a laptop:
```bash
pip install mlx-lm
mlx_lm.generate --model google/gemma-3n-E4B --prompt "Explain MoE architectures"
```
What This Means for Builders
Three trends stand out from the April 2026 releases taken together:
- MoE is the default architecture for large models. Three of the seven models this month use mixture-of-experts. The active parameter count matters more than the total count for practical deployment.
- Apache 2.0 is winning. Qwen 3, Codestral 2, and OLMo 2 all ship under Apache 2.0. Mistral's shift is especially notable since their previous code model had commercial restrictions.
- The gap between "release" and "usable" is shrinking. When Llama 4 Scout launched, quantized GGUF packs appeared on Hugging Face within hours, and llama-stack provided an official deployment path on day one. The old pattern of waiting weeks for community tooling to catch up is largely gone.
If you are building AI-powered desktop automation workflows and want to test these models locally, Fazm provides a local-first agent framework that works with any of these models through standard API interfaces.
Further Reading
For deeper coverage of specific topics from this month:
- The EU AI Act's open source exemption guidance (published April 10) clarifies which models qualify for lighter regulatory treatment
- DeepSeek V3's published paper includes training details that most labs still keep proprietary
- Google's updated Gemma 3 commercial license (April 11) removes the previous user-count restriction
This post will be updated as additional releases land through the rest of April 2026.