New AI Model Releases and Open Source Projects in April 2026

Matthew Diakonov · 10 min read

April 2026 has been one of the densest months for AI releases in recent memory. Multiple foundation models shipped from Meta, Alibaba, Google, Mistral, and Ai2, while GitHub and Hugging Face saw a wave of new open source tools, agent frameworks, and deployment stacks. This post covers both sides: the models themselves and the projects built around them.

AI Model Releases: What Shipped in April 2026

Seven major open source models launched in the first nine days of April. The table below captures every production-relevant release with parameters, architecture, and licensing.

| Date | Model | Organization | Parameters (Total / Active) | Architecture | License | Key Strength |
|---|---|---|---|---|---|---|
| Apr 2 | Llama 4 Scout | Meta | 109B / 17B | MoE (16 experts) | Llama 4 Community | 10M token context window |
| Apr 3 | OLMo 2 32B | Ai2 | 32B / 32B | Dense | Apache 2.0 | Fully open training data and code |
| Apr 5 | Llama 4 Maverick | Meta | 400B / 17B | MoE (128 experts) | Llama 4 Community | Best multilingual MoE performance |
| Apr 5 | Qwen 3 72B | Alibaba | 72B / 72B | Dense | Apache 2.0 | Top dense model on reasoning tasks |
| Apr 8 | Qwen 3 MoE 235B | Alibaba | 235B / 22B | MoE | Apache 2.0 | Near-frontier at low active params |
| Apr 8 | Codestral 2 | Mistral | 22B / 22B | Dense | Apache 2.0 | Code generation, fill-in-the-middle |
| Apr 9 | Gemma 3n | Google | 4B effective / 2B footprint | Dense multimodal | Gemma License | Runs on-device (phone, tablet) |

Benchmark Snapshot

MMLU-Pro scores for the April 2026 releases (reported scores from official model cards and third-party evaluations):

| Model | MMLU-Pro |
|---|---|
| Qwen 3 MoE 235B | 81.5 |
| Qwen 3 72B | 79.8 |
| Llama 4 Maverick | 78.2 |
| Llama 4 Scout | 73.1 |
| Codestral 2 | 69.7 |
| OLMo 2 32B | 65.6 |
| Gemma 3n | 56.4 |

Minimum VRAM requirements at FP16:

| Model | Min VRAM (FP16) |
|---|---|
| Gemma 3n | 4 GB |
| Codestral 2 | 44 GB |
| Qwen 3 72B | 144 GB |

The MoE models (Llama 4 Scout, Maverick, Qwen 3 MoE 235B) are the headline story. They hit near-frontier benchmark scores while keeping active parameter counts low enough for multi-GPU consumer setups. Gemma 3n is the opposite extreme: a 2B-footprint multimodal model designed to run on phones.
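To illustrate why active parameters, not total parameters, drive per-token compute, here is a minimal top-k expert-routing sketch in plain Python. This is a toy illustration of the general MoE routing idea, not the actual router from any of these models:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_tokens(router_logits, top_k=2):
    """Pick the top_k experts for one token, as a typical MoE router does."""
    weights = softmax(router_logits)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:top_k]

# Scout-style setup: 16 experts exist, but only a couple fire per token,
# so per-token FLOPs scale with the active experts, not all 16.
num_experts = 16
logits = [0.1, 2.3, -0.5, 1.7] + [0.0] * (num_experts - 4)
active = route_tokens(logits, top_k=2)
print(active)  # → [1, 3]: the two highest-scoring experts handle this token
```

The rest of the experts sit idle for that token, which is how a 109B-parameter model can cost roughly as much per token as a 17B dense one.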

Open Source Projects: Tools, Frameworks, and Deployment Stacks

The model releases triggered a parallel wave of tooling. These are the open source projects that gained the most traction on GitHub and Hugging Face during April 2026.

| Project | Platform | Stars / Downloads | Language | Purpose |
|---|---|---|---|---|
| google/adk-python | GitHub | 8,200+ stars | Python | Multi-agent orchestration framework |
| meta-llama/llama-stack | GitHub | 6,400+ stars | Python | Unified Llama 4 deployment and inference |
| openai/codex-cli | GitHub | 5,800+ stars | TypeScript | Terminal coding agent with sandboxed execution |
| block/goose | GitHub | 4,900+ stars | Rust | Local-first AI agent with MCP support |
| huggingface/smolagents | GitHub | 4,100+ stars | Python | Lightweight agent library with tool-use |
| microsoft/markitdown | GitHub | 3,600+ stars | Python | Any document to Markdown for LLM ingestion |
| unsloth/unsloth | GitHub | 2,100+ monthly gain | Python | 2x faster fine-tuning, 70% less memory |
| qwen-ai/qwen3-coder | GitHub | 2,800+ stars | Python | Code-specialized Qwen 3 with 128K context |
| mistralai/codestral-2 | GitHub | 2,400+ stars | Python | Apache 2.0 code generation model code |
| Llama-4-Scout-GGUF | Hugging Face | 180K+ downloads | GGUF | Quantized Llama 4 Scout for local inference |

How the Models and Projects Connect

[Diagram: April 2026 model releases feed the open source ecosystem.] The model releases (Llama 4 Scout/Maverick, Qwen 3 72B/MoE 235B, Codestral 2, Gemma 3n, OLMo 2 32B) flow into open source projects: llama-stack for deployment, unsloth for fine-tuning, qwen3-coder as a code agent, adk-python for multi-agent orchestration, codex-cli as a terminal agent, goose as a local-first agent, and smolagents as a tool-use library. Those projects in turn enable the month's major use cases: coding assistants, on-device inference, multi-agent systems, document processing, and fine-tuning/RLHF. Model releases drive tooling adoption; tooling lowers the barrier to using new models.

The pattern is clear: every major model release now ships alongside an official deployment project, and the community responds with quantization packs, fine-tuning integrations, and agent wrappers within days.

Highlights by Category

Best for Coding

Codestral 2 from Mistral is the standout code model this month. It ships under Apache 2.0 (a shift from Mistral's previous restrictive licensing for code models), supports fill-in-the-middle completions, and benchmarks above GPT-4o on HumanEval and MBPP. OpenAI's Codex CLI, while not a model itself, pairs with any API-accessible model to create a sandboxed terminal coding agent.
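Fill-in-the-middle models receive the code before and after the cursor as separate segments rather than a single left-to-right prompt. A sketch of how such a prompt is typically assembled; the `[SUFFIX]`/`[PREFIX]` sentinel strings here are placeholders, since the real tokens are model-specific and should be taken from Codestral 2's model card:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    # Sentinel strings are placeholders; real FIM tokens vary by model.
    # Suffix-first ordering is one common convention for FIM training.
    return f"[SUFFIX]{suffix}[PREFIX]{prefix}"

before = "def add(a, b):\n    return "
after = "\n\nprint(add(2, 3))"
prompt = build_fim_prompt(before, after)
# The model then generates the span between prefix and suffix, e.g. "a + b".
```

Editor integrations do exactly this assembly under the hood, which is why FIM-capable models feel so much better for mid-file completions than plain left-to-right ones.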

Best for Running Locally

Gemma 3n fits in 4 GB of VRAM and handles text, images, and audio. For heavier workloads, Llama 4 Scout's MoE architecture means only 17B parameters are active at inference time despite 109B total, making it feasible on a single 48 GB GPU with quantization. Unsloth's updated fine-tuning support for these models means you can adapt them to your use case without renting a cluster.
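A back-of-the-envelope way to sanity-check these VRAM claims: weight memory is roughly parameters times bits per parameter, divided by eight. Note that for MoE models all experts must be resident in memory even though only a few are active per token, and this estimate ignores KV cache and activations, which add more:

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB: params * bits / 8 bits-per-byte."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Llama 4 Scout: all 109B parameters stay resident despite 17B active.
print(round(weight_gb(109, 16), 1))  # FP16: 218.0 GB — multi-GPU territory
print(round(weight_gb(109, 4), 1))   # 4-bit: 54.5 GB — near a 48 GB card
print(round(weight_gb(2, 16), 1))    # Gemma 3n's 2B footprint at FP16: 4.0 GB
```

This is why quantization, not the MoE architecture alone, is what brings Scout within reach of a single high-end GPU.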

Best for Agent Workflows

Google's Agent Development Kit (adk-python) is the most complete multi-agent framework to ship this month. Block's Goose takes a different approach: local-first, with native MCP (Model Context Protocol) support for tool integration. Hugging Face's smolagents is the lightweight option if you want tool-use without the overhead of a full orchestration framework.

Best for Data and Documents

Microsoft's markitdown converts PDFs, DOCX, PPTX, HTML, and other formats into clean Markdown suitable for LLM context windows. It gained 3,600+ stars in its first week because it solves a problem every RAG pipeline hits: getting messy documents into a format models can actually use well.
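To see the shape of the problem markitdown solves, here is a deliberately tiny HTML-to-Markdown sketch using only the standard library. It handles a three-tag subset; markitdown itself covers PDFs, Office formats, and many edge cases this ignores:

```python
from html.parser import HTMLParser

class TinyMarkdown(HTMLParser):
    """Convert a small HTML subset (h1, li, p) into Markdown lines."""
    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        self._prefix = {"h1": "# ", "li": "- "}.get(tag, "")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append(self._prefix + text)
            self._prefix = ""

def html_to_md(html: str) -> str:
    parser = TinyMarkdown()
    parser.feed(html)
    return "\n".join(parser.lines)

print(html_to_md("<h1>Report</h1><ul><li>One</li><li>Two</li></ul>"))
# → # Report
#   - One
#   - Two
```

Flat, structure-preserving Markdown like this is what retrieval pipelines want in the context window, instead of raw tag soup.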

Quick-Start: Running a New Model Locally

If you want to try the most impactful model from this month right now, here is the fastest path to running Llama 4 Scout locally:

```bash
# Install Ollama if you haven't already
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the quantized Scout model (fits in 24GB VRAM)
ollama pull llama4-scout:q4_K_M

# Run it
ollama run llama4-scout:q4_K_M
```

For Qwen 3 72B using vLLM:

```bash
pip install vllm
vllm serve Qwen/Qwen3-72B-AWQ --tensor-parallel-size 2
```

For Gemma 3n on a laptop:

```bash
pip install mlx-lm
mlx_lm.generate --model google/gemma-3n-E4B --prompt "Explain MoE architectures"
```
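Once a model is running locally, you can call it over plain HTTP from any language. A minimal sketch against Ollama's generate endpoint, assuming its default port of 11434 and the model name from the quick-start above:

```python
import json
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("llama4-scout:q4_K_M", "Explain MoE in one sentence.")
# Uncomment with a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```

The same pattern works against any OpenAI-compatible local server (vLLM, llama-stack, and others expose similar JSON-over-HTTP endpoints, with their own paths and schemas).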

What This Means for Builders

Three trends stand out from the April 2026 releases taken together:

  1. MoE is the default architecture for large models. Four of the seven models this month use mixture-of-experts. The active parameter count matters more than total count for practical deployment.

  2. Apache 2.0 is winning. Qwen 3, Codestral 2, and OLMo 2 all ship under Apache 2.0. Mistral's shift is especially notable since their previous code model had commercial restrictions.

  3. The gap between "release" and "usable" is shrinking. When Llama 4 Scout launched, quantized GGUF packs appeared on Hugging Face within hours and llama-stack provided an official deployment path on day one. The old pattern of waiting weeks for community tooling to catch up is largely gone.

If you are building AI-powered desktop automation workflows and want to test these models locally, Fazm provides a local-first agent framework that works with any of these models through standard API interfaces.

Further Reading

For deeper coverage of specific topics from this month:

  • The EU AI Act's open source exemption guidance (published April 10) clarifies which models qualify for lighter regulatory treatment
  • DeepSeek V3's published paper includes training details that most labs still keep proprietary
  • Google's updated Gemma 3 commercial license (April 11) removes the previous user-count restriction

This post will be updated as additional releases land through the rest of April 2026.
