New AI Model Releases and Open Source Projects in April 2026

Matthew Diakonov · 10 min read

April 2026 has been one of the densest months for AI releases in recent memory. Multiple foundation models shipped from Meta, Alibaba, Google, Mistral, and Ai2, while GitHub and Hugging Face saw a wave of new open source tools, agent frameworks, and deployment stacks. This post covers both sides: the models themselves and the projects built around them.

AI Model Releases: What Shipped in April 2026

Seven major open source models launched in the first nine days of April. The table below captures every production-relevant release with parameters, architecture, and licensing.

| Date | Model | Organization | Parameters (Total / Active) | Architecture | License | Key Strength |
|---|---|---|---|---|---|---|
| Apr 2 | Llama 4 Scout | Meta | 109B / 17B | MoE (16 experts) | Llama 4 Community | 10M token context window |
| Apr 3 | OLMo 2 32B | Ai2 | 32B / 32B | Dense | Apache 2.0 | Fully open training data and code |
| Apr 5 | Llama 4 Maverick | Meta | 400B / 17B | MoE (128 experts) | Llama 4 Community | Best multilingual MoE performance |
| Apr 5 | Qwen 3 72B | Alibaba | 72B / 72B | Dense | Apache 2.0 | Top dense model on reasoning tasks |
| Apr 8 | Qwen 3 MoE 235B | Alibaba | 235B / 22B | MoE | Apache 2.0 | Near-frontier at low active params |
| Apr 8 | Codestral 2 | Mistral | 22B / 22B | Dense | Apache 2.0 | Code generation, fill-in-the-middle |
| Apr 9 | Gemma 3n | Google | 4B effective / 2B footprint | Dense multimodal | Gemma License | Runs on-device (phone, tablet) |

Benchmark Snapshot

MMLU-Pro scores for the April 2026 releases (reported scores from official model cards and third-party evaluations):

| Model | MMLU-Pro |
|---|---|
| Qwen 3 MoE 235B | 81.5 |
| Qwen 3 72B | 79.8 |
| Llama 4 Maverick | 78.2 |
| Llama 4 Scout | 73.1 |
| Codestral 2 | 69.7 |
| OLMo 2 32B | 65.6 |
| Gemma 3n | 56.4 |

Minimum VRAM requirements at FP16:

| Model | Min VRAM (FP16) |
|---|---|
| Gemma 3n | 4 GB |
| Codestral 2 | 44 GB |
| Qwen 3 72B | 144 GB |

The MoE models (Llama 4 Scout, Maverick, Qwen 3 MoE 235B) are the headline story. They hit near-frontier benchmark scores while keeping active parameter counts low enough for multi-GPU consumer setups. Gemma 3n is the opposite extreme: a 2B-footprint multimodal model designed to run on phones.
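To illustrate why active parameters, not total parameters, drive per-token compute, here is a minimal top-k expert-routing sketch in plain Python. This is a toy illustration of the general MoE routing idea, not the actual router from any of these models:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_tokens(router_logits, top_k=2):
    """Pick the top_k experts for one token, as a typical MoE router does."""
    weights = softmax(router_logits)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:top_k]

# Scout-style setup: 16 experts exist, but only a couple fire per token,
# so per-token FLOPs scale with the active experts, not all 16.
num_experts = 16
logits = [0.1, 2.3, -0.5, 1.7] + [0.0] * (num_experts - 4)
active = route_tokens(logits, top_k=2)
print(active)  # → [1, 3]: the two highest-scoring experts handle this token
```

The rest of the experts sit idle for that token, which is how a 109B-parameter model can cost roughly as much per token as a 17B dense one.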

Open Source Projects: Tools, Frameworks, and Deployment Stacks

The model releases triggered a parallel wave of tooling. These are the open source projects that gained the most traction on GitHub and Hugging Face during April 2026.

| Project | Platform | Stars / Downloads | Language | Purpose |
|---|---|---|---|---|
| google/adk-python | GitHub | 8,200+ stars | Python | Multi-agent orchestration framework |
| meta-llama/llama-stack | GitHub | 6,400+ stars | Python | Unified Llama 4 deployment and inference |
| openai/codex-cli | GitHub | 5,800+ stars | TypeScript | Terminal coding agent with sandboxed execution |
| block/goose | GitHub | 4,900+ stars | Rust | Local-first AI agent with MCP support |
| huggingface/smolagents | GitHub | 4,100+ stars | Python | Lightweight agent library with tool-use |
| microsoft/markitdown | GitHub | 3,600+ stars | Python | Any document to Markdown for LLM ingestion |
| unsloth/unsloth | GitHub | 2,100+ monthly gain | Python | 2x faster fine-tuning, 70% less memory |
| qwen-ai/qwen3-coder | GitHub | 2,800+ stars | Python | Code-specialized Qwen 3 with 128K context |
| mistralai/codestral-2 | GitHub | 2,400+ stars | Python | Apache 2.0 code generation model code |
| Llama-4-Scout-GGUF | Hugging Face | 180K+ downloads | GGUF | Quantized Llama 4 Scout for local inference |

How the Models and Projects Connect

[Diagram: April 2026 model releases feed the open source ecosystem.] The model releases (Llama 4 Scout/Maverick, Qwen 3 72B/MoE 235B, Codestral 2, Gemma 3n, OLMo 2 32B) flow into open source projects: llama-stack for deployment, unsloth for fine-tuning, qwen3-coder as a code agent, adk-python for multi-agent orchestration, codex-cli as a terminal agent, goose as a local-first agent, and smolagents as a tool-use library. Those projects in turn enable the month's major use cases: coding assistants, on-device inference, multi-agent systems, document processing, and fine-tuning/RLHF. Model releases drive tooling adoption; tooling lowers the barrier to using new models.

The pattern is clear: every major model release now ships alongside an official deployment project, and the community responds with quantization packs, fine-tuning integrations, and agent wrappers within days.

Highlights by Category

Best for Coding

Codestral 2 from Mistral is the standout code model this month. It ships under Apache 2.0 (a shift from Mistral's previous restrictive licensing for code models), supports fill-in-the-middle completions, and benchmarks above GPT-4o on HumanEval and MBPP. OpenAI's Codex CLI, while not a model itself, pairs with any API-accessible model to create a sandboxed terminal coding agent.
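Fill-in-the-middle models receive the code before and after the cursor as separate segments rather than a single left-to-right prompt. A sketch of how such a prompt is typically assembled; the `[SUFFIX]`/`[PREFIX]` sentinel strings here are placeholders, since the real tokens are model-specific and should be taken from Codestral 2's model card:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    # Sentinel strings are placeholders; real FIM tokens vary by model.
    # Suffix-first ordering is one common convention for FIM training.
    return f"[SUFFIX]{suffix}[PREFIX]{prefix}"

before = "def add(a, b):\n    return "
after = "\n\nprint(add(2, 3))"
prompt = build_fim_prompt(before, after)
# The model then generates the span between prefix and suffix, e.g. "a + b".
```

Editor integrations do exactly this assembly under the hood, which is why FIM-capable models feel so much better for mid-file completions than plain left-to-right ones.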

Best for Running Locally

Gemma 3n fits in 4 GB of VRAM and handles text, images, and audio. For heavier workloads, Llama 4 Scout's MoE architecture means only 17B parameters are active at inference time despite 109B total, making it feasible on a single 48 GB GPU with quantization. Unsloth's updated fine-tuning support for these models means you can adapt them to your use case without renting a cluster.
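A back-of-the-envelope way to sanity-check these VRAM claims: weight memory is roughly parameters times bits per parameter, divided by eight. Note that for MoE models all experts must be resident in memory even though only a few are active per token, and this estimate ignores KV cache and activations, which add more:

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB: params * bits / 8 bits-per-byte."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Llama 4 Scout: all 109B parameters stay resident despite 17B active.
print(round(weight_gb(109, 16), 1))  # FP16: 218.0 GB — multi-GPU territory
print(round(weight_gb(109, 4), 1))   # 4-bit: 54.5 GB — near a 48 GB card
print(round(weight_gb(2, 16), 1))    # Gemma 3n's 2B footprint at FP16: 4.0 GB
```

This is why quantization, not the MoE architecture alone, is what brings Scout within reach of a single high-end GPU.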

Best for Agent Workflows

Google's Agent Development Kit (adk-python) is the most complete multi-agent framework to ship this month. Block's Goose takes a different approach: local-first, with native MCP (Model Context Protocol) support for tool integration. Hugging Face's smolagents is the lightweight option if you want tool-use without the overhead of a full orchestration framework.

Best for Data and Documents

Microsoft's markitdown converts PDFs, DOCX, PPTX, HTML, and other formats into clean Markdown suitable for LLM context windows. It gained 3,600+ stars in its first week because it solves a problem every RAG pipeline hits: getting messy documents into a format models can actually use well.
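To see the shape of the problem markitdown solves, here is a deliberately tiny HTML-to-Markdown sketch using only the standard library. It handles a three-tag subset; markitdown itself covers PDFs, Office formats, and many edge cases this ignores:

```python
from html.parser import HTMLParser

class TinyMarkdown(HTMLParser):
    """Convert a small HTML subset (h1, li, p) into Markdown lines."""
    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        self._prefix = {"h1": "# ", "li": "- "}.get(tag, "")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append(self._prefix + text)
            self._prefix = ""

def html_to_md(html: str) -> str:
    parser = TinyMarkdown()
    parser.feed(html)
    return "\n".join(parser.lines)

print(html_to_md("<h1>Report</h1><ul><li>One</li><li>Two</li></ul>"))
# → # Report
#   - One
#   - Two
```

Flat, structure-preserving Markdown like this is what retrieval pipelines want in the context window, instead of raw tag soup.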

Quick-Start: Running a New Model Locally

If you want to try the most impactful model from this month right now, here is the fastest path to running Llama 4 Scout locally:

```bash
# Install Ollama if you haven't already
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the quantized Scout model (fits in 24GB VRAM)
ollama pull llama4-scout:q4_K_M

# Run it
ollama run llama4-scout:q4_K_M
```

For Qwen 3 72B using vLLM:

```bash
pip install vllm
vllm serve Qwen/Qwen3-72B-AWQ --tensor-parallel-size 2
```

For Gemma 3n on a laptop:

```bash
pip install mlx-lm
mlx_lm.generate --model google/gemma-3n-E4B --prompt "Explain MoE architectures"
```
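Once a model is running locally, you can call it over plain HTTP from any language. A minimal sketch against Ollama's generate endpoint, assuming its default port of 11434 and the model name from the quick-start above:

```python
import json
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("llama4-scout:q4_K_M", "Explain MoE in one sentence.")
# Uncomment with a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```

The same pattern works against any OpenAI-compatible local server (vLLM, llama-stack, and others expose similar JSON-over-HTTP endpoints, with their own paths and schemas).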

What This Means for Builders

Three trends stand out from the April 2026 releases taken together:

  1. MoE is the default architecture for large models. Four of the seven models this month use mixture-of-experts. The active parameter count matters more than total count for practical deployment.

  2. Apache 2.0 is winning. Qwen 3, Codestral 2, and OLMo 2 all ship under Apache 2.0. Mistral's shift is especially notable since their previous code model had commercial restrictions.

  3. The gap between "release" and "usable" is shrinking. When Llama 4 Scout launched, quantized GGUF packs appeared on Hugging Face within hours and llama-stack provided an official deployment path on day one. The old pattern of waiting weeks for community tooling to catch up is largely gone.

If you are building AI-powered desktop automation workflows and want to test these models locally, Fazm provides a local-first agent framework that works with any of these models through standard API interfaces.

Further Reading

For deeper coverage of specific topics from this month:

  • The EU AI Act's open source exemption guidance (published April 10) clarifies which models qualify for lighter regulatory treatment
  • DeepSeek V3's published paper includes training details that most labs still keep proprietary
  • Google's updated Gemma 3 commercial license (April 11) removes the previous user-count restriction

This post will be updated as additional releases land through the rest of April 2026.
