Open Source AI Projects and Tools Updates: April 11-12, 2026

Matthew Diakonov · 14 min read

April 11-12 delivered a dense wave of open source AI activity. MiniMax open sourced its self-evolving M2.7 agent model. OpenClaw shipped Active Memory and native Codex integration. OpenAI's Codex CLI got voice, MCP, and remote workflow upgrades. Google's Gemma 4 GGUF quantizations received critical fixes. And the claude-mem persistent memory plugin crossed 46,000 GitHub stars. This post covers every significant open source AI project update from those two days.

What Shipped: April 11-12 at a Glance

| Project | Date | Category | License | Key Update |
|---|---|---|---|---|
| MiniMax M2.7 | April 12 | Agent model | Open weights | 230B MoE, 56% SWE-Pro, self-evolving training |
| OpenClaw 2026.4.10 | April 11 | AI coding harness | Open source | Active Memory plugin, native Codex integration |
| OpenAI Codex CLI | April 11 | AI coding agent | Open source | Realtime voice, MCP upgrades, remote workflows |
| Gemma 4 GGUF | April 11 | Model quantization | Apache 2.0 | Chat template fix, llama.cpp compatibility |
| claude-mem | April 11 | Developer tools | Open source | v11.0 sync CLI, 46K GitHub stars |
| Archon | April 11 | AI coding workflows | Open source | 14K+ stars, YAML deterministic pipelines |

MiniMax M2.7: The Self-Evolving Agent Model

MiniMax open sourced M2.7 on April 12, a 230B-parameter sparse mixture-of-experts model that participated in its own training process. This is not just another large model release: M2.7 is the first open source model to have helped design its own reinforcement learning experiments.

Benchmark Performance

| Benchmark | M2.7 Score | Comparison |
|---|---|---|
| SWE-Pro | 56.22% | Approaches Claude Opus levels |
| VIBE-Pro | 55.6% | End-to-end project delivery |
| Terminal Bench 2 | 57.0% | Complex engineering systems |

The MoE architecture keeps inference costs low. During any given forward pass, only a fraction of the 230B parameters activate, so the model runs at a fraction of the cost of a dense model with comparable capability.
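To make that concrete, here is a back-of-the-envelope sketch of sparse activation. The shared-parameter size and expert counts below are illustrative assumptions, not MiniMax's published M2.7 architecture:

```python
# Estimate how many parameters a sparse MoE model touches per token.
# Hypothetical split: 10B shared (attention/embeddings), the remaining
# 220B divided across 64 experts with 8 routed per token.

def active_params(total_b: float, shared_b: float,
                  experts: int, active_experts: int) -> float:
    """Billions of parameters used in a single forward pass."""
    per_expert_b = (total_b - shared_b) / experts
    return shared_b + per_expert_b * active_experts

active = active_params(230.0, 10.0, experts=64, active_experts=8)
print(f"~{active:.1f}B of 230B params active per token ({active / 230.0:.0%})")
```

Under these assumed numbers, each token touches only about a sixth of the weights, which is where the inference-cost savings come from; the full 230B must still be resident in memory.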

# Pull MiniMax M2.7 from Ollama
ollama pull minimax-m2.7

# Or download from Hugging Face
huggingface-cli download MiniMaxAI/MiniMax-M2.7 \
  --local-dir ./models/minimax-m2.7

# Serve via vLLM
python -m vllm.entrypoints.openai.api_server \
  --model MiniMaxAI/MiniMax-M2.7 \
  --tensor-parallel-size 4

The self-evolution capability is the most interesting aspect. During M2.7's training, the model updated its own memory and built dozens of complex skills in its harness to help with reinforcement learning experiments. MiniMax reports a 30% performance improvement on internal evaluation sets from this approach.

Hardware note

M2.7 is a 230B MoE model. Even with sparse activation, you need multi-GPU setups for local inference. For single-GPU users, the Ollama and OpenRouter hosted options are the fastest way to test it.

OpenClaw 2026.4.10: Active Memory and Native Codex

OpenClaw shipped version 2026.4.10 on April 11, a heavyweight release with 17 new features and 20+ fixes. Two features stand out: Active Memory and native Codex integration.

Active Memory Plugin

The new Active Memory plugin gives OpenClaw a dedicated memory sub-agent that runs before every reply. Instead of requiring users to say "remember this" or "search memory," the plugin automatically pulls in relevant preferences, context, and past details from ongoing conversations.

Configuration options include:

  • Message mode for recent context injection
  • Full context mode for deep retrieval across all sessions
  • Live /verbose inspection for debugging what the memory agent retrieves
  • Opt-in transcript persistence for auditing memory decisions
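The "retrieve before every reply" pattern above can be sketched in a few lines. The class, the naive word-overlap scoring, and the prompt format below are illustrative assumptions, not OpenClaw's actual plugin API:

```python
# Toy sketch of a memory sub-agent that runs before every reply:
# rank stored notes against the incoming prompt, then inject the top
# matches ahead of the user message. Scoring here is naive word overlap.

from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    notes: list[str] = field(default_factory=list)

    def retrieve(self, prompt: str, k: int = 2) -> list[str]:
        """Return the k notes sharing the most words with the prompt."""
        words = set(prompt.lower().split())
        scored = sorted(self.notes,
                        key=lambda n: len(words & set(n.lower().split())),
                        reverse=True)
        return scored[:k]

def build_context(store: MemoryStore, prompt: str) -> str:
    """Prepend retrieved memories to the user prompt."""
    preamble = "\n".join(f"[memory] {m}" for m in store.retrieve(prompt))
    return f"{preamble}\n[user] {prompt}"

store = MemoryStore(notes=["user prefers tabs over spaces",
                           "project uses pnpm, not npm",
                           "user timezone is UTC+2"])
print(build_context(store, "set up the npm install step"))
```

A real implementation would use embedding search rather than word overlap, but the shape is the same: retrieval happens automatically on every turn, with no "remember this" command from the user.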

Native Codex Integration

Previously, Codex models in OpenClaw reused the generic OpenAI provider path. Version 2026.4.10 adds a dedicated Codex provider with its own authentication, session thread management, and context compaction capabilities. The codex/gpt-* models now use Codex-specific paths while openai/gpt-* retains the original provider.
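The routing rule amounts to dispatching on the model ID prefix. This sketch (function name hypothetical) shows the idea:

```python
# Prefix-based provider routing, as described for OpenClaw 2026.4.10:
# codex/gpt-* goes to the dedicated Codex provider, openai/gpt-* keeps
# the original generic OpenAI path.

def resolve_provider(model_id: str) -> str:
    prefix, _, _model = model_id.partition("/")
    providers = {
        "codex": "codex",    # dedicated auth, session threads, compaction
        "openai": "openai",  # generic OpenAI provider path
    }
    try:
        return providers[prefix]
    except KeyError:
        raise ValueError(f"unknown provider prefix: {prefix!r}")

print(resolve_provider("codex/gpt-model"))   # codex
print(resolve_provider("openai/gpt-model"))  # openai
```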

# Update OpenClaw
npm install -g openclaw@2026.4.10

# Enable Active Memory plugin
openclaw config set plugins.active-memory.enabled true

# Configure Codex provider
openclaw config set providers.codex.auth oauth

Other notable additions in 2026.4.10: local MLX voice support for macOS Talk Mode, SSRF hardening, launchd stability fixes, and Teams integration with pins, reactions, and read actions.

OpenAI Codex CLI: Voice, MCP, and Remote Workflows

OpenAI pushed a major Codex CLI update on April 11, touching nearly every subsystem: realtime voice, MCP support, remote workflows, and internal architecture.

What Changed

Realtime voice now defaults to the v2 WebRTC path. You can configure transport, select voices, and use native TUI media support. This means you can talk to Codex while coding, hands-free.

MCP support gained richer capabilities: resource reads, tool-call metadata, custom-server tool search, server-driven elicitations, file-parameter uploads, and more reliable plugin cache refreshes. If you run MCP servers alongside Codex, the integration is significantly smoother.

Remote workflows now support egress websocket transport, remote --cd forwarding, and sandbox-aware filesystem APIs. The experimental codex exec-server subcommand opens the door for orchestrating Codex from other tools.

Architecture Improvements

The codex-core crate was slimmed down through major extractions: MCP, tools, config, model management, auth, feedback, and protocol each became separate crates. This reduced compile times by removing expensive async-trait expansion from hot code paths.

# Update Codex CLI
npm install -g @openai/codex@latest

# Start a voice session
codex --voice

# Connect to a remote workspace
codex --remote wss://your-server.example.com

Gemma 4 GGUF: Community Quantization Fixes

Google's Gemma 4 (released April 2, Apache 2.0) got a critical community update on April 11: re-downloaded GGUF files with Google's latest chat template and llama.cpp compatibility fixes.

Available Quantizations

| Variant | Size | VRAM | Best For |
|---|---|---|---|
| Gemma 4 E2B | ~1.5GB | 2GB | Phones, Raspberry Pi, edge devices |
| Gemma 4 E4B | ~3GB | 4GB | Laptops, lightweight tasks |
| Gemma 4 26B-A4B (Q4_K_M) | ~15GB | 16GB | Development workstations |
| Gemma 4 31B Dense (Q4_K_M) | ~18GB | 20GB | High-capability local inference |

The April 11 fix addressed a chat template mismatch that caused incorrect output formatting with llama.cpp and Ollama. If you downloaded Gemma 4 GGUF files before April 11, re-download for the corrected versions.
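A chat template mismatch means the runtime renders messages with the wrong turn markers, so the model sees a prompt format it was never trained on. The markers below follow earlier Gemma releases and are illustrative, not Gemma 4's exact template:

```python
# What a chat template actually does: turn a message list into the
# marker-delimited prompt string the model expects. Turn markers here
# mirror earlier Gemma releases (<start_of_turn>/<end_of_turn>) and are
# illustrative only.

def render_gemma_style(messages: list[dict]) -> str:
    out = []
    for m in messages:
        # Gemma-style templates label the assistant role "model".
        role = "model" if m["role"] == "assistant" else m["role"]
        out.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>")
    out.append("<start_of_turn>model\n")  # cue the model to reply
    return "\n".join(out)

msgs = [{"role": "user", "content": "Hello!"}]
print(render_gemma_style(msgs))
```

If the GGUF ships the wrong template string, every runtime that honors it (llama.cpp, Ollama) emits malformed prompts, which is why the fix required re-downloading the files rather than updating the runtime.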

# Pull corrected Gemma 4 26B via Ollama
ollama pull gemma4:26b

# Or download GGUF from Hugging Face (Unsloth)
huggingface-cli download unsloth/gemma-4-26B-A4B-it-GGUF \
  --include "gemma-4-26B-A4B-it-Q4_K_M.gguf" \
  --local-dir ./models/gemma4

# Fine-tune with Unsloth (QLoRA, ~60% VRAM savings)
# See: https://unsloth.ai/docs/models/gemma-4/train

Gemma 4's performance jump over Gemma 3 is dramatic: AIME 2026 math went from 20.8% to 89.2%, LiveCodeBench coding from 29.1% to 80.0%, and GPQA science from 42.4% to 84.3%. With the April 11 GGUF fixes, these capabilities are now reliably accessible through local inference.

claude-mem: Persistent Memory Hits 46K Stars

The claude-mem plugin for Claude Code crossed 46,100 GitHub stars, solidifying its position as the leading persistent memory solution for AI coding agents. Recent releases (v10.7.0 through v11.0.0, shipped April 4) added a new claude-mem-sync CLI.

The sync CLI supports:

  • push / pull / sync / status commands
  • Bidirectional sync of observations and session summaries between machines via SSH/SCP
  • Automatic deduplication to prevent memory bloat

Five lifecycle hooks (SessionStart, UserPromptSubmit, PostToolUse, Stop, SessionEnd) capture what Claude does without manual intervention. Everything goes into a local SQLite database with Chroma vector search for retrieval.
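The deduplication step can be sketched as hashing each observation before insert. The schema and function name here are hypothetical, not claude-mem's actual implementation:

```python
# Hash-based deduplication before persisting observations to SQLite,
# in the spirit of claude-mem's "automatic deduplication". A PRIMARY KEY
# on the content hash makes repeat inserts no-ops.

import hashlib
import sqlite3

def save_observation(db: sqlite3.Connection, text: str) -> bool:
    """Insert an observation unless an identical one is already stored."""
    digest = hashlib.sha256(text.encode()).hexdigest()
    cur = db.execute(
        "INSERT OR IGNORE INTO observations (hash, body) VALUES (?, ?)",
        (digest, text),
    )
    db.commit()
    return cur.rowcount == 1  # True only if a row was actually inserted

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE observations (hash TEXT PRIMARY KEY, body TEXT)")
print(save_observation(db, "ran pytest: 42 passed"))  # True
print(save_observation(db, "ran pytest: 42 passed"))  # False (duplicate)
```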

# Install claude-mem
claude plugins install claude-mem

# Sync memory between machines
claude-mem-sync push --target user@remote:~/.claude-mem
claude-mem-sync pull --source user@remote:~/.claude-mem
claude-mem-sync status

For developers working across multiple machines, the sync CLI eliminates fragmented memory across development environments.

How the April 11-12 Stack Connects

Open Source AI Ecosystem, April 11-12, 2026 (diagram summary):

  • Memory: claude-mem v11.0 (46K stars, sync CLI); OpenClaw Active Memory (auto context retrieval)
  • Harnesses: OpenClaw 2026.4.10 (Codex native, MLX voice); Archon (14K+ stars, YAML flows); Codex CLI (voice, MCP, remote)
  • Agents: Claude Code, Codex, Gemini CLI, Goose
  • Inference: llama.cpp/GGUF, vLLM/SGLang, Ollama
  • Models: MiniMax M2.7 (230B MoE, self-evolving); Gemma 4 (Apache 2.0, GGUF update); GLM-5.1 (MIT, community quants)
  • Protocol: MCP under Linux Foundation AAIF governance (10,000+ servers, 97M installs, vendor-neutral)

The diagram shows the layered relationships. Memory systems (claude-mem, OpenClaw Active Memory) feed into harnesses (OpenClaw, Archon, Codex CLI), which orchestrate agents (Claude Code, Codex, Gemini CLI, Goose). Agents use the inference layer (llama.cpp, vLLM, Ollama) to access models (M2.7, Gemma 4, GLM-5.1). MCP runs underneath everything as the protocol connecting agents to external tools and data.

Common Pitfalls

  • Running M2.7 without enough GPUs. The MoE architecture is efficient during inference, but you still need the full model loaded in memory. A 4-GPU setup is the minimum for local serving. Use Ollama or OpenRouter hosted options for quick testing.

  • Not re-downloading Gemma 4 GGUF files. The April 11 fix addresses a real chat template bug. If your Gemma 4 outputs look malformed in llama.cpp or Ollama, this is likely the cause. Re-pull to get the corrected files.

  • Enabling OpenClaw Active Memory without understanding context costs. Active Memory adds a sub-agent call before every reply. In full context mode, this can significantly increase token usage and latency. Start with message mode and monitor costs before upgrading.

  • Assuming Codex voice works on all platforms. The realtime voice v2 WebRTC path has platform-specific requirements. Test on your target OS before relying on it for hands-free coding workflows.

  • Treating M2.7's self-evolution claims as general capability. The model participated in its own RL training, which is impressive for the training pipeline. But the deployed model is frozen. It does not continue learning during inference.

Quickstart: Testing the April 11-12 Releases

# Option 1: MiniMax M2.7 via Ollama (needs multi-GPU)
ollama pull minimax-m2.7
ollama run minimax-m2.7 "Explain the MoE architecture"

# Option 2: Gemma 4 26B locally (needs 16GB+ VRAM)
ollama pull gemma4:26b
ollama run gemma4:26b "What changed in your April 11 GGUF release?"

# Option 3: OpenClaw with Active Memory (any machine)
npm install -g openclaw@2026.4.10
openclaw config set plugins.active-memory.enabled true

# Option 4: Codex CLI with voice (any machine)
npm install -g @openai/codex@latest
codex --voice

# Option 5: claude-mem for Claude Code (any machine)
claude plugins install claude-mem

Fastest path

If you want to try one thing from April 11-12 right now, update your Codex CLI. The MCP improvements and voice mode work on any machine, require no model downloads, and the update takes under a minute. For local model testing, Gemma 4 26B via Ollama is the best ratio of capability to hardware requirements.

What to Watch After April 12

  1. M2.7 community quantization. The 230B MoE model will get GGUF and other quantized formats from Unsloth and community teams. This will determine whether M2.7 becomes practical outside multi-GPU setups.
  2. OpenClaw Active Memory patterns. Early adopters are experimenting with memory modes and prompt overrides. Watch the OpenClaw community for best practices on balancing retrieval quality against token costs.
  3. Codex voice in production workflows. The v2 WebRTC voice path is new. Real-world usage will surface edge cases around latency, transcription accuracy, and hands-free coding ergonomics.
  4. Gemma 4 fine-tuning ecosystem. With GGUF fixes landed and Unsloth support live, the next wave is domain-specific fine-tunes. Expect to see specialized Gemma 4 variants for coding, medical, and legal use cases.
  5. Memory convergence. Both claude-mem and OpenClaw Active Memory point toward the same future: AI coding agents that remember everything across sessions. Watch for standardization around memory formats and sync protocols.

Wrapping Up

April 11-12, 2026 advanced the open source AI stack on multiple fronts simultaneously. MiniMax M2.7 pushed the boundary of what open models can do with self-evolving training at the 230B MoE scale. OpenClaw and Codex CLI made their harnesses smarter with automatic memory, native integrations, and voice control. Gemma 4's community fixed a real deployment bug that affected everyone running GGUF locally. And claude-mem demonstrated that persistent memory is now table stakes for AI coding tools. The consistent theme across all five updates: the ecosystem is shifting from "can I run this model" to "how well does my entire toolchain remember, adapt, and integrate."

Fazm is an open source macOS AI agent that works with MCP extensions and the open source models covered in this roundup. Open source on GitHub.
