Open-source AI projects, tools, and updates in May 2026: the agent-layer month

If you were watching for headline open-weight frontier models, May 2026 looked thin. Two new permissive releases landed on Hugging Face (Cohere Command A+ on May 20, OpenBMB MiniCPM-V 4.6 on May 11) and the proprietary side took most of the airtime (OpenAI GPT-5.5 Instant on May 5, xAI Grok 4.3 on May 6, Google's Gemini 3.5 Flash at I/O on May 19, Alibaba's closed-weight Qwen3.7-Max on May 20). The April 28 to 29 cluster (MiMo-V2.5-Pro, Nemotron 3 Nano Omni, Granite 4.1, Mistral Medium 3.5) had absorbed the spring release energy.

What actually happened in May, you only see if you read changelogs instead of leaderboards. The Model Context Protocol locked its 2026-07-28 release candidate on May 21. vLLM shipped v0.21.1rc0 on May 15 with the Blackwell backend. MLX Engine v1.8.1 landed on May 13 with parallel predictions for Qwen 3.5 and 3.6 vision and Gemma 4 vision on Apple Silicon. LangChain shipped three minors. Claude Code, Codex CLI, Cline, lm-evaluation-harness, Argmax OSS SDK all moved. And one open-source macOS agent, Fazm, shipped 43 versions in 27 days with 171 individual change strings, every one of them verifiable in CHANGELOG.json on the public repo.

M
Matthew Diakonov
11 min read

Direct answer (verified 2026-05-27)

The May 2026 open-source AI ledger has four layers. Open-weight LLMs: Cohere Command A+ (May 20, Apache 2.0, 218B sparse MoE) and OpenBMB MiniCPM-V 4.6 (May 11, Apache 2.0, 1.3B multimodal). Inference runtimes: vLLM v0.21.1rc0 (May 15) and MLX Engine v1.8.1 (May 13). Agent and IDE tooling: LangChain 1.3.0 / 1.3.1 / 1.3.2, LlamaIndex 0.14.22 (May 14), Claude Code v2.1.152 (May 27), OpenAI Codex CLI 0.129 / 0.130, Cline v3.82 / v3.83. Infrastructure: EleutherAI lm-evaluation-harness v0.4.12, the MCP 2026-07-28 release candidate (locked May 21), Argmax OSS SDK 1.0 (graduates WhisperKit). At the consumer-agent layer, the open-source macOS agent Fazm shipped 43 versions in 27 days with 171 individual changes. Primary source for the Fazm numbers: CHANGELOG.json.

0Fazm releases in May
0Changelog entries
0hAvg release interval
0New open-weight LLMs on HF

The verified May 2026 ledger

Every item below has a primary source: a GitHub release, a vendor blog post, a Hugging Face model card, or a Cargo / PyPI version page. The accent cards are the two items most worth knowing in isolation: Command A+ as the only frontier-tier open-weight release of the month, and the MCP spec RC as the infrastructure event that will shape integration work through the rest of 2026.

Cohere Command A+

Released May 20. 218B sparse MoE (25B active per token), 128K context, 48 languages, Apache 2.0. Cohere's first fully OSS frontier-tier model. BF16, FP8, and W4A4 quantizations shipped on day one, so it runs on as few as two H100s (or a single Blackwell at 4-bit).

OpenBMB MiniCPM-V 4.6

Released May 11. 1.3B parameter multimodal vision-language model (SigLIP2-400M plus Qwen3.5-0.8B), 262K context, accepts text, image, multi-image, and video. Apache 2.0. Built for edge and phone deployment. Public API key shipped May 17.

vLLM v0.21.1rc0

Released May 15. KV offload and hybrid memory allocator, speculative decoding with thinking budgets, TOKENSPEED_MLA Blackwell backend, Cohere MoE and Eagle support, MiMo-V2.5 and Moondream3 model wiring.

MLX Engine v1.8.1

Released May 13. Parallel predictions for Qwen 3.5 and 3.6 vision and Gemma 4 vision on Apple Silicon. Ships inside LM Studio as the on-device runtime.

LangChain 1.3.0, 1.3.1, 1.3.2

Three releases on May 12, May 15, and May 26. Adds version='v3' for stream_events and astream_events. langchain-core 1.4.0 on May 11.

LlamaIndex 0.14.22

Released May 14. Standard bugfix and integration-update release in the steady-state cadence.

Claude Code v2.1.152

Released May 27. /code-review --fix applies findings directly to the working tree. Skills and slash commands can now set disallowed-tools in frontmatter. /reload-skills added.

OpenAI Codex CLI 0.129.0 and 0.130.0

Two May 2026 releases. Vim mode, redesigned resume and fork picker, expanded plugin management, /hooks browser. May 27 update: case-insensitive history search, per-server MCP env plus OAuth, Goal Mode out of experimental.

Cline v3.82.0 and v3.83.0

Released May 2 and May 14. The open-source VS Code agent extension shipping on its usual two-week-ish cadence.

lm-evaluation-harness v0.4.12

EleutherAI's eval framework. Adds TensorRT-LLM, Megatron-LM, Intel Gaudi, and LiteLLM backends, plus native tensor parallel for the HF backend. Tagged mid-May.

MCP 2026-07-28 release candidate

Spec RC locked May 21. Stateless protocol core, Extensions framework, Tasks, MCP Apps, OAuth and OIDC-aligned authorization, formal deprecation policy. Streamable HTTP now requires Mcp-Method and Mcp-Name headers. List and read results carry ttlMs and cacheScope.

Argmax OSS SDK 1.0.0

Graduated from WhisperKit in May. Introduces WhisperKit Local Server (an OpenAI-compatible Vapor HTTP server) and Swift 6 concurrency. The voice layer of the open-source Mac stack.

The agent-layer story, told through one production changelog

Aggregators do not publish a dated index of open-source AI projects. Hugging Face surfaces models by trending score, not by release date on the project itself. GitHub Trending is a rolling popularity window. The only place you can read "here is what the project shipped this month, and when" for an actively maintained open-source AI tool is its own changelog file, on its own repo, stamped with real dates by the people running the builds.

For an honest read of what one open-source AI project actually shipped in May 2026, here is the Fazm timeline in seven beats. Every version number resolves to a real object in CHANGELOG.json on the public repo. The cadence (15.1 hours average between releases) is not aspirational; it is what shows up when you sort the file by date.

1

May 1 to 4: foundation work, v2.7.1 to v2.8.0

v2.7.1 (May 1) was the largest single release of the month with 29 changes, landing the experimental Codex backend (Settings, Advanced, Codex Backend) that routes GPT-5 family models through OpenAI's Codex CLI using a personal ChatGPT account. v2.8.0 (May 4) removed the 60-second auto-cancel that was killing slow Opus and extended-thinking responses mid-think.

2

May 5 to 11: pop-out chat stability, v2.8.1 to v2.9.6

Nine releases focused on the multi-window failure modes: cross-scope contamination between detached pop-out chats, recovery loops cascading into 2 to 3 fresh sessions per request, Sparkle framework crash on universal-DMG installs that broke first-open. v2.9.0 (May 7) cut per-token re-renders during streaming. v2.9.5 (May 11) invalidated every active session on a rate-limit hit so a single 429 stopped wedging unrelated windows.

3

May 12 to 19: streaming and credit-handling, v2.9.8 to v2.9.25

Streaming-typewriter input lag traced to debug log calls writing to disk and Sentry on every keystroke (v2.9.17). Push-to-talk hallucinating gibberish on silent audio (v2.9.25). Built-in API key spend capped at $10 lifetime for every account including Pro after a few users went over (v2.9.25). 1M-context Sonnet and Opus variants hidden in built-in mode so users without entitlement could not pick them by accident.

4

May 20 to 22: Gemini backend, v2.9.26 to v2.9.36

v2.9.30 (May 20) shipped a phone web chat at chat.fazm.ai with new chat, pop-out, model picker, workspace selector, and a stream-target switcher. v2.9.33 (May 21) added Assrt QA testing MCP as an opt-in browser automation tool that imports your real browser sessions. v2.9.35 (May 22) added Gemini Flash and Gemini Pro to the AI picker alongside Claude and ChatGPT, with MCP tool support so the Gemini path could call capture_screenshot, browser, WhatsApp, and the rest. v2.9.36 made Gemini a free fallback when built-in credits run out.

5

May 25: agent-loop guardrails, v2.9.37

Nine changes the agent loop itself. Destructive-shell guardrail (warns before disabling Wi-Fi, killing system UI, rebooting). Consistent password handling that never echoes, logs, or stores user-provided secrets. Post-interrupt context expanded to 40 messages plus the streamed-so-far partial plus a list of tools that just ran. Assrt MCP reframed as a general-purpose visible-Chrome browser agent with a persistent profile so logins survive between sessions. Koah live contextual sponsored content for the free tier, paid stays at zero ads, on-device redaction of emails, phone numbers, tokens, and UUIDs before any data leaves the Mac.

6

May 27: the consolidation wave, v2.9.41 to v2.9.46

v2.9.41 alone had 12 changes: floating bar guaranteed to appear after launch-at-login on reboot, verification-codes skill that pulls 2FA from Messages or WhatsApp or Notification Center, removed the 10-minute inactivity cap on AI replies for slow models like Gemini Pro and GPT-5 high, signin-optional anonymous Firebase plumbing as a PostHog experiment. v2.9.43 to v2.9.46 hammered on chat scroll detaching mid-stream, CPU spikes during long streaming, force-stop affordances for wedged Playwright MCP subprocesses, sponsored-ads gated behind a remote feature flag and not shown to anyone by default.

7

May 28: CPU profiling lands, v2.9.47

Three changes that close out the month. The first two fix an idle chat-window layout storm and the four withAnimation(.repeatForever) animation patterns (update-available pulse, update-button pulse, two more) that pegged the main thread. The third adds automatic stack sampling, when the top thread sustains over 85 percent CPU for a minute the app shells out to /usr/bin/sample and writes a trace, so the next regression of this shape produces its own evidence file.

171

One open-source macOS agent emitted 171 individual change strings across 43 dated releases in May 2026. The whole project, 27 days of working code, with the receipts in one JSON file on a public repo.

CHANGELOG.json, github.com/mediar-ai/fazm

Why the calendar shape mattered

There is a temptation to read the thin model-weight calendar as "open-source AI slowed down in May." The changelogs disagree. What slowed down was the public lab cadence of dropping new frontier weights into a Hugging Face repo. What sped up was everything that sits between those weights and the user: the inference servers that host them (vLLM, MLX), the protocols that shape integration (the MCP RC), the IDE agents that drive them (Claude Code, Codex CLI, Cline), the eval framework that scores them (lm-evaluation-harness), and the consumer-agent layer that assembles the full loop into a product (Fazm).

That order matters. The April 28 to 29 cluster put a wall of new weights into the ecosystem inside 48 hours. The May work is what it looks like when the rest of the stack catches up: vLLM wires the new MoE shapes, MLX adds parallel predictions for the vision variants, LangChain ships a new stream_events version, the MCP spec moves from ad hoc transports to a stateless core, the consumer agent figures out which backend to default to when built-in credits run out and how to keep a 40-message context window after the user hits stop.

If you are evaluating an open-source AI tool in 2026, the model-layer headline is the cheapest signal. The changelog is the honest one. A project that ships once a month with marketing-heavy release notes is not the same product as one that ships every 15 hours with the actual diff in a JSON file on the repo.

The full integration grid

Every project on this card links to its primary changelog or release page. Use this as a reading list for the May 2026 work itself, not a substitute for it.

How to verify any of this yourself

For the Fazm numbers: clone github.com/mediar-ai/fazm, open CHANGELOG.json at the repo root. The file is a JSON object with two top-level keys, unreleased and releases. The releases array is ordered newest to oldest. Each entry has version, date, and changes. Filter the array on date starting with "2026-05" and count the entries (43) and the sum of the changes-array lengths (171). First May entry is v2.7.1 dated 2026-05-01, last is v2.9.47 dated 2026-05-28.

For Cohere Command A+: the model card and weights live at huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4, with the Apache 2.0 license confirmed inline and the May 20 ship date in the model card and Cohere's announcement post at cohere.com/blog.

For the MCP 2026-07-28 release candidate: the spec lock and the changes are written up at blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/. The spec itself is published at modelcontextprotocol.io.

For the IDE agents (Claude Code, Codex CLI, Cline): each project publishes its own changelog. Claude Code at code.claude.com/docs/en/changelog, Codex CLI at developers.openai.com/codex/changelog, Cline as GitHub releases on github.com/cline/cline/releases. None of them require you to take a third-party roundup's word for the dates.

Frequently asked questions

What open-source AI projects and tools actually updated in May 2026?

The verifiable May 2026 ledger has four layers. Open-weight LLMs on Hugging Face: Cohere Command A+ on May 20 (Apache 2.0, 218B sparse MoE, 128K context) and OpenBMB MiniCPM-V 4.6 on May 11 (Apache 2.0, 1.3B multimodal, 262K context). Inference runtimes: vLLM v0.21.1rc0 on May 15 (KV offload, Blackwell backend) and MLX Engine v1.8.1 on May 13 (Apple Silicon parallel predictions). Agent and IDE tooling: LangChain 1.3.0/1.3.1/1.3.2 across the month, LlamaIndex 0.14.22 on May 14, Claude Code v2.1.152 on May 27 (/code-review --fix), OpenAI Codex CLI 0.129 and 0.130 (vim mode, Goal Mode), Cline v3.82 (May 2) and v3.83 (May 14). Infrastructure: EleutherAI lm-evaluation-harness v0.4.12, the MCP 2026-07-28 spec release candidate locked on May 21, Argmax OSS SDK 1.0 graduating from WhisperKit. And at the consumer-agent layer, the open-source macOS agent Fazm shipped 43 versions between May 1 and May 28, every release stamped in CHANGELOG.json at github.com/mediar-ai/fazm.

Why does May 2026 look thin compared to April for open-weight models?

Two reasons. First, the late-April week of April 28 to 29 absorbed most of the spring release energy: Xiaomi MiMo-V2.5-Pro, NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1, and Mistral Medium 3.5 all shipped within 48 hours. That is a hard release pattern to follow inside the same calendar month. Second, the proprietary side took the May 5 to May 6 window (OpenAI GPT-5.5 Instant on May 5, xAI Grok 4.3 on May 6) and Google opened I/O 2026 with Gemini 3.5 Flash on May 19, which soaked up benchmark airtime. Two new permissive open-weight releases (Cohere Command A+ and MiniCPM-V 4.6) is honest for a month following four frontier-class April drops.

Is Cohere Command A+ a serious open-weight release?

Yes. It is the most credible May 2026 Hugging Face release for the agent planning loop, where MiniCPM-V 4.6 1.3B is not. Command A+ is a 218B parameter sparse Mixture-of-Experts with 25B active per token, 128K input context, under a clean Apache 2.0 license. Cohere shipped it on Hugging Face in BF16, FP8, and W4A4 quantizations on day one, so it can run on as few as two H100s (or a single Blackwell at 4-bit). The Apache 2.0 license means there is no field-of-use clause to clear before using it in a small business workflow. It is Cohere's first fully OSS model.

What did the MCP 2026-07-28 release candidate change?

Five things worth tracking. (1) A stateless protocol core, so a server no longer has to hold per-client state to be spec-compliant. (2) An Extensions framework that lets a server advertise capabilities outside the core surface without forking the spec. (3) Tasks, a first-class long-running operation abstraction with explicit progress events. (4) MCP Apps, a packaging shape for distributing a server plus its UI hints together. (5) OAuth and OIDC-aligned authorization, replacing the looser shapes the spec previously left to implementers. The streamable-HTTP transport now requires Mcp-Method and Mcp-Name headers, and list and read results carry ttlMs and cacheScope fields so clients can cache deterministically. The RC was locked May 21, 2026 with a final release targeted for July 28.

What was the largest single open-source agent release of the month?

Fazm v2.7.1 on May 1, with 29 changes in a single CHANGELOG.json object. The headline items were the experimental Codex backend (Settings, Advanced, Codex Backend) that routes GPT-5 family models through OpenAI's Codex CLI using a personal ChatGPT account; Heal cards on the Discovered Tasks surface that flag Mac health symptoms like kernel panics; the switch to a flat $9.99 per month subscription with no free trial and paywall at launch; the sticky stack of recent user messages at the top of every chat window so prompts stay visible while you scroll deeper; and the fix for orphan ACP bridge processes accumulating across crashes (each one had been holding ~600MB of RAM). v2.9.41 on May 27 came close with 12 changes including the verification-codes skill that pulls 2FA codes from Messages, WhatsApp, or Notification Center instead of asking the user to read them out.

How many releases did Fazm actually ship in May 2026?

43 releases between v2.7.1 (May 1) and v2.9.47 (May 28), 171 individual change strings in the changes arrays, averaging 15.1 hours between releases over the 27-day window. The cadence was not uniform: the largest single day was May 27 with six releases (v2.9.41 through v2.9.46) consolidating chat-window scroll and CPU fixes. The full record is in CHANGELOG.json at the root of the repo, mirrored at github.com/mediar-ai/fazm/blob/main/CHANGELOG.json. Each release object has version, date, and changes fields; you can grep '2026-05' on the file and count the matches yourself.

What is the practical pattern in the May 2026 changelogs?

Three patterns that show up across the projects, not just Fazm. First, multi-backend swappability is now table-stakes for an agent: Fazm added Gemini Pro and Flash alongside Claude and ChatGPT in v2.9.35, the Codex CLI shipped per-server MCP env plus OAuth, Claude Code shipped /code-review --fix that operates on whatever the active model is. Second, the MCP layer is being treated as the integration substrate rather than a competing tool layer; the spec RC, the Codex per-server MCP work, the Fazm Assrt MCP for browser automation, the lm-eval-harness LiteLLM backend, all point at the same convergence. Third, the boring stability work (scroll handling, CPU profiling, recovery loops, paywall enforcement) is what occupies most of the surface area of a real production agent's monthly diff. The model-weight headlines drive the discovery; the changelogs are where the product actually gets useful.

Where do I point ANTHROPIC_BASE_URL to use Command A+ or MiniCPM-V 4.6 from a Mac agent?

Stand up an Anthropic-compatible proxy in front of the actual inference server. LiteLLM is the most common bridge for Command A+, MiMo-V2.5-Pro, Granite 4.1, and Mistral Medium 3.5; it has shipped Anthropic API translation for several of these out of the box. vLLM 0.21.x ships an OpenAI-compatible server and pairs with LiteLLM to expose an Anthropic surface. For MiniCPM-V 4.6 the easiest path is the OpenBMB serving repo plus a LiteLLM bridge. Once the proxy URL is set, point a wrapper agent's custom API endpoint at it. In Fazm specifically that is a single Settings toggle (Settings, Advanced, AI Chat, Custom API Endpoint) that is read into env['ANTHROPIC_BASE_URL'] on the spawned ACP subprocess. Same pattern works for any agent that respects the env var.

Was there a new open-source voice or speech project in May 2026?

The notable open-source voice release was Argmax OSS SDK 1.0, which graduates the WhisperKit codebase into a broader SDK with WhisperKit Local Server (an OpenAI-compatible Vapor HTTP server) and Swift 6 concurrency. No new Moshi or Kyutai release shipped in May based on the primary sources we checked. On the consumer side, Fazm's voice work this month landed inside push-to-talk: v2.9.16 fixed the mic and PTT button needing two clicks in unfocused pop-out windows, v2.9.25 fixed PTT inserting repeated gibberish (such as 'भाई भाई भाई') when transcription hallucinated on silent audio, and v2.9.25 also fixed PTT transcription concatenating onto previously spoken text so each voice message starts fresh.

Where can I verify the Fazm CHANGELOG.json numbers myself?

Three steps. (1) Open https://github.com/mediar-ai/fazm/blob/main/CHANGELOG.json in a browser, or clone the repo and open CHANGELOG.json at the root. (2) The file is a JSON object with two keys, unreleased and releases. The releases array is ordered newest to oldest; each entry has version, date, and changes. (3) Filter the array on date starting with '2026-05' and count the entries (43) and the total length of the changes arrays (171). The first May entry is v2.7.1 dated 2026-05-01, the last is v2.9.47 dated 2026-05-28. Versions v2.9.7, v2.9.11, v2.9.13, v2.9.14, v2.9.23, v2.9.24, v2.9.27, v2.9.28, v2.9.29, v2.9.38, v2.9.39, and v2.9.40 do not appear as separate dated entries; the work landed batched into adjacent versions.

Want to talk through the May 2026 stack for your team?

20 minutes with Matt, the maintainer of Fazm. We can map the open-source pieces (Command A+, MiniCPM-V, the MCP RC, vLLM, MLX, the IDE agents) to whatever you are actually trying to build.

How did this page land for you?

React to reveal totals

Comments ()

Leave a comment to see what others are saying.

Public and anonymous. No signup.