Open source LLM releases in May 2026: one new model so far, four April tail releases, and the Swift block that decides which of them drive your Mac
The actual May 2026 ship calendar, the late-April cluster everyone is still benchmarking against, and the three-line block at lines 522 to 525 of Fazm's open ACPBridge.swift that turns any of these weights into something that can drive your real macOS apps. Verified May 13, 2026.
May 2026 open-weight LLM releases so far
One major new open-weight LLM has shipped between May 1 and May 13, 2026: OpenBMB MiniCPM-V 4.6 1.3B, on May 11, under Apache 2.0. The conversation in the room is still being carried by four late-April releases that landed within 48 hours of each other on April 28 and 29 (MiMo-V2.5-Pro, Nemotron 3 Nano Omni, Granite 4.1, Mistral Medium 3.5). The combined matrix is below.
| Model | Date | License | Params | Context |
|---|---|---|---|---|
| MiniCPM-V 4.6 1.3B | May 11, 2026 | Apache 2.0 | 1.3B | 262K |
| MiMo-V2.5-Pro | Apr 28, 2026 | MIT | 1.02T / 42B active | 256K |
| Nemotron 3 Nano Omni | Apr 28, 2026 | NVIDIA Open Model | 30B / 3B active | 128K |
| IBM Granite 4.1 | Apr 29, 2026 | Apache 2.0 | 3B / 8B / 30B dense | up to 512K |
| Mistral Medium 3.5 | Apr 29, 2026 | Modified MIT | 128B dense | 256K |
Sources, verified May 13, 2026: huggingface.co/openbmb/MiniCPM-V-4.6, huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro, blogs.nvidia.com (Nemotron 3 Nano Omni launch), research.ibm.com/blog/granite-4-1-ai-foundation-models, docs.mistral.ai (Mistral Medium 3.5 model card).
The one May 2026 release: MiniCPM-V 4.6 1.3B
OpenBMB shipped MiniCPM-V 4.6 1.3B Instruct on May 11, 2026 under Apache 2.0. The model is built on the LLaVA-UHD v4 visual backbone, accepts text, image, and video inputs, and reduces visual encoding FLOPs by more than 50% relative to its predecessor. ArtificialAnalysis records it at roughly 1.5x the token throughput of Qwen3.5-0.8B. On the AA Intelligence Index it lands at 12.7, on GPQA Diamond at 30.5%, on TAU2-bench at 87.7%. The context window is 262K tokens.
What this release is not: a frontier-class reasoning model. The label on the box says "1.3B parameter multimodal SLM optimized for on-device inference", and that is what it does. The right way to read MiniCPM-V 4.6 1.3B is not against MiMo-V2.5-Pro at 1.02T parameters; it is against the on-phone tier where the question is whether a 1.3B vision-language model can run continuously without melting the battery. For a Mac agent the use case is the always-on screen understanding pass, not the planning loop.
Weights are at huggingface.co/openbmb/MiniCPM-V-4.6. The OpenBMB serving repo on GitHub at github.com/OpenBMB/MiniCPM-V is the reference path. Both are reachable today.
The four late-April releases driving the May 2026 conversation
Four labs shipped frontier-class open-weight releases within 48 hours of each other on April 28 and 29, 2026. Calling them "April releases" is technically accurate and editorially misleading, because as of May 13 they are still the four models any open-weight roundup is being measured against. Each one is summarized below in the form a small business owner deploying it on a Mac actually needs.
MiMo-V2.5-Pro (1.02T MoE / 42B active)
Open-source Mixture-of-Experts language model under MIT. 1.02T total parameters with 42B active per token. Xiaomi's claim on its internal ClawEval benchmark is roughly 70,000 tokens per trajectory for a 64% Pass^3 score, a 40 to 60% token saving versus comparable agentic models. The most permissive license in this list and the largest by total parameter count. Weights on Hugging Face.
Nemotron 3 Nano Omni (30B / 3B active multimodal)
Single open multimodal model that unifies vision, speech, and language. Activates only 3B parameters per token and runs in 25GB of RAM. NVIDIA reports up to 9x higher throughput than other open omni models on video and document workloads. Released under the NVIDIA Open Model Agreement, which permits commercial use but is not OSI-approved. Launch blog.
Granite 4.1 (3B / 8B / 30B dense)
Three dense decoder-only models under Apache 2.0. The 8B instruct model is reported to match or beat the prior Granite 4.0 32B MoE. Context windows extend up to 512K tokens. Twelve languages supported (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese). The release also covers speech, vision, embeddings, and Guardian safety models. The practical pick when you want a clean license, dense weights, and broad language coverage. Announcement.
Mistral Medium 3.5 (128B dense)
128B dense multimodal model under a Modified MIT license, 256K token context, 77.6% on SWE-Bench Verified. Replaces both Devstral 2 and Magistral in the Mistral lineup. The Modified MIT terms are permissive for individuals, startups, mid-market companies, and most universities, but companies above a revenue threshold need to negotiate a separate agreement. Read the actual license before deploying inside a large org. The EU-friendly frontier coder pick. Model card.
The part the other roundups never show: the three Swift lines that decide which of these weights drive your Mac
Every other guide that ranks these releases compares params, context, benchmark scores, and license terms. None of them quote the line of code in any consumer Mac agent that decides whether you can actually point that agent at any of these weights. That is the load-bearing question if you want to run any model on this list against your real macOS apps rather than reading about it.
Fazm is open source at github.com/m13v/fazm. The relevant block, three lines plus a comment, sits at Desktop/Sources/Chat/ACPBridge.swift lines 522 to 525.
The mechanism is simple. The Settings panel writes a string into UserDefaults under the key customApiEndpoint. When the bridge spawns the ACP subprocess that runs the agent loop, it reads that string and assigns it to ANTHROPIC_BASE_URL in the subprocess environment. The Anthropic SDK inside the subprocess honors that env var and rewrites every request to the proxy URL. The proxy translates the Anthropic-shape request into whichever model API actually owns the weights you care about.
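Based on that description, the block plausibly looks like the following. This is a sketch, not the verbatim shipped source: the identifier names follow the prose above (customApiEndpoint, ANTHROPIC_BASE_URL), and the surrounding Process setup is assumed. The authoritative version is the repo itself.

```swift
import Foundation

// Sketch of the mechanism described above; names follow the article's
// description, not the verbatim source at ACPBridge.swift lines 522-525.
let acp = Process()  // the ACP subprocess that runs the agent loop
var env = ProcessInfo.processInfo.environment

// The Settings panel stores the proxy URL under this UserDefaults key.
if let custom = UserDefaults.standard.string(forKey: "customApiEndpoint"),
   !custom.isEmpty {
    // The Anthropic SDK inside the subprocess honors this env var and
    // rewrites every request to the proxy URL.
    env["ANTHROPIC_BASE_URL"] = custom
}
acp.environment = env
```

Everything downstream of this assignment is standard SDK behavior: the subprocess never knows which vendor actually serves the weights.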
That is the entire surface area. There is no model registry to update, no per-vendor adapter to maintain, no fork to keep in sync. A LiteLLM proxy in front of vLLM, an OpenRouter route that owns the Anthropic translation, an OpenBMB serving container with a thin bridge, an NVIDIA NIM endpoint behind LiteLLM: any of these becomes a valid target for Fazm in the time it takes to paste the URL into the Settings field. The three lines above are the contract.
For comparison, the same plumbing exists in some other agents only inside a developer framework with a configuration file you have to edit before launching, or behind a paid tier, or not at all (the agent is hardcoded to one vendor). The reason Fazm exposes it as a Settings toggle is that the same group of users who care about the May 2026 open-weight calendar are exactly the users who want to route their Mac agent at the weights they prefer.
The proxy or inference server that actually serves the weights. The Mac agent does not care which one you pick.
| What you point ANTHROPIC_BASE_URL at | What it does |
|---|---|
| LiteLLM | Anthropic-compatible translation layer for almost every open weight in this list. Most common bridge. |
| vLLM 0.20.1 | OpenAI-compatible inference server, May 3, 2026 release. Pairs with LiteLLM to expose an Anthropic surface. |
| OpenRouter | Hosts several Anthropic-compatible routes. Good fit when you do not want to run inference yourself. |
| Ollama | Local model server. Pair with an Anthropic-compat bridge for any of the smaller weights in this list. |
| NVIDIA NIM | Official microservice for Nemotron 3 Nano Omni. Wrap it in LiteLLM the same way as any other inference server. |
| OpenBMB serving | Reference serving repo for MiniCPM-V 4.6 1.3B. Bridge through LiteLLM for Anthropic-compat. |
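Whichever of these you pick, it is only a valid ANTHROPIC_BASE_URL target if it answers an Anthropic-shaped request at /v1/messages. A hedged smoke test you can run before pasting the URL into the Settings field; the port, model name, and API key below are placeholders, not values from any of these servers:

```swift
import Foundation

// Placeholder proxy URL, e.g. a local LiteLLM instance on its default port.
let base = URL(string: "http://localhost:4000")!
var request = URLRequest(url: base.appendingPathComponent("v1/messages"))
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "content-type")
request.setValue("placeholder-key", forHTTPHeaderField: "x-api-key")
request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
// Anthropic-shaped body; "mimo-v2.5-pro" is a placeholder model name.
request.httpBody = """
{"model": "mimo-v2.5-pro", "max_tokens": 16,
 "messages": [{"role": "user", "content": "ping"}]}
""".data(using: .utf8)

let task = URLSession.shared.dataTask(with: request) { _, response, error in
    if let http = response as? HTTPURLResponse {
        // A 200 here means the proxy owns the Anthropic translation and
        // is a workable target for the agent's ANTHROPIC_BASE_URL.
        print("status:", http.statusCode)
    } else if let error {
        print("unreachable:", error.localizedDescription)
    }
}
task.resume()
```

If this returns a non-200 status, fix the proxy before touching the agent; the Mac-side block cannot diagnose a broken translation layer, it only forwards to it.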
Picking a model from this list for a Mac agent workflow
The honest answer is that none of these five releases will make a Mac agent obviously better at every task on day one, and the right choice depends on which part of the agent loop you care about. The rough mapping below is what a Mac-first user actually picks for each common case, given the May 2026 release set.
| If you want | Pick | Why |
|---|---|---|
| Most capable open weights, MIT license | MiMo-V2.5-Pro | 1.02T MoE, 42B active. Clean MIT terms. Heavy to host but no field-of-use restrictions. |
| Always-on screen understanding, on-device | MiniCPM-V 4.6 1.3B | 1.3B vision-language SLM in 262K context. The fits-on-a-phone tier. |
| Frontier coding for an EU-based team | Mistral Medium 3.5 | 128B dense, 77.6% SWE-Bench Verified, 256K context. Read the Modified MIT terms before deploying inside a large company. |
| Enterprise-grade Apache 2.0 dense model | IBM Granite 4.1 (8B or 30B) | Clean Apache 2.0 license, dense architecture, up to 512K context, broad language coverage, full Guardian safety models in the same release. |
| Single model for vision plus speech plus text | Nemotron 3 Nano Omni | 30B / 3B active, runs in 25GB of RAM, native audio in the same model. NVIDIA Open Model terms to read first. |
What is still on the May 2026 release calendar
There are eighteen days left in the month at the time of writing. The pattern across the first four months of 2026 has been roughly one to three open-weight releases per week of any size, so the realistic forecast is at least three or four more model drops before end-of-month, plus an unknown number of inference-server and tooling releases that are open source but not LLM weights (vLLM 0.20.2 has been previewed). The largest pending name is GLM-5.2, which has been previewed by Z.AI but has no firm date.
This page is dated May 13, 2026 and reflects what has actually shipped through that date. Treat the "one new model so far" framing as a snapshot of a moving calendar, not a final accounting of May.
Want to point a Mac agent at your own May 2026 open-weight model?
Book 15 minutes. I will walk through the proxy setup that turns any of the models in this list into something that drives your macOS apps end to end.
Frequently asked questions
What open source LLMs have been released in May 2026 so far?
As of May 13, 2026, the only major new open-weight model released this month is OpenBMB MiniCPM-V 4.6 1.3B (released May 11, 2026, Apache 2.0). It is a 1.3B parameter multimodal vision-language model with a 262K token context, accepting text, image, and video inputs. The release sits inside a 2026 open-weight calendar where the dominant story is the four late-April drops still being benchmarked everywhere: Xiaomi MiMo-V2.5-Pro on April 28 (MIT, 1.02T total / 42B active MoE), NVIDIA Nemotron 3 Nano Omni on April 28 (NVIDIA Open Model Agreement, 30B / 3B active multimodal, runs in 25GB of RAM), IBM Granite 4.1 on April 29 (Apache 2.0, dense 3B / 8B / 30B with up to 512K context), and Mistral Medium 3.5 on April 29 (Modified MIT, 128B dense, 256K context). May is shaping up as a slow month relative to the late-April cluster.
Why is the May 2026 open-weight calendar so thin?
Two reasons. First, the late-April week of April 28 to 29 absorbed most of the spring release energy, with four frontier-class labs (Xiaomi, NVIDIA, IBM, Mistral) shipping within 48 hours of each other; that is a hard release pattern to follow inside the same calendar month. Second, the proprietary side took the May 5 to May 6 window (OpenAI GPT-5.5 Instant on May 5, xAI Grok 4.3 on May 6), which tends to soak up benchmark airtime even for labs that are not competing on the same axis. The MiniCPM-V 4.6 1.3B release on May 11 is a deliberate small-model play on a separate axis (multimodal SLM that fits on a phone), so it is not really competing with the late-April large drops for headline space. Several open-weight labs are publicly aiming at end of May for their next drop. There is no fundamental reason May ends up thin; the calendar mechanics just landed this way.
How do I run any of these May 2026 open-weight LLMs against my real Mac apps?
The model weights only get you halfway. You also need a consumer Mac agent that will let you point its inference target at a proxy server instead of the default Anthropic API. In Fazm specifically that is a single Settings toggle backed by a three-line block at lines 522 to 525 of Desktop/Sources/Chat/ACPBridge.swift. The block reads the customApiEndpoint string from UserDefaults and assigns it to env["ANTHROPIC_BASE_URL"] on the spawned ACP subprocess. With that env var pointed at a LiteLLM, vLLM, or any Anthropic-compatible proxy that re-emits the request as MiniCPM-V 4.6, MiMo-V2.5-Pro, Nemotron 3 Nano Omni, Granite 4.1, or Mistral Medium 3.5, the agent runs end to end against your real macOS accessibility tree. The code is visible to anyone reading the open repo at github.com/m13v/fazm.
What is the license picture for the May 2026 release set?
Three of the five are genuinely permissive. MiniCPM-V 4.6 1.3B and IBM Granite 4.1 ship under Apache 2.0 with no field-of-use or revenue clauses. Xiaomi MiMo-V2.5-Pro ships under MIT, also unconditional. Mistral Medium 3.5 ships under a Modified MIT license that is permissive for individuals, startups, mid-market companies, and most universities, but requires a negotiated commercial agreement for companies above a revenue threshold; if you are deploying inside a large company you have to read the actual license terms. NVIDIA Nemotron 3 Nano Omni ships under the NVIDIA Open Model Agreement, which permits commercial use but is not OSI-approved and has its own redistribution and attribution rules. If you need a clean, unencumbered weights file for a small business workflow on a Mac, the Apache 2.0 and MIT entries are the safe picks.
Does MiniCPM-V 4.6 1.3B make sense as the inference target for a desktop Mac agent?
For most agent workloads no, for two specific workloads yes. The model is optimized for vision-language efficiency on a phone (LLaVA-UHD v4 backbone, roughly 50% fewer visual encoding FLOPs than the prior version, about 1.5x token throughput versus Qwen3.5-0.8B). For the general desktop agent loop, where the model needs to plan multi-tool sequences and reason over the macOS accessibility tree, a 1.3B parameter model is undersized; the late-April drops are the realistic targets there. Where MiniCPM-V 4.6 is genuinely interesting on a Mac is for the always-on screen understanding loop (reading what is currently visible without spinning up a frontier model for each frame) and for any flow that has to fall back to an offline pass when the network is unreliable. Those are exactly the cases where a 1.3B model with a 262K context is a better fit than a 100B-class frontier model.
Where do I point the ANTHROPIC_BASE_URL to make each of these models work?
Most teams run an Anthropic-compatible proxy in front of the actual inference server. LiteLLM has shipped Anthropic API translation for several of these models out of the box and is the most common bridge for MiMo-V2.5-Pro, Granite 4.1, and Mistral Medium 3.5. vLLM 0.20.1 (the May 3, 2026 release) ships an OpenAI-compatible server and pairs with LiteLLM to expose an Anthropic surface. For MiniCPM-V 4.6 the easiest path is the OpenBMB serving repo plus a LiteLLM bridge. For Nemotron 3 Nano Omni, NVIDIA's NIM microservice is the official path; you wrap it in LiteLLM the same way. In all cases the only thing the Mac agent needs to know is the proxy URL; the three-line block in ACPBridge.swift does the rest.
Are there any May 2026 releases that are open source but not open weights?
Yes, and they matter for the agent layer. Hugging Face shipped ml-intern (the open-source post-training automation agent built on the smolagents framework) on April 21, 2026, which is open code but not a model release. Several inference servers also shipped May 2026 versions that are open source but not models: vLLM v0.20.1 on May 3 stabilizes DeepSeek V4 inference and ships a configurable VLLM_MULTI_STREAM_GEMM_TOKEN_THRESHOLD knob. These are part of the May 2026 open-source picture even though they are not LLM weight releases.
What about the late-April releases everyone is still talking about in May?
The four-release cluster on April 28 and 29 is genuinely load-bearing for the May 2026 open-source LLM conversation, and any roundup that omits them is misleading. The shortest honest summary: Xiaomi MiMo-V2.5-Pro is the largest by raw parameter count (1.02T MoE, 42B active) and shipped under MIT, which makes it the most permissive frontier-class release. NVIDIA Nemotron 3 Nano Omni is the efficiency story (30B total / 3B active multimodal, runs in 25GB of RAM, claimed up to 9x higher throughput than other open omni models on video and document workloads). IBM Granite 4.1 is the practical enterprise pick (Apache 2.0 dense 3B/8B/30B with up to 512K context and 12 supported languages). Mistral Medium 3.5 is the 128B dense coder (256K context, 77.6% on SWE-Bench Verified) under a Modified MIT license that you need to read carefully for commercial deployments.
Will more May 2026 open-weight releases land before the end of the month?
Probably yes. The pattern through 2026 has been one to three open-weight releases of any size per week, and the May 1 to May 13 window has only produced MiniCPM-V 4.6 1.3B. The labs publicly aiming at late May include the GLM team (GLM-5.2 has been previewed) and at least two China-based labs that signaled May drops without dates. The list on this page reflects what is verifiably shipped as of May 13, 2026; it will be updated as actual releases land.
Closer reading on the 2026 open-weight cycle and the consumer Mac agent layer.
Related guides
Open source LLM news 2026: the releases everyone covers, and the one Mac agent that will actually run them
The 2026 release cycle from inside a shipping Mac agent, with the one-field switch that points the engine at a community proxy.
vLLM release May 2026: v0.20.1 patch, the v0.20.0 dep-floor jump, and why your Mac agent does not care
vLLM v0.20.1 stabilizes DeepSeek V4. The four Swift lines on the Mac side that make a server upgrade invisible.
Open source LLM news, April 2026: the weights are loud, the consumer agents are quiet
April 2026 shipped a wave of open weight LLMs; very few new MIT-licensed consumer agents to drive them.