New AI model releases, papers, and open-source projects: June 3 to 4, 2026

Matthew Diakonov, Written with AI

Published June 20, 20268 min read

Direct answer · verified 2026-06-20

The one standout dated release in this 48-hour window was NVIDIA Nemotron 3 Ultra (550B-A55B) on June 4, 2026: a Mixture-of-Experts model with roughly 55B active parameters per token, a 1M-token context, and four checkpoints published under the Linux Foundation OpenMDW-1.1 license. It was the largest open-weight model available on the day it shipped. June 3 had no top-lab foundation-weight drop, and MiniMax M3 (announced June 1, API-only) still had no downloadable weights, those did not land until June 7.

Source: Nemotron 3 Ultra model card and the NVIDIA Nemotron asset hub.

The dated record, June 3 to 4

Two days is a small window. Here is the complete record rather than a padded one. External rows trace to launch coverage and the model cards; the application-layer rows trace to the Fazm repo commit log and git tags, which you can read directly.

Date	Kind	What shipped
June 4	Model release (open weight)	NVIDIA Nemotron 3 Ultra (550B-A55B) A 550-billion-parameter Mixture-of-Experts model with roughly 55 billion active per token, hybrid Mamba-2 and Transformer layers, and a 1M-token context. Shipped on Hugging Face, ModelScope, and OpenRouter with four checkpoints (NVFP4, BF16 instruct, BF16 base, GenRM) plus training data and recipes under the Linux Foundation OpenMDW-1.1 license. Announced at Computex 2026. This was the largest open-weight model available on the day it dropped.
June 3	Model release (top lab)	No downloadable top-lab foundation weights dated to June 3 No frontier lab published new downloadable foundation-model weights stamped June 3, 2026. The big open-weight event in this window was the NVIDIA drop on June 4. Listed here for honesty so the two-day record is complete rather than padded.
June 3 to 4	Weights still pending	MiniMax M3 weights had not landed yet M3 was announced June 1 as API-only, with open weights promised within about ten days. Those weights did not appear on Hugging Face until June 7, and the MiniMax Sparse Attention technical report did not reach arXiv until June 11. So on June 3 to 4, M3 was still an API, not a file you could pull.
June 3	Application layer	Fazm ships model-independent voice fallback Across the v2.9.62 to v2.9.65 tags, Fazm added a voice fallback for any model that finishes a turn without calling the speak_response tool. The commit message is literal: "Add voice fallback for models that skip speak_response tool." This is the dated anchor of this page, traceable commit by commit on github.com/mediar-ai/fazm.
June 3	Application layer	Fazm removes the macos-use-remote MCP server Commit "Remove macos-use-remote MCP server" lands the same day, trimming a remote computer-use surface in favor of the local accessibility-API path. A small cleanup, dated and readable in the repo, not a roadmap promise.

The part every roundup skips

Every other write-up about this window will give you the same three things about Nemotron 3 Ultra: the parameter count, the license, and a benchmark line you cannot independently confirm yet. None of them tells you what actually happens when you pull a model the day it ships and run it inside a real agent UI. I can, because I shipped a fix for one specific failure on June 3, the day before Nemotron dropped.

The failure is voice, and it fails silently. Fazm is voice-first: you hold a hotkey, talk, and the agent talks back. Under the hood the agent speaks by calling a tool named speak_response. Claude obeys that instruction reliably. The comment I wrote in the source is blunt about the rest: “codex/GPT and Gemini skip it far more than Claude.” A freshly released checkpoint whose tool-following is still rough behaves the same way. It writes a perfect text answer to the screen and never calls the speak tool. Nothing errors. The speakers just go quiet, and you assume voice is broken.

The fix: watch the result event, not the model

The June 3 release stopped trusting the model to cooperate. Instead of waiting for a tool call that a new model may never make, Fazm inspects the end-of-turn .result event. In Desktop/Sources/Chat/ACPBridge.swift (around line 1262), when a turn finishes with voice enabled, the text is non-empty, the session is a real foreground chat, and the model has not already spoken this turn (!spokeThisTurn), it calls ChatToolExecutor.speakModelIndependentSummary. The model never had to know the tool existed.

End-of-turn voice fallback (ACPBridge.swift)

The fallback also refuses to narrate the wrong things. A raw agent response is full of material you do not want spoken aloud: fenced code, image markdown, link URLs, headers, list markers. The spokenSummary function strips all of it with a chain of regular expressions, then caps the result at 450 characters and tries to end on a sentence boundary rather than mid-word. It reconstructs the same brevity the speak_response tool asks a cooperating model to produce, after the fact, from whatever a non-cooperating model happened to print.

ChatToolExecutor.swift

One more guard matters: the trigger excludes a set of non-spoken session keys (observer, graph-exploration, profile-exploration, and the spare warmup session). Those run headless and should stay silent. So when you swap in Nemotron 3 Ultra, your foreground chat talks back and the background agents do not suddenly start narrating over it.

Why this is the right lens for June 3 to 4

Nemotron 3 Ultra is exactly the kind of model this fix exists for. It is not Claude. Its tool-calling behavior on day one is unproven, the independent evals were not out yet, and the only way to actually try it on June 4 was through a gateway, not a Claude-native path. If your agent UI assumed the model would politely call a speak tool, you would have downloaded the largest open-weight model in the world and then sat in silence wondering why voice stopped working.

Because Fazm reaches a new model through any Anthropic-compatible endpoint and never hardcodes a model id, Nemotron was reachable the day it shipped. Because the voice path no longer depends on the model cooperating with a tool convention, it also talked back. That is the difference between a roundup that tells you a model exists and a tool that lets you use it the same afternoon. The full project, including the v2.9.62 to v2.9.65 tags dated June 3, is open source at github.com/mediar-ai/fazm.

Running a day-one model and voice went quiet?

Walk through how Fazm reaches new checkpoints through a gateway and keeps voice working when the model skips the speak tool.

Questions about the June 3 to 4 releases

Frequently asked questions

What AI models, papers, or open-source projects shipped on June 3 to 4, 2026?

The standout dated release was NVIDIA Nemotron 3 Ultra on June 4, 2026, a 550-billion-parameter Mixture-of-Experts model with about 55 billion active per token, a 1M-token context, and four checkpoints released under the Linux Foundation OpenMDW-1.1 license on Hugging Face, ModelScope, and OpenRouter. It was the largest open-weight model available that day. June 3 itself saw no top-lab foundation-weight drop. MiniMax M3, announced June 1 as API-only, still had no downloadable weights in this window (they landed June 7). At the application layer, Fazm tagged v2.9.62 through v2.9.65 across June 3 and shipped a model-independent voice fallback, all readable on github.com/mediar-ai/fazm.

What is NVIDIA Nemotron 3 Ultra and was it actually open source?

Nemotron 3 Ultra is a 550B-parameter Mixture-of-Experts foundation model with a hybrid Mamba-2 and Transformer architecture and roughly 55B active parameters per token, so it reasons like a 550B system but computes closer to a 55B one. It targets long-running autonomous agents with a 1M-token context. NVIDIA released it under the Linux Foundation OpenMDW-1.1 license with the weights, training data, and recipes published, so yes, it was genuinely open-weight on June 4, not an API-only preview. You can read the model card at huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4.

Can I trust Nemotron 3 Ultra's benchmark numbers?

Treat launch-day numbers as vendor self-reports until neutral evaluators publish. NVIDIA framed Nemotron 3 Ultra around efficiency claims (the headline lines were faster inference and lower cost per agentic task at its scale), but on June 4 the independent third-party scores you would want from neutral leaderboards had not been posted. The durable, verifiable fact in this window is the license and the checkpoints, which are on the model card today. The benchmark ranking is a claim to confirm once external evals land.

If I point a voice agent at a brand-new model the day it ships, what breaks?

Voice usually breaks first, and quietly. A voice-first agent often speaks by calling a tool, in Fazm that tool is speak_response. Claude obeys that instruction reliably. Newer or non-Claude models (Codex/GPT, Gemini, and freshly released checkpoints whose tool-following is still rough) frequently finish a turn with a full text answer but never call the speak tool, so the screen updates and the speakers stay silent. Nothing errors. You just stop hearing replies. That is the exact failure Fazm's June 3 release was built to absorb.

How does Fazm make voice work on a model that ignores the speak tool?

It watches the end-of-turn result event instead of trusting the model to call the tool. In Desktop/Sources/Chat/ACPBridge.swift, when a turn finishes (the .result case, around line 1262) and voice is enabled, the text is non-empty, the session is not a background one, and the model did not already speak this turn (!spokeThisTurn), Fazm calls ChatToolExecutor.speakModelIndependentSummary. That function builds a short spoken summary from the final rendered text and synthesizes it, so the model never had to cooperate with the tool convention for voice to work.

Why does the fallback summarize instead of reading the whole answer aloud?

A full agent response is full of things you do not want spoken: fenced code blocks, image markdown, link URLs, headers, bullet markers. The spokenSummary function in ChatToolExecutor.swift strips all of that with a sequence of regular expressions (it drops ```fenced``` blocks entirely, removes image and link syntax while keeping the visible text, deletes bare URLs and markdown markers), collapses whitespace, then caps the result at 450 characters and tries to end on a sentence boundary rather than mid-word. The goal is the same brevity the speak_response tool asks a cooperating model to produce, reconstructed after the fact.

Does the fallback talk over background or onboarding agents?

No. The trigger in ACPBridge.swift excludes a set of non-spoken session keys: observer, graph-exploration, profile-exploration, and the spare warmup session. Those run headless and should stay silent. The fallback only fires for a real foreground chat where voice is on. So the day you swap in Nemotron 3 Ultra or any new checkpoint, your main conversation talks back, and the background graph or profile agents do not suddenly start narrating.

Is a two-day model roundup even the right thing to track?

Not on its own. The feeds that actually move (huggingface.co/models?sort=created for new weights, arxiv.org/list/cs.CL/recent for preprints, github.com/trending for projects) refresh every day, so a fixed 48-hour list is stale within a week. The durable question is whether your tooling reaches a model the day it appears and keeps working once it is there. Nemotron 3 Ultra is reachable through any Anthropic-compatible gateway you point Fazm at, and the June 3 voice fallback means it talks back even though it is not Claude. That is the part you control.

Is Fazm open source and local, and can I verify all of this?

Yes. Fazm is a native macOS app (14.0+), fully open source on GitHub at github.com/mediar-ai/fazm, and runs locally. You bring your own Claude Pro or Max account and usage hits your existing plan. Every Fazm claim here is checkable in that repo: the v2.9.62 to v2.9.65 tags dated June 3, the speakModelIndependentSummary and spokenSummary functions in Desktop/Sources/Providers/ChatToolExecutor.swift, and the .result trigger in Desktop/Sources/Chat/ACPBridge.swift. The external claims trace to the Nemotron 3 Ultra model card on Hugging Face dated June 4, 2026.