The open models dropped. Then you point your agent at them and the endpoint gets zero requests.
Every roundup of these two days hands you model cards and benchmark tables for the mid-June open-weight wave. None of them mention what happens when you actually try to run one of those models inside a Claude-Code-style agent loop: the custom endpoint you configure only routes Claude traffic, so it is easy to wire everything up and have it silently do nothing. This page gives the date-honest answer first, then the trap nobody covers and the one-line check that catches it.
Direct answer, verified 2026-06-18
No major foundation model carries a precise June 15 or June 16, 2026 release date. The dense mid-June open-weight wave brackets the window rather than landing on it: Moonshot AI released Kimi K2.7-Code on June 12 with open weights, and Z.ai's GLM-5.2 landed in the same mid-June stretch. Because no platform publishes a list keyed to a calendar day, the only date-honest sources for any 48-hour window are:
- huggingface.co/models sorted by trending, for weights and quantized variants.
- huggingface.co/papers for research with a linked implementation.
- github.com/trending for agents, MCP servers, and inference engines.
- llm-stats.com/ai-news for a running, dated changelog of model launches.
That is the literal answer. The rest of this page is the thing the spec tables leave out: once you have an open model worth testing, the limit that decides whether your requests even reach it is set by your client, not the model.
The window, dated honestly
Mid-June 2026 was one of the denser open-weight stretches on record, dominated by Chinese coding-first models shipping open weights at frontier-adjacent quality. The two that frame this window:
- Kimi K2.7-Code (Moonshot AI), June 12, 2026
A coding-focused agentic model built on K2.6, with full weights published to Hugging Face under a Modified MIT license. 1 trillion total parameters, 32 billion active per token, 256K context. Moonshot reports double-digit gains over K2.6 on its own benchmarks, with the honest caveat that no independent SWE-bench Verified scores existed yet at release, so the numbers were first-party only.
- GLM-5.2 (Z.ai), mid-June 2026
Coding-first, with a 1M-token context window aimed at repository-scale, long-horizon work, announced with open weights. Like Kimi, every public claim revolves around coding and agentic tasks rather than general chat.
Neither of those carries a hard June 15 or June 16 timestamp, and a vendor's own benchmark numbers are self-reported until an independent harness re-runs them. The useful move with a fresh open model is not to read the table and move on; it is to wire it into the agent loop you already use and run it on a real task. Which is exactly where the next part bites.
How a request actually routes
Here is the mechanism the spec tables never draw. To run an open model inside a Claude-Code-style loop, you stand up an Anthropic-API-compatible gateway in front of it and override the base URL so Claude traffic goes there instead of to Anthropic. The override is per-backend. Watch what happens to the same custom endpoint depending on which model you have selected.
Same custom endpoint, two different selected models
Nothing errors. Each backend has its own credential and route: Gemini models use a bundled Gemini key, Codex models use the Codex backend, and only Claude models read the Anthropic base URL you set. Configure the endpoint while your model is on the default and every request happily goes somewhere else. The endpoint you carefully set up never sees a single call.
The guardrail that landed June 15: a one-line predicate
On June 15, 2026, Fazm shipped a fix for exactly this silent no-op. The detection is a single helper added to ChatProvider.swift (commit 473b8a68), defined by exclusion: a model is a Claude model if it is neither Gemini nor Codex, because the Anthropic base-URL override only touches the Anthropic path.
/// True if the model id routes through the Anthropic SDK path (Claude).
/// Defined by exclusion: anything that is neither a Gemini nor a Codex
/// model goes through the Anthropic path. This matters for the Custom API
/// Endpoint setting, which only overrides `ANTHROPIC_BASE_URL` and
/// therefore ONLY affects Claude-routed traffic - Gemini/Codex models
/// bypass it entirely.
static func isClaudeModelId(_ modelId: String) -> Bool {
return !isGeminiModelId(modelId) && !isCodexModelId(modelId)
}The settings panel (commit ead89cf8) then renders an orange warning whenever isClaudeModelId(selectedModel) is false, with a one-tap button to switch to a Claude model so the endpoint you set actually gets traffic:
// The custom endpoint only overrides ANTHROPIC_BASE_URL, so it
// ONLY routes Claude-model traffic. Gemini/Codex models bypass it
// entirely. New users default to Gemini Flash, so without this
// warning the endpoint silently receives zero requests.
if !ChatProvider.isClaudeModelId(shortcutSettings.selectedModel) {
// orange warning: "Your current model does not use this
// endpoint. The custom endpoint only applies to Claude models.
// Switch to a Claude model for your requests to reach it."
// ... plus a "Switch to a Claude model" button
}The detail that makes this checkable rather than marketing: the comment names the exact reason the trap exists (new users default to Gemini Flash) and the warning keys off the same predicate that governs the real routing. There is no separate, drift-prone list of model names to maintain. If a model routes through the Anthropic path, the endpoint applies and the warning stays quiet. If it does not, the warning fires.
The number on the model card vs. the number that decides your run
Two columns of facts about the same fresh open model. The left is what the release post tells you. The right is what determines whether you actually evaluate it inside your agent loop. Roundups print the left and never mention the right.
| Feature | What the model card prints | What your client controls |
|---|---|---|
| Context window | 256K (Kimi K2.7-Code), 1M (GLM-5.2) | Irrelevant if requests route to a different model |
| How to reach the model | "Anthropic/OpenAI-compatible API" | Base-URL override applies only to Claude-routed traffic |
| What happens on misconfig | Not addressed | Silent zero requests, no error, until the June 15 warning |
| License | Open weights (Modified MIT for Kimi) | Does not change which backend your selected model uses |
| Benchmark scores | First-party, no independent runs yet at release | Only meaningful once your own task actually hits the model |
That is the whole argument for why a one-line predicate belongs next to a model launch. The releases are loud and frequent; the thing that decides whether you can put your hands on one of them is quiet and lives in your client. On June 15, that quiet thing went from a silent no-op to a visible warning.
Want to run a fresh open model through a real agent loop, end to end?
Fifteen minutes. I will stand up a custom endpoint in front of an open-weight model, show the routing warning fire and clear, and run the same task through it so you can see traffic actually reach the endpoint.
Frequently asked questions
What AI model released on June 15 or June 16, 2026?
Nothing major is pinned to either exact day. The honest reason is that neither Hugging Face nor GitHub publishes a list keyed to a calendar date; both order discovery by a rolling trending score with no notion of when something shipped. What surrounds the window is the dense mid-June open-weight wave: Moonshot AI released Kimi K2.7-Code on June 12, 2026 with open weights on Hugging Face under a Modified MIT license, and Z.ai's GLM-5.2 landed in the same mid-June stretch with a 1M-token context window. For any precise 48-hour slice the date-honest feeds are huggingface.co/models sorted by trending, huggingface.co/papers, and github.com/trending.
Are Kimi K2.7-Code and GLM-5.2 actually open source?
They ship open weights, which is not the same as fully open source. Kimi K2.7-Code published its full weights to Hugging Face under a Modified MIT license, with 1 trillion total parameters and 32 billion active per token at a 256K context window. GLM-5.2 is coding-first with a 1M-token window and was announced with open weights. Both are open-weight, meaning you can download and self-host the model, but training data and the full pipeline are not necessarily released. For the day-to-day question that matters here, the relevant fact is that both expose an API you can point an agent loop at.
Can I run a June 2026 open model like Kimi or GLM inside Claude Code or a Claude Code wrapper?
Sometimes, and the catch is the whole point of this page. Claude Code and tools built on it route Claude-model traffic through the Anthropic API. To send that traffic to a different model you override the base URL (the ANTHROPIC_BASE_URL environment variable) and point it at an Anthropic-API-compatible gateway in front of the open model. The trap: that override only affects requests that were going to go to Claude in the first place. If your selected model is a Gemini or a Codex/GPT model, it routes through a different backend entirely and never touches your custom endpoint, so the endpoint silently receives zero requests.
What is the June 15, 2026 Fazm guardrail this page is about?
Fazm is a native macOS app that wraps Claude Code and Codex via ACP. Its Settings page has a Custom API Endpoint field for routing Claude traffic through your own Anthropic-compatible gateway. On June 15, 2026, commit ead89cf8 added a check: if your currently selected model is not a Claude model, the settings panel shows an orange warning that reads, in part, "Your current model does not use this endpoint. The custom endpoint only applies to Claude models." It even offers a one-tap "Switch to a Claude model" button. The detection is a new helper, isClaudeModelId, added the same day in commit 473b8a68.
How does isClaudeModelId decide what counts as a Claude model?
By exclusion. In Desktop/Sources/Providers/ChatProvider.swift the helper is one line: a model id is a Claude model if it is neither a Gemini model nor a Codex model (return !isGeminiModelId(modelId) && !isCodexModelId(modelId)). isCodexModelId matches gpt-, codex-, and o[0-9]- prefixes; isGeminiModelId matches gemini- and auto-gemini- prefixes. Anything left over routes through the Anthropic path, which is the only path the ANTHROPIC_BASE_URL override touches. That is why the warning keys off this exact predicate.
Why does the endpoint silently get zero requests instead of erroring?
Because nothing is technically broken. Each backend has its own credential and route: Gemini models use a bundled Gemini key, Codex models use the Codex backend, and only Claude models read the Anthropic base URL. Set a custom endpoint while your model is on the default (new Fazm users default to Gemini Flash) and every request happily goes to Gemini. No error, no failed call, just an endpoint you carefully configured that never sees traffic. A silent no-op is worse than an error because you assume it is working. That is the failure mode the warning exists to surface.
What is the difference between a model's context window and where my requests route?
They are unrelated limits that roundups constantly blur. The context window (256K for Kimi K2.7-Code, 1M for GLM-5.2) is how much the model can read in a single turn, set by the vendor. Routing is which backend and which endpoint your request goes to, set by your client and your model selection. A 1M-token window does you no good if your traffic is going to a different model than the endpoint you configured. The context-window number is on every model card; the routing behavior is on none of them, because it lives in the app, not the model.
How do I verify the Fazm change myself?
Open github.com/mediar-ai/fazm. Read Desktop/Sources/Providers/ChatProvider.swift for the isClaudeModelId helper and its comment explaining that the custom endpoint only overrides ANTHROPIC_BASE_URL. Read Desktop/Sources/MainWindow/Pages/SettingsPage.swift for the orange warning block that renders when isClaudeModelId(selectedModel) is false. The git log shows commit 473b8a68 ("Add isClaudeModelId helper and expose isCodexModelId") and commit ead89cf8 ("Add warning for non-Claude models in custom endpoint settings"), both authored June 15, 2026. Everything is public.
Verify every claim here
- For the releases: open llm-stats.com/ai-news and huggingface.co/models. Kimi K2.7-Code and GLM-5.2 sit in the mid-June window; nothing major is pinned to June 15 or 16 specifically.
- For the live landscape: open huggingface.co/papers and github.com/trending. The trending order shifts hour to hour, but the submission and commit dates are stable.
- For the Fazm guardrail: open github.com/mediar-ai/fazm, read
Desktop/Sources/Providers/ChatProvider.swiftfor the isClaudeModelId helper andDesktop/Sources/MainWindow/Pages/SettingsPage.swiftfor the warning block, then check the git log for commits 473b8a68 and ead89cf8, both authored June 15, 2026.
Adjacent reading
Related guides
New AI model releases, papers, open source, June 14-15, 2026
The prior dated roundup, on the session-retention limit nobody benchmarks: the day Fazm raised its conversation history list from 20 to 500.
New AI model releases, papers, open source, June 13-14, 2026
Two days earlier in the same mid-June open-weight wave, with the same date-honest method for reading a 48-hour window.
AI agent context checkpoints
The session layer underneath all of this: keeping past work reachable and full context intact, independent of which model answers.
Comments (••)
Leave a comment to see what others are saying.Public and anonymous. No signup.