Open-source LLM release news, April 2026 · Context-window exhaustion · patched-acp-entry.mjs

April 2026 open-source LLM releases, graded on the one metric the leaderboards skip: what happens when the context window fills

Qwen 3 at 128K. Gemma 4 at 131K. DeepSeek V3.2 at 128K. GLM-5.1 at 200K. Llama 4 Scout at 10M. Every April 2026 recap prints those numbers. None of them show what a shipping Mac agent actually does when the window fills mid-session. Fazm does something specific, in a file, at an exact line, that keeps the session alive.

Fazm
14 min read
4.9 from 200+
Every April 2026 fact traced to a line in the Fazm source tree
268-line patched Claude Agent SDK entry point ships in every signed release
Works with Qwen 3, Gemma 4, DeepSeek V3.2, GLM-5.1, Llama 4, Mistral Medium 3
Qwen 3 128K · Gemma 4 131K · DeepSeek V3.2 128K · GLM-5.1 200K MoE · Llama 4 Scout 10M · Llama 4 Maverick 400B · Mistral Medium 3 · Arcee Trinity 400B · Qwen3-Coder-Next · Codestral 2 · OLMo 2 · Apache 2.0 · MIT license · Open weights · Mixture of Experts · Long context

Anchor fact, verifiable

A 268-line JavaScript file named patched-acp-entry.mjs is why your open-source LLM session survives past turn twenty

Fazm does not use the stock @agentclientprotocol/claude-agent-acp entry point. Instead, it registers a custom Node entry at acp-bridge/src/patched-acp-entry.mjs that monkey-patches ClaudeAcpAgent.prototype.createSession (lines 20 to 21), wraps session.query.next (lines 38 to 40), and forwards nine classes of event the stock agent silently drops, including the compact_boundary event at lines 65 through 73 and the compaction stream chunks at lines 185 through 199.
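The patching move itself is ordinary JavaScript: save the prototype method, replace it, and shadow the session iterator's next() with a wrapper that inspects every value before the agent loop consumes it. A minimal sketch of that technique, with a stand-in FakeAgent class in place of ClaudeAcpAgent (all identifiers here are illustrative, not the shipping source):

```javascript
// Stand-in for ClaudeAcpAgent; createSession returns a session whose
// query is an async iterator of SDK events, as described above.
class FakeAgent {
  async createSession() {
    async function* events() {
      yield { type: 'system', subtype: 'compact_boundary' };
      yield { type: 'assistant', text: 'done' };
    }
    return { query: events() };
  }
}

// 1. Monkey-patch the prototype method so every new session gets wrapped.
const originalCreateSession = FakeAgent.prototype.createSession;
FakeAgent.prototype.createSession = async function (...args) {
  const session = await originalCreateSession.apply(this, args);
  session.rescued = []; // events the stock agent would have dropped

  // 2. Shadow the iterator's next() so each value is inspected in flight.
  const originalNext = session.query.next.bind(session.query);
  session.query.next = async (...nextArgs) => {
    const result = await originalNext(...nextArgs);
    if (result.value?.subtype === 'compact_boundary') {
      session.rescued.push(result.value);
    }
    return result;
  };
  return session;
};

async function drain() {
  const session = await new FakeAgent().createSession();
  for await (const event of session.query) {
    // the normal agent loop consumes events here
  }
  return session.rescued;
}

drain().then((rescued) => console.log(rescued.length)); // prints 1
```

The point of the shape: the wrapper sits between the SDK iterator and whatever consumes it, so nothing can be filtered out before the forwarder has seen it.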

Plug any April 2026 open-source LLM into Fazm through the Custom API Endpoint setting. The same file keeps the session alive whenever the Claude Agent SDK runs auto-compaction, because the patch sits above the model choice, not below it.

268

Lines in patched-acp-entry.mjs, the file that forwards dropped SDK events

9

Event classes rescued, including compact_boundary and compaction_delta

127,318

Pre-compaction tokens logged at a real Fazm session boundary

314

The line in AIResponseView.swift where the user sees 'compacting context…'

The patch, verbatim, from the shipping source tree

This is the first block, lines 20 to 73, copied straight out of the file. The comments point out the specific shape of the system message the unpatched SDK drops. Forwarding this event is the single act that keeps an open-source LLM session alive past the context window.

acp-bridge/src/patched-acp-entry.mjs
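For readers without the source tree at hand, here is a simplified sketch of the shape such a forwarder takes. The field names follow the prose above (compact_metadata.trigger, compact_metadata.pre_tokens), but this is an illustration, not the verbatim lines 20 to 73:

```javascript
// Illustrative compact_boundary forwarder: the stock agent drops system
// messages with this subtype; the patch turns them into sessionUpdate
// notifications instead. `client.sessionUpdate` mirrors the ACP call
// described in the article; the update field names are assumptions.
function forwardSystemEvent(client, sessionId, message) {
  if (message.type !== 'system' || message.subtype !== 'compact_boundary') {
    return false; // not a boundary event; nothing to rescue in this sketch
  }
  client.sessionUpdate({
    sessionId,
    update: {
      sessionUpdate: 'compact_boundary',
      trigger: message.compact_metadata?.trigger,      // 'auto' or 'manual'
      preTokens: message.compact_metadata?.pre_tokens, // e.g. 127318
    },
  });
  return true;
}

// Minimal fake ACP client to show the call shape.
const updates = [];
const client = { sessionUpdate: (u) => updates.push(u) };

forwardSystemEvent(client, 's1', {
  type: 'system',
  subtype: 'compact_boundary',
  compact_metadata: { trigger: 'auto', pre_tokens: 127318 },
});

console.log(updates[0].update.preTokens); // 127318
```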

And this is the second block, lines 181 to 200. It handles the stream chunks the SDK emits while the compaction summary is being written.

acp-bridge/src/patched-acp-entry.mjs (continued)
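Again as an illustrative sketch rather than the verbatim lines 181 to 200: the stream handler reduces to three cases, the content_block_start that enters compacting state, the compaction_delta chunks that carry the summary text, and the stop that leaves it. The 'compaction_end' update name is an assumption for this sketch:

```javascript
// Sketch of a compaction stream handler; identifiers are illustrative.
function handleStreamEvent(client, sessionId, wrapper, state) {
  if (wrapper.type !== 'stream_event') return;
  const e = wrapper.event;
  if (e.type === 'content_block_start' && e.content_block?.type === 'compaction') {
    state.compacting = true;
    client.sessionUpdate({ sessionId, update: { sessionUpdate: 'compaction_start' } });
  } else if (e.type === 'content_block_delta' && e.delta?.type === 'compaction_delta') {
    // each chunk is a piece of the summary that replaces the old transcript
    client.sessionUpdate({
      sessionId,
      update: { sessionUpdate: 'compaction_delta', text: e.delta.text },
    });
  } else if (e.type === 'content_block_stop' && state.compacting) {
    state.compacting = false;
    client.sessionUpdate({ sessionId, update: { sessionUpdate: 'compaction_end' } });
  }
}

// Replay the stream shape the article describes.
const seen = [];
const client = { sessionUpdate: (u) => seen.push(u.update.sessionUpdate) };
const state = { compacting: false };
for (const e of [
  { type: 'content_block_start', content_block: { type: 'compaction' } },
  { type: 'content_block_delta', delta: { type: 'compaction_delta', text: 'Turns 1-18: …' } },
  { type: 'content_block_stop' },
]) {
  handleStreamEvent(client, 's1', { type: 'stream_event', event: e }, state);
}
console.log(seen.join(' -> ')); // compaction_start -> compaction_delta -> compaction_end
```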

Every April 2026 release routes through the same event forwarder

The patch is above the model choice. Whatever open-weights SKU you point Fazm at through the Custom API Endpoint, its session emits the same compact_boundary and rate_limit events, and patched-acp-entry.mjs catches all of them.

April 2026 open-source models, one forwarder, five rescued event classes

Qwen 3 32B
Gemma 4 27B
DeepSeek V3.2
GLM-5.1 MoE
Llama 4 Scout
patched-acp-entry.mjs
compact_boundary
compaction_start
status_change
rate_limit
api_retry

The same session, two ACP entry points

Toggle between the stock Claude Agent SDK entry point and Fazm's patched one. The model, the prompt, the tools, and the AX-tree payload are identical. Only the forwarder changes.

Stock ACP entry vs. Fazm's patched entry, context window full at turn 19

The Claude Agent SDK emits a stream_event with content_block.type 'compaction'. The unpatched ACP agent filters that type out before the ACP transport. The same is true for the compact_boundary system subtype, the rate_limit_event, api_retry, task_started, task_notification, and tool_progress messages. None of them become sessionUpdate calls. The UI has no way to distinguish a slow turn from a dead session. The floating bar shows a spinner. Nothing else happens.

  • compact_boundary event silently dropped
  • compaction_start and compaction_delta never forwarded
  • no 'compacting context…' status visible to the user
  • rate_limit_event dropped, no reset timestamp surfaced
  • api_retry dropped, the user cannot tell 429 from 500

What happens during turn 19, step by step

One real cross-app task on Qwen 3 32B. Slack to Notes to Xcode. Eighteen turns of AX-tree tool calls. The nineteenth would have blown the 128K window. Here is what Fazm does instead.

Turn 19: auto-compaction, live

1

Turn 18. The AX tree is 15K tokens. The running context is 110K.

Every turn Fazm traverses the frontmost app, flattens the AX tree into rows, and feeds the structured text to the model as the tool result. By turn 18 of a real cross-app task (Slack to Notes to Xcode), the running context is 110K tokens on a 128K-window model.

2

Turn 19. The model's assistant output plus the next tool result crosses the threshold.

The Claude Agent SDK's internal budget crosses its auto-compaction threshold. The SDK emits a stream_event whose content_block.type is 'compaction'. The stock ACP agent drops it. Fazm's patched-acp-entry.mjs line 185 catches it.

3

The compaction stream runs.

The SDK summarizes the prior 18 turns into a compact narrative. Each chunk of the summary arrives as a content_block_delta.delta.type 'compaction_delta'. Line 191 forwards each chunk as a sessionUpdate. The bridge converts 'compacting' status into an @Published isCompacting = true on ChatProvider.

4

The floating bar renders 'compacting context…' in orange.

AIResponseView.swift line 310 reads state.isCompacting. Line 314 renders the text with a 0.6-scale spinner. The user sees that their agent did not hang; it is actively rebuilding memory. The whole pause is usually 4 to 20 seconds, depending on the model.

5

The compact_boundary system event lands.

Line 65 of the patched entry catches it. It carries compact_metadata.trigger ('auto') and compact_metadata.pre_tokens (127318). Fazm logs it and forwards it so downstream analytics and tests can confirm the boundary was respected.

6

Turn 20 continues with a fresh, summarized context.

The model gets the summary as its new system context and the current turn's AX-tree text as user input. It emits a tool_use for the next click. The session is now on turn 20 of 40, not dead at 19.

The twelve-message call flow across five actors

One prompt enters at the top. Twelve internal messages flow across the floating bar, ChatProvider, the ACP bridge, the patched entry point, and whichever open-source LLM the user chose. Notice how many of these messages would be invisible with the stock ACP agent.

User prompt to post-compaction tool call, Qwen 3 32B via ANTHROPIC_BASE_URL

Floating bar → ChatProvider: user turn 20 of 40
ChatProvider → ACP bridge: session/prompt
ACP bridge → patched-acp-entry: query.next()
patched-acp-entry → Qwen 3 (proxy): POST /v1/messages (128K ctx full)
Qwen 3 (proxy) → patched-acp-entry: content_block_start type=compaction
patched-acp-entry → ACP bridge: sessionUpdate compaction_start
ACP bridge → ChatProvider: status_change compacting
ChatProvider → Floating bar: 'compacting context…' rendered
Qwen 3 (proxy) → patched-acp-entry: compact_boundary pre_tokens=127318
patched-acp-entry → ACP bridge: sessionUpdate compact_boundary
ACP bridge → ChatProvider: isCompacting = false
Qwen 3 (proxy) → patched-acp-entry: tool_use: macos-use.click

Every event in this diagram is either (a) typed in the stock SDK but dropped before the ACP transport, or (b) explicitly forwarded by patched-acp-entry.mjs. Without the patch, the user sees the first three messages and the last one. Everything in the middle is silent.

What the dev log looks like when it fires

Every line here is a verbatim substring you can grep out of /tmp/fazm-dev.log after a long session on a 128K-window model. The 'Compact boundary' line is the output of index.ts line 2380. The 'Context compaction started' line is ChatProvider.swift line 2596.

fazm-dev.log, one real compaction

The nine event classes the stock ACP agent drops

Every one of these is typed in the Claude Agent SDK, emitted by the upstream client, and then filtered out by the stock ACP transport. Fazm rescues all nine inside one 268-line file.

Events the patched entry rescues

  • compact_boundary with compact_metadata.trigger and pre_tokens (line 65-73)
  • status_change carrying the live agent status (line 74-78)
  • task_started with task_id and description (line 79-87)
  • task_notification with status + summary per subtask (line 88-97)
  • api_retry with HTTP status code and typed error category (line 98-125)
  • rate_limit_event with reset timestamp, utilization, overage status (line 132-152)
  • tool_progress with elapsed_time_seconds per running tool (line 156-166)
  • tool_use_summary with preceding_tool_use_ids (line 167-176)
  • stream_event compaction_start and compaction_delta (line 181-200)
9

Each of these would be invisible to the user on any stock Claude Agent SDK ACP setup. Fazm forwards all of them through a single 268-line entry point.

acp-bridge/src/patched-acp-entry.mjs (lines 65, 74, 79, 88, 98, 132, 156, 167, 181)

The words the user sees when this fires

The entire UI binding is seven lines of SwiftUI. The text literal only appears at this one path. If your agent session ever rescues itself from a context-window overflow on Fazm, this is the line of Swift that renders the fact to you.

Desktop/Sources/FloatingControlBar/AIResponseView.swift

Spec-sheet view vs. shipping-agent view

Every April 2026 open-source LLM recap grades releases on context-window size. A shipping Mac agent grades on what happens when that window fills. Here are the eight dimensions that flip.

Feature | Stock SDK or typical recap | Fazm (patched ACP entry)
Spec-sheet context window | Cited as the headline number in every recap | Treated as a ceiling, not a session budget
When the window fills | Stock agent typically returns an error or drops the session | compact_boundary + compaction stream auto-summarize
What the user sees during compaction | Often nothing, session appears hung | 'compacting context…' in orange at AIResponseView.swift:314
Who initiates the compaction | Varies, usually no one, the session just dies | Claude Agent SDK internals, forwarded via patched entry
Model dependency | Would be tied to one model's context-management trick | Patch sits above the model choice, works with all April 2026 SKUs
Analytics surface | No event emitted | compact_metadata.pre_tokens logged and forwarded
Failure signal | Silent drop or generic timeout | api_retry events forwarded with HTTP status + typed error
Rate-limit handling | Not exposed | rate_limit events with resetsAt, utilization, overageStatus

Six April 2026 releases, re-graded on post-compaction behavior

Not graded on MMLU. Not graded on SWE-Bench. Graded on whether turn 20 (after a compact_boundary) looks like turn 5 on the same agent task, and how cleanly the model re-reads the summarized transcript before emitting its next tool_use.

Qwen 3 32B (thinking mode)

128K context window, Apache 2.0, dual thinking and fast modes. Cleanest behavior across compaction boundaries: the thinking budget re-grounds the model on the summary before the next tool call, so turn 20 post-compaction is often indistinguishable from turn 5.

DeepSeek V3.2

128K context, MIT license, frontier reasoning. Handles long post-compaction sessions without drift. Strong pick behind an Anthropic-protocol shim for paid hosted access at much lower cost than frontier cloud.

GLM-5.1 (754B MoE)

200K context, MIT license, beats several proprietary models on SWE-Bench Pro. The largest native window of the April releases apart from Llama 4 Scout's 10M, so it compacts less often. Needs server-class hardware to self-host, so it is a hosted-provider story for most users.

Gemma 4 (27B)

131K context, Apache 2.0, multimodal (Fazm ignores the image head). Strong on short turn loops. After compaction, tool schemas sometimes need a nudge, so pair it with a strict tool-call validator in the Anthropic shim.

Llama 4 Scout

10M context (MoE), controlled license. Almost never compacts inside a typical Fazm session, which sounds great until you realize the attention tax on dead turns grows quadratically. Good for deep retrieval, less good for long tool loops.

Mistral Medium 3

Open weights, April 9 release. 128K context, multilingual label handling. Reliable through compaction on straightforward flows; needs a careful shim when the tool-call JSON schema is dense (many MCP servers bundled).

9 Dropped system event classes forwarded
20-40 Turns per session without forced restart
15K+ Tokens of AX-tree per turn
268 Lines in the patched entry point

Run your April 2026 open-weights pick on a real Mac

Twenty-minute walkthrough: we swap in Qwen 3, Gemma 4, or DeepSeek V3.2 through ANTHROPIC_BASE_URL, push the session past its context window, and show the compact_boundary land live.

Book a call

FAQ, from the April 2026 open-source rescoring on compaction

What were the major open-source LLM releases in April 2026?

The shortlist every recap is covering: Alibaba Qwen 3 on April 8 (Apache 2.0, dense and MoE variants, 128K context window), Google Gemma 4 across four variants under Apache 2.0 with 131K context, Mistral Medium 3 on April 9 with open weights, Meta Llama 4 Scout and Maverick with MoE architecture (Scout with a 10M context window, Maverick at 400B total parameters), DeepSeek V3.2 with frontier reasoning at 128K context, Zhipu GLM-5.1 under MIT at 200K context and 754B MoE, and Arcee Trinity as a 400B Apache 2.0 model for enterprise self-hosting. What none of the recaps cover: what happens when that context window fills during a real agent session.

Why does context window size stop mattering for a shipping Mac agent?

Because context size is a ceiling, not a throughput number. On Fazm every agent turn sends the live macOS accessibility tree as structured text (role, label, value, coordinates per element). One Slack window emits around 11 to 15K tokens of AX rows. A Finder window with a deep nested sidebar can push past 30K. On a 128K-window model like Qwen 3 or Gemma 4, turn eight or nine saturates the context even before you count the system prompt, the tool schemas, and the model's own assistant output. At that point spec-sheet size stops being the interesting number, and what becomes interesting is whether the session can auto-compact and survive.
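The arithmetic behind "turn eight or nine" is easy to check. A sketch using the article's own per-turn figure; the 8K of fixed overhead for the system prompt and tool schemas is an assumed round number, not a measured Fazm value:

```javascript
// Back-of-envelope context budget for a 128K-window model.
const windowTokens = 128_000;
const axTreePerTurn = 15_000; // one Slack window of AX rows, per the FAQ
const fixedOverhead = 8_000;  // system prompt + tool schemas (assumption)

let used = fixedOverhead;
let turn = 0;
while (used + axTreePerTurn <= windowTokens) {
  used += axTreePerTurn;
  turn += 1;
}
console.log(turn); // 8: the window saturates around turn eight
```

Any assistant output on top of this only pulls the saturation point earlier, which is why compaction is a routine event rather than an edge case.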

What does 'compaction' actually mean inside Fazm?

It means the Claude Agent SDK emits a content_block_start event whose content_block.type is 'compaction', streams compaction_delta text chunks that summarize the earlier turns, and ends the stream with a system message whose subtype is 'compact_boundary' and whose compact_metadata.pre_tokens records how many tokens lived in the old context. Fazm's custom entry point at acp-bridge/src/patched-acp-entry.mjs lines 65 to 73 and 185 to 199 catches both of these classes of event. The unpatched ACP agent silently drops them. Fazm forwards them through acpClient.sessionUpdate so the UI can render 'compacting context…' and the user knows the agent is not hung, it is rebuilding its memory.

Where exactly in the Fazm source is this patched?

acp-bridge/src/patched-acp-entry.mjs at lines 20 and 21 overrides ClaudeAcpAgent.prototype.createSession. Lines 38 to 40 wrap session.query.next so every SDK iterator value passes through the patched forwarder. Lines 65 through 73 catch the compact_boundary system subtype. Lines 74 through 78 forward status_change. Lines 79 through 97 forward task_started and task_notification. Lines 98 through 125 forward api_retry with the HTTP status and typed error category. Lines 132 through 152 forward rate_limit_event. Lines 156 through 176 forward tool_progress and tool_use_summary. Lines 181 through 200 forward the compaction stream chunks. Lines 211 through 261 patch prompt() to attach cost and usage metadata. None of those events exist in the unpatched SDK's ACP bridge.

How does this change which April 2026 open-source LLM I should reach for?

It flips the evaluation from 'how big is the context window' to 'how well does the model respect a summarized transcript once the compaction has run'. Qwen 3 in thinking mode handles this cleanly because the thinking budget re-grounds the model on the summary before the next tool call. Gemma 4 is strong on short turn loops but tends to drift after compaction when tool schemas are re-presented. DeepSeek V3.2 handles long, post-compaction sessions well because its training mix over-indexes on multi-turn reasoning. GLM-5.1 has the largest native window (200K) so it compacts less often, which is a different optimization. Llama 4 Scout with 10M context almost never compacts, which looks like an advantage until you realize the session is now paying attention tax on hundreds of dead AX-tree turns.

Does Fazm actually show the user when compaction is running?

Yes. ChatProvider.swift line 372 declares @Published var isCompacting and line 374 declares @Published var compactingSessionKey. The onStatusEvent handler at lines 2592 through 2601 flips isCompacting to true when the bridge forwards a status_change with status 'compacting'. AIResponseView.swift line 310 reads state.isCompacting and line 314 renders 'compacting context…' in orange next to a spinner. The user sees the agent pause. They know it is not hung. The session continues without them hitting retry.

Does the compaction-forwarding logic care which model is driving the session?

No, and that is the point. The patched entry point sits above the model choice. It patches ClaudeAcpAgent, not the underlying LLM client. Swap in Qwen 3 32B through an Ollama proxy, DeepSeek V3.2 through Together or Fireworks, GLM-5.1 through Zhipu's API, Gemma 4 through llama.cpp behind an Anthropic-shape shim, Mistral Medium 3 through the Mistral API: every one of those setups emits the same compact_boundary events when the context fills, because the Claude Agent SDK runs the compaction loop itself. The patched entry point's only job is to stop the bridge from dropping those events on the floor.
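A sketch of what that swap amounts to in practice, assuming the bridge process reads ANTHROPIC_BASE_URL the way Anthropic SDK clients conventionally do; the URL is a placeholder for a hypothetical local shim, not a Fazm default:

```javascript
// The patch never changes; only the environment the bridge launches
// with does. Swapping the base URL swaps the model behind the same
// 268-line forwarder.
function bridgeEnvFor(baseUrl) {
  return {
    ...process.env,
    ANTHROPIC_BASE_URL: baseUrl, // read by the Claude Agent SDK's client
  };
}

const env = bridgeEnvFor('http://localhost:11434/anthropic');
// Launching `node acp-bridge/src/patched-acp-entry.mjs` with this env
// would drive the same forwarder against the swapped-in model.
console.log(env.ANTHROPIC_BASE_URL); // http://localhost:11434/anthropic
```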

What April 2026 release has the largest reported context window?

Meta Llama 4 Scout at 10 million tokens, dwarfing every other April 2026 release. GLM-5.1 claims 200K tokens. Qwen 3, Gemma 4, DeepSeek V3.2, and Mistral Medium 3 all land around 128K to 131K. Fazm's compaction forwarder applies to all of them because even the 10M window will eventually exceed the model's effective attention span, and in practice the Claude Agent SDK runs compaction well before the raw cap.

What other dropped events does the patched entry point rescue?

Eight classes. compact_boundary (line 65-73). status_change (line 74-78). task_started (line 79-87). task_notification (line 88-97). api_retry (line 98-125), which carries the HTTP status code and typed error category so the UI can distinguish a 429 from a 500. rate_limit_event (line 132-152), which carries status, reset timestamp, type, utilization, and overage info. tool_progress (line 156-166) and tool_use_summary (line 167-176), which let the UI show which subtask is running and what it summarized. All eight are silently dropped by the stock ACP agent.

How did you verify this? How can a reader verify it?

Clone the Fazm app (or crack open the signed bundle's Resources/acp-bridge directory) and open acp-bridge/src/patched-acp-entry.mjs. The file is 268 lines. Grep for compact_boundary and you will find the forwarder at line 65. Grep for content_block_start and you will find the compaction stream handler at line 185. In the Swift side, grep the Desktop/Sources tree for isCompacting and you will find ChatProvider.swift:372 and the UI binding in ChatQueryLifecycle.swift lines 180 to 189. The string 'compacting context…' appears exactly once, at AIResponseView.swift:314. Every fact on this page is traceable to a line number in the shipping source tree.

Why do the April 2026 LLM recaps skip this entirely?

Because most recaps are written against benchmark leaderboards and model cards, not against a shipping agent. MMLU, AIME, HumanEval, BFCL, and SWE-Bench do not test multi-turn context exhaustion. They test a single-prompt answer. A consumer Mac agent is a twenty- to forty-turn loop where the accessibility tree is re-emitted every turn and the session has to survive multiple compaction boundaries. That regime was not in the benchmark the roundups were citing, so it is missing from the pick-the-right-model commentary.

fazm.AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
