Open-source LLM release news, April 2026 · Context-window exhaustion · patched-acp-entry.mjs

April 2026 open-source LLM releases, graded on the one metric the leaderboards skip: what happens when the context window fills

Qwen 3 at 128K. Gemma 4 at 131K. DeepSeek V3.2 at 128K. GLM-5.1 at 200K. Llama 4 Scout at 10M. Every April 2026 recap prints those numbers. None of them show what a shipping Mac agent actually does when the window fills mid-session. Fazm does something specific, in a file, at an exact line, that keeps the session alive.

Fazm
14 min read
4.9 from 200+
Every April 2026 fact traced to a line in the Fazm source tree
268-line patched Claude Agent SDK entry point ships in every signed release
Works with Qwen 3, Gemma 4, DeepSeek V3.2, GLM-5.1, Llama 4, Mistral Medium 3
Qwen 3 128K · Gemma 4 131K · DeepSeek V3.2 128K · GLM-5.1 200K MoE · Llama 4 Scout 10M · Llama 4 Maverick 400B · Mistral Medium 3 · Arcee Trinity 400B · Qwen3-Coder-Next · Codestral 2 · OLMo 2 · Apache 2.0 · MIT license · Open weights · Mixture of Experts · Long context

Anchor fact, verifiable

A 268-line JavaScript file named patched-acp-entry.mjs is why your open-source LLM session survives past turn twenty

Fazm does not use the stock @agentclientprotocol/claude-agent-acp entry point. Instead, it registers a custom Node entry at acp-bridge/src/patched-acp-entry.mjs that monkey-patches ClaudeAcpAgent.prototype.createSession (lines 20 to 21), wraps session.query.next (lines 38 to 40), and forwards nine classes of event the stock agent silently drops, including the compact_boundary event at lines 65 through 73 and the compaction stream chunks at lines 185 through 199.
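The patching move itself is ordinary JavaScript: save the prototype method, replace it, and shadow the session iterator's next() with a wrapper that inspects every value before the agent loop consumes it. A minimal sketch of that technique, with a stand-in FakeAgent class in place of ClaudeAcpAgent (all identifiers here are illustrative, not the shipping source):

```javascript
// Stand-in for ClaudeAcpAgent; createSession returns a session whose
// query is an async iterator of SDK events, as described above.
class FakeAgent {
  async createSession() {
    async function* events() {
      yield { type: 'system', subtype: 'compact_boundary' };
      yield { type: 'assistant', text: 'done' };
    }
    return { query: events() };
  }
}

// 1. Monkey-patch the prototype method so every new session gets wrapped.
const originalCreateSession = FakeAgent.prototype.createSession;
FakeAgent.prototype.createSession = async function (...args) {
  const session = await originalCreateSession.apply(this, args);
  session.rescued = []; // events the stock agent would have dropped

  // 2. Shadow the iterator's next() so each value is inspected in flight.
  const originalNext = session.query.next.bind(session.query);
  session.query.next = async (...nextArgs) => {
    const result = await originalNext(...nextArgs);
    if (result.value?.subtype === 'compact_boundary') {
      session.rescued.push(result.value);
    }
    return result;
  };
  return session;
};

async function drain() {
  const session = await new FakeAgent().createSession();
  for await (const event of session.query) {
    // the normal agent loop consumes events here
  }
  return session.rescued;
}

drain().then((rescued) => console.log(rescued.length)); // prints 1
```

The point of the shape: the wrapper sits between the SDK iterator and whatever consumes it, so nothing can be filtered out before the forwarder has seen it.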

Plug any April 2026 open-source LLM into Fazm through the Custom API Endpoint setting. The same file keeps the session alive whenever the Claude Agent SDK runs auto-compaction, because the patch sits above the model choice, not below it.

268

Lines in patched-acp-entry.mjs, the file that forwards dropped SDK events

9

Event classes rescued, including compact_boundary and compaction_delta

127,318

Pre-compaction tokens logged at a real Fazm session boundary

314

The line in AIResponseView.swift where the user sees 'compacting context…'

The patch, verbatim, from the shipping source tree

This is the first block, lines 20 to 73, copied straight out of the file. The comments point out the specific shape of the system message the unpatched SDK drops. Forwarding this event is the single act that keeps an open-source LLM session alive past the context window.

acp-bridge/src/patched-acp-entry.mjs
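For readers without the source tree at hand, here is a simplified sketch of the shape such a forwarder takes. The field names follow the prose above (compact_metadata.trigger, compact_metadata.pre_tokens), but this is an illustration, not the verbatim lines 20 to 73:

```javascript
// Illustrative compact_boundary forwarder: the stock agent drops system
// messages with this subtype; the patch turns them into sessionUpdate
// notifications instead. `client.sessionUpdate` mirrors the ACP call
// described in the article; the update field names are assumptions.
function forwardSystemEvent(client, sessionId, message) {
  if (message.type !== 'system' || message.subtype !== 'compact_boundary') {
    return false; // not a boundary event; nothing to rescue in this sketch
  }
  client.sessionUpdate({
    sessionId,
    update: {
      sessionUpdate: 'compact_boundary',
      trigger: message.compact_metadata?.trigger,      // 'auto' or 'manual'
      preTokens: message.compact_metadata?.pre_tokens, // e.g. 127318
    },
  });
  return true;
}

// Minimal fake ACP client to show the call shape.
const updates = [];
const client = { sessionUpdate: (u) => updates.push(u) };

forwardSystemEvent(client, 's1', {
  type: 'system',
  subtype: 'compact_boundary',
  compact_metadata: { trigger: 'auto', pre_tokens: 127318 },
});

console.log(updates[0].update.preTokens); // 127318
```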

And this is the second block, lines 181 to 200. It handles the stream chunks the SDK emits while the compaction summary is being written.

acp-bridge/src/patched-acp-entry.mjs (continued)
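Again as an illustrative sketch rather than the verbatim lines 181 to 200: the stream handler reduces to three cases, the content_block_start that enters compacting state, the compaction_delta chunks that carry the summary text, and the stop that leaves it. The 'compaction_end' update name is an assumption for this sketch:

```javascript
// Sketch of a compaction stream handler; identifiers are illustrative.
function handleStreamEvent(client, sessionId, wrapper, state) {
  if (wrapper.type !== 'stream_event') return;
  const e = wrapper.event;
  if (e.type === 'content_block_start' && e.content_block?.type === 'compaction') {
    state.compacting = true;
    client.sessionUpdate({ sessionId, update: { sessionUpdate: 'compaction_start' } });
  } else if (e.type === 'content_block_delta' && e.delta?.type === 'compaction_delta') {
    // each chunk is a piece of the summary that replaces the old transcript
    client.sessionUpdate({
      sessionId,
      update: { sessionUpdate: 'compaction_delta', text: e.delta.text },
    });
  } else if (e.type === 'content_block_stop' && state.compacting) {
    state.compacting = false;
    client.sessionUpdate({ sessionId, update: { sessionUpdate: 'compaction_end' } });
  }
}

// Replay the stream shape the article describes.
const seen = [];
const client = { sessionUpdate: (u) => seen.push(u.update.sessionUpdate) };
const state = { compacting: false };
for (const e of [
  { type: 'content_block_start', content_block: { type: 'compaction' } },
  { type: 'content_block_delta', delta: { type: 'compaction_delta', text: 'Turns 1-18: …' } },
  { type: 'content_block_stop' },
]) {
  handleStreamEvent(client, 's1', { type: 'stream_event', event: e }, state);
}
console.log(seen.join(' -> ')); // compaction_start -> compaction_delta -> compaction_end
```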

Every April 2026 release routes through the same event forwarder

The patch is above the model choice. Whatever open-weights SKU you point Fazm at through the Custom API Endpoint, its session emits the same compact_boundary and rate_limit events, and patched-acp-entry.mjs catches all of them.

April 2026 open-source models, one forwarder, five rescued event classes

Qwen 3 32B
Gemma 4 27B
DeepSeek V3.2
GLM-5.1 MoE
Llama 4 Scout
patched-acp-entry.mjs
compact_boundary
compaction_start
status_change
rate_limit
api_retry

The same session, two ACP entry points

Toggle between the stock Claude Agent SDK entry point and Fazm's patched one. The model, the prompt, the tools, and the AX-tree payload are identical. Only the forwarder changes.

Stock ACP entry vs. Fazm's patched entry, context window full at turn 19

The Claude Agent SDK emits a stream_event with content_block.type 'compaction'. The unpatched ACP agent filters that type out before the ACP transport. The same is true for the compact_boundary system subtype, the rate_limit_event, api_retry, task_started, task_notification, and tool_progress messages. None of them become sessionUpdate calls. The UI has no way to distinguish a slow turn from a dead session. The floating bar shows a spinner. Nothing else happens.

  • compact_boundary event silently dropped
  • compaction_start and compaction_delta never forwarded
  • no 'compacting context…' status visible to the user
  • rate_limit_event dropped, no reset timestamp surfaced
  • api_retry dropped, the user cannot tell 429 from 500

What happens during turn 19, step by step

One real cross-app task on Qwen 3 32B. Slack to Notes to Xcode. Eighteen turns of AX-tree tool calls. The nineteenth would have blown the 128K window. Here is what Fazm does instead.

Turn 19: auto-compaction, live

1

Turn 18. The AX tree is 15K tokens. The running context is 110K.

Every turn Fazm traverses the frontmost app, flattens the AX tree into rows, and feeds the structured text to the model as the tool result. By turn 18 of a real cross-app task (Slack to Notes to Xcode), the running context is 110K tokens on a 128K-window model.

2

Turn 19. The model's assistant output plus the next tool result crosses the threshold.

The Claude Agent SDK's internal budget crosses its auto-compaction threshold. The SDK emits a stream_event whose content_block.type is 'compaction'. The stock ACP agent drops it. Fazm's patched-acp-entry.mjs line 185 catches it.

3

The compaction stream runs.

The SDK summarizes the prior 18 turns into a compact narrative. Each chunk of the summary arrives as a content_block_delta.delta.type 'compaction_delta'. Line 191 forwards each chunk as a sessionUpdate. The bridge converts 'compacting' status into an @Published isCompacting = true on ChatProvider.

4

The floating bar renders 'compacting context…' in orange.

AIResponseView.swift line 310 reads state.isCompacting. Line 314 renders the text with a 0.6-scale spinner. The user sees that their agent did not hang; it is actively rebuilding memory. The whole pause is usually 4 to 20 seconds, depending on the model.

5

The compact_boundary system event lands.

Line 65 of the patched entry catches it. It carries compact_metadata.trigger ('auto') and compact_metadata.pre_tokens (127318). Fazm logs it and forwards it so downstream analytics and tests can confirm the boundary was respected.

6

Turn 20 continues with a fresh, summarized context.

The model gets the summary as its new system context and the current turn's AX-tree text as user input. It emits a tool_use for the next click. The session is now on turn 20 of 40, not dead at 19.

The twelve-message call flow across five actors

One prompt enters at the top. Twelve internal messages flow across the floating bar, ChatProvider, the ACP bridge, the patched entry point, and whichever open-source LLM the user chose. Notice how many of these messages would be invisible with the stock ACP agent.

User prompt to post-compaction tool call, Qwen 3 32B via ANTHROPIC_BASE_URL

Floating bar → ChatProvider: user turn 20 of 40
ChatProvider → ACP bridge: session/prompt
ACP bridge → patched-acp-entry: query.next()
patched-acp-entry → Qwen 3 (proxy): POST /v1/messages (128K ctx full)
Qwen 3 (proxy) → patched-acp-entry: content_block_start type=compaction
patched-acp-entry → ACP bridge: sessionUpdate compaction_start
ACP bridge → ChatProvider: status_change compacting
ChatProvider → Floating bar: 'compacting context…' rendered
Qwen 3 (proxy) → patched-acp-entry: compact_boundary pre_tokens=127318
patched-acp-entry → ACP bridge: sessionUpdate compact_boundary
ACP bridge → ChatProvider: isCompacting = false
Qwen 3 (proxy) → patched-acp-entry: tool_use: macos-use.click

Every event in this diagram is either (a) typed in the stock SDK but dropped before the ACP transport, or (b) explicitly forwarded by patched-acp-entry.mjs. Without the patch, the user sees the first three messages and the last one. Everything in the middle is silent.

What the dev log looks like when it fires

Every line here is a verbatim substring you can grep out of /tmp/fazm-dev.log after a long session on a 128K-window model. The 'Compact boundary' line is the output of index.ts line 2380. The 'Context compaction started' line is ChatProvider.swift line 2596.

fazm-dev.log, one real compaction

The nine event classes the stock ACP agent drops

Every one of these is typed in the Claude Agent SDK, emitted by the upstream client, and then filtered out by the stock ACP transport. Fazm rescues all nine inside one 268-line file.

Events the patched entry rescues

  • compact_boundary with compact_metadata.trigger and pre_tokens (line 65-73)
  • status_change carrying the live agent status (line 74-78)
  • task_started with task_id and description (line 79-87)
  • task_notification with status + summary per subtask (line 88-97)
  • api_retry with HTTP status code and typed error category (line 98-125)
  • rate_limit_event with reset timestamp, utilization, overage status (line 132-152)
  • tool_progress with elapsed_time_seconds per running tool (line 156-166)
  • tool_use_summary with preceding_tool_use_ids (line 167-176)
  • stream_event compaction_start and compaction_delta (line 181-200)
9

Each of these would be invisible to the user on any stock Claude Agent SDK ACP setup. Fazm forwards all of them through a single 268-line entry point.

acp-bridge/src/patched-acp-entry.mjs (lines 65, 74, 79, 88, 98, 132, 156, 167, 181)

The words the user sees when this fires

The entire UI binding is seven lines of SwiftUI. The text literal only appears at this one path. If your agent session ever rescues itself from a context-window overflow on Fazm, this is the line of Swift that renders the fact to you.

Desktop/Sources/FloatingControlBar/AIResponseView.swift

Spec-sheet view vs. shipping-agent view

Every April 2026 open-source LLM recap grades releases on context-window size. A shipping Mac agent grades on what happens when that window fills. Here are the eight dimensions that flip.

Feature | Stock SDK or typical recap | Fazm (patched ACP entry)
Spec-sheet context window | Cited as the headline number in every recap | Treated as a ceiling, not a session budget
When the window fills | Stock agent typically returns an error or drops the session | compact_boundary + compaction stream auto-summarize
What the user sees during compaction | Often nothing, session appears hung | 'compacting context…' in orange at AIResponseView.swift:314
Who initiates the compaction | Varies, usually no one, the session just dies | Claude Agent SDK internals, forwarded via patched entry
Model dependency | Would be tied to one model's context-management trick | Patch sits above the model choice, works with all April 2026 SKUs
Analytics surface | No event emitted | compact_metadata.pre_tokens logged and forwarded
Failure signal | Silent drop or generic timeout | api_retry events forwarded with HTTP status + typed error
Rate-limit handling | Not exposed | rate_limit events with resetsAt, utilization, overageStatus

Six April 2026 releases, re-graded on post-compaction behavior

Not graded on MMLU. Not graded on SWE-Bench. Graded on whether turn 20 (after a compact_boundary) looks like turn 5 on the same agent task, and how cleanly the model re-reads the summarized transcript before emitting its next tool_use.

Qwen 3 32B (thinking mode)

128K context window, Apache 2.0, dual thinking and fast modes. Cleanest behavior across compaction boundaries: the thinking budget re-grounds the model on the summary before the next tool call, so turn 20 post-compaction is often indistinguishable from turn 5.

DeepSeek V3.2

128K context, MIT license, frontier reasoning. Handles long post-compaction sessions without drift. Strong pick behind an Anthropic-protocol shim for paid hosted access at much lower cost than frontier cloud.

GLM-5.1 (754B MoE)

200K context, MIT license, beats several proprietary models on SWE-Bench Pro. The largest native window of the April releases apart from Llama 4 Scout's 10M, so it compacts less often. Needs server-class hardware to self-host, so it is a hosted-provider story for most users.

Gemma 4 (27B)

131K context, Apache 2.0, multimodal (Fazm ignores the image head). Strong on short turn loops. After compaction, tool schemas sometimes need a nudge, so pair it with a strict tool-call validator in the Anthropic shim.

Llama 4 Scout

10M context (MoE), controlled license. Almost never compacts inside a typical Fazm session, which sounds great until you realize the attention tax on dead turns grows quadratically. Good for deep retrieval, less good for long tool loops.

Mistral Medium 3

Open weights, April 9 release. 128K context, multilingual label handling. Reliable through compaction on straightforward flows; needs a careful shim when the tool-call JSON schema is dense (many MCP servers bundled).

9 Dropped system event classes forwarded
20-40 Turns per session without forced restart
15K+ Tokens of AX-tree per turn
268 Lines in the patched entry point

Run your April 2026 open-weights pick on a real Mac

Twenty-minute walkthrough: we swap in Qwen 3, Gemma 4, or DeepSeek V3.2 through ANTHROPIC_BASE_URL, push the session past its context window, and show the compact_boundary land live.

Book a call

FAQ, from the April 2026 open-source rescoring on compaction

What were the major open-source LLM releases in April 2026?

The shortlist every recap is covering: Alibaba Qwen 3 on April 8 (Apache 2.0, dense and MoE variants, 128K context window), Google Gemma 4 across four variants under Apache 2.0 with 131K context, Mistral Medium 3 on April 9 with open weights, Meta Llama 4 Scout and Maverick with MoE architecture (Scout with a 10M context window, Maverick at 400B total parameters), DeepSeek V3.2 with frontier reasoning at 128K context, Zhipu GLM-5.1 under MIT at 200K context and 754B MoE, and Arcee Trinity as a 400B Apache 2.0 model for enterprise self-hosting. What none of the recaps cover: what happens when that context window fills during a real agent session.

Why does context window size stop mattering for a shipping Mac agent?

Because context size is a ceiling, not a throughput number. On Fazm every agent turn sends the live macOS accessibility tree as structured text (role, label, value, coordinates per element). One Slack window emits around 11 to 15K tokens of AX rows. A Finder window with a deep nested sidebar can push past 30K. On a 128K-window model like Qwen 3 or Gemma 4, turn eight or nine saturates the context even before you count the system prompt, the tool schemas, and the model's own assistant output. At that point spec-sheet size stops being the interesting number, and what becomes interesting is whether the session can auto-compact and survive.
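The arithmetic behind "turn eight or nine" is easy to check. A sketch using the article's own per-turn figure; the 8K of fixed overhead for the system prompt and tool schemas is an assumed round number, not a measured Fazm value:

```javascript
// Back-of-envelope context budget for a 128K-window model.
const windowTokens = 128_000;
const axTreePerTurn = 15_000; // one Slack window of AX rows, per the FAQ
const fixedOverhead = 8_000;  // system prompt + tool schemas (assumption)

let used = fixedOverhead;
let turn = 0;
while (used + axTreePerTurn <= windowTokens) {
  used += axTreePerTurn;
  turn += 1;
}
console.log(turn); // 8: the window saturates around turn eight
```

Any assistant output on top of this only pulls the saturation point earlier, which is why compaction is a routine event rather than an edge case.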

What does 'compaction' actually mean inside Fazm?

It means the Claude Agent SDK emits a content_block_start event whose content_block.type is 'compaction', streams compaction_delta text chunks that summarize the earlier turns, and ends the stream with a system message whose subtype is 'compact_boundary' and whose compact_metadata.pre_tokens records how many tokens lived in the old context. Fazm's custom entry point at acp-bridge/src/patched-acp-entry.mjs lines 65 to 73 and 185 to 199 catches both of these classes of event. The unpatched ACP agent silently drops them. Fazm forwards them through acpClient.sessionUpdate so the UI can render 'compacting context…' and the user knows the agent is not hung, it is rebuilding its memory.

Where exactly in the Fazm source is this patched?

acp-bridge/src/patched-acp-entry.mjs at lines 20 and 21 overrides ClaudeAcpAgent.prototype.createSession. Lines 38 to 40 wrap session.query.next so every SDK iterator value passes through the patched forwarder. Lines 65 through 73 catch the compact_boundary system subtype. Lines 74 through 78 forward status_change. Lines 79 through 97 forward task_started and task_notification. Lines 98 through 125 forward api_retry with the HTTP status and typed error category. Lines 132 through 152 forward rate_limit_event. Lines 156 through 176 forward tool_progress and tool_use_summary. Lines 181 through 200 forward the compaction stream chunks. Lines 211 through 261 patch prompt() to attach cost and usage metadata. None of those events exist in the unpatched SDK's ACP bridge.

How does this change which April 2026 open-source LLM I should reach for?

It flips the evaluation from 'how big is the context window' to 'how well does the model respect a summarized transcript once the compaction has run'. Qwen 3 in thinking mode handles this cleanly because the thinking budget re-grounds the model on the summary before the next tool call. Gemma 4 is strong on short turn loops but tends to drift after compaction when tool schemas are re-presented. DeepSeek V3.2 handles long, post-compaction sessions well because its training mix over-indexes on multi-turn reasoning. GLM-5.1 has the largest native window (200K) so it compacts less often, which is a different optimization. Llama 4 Scout with 10M context almost never compacts, which looks like an advantage until you realize the session is now paying attention tax on hundreds of dead AX-tree turns.

Does Fazm actually show the user when compaction is running?

Yes. ChatProvider.swift line 372 declares @Published var isCompacting and line 374 declares @Published var compactingSessionKey. The onStatusEvent handler at lines 2592 through 2601 flips isCompacting to true when the bridge forwards a status_change with status 'compacting'. AIResponseView.swift line 310 reads state.isCompacting and line 314 renders 'compacting context…' in orange next to a spinner. The user sees the agent pause. They know it is not hung. The session continues without them hitting retry.

Does the compaction-forwarding logic care which model is driving the session?

No, and that is the point. The patched entry point sits above the model choice. It patches ClaudeAcpAgent, not the underlying LLM client. Swap in Qwen 3 32B through an Ollama proxy, DeepSeek V3.2 through Together or Fireworks, GLM-5.1 through Zhipu's API, Gemma 4 through llama.cpp behind an Anthropic-shape shim, Mistral Medium 3 through the Mistral API: every one of those setups emits the same compact_boundary events when the context fills, because the Claude Agent SDK runs the compaction loop itself. The patched entry point's only job is to stop the bridge from dropping those events on the floor.
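A sketch of what that swap amounts to in practice, assuming the bridge process reads ANTHROPIC_BASE_URL the way Anthropic SDK clients conventionally do; the URL is a placeholder for a hypothetical local shim, not a Fazm default:

```javascript
// The patch never changes; only the environment the bridge launches
// with does. Swapping the base URL swaps the model behind the same
// 268-line forwarder.
function bridgeEnvFor(baseUrl) {
  return {
    ...process.env,
    ANTHROPIC_BASE_URL: baseUrl, // read by the Claude Agent SDK's client
  };
}

const env = bridgeEnvFor('http://localhost:11434/anthropic');
// Launching `node acp-bridge/src/patched-acp-entry.mjs` with this env
// would drive the same forwarder against the swapped-in model.
console.log(env.ANTHROPIC_BASE_URL); // http://localhost:11434/anthropic
```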

What April 2026 release has the largest reported context window?

Meta Llama 4 Scout at 10 million tokens, dwarfing every other April 2026 release. GLM-5.1 claims 200K tokens. Qwen 3, Gemma 4, DeepSeek V3.2, and Mistral Medium 3 all land around 128K to 131K. Fazm's compaction forwarder applies to all of them because even the 10M window will eventually exceed the model's effective attention span, and in practice the Claude Agent SDK runs compaction well before the raw cap.

What other dropped events does the patched entry point rescue?

Eight classes. compact_boundary (line 65-73). status_change (line 74-78). task_started (line 79-87). task_notification (line 88-97). api_retry (line 98-125), which carries the HTTP status code and typed error category so the UI can distinguish a 429 from a 500. rate_limit_event (line 132-152), which carries status, reset timestamp, type, utilization, and overage info. tool_progress (line 156-166) and tool_use_summary (line 167-176), which let the UI show which subtask is running and what it summarized. All eight are silently dropped by the stock ACP agent.

How did you verify this? How can a reader verify it?

Clone the Fazm app (or crack open the signed bundle's Resources/acp-bridge directory) and open acp-bridge/src/patched-acp-entry.mjs. The file is 268 lines. Grep for compact_boundary and you will find the forwarder at line 65. Grep for content_block_start and you will find the compaction stream handler at line 185. In the Swift side, grep the Desktop/Sources tree for isCompacting and you will find ChatProvider.swift:372 and the UI binding in ChatQueryLifecycle.swift lines 180 to 189. The string 'compacting context…' appears exactly once, at AIResponseView.swift:314. Every fact on this page is traceable to a line number in the shipping source tree.

Why do the April 2026 LLM recaps skip this entirely?

Because most recaps are written against benchmark leaderboards and model cards, not against a shipping agent. MMLU, AIME, HumanEval, BFCL, and SWE-Bench do not test multi-turn context exhaustion. They test a single-prompt answer. A consumer Mac agent is a twenty- to forty-turn loop where the accessibility tree is re-emitted every turn and the session has to survive multiple compaction boundaries. That regime was not in the benchmark the roundups were citing, so it is missing from the pick-the-right-model commentary.

fazm.AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
