AI tech news developments, April 14-15, 2026: the 38-minute sprint that stopped two pop-out chat windows from leaking tokens into each other
Every roundup for these two days lists Claude Mythos 5, GPT-5.4 Thinking, Gemini 3.1 Pro, Grok 4.20 Beta 2, Codestral 2 under Apache 2.0, and MCP crossing 97 million installs. None of them describes what breaks inside a consumer Mac AI chat app the moment a single user opens two pop-out windows streaming two of those models at the same time. On April 15 between 17:12 and 17:50 PDT, a ten-commit burst inside one open-source-shaped Mac app replaced a single global streaming buffer with a per-message dictionary and threaded a sessionKey through every tool callback. This page is that burst, commit by commit.
The anchor fact, 38 lines of diff
At 17:12:18 PDT on 2026-04-15, commit 21875930 landed in Desktop/Sources/Providers/ChatProvider.swift. Before it, the class had one global string buffer, one global message id flag, and one global flush work item. After it, every active message owns its own struct inside a single dictionary. The left side is what April 14, 2026 looked like. The right side is what April 15 looks like. Same file, same region, 38 lines touched.
ChatProvider.swift before and after commit 21875930
// Desktop/Sources/Providers/ChatProvider.swift (before 21875930)
// One global streaming buffer, one message id flag, one flush
// work item. Fine with a single chat window. Cross-contaminates
// the moment a user pops out a second chat and both stream.
// MARK: - Streaming Buffer
/// Accumulates text deltas during streaming and flushes them
/// to the published messages array at most once per ~100ms,
/// reducing SwiftUI re-render frequency.
private var streamingTextBuffer: String = ""
private var streamingThinkingBuffer: String = ""
private var streamingBufferMessageId: String?
private var streamingFlushWorkItem: DispatchWorkItem?
private let streamingFlushInterval: TimeInterval = 0.1
/// When true, the next text buffer flush creates a new .text
/// content block instead of appending to the existing one.
/// Set by text_block_boundary events.
private var forceNewTextBlock: Bool = false
private func appendToMessage(id: String, text: String) {
    streamingBufferMessageId = id
    streamingTextBuffer += text
    if streamingFlushWorkItem == nil {
        let work = DispatchWorkItem { [weak self] in
            self?.flushStreamingBuffer()
        }
        streamingFlushWorkItem = work
        DispatchQueue.main.asyncAfter(
            deadline: .now() + streamingFlushInterval,
            execute: work
        )
    }
}
The story in six frames
The model cycle on April 14-15, 2026 was not the only thing that shipped. Inside one Mac AI chat client, a 38-minute refactor made it safe to run two of those models in two pop-out windows simultaneously. Each frame below is a moment along that timeline.
April 14-15, 2026: one user, two models, 38 minutes
April 14-15, 2026
What the cloud shipped versus what the Mac shipped
The left column is what every roundup for this keyword tells you. The right column is what shipped inside one Mac AI chat client on April 15 so that two of those cloud models could run side by side on a single user's machine without their tokens getting spliced together.
Cloud model drops vs desktop concurrency plumbing, April 14-15, 2026
Ten commit hashes, verifiable with git show
Every hash on this page is a real commit inside /Users/matthewdi/fazm. Line counts are from git show --stat. If any of them fails to resolve, this page is wrong.
The second half: per-session tool callbacks
Ninety-eight seconds after the buffer refactor landed, commit 2e9ed681 turned the tool-executor callback system from a pair of global static closures into a pair of dictionaries keyed by session key. The reason: when the model inside window A emits an ask_followup tool call, the quick-reply buttons need to render in window A, not whichever window happened to register last. Thirty-six insertions, four deletions, one file.
What actually prints during a concurrent stream
Both logs below are from two pop-out windows streaming at the same time. The top one is the shape of the bug on April 14. The bottom is what it looks like after 17:12 PDT on April 15.
The two shapes, drawn
Same four actors, same pair of deltas, two different outcomes. Each sequence is one concurrent streaming window.
Before 21875930: single global buffer
After 21875930: per-message buffer slots
The 38 minutes, to the second
Ten checkpoints between 17:12:18 PDT and 17:50:17 PDT on April 15, 2026. Each one has a commit hash and a real file path. All verifiable with git show <hash> inside /Users/matthewdi/fazm.
17:12:18 PDT — struct StreamingBuffer, dict streamingBuffers
Commit 21875930 in Desktop/Sources/Providers/ChatProvider.swift. Replaces three class-level buffer properties (streamingTextBuffer, streamingThinkingBuffer, streamingFlushWorkItem) plus the forceNewTextBlock bool with a single nested struct StreamingBuffer holding those four fields, and replaces streamingBufferMessageId with a dictionary streamingBuffers: [String: StreamingBuffer] keyed by message id. 20 insertions, 18 deletions.
17:13:29 PDT — rewire appendToMessage and handleTextBlockBoundary
Commit 4c615476. Changes 83 lines across the same file. Every call site that touched the four old properties now reads and writes streamingBuffers[id], using the Swift dictionary default pattern streamingBuffers[id, default: StreamingBuffer()] so the first delta for a message creates its own buffer slot automatically.
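Put together, the two commits above give the provider a shape roughly like the sketch below. The struct fields and dictionary name come from the diffs described on this page; the `flushStreamingBuffer(for:)` signature and the flush body are assumptions for illustration, not the verbatim code behind `git show`.

```swift
import Foundation

final class ChatProviderSketch {
    // Reconstructed from the 21875930 commit description.
    struct StreamingBuffer {
        var textBuffer = ""
        var thinkingBuffer = ""
        var flushWorkItem: DispatchWorkItem?
        var forceNewTextBlock = false
    }

    /// One buffer slot per in-flight message, keyed by message id.
    private var streamingBuffers: [String: StreamingBuffer] = [:]
    private let streamingFlushInterval: TimeInterval = 0.1

    func appendToMessage(id: String, text: String) {
        // The default subscript creates the slot on the first delta,
        // so a second concurrent stream never touches this one's state.
        streamingBuffers[id, default: StreamingBuffer()].textBuffer += text
        if streamingBuffers[id]?.flushWorkItem == nil {
            let work = DispatchWorkItem { [weak self] in
                self?.flushStreamingBuffer(for: id)
            }
            streamingBuffers[id]?.flushWorkItem = work
            DispatchQueue.main.asyncAfter(
                deadline: .now() + streamingFlushInterval,
                execute: work
            )
        }
    }

    // Hypothetical flush body: drains only this message's slot.
    private func flushStreamingBuffer(for id: String) {
        guard var slot = streamingBuffers[id] else { return }
        slot.flushWorkItem = nil
        // ...append slot.textBuffer to the published message here...
        slot.textBuffer = ""
        streamingBuffers[id] = slot
    }
}
```

The key move is the `streamingBuffers[id, default:]` subscript: there is no longer a "current" message id to overwrite, so two streams flushing on the same 100ms cadence cannot steal each other's text.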
17:13:56 PDT — per-session callbacks in ChatToolExecutor
Commit 2e9ed681 in Desktop/Sources/Providers/ChatToolExecutor.swift. 36 insertions, 4 deletions. Adds two static dictionaries keyed by session key (quickReplyCallbacks, sendFollowUpCallbacks), registerCallbacks(sessionKey:...) and unregisterCallbacks(sessionKey:) statics, and an activeSessionKey property. The ask_followup handler now prefers the per-session closure and falls back to the legacy global only for onboarding.
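The registration side of that commit can be sketched as follows. The dictionary shapes and the register/unregister names match the diff described above; the enclosing type and everything else is an assumption for illustration.

```swift
import Foundation

enum ChatToolExecutorSketch {
    // Per-session closures, keyed by session key (2e9ed681).
    private static var quickReplyCallbacks:
        [String: (_ question: String, _ options: [String]) -> Void] = [:]
    private static var sendFollowUpCallbacks:
        [String: (_ message: String) -> Void] = [:]

    /// Set by the provider immediately before each tool dispatch
    /// (the one-line change in db469333).
    static var activeSessionKey: String?

    static func registerCallbacks(
        sessionKey: String,
        onQuickReply: @escaping (String, [String]) -> Void,
        onSendFollowUp: @escaping (String) -> Void
    ) {
        quickReplyCallbacks[sessionKey] = onQuickReply
        sendFollowUpCallbacks[sessionKey] = onSendFollowUp
    }

    /// Called from the detached window's onClose (28f82d64) so a
    /// closed window's closures cannot outlive its SwiftUI state.
    static func unregisterCallbacks(sessionKey: String) {
        quickReplyCallbacks.removeValue(forKey: sessionKey)
        sendFollowUpCallbacks.removeValue(forKey: sessionKey)
    }
}
```

Each window registers on open and unregisters on close, so the dictionaries only ever hold closures for windows that still exist.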
17:15:12 PDT — sessionKey threaded through prepareForQuery
Commit c0414b0d in Desktop/Sources/FloatingControlBar/ChatQueryLifecycle.swift. 18 insertions, 4 deletions. Adds sessionKey: String? to the prepareForQuery entry point so every call chain that leads into ChatToolExecutor has the key available.
17:15:25 PDT — DetachedChatWindow passes its sessionKey
Commit c894bbff. 2 insertions, 1 deletion. Three lines of diff make detached windows forward their sessionKey into prepareForQuery. Before this change, detached windows ran under whatever sessionKey the floating bar had set last.
17:15:34 PDT — activeSessionKey set on the executor
Commit db469333. One line of diff in ChatProvider.swift. `ChatToolExecutor.activeSessionKey = sessionKey` is set immediately before the tool call is dispatched. Everything in ChatToolExecutor that branches on activeSessionKey now reads the correct value.
17:16:53 PDT — unregister callbacks on window close
Commit 28f82d64. 2 insertions in DetachedChatWindow.swift. When the window's onClose fires, ChatToolExecutor.unregisterCallbacks(sessionKey: ...) runs to prevent a leaked closure holding the window's SwiftUI state after the window is gone.
17:18:06 PDT — CHANGELOG entry
Commit 128b7a09. Adds 'Fixed AI responses leaking between pop-out chat windows when streaming simultaneously' to CHANGELOG.json. One sentence users will see in Settings > Release Notes when v2.3.2 ships on 2026-04-16.
17:35:14 PDT — AIResponseView observer cards are local
Commit cf66bce4. Moves observer-card state in AIResponseView from shared @Published lists to local @State so pop-out windows no longer mirror each other's observer cards. This is the UI half of the fix; the per-message streaming buffers are the provider half.
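The shape of that change, sketched. The view name is real; the property name and card type here are assumptions, since the page only describes the @Published-to-@State move.

```swift
import SwiftUI

struct AIResponseViewSketch: View {
    // Before cf66bce4: cards lived in a shared @Published array on a
    // provider object, so every pop-out window rendered the same list.
    // After: each view instance owns its own local copy.
    @State private var observerCards: [String] = []

    var body: some View {
        List(observerCards, id: \.self) { card in
            Text(card)
        }
    }
}
```

With @State, SwiftUI allocates the storage per view instance, which is exactly the isolation two pop-out windows need.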
17:50:17 PDT — session warmup initializes MCP servers first
Commit 07a19f57. The session is pre-warmed with MCP servers (fazm_tools, playwright, macos-use, whatsapp, google-workspace) before the first query lands. Cold starts no longer cause the first tool call to race the MCP handshake.
What the sprint changed, at a glance
Every card here corresponds to a specific commit hash in the timeline above. No hand-wavy architecture. Just the fields, the dicts, and the callbacks that changed.
Before: 1 buffer, everyone
A single streamingTextBuffer: String and a single streamingBufferMessageId: String?. Whichever message wrote last owned the flush. Tolerable with one window.
After: 1 buffer per message
A dictionary streamingBuffers: [String: StreamingBuffer] keyed by message id. Each flush work item is scoped to its own slot. No shared state.
Per-session callbacks
quickReplyCallbacks and sendFollowUpCallbacks are dictionaries keyed by session key. ask_followup fires the closure registered by the active session, not the last one registered globally.
sessionKey threaded
prepareForQuery takes a sessionKey, DetachedChatWindow passes its own, ChatProvider sets activeSessionKey on the executor right before dispatch. One path, no globals.
10 commits in 38 minutes
17:12:18 to 17:50:17 PDT on April 15, 2026. One author. No open PR. The diff lives in /Users/matthewdi/fazm and every hash on this page is real.
Before April 15, 17:12 PDT vs after, one row at a time
This is not a marketing table. It is a schema diff. The left column is the code that existed at 17:12:17 PDT. The right column is the code that existed at 17:50:17 PDT, 38 minutes later.
| Feature | April 15, 17:12 PDT (pre-sprint) | April 15, 17:50 PDT (v2.3.2) |
|---|---|---|
| Streaming buffer ownership | single global, last-write-wins | per-message, keyed by message id |
| Tool callback routing | single static closure, last-registered wins | per-session dict keyed by sessionKey |
| Detached window lifecycle | closures leak after window close | registerCallbacks / unregisterCallbacks |
| Observer card state | @Published shared across all windows | @State local to AIResponseView instance |
| First-query cold start | first tool call races the handshake | session warmup initializes MCP servers first |
| Verifiable by user | no public bug tracker entry | git show 21875930 shows 20 insertions, 18 deletions in one file |
“Fixed AI responses leaking between pop-out chat windows when streaming simultaneously.”
/Users/matthewdi/fazm/CHANGELOG.json, v2.3.2, 2026-04-16
Reproduce every claim in under five minutes
Each item on this list is independently verifiable, either by driving the app with two pop-out chat windows or by running git show on one of the hashes in the timeline.
Concurrency verification checklist
- Install Fazm, open one chat window, press command-shift-N to pop out a second
- Set window A to Opus, window B to Sonnet in the header toggle
- Send a coding question into window A
- While A is still streaming, send an email draft into window B
- Watch both streams render their own tokens, never each other's
- Close window B. A should continue streaming with zero interruption
- Verify the commit with: git show 21875930 inside /Users/matthewdi/fazm
- Verify the callback side with: git show 2e9ed681
- Verify the 38-minute window with: git log --since='2026-04-15 17:00' --until='2026-04-15 18:00'
Two models streaming at the same time, one per pop-out window, each with its own streaming buffer slot, each with its own session-keyed tool callbacks. This is the concurrency primitive the April 14-15, 2026 model cycle actually asked for on a single user's Mac.
Want a walkthrough of the Fazm streaming pipeline?
Book a 20-minute call to see the per-message buffer, the per-session callback table, and how two pop-out windows run two different April 2026 models at once.
Book a call →
More on April 2026 model cycles and the Mac plumbing underneath
AI News April 14-15, 2026: The Ten-Day Ship Log of the Mac Router
Smart/Fast toggle, ACP v0.25.0 error frames, the Custom API Endpoint setting, and the 600ms typing-indicator debounce: the routing plumbing that landed right before the model wave.
AI Tech Developments News, April 14-15, 2026
Agent Client Protocol uses a flat image block, not Anthropic's nested source block, and has no document type for PDFs at all. The 33-commit April 14 attachments evening.
AI Model Releases, April 2026
Gemma 4, Llama 4, Mythos, GPT-5.4, Gemini 3.1, Opus 4.7, Grok 4.20, DeepSeek V3.2 absorbed by a three-function Swift router with zero binary rebuilds.
Frequently asked questions
What actually made news in AI on April 14-15, 2026?
The SERP-visible headlines for those two days: Anthropic's Claude Mythos 5 at the 10-trillion-parameter mark alongside a smaller model called Capabara. Google's Gemini 3.1 Pro and Pro-Lite. OpenAI's GPT-5.4 Thinking. xAI's Grok 4.20 Beta 2. Meta rolling out a fresh Llama 4 branch after the 14-billion-dollar Alexandr Wang deal. Mistral's Codestral 2 under Apache 2.0. A Google researcher plus a Turing Award winner publishing a paper arguing the real AI crisis is inference, not training. Harvard researchers on April 15 showing that injecting randomness into robot motion prevents deadlock in crowded environments. Anthropic's Model Context Protocol crossing 97 million installs in March 2026. Stanford's AI Index charts. Every one of those items is covered by five competing roundups. The single thing none of them covers is what breaks inside a consumer Mac AI chat app when a user opens two pop-out windows streaming two of those models at the same time.
What is the 38-minute sprint this page is built around?
At 17:12:18 PDT on 2026-04-15, commit 21875930c1ff51153ea5c08b50b2360930778f8d landed in /Users/matthewdi/fazm/Desktop/Sources/Providers/ChatProvider.swift. Before it, ChatProvider had a single class-level property `private var streamingTextBuffer: String = ""` at the top of the streaming section. After it, that line is replaced with a nested struct `StreamingBuffer` and a dictionary `private var streamingBuffers: [String: StreamingBuffer] = [:]` keyed by message id. Seventy-one seconds later at 17:13:29 PDT, commit 4c6154764f5a7548fed5ddd13e6ebcc3de28de5f rewired appendToMessage(id:text:) and handleTextBlockBoundary(messageId:) to read and write into streamingBuffers[id]. Over the next 37 minutes, eight more commits landed: 2e9ed681 added per-session callbacks to ChatToolExecutor (36 insertions, 4 deletions); c0414b0d threaded a sessionKey through prepareForQuery (18 insertions, 4 deletions); c894bbff made DetachedChatWindow pass it; db469333 set ChatToolExecutor.activeSessionKey right before tool execution; 28f82d64 unregistered callbacks on detached window close; 128b7a09 added the user-visible CHANGELOG line; cf66bce4 moved AIResponseView observer cards to local @State so pop-out windows stopped sharing UI state; 07a19f57 added a session warmup that initializes MCP servers before the first query. Verify with `git log --since='2026-04-15 17:00' --until='2026-04-15 18:00' --pretty=format:'%h %ai %s'` inside /Users/matthewdi/fazm.
What bug was the per-message streaming buffer actually fixing?
When a user opened two pop-out chat windows and started two streaming responses at the same time (say, Opus in window A answering a coding question, Sonnet in window B answering an email draft), the single global streamingTextBuffer accumulated deltas from both streams. Every 100ms it would flush to whichever message id was written to streamingBufferMessageId most recently. In practice: the word 'function' that belonged to window A's code block could appear mid-sentence inside window B's email draft. The fix in commit 21875930 makes each message id own its own buffer struct. StreamingBuffer has textBuffer, thinkingBuffer, flushWorkItem, and forceNewTextBlock fields, each scoped to one message. Multiple streams coexist cleanly because they never touch the same memory.
Why did this have to land on April 15 specifically?
Because April 14-15 was when the news cycle made 'one Mac user, two models running at once' a real use case instead of a hypothetical. Claude Mythos 5 was suddenly the model for deep reasoning; Gemini 3.1 Pro-Lite was priced for fast cheap tasks; GPT-5.4 Thinking occupied a third slot; open-weight models like Codestral 2 opened a fourth. Users started opening multiple pop-out windows (the pop-out feature itself shipped in v2.0.1 on April 3, 2026), each pinned to a different model, and ran them simultaneously. Before commit 21875930, a shared buffer was tolerable only because most users ran one stream at a time. On April 15 that assumption broke. The fix is in the CHANGELOG for v2.3.2 on 2026-04-16 as: 'Fixed AI responses leaking between pop-out chat windows when streaming simultaneously.'
What does per-session mean in ChatToolExecutor, and why does it need its own commit?
Tool calls like ask_followup and send_follow_up are one-way signals from the model back into the UI. Before commit 2e9ed681 at 17:13:56 PDT, ChatToolExecutor had two class-level static closures: onQuickReplyOptions and onSendFollowUp. Whichever window registered most recently owned the callback. When the model inside window A emitted ask_followup, the quick-reply buttons would render inside window B. The fix adds two dictionaries, `private static var quickReplyCallbacks: [String: (_ question: String, _ options: [String]) -> Void] = [:]` and `private static var sendFollowUpCallbacks: [String: (_ message: String) -> Void] = [:]`, each keyed by session key, plus `static var activeSessionKey: String?` set right before each tool executes (commit db469333). The ask_followup handler now looks up quickReplyCallbacks[activeSessionKey] first and falls back to the legacy global closure only for onboarding contexts without a session key.
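The lookup-with-fallback described above can be sketched as a single handler. The dictionary shapes and the activeSessionKey preference come from the diff described on this page; the handler name and the legacy closure name are assumptions for illustration.

```swift
import Foundation

enum FollowupRoutingSketch {
    static var quickReplyCallbacks:
        [String: (String, [String]) -> Void] = [:]
    // Legacy global closure, kept only for onboarding contexts.
    static var legacyOnQuickReplyOptions: ((String, [String]) -> Void)?
    static var activeSessionKey: String?

    static func handleAskFollowup(question: String, options: [String]) {
        if let key = activeSessionKey,
           let callback = quickReplyCallbacks[key] {
            // The window whose session issued the tool call
            // gets the quick-reply buttons.
            callback(question, options)
        } else {
            // No session key registered: onboarding fallback.
            legacyOnQuickReplyOptions?(question, options)
        }
    }
}
```

Because the lookup happens per tool call against activeSessionKey, "last window to register wins" stops being the routing rule.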
What is the relationship between this work and the model releases that week?
The model releases are the reason the plumbing had to be ready. Anthropic published Claude Mythos 5 and Capabara as a capability tier on April 14. Google shipped Gemini 3.1 Pro. OpenAI's GPT-5.4 Thinking hit Humanity's Last Exam at 50+ percent. Grok 4.20 Beta 2 was the xAI contribution. Meta rolled a new Llama 4 checkpoint. Mistral released Codestral 2 under Apache 2.0. None of those labs ships desktop apps. Every one of those releases becomes a PRO tier option in Fazm's model picker the same day because the desktop client was already decoupled from any specific provider. The streaming-buffer-per-message refactor is the thing that made it safe to open two pop-out windows and talk to two of those new models at the same time.
Where do I look to see the before and after with my own eyes?
Clone the Fazm app, then in a terminal run `cd /Users/matthewdi/fazm && git show 21875930`. The diff is scoped to a single file (Desktop/Sources/Providers/ChatProvider.swift), 38 lines changed total (20 insertions, 18 deletions). The before lines include `private var streamingTextBuffer: String = ""`, `private var streamingThinkingBuffer: String = ""`, `private var streamingBufferMessageId: String?`, `private var streamingFlushWorkItem: DispatchWorkItem?`, and `private var forceNewTextBlock: Bool = false`. The after lines include the nested struct StreamingBuffer and the single dictionary streamingBuffers keyed by message id. For the session-callback side, run `git show 2e9ed681`: the diff touches Desktop/Sources/Providers/ChatToolExecutor.swift and adds the quickReplyCallbacks and sendFollowUpCallbacks dictionaries along with registerCallbacks and unregisterCallbacks statics.
Does Fazm use accessibility APIs instead of screenshots?
Yes. The action layer underneath this chat surface is mcp-server-macos-use, a 21 MB ARM64 Mach-O binary registered as an MCP server inside the bridge. It walks the live accessibility tree of the frontmost Mac app via AXUIElementCreateApplication and returns rows like '[AXButton] "Send" x:842 y:712 w:68 h:32 visible' so the model clicks labelled centroids instead of guessing pixel coordinates from a base64 screenshot. The text-only tree averages about 10 KB per observation versus 350 KB for a 4K base64 screenshot. That is the architectural context the concurrency fix lives inside: once each pop-out window has its own clean streaming buffer and its own sessionKey-scoped callbacks, the two windows can drive two different Mac apps (Slack in one, Xcode in another) with two different models at the same time without either one ever seeing the other's tokens.
How is Fazm different from the agent-framework stories in the April 15 news cycle?
NVIDIA GTC 2026, OpenAI's evolution of the Agents SDK, and the 97 million MCP installs story are all about infrastructure for developers building agents. They are SDKs, servers, protocols, and libraries. Fazm is a consumer Mac app that uses MCP servers but is not itself an SDK. When the news cycle said 'MCP crossed 97 million installs' on April 15, the thing that mattered for a non-developer user was that a native Mac app with a chat window could load any of those MCP servers and drive their own apps (Slack, Calendar, Notes, Xcode) without writing a single line of code. The April 15 concurrency fix is the piece that makes two of those chats runnable in parallel.
Is this a guide for developers or for end users?
Both, at different depths. The commits and file paths are for developers who want to verify that a real Mac AI chat client actually wrote this code on that day (everything is behind `git show` inside /Users/matthewdi/fazm). The sentence 'two pop-out chat windows, streaming different models, stopped leaking tokens between each other on April 15, 2026' is for anyone who opens Fazm, hits command-shift-N to pop out a second chat, and wants to know why that works now without their code sample and their email draft showing up in the wrong window.