Claude Sonnet 4.6, in production, twice per user

Anthropic’s latest Claude model, running twice in the background of your Mac

Every article about the April 2026 Claude release benchmarks the model. This one is about what the model does when you run two copies of it per user, one in the foreground and one silently writing memory files. Three parallel sessions, one batch size of 10, one stripped-down tool surface, no UI thread.

Matthew Diakonov
11 min read
Shipping on macOS since early 2026

  • Sonnet 4.6 on both sessions
  • 10 turn pairs per observer batch
  • Observer tool surface: fazm_tools only
  • Skill drafts on 3+ repeats

The release, briefly

Anthropic’s current lineup as of April 22, 2026 is Claude Sonnet 4.6 as the everyday default, Claude Opus 4.7 as the new GA ceiling (same pricing as Opus 4.6 at $5 input and $25 output per million tokens), and Claude Mythos (codename Capybara) behind Project Glasswing as a gated preview. There is also Claude Design inside Claude Apps, but that is a product on top of the existing models, not a new model.

Every other guide on this topic stops there, or compares 4.7 against 4.6 on benchmarks. The more interesting question for anyone building with Claude this month is how you actually use the latest model when a single user turn is not expensive enough to justify thinking about carefully. The answer inside Fazm is: use it twice.

The anchor fact

Fazm warms 3 parallel Claude sessions on boot

Not one. Three. main, floating, and observer. All three are Claude Sonnet 4.6. The first two answer the user. The third watches the first two and writes down what matters, in batches of exactly 10 turn pairs. That is the whole architecture. Everything else in this guide is just details.

Three sessions, one warmup call

The source lives at Desktop/Sources/Providers/ChatProvider.swift, around lines 1047 to 1051. It is a single call into the ACP bridge with an array of three session configs. The bridge spins up three Claude Sonnet 4.6 sessions in parallel, each with a different system prompt and a different MCP tool list.

ChatProvider.swift

The main and floating sessions resume if they have a saved ACP session ID, so past conversations survive app restarts. The observer does not resume. Every boot it starts fresh, because its job is to read persistent memory on disk and write to it, not to carry a conversation across launches.
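The warmup behavior described above can be sketched on the bridge side. This is a reconstruction, not the actual Fazm source: the session keys, the default model, and the resume-vs-fresh split are from the article, but the config shape and the function names `buildWarmupConfigs` and `promptFor` are assumptions.

```typescript
// Hypothetical sketch of the bridge-side warmup. Session keys, the
// default model, and the observer's no-resume rule come from the
// article; everything else here is an illustrative assumption.
type SessionKey = 'main' | 'floating' | 'observer';

interface SessionConfig {
  key: SessionKey;
  model: string;            // all three default to claude-sonnet-4-6
  systemPrompt: string;     // each session gets its own
  resumeSessionId?: string; // main/floating resume; observer never does
}

function buildWarmupConfigs(
  saved: Partial<Record<SessionKey, string>>,
): SessionConfig[] {
  return (['main', 'floating', 'observer'] as SessionKey[]).map((key) => ({
    key,
    model: 'claude-sonnet-4-6',
    systemPrompt: promptFor(key),
    // The observer starts fresh every boot; the others resume if they
    // have a saved ACP session ID.
    resumeSessionId: key === 'observer' ? undefined : saved[key],
  }));
}

function promptFor(key: SessionKey): string {
  return `system prompt for ${key}`; // placeholder
}
```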

What actually flows into the observer

The observer is not a passive listener on the ACP socket. Every time the main or floating session finishes a prompt, two explicit pushes happen. One for the user turn, one for the assistant turn. That is how the buffer fills up.

Turn fan-in to the observer

Main chat → Observer buffer
Floating bar → Observer buffer
Observer buffer → MEMORY.md writes, Observer cards, Skill drafts

The batch is exactly 10 turn pairs

Here is the entire batching logic from acp-bridge/src/index.ts, around line 1153. A single buffer, a single constant, a single guard flag. 33 lines total.

acp-bridge/src/index.ts

The chatObserverRunning flag is the whole concurrency story. The observer is strictly single-threaded. If a new batch would fire while the previous one is still in Claude, the buffer keeps accumulating and a setTimeout at the end of the run triggers a re-flush after 1 second. No queue, no retries, no dead-letter box. If Anthropic returns an error, the batch is lost and the next 10 pairs go through.

10 pairs — Observer batch size
3 — Parallel Claude sessions
1 — MCP server on the observer
3+ — Workflow repeats before skill draft

The observer’s entire tool surface, in one early return

The main chat can drive a browser, automate the Mac via accessibility APIs, and send WhatsApp messages. The observer cannot do any of that. Its tool budget is exactly one MCP server, fazm_tools, which is a stdio subprocess that reaches back into the Swift app over a Unix socket. The cutoff happens at line 1023 of the bridge.

acp-bridge/src/index.ts

The consequence is that a second Claude Sonnet 4.6 session cannot click on anything, cannot take a screenshot, cannot open a browser tab, and cannot read another app’s accessibility tree. It can read and write files in the user’s memory directory, query the local SQLite database with execute_sql, and call save_observer_card. That is deliberate. The observer is allowed to think, not to act.
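The early-return branch can be sketched as follows. The function name `buildMcpServers`, the `sessionKey` branch, and the server names come from the article; the config shape and the launch commands are illustrative assumptions.

```typescript
// Sketch of the per-session tool-surface branch. The observer's early
// return is from the article; commands and config shape are assumed.
interface McpServerConfig {
  name: string;
  command: string;
  args: string[];
}

function buildMcpServers(sessionKey: string): McpServerConfig[] {
  const servers: McpServerConfig[] = [
    // fazm_tools: a stdio subprocess that reaches back into the Swift
    // app over a Unix socket.
    { name: 'fazm_tools', command: 'fazm-tools', args: [] },
  ];
  if (sessionKey === 'observer') {
    // The observer's entire tool budget: fazm_tools only.
    return servers;
  }
  // Main and floating sessions also get the action-capable servers.
  servers.push(
    { name: 'playwright', command: 'npx', args: ['@playwright/mcp'] },
    { name: 'macos-use', command: 'macos-use', args: [] },
    { name: 'whatsapp', command: 'whatsapp-mcp', args: [] },
  );
  return servers;
}
```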

The full cycle, step by step

From warmup to the user seeing a card, here is what happens when you use Fazm for 20 minutes.

A 20-minute chat session, from the observer's point of view

1

Fazm boots and calls warmupSession with three session configs

ChatProvider.swift (lines 1047-1051) hands the ACP bridge a single warmup call containing the main chat session, the floating bar session, and the observer session. All three are Claude Sonnet 4.6 by default. All three get different system prompts.

2

The bridge builds three MCP tool surfaces, not one

buildMcpServers in acp-bridge/src/index.ts branches on sessionKey. The observer gets fazm_tools only and returns early at line 1023. The main and floating sessions also get Playwright, macos-use, and WhatsApp.

3

Every user turn is echoed to the observer buffer

After each main or floating prompt completes, lines 1748 to 1752 call bufferChatObserverTurn('user', ...) and bufferChatObserverTurn('assistant', ...) with the verbatim text. The observer is not a passive listener; the turns are explicitly pushed onto its queue.

4

The batch flushes at exactly 10 turn pairs

CHAT_OBSERVER_BATCH_SIZE is a constant at line 1156. When Math.floor(buffer.length / 2) hits 10, flushChatObserverBatch constructs a single prompt with the batch text and fires it at the observer session. Only one observer run is allowed in flight at a time; the chatObserverRunning flag prevents overlap.

5

The observer reads MEMORY.md, writes new topic files, and surfaces cards

Per the ChatPrompts.swift system prompt (lines 566-607), the observer reads the memory index first, decides what is genuinely new, writes individual topic files, updates MEMORY.md, and calls save_observer_card for every observation the user should see. On a 3+ repeat, it drafts a skill at ~/.claude/skills/{name}/SKILL.md.

6

New cards poll back into the Swift UI

At line 1222 the bridge sends an observer_poll event. The Swift side queries observer_activity for pending cards and renders them in the floating bar with a pulsing dot (AIResponseView.swift line 833). The user can act, ignore, or undo each one.

A single flush, in call order

Zooming into one flush after the 10th turn pair, the sequence between the Swift app, the ACP bridge, and the two Claude sessions looks like this.

observer flush, t=0 to t≈15s

Participants: Swift UI, ACP bridge, Claude (main), Claude (observer)

Swift UI → ACP bridge: user sends turn 20
ACP bridge → Claude (main): session/prompt (main)
Claude (main) → ACP bridge: agent_message_chunk
ACP bridge → Swift UI: render assistant reply
ACP bridge: bufferChatObserverTurn x2
ACP bridge → Claude (observer): session/prompt (observer, batch of 20)
Claude (observer): tool_call: Read MEMORY.md
Claude (observer): tool_call: Write memory/topic.md
Claude (observer): tool_call: save_observer_card
ACP bridge → Swift UI: observer_poll
Swift UI: render card in floating bar

At a glance

Two sessions, one model

Both the user-facing chat and the background observer run Claude Sonnet 4.6. The only thing that changes is the system prompt and the tool list. Same API. Same token economy. Different jobs.

10 turn pairs per flush

CHAT_OBSERVER_BATCH_SIZE = 10. Each batch is 20 messages. The observer runs roughly once every 10 to 15 minutes of active chatting, not once per turn.

Observer tool budget

fazm_tools only. No browser. No macos-use. No WhatsApp. The observer cannot click, type, or take a screenshot. It can read, write, query SQL, and save cards.

One-at-a-time gate

chatObserverRunning prevents overlapping observer runs. New turns buffer silently until the current run finishes, then the next batch fires.

Skill drafting on 3+ repeat

The system prompt instructs the observer to draft a SKILL.md file when it sees the same workflow three or more times. One memory plus one card per observation.

Why not one session with a longer system prompt?

This is the most common design question. If Sonnet 4.6 is smart enough, why not tell the main session to also save memories and draft skills on the side? Three reasons the split wins in practice.

Context bleed. Asking the same session to both answer the user and reflect on memory management makes every reply 20 to 40 percent longer than it needs to be. The user sees the reflection. The reflection muddles the answer. The model tries to do both jobs inside a single reply, and both suffer.

Different tool trust. The main session needs a browser, accessibility APIs, and the ability to run Bash. Handing the observer those same tools means a bad memory prompt could trigger a real browser action. Stripping the observer to fazm_tools only is a safety boundary, not just a cleanup.

Different rhythm. The main session fires on every turn. The observer fires once per 10 turn pairs. Splitting them lets each run at its natural rate without a scheduler inside a single prompt.

20 messages

The observer never wakes up before the 10th turn pair has closed.

acp-bridge/src/index.ts line 1156, CHAT_OBSERVER_BATCH_SIZE = 10

Watching a batch fire in the log file

Fazm writes to /tmp/fazm-dev.log on dev builds. If you tail it while using the app, every observer flush shows up in roughly the same shape. Nothing here is invented. The log lines come straight from logErr calls inside flushChatObserverBatch and its per-session notification handler.

/tmp/fazm-dev.log (trimmed)

What you can take from this if you build with Claude

You do not have to ship a desktop app to use the pattern. If you are building anything conversational on top of Claude Sonnet 4.6 this month, here is the transferable part.

  • A second session with a smaller tool list is cheaper than a bigger system prompt on the main session.
  • Batching every N turn pairs (start at 10) keeps the background pass from dominating your token bill.
  • A single in-flight guard flag is enough concurrency control if the work is idempotent. You do not need a queue.
  • If the background session is meant to remember, give it a file-system memory tool and let the main session stay clean.
  • If the background session is meant to detect repetition, tell it to look for 3+ occurrences before acting. The 3-occurrence threshold is oddly robust.
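The bullet points above boil down to a pattern you can lift wholesale. Here is a minimal, framework-agnostic version: buffer items, flush every N pairs, guard with a single in-flight flag. All names are illustrative, not from Fazm.

```typescript
// Generic background-pass helper: batch every N turn pairs, one run
// in flight at a time. Illustrative sketch of the transferable pattern.
function makeBackgroundPass<T>(
  batchPairs: number,
  run: (batch: T[]) => Promise<void>,
) {
  const buf: T[] = [];
  let running = false; // the entire concurrency story

  return async function push(item: T): Promise<void> {
    buf.push(item);
    // Single in-flight guard: enough if the background work is idempotent.
    if (running || Math.floor(buf.length / 2) < batchPairs) return;
    running = true;
    try {
      await run(buf.splice(0, batchPairs * 2));
    } finally {
      running = false;
    }
  };
}
```

Usage: wrap your memory/reflection call in `run`, call `push` once per user turn and once per assistant turn, and start with `batchPairs = 10`.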

The short version of all of this

Anthropic’s latest Claude model is Sonnet 4.6 for everyday use and Opus 4.7 as the new GA ceiling. Fazm runs two Sonnet 4.6 sessions per user. One answers the question. The other batches every 10 turn pairs and writes down what matters. Same model, different jobs, different tool budgets. That is it.

Watch the timeline play out

Turn 1

You ask Fazm to 'pull up the CRM notes on Acme'. Main session fires, reply streams. Observer buffer: 2 messages.

Want to see the two-session architecture running live?

Book 20 minutes and we will screenshare a Fazm session with the observer log tailing next to it, so you can watch the batches fire in real time.

Book a call

Questions

What is Anthropic's latest Claude model in April 2026?

For general availability, the two current releases are Claude Sonnet 4.6 (the everyday default, fast and cheap enough to run continuously) and Claude Opus 4.7 (the stronger, pricier ceiling, GA since April 22, 2026 at $5 input / $25 output per million tokens). A third release, Claude Mythos (codename Capybara), was unveiled behind Project Glasswing and is not on any public plan. Inside Fazm, the main chat, the floating bar, and the background observer all run against claude-sonnet-4-6 by default, with Opus 4.7 available per-turn in the Smart slot of the model picker.

Why does Fazm warm up three Claude sessions at startup instead of one?

Because a consumer app cannot afford to ask the model to do two jobs at once. Fazm calls acpBridge.warmupSession in Desktop/Sources/Providers/ChatProvider.swift at lines 1047 to 1051 with three session configs: main, floating, and observer. Each one is its own Claude session with its own system prompt. The main session handles long-form chat in the docked window. The floating session drives the always-on-top command bar. The observer session runs silently, watches conversation turns flow through the other two, and persists what matters to a user-local memory directory without ever touching the UI.

What exactly is the Chat Observer?

The Chat Observer is a second Claude Sonnet 4.6 session that Fazm spawns in parallel with the user-facing chat. Every user turn and assistant turn flowing through the main or floating session is also pushed onto an observer buffer via bufferChatObserverTurn (acp-bridge/src/index.ts line 1158). When the buffer reaches CHAT_OBSERVER_BATCH_SIZE, which is exactly 10 turn pairs (20 messages), flushChatObserverBatch fires a single session/prompt request against the observer session. The observer's job is to read MEMORY.md, decide what is genuinely new, write topic files using Claude's built-in memory tools, and call save_observer_card to surface one-line notes to the user.

Why 10 turn pairs and not real-time?

Three reasons. First, token cost: running Sonnet 4.6 on every single turn would double your per-message bill for a background job that does not need that much resolution. Second, signal: a single turn is rarely meaningful, but 10 pairs is enough context for the model to tell 'the user keeps asking about CRM workflows' from 'the user was debugging a typo'. Third, concurrency: the observer runs through a single at-a-time gate (chatObserverRunning flag in index.ts line 1168), so batching prevents overlap when the user is typing fast.

How is the observer session different from the main session on the MCP server side?

The observer is deliberately stripped down. In acp-bridge/src/index.ts line 1023, the buildMcpServers function returns early for sessionKey === 'observer' after attaching only fazm_tools. No Playwright. No macos-use. No WhatsApp. No browser overlay. The observer literally cannot click, type, browse, or take a screenshot. Its tool surface is reduced to file system, SQL against the local app DB, save_observer_card, and the browser-profile query/edit pair. That is the entire power budget for what a second Claude session is allowed to do in the background.

How does the observer detect a repeated workflow?

The system prompt in Desktop/Sources/Chat/ChatPrompts.swift line 566 instructs the observer: when you detect a repeated workflow three or more times, draft a skill. The observer reads chat_messages from the local SQLite database, checks ~/.claude/skills for what already exists, and if a genuine repeat emerges, writes a new SKILL.md file and calls save_observer_card with type 'skill_created'. The user sees a card in the floating bar saying 'Created skill: {name}' and can dismiss it to undo.

Does the observer see sensitive information?

Yes, so Fazm keeps it entirely local. The observer's memory directory is on the user's disk, not a Fazm-hosted database. The observer's allowed tools do not include any network calls back to Fazm servers beyond the normal anonymized analytics. The fazm_tools MCP server runs as a stdio subprocess connecting to the Swift app via a Unix socket (FAZM_BRIDGE_PIPE), so the observer's reads and writes never leave the machine. The one thing it does call Anthropic for is the model inference itself, which is the same call path as any other Claude API turn.

Does any other AI desktop app run a second model session for memory?

Not that we have found on Mac. Most consumer AI apps use one LLM call per user message, period. Some frameworks expose background agents, but those are developer tools, not consumer products. The closest architectural cousin is Claude Code itself, which ships its own memory files under a single session. Fazm's twist is running the memory pass as a separate Sonnet 4.6 session so the main chat's context window stays clean and the observer can be given a different system prompt and a smaller tool surface.

What happens if Anthropic ships a newer model tomorrow?

The observer picks it up the same way the main chat does. The ACP bridge reports availableModels on every session/new, Fazm's ShortcutSettings.updateModels substring-matches the ID against haiku / sonnet / opus, and the observer's warmup config is swapped to whatever the family map currently resolves for Sonnet. There is no build, no app update, and no user action. If Anthropic renames the family entirely, the observer falls back to the raw model name from the API, sorted to the end of the picker, until a new build ships.
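The substring matching described above is simple enough to sketch. The haiku / sonnet / opus family slots and the raw-name fallback are from the article; the function names `familyOf` and `resolveObserverModel` are illustrative, not Fazm's actual identifiers.

```typescript
// Hedged sketch of model-family resolution: substring-match reported
// model IDs against the three Claude families, fall back to the raw
// name for anything unrecognized.
type Family = 'haiku' | 'sonnet' | 'opus' | 'other';

function familyOf(modelId: string): Family {
  const id = modelId.toLowerCase();
  if (id.includes('haiku')) return 'haiku';
  if (id.includes('sonnet')) return 'sonnet';
  if (id.includes('opus')) return 'opus';
  return 'other'; // unknown families keep the raw model name
}

function resolveObserverModel(availableModels: string[]): string | undefined {
  // The observer's warmup config takes whatever currently resolves
  // for the Sonnet slot.
  return availableModels.find((m) => familyOf(m) === 'sonnet');
}
```

This is why a newer Sonnet would be picked up without a build: a new ID like a hypothetical `claude-sonnet-5` still contains the substring, so it resolves to the same slot.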

Can I turn the observer off?

Yes. The observer session is created during ACP warmup. Users who do not want a parallel memory pass can disable the feature in settings, which clears the observer session and stops bufferChatObserverTurn from being called after each main/floating turn (acp-bridge/src/index.ts lines 1748 to 1752). The main chat keeps working identically. The only thing you lose is the automatic MEMORY.md and the auto-drafted skills.