Field notes for Claude Code users
Claude Code auto-compacting token waste
The intuition behind “auto-compact is wasting my tokens” is real, but the waste is not where most people point. The summary the SDK writes when it compacts is small, it streams cleanly, and on most providers the cached prefix means you re-pay almost nothing for it next turn. What you actually pay for, in dollars or in your own attention, is everything that happens after the boundary fires.
Direct answer (verified 2026-05-14)
Claude Code auto-compact does not waste tokens at the API meter. The summary itself is short, cacheable, and dwarfed by a single tool call. What it wastes is your effective context budget. The summary keeps the gist and drops the specifics, so future turns that needed those specifics end up paying for them twice, once on the way in and once when you retype them after the model has forgotten. The boundary event the SDK fires carries a pre_tokens count that tells you how much context just got flattened. Almost no host surfaces it.
Where the “waste” intuition comes from
You start a long session. You feed it a file tree, a system prompt, a set of decisions about how the codebase is laid out, a few hours of edits, the output of a long-running test run. The model is sharp. Then around the 60th turn you ask a follow-up that references a specific variable from turn 8, and the model confidently invents a different name. You stop. You retype the actual name. You notice the status bar says compacting a few turns back. You feel like you paid for context you do not have anymore. That is the moment people call this token waste.
The instinct is right that something got thrown away. It is the location that is wrong. The original turns are still on disk in ~/.claude/projects. The summary that replaced them is in the model's live window. The dollars you paid for the original turns are gone, not because the summary cost a lot, but because the model no longer benefits from them. Your hourly budget pays the difference in retyping.
What the SDK actually emits at the boundary
The compaction is not silent at the protocol layer, only in most UIs. The Claude Code SDK fires a system event with subtype compact_boundary at the moment the decision is made, then opens a content block of type compaction and streams compaction_delta chunks until the summary is done. In the open-source Fazm ACP bridge you can see the forward path in full:
```js
// acp-bridge/src/patched-acp-entry.mjs (lines 99-107)
if (subtype === "compact_boundary") {
  await acpClient.sessionUpdate({
    sessionId: sid,
    update: {
      sessionUpdate: "compact_boundary",
      trigger: item.value.compact_metadata?.trigger ?? "auto",
      preTokens: item.value.compact_metadata?.pre_tokens ?? 0,
    },
  });
}
```

The trigger is either auto (the SDK decided) or manual (you typed /compact). pre_tokens is the integer that tells you exactly how much live context the summarizer is about to flatten. A boundary firing with pre_tokens near the model's context window means the summarizer is squeezing close to that whole budget into a paragraph, with everything that implies for what survives.
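As a sketch of what a host could do with that payload, here is a minimal severity check. The update shape mirrors the bridge's forwarded event; the function name and the 0.8 threshold are illustrative choices, not anything the SDK defines.

```javascript
// Sketch: decide whether a compact_boundary update deserves a loud warning.
// `compactionSeverity` and the 0.8 cutoff are illustrative, not SDK API.
function compactionSeverity(update, contextWindow) {
  if (update.sessionUpdate !== "compact_boundary") return null;
  const ratio = update.preTokens / contextWindow;
  return {
    trigger: update.trigger,    // "auto" or "manual"
    preTokens: update.preTokens,
    ratio,
    // Near the window limit, the summarizer is flattening almost the whole budget.
    nearLimit: ratio >= 0.8,
  };
}

const severity = compactionSeverity(
  { sessionUpdate: "compact_boundary", trigger: "auto", preTokens: 178432 },
  200000,
);
console.log(severity.nearLimit); // true: 178432 / 200000 ≈ 0.89
```

A host that runs something like this on every session update can warn you before the summary lands, instead of leaving you to notice the status bar after the fact.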
The compact_boundary event lifecycle
“Compact boundary: trigger=auto, preTokens=178432”
acp-bridge/src/index.ts line 4250, every boundary in the live log
Why the standard advice is the same lever in three coats
Look at what the well-meaning playbooks tell you to do when long Claude Code sessions misbehave. Run /clear earlier. Run /compact on purpose. Set --max-turns low. Drop to Sonnet on the cheap turns. All of these reduce context. None of them keep the part of the context the next turn will actually need. They are versions of one move: shrink the input. The SDK has a different primitive that nobody surfaces: session/fork.
| Feature | The standard playbook | Fork at a known-good point |
|---|---|---|
| What happens to older turns | Summarized into a paragraph or dropped entirely | Stay live in their original form on the parent session |
| Does the model see the original bytes | No, only the summary or whatever is left after /clear | Yes, the new branch can carry hand-picked turns forward |
| What you pay for next turn | Summary plus everything you retype because it got dropped | The branch's context plus whatever you sent |
| Original conversation | Intact on disk, but the live model only sees the post-clear or post-compact view | Intact on disk, resumable, untouched by the fork |
| SDK call | /clear or /compact in the prompt | session/fork via unstable_forkSession |
What forking looks like at the bridge layer
The fork primitive lives in @agentclientprotocol/claude-agent-acp as the experimental unstable_forkSession RPC. The bridge wraps it in a normal request that takes the parent sessionId, returns a fresh one rooted at the end of the parent's history, and registers the new session under a new (or in-place) key. From acp-bridge/src/index.ts around line 3862:
```js
// acp-bridge/src/index.ts (lines 3862-3866)
const result = await acpRequest("session/fork", {
  sessionId: sourceEntry.sessionId,
  cwd,
  mcpServers: buildMcpServers(mode, cwd, msg.toSessionKey),
}) as { sessionId: string };
```

The result is an honest new session. The parent's JSONL is unchanged. The new session can take a different model, a different mode, or no changes at all. Crucially the new session starts with the parent's conversation as the SDK actually had it, not as the summarizer rewrote it. This is the move you want when you can feel the boundary coming and you have not yet lost the specifics.
A working pattern that does not depend on Fazm
If you are running raw Claude Code, you can get most of this with the CLI you already have. Watch the status bar for compacting or check the JSONL for compact_boundary entries with jq. When you see one fire near the window limit, stop the session, copy the last few of your own messages and the load-bearing tool results into a fresh prompt, and start a new conversation with that as the preamble. It is the manual version of fork. It is annoying, but it gives the new session the original bytes the model needs, not the paragraph the summarizer wrote.
```sh
# Find every auto-compact boundary in a Claude Code session
SESSION=~/.claude/projects/-Users-you-proj/<session>.jsonl
jq -r 'select(.type=="system" and .subtype=="compact_boundary")
  | "\(.compact_metadata.trigger) \(.compact_metadata.pre_tokens)"' \
  "$SESSION"
```

The values you get back are the same trigger and pre_tokens the SDK emitted live. They are your record of how much context the model lost and when. A host that surfaces this in real time saves you the post-mortem with jq.
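The same filter can live in a host process instead of a post-mortem. A sketch, assuming only the field names the jq query already relies on; the function and the sample line are illustrative:

```javascript
// Sketch: the jq filter as a reusable function you could wire into a
// file watcher. Input is raw JSONL text; the event field names match
// what the transcript contains, the rest is illustrative.
function compactBoundaries(jsonl) {
  return jsonl
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line))
    .filter((e) => e.type === "system" && e.subtype === "compact_boundary")
    .map((e) => ({
      trigger: e.compact_metadata?.trigger ?? "auto",
      preTokens: e.compact_metadata?.pre_tokens ?? 0,
    }));
}

const sample = [
  '{"type":"user","text":"hello"}',
  '{"type":"system","subtype":"compact_boundary","compact_metadata":{"trigger":"auto","pre_tokens":178432}}',
].join("\n");

console.log(compactBoundaries(sample));
// [ { trigger: "auto", preTokens: 178432 } ]
```

Feed it the tail of the session file on a timer or an fs watch and you have the real-time warning the stock UI does not give you.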
What this changes about how you run long sessions
The mental model worth holding is that auto-compact is a soft failure mode that fires when the context outgrows the model's window, not a feature that saves you money. The summary is cheap. The lost specificity is not. Once you stop optimizing the summary itself and start treating the boundary as a fork point, two things change. You stop running one long session and start running a small chain of shorter ones, each forked from the previous at a clean moment. You also stop retyping facts the model already knew, because the next session in the chain starts with those facts in the input rather than in a paragraph about them. The savings show up as fewer retypes, fewer hallucinations, and fewer rerolls on the same prompt, not as a smaller line on the invoice.
If you want a host that surfaces pre_tokens and lets you fork in one click
Fazm wraps Claude Code via ACP in a native macOS UI. Show me your workflow and I will walk you through where the boundary fires today and what changes when you fork at it instead.
Frequently asked questions
Does Claude Code auto-compact actually waste tokens?
Not at the API meter, no. The summarizer is cheap. It writes a short paragraph that replaces the older turns in the live context window, and on most providers the summarized prefix is even cacheable for the next turn. The real waste is downstream of that summary. The summary keeps the gist and drops the specifics, so when you ask the model in turn 90 about the file path you established in turn 12, it does not know. You type the path again. Those bytes you already sent become bytes you have to re-send. That is the waste, and it is invisible at the meter.
When does auto-compact actually fire?
When the running session approaches the model's context window. Internally the SDK fires a system event with subtype `compact_boundary` and a `compact_metadata` object that includes a `trigger` (either `auto` or `manual` from `/compact`) and a `pre_tokens` integer counting the tokens that were live the instant the decision was made. Right after, a content block of type `compaction` starts streaming, and you get a sequence of `compaction_delta` chunks while the model writes the summary. Then the summary replaces the older turns in the live window and the conversation continues. The transcript file on disk still has the originals.
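A simplified sketch of consuming that lifecycle: the field names follow the description above, but the flat event shapes are illustrative, not the SDK's exact wire format.

```javascript
// Sketch: reassemble a compaction from a stream of events. The boundary
// carries the metadata; the summary arrives as compaction_delta chunks.
// Flat event objects are a simplification of the real stream shape.
function consumeCompaction(events) {
  let boundary = null;
  let summary = "";
  for (const ev of events) {
    if (ev.type === "system" && ev.subtype === "compact_boundary") {
      boundary = ev.compact_metadata; // { trigger, pre_tokens }
    } else if (ev.type === "compaction_delta") {
      summary += ev.text;             // the summary streams in pieces
    }
  }
  return { boundary, summary };
}

const { boundary, summary } = consumeCompaction([
  { type: "system", subtype: "compact_boundary",
    compact_metadata: { trigger: "auto", pre_tokens: 178432 } },
  { type: "compaction_delta", text: "The session refactored " },
  { type: "compaction_delta", text: "the auth module." },
]);
console.log(boundary.trigger, summary); // auto The session refactored the auth module.
```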
Why does the existing advice (/clear, /compact, lower --max-turns) not really fix it?
Because all three of those move the same lever in the same direction. `/clear` trades full context for none and you start from zero. `/compact` trades it for a shorter version, which is what `auto` was about to do anyway. Lowering `--max-turns` ends the session before the wall, which means a different kind of loss, the kind where you have to restart the whole task. None of these is wrong; they are just versions of the same shrink-the-context move. The primitive the SDK actually offers and almost no guide surfaces is `session/fork`, which gives you a new branched session starting at the end of a known-good prefix of the current one. The branch is a real session you can run forward, the parent stays intact, and you keep the prior context as the SDK saw it, not as a summary saw it.
Where does the pre_tokens number actually come from in code?
It is in the system-message stream from the @agentclientprotocol/claude-agent-acp SDK. In the open-source Fazm bridge that wraps Claude Code you can see it forwarded verbatim. acp-bridge/src/patched-acp-entry.mjs lines 99 to 105 read `item.value.compact_metadata.pre_tokens`, default it to 0, and forward it as `preTokens` on the `compact_boundary` ACP update. acp-bridge/src/index.ts line 4250 then logs every one with the trigger that fired it. ChatProvider.swift line 3438 picks it up on the Mac side. None of this is custom; the value is straight from the SDK. The host just chooses whether to render it.
If I keep the full transcript on disk, why does the model not just read it?
Because the model does not have it. The transcript file in ~/.claude/projects is for you, your editor, and `claude --resume`. The live conversation the model sees is whatever the SDK has currently packed into the context window. After an auto-compact that is the summary plus the most recent turns. The bytes from turn 12 are on disk; they are not in the model's input. This is the part that catches people. Persistence at the JSONL layer and visibility in the model's context are two different things.
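A toy model of the distinction: the transcript array stands in for the JSONL on disk, and the packing function is illustrative, not the SDK's real logic.

```javascript
// Sketch: after a compaction, the live window is the summary plus the
// tail of the conversation; the transcript keeps everything. keepRecent
// and the function are illustrative, not the SDK's real packing.
function liveWindowAfterCompact(transcript, summary, keepRecent) {
  return [{ role: "system", text: summary }, ...transcript.slice(-keepRecent)];
}

const transcript = Array.from({ length: 90 }, (_, i) => ({
  role: i % 2 ? "assistant" : "user",
  text: `turn ${i + 1}`,
}));

const live = liveWindowAfterCompact(transcript, "Summary of turns 1-80.", 10);
console.log(transcript.length, live.length); // 90 on disk, 11 in the model's input
```

Turn 12 exists in `transcript` but not in `live`; that is the whole gap between persistence at the JSONL layer and visibility in the model's context.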
How is forking different from just running /resume on the same session?
Resume reattaches the original session in the same state it was in. If the original session had already auto-compacted, resume gives you back the post-compact view. Fork goes through the ACP `session/fork` RPC, which is documented in `@agentclientprotocol/claude-agent-acp` as `unstable_forkSession`, and creates a new sessionId rooted at a chosen point in the parent's history. The parent stays resumable on disk. The branch is its own conversation from that moment on. You can put fresh, hand-picked context on the branch, and the old session does not move.
Can I disable auto-compact entirely?
Not from the upstream CLI today. There is no `--no-auto-compact` flag and no environment variable that turns it off. The closest you can do at the CLI is split a task into smaller sessions yourself, which is what `/clear` is for. A host on top of ACP can do better: it can listen for the `compact_boundary` event before the summary lands, surface the `pre_tokens` count so you know it is about to happen, and offer fork as a one-click alternative to letting the summary roll. That is the design Fazm chose. The SDK still controls the boundary; the host gives you somewhere to go before the boundary fires.
Why is the pre_tokens number worth seeing?
Because it tells you when the host is about to drop work you cared about. A boundary that fires with `pre_tokens` near the model's window limit (Sonnet 4 at 200k, Opus 4 at 200k, larger on extended models) means the summarizer is flattening close to that whole budget into a paragraph. If the immediately preceding 80 turns contained file paths, decisions, or specific numbers the next 80 turns will need, that is your warning. The boundary is not destructive; the work is on disk. But the work is also no longer in the model's input. Knowing the boundary is about to fire is the difference between forking deliberately and discovering the loss when the model starts hallucinating the variable name on turn 130.
Does any of this need a different model or subscription than the CLI gives me?
No. The whole point of going through ACP is that the agent loop and the model are unchanged. Your Claude Pro or Max subscription is the same, the tool surface is the same, the prompts are the same. The wrapper changes what the host does with the SDK events. It can surface `pre_tokens`, expose `session/fork` as a button, and refuse to throw away the JSONL when the live window compacts. The model on the other end of the socket is still Claude Code.
Related field notes
Claude Code persistent sessions, what survives and what you have to wrap
Sessions are on disk by default but auto-restore, one-click fork, and avoiding auto-compact mid-task are host problems. Source file paths included.
Agentic AI token economics, the variable everyone misses
Per-turn input dominates the bill. Screen-state representation is the 6 to 10x lever before you start optimizing anything else.
Claude Code context management, where the bytes actually go
The shape of a Claude Code context window across turns, and the places long-running sessions accumulate dead weight.