Field notes for Claude Code users
The real project cost of Claude Code session loss, manual forking, and auto-compacting
Three pain points get talked about as separate gripes. On a project week, they compound into four distinct taxes, only one of which shows up on the Anthropic invoice. The other three live in your hours. This is the decomposition, the SDK signal that quantifies each, and the file paths in one open-source wrapper that absorb them.
Direct answer (verified 2026-05-17)
Four taxes, not three, and only one shows on the bill. The session-restart tax is the morning ritual of re-cd plus --continue across every active project. The fork-friction tax is two terminals on the same --resume ID with no parent-child link. The auto-compact specificity tax is the SDK flattening pre_tokens worth of older turns into a paragraph that drops the file path the model needed at turn 90. The rate-limit-roll tax is the upstream session ID changing silently mid-conversation, leaving the pre-roll half stranded in a different JSONL file. The first three are usually written about; the fourth is the hidden one. All four are properties of how the host above ACP is wired, not of the model.
Authoritative source for the resume mechanics: code.claude.com/docs/en/agent-sdk/sessions. SDK fork RPC: unstable_forkSession in @agentclientprotocol/claude-agent-acp.
The four taxes in one diagram
The CLI bug-tracker treats these as separate items. The economy of your project week treats them as one bill. Each input below is a real SDK signal or CLI gap. The destinations are the work you actually do because of it.
[Diagram: three CLI gaps, four taxes, one weekly bill]
What each tax actually costs, broken out
The table below is the honest decomposition. The unit column tells you what you would measure for that tax. The signal column tells you where the SDK or the filesystem already records it, so you can audit your own week without installing anything new.
The four taxes, with the SDK signal that quantifies each
| Tax | Signal (where to audit it) | Unit (what it costs) |
|---|---|---|
| Session-restart tax | Count terminals you opened today and ran --continue or --resume in. That number, times your average preamble token cost, is the floor. | Re-typed preamble + re-discovered tool state, every morning, multiplied by N active projects |
| Fork-friction tax | No SDK signal in the CLI. Count days you wanted to try two directions and only tried one. The unstable_forkSession RPC exists; the CLI does not surface it. | Branches you do not take, plus a duplicate prefix billed twice when you do split. The cost shows as fewer experiments per week. |
| Auto-compact specificity tax | compact_boundary system event in the JSONL. Sum compact_metadata.pre_tokens for the week. That is the live-context volume you lost. | Specifics flattened into a paragraph. Future turns retype file paths, variable names, and decisions the model just forgot. |
| Rate-limit-roll tax | Count distinct session-id files for one logical conversation in ~/.claude/projects/<encoded-cwd>/. Every extra file past the first is one roll. | Pre-roll half of the conversation stranded in a separate JSONL file. Resume after the roll loads the wrong half. |
The one signal that already measures the compacting tax
The Claude Code SDK fires a compact_boundary system event the moment it decides to summarize older turns into a paragraph. The event carries a compact_metadata object with a trigger (auto, or manual when you run /compact) and a pre_tokens integer that names exactly how much live context is about to be flattened. The bare CLI does not show this to you; the JSONL on disk has every one. The open-source Fazm bridge forwards the event in six lines of code.
Once you can see the number, you can decide. A boundary firing at pre_tokens close to the model’s context window means the summarizer is squeezing close to the whole budget into a paragraph. If the next thing you do depends on specifics from the last 80 turns (a file path, a variable name, a decision), fork at that point. Otherwise let it compact. The choice is the work; the metric is just what tells you a choice exists.
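The decision rule above can be sketched as a small filter. This is an illustrative helper, not part of the CLI or of Fazm: the name flag_large_boundaries, the 0.8 default cutoff, and the context-window value you pass in are all assumptions. It reads "trigger pre_tokens" pairs, one per line, and prints only the boundaries that fired at or above the chosen fraction of the window.

```shell
# Hypothetical helper: flag compact boundaries whose pre_tokens is at or above
# a chosen fraction of the context window.
# stdin: one "trigger pre_tokens" pair per line, e.g. "auto 178432".
# $1: context window in tokens (assumption: you supply the value for your model)
# $2: threshold fraction, default 0.8 (an arbitrary illustrative cutoff)
flag_large_boundaries() {
  awk -v window="$1" -v ratio="${2:-0.8}" '
    window > 0 && $2 + 0 >= window * ratio {
      # Print trigger, pre_tokens, and the share of the window as a whole percent.
      printf "%s %d %d%%\n", $1, $2, ($2 / window) * 100
    }'
}
```

For example, `printf 'auto 178432\nmanual 50000\n' | flag_large_boundaries 200000` prints only the first pair, with its share of the window. The 200000 is a placeholder; use the window of the model you actually run.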
The three host-layer constants that absorb each tax
The numbers below are not benchmarks. They are the actual constants in one shipping open-source wrapper (Fazm, MIT-licensed, github.com/m13v/fazm) that exists because each tax needs a host-side fix the CLI cannot provide.
The first number is sessionChainMaxSize at Desktop/Sources/Providers/ChatProvider.swift line 615. A window keeps an append-only chain of every upstream session ID it has ever owned (up to sixteen), so when an upstream roll fires a fresh session ID, the priorContext lookup still spans both halves of the conversation. That is the rate-limit-roll tax, gone.
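To make the chain idea concrete, here is a minimal shell sketch of a capped session-ID chain. It is not the Swift code in ChatProvider.swift. The function names, the one-ID-per-line file, and the policy of dropping the oldest ID once the cap is reached are all assumptions made for illustration.

```shell
# Hypothetical sketch of a capped session-ID chain stored one ID per line.
# $1: chain file, $2: new session ID, $3: cap (default 16, as in the article)
chain_append() {
  chain_file=$1; new_id=$2; max=${3:-16}
  # Skip IDs already in the chain so a repeat does not consume a slot.
  if ! grep -qxF -- "$new_id" "$chain_file" 2>/dev/null; then
    printf '%s\n' "$new_id" >> "$chain_file"
  fi
  # Assumption: once the cap is exceeded, the oldest IDs are dropped.
  tail -n "$max" "$chain_file" > "$chain_file.tmp" && mv "$chain_file.tmp" "$chain_file"
}

# Map every ID in the chain to its transcript path, oldest first, so a
# lookup can span the pre-roll and post-roll halves of one conversation.
# $1: chain file, $2: project transcript directory
chain_transcripts() {
  chain_file=$1; project_dir=$2
  while IFS= read -r id; do
    printf '%s/%s.jsonl\n' "$project_dir" "$id"
  done < "$chain_file"
}
```

The point of the design is that resume logic reads the whole list rather than a single ID, so a roll adds a line instead of replacing the conversation's identity.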
The second number is the retry budget in Desktop/Sources/FloatingControlBar/DetachedChatWindow.swift function restoreWindows around line 755. On launch the host reads a registry of every detached window from UserDefaults and loads each chat (retrying up to ten times with a half-second sleep before deferring to the next launch). Every window comes back at its saved frame, workspace, and model. That is the session-restart tax, gone.
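The retry budget is an ordinary bounded retry loop. The sketch below shows the shape in shell. It is not the Swift code in restoreWindows; retry_load and its arguments are hypothetical names chosen for the example.

```shell
# Hypothetical sketch of a bounded retry: run a command up to MAX times,
# sleeping DELAY seconds between failed attempts.
# Returns 0 on the first success, 1 if every attempt failed (the caller can
# then defer the work, for example to the next launch).
# $1: max attempts, $2: delay in seconds, rest: the command to run
retry_load() {
  max=$1; delay=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    "$@" && return 0
    # No sleep after the final attempt; there is nothing left to wait for.
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

With the numbers from the article this would be called as `retry_load 10 0.5 some_load_command`, where some_load_command is a placeholder. Fractional sleep values work with GNU and BSD sleep but are not guaranteed by POSIX.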
The third number is one RPC: session/fork, dispatched by handleForkSession at acp-bridge/src/index.ts line 3959. The unstable_forkSession method from @agentclientprotocol/claude-agent-acp (unstable, as its prefix says) returns a fresh session ID rooted at the parent’s history; the host registers the new key, optionally swaps the model, and emits a session_forked event the UI uses to open the new window. The parent stays alive on disk. That is the fork-friction tax, gone, and the auto-compact specificity tax becomes a choice instead of an accident.
“Compact boundary: trigger=auto, preTokens=178432”
acp-bridge/src/index.ts line 4385, every boundary in the live log
A working pattern that does not depend on a wrapper
You do not need to install a host to audit your own week. The JSONL transcripts have every signal. The shell snippet below sums pre_tokens across every auto-compact in a single session, which is the compacting tax expressed in tokens flattened. Run it for a week and you have a real number to argue with.
    # Audit your own compacting tax for one Claude Code session.
    # Quote the path: unquoted, the shell reads <session-id> as redirections.
    # Replace <session-id> with a real transcript name before running.
    SESSION="$HOME/.claude/projects/-Users-you-proj/<session-id>.jsonl"
    jq -r 'select(.type=="system" and .subtype=="compact_boundary")
      | "\(.compact_metadata.trigger) \(.compact_metadata.pre_tokens)"' \
      "$SESSION" \
      | awk '{ sum += $2; n++ } END { print n + 0, "boundaries,", sum + 0, "pre_tokens total" }'

The same directory tells you the rate-limit-roll tax. List the JSONL files for a single logical conversation; every file past the first is one roll where the upstream session ID changed mid-chat. The CLI’s --resume picker treats them as separate conversations, which is why the pre-roll half feels stranded.
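The file count can be scripted too. The helper below is a rough sketch with a hypothetical name, count_rolls, and one loud assumption: every transcript in the directory belongs to the same logical conversation. If the directory mixes conversations, the number it prints is only an upper bound.

```shell
# Hypothetical helper: count *.jsonl transcripts in one project directory and
# report how many are "extra" (files past the first).
# Assumption: all transcripts in the directory are one logical conversation.
# $1: project transcript directory
count_rolls() {
  dir=$1
  n=$(find "$dir" -maxdepth 1 -type f -name '*.jsonl' | wc -l | tr -d ' ')
  if [ "$n" -gt 0 ]; then
    rolls=$((n - 1))
  else
    rolls=0
  fi
  echo "$n files, $rolls rolls"
}
```

Call it with the encoded project directory, for example `count_rolls ~/.claude/projects/-Users-you-proj`.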
When the bare CLI is the right answer
Two cases. First, you live in one project at a time and your sessions are short enough that auto-compact rarely fires. The transcript layer is doing the right thing under you, and the four taxes round to zero. Second, you already script around these gaps with tmux, shell aliases, and a habit of writing every load-bearing decision into a file before quitting. That is the manual version of the host-layer fixes, and it works.
Where the calculus flips is the week where you ran four projects in parallel, hit a rate limit twice, watched the status bar say compacting once, and wanted to fork a conversation to try two directions and only tried one. That is the week the four taxes compound. A host on top of ACP keeps the agent loop and the model identical and absorbs the four taxes structurally. The model is unchanged. What changes is what the host does between turns.
Frequently asked questions
What is the real cost of a Claude Code session getting lost on restart?
Four costs, not one. (1) The cwd dance: you re-cd into N projects and re-run `claude --continue` N times every morning. (2) The mental reload: you re-read the last few of your own messages to remember where you were. (3) Tool re-discovery: the agent may re-list a directory or re-read a file you already had it read yesterday, which is paid input tokens you already paid for. (4) The lost decisions: any verbal-only `we decided to use X` that was not written into a file is gone unless you scroll the JSONL by hand. The transcript is on disk, the workspace is not.
Why does forking a Claude Code session cost more than people think?
Because the CLI does not have one-click fork. The cheapest workaround is two terminals running `claude --resume <id>`, which gives you two divergent live sessions backed by the same JSONL up to the fork point, with no parent-child link recorded anywhere. You lose the ability to compare what each branch produced, you lose the ability to cherry-pick later, and you usually end up just running one branch and shutting the other. The branch you ran cost you the full token bill for both copies of the prefix because you sent it twice. The SDK actually exposes `session/fork` as an unstable RPC; the CLI does not surface it.
Does auto-compacting show up on the Anthropic invoice?
Barely. The summary the SDK writes is a few hundred tokens, dwarfed by a single tool call. The expensive part is downstream: you retype facts the model forgot, the model re-reads files it already had read because the file path got summarized out, and you sometimes re-run an entire test because the agent hallucinated the variable name. That work is paid input tokens on later turns, but it is not labelled `compact` on the invoice. The SDK does fire a `compact_boundary` event with a `pre_tokens` integer telling you exactly how much context was flattened. That number is the only honest cost meter for this tax.
What is the rate-limit roll tax?
When the upstream session ID rolls forward mid-conversation (a rate limit, exhausted credits, an ACP bridge restart), the bridge accepts a fresh session ID and replays your preamble. From the terminal it looks like a hiccup. The transcript directory on disk now has two JSONL files, one per session ID, with no link between them. If you run `claude --resume <new-id>` later, you get the post-roll half. The pre-roll half is on disk but the resume command will not load it. Most users do not know there are two files, so they spend the next thirty minutes wondering why the agent forgot what it knew before lunch.
Can I quantify all four taxes for my own project week?
Yes. Take the JSONL files for the week, find every `system` line with `subtype=compact_boundary` and sum `compact_metadata.pre_tokens` (that is your compacting tax in tokens flattened). Count how many unique session IDs your files cover for the same logical conversation (every extra ID after the first is one rate-limit roll). Multiply your morning-restart count by the average preamble token count for restart cost. Forking is harder to quantify because the cost is in branches you did not take; the proxy is "how many times did I run two terminals on the same `--resume` ID" for that week. The numbers will not show on the Anthropic invoice. They will show in your hours.
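For the weekly sum, a sketch that covers every transcript in one project directory is below. It deliberately avoids jq and uses grep instead, so it runs on a machine with no extra tools. That makes it cruder than a real JSON parser, and it assumes the field appears in the transcript as `"pre_tokens": <integer>` on the same line as `"compact_boundary"`. The function name and the seven-day default are assumptions.

```shell
# Hypothetical helper: sum pre_tokens over every compact_boundary line in the
# *.jsonl files of one directory that were modified in the last N days.
# Pattern-based, not a JSON parser: assumes "pre_tokens": <int> sits on the
# same line as "compact_boundary".
# $1: project transcript directory, $2: days to look back (default 7)
sum_pre_tokens() {
  dir=$1; days=${2:-7}
  find "$dir" -maxdepth 1 -type f -name '*.jsonl' -mtime -"$days" \
      -exec grep -h '"compact_boundary"' {} + \
    | grep -oE '"pre_tokens"[[:space:]]*:[[:space:]]*[0-9]+' \
    | grep -oE '[0-9]+$' \
    | awk '{ sum += $1; n++ } END { print n + 0, "boundaries,", sum + 0, "pre_tokens total" }'
}
```

If jq is available, prefer filtering on `.type` and `.subtype` as in the per-session snippet; the grep version trades precision for zero dependencies.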
How does Fazm fix each of these in particular?
Window auto-restore on launch fixes the session-restart tax (every window comes back at its saved frame, workspace, and model). The session-ID chain capped at 16 entries per window fixes the rate-limit-roll tax (priorContext lookup spans every session ID this window has ever owned, so the pre-roll half is never stranded). A button that dispatches the unstable `session/fork` RPC fixes the fork-friction tax. The host listens for the SDK's `compact_boundary` event and surfaces `pre_tokens` so you can fork at a known-good point before the summary lands, which is the actual fix for the auto-compact specificity tax. The agent loop and the model are unchanged; it is the host above ACP that absorbs the four costs.
If I keep using the bare CLI, what is the cheapest thing I can change today?
Three small moves. First, alias a one-liner that opens N tmux panes, each cd'd into one of your active projects with `claude --continue` already running, and put it in your shell init so a restart is one command. Second, scan your JSONL once a day for `compact_boundary` lines with `jq -r 'select(.subtype=="compact_boundary") | "\(.compact_metadata.trigger) \(.compact_metadata.pre_tokens)"'` and learn what your own pre_tokens distribution looks like; that tells you when you should be splitting sessions earlier. Third, when you hit a rate limit, deliberately note the next session ID into a scratchpad so the resume picker is not your only memory. None of this requires a host. All of it requires you to do the housekeeping the CLI does not.
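The first move might look like the sketch below. The helper name claude_panes_plan is hypothetical. It prints the tmux commands instead of running them so you can inspect the plan first, and it assumes project paths contain no single quotes. The tmux subcommands and the `-c` start-directory flag are standard tmux.

```shell
# Hypothetical helper: print the tmux commands that open one pane per project
# directory, each running `claude --continue`. Pipe the output to `sh` to run.
# $1: tmux session name; remaining args: project directories
# Assumption: paths contain no single quotes.
claude_panes_plan() {
  # Need a session name plus at least one directory.
  [ "$#" -ge 2 ] || return 1
  session=$1; shift
  first=1
  for dir in "$@"; do
    if [ "$first" -eq 1 ]; then
      printf "tmux new-session -d -s '%s' -c '%s' 'claude --continue'\n" "$session" "$dir"
      first=0
    else
      printf "tmux split-window -t '%s' -c '%s' 'claude --continue'\n" "$session" "$dir"
      # Re-tile after each split so later splits do not run out of space.
      printf "tmux select-layout -t '%s' tiled\n" "$session"
    fi
  done
  printf "tmux attach-session -t '%s'\n" "$session"
}
```

Put the function in your shell init and run something like `claude_panes_plan work ~/proj-a ~/proj-b | sh`; the session name and paths here are placeholders.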
Why not just use a smaller model or shorter sessions to dodge all of this?
Because the cost we are talking about is not API dollars. It is the work you do after the CLI drops state. Shorter sessions hit the same restart tax more often. A cheaper model on the cheap turns does not change auto-compact, the rate-limit roll, or fork ergonomics. The taxes are properties of how the host on top of the SDK is wired, not properties of how big the model is. The fix is at the host layer.