Multi-agent Claude Code orchestration is mostly bookkeeping, and the bookkeeping is the tradeoff
Every guide on running multiple Claude Code agents in parallel stops at token spend. That is the easy half. The harder half is the per-session state you have to track once there is more than one agent in the room: a roughly 4 second session/new latency you have to amortize, separate queues and interrupts and generation counters per session, and a quiet pile of MCP server processes that shows up nowhere on a dashboard. These are notes from a desktop app that runs concurrent Claude Code sessions in production.
DIRECT ANSWER (VERIFIED 2026-05-02)
Should you orchestrate multiple Claude Code agents in parallel?
Run multiple agents only when three things hold at the same time: (1) the work splits into independent pieces that no two agents will touch in the same file, (2) you can amortize the roughly 4 second session/new startup cost (pre-warm sessions, or run tasks long enough that startup is a rounding error), and (3) the value of parallel exploration beats a 2x to 5x token multiplier. For sequential edits, dependent tasks, or one-shot questions, a single session wins. This matches Anthropic's own framing in the official agent-teams documentation, which calls multi-agent "not for most tasks."
Source: code.claude.com/docs/en/agent-teams, re-checked 2026-05-02.
THE THESIS
The expensive part of multi-agent is not tokens, it is state
Public writing on multi-agent Claude Code orchestration converges on three points: it costs more tokens, it can race on file writes, and it works best for parallel exploration. All true. All of it skips the machinery you actually have to build, or buy, or absorb, the moment there is more than one agent in your loop.
The machinery is per-session state. Every session has its own context window, its own MCP servers, its own pending-message queue, its own interrupt flag, its own generation counter for timeouts. A single-agent setup is a single object. A multi-agent setup is a dictionary of objects keyed by session, plus the routing code to make sure inbound messages, interrupts, and timeouts land on the right key. Get the routing wrong and you get cross-session message bleed, the wrong agent getting cancelled, or runaway turns that nobody is watching.
The numbers below are not hypothetical. They come from Desktop/Sources/Chat/ACPBridge.swift inside Fazm, a 1,984-line file with 177 references to the word "session", and from acp-bridge/src/index.ts, the Node-side counterpart with another 498. That ratio is the tradeoff in a single line: most of the code is bookkeeping, not work.
TRADEOFF 1
The ~4 second session/new latency, and what amortizing it costs you
Every Claude Code agent you orchestrate begins with a session/new round-trip. That call loads the system prompt, registers the MCP servers configured for the working directory, and hands back a session ID. On a warm machine, in our measurements, this is roughly 4 seconds per session. Cold, after a process launch, it is longer. If you spawn three agents on demand, that is 12 seconds of dead time before any of them produces a token, unless you parallelize the calls, which briefly triples the load on the bridge.
The fix is pre-warming. The literal comment in our bridge: "saves ~4s on the first query by doing session/new ahead of time. Pass multiple models to pre-warm sessions for both Opus and Sonnet in parallel." The pattern is to call warmupSession on app launch with one config per model you expect to route through. The first real user query lands on a session that already exists, and streaming starts in milliseconds.
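A minimal sketch of the pattern, assuming a hypothetical transport modeled on the prose above; the Session and provider shapes are illustrative stand-ins, not the real ACPBridge interface:

```ts
// Illustrative stand-in for the bridge transport; not the real ACPBridge API.
interface Session { prompt(text: string): Promise<string>; interrupt(): void }
declare const provider: {
  request(method: "session/new", params: object): Promise<Session>;
};

const MODELS = ["opus", "sonnet"] as const;
const warmSessions = new Map<string, Session>();

// Called once on app launch: pay the ~4s per model now, in parallel, not on the first query.
async function warmupSessions(cwd: string, mcpServers: object): Promise<void> {
  await Promise.all(
    MODELS.map(async (model) => {
      warmSessions.set(model, await provider.request("session/new", { cwd, mcpServers, model }));
    }),
  );
}

// The first real query reuses the warmed session instead of paying session/new.
// Assumption: a warmed session is handed out once, then re-warmed in the background.
function takeWarmSession(model: string): Session | undefined {
  const session = warmSessions.get(model);
  warmSessions.delete(model);
  return session;
}
```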
The cost is honest. Each pre-warmed session pins a process slot, an allocated context window, and any MCP servers it registered, whether a user ever sends a message to it or not. For a desktop app with two warmed sessions, that is acceptable. For a server-side orchestrator spinning up ten agents per request, it is not, and you are back to paying the 4 seconds per call on the cold path.
TRADEOFF 2
Per-session queues, interrupts, and generations: the four dictionaries
Single-agent code is short. Multi-agent code is mostly routing. The minimum bookkeeping for safe parallel sessions is four dictionaries keyed by sessionKey, each of which exists for a reason that will only become obvious the first time you try to omit it.
- sessionContinuations holds the awaiter for the next inbound message on a given session, so two sessions waiting at once do not collide.
- sessionPendingMessages is the queue for messages that arrived before any awaiter was ready. Without it you drop the first chunk of a streamed response.
- sessionMessageGenerations is a per-session monotonic counter that timeouts use to confirm they are cancelling the right turn.
- sessionInterrupted is a per-session boolean so cancelling agent A does not cancel agent B.
The shape of the diff
```ts
// One agent, one session: the whole bookkeeping fits in a few lines.
const session = await provider.request("session/new", { cwd, mcpServers });
const result = await session.prompt(userMessage); // wait out the single turn
console.log(result); // done
// Cancel? One flag is enough.
session.interrupt();
```

That is the single-agent baseline. The multi-agent side, sketched below, is the floor, not a complete implementation. Real bridges add reconnection on session loss, phantom-session detection (sessions the SDK never wrote a turn for and that session/resume will reject), per-session token accounting, and a lifecycle hook for when a session is invalidated by a working-directory change. None of that work is glamorous and all of it is load-bearing.
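A minimal sketch of that floor. The four dictionary names are the real ones from the bridge; the types, the message shape, and the function names around them are illustrative:

```ts
// Four maps keyed by sessionKey. The map names are from the bridge; the rest is a sketch.
type SessionKey = string;
interface InboundMessage { sessionKey: SessionKey; payload: unknown }

const sessionContinuations = new Map<SessionKey, (m: InboundMessage) => void>();
const sessionPendingMessages = new Map<SessionKey, InboundMessage[]>();
const sessionMessageGenerations = new Map<SessionKey, number>();
const sessionInterrupted = new Map<SessionKey, boolean>();

// Inbound routing: wake the one awaiter for this session, or queue the message.
function deliver(msg: InboundMessage): void {
  const resume = sessionContinuations.get(msg.sessionKey);
  if (resume) {
    sessionContinuations.delete(msg.sessionKey);
    resume(msg); // exactly one session wakes: no cross-session bleed
  } else {
    const queue = sessionPendingMessages.get(msg.sessionKey) ?? [];
    queue.push(msg); // arrived before any awaiter was ready: queue it, don't drop it
    sessionPendingMessages.set(msg.sessionKey, queue);
  }
}

// Await the next message for one session, draining its queue first.
function nextMessage(key: SessionKey): Promise<InboundMessage> {
  const queued = sessionPendingMessages.get(key);
  if (queued?.length) return Promise.resolve(queued.shift()!);
  return new Promise((resolve) => sessionContinuations.set(key, resolve));
}

// Per-session cancel: flipping A's flag leaves B streaming.
function interrupt(key: SessionKey): void {
  sessionInterrupted.set(key, true);
}

// A timeout remembers the generation it was armed for, so a stale timer
// can never cancel a newer turn on the same session.
function armTimeout(key: SessionKey, ms: number): void {
  const gen = (sessionMessageGenerations.get(key) ?? 0) + 1;
  sessionMessageGenerations.set(key, gen);
  setTimeout(() => {
    if (sessionMessageGenerations.get(key) === gen) interrupt(key);
  }, ms);
}
```

Omit any one of the four and you get the matching failure mode: dropped first chunks, cross-session bleed, the wrong agent cancelled, or a stale turn nobody can kill.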
THE LIFECYCLE
Two agents, one bridge, one orchestrator
The shape of a healthy two-agent run on the same bridge process. Note that the interrupt to agent A is scoped: agent B keeps streaming.
[Diagram: parallel session lifecycle]
TRADEOFF 3
Token spend: the multiplier that everyone names and few quantify
The token math is the well-trodden tradeoff and the right baseline for budgeting is 2x to 5x the single-session spend, not 1.x. Each agent loads its own system prompt and tool catalog. There is no shared cache across sessions for that prefix unless you specifically design for it, which itself is more orchestration code. Each agent then takes its own turns, and the orchestrator typically re-summarizes or re-routes between them, paying again on its own context.
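A back-of-envelope version of that budgeting, with invented numbers that show the shape of the math, not measured values:

```ts
// Invented numbers: the shape of the multiplier, not a measurement.
const prefixTokens = 25_000;  // system prompt + tool catalog, paid per session (no shared cache)
const workPerAgent = 15_000;  // each agent's own turns
const agents = 3;
const orchestrator = 60_000;  // re-summarizing and re-routing between agents

const singleSession = prefixTokens + workPerAgent * agents;               // one prefix, serial work
const multiAgent = agents * (prefixTokens + workPerAgent) + orchestrator; // N prefixes + overhead

console.log(`${(multiAgent / singleSession).toFixed(1)}x`); // ~2.6x here; budget for 2x to 5x
```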
Anthropic's own writeup on building a C compiler with parallel Claudes is the honest reference: parallel agents shipped the compiler faster in wall-clock time but burned dramatically more compute than a single agent doing the same work serially would have. That is the deal. Multi-agent buys latency, not efficiency.
TRADEOFF 4
MCP server proliferation: the cost that does not show up in your token bill
Every Claude Code session registers its own MCP servers based on the working directory it was created with. Three concurrent agents on the same machine, each with four MCP servers (Playwright, filesystem, search, a database client), gives you twelve child processes plus the three agent processes plus your orchestrator. Each holds its own stdin/stdout pipes, its own state, its own file descriptors. None of that is visible in your API bill.
The mitigation is to share MCP servers across sessions where the protocol allows it (stdio servers do not multiplex by design; HTTP ones can). The cost of the mitigation is real: shared servers reduce the per-session isolation that was a reason to run multiple agents in the first place. If agent A leaves the Playwright tab on a logged-in dashboard, agent B inherits it. Sometimes that is fine. Sometimes it is the bug.
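The contrast in config form. This sketch assumes the common MCP server-config shape, a command/args pair for stdio and a URL for HTTP; the server entries themselves are illustrative:

```ts
// stdio servers: each session that registers these spawns its own child processes.
// Three sessions with four such entries is twelve children, none of them on a dashboard.
const perSessionServers = {
  playwright: { command: "npx", args: ["@playwright/mcp@latest"] },
  filesystem: { command: "npx", args: ["@modelcontextprotocol/server-filesystem", "."] },
};

// HTTP servers: one long-running process that many sessions can share.
// Shared state comes with it: whatever one session leaves behind, the next one inherits.
const sharedServers = {
  search: { type: "http", url: "http://localhost:8931/mcp" },
};
```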
TRADEOFF 5
File-system race conditions: a solved problem with a discipline tax
The file-write race is real and it is the one tradeoff with a clean, known answer: one git worktree per agent. Each agent edits inside its own checkout, the merge happens explicitly at the end. You trade a class of correctness bugs for disk space and the discipline of merging. Skip the worktree pattern, let two agents touch the same file, and eventually you ship a bad merge that compiled, passed tests, and silently dropped one agent's intent.
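A minimal sketch of that pattern from the orchestrator's side, driving git from Node; branch and path names are invented:

```ts
// One git worktree per agent: each agent gets its own checkout, the merge is explicit.
// Assumes this runs from the main worktree; names are illustrative.
import { execFileSync } from "node:child_process";

const agents = ["agent-a", "agent-b"];

// Before the run: a fresh branch and directory per agent; the agent's cwd points there.
for (const agent of agents) {
  execFileSync("git", ["worktree", "add", `../wt-${agent}`, "-b", agent]);
}

// After the run: merge each agent's branch back, one explicit, reviewable step per agent.
for (const agent of agents) {
  execFileSync("git", ["merge", "--no-ff", agent]);
}
```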
This is the part of multi-agent where the tooling is mature. It is also the part where most teams underspend on tooling because worktrees feel like a developer-environment problem rather than an orchestration problem. They are an orchestration problem.
Single session vs multi-agent, line by line
A short version of the tradeoffs above, in the shape your future self will actually consult.
| Feature | Single session | Multi-agent |
|---|---|---|
| Wall-clock latency on parallel work | serial (slowest) | near-parallel (fastest) |
| Token spend baseline | 1x | 2x to 5x typical |
| Startup cost per agent | one ~4s session/new | N x ~4s, mitigated by pre-warming |
| Per-agent state to track | one object | four dictionaries keyed by sessionKey |
| Cancel granularity | one flag | per-session interrupt |
| MCP child processes | M servers | M x N (or shared, with isolation cost) |
| Risk of same-file race | none | real, mitigated with git worktrees |
| Best for | edits, dependent steps, one-shots | exploration, independent modules, hypothesis racing |
THE COUNTERARGUMENT
When the bookkeeping is worth it
The pages that currently rank for this topic and Anthropic's own docs all converge on the same three winning shapes, and our experience matches. They are worth naming clearly.
- Parallel exploration. Several agents investigate different files, modules, or hypotheses in the same codebase, then report back. The orchestrator compares and synthesizes. Token spend is bounded by exploration time, and wall-clock is what you actually want to optimize.
- Independent feature work. Each agent owns a separate module on a separate worktree. No file collisions because there are no shared files. The 2x to 5x token spend buys near-linear wall-clock speedup for greenfield work.
- Competing-hypothesis debugging. Each agent tests one theory about a bug in parallel. First one to confirm or rule out wins. Token spend is the price of not waiting for serial elimination, which on a hard bug can be hours.
What these have in common: the work is independent enough that no two agents will fight, the value of finishing sooner is large enough that paying 2x to 5x in compute is acceptable, and there is a clean point where the orchestrator collects results. If any of those is missing, you are paying for the bookkeeping without getting the speedup.
“Multi-agent buys you wall-clock latency, not efficiency. Pay for it only when wall-clock is what you are optimizing.”
from the field notes above
THE RUBRIC
A short test before you reach for orchestration
Before you write a single dictionary keyed by sessionKey, run the task you have in mind through these four questions; the sketch after the list encodes the same test. If you cannot say yes to all four, use one session.
- Can the work be split into pieces that no two agents will edit in the same file? If no, single session.
- Is wall-clock latency the actual constraint here, not token spend or developer time spent on integration? If no, single session.
- Will each piece run for long enough that the ~4 second startup cost is a rounding error, or can you pre-warm sessions? If neither, single session.
- Do you have a clear point where you collect, compare, and merge the agents' outputs? If no, single session, or build that point first.
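The same test as code, for the front of an orchestrator; the inputs are still judgment calls, only the logic is encoded:

```ts
// The four-question rubric. Every input is a human judgment; the gate is just AND.
interface TaskShape {
  splitsWithoutSharedFiles: boolean; // no two agents will edit the same file
  wallClockIsTheConstraint: boolean; // latency hurts more than token spend
  startupAmortized: boolean;         // long tasks, or pre-warmed sessions, cover the ~4s
  hasCollectionPoint: boolean;       // a clear place to collect, compare, and merge
}

function useMultiAgent(task: TaskShape): boolean {
  return (
    task.splitsWithoutSharedFiles &&
    task.wallClockIsTheConstraint &&
    task.startupAmortized &&
    task.hasCollectionPoint
  ); // any single "no" means: one session
}
```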
Multi-agent Claude Code orchestration, frequently asked
Is multi-agent Claude Code orchestration worth it for typical work?
Honestly, no, not for most of it. Anthropic's own docs say multi-agent does not make sense for the bulk of agent-assisted development tasks; it pays off for research and review (parallel investigation), greenfield modules where teammates can each own a separate file, and debugging by competing hypotheses. For sequential edits, same-file work, dependent tasks, or any one-shot question, a single session is faster, cheaper, and easier to steer. The 2x to 5x token multiplier is the visible cost; the per-session bookkeeping is the invisible one.
Why does session/new take ~4 seconds?
The Claude Code agent process has to spin up an ACP (Agent Client Protocol) session: load the system prompt, initialize the context window, register the MCP servers configured for the cwd, and hand back a session ID. None of that is free. Inside Fazm we measured roughly 4 seconds for the first session/new round-trip on a warm machine; cold (just after launch) it can be longer. The mitigation is to call session/new ahead of time in the background so the first user query reuses an existing session. The trade is that each pre-warmed session pins a process and its context until you tear it down.
What state do I actually have to track per agent if I run them in parallel?
More than people expect. The minimum viable bookkeeping in our bridge is four dictionaries keyed by sessionKey: sessionContinuations (the awaiting-message future for each session), sessionPendingMessages (the queue of inbound messages received before the awaiter was ready), sessionMessageGenerations (a per-session counter for timeout tracking), and sessionInterrupted (a per-session interrupt flag so cancelling one agent does not cancel the others). If you skip any of these, you get one of: dropped messages, cross-session message bleed, runaway turns, or interrupt-of-the-wrong-agent.
How much extra token spend should I budget for multi-agent vs a single session?
Plan for 2x to 5x in the typical case and worse in the worst. Each agent loads its own system prompt and tool catalog (no shared cache across sessions for that prefix unless you carefully design for it), each takes its own turns, and the orchestrator usually re-summarizes or re-routes between them. Anthropic's writeup on building a C compiler with parallel Claudes is a good reference point: parallelism shipped faster wall-clock but burned dramatically more compute than a single agent would have.
What about MCP server proliferation, is that real?
Yes, and it bites quietly. If each agent in your orchestration registers its own MCP servers (Playwright, filesystem, a database client, a search tool), each session spawns those processes. With three concurrent sessions and four MCP servers each, you have twelve child processes plus the parent agents plus your orchestrator, all sharing one machine's CPU, memory, and file descriptors. The fix is to share MCP servers across sessions where the protocol allows it; the cost of the fix is that you lose per-session isolation, which is one of the main reasons to run multiple agents in the first place.
Is the file-system race condition risk overstated?
It is real, but it is a solved problem. The standard answer is git worktrees, one per agent, so each agent has its own working tree and the merge happens explicitly at the end. The cost is disk and the discipline of merging; you trade a class of correctness bugs for a class of integration friction. If you skip the worktree pattern and let two agents touch the same file, you will, eventually, ship a bad merge.
When does multi-agent actually beat single-agent?
Three cases hold up under our usage: (1) parallel exploration where each agent investigates a different hypothesis or area of a codebase and you compare findings, (2) genuinely parallel feature work where each agent owns a separate module that does not share files, (3) competing-hypothesis debugging where each agent tests a different theory and the first to confirm wins. Outside those, the overhead almost always exceeds the benefit.
Does pre-warming sessions help in practice?
It removes the first-query latency, which is the most user-visible part. In Fazm we pre-warm one session per model (Opus and Sonnet) on app launch via warmupSession in ACPBridge.swift. The first user query reuses the warmed session and starts streaming in milliseconds instead of seconds. The downside is honest: each warmed session pins memory and a process slot whether the user ever asks anything or not. If you orchestrate ten agents you cannot pre-warm all of them; you pick the two or three you expect to use first and accept the latency on the rest.
What about per-session interrupts, why does that matter for orchestration?
Because the natural mistake is to wire one cancel button to a global interrupt flag. The moment you have two agents running, that breaks: cancelling agent A also cancels agent B mid-thought. The right shape is what we ended up with: an interrupt(sessionKey:) function that flips a flag in the sessionInterrupted dictionary for one session and leaves the others alone. Same idea applies to timeouts; they have to be per-session generations, not global.
Where does Fazm fit in this picture?
Fazm is a desktop AI agent for macOS, not an orchestration framework, but it is built on the same plumbing this guide is talking about: an ACP bridge that drives one or more Claude Code sessions, with per-session queues, interrupts, and a pre-warming pattern. The reason we wrote this page is that the costs above are not theoretical for us, they are line items in the source. Most of the public writing on multi-agent Claude Code skips them. If you are about to build an orchestration layer of your own, the per-session bookkeeping is the part that will surprise you.