Sub-Agents Spawn Overhead - Why Batching Beats One Agent Per Task
The intuitive approach to multi-agent systems is clean delegation: one task, one agent. Parse this file - spawn an agent. Check that API - spawn another agent. Update the database - spawn a third. It feels organized. It is also incredibly wasteful.
The Cost of Spawning
Every sub-agent spawn carries fixed overhead. Context initialization, system prompt loading, tool registration, and the first round of reasoning all consume tokens and time before the agent does any useful work. For a typical Claude-based sub-agent, this initialization cost is 2,000 to 5,000 tokens - regardless of how simple the actual task is.
If you spawn 20 sub-agents for 20 small tasks, you pay 40,000 to 100,000 tokens in initialization overhead alone. Batch those same tasks into 4 agents handling 5 tasks each, and the overhead drops to 8,000 to 20,000 tokens. Same work, 80% less overhead.
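The arithmetic above can be sketched directly. The initialization range (2,000 to 5,000 tokens) comes from the text; the function name is ours:

```python
# Per-spawn initialization cost range from the text: 2,000-5,000 tokens.
INIT_MIN, INIT_MAX = 2_000, 5_000

def spawn_overhead(num_agents: int) -> tuple[int, int]:
    """Total initialization tokens (min, max) for spawning num_agents sub-agents."""
    return num_agents * INIT_MIN, num_agents * INIT_MAX

tasks = 20
one_per_task = spawn_overhead(tasks)        # 20 agents: (40000, 100000)
batched = spawn_overhead(tasks // 5)        # 4 agents of 5 tasks: (8000, 20000)
savings = 1 - batched[0] / one_per_task[0]  # 0.8, i.e. 80% less overhead
```

The productive tokens are identical in both plans; only the fixed per-spawn cost changes, which is why the savings scale linearly with the number of spawns avoided.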
When Batching Makes Sense
Batch tasks when they share context. Five file operations in the same directory should go to one agent that loads the directory context once. Three API calls to the same service should go to one agent that handles authentication once. Ten data transformations on related records should go to one agent that understands the schema once.
The grouping principle is shared context. Tasks that need the same background knowledge, the same tool access, or the same file system context belong in the same batch.
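A minimal sketch of this grouping principle, assuming each task carries a context key (the task dict shape and key names here are hypothetical, not part of any agent framework):

```python
from collections import defaultdict

def batch_by_context(tasks: list[dict]) -> dict[str, list[dict]]:
    """Group tasks by a shared-context key (a directory, a service, a schema)
    so each group can be handed to a single sub-agent."""
    batches: dict[str, list[dict]] = defaultdict(list)
    for task in tasks:
        batches[task["context"]].append(task)
    return dict(batches)

tasks = [
    {"context": "repo:src/", "op": "parse utils.py"},
    {"context": "repo:src/", "op": "parse main.py"},
    {"context": "api:billing", "op": "check invoice endpoint"},
]
batches = batch_by_context(tasks)  # two batches -> two spawns instead of three
```

Both file-parsing tasks land in one batch, so the directory context is loaded once rather than once per task.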
When Individual Agents Make Sense
Keep agents separate when tasks are truly independent and long-running. A CI pipeline check and a documentation update have no shared context - batching them saves nothing. Tasks that might fail and need isolation also deserve their own agents, so one failure does not block unrelated work.
Practical Batching Strategy
- Group by context - tasks that need the same files, APIs, or domain knowledge go together
- Cap batch size - more than 7 to 10 tasks per agent risks context overflow and quality degradation
- Separate by risk - high-risk tasks (production changes) get their own agent for isolation
- Measure overhead - track initialization tokens vs productive tokens to find your optimal batch size
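The strategy above can be combined into one planning pass. This is a sketch under our own assumptions (the `context` and `high_risk` fields are hypothetical task attributes, not a real framework API):

```python
def plan_batches(tasks: list[dict], max_batch: int = 7) -> list[list[dict]]:
    """Sketch of the strategy: isolate high-risk tasks, group the rest
    by shared context, and cap each batch at max_batch tasks."""
    batches: list[list[dict]] = []
    by_context: dict[str, list[dict]] = {}
    for t in tasks:
        if t.get("high_risk"):
            batches.append([t])  # high-risk tasks get their own agent
            continue
        by_context.setdefault(t["context"], []).append(t)
    for group in by_context.values():
        # Split oversized groups to respect the batch-size cap.
        for i in range(0, len(group), max_batch):
            batches.append(group[i:i + max_batch])
    return batches

work = [{"context": "db", "op": f"update record {i}"} for i in range(9)]
work.append({"context": "prod", "op": "rotate keys", "high_risk": True})
plans = plan_batches(work)  # 3 agents: one isolated, then batches of 7 and 2
```

Nine same-context tasks plus one risky task become three spawns instead of ten, while the production change stays isolated from the bulk work.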
The goal is not maximum parallelism. It is maximum useful work per token spent.
Fazm is an open-source macOS AI agent, available on GitHub.