100M Tokens Tracked: 99.4% Were Input and Parallel Agents Make It Worse
After tracking 100 million tokens across parallel Claude Code sessions building a macOS desktop app, the numbers tell a clear story: 99.4% of all tokens were input. The model reads far more than it writes. And running 5 agents in parallel does not multiply your productivity by 5x - it multiplies your input token bill by 5x.
This post breaks down exactly where those tokens go, why the ratio is so skewed, and what you can do about it with concrete CLAUDE.md patterns, prompt caching strategies, and context architecture changes that cut our monthly spend by over 40%.
Where 100M tokens actually went
Here is the raw breakdown from our tracking across ~3 weeks of active development with Claude Code (Sonnet 4.6 at $3/$15 per MTok):
| Category | Tokens | % of Total |
|---|---|---|
| Input tokens (total) | 99,400,000 | 99.4% |
| Output tokens (total) | 600,000 | 0.6% |
| Total | 100,000,000 | 100% |
Breaking down the input tokens further:
| Input breakdown | Tokens (approx) | % of Input |
|---|---|---|
| System prompt + tool definitions | 14,200,000 | 14.3% |
| CLAUDE.md (all levels) | 8,700,000 | 8.8% |
| File reads (grep, cat, glob results) | 52,100,000 | 52.4% |
| Conversation history (prior turns) | 21,300,000 | 21.4% |
| MCP server responses | 3,100,000 | 3.1% |
The single biggest cost driver is file reads. Every time an agent runs grep or reads a source file, the entire content goes into the context window as input tokens. The system prompt and tool definitions are a fixed ~14k token overhead on every single API call. And CLAUDE.md loads on every call too - it survives compaction, meaning you pay for it repeatedly throughout a session.
Why the ratio is so extreme in agentic workflows
A 99.4% input ratio sounds absurd until you understand how coding agents work under the hood. Each "turn" in a Claude Code session looks like this:
1. System prompt loads (~14-17k tokens): Claude Code's instructions, all tool definitions, your CLAUDE.md files
2. Conversation history replays (variable): every prior message in the session, including all file contents previously read
3. Agent reads files (variable): the agent calls grep, reads files, scans directory trees
4. Agent writes output (small): a code edit, a terminal command, a short response
Steps 1-3 are all input tokens. Step 4 is output. The ratio gets worse as sessions get longer because conversation history accumulates. Claude Code triggers auto-compaction at roughly 83% of the 200k context window (~167k tokens), but even after compaction, the system prompt, tools, and CLAUDE.md reload in full.
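To see why longer sessions skew the ratio, here is a minimal Python sketch of the loop above. The 14k fixed overhead and ~167k compaction threshold are this post's approximate figures; the per-turn read and reply sizes and the post-compaction summary size are assumptions for illustration only.

```python
# Illustrative model of how input tokens accumulate per turn in an agentic
# session. Overhead and threshold are this post's approximate figures; the
# per-turn sizes and the compaction summary size are assumed for illustration.

FIXED_OVERHEAD = 14_000         # system prompt + tools, reloaded on every call
COMPACTION_THRESHOLD = 167_000  # ~83% of a 200k context window

def session_input_tokens(turns, reads_per_turn=6_000, reply_per_turn=500):
    """Total input tokens for a session where history replays every turn."""
    history = 0
    total_input = 0
    for _ in range(turns):
        total_input += FIXED_OVERHEAD + history + reads_per_turn
        history += reads_per_turn + reply_per_turn
        if history > COMPACTION_THRESHOLD - FIXED_OVERHEAD:
            history = 20_000  # assumed size of the compaction summary

    return total_input

# History replay makes input grow roughly quadratically between compactions:
print(session_input_tokens(10))
print(session_input_tokens(40))  # 4x the turns costs far more than 4x the input
```

Output stays flat per turn while input compounds, which is exactly why the ratio drifts toward 99%+ as sessions lengthen.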
This matches academic research. The "Tokenomics" paper from January 2026 analyzed token consumption across agentic software engineering tasks and found that input tokens consistently constitute the largest share - averaging 53.9% in their multi-agent framework. Our ratio is even more extreme because Claude Code's system prompt and tool definitions are unusually large (~14k tokens) compared to lighter agent frameworks.
The paper also found that the code review and verification phase - not initial generation - accounts for 59.4% of all token consumption. That lines up with our experience. The agent spends most of its tokens re-reading code it already wrote to verify correctness, not writing new code.
The parallel agent multiplier problem
Running parallel agents is where costs compound. Here is why.
With a single Claude Code agent, the system prompt loads once per API call. With 5 parallel agents (using worktrees or separate terminal sessions), each agent independently loads:
- Its own copy of the system prompt (~14k tokens)
- Its own copy of CLAUDE.md (~2-5k tokens depending on your setup)
- Its own file reads (often overlapping with other agents)
- Its own conversation history
There is no shared context between parallel agents. If Agent A reads src/components/Button.tsx and Agent B needs the same file, Agent B reads it again independently. Every token is paid for twice.
Here is what that looks like in practice over a single day of parallel development:
Single agent (baseline):

```
System prompt: 14k tokens x ~80 API calls = 1.12M input tokens
File reads:    ~3.2M input tokens
History:       ~1.8M input tokens
Output:        ~40k tokens
Daily total:   ~6.2M tokens
Daily cost:    ~$18.70 (at Sonnet 4.6 rates)
```

5 parallel agents:

```
System prompt: 14k tokens x ~400 API calls = 5.6M input tokens
File reads:    ~16M input tokens (significant overlap)
History:       ~9M input tokens
Output:        ~200k tokens
Daily total:   ~31M tokens
Daily cost:    ~$93.30
```

Effective multiplier: 5x agents = 5x cost, NOT 5x productivity.
Enterprise teams report subagent costs running 300-500% higher than expected. The reason is straightforward - each parallel agent maintains its own full context window, and there is no deduplication of shared context across agents.
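The multiplier is easy to verify at list prices. This sketch uses the post's approximate daily token counts and assumes no cache discounts, which is why the totals land slightly above the quoted ~$18.70 and ~$93.30.

```python
# Parallel-agent cost multiplier at Sonnet 4.6 list prices ($3/$15 per MTok),
# using this post's approximate daily token counts. Assumes no cache discounts.

INPUT_RATE = 3.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token

def daily_cost(input_tokens, output_tokens):
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

single = daily_cost(6_120_000, 40_000)  # 1.12M prompt + 3.2M reads + 1.8M history
five = daily_cost(30_600_000, 200_000)  # each agent pays its own full overhead

print(f"single: ${single:.2f}  five agents: ${five:.2f}  "
      f"multiplier: {five / single:.1f}x")
```

The multiplier comes out at 5.0x because nothing is shared: every agent re-pays the system prompt, its own history, and its own copy of overlapping file reads.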
The cost math most people get wrong
Because output tokens cost 5x as much per token ($15 vs $3 per MTok for Sonnet 4.6), people assume output is where the money goes. But volume matters more than rate.
Let's do the actual math on our 100M token dataset:
```
Input cost:  99.4M tokens x ($3 / 1M)  = $298.20
Output cost: 0.6M tokens  x ($15 / 1M) = $9.00
Total:       $307.20

Input as % of cost:  97.1%
Output as % of cost: 2.9%
```
Even with the 5x price premium on output tokens, input still dominates because the volume difference is so massive. This means every optimization effort should focus on the input side.
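As a runnable check of that arithmetic:

```python
# Input vs output share of spend for the 100M-token dataset at Sonnet 4.6
# list prices ($3 input / $15 output per MTok).

input_cost = 99_400_000 * 3 / 1_000_000  # $298.20
output_cost = 600_000 * 15 / 1_000_000   # $9.00
total = input_cost + output_cost         # $307.20

print(f"input share of cost: {input_cost / total:.1%}")  # ~97.1%
```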
For Opus 4.6 ($5/$25 per MTok), the same 100M tokens would cost:
```
Input cost:  99.4M tokens x ($5 / 1M)  = $497.00
Output cost: 0.6M tokens  x ($25 / 1M) = $15.00
Total:       $512.00
```
At scale, the model choice matters less than the context architecture. Switching from Opus to Sonnet saves ~40%, but cutting input context by 40% saves ~40% regardless of which model you use.
Strategy 1: CLAUDE.md scoping with per-folder files
The highest-leverage optimization is structuring CLAUDE.md so each agent only reads what it needs. Claude Code loads CLAUDE.md files hierarchically:
```
project-root/
  CLAUDE.md            # Global: project-wide conventions (keep under 200 lines)
  src/
    CLAUDE.md          # Frontend-specific context
    components/
      CLAUDE.md        # Component patterns and conventions
    api/
      CLAUDE.md        # API layer specifics
  scripts/
    CLAUDE.md          # Script-specific instructions
```
Each agent working in src/components/ loads only the root CLAUDE.md plus src/CLAUDE.md plus src/components/CLAUDE.md. An agent working on scripts never pays for the frontend context.
Here is a concrete before/after example:
Before (single bloated CLAUDE.md - 4,200 tokens):

```markdown
# Project Guide

## Architecture
This is a Next.js 15 app with App Router...
[800 tokens of architecture overview]

## Frontend Conventions
Use Tailwind CSS with the cn() utility...
[600 tokens of component patterns]

## API Conventions
All API routes go in src/app/api/...
[500 tokens of API patterns]

## Database
We use Neon Postgres with Drizzle ORM...
[700 tokens of database patterns]

## Testing
Use Vitest for unit tests...
[400 tokens of testing patterns]

## Deployment
Deploy via Vercel...
[300 tokens of deployment info]

## Scripts
Python scripts in scripts/ folder...
[400 tokens of scripting conventions]
```
After (scoped - root CLAUDE.md is 1,100 tokens):

Root CLAUDE.md:

```markdown
# Project Guide
Next.js 15 App Router. TypeScript strict mode.
Tailwind CSS with cn() utility from lib/utils.
Neon Postgres + Drizzle ORM.
Deploy via Vercel. Tests with Vitest.
Never use em dashes. Use hyphens with spaces.
```

src/components/CLAUDE.md:

```markdown
# Component Conventions
Use forwardRef for all public components.
Props interface named {ComponentName}Props.
Co-locate styles in same file. No CSS modules.
Use Radix UI primitives for accessible components.
```

src/app/api/CLAUDE.md:

```markdown
# API Conventions
All routes export async function handlers.
Use zod for request validation.
Return NextResponse.json() with proper status codes.
Database queries go through src/db/queries/.
```
The frontend agent now reads ~1,500 tokens of CLAUDE.md instead of 4,200. Over 80 API calls per day, that saves (4,200 - 1,500) x 80 = 216,000 tokens per day per agent. With 5 parallel agents, that is over 1M tokens per day saved on CLAUDE.md alone.
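The savings arithmetic is worth making explicit. The calls-per-day figure and token counts below are this post's estimates, not universal constants:

```python
# Daily tokens saved by scoping CLAUDE.md, per agent and across a fleet.

bloated, scoped = 4_200, 1_500  # CLAUDE.md tokens loaded per API call
calls_per_day, agents = 80, 5

saved_per_agent = (bloated - scoped) * calls_per_day
print(saved_per_agent)           # tokens/day saved per agent
print(saved_per_agent * agents)  # tokens/day saved across all 5 agents
```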
Strategy 2: Prompt caching alignment
Claude's prompt caching is automatic in Claude Code, but you can make it more effective by understanding how it works.
Cache pricing for Sonnet 4.6:
| Operation | Cost per MTok | Relative to base |
|---|---|---|
| Base input | $3.00 | 1x |
| Cache write (5 min TTL) | $3.75 | 1.25x |
| Cache write (1 hour TTL) | $6.00 | 2x |
| Cache read (hit) | $0.30 | 0.1x |
A cache hit costs 10% of the base input price. This means that if the same content appears in consecutive API calls (which it does - system prompt, CLAUDE.md, and tool definitions are identical across calls), the second call onwards pays $0.30/MTok instead of $3.00/MTok.
To maximize cache hits:
- Keep CLAUDE.md stable within sessions. Do not edit it while agents are running. Every edit invalidates the cache.
- Front-load static content. Put stable instructions at the top of CLAUDE.md and volatile content at the bottom. Caching works on prefixes, so if the first 3,000 tokens are identical between calls, those 3,000 tokens get cached even if the rest differs.
- Minimize tool definition churn. If you have MCP servers that dynamically change their tool list, those changes invalidate cache for everything after that point in the prompt.
With effective caching, the real cost of that 14k system prompt drops from $0.042 per call to $0.0042 per call after the first one. Over hundreds of calls per day, that adds up.
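A blended-cost sketch makes the hit-rate lever concrete. This treats every miss as a 5-minute-TTL cache write, which is a simplification of how the cache actually behaves, and uses the Sonnet 4.6 rates from the table above:

```python
# Effective daily cost of the ~14k-token fixed prefix under prompt caching.
# Simplification: misses pay the 5-min-TTL write rate, hits pay the read rate.

PREFIX_TOKENS = 14_000
WRITE = 3.75 / 1_000_000  # cache write, 5-minute TTL, $/token
READ = 0.30 / 1_000_000   # cache read (hit), $/token

def prefix_cost(calls, hit_rate):
    hits = round(calls * hit_rate)
    misses = calls - hits
    return PREFIX_TOKENS * (misses * WRITE + hits * READ)

# The before/after cache hit rates reported later in this post, over 80 calls:
print(f"45% hits: ${prefix_cost(80, 0.45):.2f}")
print(f"72% hits: ${prefix_cost(80, 0.72):.2f}")
```

Moving the hit rate from 45% to 72% cuts the prefix cost by roughly 40% on its own, before any reduction in file reads.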
Strategy 3: Reduce file read volume
File reads account for 52.4% of our input tokens. Here are specific techniques to reduce them:
Use explicit file boundaries in CLAUDE.md:

```markdown
# File Boundaries
When working on UI components, only read files in src/components/.
Do not scan src/app/api/ or src/db/ unless the task explicitly requires it.
For styling questions, check tailwind.config.ts first, not individual components.
```
This prevents the agent from doing exploratory reads across the entire codebase. Without boundaries, agents tend to run broad grep searches that pull in dozens of irrelevant files.
Structure your codebase for agent efficiency:

```markdown
# Key Reference Files
- src/types/index.ts contains all shared TypeScript types
- src/db/schema.ts contains the complete database schema
- src/lib/constants.ts has all magic numbers and config values

Read these files first before searching the codebase for type definitions,
schema info, or configuration values.
```
By telling the agent where to look, you avoid the scatter-shot grep pattern where the agent searches 50 files to find a type definition that lives in one canonical location.
Use the /compact command proactively. Do not wait for auto-compaction at 83% capacity. If you have finished a subtask and are starting a new one in the same session, run /compact to summarize the history and free up context for fresh file reads instead of re-reading files that are already in the (now stale) history.
Strategy 4: Agent task decomposition
The way you structure parallel agent tasks directly affects token efficiency. Compare these two approaches:
Inefficient (overlapping context):

```
Agent 1: "Build the user settings page with API and database"
Agent 2: "Build the notification preferences with API and database"
Agent 3: "Build the billing page with API and database"
```
Each agent reads the database schema, the API utilities, the auth middleware, and the shared types. Three agents, three copies of the same foundational files.
Efficient (domain isolation):

```
Agent 1 (DB):  "Create database migrations and query functions for
                settings, notifications, and billing"
Agent 2 (API): "Build API routes for settings, notifications, and
                billing using the query functions in src/db/queries/"
Agent 3 (UI):  "Build the settings, notifications, and billing pages
                using the API routes in src/app/api/"
```
Now each agent focuses on one layer. The DB agent reads schema files. The API agent reads query files and route patterns. The UI agent reads component patterns and API types. Minimal overlap.
This is the domain-based routing pattern. Add it to your root CLAUDE.md:

```markdown
# Parallel Agent Routing
When spawning parallel agents, split by architectural layer:
- Database agent: src/db/, drizzle.config.ts
- API agent: src/app/api/, src/lib/server/
- UI agent: src/components/, src/app/(pages)/

Never assign a single agent work that spans all three layers.
```
Strategy 5: Worktree-based isolation
Since Claude Code shipped native git worktree support in February 2026, you can run parallel agents in isolated working directories. Each worktree gets its own branch and file system, which means agents cannot accidentally read each other's in-progress changes.
But the token benefit is subtler. With worktrees, each agent's file reads reflect a clean state. Without worktrees (multiple agents on the same branch), agents sometimes re-read files that another agent just modified, leading to confusion, re-reads, and wasted tokens on conflict resolution.
In our tracking, worktree-isolated agents used 15-20% fewer input tokens per task compared to agents sharing a single working directory. The reduction came from fewer "wait, this file changed" re-read cycles.
Putting it all together: the 40% reduction
After implementing all five strategies over two weeks, here is our before/after comparison:
| Metric | Before | After | Change |
|---|---|---|---|
| Daily input tokens (5 agents) | 30.8M | 17.9M | -42% |
| Daily output tokens (5 agents) | 195k | 198k | +1.5% |
| Daily cost (Sonnet 4.6) | $93.20 | $54.30 | -42% |
| Monthly cost (22 work days) | $2,050 | $1,195 | -$855/mo |
| Effective cache hit rate | ~45% | ~72% | +27pp |
The output token count barely changed. The agents write roughly the same amount of code. The entire savings came from reducing what they read.
The biggest single win was CLAUDE.md scoping combined with explicit file boundaries, which reduced unnecessary file reads by roughly 35%. Prompt caching alignment added another 10-15% on top of that. Worktree isolation contributed the remaining 5-10%.
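The table's deltas check out arithmetically. A quick verification, using the table's rounded daily figures:

```python
# Sanity-check the before/after deltas from the table above.

before_in, after_in = 30_800_000, 17_900_000  # daily input tokens
before_cost, after_cost = 93.20, 54.30        # daily cost, USD

print(f"input reduction: {1 - after_in / before_in:.0%}")      # ~42%
print(f"cost reduction:  {1 - after_cost / before_cost:.0%}")  # ~42%
print(f"monthly savings (22 days): ${(before_cost - after_cost) * 22:.0f}")
```

The monthly figure differs from the table's -$855 by a dollar only because the table rounds the monthly totals before subtracting.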
What this means for your team
If you are running parallel Claude Code agents, check your token breakdown with the /cost command. If your input ratio is above 95% (and it probably is), your optimization leverage is entirely on the input side.
Start with these three changes today:
- Split your CLAUDE.md into per-folder files. Move domain-specific instructions out of the root file. This takes 30 minutes and pays for itself within a day.
- Add explicit file boundaries. Tell agents where to look and where not to look. This prevents expensive exploratory file reads.
- Decompose parallel tasks by layer, not by feature. Each agent should operate in one part of the codebase, not span the entire stack.
The goal is not to starve agents of context. It is to give each agent exactly the context it needs and nothing more. When 99.4% of your tokens are input, precision in context delivery is not a nice-to-have - it is the main lever you have for controlling costs.
Fazm is an open source macOS AI agent, available on GitHub.