Saving 10M Tokens (89%) on Claude Code with a CLI Proxy That Truncates Output
Here is a problem most Claude Code users do not realize they have. When Claude runs a command like npm test or cargo build and the output is 5,000 lines, all of those lines enter the context window immediately. Claude then tries to be smart about it - it runs tail -n 50 to focus on the relevant part. But the damage is already done. Those 5,000 lines of raw output are already consuming tokens in the conversation context.
The Hidden Token Drain
Consider what happens during a typical development session:
- npm install - 200 lines of dependency resolution output
- npm run build - 500 lines of webpack/vite output
- npm test - 2,000 lines of test results
- git log - variable but often hundreds of lines
- find . -name "*.ts" - potentially thousands of file paths
Each of these dumps raw text into the context. Even if Claude ignores most of it, the tokens are counted and billed. Over a session with dozens of commands, this adds up to millions of wasted tokens.
The CLI Proxy Solution
A CLI proxy sits between Claude Code and the shell. Before command output reaches the context window, the proxy:
- Captures the full output
- Truncates it to the last N lines (typically 50-100)
- Adds a header noting how many lines were truncated
- Passes only the truncated version to Claude's context
If Claude needs to see more, it can explicitly request the full output. But for 90% of commands, the last 50 lines contain everything relevant - the build errors, the test failures, the final status.
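The truncation step above can be sketched in a few lines. This is an illustrative sketch, not the proxy's actual code; the function name and header format are assumptions:

```python
def truncate_output(output: str, max_lines: int = 50) -> str:
    """Keep only the last max_lines lines of command output,
    prefixed with a header noting how many lines were dropped.
    (Hypothetical sketch of the truncation step.)"""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output  # short output passes through untouched
    dropped = len(lines) - max_lines
    header = f"[... {dropped} lines truncated; request full output if needed ...]"
    return "\n".join([header] + lines[-max_lines:])
```

Because the header states how much was cut, the agent can decide whether the missing lines are worth retrieving.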
The Results
On a real codebase with daily agent sessions:
- Before proxy: average 11.2M input tokens per day
- After proxy: average 1.2M input tokens per day
- Savings: 10M tokens (89% reduction)
The output quality did not degrade. Claude was already only looking at the tail of most command output - the proxy just prevents the rest from entering context in the first place.
Implementation
The proxy is straightforward - a shell wrapper that intercepts command execution, captures stdout/stderr, truncates, and returns. You can configure different limits for different commands (more lines for test output, fewer for install logs).
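A minimal version of that wrapper might look like the following. This is a sketch under stated assumptions: the limit table, its values, and the run_truncated name are hypothetical, not taken from any particular proxy:

```python
import subprocess

# Hypothetical per-command line limits; commands not listed fall
# back to the default. Values here are illustrative only.
LIMITS = {"npm test": 100, "npm install": 25}
DEFAULT_LIMIT = 50

def run_truncated(cmd: str) -> str:
    """Run a shell command, capture stdout and stderr, and return
    only the tail of the combined output, with a header noting
    how many lines were dropped."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    combined = proc.stdout + proc.stderr
    # Pick the limit for the first matching command prefix.
    limit = next(
        (n for prefix, n in LIMITS.items() if cmd.startswith(prefix)),
        DEFAULT_LIMIT,
    )
    lines = combined.splitlines()
    if len(lines) <= limit:
        return combined
    dropped = len(lines) - limit
    return "\n".join([f"[... {dropped} lines truncated ...]"] + lines[-limit:])
```

A real proxy would also need to preserve exit codes and keep the full output somewhere retrievable, but the interception-and-truncation core is this small.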
The lesson: do not let your AI agent pay for context it never reads.
Fazm is an open source macOS AI agent, available on GitHub.