Saving 10M Tokens (89%) on Claude Code with a CLI Proxy That Truncates Output
Here is a problem most Claude Code users do not realize they have. When Claude runs a command like npm test or cargo build and the output is 5,000 lines, all of those lines enter the context window immediately. Claude then tries to be smart about it - it runs tail -n 50 to focus on the relevant part. But the damage is already done. Those 5,000 lines of raw output are already consuming tokens in the conversation context.
The Hidden Token Drain
Consider what happens during a typical development session:
- npm install - 200 lines of dependency resolution output
- npm run build - 500 lines of webpack/vite output
- npm test - 2,000 lines of test results
- git log - variable but often hundreds of lines
- find . -name "*.ts" - potentially thousands of file paths
Each of these dumps raw text into the context. Even if Claude ignores most of it, the tokens are counted and billed. Over a session with dozens of commands, this adds up to millions of wasted tokens.
The CLI Proxy Solution
A CLI proxy sits between Claude Code and the shell. Before command output reaches the context window, the proxy:
- Captures the full output
- Truncates it to the last N lines (typically 50-100)
- Adds a header noting how many lines were truncated
- Passes only the truncated version to Claude's context
If Claude needs to see more, it can explicitly request the full output. But for 90% of commands, the last 50 lines contain everything relevant - the build errors, the test failures, the final status.
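The truncation step above can be sketched in a few lines. This is an illustrative sketch, not the proxy's actual code; the function name and header format are assumptions:

```python
def truncate_output(output: str, max_lines: int = 50) -> str:
    """Keep only the last max_lines lines of command output,
    prefixed with a header noting how many lines were dropped.
    (Hypothetical sketch of the truncation step.)"""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output  # short output passes through untouched
    dropped = len(lines) - max_lines
    header = f"[... {dropped} lines truncated; request full output if needed ...]"
    return "\n".join([header] + lines[-max_lines:])
```

Because the header states how much was cut, the agent can decide whether the missing lines are worth retrieving.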
The Results
On a real codebase with daily agent sessions:
- Before proxy: average 11.2M input tokens per day
- After proxy: average 1.2M input tokens per day
- Savings: 10M tokens (89% reduction)
The output quality did not degrade. Claude was already only looking at the tail of most command output - the proxy just prevents the rest from entering context in the first place.
Implementation
The proxy is straightforward - a shell wrapper that intercepts command execution, captures stdout/stderr, truncates, and returns. You can configure different limits for different commands (more lines for test output, fewer for install logs).
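A minimal version of that wrapper might look like the following. This is a sketch under stated assumptions: the limit table, its values, and the run_truncated name are hypothetical, not taken from any particular proxy:

```python
import subprocess

# Hypothetical per-command line limits; commands not listed fall
# back to the default. Values here are illustrative only.
LIMITS = {"npm test": 100, "npm install": 25}
DEFAULT_LIMIT = 50

def run_truncated(cmd: str) -> str:
    """Run a shell command, capture stdout and stderr, and return
    only the tail of the combined output, with a header noting
    how many lines were dropped."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    combined = proc.stdout + proc.stderr
    # Pick the limit for the first matching command prefix.
    limit = next(
        (n for prefix, n in LIMITS.items() if cmd.startswith(prefix)),
        DEFAULT_LIMIT,
    )
    lines = combined.splitlines()
    if len(lines) <= limit:
        return combined
    dropped = len(lines) - limit
    return "\n".join([f"[... {dropped} lines truncated ...]"] + lines[-limit:])
```

A real proxy would also need to preserve exit codes and keep the full output somewhere retrievable, but the interception-and-truncation core is this small.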
The lesson: do not let your AI agent pay for context it never reads.
Fazm is an open source macOS AI agent, available on GitHub.