Saving 10M Tokens (89%) on Claude Code with a CLI Proxy That Truncates Output

Fazm Team · 3 min read

Here is a problem most Claude Code users do not realize they have. When Claude runs a command like npm test or cargo build and the output is 5,000 lines, all of those lines enter the context window immediately. Claude then tries to be smart about it - it runs tail -n 50 to focus on the relevant part. But the damage is already done: those 5,000 lines of raw output have already been counted against the context window and billed as input tokens.

The Hidden Token Drain

Consider what happens during a typical development session:

  • npm install - 200 lines of dependency resolution output
  • npm run build - 500 lines of webpack/vite output
  • npm test - 2,000 lines of test results
  • git log - variable but often hundreds of lines
  • find . -name "*.ts" - potentially thousands of file paths

Each of these dumps raw text into the context. Even if Claude ignores most of it, the tokens are counted and billed. Over a session with dozens of commands, this adds up to millions of wasted tokens.
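To see how fast this compounds, here is a back-of-envelope estimate using the line counts above. The tokens-per-line figure is an assumption (roughly 15 tokens for a typical 60-character log line), not a measured value, and the variable-length commands (git log, find) are omitted:

```python
# Rough estimate of context cost from raw command output.
# TOKENS_PER_LINE is an assumed average, not a measured figure.
TOKENS_PER_LINE = 15

# Fixed-size examples from the list above; git log and find vary too much
# to include in a static estimate.
commands = {
    "npm install": 200,
    "npm run build": 500,
    "npm test": 2000,
}

total_lines = sum(commands.values())
total_tokens = total_lines * TOKENS_PER_LINE
print(f"{total_lines} lines ≈ {total_tokens} tokens per pass")
```

One pass through just these three commands costs tens of thousands of tokens; dozens of runs per session, across daily sessions, is how the total reaches millions.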

The CLI Proxy Solution

A CLI proxy sits between Claude Code and the shell. Before command output reaches the context window, the proxy:

  1. Captures the full output
  2. Truncates it to the last N lines (typically 50-100)
  3. Adds a header noting how many lines were truncated
  4. Passes only the truncated version to Claude's context

If Claude needs to see more, it can explicitly request the full output. But for 90% of commands, the last 50 lines contain everything relevant - the build errors, the test failures, the final status.
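The four steps above reduce to a small function. This is a minimal sketch; the function name and the header format are illustrative, not taken from any particular proxy implementation:

```python
def truncate_output(raw: str, max_lines: int = 50) -> str:
    """Keep only the last max_lines lines, prefixed with a truncation header."""
    lines = raw.splitlines()
    if len(lines) <= max_lines:
        return raw  # short output passes through untouched
    dropped = len(lines) - max_lines
    header = f"[output truncated: {dropped} earlier lines omitted]"
    return "\n".join([header] + lines[-max_lines:])
```

The header matters as much as the truncation: it tells the model that earlier lines exist, so it can explicitly request the full output when the tail is not enough.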

The Results

On a real codebase with daily agent sessions:

  • Before proxy: average 11.2M input tokens per day
  • After proxy: average 1.2M input tokens per day
  • Savings: 10M tokens (89% reduction)

The output quality did not degrade. Claude was already only looking at the tail of most command output - the proxy just prevents the rest from entering context in the first place.

Implementation

The proxy is straightforward - a shell wrapper that intercepts command execution, captures stdout/stderr, truncates, and returns. You can configure different limits for different commands (more lines for test output, fewer for install logs).
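One way to sketch such a wrapper, assuming a simple per-command lookup table (the command names and limits below are illustrative defaults, not a shipped configuration):

```python
import subprocess

# Hypothetical per-command line limits: more for test output,
# fewer for install logs. Unlisted commands fall back to the default.
LIMITS = {"npm test": 100, "npm install": 20}
DEFAULT_LIMIT = 50

def run_proxied(command: str) -> str:
    """Run a shell command and return only the tail of its combined output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    limit = LIMITS.get(command, DEFAULT_LIMIT)
    lines = (result.stdout + result.stderr).splitlines()
    if len(lines) <= limit:
        return "\n".join(lines)
    dropped = len(lines) - limit
    header = f"[output truncated: {dropped} earlier lines omitted]"
    return "\n".join([header] + lines[-limit:])
```

A production version would also need to preserve exit codes and stream ordering between stdout and stderr, but the core idea fits in a dozen lines.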

The lesson: do not let your AI agent pay for context it never reads.

Fazm is an open-source macOS AI agent, available on GitHub.
