MCP Tool Responses Are the Biggest Context Hog - How to Compress Them
You are halfway through a complex task with your AI agent and suddenly hit the context limit. You check the token usage and discover that 80% of your context window is filled with MCP tool responses you never even read.
This is the dirty secret of MCP-based workflows. Every tool call returns a full response that stays in the conversation context. A single accessibility tree traversal can return 50,000+ tokens of nested UI element descriptions. Three traversals and you have burned through most of a 200k context window on raw data.
Where the Tokens Go
The biggest offenders are tools that return structured data about the current state of something - accessibility trees, DOM snapshots, file listings, API responses with nested objects. Each one is useful for the agent to process, but the raw data does not need to persist in context after the agent has extracted what it needs.
A macOS accessibility tree for a moderately complex app window can easily be 20,000-50,000 tokens. Browser DOM snapshots are even worse. And these grow linearly with each interaction - every click triggers a new snapshot.
Compression Strategies
The most effective approach is filtering at the source. Instead of returning the entire accessibility tree, return only the elements that match a relevance filter. If the agent is looking for a button to click, it does not need every static text label on the screen.
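A minimal sketch of source-side filtering, assuming a tree of dicts with `role`, `title`, and `ref` fields (the field and role names are illustrative, not any particular MCP server's schema):

```python
# Keep only actionable elements when returning an accessibility tree
# from an MCP tool, flattening away decorative containers and labels.
ACTIONABLE_ROLES = {"button", "textfield", "checkbox", "link", "menuitem"}

def filter_tree(node, keep_roles=ACTIONABLE_ROLES):
    """Recursively keep nodes whose role is actionable; non-actionable
    nodes are dropped and their actionable descendants promoted."""
    kept = []
    for child in node.get("children", []):
        kept.extend(filter_tree(child, keep_roles))
    if node.get("role") in keep_roles:
        slim = {k: node[k] for k in ("role", "title", "ref") if k in node}
        slim["children"] = kept
        return [slim]
    return kept  # drop this node, surface its actionable descendants

tree = {
    "role": "window", "title": "Settings", "children": [
        {"role": "statictext", "title": "Appearance"},
        {"role": "button", "title": "Save", "ref": "e42"},
    ],
}
print(filter_tree(tree))  # only the Save button survives
```

The static text label never enters the conversation; a tree that was thousands of tokens shrinks to the handful of elements the agent can actually act on.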
Another approach is response compaction - summarizing MCP responses into structured data before they enter the conversation context. Instead of the full tree, store a list of actionable elements with their references.
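One way to sketch that compaction step, again with illustrative field names:

```python
# Compact a verbose element dump into a short structured summary
# before it enters the conversation context.
def compact_response(elements, max_items=50):
    """Reduce a full element list to (ref, role, label) records,
    keeping only actionable roles and capping the count."""
    actionable = [
        {"ref": e["ref"], "role": e["role"], "label": e.get("title", "")}
        for e in elements
        if e.get("role") in {"button", "textfield", "checkbox", "link"}
    ][:max_items]
    return {"actionable": actionable, "total_seen": len(elements)}

elements = [
    {"ref": "e1", "role": "statictext", "title": "Welcome back"},
    {"ref": "e2", "role": "button", "title": "Submit"},
    {"ref": "e3", "role": "textfield", "title": "Email"},
]
summary = compact_response(elements)
print(summary["actionable"])  # two records: the button and the textfield
```

The `total_seen` count lets the agent know the summary is lossy, so it can re-query with a broader filter if the element it needs is missing.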
You can also use output modes that skip embedding images and large binary data directly in the response. Saving screenshots to files instead of base64-encoding them into the conversation saves hundreds of thousands of tokens per session.
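A sketch of the file-path output mode, with a hypothetical `screenshot_response` helper to show the trade-off:

```python
import base64
import os
import tempfile

def screenshot_response(png_bytes, inline=False):
    """Return a screenshot either inline (base64 in the response body)
    or as a file path reference the agent can open on demand."""
    if inline:
        # Base64 inflates size by ~4/3, and every byte of it
        # lands in the context window.
        return {"type": "image", "data": base64.b64encode(png_bytes).decode()}
    fd, path = tempfile.mkstemp(suffix=".png")
    with os.fdopen(fd, "wb") as f:
        f.write(png_bytes)
    # A file path is a few dozen tokens instead of tens of thousands.
    return {"type": "file", "path": path}

resp = screenshot_response(b"\x89PNG...fake image bytes...")
print(resp["type"], resp["path"])
```

The agent only pays the token cost of an image if it explicitly decides to read the file back in.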
The Real Fix
The underlying problem is that MCP was designed for single-turn interactions, not long multi-step workflows. A proper solution would let the agent query past tool results without keeping them in the active context window. Until then, aggressive compression is the best defense.
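What that could look like in practice: full results live in an out-of-context store, the conversation holds only a short handle, and the agent queries the store on demand. A toy sketch (the `ResultStore` class and its API are assumptions, not part of any current MCP spec):

```python
import uuid

class ResultStore:
    """Holds full tool results outside the context window;
    only short handles enter the conversation."""

    def __init__(self):
        self._results = {}

    def put(self, result):
        handle = uuid.uuid4().hex[:8]
        self._results[handle] = result
        return handle  # the agent sees this, not the raw payload

    def query(self, handle, key):
        """Retrieve a single field of a stored result on demand."""
        return self._results[handle].get(key)

store = ResultStore()
h = store.put({"elements": ["...50k tokens of tree..."], "url": "https://example.com"})
print(store.query(h, "url"))  # "https://example.com"
```

The 50k-token payload is stored once and paged in piecemeal, instead of riding along with every subsequent turn.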
Fazm is an open source macOS AI agent, available on GitHub.