MCP Tool Responses Are the Biggest Context Hog - How to Compress Them
You are halfway through a complex task with your AI agent and suddenly hit the context limit. You check the token usage and discover that 80% of your context window is filled with MCP tool responses you never even read.
This is the dirty secret of MCP-based workflows. Every tool call returns a full response that stays in the conversation context. A single accessibility tree traversal can return 50,000+ tokens of nested UI element descriptions. Three traversals and you have burned through most of a 200k context window on raw data.
Where the Tokens Go
The biggest offenders are tools that return structured data about the current state of something - accessibility trees, DOM snapshots, file listings, API responses with nested objects. Each one is useful for the agent to process, but the raw data does not need to persist in context after the agent has extracted what it needs.
A macOS accessibility tree for a moderately complex app window can easily be 20,000-50,000 tokens. Browser DOM snapshots are even worse. And these grow linearly with each interaction - every click triggers a new snapshot.
Compression Strategies
The most effective approach is filtering at the source. Instead of returning the entire accessibility tree, return only the elements that match a relevance filter. If the agent is looking for a button to click, it does not need every static text label on the screen.
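A minimal sketch of source-side filtering, assuming a tree of dicts with `role`, `title`, and `ref` fields (the field and role names are illustrative, not any particular MCP server's schema):

```python
# Keep only actionable elements when returning an accessibility tree
# from an MCP tool, flattening away decorative containers and labels.
ACTIONABLE_ROLES = {"button", "textfield", "checkbox", "link", "menuitem"}

def filter_tree(node, keep_roles=ACTIONABLE_ROLES):
    """Recursively keep nodes whose role is actionable; non-actionable
    nodes are dropped and their actionable descendants promoted."""
    kept = []
    for child in node.get("children", []):
        kept.extend(filter_tree(child, keep_roles))
    if node.get("role") in keep_roles:
        slim = {k: node[k] for k in ("role", "title", "ref") if k in node}
        slim["children"] = kept
        return [slim]
    return kept  # drop this node, surface its actionable descendants

tree = {
    "role": "window", "title": "Settings", "children": [
        {"role": "statictext", "title": "Appearance"},
        {"role": "button", "title": "Save", "ref": "e42"},
    ],
}
print(filter_tree(tree))  # only the Save button survives
```

The static text label never enters the conversation; a tree that was thousands of tokens shrinks to the handful of elements the agent can actually act on.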
Another approach is response compaction - summarizing MCP responses into structured data before they enter the conversation context. Instead of the full tree, store a list of actionable elements with their references.
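One way to sketch that compaction step, again with illustrative field names:

```python
# Compact a verbose element dump into a short structured summary
# before it enters the conversation context.
def compact_response(elements, max_items=50):
    """Reduce a full element list to (ref, role, label) records,
    keeping only actionable roles and capping the count."""
    actionable = [
        {"ref": e["ref"], "role": e["role"], "label": e.get("title", "")}
        for e in elements
        if e.get("role") in {"button", "textfield", "checkbox", "link"}
    ][:max_items]
    return {"actionable": actionable, "total_seen": len(elements)}

elements = [
    {"ref": "e1", "role": "statictext", "title": "Welcome back"},
    {"ref": "e2", "role": "button", "title": "Submit"},
    {"ref": "e3", "role": "textfield", "title": "Email"},
]
summary = compact_response(elements)
print(summary["actionable"])  # two records: the button and the textfield
```

The `total_seen` count lets the agent know the summary is lossy, so it can re-query with a broader filter if the element it needs is missing.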
You can also use output modes that skip embedding images and large binary data directly in the response. Saving screenshots to files instead of base64-encoding them into the conversation saves hundreds of thousands of tokens per session.
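A sketch of the file-path output mode, with a hypothetical `screenshot_response` helper to show the trade-off:

```python
import base64
import os
import tempfile

def screenshot_response(png_bytes, inline=False):
    """Return a screenshot either inline (base64 in the response body)
    or as a file path reference the agent can open on demand."""
    if inline:
        # Base64 inflates size by ~4/3, and every byte of it
        # lands in the context window.
        return {"type": "image", "data": base64.b64encode(png_bytes).decode()}
    fd, path = tempfile.mkstemp(suffix=".png")
    with os.fdopen(fd, "wb") as f:
        f.write(png_bytes)
    # A file path is a few dozen tokens instead of tens of thousands.
    return {"type": "file", "path": path}

resp = screenshot_response(b"\x89PNG...fake image bytes...")
print(resp["type"], resp["path"])
```

The agent only pays the token cost of an image if it explicitly decides to read the file back in.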
The Real Fix
The underlying problem is that MCP was designed for single-turn interactions, not long multi-step workflows. A proper solution would let the agent query past tool results without keeping them in the active context window. Until then, aggressive compression is the best defense.
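What that could look like in practice: full results live in an out-of-context store, the conversation holds only a short handle, and the agent queries the store on demand. A toy sketch (the `ResultStore` class and its API are assumptions, not part of any current MCP spec):

```python
import uuid

class ResultStore:
    """Holds full tool results outside the context window;
    only short handles enter the conversation."""

    def __init__(self):
        self._results = {}

    def put(self, result):
        handle = uuid.uuid4().hex[:8]
        self._results[handle] = result
        return handle  # the agent sees this, not the raw payload

    def query(self, handle, key):
        """Retrieve a single field of a stored result on demand."""
        return self._results[handle].get(key)

store = ResultStore()
h = store.put({"elements": ["...50k tokens of tree..."], "url": "https://example.com"})
print(store.query(h, "url"))  # "https://example.com"
```

The 50k-token payload is stored once and paged in piecemeal, instead of riding along with every subsequent turn.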
Fazm is an open source macOS AI agent, available on GitHub.