Tokens Used Loading MCP Tools - Measuring and Reducing the Overhead
Every time your AI coding tool starts a conversation, it loads tool schemas into the context window. With MCP servers, those schemas add up fast. 31 tools can eat 3,000 to 5,000 tokens before a single user message is processed.
That is real money and real context window space consumed by tool definitions your agent might never use in that conversation.
Measuring the Actual Cost
You can measure tool token overhead by comparing the token count of an empty conversation with and without MCP servers enabled. The difference is your tool loading cost. In Cursor with three MCP servers providing 31 tools, the overhead was consistently 3,200 to 4,800 tokens per conversation.
For a developer running 50 conversations a day, that is 160,000 to 240,000 tokens per day spent on tool schemas alone. At current API pricing, that adds up.
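The measurement itself is simple arithmetic once you have the two token counts. A minimal sketch, assuming you can read prompt token counts from your tool's API usage stats (all numbers below are illustrative placeholders, not real measurements):

```python
# Token counts read from an otherwise-empty conversation's usage stats.
# These values are illustrative placeholders, not real measurements.
baseline_tokens = 1_100       # system prompt only, MCP servers disabled
with_mcp_tokens = 5_300       # same setup with 31 MCP tools registered
conversations_per_day = 50

tool_overhead = with_mcp_tokens - baseline_tokens   # cost of tool schemas
daily_overhead = tool_overhead * conversations_per_day

print(f"{tool_overhead} tokens per conversation, "
      f"{daily_overhead:,} tokens per day on schemas alone")
```

Run the comparison a few times: tool schemas are injected deterministically, so the overhead should be stable across conversations.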
Where the Tokens Go
Each tool schema includes the function name, description, parameter definitions, and type annotations. Verbose descriptions are the biggest offender. A single tool with a detailed description and five parameters can use 200 or more tokens.
The worst cases are tools with nested object parameters, long enums, or example values baked into the schema. These are helpful for the model but expensive in tokens.
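To see how much the verbosity costs, compare a verbose schema against a trimmed one. The schemas below are hypothetical, and the token count uses the common rough heuristic of about four characters per token rather than a real tokenizer:

```python
import json

def estimate_tokens(obj) -> int:
    # Rough heuristic: ~4 characters per token for JSON/English text.
    # For exact counts, use your model provider's tokenizer instead.
    return len(json.dumps(obj)) // 4

# Hypothetical verbose schema: long description, example values, and
# per-parameter explanations baked in.
verbose = {
    "name": "create_ticket",
    "description": (
        "Creates a new ticket in the issue tracker. Use this tool whenever "
        "the user asks to file, open, or report an issue. The ticket is "
        "assigned to the default queue unless an assignee is given. "
        "Example: create_ticket(title='Login fails', priority='high')."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "Short summary, e.g. 'Login fails on Safari'",
            },
            "priority": {
                "type": "string",
                "enum": ["low", "medium", "high", "urgent"],
                "description": "Urgency level; defaults to 'medium'",
            },
        },
        "required": ["title"],
    },
}

# Same tool trimmed: terse description, self-explanatory parameter names.
trimmed = {
    "name": "create_ticket",
    "description": "Create a ticket in the issue tracker.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "string",
                         "enum": ["low", "medium", "high", "urgent"]},
        },
        "required": ["title"],
    },
}

print(estimate_tokens(verbose), "vs", estimate_tokens(trimmed))
```

Multiply a saving like that across 31 tools and the trimmed set frees a meaningful slice of the context window.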
Reducing the Overhead
Several strategies work:
- Trim descriptions - Keep tool descriptions under 50 words. The model usually gets it with less.
- Lazy loading - Only register tools when the conversation topic suggests they will be needed.
- Schema compression - Remove optional parameter descriptions and rely on parameter names being self-explanatory.
- Server consolidation - Merge related MCP servers to reduce duplicate type definitions.
- Dynamic tool sets - Use an orchestrator that selects relevant tools based on the user's first message.
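The last strategy can be sketched as a simple keyword matcher over the user's first message. Tool names and keyword sets below are hypothetical; a production orchestrator might use embeddings or a cheap classifier instead:

```python
# Sketch of dynamic tool selection: register only tools whose keywords
# match the user's first message. Tool names and keywords are hypothetical.
TOOL_KEYWORDS = {
    "search_files": {"find", "search", "file", "glob"},
    "run_tests": {"test", "pytest", "failing"},
    "query_database": {"sql", "database", "query", "table"},
}

def select_tools(first_message: str,
                 always_include=("search_files",)) -> list[str]:
    """Return the tool names worth registering for this conversation."""
    words = set(first_message.lower().split())
    selected = set(always_include)
    for tool, keywords in TOOL_KEYWORDS.items():
        if words & keywords:
            selected.add(tool)
    return sorted(selected)

print(select_tools("why is this test failing"))
```

The trade-off is a possible miss when the conversation drifts to an unregistered tool, so most setups keep a small always-on core and lazily add the rest.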
The Native Advantage
Desktop agents that use native APIs like the macOS Accessibility API do not pay this token tax. The agent accesses UI elements directly through system calls rather than loading tool schemas into the context window. Fewer tools, lower overhead, more room for actual work.
Fazm is an open source macOS AI agent, available on GitHub.