The Hidden Token Cost of MCP Tools in Cursor and How to Fix It
Every MCP tool you register has a schema definition. That schema is sent with every request so the LLM knows what tools are available. The 31 tools in the Atlassian integration burn anywhere from 17,000 to over 40,000 tokens before you even ask a question. A large enterprise MCP server can burn through your entire context budget before you type a single word.
The Numbers Are Worse Than You Think
Research on MCP token costs shows that each tool definition costs 550-1,400 tokens for its name, description, JSON schema, field descriptions, enums, and system instructions. At the low end, 31 tools cost around 17,000 tokens; at the high end, over 40,000.
For comparison, Claude's default context in Cursor is 200,000 tokens. A large enterprise MCP server with 400 tools - which is not unusual for an organization that has deployed integrations for Jira, Confluence, GitHub, Slack, Salesforce, and a few internal tools - exceeds 200,000 tokens just in tool definitions. You cannot send a single message before hitting the limit.
The math nobody does upfront: if you make 100 requests per day and each one includes 4,000 tokens of tool definitions, that is 400,000 tokens per day spent describing what the LLM could do rather than what it should do.
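You can reproduce these figures in a few lines, using the per-tool range from the research cited above:
# Back-of-envelope math using the per-tool figures above
low, high = 550, 1400  # tokens per tool definition
for n in (31, 400):
    print(f"{n} tools: {n * low:,}-{n * high:,} tokens per request")
# 31 tools: 17,050-43,400 tokens per request
# 400 tools: 220,000-560,000 tokens per request
print(f"daily overhead at 100 requests x 4,000 tokens: {100 * 4000:,}")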
Why MCP Servers Tend Toward Explosion
The MCP server design pattern encourages granularity. An Atlassian integration does not give you one "manage Jira" tool - it gives you create_issue, update_issue, delete_issue, search_issues, get_issue, list_projects, add_comment, get_comment, list_comments, assign_issue, transition_issue, and so on. Each with a full parameter schema that documents every optional field.
This is the right design for reliability and discoverability. But it creates a combinatorial explosion when you stack multiple integrations. Your actual user prompt might be 200 tokens. Your tool definitions might be 20,000 tokens. The ratio of "what you want to do" to "what you could possibly do" is 1:100.
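To see where those tokens go, here is an abridged, hypothetical create_issue definition - a production schema documents far more optional fields, which is how a single tool reaches 550-1,400 tokens:
// abridged, hypothetical create_issue tool definition
{
  "name": "create_issue",
  "description": "Create a new Jira issue in the given project.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "projectKey": { "type": "string", "description": "Project key, e.g. ENG" },
      "summary": { "type": "string", "description": "One-line issue title" },
      "issueType": { "type": "string", "enum": ["Bug", "Task", "Story"] },
      "priority": { "type": "string", "enum": ["Low", "Medium", "High"] }
    },
    "required": ["projectKey", "summary", "issueType"]
  }
}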
What Happens When the LLM Has Too Many Tools
Research published in 2024 found that LLM decision-making degrades measurably when presented with more than 20-25 tools simultaneously. Accuracy drops, the model starts making incorrect tool selections, and latency increases because the model spends more inference cycles reasoning through the option space.
This is separate from the token cost problem. Even if you have unlimited tokens, drowning the model in irrelevant tool definitions hurts quality. When Claude or GPT-4 sees 400 tool definitions, it cannot reliably distinguish "the right tool for this specific task" from "a plausible-looking tool that is subtly wrong."
The 85-100x Fix: Progressive Disclosure
The most effective solution is progressive disclosure - only expose tools that are relevant to the current task rather than all tools all the time.
Production implementations report 85-100x reductions in token usage while maintaining or improving tool selection accuracy. Research on dynamic toolsets shows up to 160x token reduction compared to static toolsets, while maintaining a 100% task success rate.
There are three practical approaches:
1. Dynamic toolsets by task type
Split your MCP server into logical groups. When you are working on code, expose only development tools. When you are managing tickets, expose only Jira tools. Cursor supports multiple MCP server configurations that you can enable or disable per project.
// .cursor/mcp.json - project-specific config
{
  "mcpServers": {
    "jira-minimal": {
      "command": "npx",
      "args": ["@atlassian/mcp", "--tools", "create_issue,update_issue,search_issues"],
      "env": {}
    }
  }
}
2. CLI wrappers instead of granular tools
A single "run shell command" tool with access to the Jira CLI costs far fewer tokens than 31 individual tool definitions. If your team already uses the Jira CLI, a shell execution tool lets the LLM construct and run jira commands without loading any Atlassian-specific schemas.
The tradeoff is less structured output. The LLM has to know the CLI syntax and parse unstructured output. For most developer workflows, this is fine - developers read CLI output all the time.
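A minimal sketch of such a wrapper, using the FastMCP helper from the official MCP Python SDK; the run_cli name, the allowlist, and the 60-second timeout are illustrative choices, not a prescribed implementation:
# A single shell-execution tool instead of 31 granular schemas (sketch)
import shlex
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("cli-wrapper")

ALLOWED_BINARIES = {"jira", "gh", "git"}  # restrict what the LLM may invoke

@mcp.tool()
def run_cli(command: str) -> str:
    """Run an allowlisted CLI command and return its combined output."""
    args = shlex.split(command)
    if not args or args[0] not in ALLOWED_BINARIES:
        return f"error: command must start with one of {sorted(ALLOWED_BINARIES)}"
    result = subprocess.run(args, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr

if __name__ == "__main__":
    mcp.run()
The LLM now pays for one schema, on the order of a hundred tokens, no matter how many Jira operations the CLI supports.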
3. Semantic tool routing
Some frameworks support a two-phase approach: a lightweight "router" model first selects which tools are relevant based on the user's query, then loads only those tool definitions for the main inference call. This keeps the baseline token cost near zero and only pays the schema cost for tools that are actually needed.
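There is no standard API for this yet, so the sketch below assumes a generic chat-completion callable; the routing logic is the point, not any specific framework:
# Two-phase routing sketch. `complete` is an assumed chat-completion callable.
import json
from typing import Callable

def select_tools(complete: Callable[[str], str], query: str,
                 all_tools: list[dict], max_tools: int = 5) -> list[dict]:
    # Phase 1: the cheap router sees only names and one-line summaries
    # (a few hundred tokens), never the full JSON schemas.
    index = [{"name": t["name"], "summary": t["description"][:80]}
             for t in all_tools]
    reply = complete(
        f"Pick up to {max_tools} tool names needed for this task: {query}\n"
        f"Available: {json.dumps(index)}\n"
        f"Reply with a JSON array of names only."
    )
    chosen = set(json.loads(reply))
    # Phase 2 payload: full schemas only for the selected tools.
    return [t for t in all_tools if t["name"] in chosen]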
Measuring Your Current Token Overhead
Before optimizing, measure the actual cost. Cursor's debug view shows token counts per request. Alternatively, query the server directly with the MCP inspector:
# Count tools exposed by an MCP server, using the inspector's CLI mode
npx @modelcontextprotocol/inspector --cli your-server-command --method tools/list | python3 -c "
import sys, json
tools = json.load(sys.stdin).get('tools', [])
print(f'{len(tools)} tools registered')
"
For each tool, the rough token estimate is: len(json.dumps(tool_schema)) / 4. Sum across all tools to get your baseline overhead.
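Piping the tools/list output from the inspector command above into a short script sums the estimate:
# Sum rough token estimates across all tool schemas (chars / 4 heuristic)
import json
import sys

tools = json.load(sys.stdin).get("tools", [])
overhead = sum(len(json.dumps(t)) // 4 for t in tools)
print(f"~{overhead:,} tokens of tool definitions sent with every request")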
The Right Mental Model
Before adding an MCP server, ask: "How many of these tools will I actually use in a typical session?" If the answer is 3-5 out of 30, you are paying 10x the token cost you need to. A CLI wrapper or a focused custom tool exposing only those 5 operations will perform better and cost less.
The MCP ecosystem defaults to "expose everything." Production users should default to "expose what you need."
Fazm is an open source macOS AI agent, available on GitHub.