The 1M Context Trap: Why More Context Makes Claude Lazier
The 1M token context window sounds like a dream. Fit your entire codebase in context. No more worrying about what to include. Just dump everything and let the model figure it out.
In practice, it is a trap. The more context you provide, the lazier the model gets.
The Research Is Clear
Chroma published a study in 2025 testing 18 frontier models - including GPT-4.1, Claude Opus 4, and Gemini 2.5. Every single model exhibited measurable quality degradation as context length increased. This is not a Claude-specific problem. It is a property of transformer attention architecture.
The mechanism: transformer attention is quadratic. 100,000 tokens means 10 billion pairwise relationships the model must manage. As context grows, the model's attention gets distributed more thinly across more tokens. Important details buried in the middle get less attention weight than the same details in a focused context.
This is called "context rot" - the measurable degradation in LLM output quality as input context length increases. It is not a capability gap that future training will solve. It is architectural.
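To make the scaling concrete, here is the arithmetic as a quick sketch (pure back-of-the-envelope counting; it says nothing about any specific model's implementation):

# Pairwise attention relationships grow quadratically with context length
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:,} tokens -> {n * n:,} pairwise relationships")
# 10,000 tokens -> 100,000,000 pairwise relationships
# 100,000 tokens -> 10,000,000,000 (the 10 billion above)
# 1,000,000 tokens -> 1,000,000,000,000 (a trillion)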
The Lost-in-the-Middle Effect
Stanford/TACL research from 2024 found that models attend well to the beginning and end of their context window but poorly to the middle. Information placed at the 50% mark of a 100,000-token context gets 30%+ less accurate recall than the same information placed at the 10% mark.
This is a fundamental problem for "dump the whole codebase" workflows. The constraints you wrote in CLAUDE.md at the top of context get attention. The critical function buried in file 47 of a 200-file codebase does not. The model sees it - the token count says it is there - but attention weight is thin and recall is unreliable.
Additional 2025 research found that recall of the "gold" context (the specific relevant information you actually need) consistently degrades when it is surrounded by large volumes of irrelevant material. More context does not merely add noise - it dilutes the good information too.
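You can observe the effect yourself with a simple probe. This is a minimal sketch, assuming a hypothetical ask() helper that wraps whatever model API you use: it buries one fact at different relative positions in filler text and checks recall.

# Lost-in-the-middle probe: bury one fact at varying depths and check recall
def ask(prompt: str) -> str:
    """Hypothetical helper - replace with a call to your model API."""
    raise NotImplementedError

FACT = "The deploy key rotates every 45 days."
FILLER = "The quick brown fox jumps over the lazy dog. " * 10_000  # roughly 100k tokens of noise

def probe(position: float) -> str:
    cut = int(len(FILLER) * position)
    haystack = FILLER[:cut] + FACT + " " + FILLER[cut:]
    return ask(haystack + "\n\nHow often does the deploy key rotate?")

for pos in (0.1, 0.5, 0.9):  # near the start, the middle, the end
    print(pos, probe(pos))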
The Developer Arc
Developers who discover the 1M context window tend to go through a predictable sequence:
- Excitement - "I can fit everything in context!"
- Experimentation - dump the whole codebase and ask for changes
- Disappointment - output is vague, misses edge cases, ignores stated constraints
- Retreat - go back to curating context carefully
The disappointment phase is predictable. The model is not "less smart" with more context - its attention is diluted. The signal-to-noise ratio of the context drops, and output quality tracks that ratio.
A focused 30,000-token context with exactly the right files consistently outperforms a 300,000-token context with everything.
Why the Sweet Spot Is 20k-100k Tokens
This range is not arbitrary. It reflects two real constraints:
Below 20,000 tokens: You risk missing genuinely relevant context. The model cannot make good architectural decisions if it cannot see the existing architecture. It cannot catch naming conflicts if it cannot see existing names.
Above 100,000 tokens: Context rot effects become significant. The lost-in-the-middle effect means constraints and details stated in the middle of context have unreliable recall. The ratio of relevant to irrelevant tokens degrades past the point of diminishing returns.
Between 20,000 and 100,000 tokens of well-curated context, you get the full picture without dilution.
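A rough budget check helps here. The sketch below uses the common ~4 characters per token heuristic (an approximation, not an exact count) to verify a curated file set lands in the band before a session starts; the file list is an example, not a prescription.

# Estimate the token footprint of a curated context before a session
from pathlib import Path

FILES = ["CLAUDE.md", "src/auth/jwt.ts", "src/auth/middleware.ts",
         "src/models/user.ts"]  # example curated set

total_chars = sum(len(Path(f).read_text()) for f in FILES)
est_tokens = total_chars // 4  # rough heuristic: ~4 chars per token
print(f"~{est_tokens:,} estimated tokens")
if not 20_000 <= est_tokens <= 100_000:
    print("Outside the 20k-100k band - re-curate before starting")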
Practical Strategies That Work
Scope agents to specific files, not whole codebases.
Instead of providing all 200 files, identify the 5-10 files the task actually touches. For a bug fix in the authentication layer, include the auth module, the user model, the relevant tests, and the top-level CLAUDE.md. Leave out the payment module, the UI components, and the build configuration.
# Give Claude exactly what it needs for an auth bug fix by naming the
# relevant files in the prompt (Claude Code reads them with its own tools)
claude --print "Fix the JWT expiration bug. Relevant files: \
src/auth/jwt.ts, src/auth/middleware.ts, src/models/user.ts, \
tests/auth/*.test.ts. Follow the constraints in CLAUDE.md."
Put constraints at the top, where they get attention.
The beginning of context gets the most attention weight (so does the end - see the lost-in-the-middle effect above). Your CLAUDE.md, your style rules, your constraints - these should come first, not buried after 50,000 tokens of codebase. Front-loading important constraints improves compliance.
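A minimal sketch of what that ordering looks like when assembling a prompt by hand - the paths and the task are examples, not a prescribed layout:

# Assemble a prompt with constraints in the high-attention region at the top
from pathlib import Path

constraints = Path("CLAUDE.md").read_text()  # style rules and constraints first
code = "\n\n".join(Path(p).read_text()
                   for p in ["src/auth/jwt.ts", "src/auth/middleware.ts"])
task = "Fix the JWT expiration bug."

# Constraints lead, code fills the middle, and the task is restated at the
# end - the other high-attention region per the lost-in-the-middle research.
prompt = f"{constraints}\n\n{code}\n\n{task}"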
Let the agent pull context on demand.
Instead of pre-loading everything that might be relevant, give the agent tools to search and read files. grep and file reading let the agent find what it needs based on the actual task. This keeps the baseline context small and fills in specifics as they become relevant.
# Agent workflow: start small, expand based on need
initial_context = ["CLAUDE.md", "README.md", "package.json"]
# Agent uses tools to expand context based on what it actually needs:
# - grep for relevant functions
# - read specific files the task touches
# - search for test patterns to understand expected behavior
Split large tasks into smaller ones.
A task that touches 8 modules should probably be 8 separate agent sessions, each with a focused context for its module. The sessions can share state through git commits or a shared notes file, without each one needing access to all 8 modules' worth of context simultaneously.
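As a sketch of what that orchestration can look like - the module list, the notes-file convention, and the prompt wording are all assumptions, not a fixed workflow:

# One focused session per module, sharing state through a notes file
import subprocess

MODULES = ["auth", "billing", "search"]  # hypothetical slice of a larger task

for module in MODULES:
    prompt = (
        f"Update error handling in src/{module}/. "
        "Read NOTES.md for decisions from earlier sessions, append yours, "
        "and commit when done."
    )
    subprocess.run(["claude", "--print", prompt], check=True)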
The Counterintuitive Principle
The best context strategy is not "more" - it is "less but better."
Every token of context you add is an investment. Some of those tokens pay off - they contain information the model needs to do good work. Most of them do not - they contain information that is tangentially related but not actually needed for the specific task.
The 1M context window is a capability, not a recommendation. It exists for cases where you genuinely need that much context - analyzing an entire codebase for a security audit, synthesizing a large body of research. For routine development tasks, it is a trap that encourages lazy context curation and produces diluted output.
Curate aggressively. The model will thank you with better results.
Fazm is an open source macOS AI agent, available on GitHub.