Size Queen Energy - Does 1M Context Actually Work?
Every model release touts a bigger context window. 128K. 200K. 1M tokens. The marketing implies you should dump your entire codebase in and let the model figure it out. In practice, nobody does this because it does not work the way you expect.
The Myth of Full Context
A 1M token context window does not mean the model has equally good attention across all million tokens. Attention degrades. Information in the middle gets lost more than information at the beginning or end, the well-documented "lost in the middle" effect. The model might technically accept 1M tokens, but its effective reasoning capacity over that much text is significantly lower than over 10K well-chosen tokens.
Files on Demand Is the Real Pattern
The developers getting the best results from large context models are not stuffing everything in upfront. They are using the context window as working memory and loading files on demand:
- Start with a high-level map of the codebase (small)
- Load the specific files relevant to the current task
- As the task evolves, swap files in and out
- Keep only the active working set in context
This pattern uses maybe 50-100K tokens effectively rather than trying to use 1M poorly.
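The working-set pattern above can be sketched as a small data structure. This is an illustrative sketch, not any particular agent's implementation: the class name, the token budget, and the ~4-characters-per-token estimate are all assumptions for the example.

```python
from collections import OrderedDict

class WorkingSet:
    """Keep only the active files in context, evicting the
    least-recently-used ones when a token budget is exceeded.
    (Illustrative sketch; not a real agent framework's API.)"""

    def __init__(self, budget_tokens=100_000):
        self.budget = budget_tokens
        self.files = OrderedDict()  # path -> content, stalest first

    @staticmethod
    def estimate_tokens(text):
        return len(text) // 4  # rough heuristic: ~4 chars per token

    def load(self, path, content):
        # Re-inserting a path moves it to the most-recently-used slot.
        self.files.pop(path, None)
        self.files[path] = content
        self._evict()

    def drop(self, path):
        # Active forgetting: explicitly remove a file that is no
        # longer relevant to the task.
        self.files.pop(path, None)

    def total_tokens(self):
        return sum(self.estimate_tokens(c) for c in self.files.values())

    def _evict(self):
        # Swap out the stalest files until we fit the budget again.
        while self.total_tokens() > self.budget and len(self.files) > 1:
            self.files.popitem(last=False)
```

The point of the sketch: the budget stays small and fixed while files flow in and out, which is exactly "context as working memory" rather than "context as storage."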
What This Means for Agents
For AI agents, the implication is clear: smart context management beats raw context size. An agent that intelligently decides which files to read, which context to keep, and which to discard will outperform one that tries to load everything.
Key practices:
- Semantic search over the codebase rather than full inclusion
- Context budgeting - track how much context each source uses and prioritize
- Progressive loading - start broad, narrow down as the task becomes clear
- Active forgetting - explicitly remove context that is no longer relevant
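Context budgeting and progressive loading can be combined into one simple policy: fill a fixed token budget from the highest-priority sources first (codebase map, then task-relevant files, then everything else). A minimal sketch, assuming a `(priority, name, text)` source format and a rough ~4-chars-per-token estimate, none of which comes from a specific framework:

```python
def assemble_context(sources, budget_tokens):
    """Greedily fill a token budget from highest-priority sources first.
    `sources` is a list of (priority, name, text) tuples; higher priority
    wins. Sources that would blow the budget are skipped, not truncated.
    (Illustrative sketch, not a specific agent framework's API.)"""
    chosen, used = [], 0
    for priority, name, text in sorted(sources, reverse=True):
        cost = len(text) // 4  # rough heuristic: ~4 chars per token
        if used + cost <= budget_tokens:
            chosen.append((name, text))
            used += cost
    return chosen, used
```

Greedy selection is crude (a real agent would re-rank with semantic search as the task narrows), but it makes the priority explicit: the codebase map always gets in, and low-value bulk is the first thing left out.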
The 1M context window is a safety net, not a strategy. Use it to have room for the right files, not to hold all files.
Fazm is an open source macOS AI agent, available on GitHub.