I Rarely Use Planning Mode Anymore - Context Windows Are Big Enough
Planning mode was essential when context windows were 8K tokens. You needed the model to think through its approach before writing code because it could not hold the entire problem in context at once. A wrong start wasted most of your context budget and left no room to course-correct.
That constraint is mostly gone. Claude Sonnet 4.6 and Opus 4.6 operate at 200K and 1M token context windows respectively. The model can see your entire codebase, hold every relevant file in context simultaneously, and self-correct mid-implementation without running out of room.
Explicit planning mode now often just adds a round trip: the model drafts a plan, you confirm it, and only then does the work begin - work the model would frequently have done better by simply doing it.
Why Planning Mode Made Sense at 8K Tokens
With small context windows, the model faced a real resource allocation problem. It had to be strategic about what to read and in what order. Planning mode forced it to:
- Map the problem before touching any files
- Identify which files to read first to understand the architecture
- Commit to an approach before using up context on implementation
- Flag ambiguities before they became expensive mistakes
This was necessary discipline. A model that started implementing before it understood the system would hit the context limit mid-way through, with half the codebase unread and a partial implementation already in place.
What Changes at 200K-1M Tokens
The Claude context window documentation puts the practical ceiling at around 167K tokens of usable context in a 200K window (compaction happens around 83% utilization). At Claude Opus 4.6's 1M token window, you can load enormous codebases wholesale.
A medium-sized production application - say 50,000 lines of TypeScript - compresses to roughly 200-300K tokens including comments and whitespace. That loads wholesale into a 1M token window, and most of it fits within 200K. The model can read every relevant file before touching anything, understand the full architecture, and plan implicitly as it works rather than explicitly before it starts.
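To put rough numbers on that, the arithmetic can be sketched as below. The tokens-per-line figure is a hypothetical heuristic for illustration, not output from a real tokenizer:

```typescript
// Back-of-envelope token estimate for a codebase. The tokens-per-line
// value is an assumed heuristic; actual counts depend on the model's
// tokenizer and the code style.
function estimateTokens(linesOfCode: number, tokensPerLine = 5): number {
  return linesOfCode * tokensPerLine;
}

// 50,000 lines at an assumed 4-6 tokens/line lands in the 200-300K range.
const low = estimateTokens(50_000, 4);  // 200,000
const high = estimateTokens(50_000, 6); // 300,000
```

Run a real tokenizer over your own repository if the estimate matters; this is only meant to show the order of magnitude.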
The planning is still happening. It is just happening inside the implementation pass rather than in a separate step. Claude can:
- Read every relevant file in the codebase before writing a line
- Hold the complete problem context simultaneously
- Try an approach, recognize it does not fit the existing patterns, and pivot - all within one context
- Catch the downstream implications of a change before committing to it
Asking it to plan explicitly first adds friction without adding quality.
The Accuracy Argument for Extended Thinking
There is a counterargument worth taking seriously. Anthropic's data on Claude 3.7 Sonnet's extended thinking mode shows GPQA Diamond accuracy improving from 78.2% to 84.8% when extended thinking is enabled. The "society of thought" research from 2025 shows that frontier reasoning models improve not just by thinking longer but by simulating internal debate between different cognitive perspectives.
This matters for genuinely hard problems. A tricky database migration involving multiple interdependent schema changes, a performance optimization across a complex distributed system, or a security fix where the wrong approach creates new vulnerabilities - these benefit from explicit deliberation before action.
The distinction is not planning mode vs. no planning. It is: does this problem require deliberate reasoning about approach before implementation, or can the model work through the approach naturally as it reads the code?
Most bug fixes, feature additions, and refactors do not require explicit pre-planning with a 200K context window. The model reads the code, understands the problem, and implements the fix. The approach emerges from understanding the codebase rather than from upfront strategizing.
When I Still Use Planning Mode
Four situations genuinely benefit from explicit planning before implementation:
Multi-session continuity. A task spanning multiple days needs a documented plan because you will start new context windows for each session. The plan becomes the shared memory that lets you resume without re-reading everything. This has nothing to do with the model's capability - it is about human coordination across time.
Multi-agent coordination. When multiple Claude instances are working in parallel on different parts of a codebase, they need an agreed-upon interface contract before they start. Otherwise they build toward incompatible assumptions. Planning mode produces the document that coordinates parallel work.
High-stakes irreversible changes. Database migrations, production deployments, changes to authentication or billing flows. When a wrong approach costs hours of rollback work, the friction of explicit planning is worth it. The extra time spent planning is cheap compared to the cost of a wrong approach that touches production data.
Novel architectures. If you are building something genuinely new - a system design you have not built before with unfamiliar dependencies - planning forces you to reason through the unknowns before they become implementation problems. The model benefits here even with a large context window because it is not pattern-matching against familiar structures.
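For the multi-agent case above, the interface contract can be as small as a shared type definition agreed on before parallel work starts. A hypothetical example, with every name invented for illustration:

```typescript
// Hypothetical contract: one agent implements the search backend while
// another builds the client. Both code against this shared shape, so
// neither depends on the other's in-progress implementation.
interface SearchRequest {
  query: string;
  limit: number; // max results to return
}

interface SearchResult {
  id: string;
  score: number; // relevance, assumed to be in [0, 1]
}

interface SearchService {
  search(req: SearchRequest): Promise<SearchResult[]>;
}

// Either side can stub the other while working in parallel.
const stubService: SearchService = {
  async search(req: SearchRequest): Promise<SearchResult[]> {
    return [{ id: "doc-1", score: 0.9 }].slice(0, req.limit);
  },
};
```

The point is not the specific types - it is that the document produced by planning mode pins down the boundary before two contexts diverge.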
The Practical Workflow
For most tasks, I just open the file and start a conversation:
"In src/api/auth.ts, the token refresh logic is not handling network errors.
The user gets logged out instead of retrying. Fix it."
Claude reads the auth file, traces the token refresh path, reads whatever it needs from the network layer, and implements the fix. It might ask one clarifying question. It does not need a plan.
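A fix of the kind described might look like the sketch below. This is a hedged illustration, not Claude's actual output; `refreshWithRetry`, `isNetworkError`, and the retry parameters are all hypothetical names:

```typescript
// Hypothetical sketch: retry token refresh on transient network errors
// instead of logging the user out. All names here are illustrative.
async function refreshWithRetry(
  refresh: () => Promise<string>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<string> {
  let lastError: unknown = new Error("token refresh failed");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await refresh();
    } catch (err) {
      lastError = err;
      // Only retry transient network failures; a genuine auth rejection
      // (e.g. a revoked token) should still propagate and log out.
      if (!isNetworkError(err)) throw err;
      // Exponential backoff between attempts.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Illustrative check: fetch surfaces network failures as TypeErrors.
function isNetworkError(err: unknown): boolean {
  return err instanceof TypeError;
}
```

Nothing in this sketch required a plan document - it is the kind of change the model produces directly from reading the surrounding code.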
For bigger changes, I describe the scope and let the model decide whether it needs to plan:
"We need to add rate limiting to all API endpoints.
The approach should be consistent and should not break existing tests."
With a 200K context, Claude reads all the endpoint definitions, checks the test structure, and typically starts implementing rather than producing a plan document. The plan is implicit in the implementation sequence.
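As a hedged illustration of the "consistent approach" that prompt asks for, a minimal per-client token-bucket limiter might look like this - names and limits are hypothetical and not tied to any framework's real API:

```typescript
// Hypothetical token-bucket rate limiter, applied uniformly across
// endpoints. Each client gets a bucket that refills at a steady rate
// up to a burst capacity.
type Bucket = { tokens: number; lastRefillMs: number };

class RateLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private capacity: number,        // max burst size
    private refillPerSecond: number, // steady-state request rate
  ) {}

  /** Returns true if the request is allowed, false if rate-limited. */
  allow(clientId: string, nowMs: number = Date.now()): boolean {
    const bucket = this.buckets.get(clientId) ?? {
      tokens: this.capacity,
      lastRefillMs: nowMs,
    };
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsedSec = (nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSec * this.refillPerSecond,
    );
    bucket.lastRefillMs = nowMs;
    if (bucket.tokens < 1) {
      this.buckets.set(clientId, bucket);
      return false;
    }
    bucket.tokens -= 1;
    this.buckets.set(clientId, bucket);
    return true;
  }
}
```

The value of the large context window is that the model can verify this pattern against every existing endpoint and test before committing to it, rather than asserting consistency in a plan document.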
For the few cases that genuinely need upfront planning - the multi-day migrations, the multi-agent coordination - I use planning mode explicitly and get value from it.
The context window is not a replacement for thinking. It is a replacement for the ceremony of thinking out loud before you are allowed to start.
Fazm is an open source macOS AI agent, available on GitHub.