Why AI Agents Re-Plan From Scratch Every Turn - The Thinking Token Problem

Fazm Team · 2 min read

The Thinking Token Problem in AI Agents

There is a fundamental issue with how AI agents maintain context across conversation turns: thinking tokens are not preserved. Only the visible output from each turn carries forward. This means that every time the conversation continues, the agent is essentially re-planning from scratch.

What Gets Lost

When an LLM processes a complex task, it uses "thinking tokens" - internal reasoning that happens before the visible response. This thinking might include evaluating multiple approaches, considering edge cases, and building a mental model of the problem. In models with extended thinking, this can be thousands of tokens of careful analysis.

But when the next turn arrives, all of that thinking is gone. The model only sees its previous visible output - the code it wrote, the explanation it gave, the plan it stated. The nuanced reasoning behind those decisions vanishes.

Why This Matters for Agents

For a desktop agent executing a multi-step workflow, this creates a specific failure mode. The agent might plan a five-step operation with careful reasoning about dependencies and ordering. It executes step one perfectly. Then on step two, it has lost the reasoning about why step one was done that way. If something unexpected happens, it cannot reason from its original analysis - it has to reconstruct the logic from the visible artifacts.

Partial Solutions

Explicit reasoning in output helps. When an agent writes out its thinking as part of the visible response, that reasoning persists into the next turn. The trade-off is context window usage - those reasoning tokens take up space that could be used for other information.
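A minimal sketch of this pattern, with hypothetical message and function names: the agent folds its reasoning into the visible assistant message, so the next turn's context contains the "why" and not just the "what".

```python
# Sketch (hypothetical names): persisting reasoning inside the visible output
# so it survives into the next turn's context window.

def build_turn(messages, reasoning, answer):
    """Append an assistant message that carries its own reasoning forward."""
    visible = f"## Reasoning\n{reasoning}\n\n## Answer\n{answer}"
    messages.append({"role": "assistant", "content": visible})
    return messages

history = [{"role": "user", "content": "Migrate the database, then restart."}]
build_turn(
    history,
    reasoning=(
        "Step order matters: the migration must finish before the restart, "
        "or the app will boot against a stale schema."
    ),
    answer="Running the migration first.",
)

# On the next turn, the model sees WHY step one came first, not just that it did.
assert "stale schema" in history[-1]["content"]
```

The cost is visible in the sketch: every turn's reasoning stays in `history` permanently, consuming context window for the rest of the conversation.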

Structured state management is another approach. Instead of relying on the conversation history to carry state, the agent writes its plan and current status to an external store that gets injected into each turn. This is what well-designed desktop agents do - maintain explicit state rather than relying on implicit conversational context.
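The external-store approach can be sketched like this, with all file paths and field names being illustrative assumptions: the agent writes its plan, progress, and reasoning notes to a JSON file, and each new turn injects that state into the prompt instead of depending on conversation history.

```python
import json
import os
import tempfile

# Sketch (hypothetical store and field names): plan and progress live in an
# external JSON file that is re-injected into every turn.

STATE_PATH = os.path.join(tempfile.gettempdir(), "agent_state.json")

def save_state(plan, completed, notes):
    """Persist the plan, completed steps, and reasoning notes."""
    with open(STATE_PATH, "w") as f:
        json.dump({"plan": plan, "completed": completed, "notes": notes}, f)

def inject_state(user_message):
    """Prepend the persisted state to the prompt for the next turn."""
    with open(STATE_PATH) as f:
        state = json.load(f)
    return (
        f"Current plan: {state['plan']}\n"
        f"Completed steps: {state['completed']}\n"
        f"Reasoning notes: {state['notes']}\n\n"
        f"User: {user_message}"
    )

save_state(
    plan=["export data", "transform", "import"],
    completed=["export data"],
    notes="Export ran first because the transform mutates source rows.",
)
prompt = inject_state("Continue the workflow.")
assert "transform mutates source rows" in prompt
```

Because the state is rebuilt and injected fresh each turn, the agent can recover its earlier reasoning even if the conversation history is truncated or summarized.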

The thinking token problem is not a bug. It is an architectural constraint that agent designers need to build around explicitly.

Fazm is an open-source macOS AI agent, available on GitHub.
