Why Belief Extraction Beats Flat RAG for AI Agent Memory
Most AI agent memory systems use flat RAG - embed everything, retrieve by similarity. It works fine for the first 50 conversations. By the time you hit hundreds, retrieval quality degrades badly. The real question is what compression strategy to use, and belief extraction is the most practical answer.
Where Flat RAG Breaks Down
With flat RAG, every conversation gets chunked, embedded, and stored. As the corpus grows, similar but contradictory information piles up. The user said they prefer tabs in January and spaces in March. Both embeddings are similar. Both get retrieved. The agent has no way to resolve the conflict.
Semantic search returns what's similar, not what's current or correct. That distinction matters enormously for an agent that needs to act on its memory.
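The failure mode is easy to reproduce. Below is a minimal sketch with hand-made toy vectors standing in for real embeddings (the `store`, `retrieve`, and the specific numbers are all illustrative, not from any real system): two contradictory preference statements embed almost identically, so similarity ranking returns both and nothing in the scores indicates which is current.

```python
from math import sqrt

# Toy embeddings standing in for a real embedding model. The two
# contradictory indentation statements embed very close together.
store = {
    "2024-01: user prefers tabs for indentation":   [0.90, 0.10, 0.20],
    "2024-03: user prefers spaces for indentation": [0.88, 0.12, 0.21],
    "2024-02: user works mostly in Swift":          [0.10, 0.90, 0.30],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def retrieve(query_vec, k=2):
    # Rank all chunks by similarity to the query; return the top k.
    ranked = sorted(store, key=lambda t: cosine(query_vec, store[t]), reverse=True)
    return ranked[:k]

query = [0.89, 0.11, 0.20]  # "what indentation does the user prefer?"
top = retrieve(query)
# Both contradictory chunks rank highest; the January and March
# statements are near-ties, and similarity alone can't break the tie.
```

The scores carry no notion of recency or supersession, which is exactly the gap the layered architecture below addresses.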
Layered Compression
A better architecture compresses memory into layers - raw episodes at the bottom, extracted beliefs in the middle, and identity-level patterns at the top. Each layer is progressively more abstract and more stable.
Episodes capture what happened. Beliefs capture what the agent learned. Identity captures who the user is and what they consistently value. When making decisions, the agent checks beliefs first (fast, opinionated) and falls back to episodes only when beliefs don't cover the situation.
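The lookup order above can be sketched as a small data structure. This is a hypothetical shape, not any particular library's API: beliefs are a keyed map checked first, and raw episodes are scanned only on a miss.

```python
from dataclasses import dataclass, field

@dataclass
class LayeredMemory:
    # Hypothetical sketch of the three layers described above.
    episodes: list = field(default_factory=list)   # raw "what happened"
    beliefs: dict = field(default_factory=dict)    # topic -> learned claim
    identity: dict = field(default_factory=dict)   # stable traits/values

    def recall(self, topic):
        # Check beliefs first: fast, opinionated, no scan needed.
        if topic in self.beliefs:
            return self.beliefs[topic]
        # Fall back to raw episodes only when beliefs don't cover it;
        # return the most recent matching episode.
        hits = [e for e in self.episodes if topic in e]
        return hits[-1] if hits else None

mem = LayeredMemory()
mem.episodes.append("2024-03: user switched from tabs to spaces")
mem.beliefs["indentation"] = "user prefers spaces (since 2024-03)"

mem.recall("indentation")  # belief hit: no episode scan
mem.recall("tabs")         # no belief on file: falls back to episodes
```

The identity layer would sit above this with the same pattern, consulted for decisions that hinge on stable values rather than recent facts.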
Belief Extraction in Practice
The extraction step runs periodically - after every N conversations or on a schedule. It reads recent episodes and updates the belief set. "User prefers one-time purchases over subscriptions." "User works primarily in Swift." "User runs multiple agents in parallel."
These beliefs are small, searchable, and directly actionable. An agent can load its full belief set in a few hundred tokens. No embedding lookup needed for the hot path.
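The periodic trigger can be sketched as follows. The extractor here is a trivial string parser standing in for what would be an LLM call in practice, and the `pref:topic=value` episode format, the `Agent` class, and `EXTRACT_EVERY = 5` are all invented for illustration. The key property is that later episodes overwrite earlier beliefs on the same topic, so contradictions resolve by recency.

```python
EXTRACT_EVERY = 5  # hypothetical N: extract after every 5 conversations

def extract_beliefs(episodes):
    # Stand-in for an LLM that distills episodes into belief statements.
    # Here, episodes tagged "pref:topic=value" become beliefs; processing
    # in order means the most recent statement on a topic wins.
    beliefs = {}
    for e in episodes:
        if e.startswith("pref:"):
            topic, value = e[len("pref:"):].split("=", 1)
            beliefs[topic] = value
    return beliefs

class Agent:
    def __init__(self):
        self.episodes, self.beliefs, self.since_extract = [], {}, 0

    def record(self, episode):
        self.episodes.append(episode)
        self.since_extract += 1
        if self.since_extract >= EXTRACT_EVERY:
            # Re-read recent episodes and merge into the belief set.
            self.beliefs.update(extract_beliefs(self.episodes[-EXTRACT_EVERY:]))
            self.since_extract = 0

agent = Agent()
for e in ["pref:indentation=tabs", "asked about weather",
          "pref:indentation=spaces", "pref:editor=vim",
          "discussed build errors"]:
    agent.record(e)
# After the 5th episode, extraction runs: the March-style "spaces"
# preference overwrites the earlier "tabs" belief on the same topic.
```

This is the step that fixes the flat-RAG conflict from earlier: the belief set holds one current answer per topic instead of every historical statement.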
Desktop Agent Memory
For desktop agents specifically, this architecture maps well to structured markdown files organized by category - preferences, workflows, tool usage patterns. The agent loads relevant categories at startup rather than querying a vector store for every action.
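A minimal sketch of that startup load, assuming one markdown file per category (the directory layout, file names, and `load_memory` helper are illustrative, not Fazm's actual implementation):

```python
import os
import tempfile

def load_memory(memory_dir, categories):
    """Load only the requested category files at startup; skip any
    category that has no file on disk."""
    loaded = {}
    for name in categories:
        path = os.path.join(memory_dir, f"{name}.md")
        if os.path.exists(path):
            with open(path) as f:
                loaded[name] = f.read()
    return loaded

# Demo with a throwaway directory standing in for the agent's memory store.
memory_dir = tempfile.mkdtemp()
with open(os.path.join(memory_dir, "preferences.md"), "w") as f:
    f.write("- prefers spaces over tabs\n- prefers one-time purchases\n")
with open(os.path.join(memory_dir, "workflows.md"), "w") as f:
    f.write("- runs multiple agents in parallel\n")

mem = load_memory(memory_dir, ["preferences", "workflows", "tools"])
# "tools.md" doesn't exist, so only the two present categories load.
```

Because the hot path is a handful of file reads rather than an embedding lookup, memory is also human-inspectable: the user can open the markdown files and see exactly what the agent believes.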
Fazm is an open-source macOS AI agent; the code is available on GitHub.