Tiered Memory for Desktop Agents - Plain Text First, Vector Search for Long-Term
If agentic AIs store their memory as embeddings, does that memory work like RAG? This question comes up a lot, and the answer from building desktop agents is simpler than most people expect.
The Two-Tier Approach
In practice, agent memory works best as two separate systems rather than one unified approach.
Tier 1 - Recent context is just plain text. The last few tasks the agent performed, the current state of what it is working on, user preferences it has observed - all stored as structured text files. No embeddings, no vector database, no semantic search. The agent reads this context directly into its prompt.
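A minimal sketch of the Tier 1 read path follows. It assumes recent context lives in a JSON Lines file; the file name `recent.jsonl`, the `kind`/`text` fields, and the helpers `append_recent` and `build_prompt` are hypothetical, chosen only for illustration. The point is that "retrieval" here is nothing more than a file read pasted into the prompt.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def append_recent(path: Path, entry: dict) -> None:
    """Append one observation as a JSON line to the plain-text recent-context file."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def build_prompt(path: Path, task: str, max_entries: int = 20) -> str:
    """Read the recent-context file directly and inline it into the prompt.

    No embeddings and no search: the newest `max_entries` lines go in as-is.
    """
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    recent = [json.loads(line) for line in lines[max(len(lines) - max_entries, 0):]]
    context = "\n".join(f"- [{e['kind']}] {e['text']}" for e in recent)
    return f"Recent context:\n{context}\n\nCurrent task: {task}"


if __name__ == "__main__":
    with TemporaryDirectory() as d:
        p = Path(d) / "recent.jsonl"  # hypothetical file name
        append_recent(p, {"kind": "preference", "text": "User prefers dark mode"})
        append_recent(p, {"kind": "task", "text": "Exported report to PDF"})
        print(build_prompt(p, "Email the report"))
```

Capping the read at the newest entries keeps the prompt bounded even if the file is not trimmed on schedule.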
Tier 2 - Long-term recall uses vector search. When the agent needs to remember something from weeks or months ago - a specific workflow it performed, a user preference it learned, a tricky UI pattern it figured out - it queries a vector store with the current context and retrieves relevant memories.
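To show what "queries a vector store with the current context" means mechanically, here is a small in-memory sketch. It is not a real vector database and not a real embedding model: `embed` is a hashed bag-of-words stand-in (an assumption, so the example runs without dependencies), and `VectorStore` with its `add`/`search` methods is a hypothetical name. The retrieval logic, cosine similarity plus top-k selection, is the part that carries over.

```python
import heapq
import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 1024  # size of the toy embedding space (assumption)


def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized.

    A real agent would call an actual embedding model here.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class VectorStore:
    """Minimal in-memory vector store (hypothetical); a real agent would use a vector DB."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 5) -> list[tuple[float, str]]:
        """Return the k most similar memories as (cosine score, text), best first."""
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = ((sum(a * b for a, b in zip(q, v)), t) for v, t in zip(self.vectors, self.texts))
        return heapq.nlargest(k, scored)


if __name__ == "__main__":
    store = VectorStore()
    store.add("Exported quarterly sales report to PDF from Numbers")
    store.add("User prefers dark mode in Safari")
    store.add("Settings toggle hidden under Advanced tab")
    for score, text in store.search("export the sales report as PDF", k=2):
        print(f"{score:.2f}  {text}")
```

`heapq.nlargest` avoids fully sorting the scores, which matters a little once the store holds thousands of memories and only 5-10 are wanted.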
Why Not Use RAG for Everything?
Because RAG adds latency and complexity to information the agent accesses constantly. The agent checks recent context on every single action, and turning that check into a vector lookup (embed the query, then search the store) instead of a file read adds roughly 200-500ms per action for no benefit. The recent context is small enough to fit in the prompt directly.
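That per-action overhead compounds over a task. Taking a hypothetical task of 50 actions (an illustrative number, not a measurement) at the 200-500ms range:

```latex
\text{overhead} = N_{\text{actions}} \times t_{\text{lookup}}
               = 50 \times (0.2 \text{ to } 0.5\,\text{s})
               = 10 \text{ to } 25\,\text{s}
```

Reading a small text file, by contrast, is effectively free at this scale.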
Why Not Use Plain Text for Everything?
Because plain text does not scale. After a few months of operation, an agent accumulates thousands of memories. Searching through all of them as raw text is slow and hits context window limits. Vector search lets the agent find the relevant 5-10 memories out of thousands.
The Practical Split
Keep the last 24 hours of activity as plain text. Everything older gets embedded and stored in a vector database. When the agent needs historical context, it searches semantically. This gives you fast access to recent information and smart retrieval for everything else.
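The split can be sketched as a single object with two tiers and a compaction step that moves anything past the 24-hour window into the embedded archive. This is a sketch under stated assumptions, not Fazm's implementation: `TieredMemory`, `compact`, `recent_context`, and `recall` are hypothetical names, a plain list stands in for the vector database, `embed` is the same kind of toy hashed bag-of-words placeholder rather than a real model, and compaction is triggered manually rather than on a schedule.

```python
import math
import re
import zlib
from dataclasses import dataclass, field
from datetime import datetime, timedelta

RECENT_WINDOW = timedelta(hours=24)  # cutoff from the text; tune per agent


def embed(text: str, dim: int = 512) -> list[float]:
    """Placeholder embedding (hashed bag-of-words, unit length); use a real model in practice."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Memory:
    at: datetime
    text: str


@dataclass
class TieredMemory:
    recent: list[Memory] = field(default_factory=list)  # tier 1: plain text
    archive: list[tuple[list[float], Memory]] = field(default_factory=list)  # tier 2: embedded

    def remember(self, text: str, at: datetime) -> None:
        self.recent.append(Memory(at, text))

    def compact(self, now: datetime) -> int:
        """Embed every entry older than RECENT_WINDOW and move it to the archive."""
        cutoff = now - RECENT_WINDOW
        old = [m for m in self.recent if m.at < cutoff]
        self.recent = [m for m in self.recent if m.at >= cutoff]
        self.archive.extend((embed(m.text), m) for m in old)
        return len(old)

    def recent_context(self) -> str:
        """Tier 1 read: plain text straight into the prompt, no lookup."""
        return "\n".join(f"{m.at:%Y-%m-%d %H:%M} {m.text}" for m in self.recent)

    def recall(self, query: str, k: int = 5) -> list[str]:
        """Tier 2 read: rank archived memories by cosine similarity to the query."""
        q = embed(query)
        ranked = sorted(self.archive, key=lambda item: -sum(a * b for a, b in zip(q, item[0])))
        return [m.text for _, m in ranked[:k]]


if __name__ == "__main__":
    now = datetime(2025, 3, 1, 12, 0)
    mem = TieredMemory()
    mem.remember("Renamed screenshots folder in Finder", now - timedelta(days=10))
    mem.remember("Filed expense report in the web portal", now - timedelta(days=3))
    mem.remember("Drafted reply to Alice about Friday meeting", now - timedelta(hours=2))
    mem.compact(now)
    print(mem.recent_context())
    print(mem.recall("expense report", k=1))
```

Embedding happens once, at compaction time, so the per-action path only ever touches the plain-text tier; the embedding cost is paid only when the agent actually reaches for history.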
Fazm is an open source macOS AI agent, available on GitHub.