Tiered Memory for Desktop Agents - Plain Text First, Vector Search for Long-Term

Matthew Diakonov · March 18, 2026 · 2 min read

memory rag embeddings desktop-agent vector-search ai_agents

If agentic AIs store memory as embeddings, how is that different from RAG? This question comes up a lot, and from building desktop agents, the answer is simpler than most people expect.

The Two-Tier Approach

In practice, agent memory works best as two separate systems rather than one unified approach.

Tier 1 - Recent context is just plain text. The last few tasks the agent performed, the current state of what it is working on, user preferences it has observed - all stored as structured text files. No embeddings, no vector database, no semantic search. The agent reads this context directly into its prompt.
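A minimal sketch of the plain-text tier, assuming recent context is kept as a JSON Lines file (one entry per line). The file name, the `kind`/`text` fields, and the helper names are hypothetical; the post only says "structured text files." Loading context is an ordinary file read, and the result goes into the prompt as-is.

```python
import json
import tempfile
from pathlib import Path

# Tier 1: recent context as plain structured text (JSON Lines here).
# Hypothetical schema: each entry has a "kind" and a "text" field.

def append_recent(path: Path, kind: str, text: str) -> None:
    """Append one observation as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"kind": kind, "text": text}) + "\n")

def load_recent(path: Path, limit: int = 20) -> list[dict]:
    """Return the last `limit` entries with a plain file read: no embeddings."""
    if not path.exists():
        return []
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return [json.loads(ln) for ln in lines[-limit:]]

def render_context(entries: list[dict]) -> str:
    """Format entries as text that is pasted straight into the prompt."""
    return "\n".join(f"- [{e['kind']}] {e['text']}" for e in entries)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "recent_context.jsonl"
        append_recent(path, "task", "Exported Q1 invoices to PDF")
        append_recent(path, "preference", "User wants reports saved to ~/Documents/Reports")
        print(render_context(load_recent(path)))
```

The `limit` keeps the prompt bounded without any search: the newest entries are simply the last lines of the file.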

Tier 2 - Long-term recall uses vector search. When the agent needs to remember something from weeks or months ago - a specific workflow it performed, a user preference it learned, a tricky UI pattern it figured out - it queries a vector store with the current context and retrieves relevant memories.
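For the long-term tier, here is a self-contained sketch of the query path: embed each memory when it is written, then rank stored memories by cosine similarity against the current context. `toy_embed` is a stand-in (a hashed bag-of-words), not a real embedding model, and `VectorMemory` is a hypothetical in-memory substitute for a vector database; in practice you would swap in an embedding model and vector store of your choice.

```python
import heapq
import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 256  # toy size; real embedding models define their own dimensionality

def toy_embed(text: str) -> list[float]:
    """Placeholder for a real embedding model: hashed bag-of-words, unit length."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

@dataclass
class VectorMemory:
    """In-memory stand-in for a vector database."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, text: str) -> None:
        """Embed a memory once, at write time."""
        self.texts.append(text)
        self.vectors.append(toy_embed(text))

    def search(self, query: str, k: int = 5) -> list[tuple[float, str]]:
        """Return the k memories most similar to the query (cosine similarity)."""
        q = toy_embed(query)
        # All vectors are unit length, so a dot product equals cosine similarity.
        scored = (
            (sum(a * b for a, b in zip(q, v)), text)
            for v, text in zip(self.vectors, self.texts)
        )
        return heapq.nlargest(k, scored)

if __name__ == "__main__":
    memory = VectorMemory()
    memory.add("Exported invoices from the Billing app as PDF")
    memory.add("User prefers dark mode in Safari")
    memory.add("The Save dialog in Preview hides the format menu")
    for score, text in memory.search("export invoices pdf", k=2):
        print(f"{score:.2f}  {text}")
```

Because the toy embedding only rewards shared words, it misses paraphrases a real embedding model would catch; the shape of the lookup (embed the query, score, keep the top k) stays the same.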

Why Not Use RAG for Everything?

Because RAG adds latency and complexity for information the agent accesses constantly. Your agent checks recent context on every single action. Making that a vector lookup instead of a file read adds 200-500ms per action for no benefit. The recent context is small enough to fit in the prompt directly.
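To put the per-action cost in perspective, here is the arithmetic using the 200-500ms range above. The 50-action task length is an assumption for illustration; a real desktop task may take more or fewer steps.

```python
def added_latency_s(actions: int, per_lookup_ms: float) -> float:
    """Extra wall-clock seconds if every action performs one vector lookup."""
    return actions * per_lookup_ms / 1000.0

ACTIONS_PER_TASK = 50  # hypothetical task length

for per_lookup_ms in (200, 500):  # range quoted in the post
    extra = added_latency_s(ACTIONS_PER_TASK, per_lookup_ms)
    print(f"{per_lookup_ms} ms per lookup -> {extra:.0f} s extra per task")
```

At 50 actions, that works out to 10 to 25 extra seconds per task spent retrieving context that already fits in the prompt.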

Why Not Use Plain Text for Everything?

Because plain text does not scale. After a few months of operation, an agent accumulates thousands of memories. Searching through all of them as raw text is slow and hits context window limits. Vector search lets the agent find the relevant 5-10 memories out of thousands.
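A back-of-the-envelope check of the context-window problem. Every figure here (3,000 memories, about 100 tokens each, a 128,000-token window, 10 retrieved memories) is a hypothetical chosen for illustration, not a measurement.

```python
# Hypothetical figures, for illustration only.
CONTEXT_WINDOW = 128_000   # tokens the model accepts in one prompt
TOKENS_PER_MEMORY = 100    # rough average size of one stored memory
TOTAL_MEMORIES = 3_000     # "thousands" after a few months
TOP_K = 10                 # upper end of the 5-10 retrieved memories

def prompt_cost(num_memories: int, tokens_per_memory: int = TOKENS_PER_MEMORY) -> int:
    """Tokens needed to paste num_memories memories into the prompt."""
    return num_memories * tokens_per_memory

all_raw = prompt_cost(TOTAL_MEMORIES)   # 300,000 tokens
retrieved = prompt_cost(TOP_K)          # 1,000 tokens
print(all_raw <= CONTEXT_WINDOW, retrieved <= CONTEXT_WINDOW)  # False True
```

Under these assumptions the full history overflows the window more than twice over, while the retrieved subset uses under 1% of it.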

The Practical Split

Keep the last 24 hours of activity as plain text. Everything older gets embedded and stored in a vector database. When the agent needs historical context, it searches semantically. This gives you fast access to recent information and smart retrieval for everything else.
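The 24-hour split can be sketched as a partition by timestamp. `Memory` and `split_by_age` are hypothetical names; the older partition is what a periodic job would hand to the embedding model and vector store (not shown here).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RECENT_WINDOW = timedelta(hours=24)  # cutoff suggested in the post

@dataclass
class Memory:
    text: str
    created_at: datetime  # timezone-aware timestamp

def split_by_age(
    memories: list[Memory], now: datetime, window: timedelta = RECENT_WINDOW
) -> tuple[list[Memory], list[Memory]]:
    """Partition into (keep as plain text, move to vector store)."""
    cutoff = now - window
    recent = [m for m in memories if m.created_at >= cutoff]
    older = [m for m in memories if m.created_at < cutoff]
    return recent, older

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    memories = [
        Memory("Renamed screenshots in Finder", now - timedelta(hours=2)),
        Memory("Learned where Notes keeps the export option", now - timedelta(days=9)),
    ]
    recent, older = split_by_age(memories, now)
    print("plain text:", [m.text for m in recent])
    print("to embed:", [m.text for m in older])
```

Running the split on a schedule keeps the plain-text tier small, since anything past the cutoff moves out of the prompt and into the searchable store.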

This post was inspired by a discussion on r/AI_Agents by u/jinnyjuice.

Fazm is an open source macOS AI agent, available on GitHub.

More on This Topic

Related Posts

Is RAG Dead? Bigger Context Windows Shift the Use Cases

Desktop Agents Can Control Apps but Lack the WHY - Cross-Channel Context Matters

Embeddings vs Tokens - How AI Agent Memory Actually Works
