AI Agent Memory: Cross-Session Persistence and Shared Context
Every time you start a new session with an AI agent, it has no idea who you are. Your preferences, your project context, the decisions you made yesterday, the workflow you spent 30 minutes teaching it last week - all gone. The agent starts from zero, and you start from frustration. This is the memory problem, and it is the single biggest barrier between current AI agents and ones that feel genuinely useful over time. This guide covers how persistent memory systems work, what the tradeoffs are between different architectures, and how shared identifiers let agents maintain stable references across sessions and even across platforms.
1. The Memory Problem - Why Agents Forget Everything
Most AI agents are stateless by design. A large language model receives a prompt, generates a response, and discards its working state. The next request starts with a blank slate unless the application layer explicitly reconstructs context by stuffing previous conversation history into the prompt window. This works within a single session, but it breaks down across sessions for several reasons.
Context windows have finite size. Even with models supporting 100k or 200k tokens, you cannot fit weeks of interaction history into a single prompt. The cost of sending that much context on every request would be prohibitive anyway. And the information density is terrible - most of what happened three sessions ago is irrelevant to what the user needs right now.
The deeper issue is that raw conversation logs are not memory. Human memory is not a transcript of everything that happened. It is a structured, prioritized, associative system that surfaces the right information at the right time. An agent that simply replays old conversations is doing the equivalent of reading its own diary before every interaction - slow, unfocused, and missing the point.
The core tension
Agents need to remember enough to be useful without remembering so much that they become slow, expensive, or confused. The challenge is not storage - it is retrieval, relevance, and knowing what to forget.
Consider a desktop automation agent that helps you manage files, draft emails, and navigate applications. In session one, you teach it that you prefer a specific folder structure for client projects. In session two, you show it how you like email replies formatted. In session three, you ask it to set up a new client project. Without persistent memory, it has no idea about your folder preferences or email style. You have to re-teach everything, and that re-teaching cost compounds with every session.
2. Types of Agent Memory
Cognitive science offers a useful framework for categorizing the kinds of memory an agent needs. Not all memories are the same, and different types require different storage and retrieval strategies.
Episodic Memory
Episodic memory records specific events and interactions. "The user asked me to reorganize their Downloads folder on Tuesday and preferred sorting by project rather than by date." This is the most straightforward type to implement - you can store timestamped summaries of what happened. The challenge is retrieval: when is a past episode relevant to the current task? A naive approach retrieves episodes by recency, but the most relevant episode might be from three weeks ago. Semantic search over episode summaries works better, though it requires embedding infrastructure.
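A minimal sketch of episodic storage and retrieval. Word overlap with the query stands in for the embedding similarity a real semantic search would use; the class and field names are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Episode:
    summary: str          # distilled description of what happened
    timestamp: datetime

def retrieve_episodes(episodes, query, k=2):
    # Rank by word overlap with the query -- a cheap stand-in for the
    # embedding similarity a real semantic search would compute.
    query_words = set(query.lower().split())
    def score(ep):
        return len(query_words & set(ep.summary.lower().split()))
    return sorted(episodes, key=score, reverse=True)[:k]
```

Note that ranking by relevance rather than recency is exactly what lets a three-week-old episode surface when it matters.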
Semantic Memory
Semantic memory captures facts and knowledge extracted from episodes. "The user prefers dark mode in all applications. Their main project is called Atlas. They use Notion for documentation and Linear for task tracking." This is distilled, de-duplicated knowledge that does not reference specific interactions. It is more compact and more universally useful than raw episodes. Building semantic memory requires an extraction step where the agent identifies durable facts from transient interactions and updates its knowledge base accordingly.
Procedural Memory
Procedural memory encodes how to do things. "To deploy the staging environment, first run the build script in the project root, then SSH into the staging server, pull the latest build artifact, and restart the service." This is the most valuable type for automation agents because it captures workflows that the user has demonstrated or described. Procedural memories are essentially learned skills that the agent can replay with variations.
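A stored procedure can be as simple as an ordered list of step templates the agent replays with per-run parameters. A sketch of that idea; the step strings and names are illustrative, and `execute` is left abstract so nothing is actually run:

```python
def run_procedure(steps, params, execute):
    # Replay a learned workflow, filling in per-run parameters.
    # `execute` is whatever actually performs a step; passing any
    # callable keeps this sketch side-effect free.
    for template in steps:
        execute(template.format(**params))

# A learned "deploy staging" workflow, parameterized over the project
# root and host so it can be replayed with variations.
deploy_staging = [
    "run build script in {project_root}",
    "ssh {staging_host}",
    "pull latest build artifact",
    "restart service",
]
```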
Why all three types matter
An agent with only episodic memory can recall what happened but struggles to generalize. One with only semantic memory knows facts but cannot recall the context in which it learned them. One with only procedural memory can execute workflows but cannot adapt when circumstances change. Effective agents need all three, with different storage backends optimized for each.
3. Cross-Session Persistence Approaches
Making memory survive across sessions requires writing it to durable storage and having a retrieval mechanism that loads the right memories at the right time. There are several distinct approaches, each with meaningful tradeoffs.
File-Based Persistence
The simplest approach is writing memory to local files - JSON, YAML, Markdown, or plain text. A memory file might live at ~/.agent/memory.json and contain structured entries with timestamps, categories, and content. At session start, the agent reads the file and injects relevant entries into its context. This is easy to implement, easy to inspect, and easy to debug. The user can open the memory file and see exactly what the agent remembers. The downsides are limited scalability (the file grows without bound), no semantic search (you can only filter by exact fields), and no built-in deduplication.
Vector Database Persistence
Vector databases like ChromaDB, Pinecone, or Qdrant store memories as embeddings - high-dimensional vectors that capture semantic meaning. When the agent needs to recall something, it embeds the current query and retrieves the most semantically similar memories. This solves the retrieval problem elegantly: you do not need to know the exact keywords or categories, just describe what you need and the closest matches come back. The tradeoffs are added infrastructure complexity, embedding cost, and opacity - you cannot easily browse or edit vector memories by hand.
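The shape of vector retrieval can be sketched without any database at all. Here a bag-of-words count vector stands in for a real embedding model; the add/query flow is the same shape a vector store exposes, but everything here is a toy stand-in, not the API of ChromaDB or any other product:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. A real system would
    # call an embedding model here; the retrieval logic below is unchanged.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class VectorMemory:
    def __init__(self):
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def query(self, text: str, k: int = 1) -> list[str]:
        # Describe what you need; the closest stored memories come back.
        qv = embed(text)
        ranked = sorted(self._items, key=lambda item: cosine(qv, item[1]), reverse=True)
        return [stored for stored, _ in ranked[:k]]
```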
Graph-Based Persistence
Knowledge graphs represent memories as entities and relationships. The user is a node, their projects are nodes, the tools they use are nodes, and edges encode relationships like "uses," "prefers," "owns," and "depends on." This structure excels at answering relational queries: "What tools does the user use for the Atlas project?" Graph databases like Neo4j make traversal queries fast, and the structure naturally prevents duplication because entities are unique nodes. The downside is complexity - building and maintaining a knowledge graph requires entity extraction, relationship classification, and conflict resolution when new information contradicts existing edges.
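A minimal triple store captures the core idea. Real graph databases add indexing, traversal languages, and persistence; this sketch only shows why unique entity nodes prevent duplication and how a relational query becomes an edge lookup:

```python
from collections import defaultdict

class KnowledgeGraph:
    # Minimal triple store: (subject, relation) -> list of objects.
    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        # Entities are unique nodes, so re-adding an existing edge is a no-op.
        if obj not in self._edges[(subject, relation)]:
            self._edges[(subject, relation)].append(obj)

    def related(self, subject: str, relation: str) -> list[str]:
        return list(self._edges[(subject, relation)])
```

With edges like ("user", "owns", "Atlas") and ("Atlas", "uses", "Notion") in place, the question "What tools does the Atlas project use?" is a single `related("Atlas", "uses")` lookup.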
Hybrid Approaches
In practice, the most effective systems combine multiple approaches. A local file stores the agent's core configuration and user preferences (semantic memory). A vector store holds episodic memories for semantic retrieval. And a lightweight graph or structured index captures entity relationships. The agent decides which store to query based on the type of information it needs - facts come from the config file, relevant past interactions come from the vector store, and relational context comes from the graph.
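The routing decision can be sketched as a small dispatch layer. The three store interfaces assumed here (a plain dict for config, a `.query()` method on the vector store, a `.related()` method on the graph) are hypothetical stand-ins, not any particular library's API:

```python
def route_query(kind: str, query, stores: dict):
    # Dispatch to the store suited to the information type.
    if kind == "fact":          # stable preferences live in the config file
        return stores["config"].get(query)
    if kind == "episode":       # past interactions come from the vector store
        return stores["episodes"].query(query)
    if kind == "relation":      # entity relationships come from the graph
        subject, relation = query
        return stores["graph"].related(subject, relation)
    raise ValueError(f"unknown query kind: {kind!r}")
```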
4. Memory Architectures Compared
The following table compares the major approaches to agent memory persistence. No single approach dominates - the right choice depends on the agent's complexity, the user's privacy requirements, and how much infrastructure you are willing to manage.
| Approach | Best For | Retrieval | Scalability | Tradeoffs |
|---|---|---|---|---|
| File-based (JSON/YAML) | Config, preferences, small memory sets | Exact match, field filtering | Low (100s of entries) | Simple and inspectable, but no semantic search. Manual deduplication needed. |
| Vector DB (ChromaDB, etc.) | Episodic memory, similarity search | Semantic similarity via embeddings | High (millions of entries) | Excellent retrieval quality, but opaque storage. Requires embedding model and infra. |
| Graph DB (Neo4j, etc.) | Entity relationships, relational queries | Graph traversal, pattern matching | High (complex relationships) | Strong at relational reasoning, but complex to build and maintain. Entity extraction is hard. |
| Hybrid (file + vector + graph) | Full-featured agents with diverse needs | Multi-strategy, query-type dependent | High (each store handles its type) | Best overall capability, but highest complexity. Requires routing logic to pick the right store. |
| Local-first with screen context | Desktop agents, ambient awareness | Structured local files + live screen state | Medium (local disk bound) | Privacy-preserving and low-latency. No cloud dependency. Limited by local compute for search. |
A few patterns emerge from comparing these approaches. First, file-based memory is almost always part of the solution, even when other stores handle the bulk of memories. A local config file that stores the agent's core understanding of the user - their name, their role, their key projects, their tool preferences - is fast to load, easy to edit, and does not need semantic search. This forms the "always loaded" base layer of memory.
Second, the choice between vector and graph storage depends on whether the agent's primary challenge is finding relevant memories (favoring vectors) or reasoning about relationships (favoring graphs). For most current desktop agents, vector search over episodic memories is the higher-value addition, because their primary failure mode is failing to retrieve relevant past interactions in the first place.
Third, local-first architectures have a significant advantage for desktop agents specifically. When the agent runs on the user's machine, memory storage is local by default. There is no network latency on retrieval, no cloud storage cost, and no privacy concern about sending interaction history to a remote server. The agent can combine persisted memory with live screen context - what the user is looking at right now - to make decisions that are both informed by history and grounded in current state.
5. Practical Implementation
Building persistent memory into an agent is not a single feature - it is an architectural decision that touches every part of the system. Here are the practical considerations that matter most.
Memory Write Strategy
Agents need to decide what to remember and when. Writing every interaction to memory creates noise. Writing nothing means starting over each session. The middle ground is having the agent evaluate each interaction for "memory-worthy" information - new user preferences, new procedures learned, corrections to existing knowledge, new entity relationships. This evaluation can happen at the end of each session (batch) or after each significant interaction (streaming). Batch is simpler but loses information if a session crashes. Streaming is more resilient but adds latency.
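A sketch of the batch write strategy. The trigger phrases here are crude, illustrative heuristics; a production agent would more likely ask the model itself to judge whether an interaction is memory-worthy:

```python
# Phrases that suggest a durable preference or correction. Purely
# illustrative -- real systems would use a model-based classifier.
MEMORY_TRIGGERS = ("i prefer", "always", "never", "from now on", "remember")

def is_memory_worthy(message: str) -> bool:
    text = message.lower()
    return any(trigger in text for trigger in MEMORY_TRIGGERS)

def flush_session(interactions: list[str], store: list[str]) -> None:
    # Batch strategy: evaluate everything once at session end. A streaming
    # variant would call is_memory_worthy() after each interaction instead,
    # trading latency for crash resilience.
    store.extend(m for m in interactions if is_memory_worthy(m))
```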
Memory Read Strategy
At session start, the agent needs to load relevant context without blowing up its context window. A practical approach is tiered loading: always load the core config (semantic memory), load episodic memories only when they match the current task (lazy retrieval), and load procedural memories only when a known workflow is triggered. This keeps the baseline context small and grows it on demand.
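Tiered loading can be sketched as a single context-building step. Word overlap stands in for real semantic matching, and the data shapes are illustrative:

```python
def build_context(core_config: dict, episodes: list[str],
                  procedures: dict, task: str) -> dict:
    # Tier 1: core config (semantic memory) is always loaded.
    context = {"config": core_config, "episodes": [], "procedures": []}
    task_words = set(task.lower().split())
    # Tier 2: lazy retrieval -- only episodes that overlap the current task.
    context["episodes"] = [
        e for e in episodes if task_words & set(e.lower().split())
    ]
    # Tier 3: procedural memory only when a known workflow trigger fires.
    context["procedures"] = [
        steps for trigger, steps in procedures.items() if trigger in task.lower()
    ]
    return context
```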
Conflict Resolution
Memory conflicts are inevitable. The user told the agent they prefer Slack for communication three months ago, but last week they started using Discord for a new project. Does the agent update the global preference, or create a context-dependent rule? The safest approach is to treat newer information as higher priority within the same scope but allow scope-specific overrides. "Prefers Slack" remains the default, but "uses Discord for Project X" takes precedence when the user is working on Project X.
Memory Decay and Forgetting
An agent that never forgets will eventually drown in irrelevant memories. Implementing decay - reducing the retrieval weight of memories that have not been accessed or reinforced - keeps the memory store focused. Episodic memories that are never recalled should eventually be archived or deleted. Semantic memories should be periodically reviewed and pruned. Procedural memories should be updated when workflows change rather than accumulating stale versions.
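One simple decay scheme is exponential: a memory untouched for one half-life keeps half its retrieval weight, and memories below a threshold become archive candidates. The half-life, threshold, and field names below are illustrative choices, not recommendations:

```python
def retrieval_weight(base_score: float, days_since_access: float,
                     half_life_days: float = 30.0) -> float:
    # Exponential decay: one half-life without access halves the weight.
    # Accessing or reinforcing a memory would reset days_since_access to 0.
    return base_score * 0.5 ** (days_since_access / half_life_days)

def prune(memories: list[dict], threshold: float = 0.1) -> list[dict]:
    # Keep only memories whose decayed weight is still above the threshold;
    # the rest are candidates for archiving or deletion.
    return [m for m in memories
            if retrieval_weight(m["score"], m["days_since_access"]) >= threshold]
```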
Real-world example - accessibility targeting
A desktop automation agent needs to click a "Send" button in a mail client. The first time, it locates the button by traversing the accessibility tree and records a logical key: mail:compose:send-button. On subsequent sessions, it resolves the logical key against the live accessibility tree. If the app updates and the button moves, the resolution logic searches by role and label rather than position. The logical key persists; the resolution adapts.
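The resolution step above can be sketched against a simplified accessibility tree. The flat node list and its `role`/`label` fields are stand-ins for a real platform accessibility API; only the logical-key idea is the point:

```python
def resolve_logical_key(logical_key: str, key_index: dict, live_tree: list[dict]):
    # The stored key maps to the (role, label) recorded when the element
    # was first learned. Position is deliberately not part of the key, so
    # the lookup survives the button moving after an app update.
    role, label = key_index[logical_key]
    for node in live_tree:
        if node["role"] == role and node["label"] == label:
            return node
    return None   # the element no longer exists; the agent must re-learn it
```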
Ambient Awareness and Screen Context
The most capable persistent memory systems do not just store past interactions - they combine stored memory with live context. A local-first agent that monitors the screen through accessibility APIs can merge what it knows (from memory) with what it sees (from the current screen state) to make decisions that are both historically informed and contextually grounded.
For example, Fazm uses local memory files combined with real-time screen context to maintain awareness across sessions. When a new session starts, it loads the user's stored preferences and project context, then immediately reads the current screen state to understand what the user is working on right now. This combination of persistent memory and ambient awareness means the agent does not just remember who you are - it understands what you are doing and can act accordingly without requiring a detailed prompt.
The key insight is that memory and perception are complementary. Memory without perception means the agent knows your preferences but not your current state. Perception without memory means the agent sees your screen but has no history to contextualize it. The combination is what makes an agent feel like it actually knows you and your work.
Try an agent that remembers
Fazm combines persistent local memory with real-time screen awareness. Your preferences, workflows, and project context survive across every session.
Get started with Fazm