Agent Architecture Deep Dive

AI Agent Memory: Cross-Session Persistence and Shared Context

Every time you start a new session with an AI agent, it has no idea who you are. Your preferences, your project context, the decisions you made yesterday, the workflow you spent 30 minutes teaching it last week - all gone. The agent starts from zero, and you start from frustration. This is the memory problem, and it is the single biggest barrier between current AI agents and ones that feel genuinely useful over time. This guide covers how persistent memory systems work, what the tradeoffs are between different architectures, and how shared identifiers let agents maintain stable references across sessions and even across platforms.

1. The Memory Problem - Why Agents Forget Everything

Most AI agents are stateless by design. A large language model receives a prompt, generates a response, and discards its working state. The next request starts with a blank slate unless the application layer explicitly reconstructs context by stuffing previous conversation history into the prompt window. This works within a single session, but it breaks down across sessions for several reasons.

Context windows have finite size. Even with models supporting 100k or 200k tokens, you cannot fit weeks of interaction history into a single prompt. The cost of sending that much context on every request would be prohibitive anyway. And the information density is terrible - most of what happened three sessions ago is irrelevant to what the user needs right now.

The deeper issue is that raw conversation logs are not memory. Human memory is not a transcript of everything that happened. It is a structured, prioritized, associative system that surfaces the right information at the right time. An agent that simply replays old conversations is doing the equivalent of reading its own diary before every interaction - slow, unfocused, and missing the point.

The core tension

Agents need to remember enough to be useful without remembering so much that they become slow, expensive, or confused. The challenge is not storage - it is retrieval, relevance, and knowing what to forget.

Consider a desktop automation agent that helps you manage files, draft emails, and navigate applications. In session one, you teach it that you prefer a specific folder structure for client projects. In session two, you show it how you like email replies formatted. In session three, you ask it to set up a new client project. Without persistent memory, it has no idea about your folder preferences or email style. You have to re-teach everything, and that re-teaching cost compounds with every session.

2. Types of Agent Memory

Cognitive science offers a useful framework for categorizing the kinds of memory an agent needs. Not all memories are the same, and different types require different storage and retrieval strategies.

Episodic Memory

Episodic memory records specific events and interactions. "The user asked me to reorganize their Downloads folder on Tuesday and preferred sorting by project rather than by date." This is the most straightforward type to implement - you can store timestamped summaries of what happened. The challenge is retrieval: when is a past episode relevant to the current task? A naive approach retrieves episodes by recency, but the most relevant episode might be from three weeks ago. Semantic search over episode summaries works better, though it requires embedding infrastructure.

Semantic Memory

Semantic memory captures facts and knowledge extracted from episodes. "The user prefers dark mode in all applications. Their main project is called Atlas. They use Notion for documentation and Linear for task tracking." This is distilled, de-duplicated knowledge that does not reference specific interactions. It is more compact and more universally useful than raw episodes. Building semantic memory requires an extraction step where the agent identifies durable facts from transient interactions and updates its knowledge base accordingly.

Procedural Memory

Procedural memory encodes how to do things. "To deploy the staging environment, first run the build script in the project root, then SSH into the staging server, pull the latest build artifact, and restart the service." This is the most valuable type for automation agents because it captures workflows that the user has demonstrated or described. Procedural memories are essentially learned skills that the agent can replay with variations.

Why all three types matter

An agent with only episodic memory can recall what happened but struggles to generalize. One with only semantic memory knows facts but cannot recall the context in which it learned them. One with only procedural memory can execute workflows but cannot adapt when circumstances change. Effective agents need all three, with different storage backends optimized for each.

3. Cross-Session Persistence Approaches

Making memory survive across sessions requires writing it to durable storage and having a retrieval mechanism that loads the right memories at the right time. There are several distinct approaches, each with meaningful tradeoffs.

File-Based Persistence

The simplest approach is writing memory to local files - JSON, YAML, Markdown, or plain text. A memory file might live at ~/.agent/memory.json and contain structured entries with timestamps, categories, and content. At session start, the agent reads the file and injects relevant entries into its context. This is easy to implement, easy to inspect, and easy to debug. The user can open the memory file and see exactly what the agent remembers. The downsides are limited scalability (the file grows without bound), no semantic search (you can only filter by exact fields), and no built-in deduplication.
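A minimal sketch of this pattern, assuming a hypothetical memory.json in the working directory (in practice it might live at a path like ~/.agent/memory.json, as above). Note the limitation it demonstrates: retrieval is exact-match filtering on a field, nothing more.

```python
import json
import time
from pathlib import Path

# Hypothetical file location for this sketch.
MEMORY_PATH = Path("memory.json")

def remember(category: str, content: str) -> None:
    """Append a timestamped entry to the memory file."""
    entries = json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else []
    entries.append({"ts": time.time(), "category": category, "content": content})
    MEMORY_PATH.write_text(json.dumps(entries, indent=2))

def recall(category: str) -> list[str]:
    """Filter entries by exact category - the only retrieval a flat file supports."""
    if not MEMORY_PATH.exists():
        return []
    return [e["content"] for e in json.loads(MEMORY_PATH.read_text())
            if e["category"] == category]

MEMORY_PATH.unlink(missing_ok=True)  # start the demo from a clean file
remember("preference", "User prefers dark mode")
remember("preference", "Client projects use the Atlas folder structure")
remember("episode", "Reorganized Downloads by project on Tuesday")
print(recall("preference"))
```

Because the store is a plain JSON file, the user can open it in any editor and audit or correct what the agent believes - the inspectability benefit described above.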

Vector Database Persistence

Vector databases like ChromaDB, Pinecone, or Qdrant store memories as embeddings - high-dimensional vectors that capture semantic meaning. When the agent needs to recall something, it embeds the current query and retrieves the most semantically similar memories. This solves the retrieval problem elegantly: you do not need to know the exact keywords or categories, just describe what you need and the closest matches come back. The tradeoffs are added infrastructure complexity, embedding cost, and opacity - you cannot easily browse or edit vector memories by hand.
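The retrieval mechanic can be illustrated without any database infrastructure. The sketch below substitutes a toy bag-of-words vector for a real embedding model, purely to show similarity-based recall; a production system would use ChromaDB or similar with learned embeddings, and the memory strings here are illustrative.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' - a real system would use a learned model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

memories = [
    "User prefers sorting client folders by project",
    "User asked to draft a follow-up email to the Atlas client",
    "User uses Linear for task tracking",
]
store = [(m, embed(m)) for m in memories]

def recall(query: str, k: int = 1) -> list[str]:
    """Return the k memories most semantically similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [m for m, _ in ranked[:k]]

print(recall("set up folders for a new client project"))
```

Note that the query shares no exact category or keyword scheme with the stored memories; similarity alone surfaces the relevant one. That is the property that exact-match file filtering cannot provide.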

Graph-Based Persistence

Knowledge graphs represent memories as entities and relationships. The user is a node, their projects are nodes, the tools they use are nodes, and edges encode relationships like "uses," "prefers," "owns," and "depends on." This structure excels at answering relational queries: "What tools does the user use for the Atlas project?" Graph databases like Neo4j make traversal queries fast, and the structure naturally prevents duplication because entities are unique nodes. The downside is complexity - building and maintaining a knowledge graph requires entity extraction, relationship classification, and conflict resolution when new information contradicts existing edges.
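A rough illustration of the idea using plain subject-relation-object triples, with entity names taken from the examples above. A real deployment would use Neo4j or another graph database; this only shows why relational queries are natural in this representation.

```python
# Minimal in-memory knowledge graph as a set of (subject, relation, object) triples.
triples = {
    ("user", "owns", "Atlas"),
    ("user", "uses", "Notion"),
    ("user", "uses", "Linear"),
    ("Atlas", "tracked-in", "Linear"),
    ("Atlas", "documented-in", "Notion"),
}

def objects(subject: str, relation: str) -> set[str]:
    """One-hop traversal: everything `subject` relates to via `relation`."""
    return {o for s, r, o in triples if s == subject and r == relation}

# "What tools does the user use for the Atlas project?" - intersect the
# tools the user uses with the tools the Atlas project depends on.
tools = objects("user", "uses") & (
    objects("Atlas", "tracked-in") | objects("Atlas", "documented-in")
)
print(sorted(tools))
```

Because "Linear" and "Notion" are unique nodes rather than free-text strings repeated across entries, adding a second fact about Linear attaches to the same node - the built-in deduplication mentioned above.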

Hybrid Approaches

In practice, the most effective systems combine multiple approaches. A local file stores the agent's core configuration and user preferences (semantic memory). A vector store holds episodic memories for semantic retrieval. And a lightweight graph or structured index captures entity relationships. The agent decides which store to query based on the type of information it needs - facts come from the config file, relevant past interactions come from the vector store, and relational context comes from the graph.
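The routing decision itself can be quite small. A hypothetical sketch of the dispatch layer, with store names invented for illustration:

```python
def route(query_type: str) -> str:
    """Pick the backing store by the kind of information needed (sketch)."""
    return {
        "fact": "config_file",      # semantic memory: always-loaded preferences
        "episode": "vector_store",  # similarity search over past interactions
        "relation": "graph_index",  # entity and relationship traversal
    }[query_type]

print(route("fact"), route("episode"), route("relation"))
```

The hard part in practice is not this dispatch but classifying the query correctly in the first place, which is usually delegated to the model.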

4. Shared Identifiers and Stable References

One of the hardest problems in persistent agent memory is maintaining stable references to things in the world. When an agent remembers "the user's main project," what exactly does that refer to? If the agent interacts with the project through a file path, that path might change when the user reorganizes their drive. If it interacts through a window title, that title changes when the document is renamed. If it interacts through a UI element, that element might move or be restructured when the application updates.

This is where the concept of a logical key becomes essential. A logical key is a stable identifier that maps to a real-world entity regardless of how that entity is represented in different contexts. Think of it as a shared reference that multiple systems can resolve independently.

The logical key pattern

A logical key like app:safari:bookmarks-bar resolves to a specific accessibility element in Safari, regardless of window position, tab arrangement, or OS version. The key is stable across sessions; the resolution happens at runtime.

Consider the problem of accessibility element targeting. A desktop automation agent that clicks buttons, reads text fields, and navigates menus needs to identify UI elements reliably across app restarts. Raw accessibility tree paths are brittle - they break when the app updates its layout, when the user changes their display settings, or when localization changes element labels. A stable reference system assigns logical keys to elements based on their semantic role (what the element does) rather than their structural position (where the element sits in the tree). The agent remembers compose-button rather than window[0]/toolbar[2]/button[3], and the resolution layer figures out which actual element matches at runtime.

The same pattern applies to cross-platform scenarios. If an agent operates across a macOS desktop, a web browser, and a mobile companion app, it needs identifiers that work in all three contexts. A user's "inbox" might be a Mail.app window on desktop, a Gmail tab in the browser, and a notification panel on mobile. A shared identifier system maps the logical concept "user's primary inbox" to the correct concrete target in each environment.

Workflow memory benefits enormously from stable references. When an agent learns a procedure like "move new client emails to the appropriate project folder," every noun in that sentence needs a stable reference - "client emails" maps to a mail filter or label, "project folder" maps to a directory path or a project management tool. If any of those references break, the entire workflow breaks. Logical keys with runtime resolution make workflows portable across sessions and even across machines.

5. Memory Architectures Compared

The following table compares the major approaches to agent memory persistence. No single approach dominates - the right choice depends on the agent's complexity, the user's privacy requirements, and how much infrastructure you are willing to manage.

| Approach | Best For | Retrieval | Scalability | Tradeoffs |
| --- | --- | --- | --- | --- |
| File-based (JSON/YAML) | Config, preferences, small memory sets | Exact match, field filtering | Low (100s of entries) | Simple and inspectable, but no semantic search. Manual deduplication needed. |
| Vector DB (ChromaDB, etc.) | Episodic memory, similarity search | Semantic similarity via embeddings | High (millions of entries) | Excellent retrieval quality, but opaque storage. Requires embedding model and infra. |
| Graph DB (Neo4j, etc.) | Entity relationships, relational queries | Graph traversal, pattern matching | High (complex relationships) | Strong at relational reasoning, but complex to build and maintain. Entity extraction is hard. |
| Hybrid (file + vector + graph) | Full-featured agents with diverse needs | Multi-strategy, query-type dependent | High (each store handles its type) | Best overall capability, but highest complexity. Requires routing logic to pick the right store. |
| Local-first with screen context | Desktop agents, ambient awareness | Structured local files + live screen state | Medium (local disk bound) | Privacy-preserving and low-latency. No cloud dependency. Limited by local compute for search. |

A few patterns emerge from comparing these approaches. First, file-based memory is almost always part of the solution, even when other stores handle the bulk of memories. A local config file that stores the agent's core understanding of the user - their name, their role, their key projects, their tool preferences - is fast to load, easy to edit, and does not need semantic search. This forms the "always loaded" base layer of memory.

Second, the choice between vector and graph storage depends on whether the agent's primary challenge is finding relevant memories (favoring vectors) or reasoning about relationships (favoring graphs). For most current desktop agents, vector search over episodic memories is the higher-value addition, because the primary failure mode is failing to retrieve relevant past interactions rather than failing to reason over entity relationships.

Third, local-first architectures have a significant advantage for desktop agents specifically. When the agent runs on the user's machine, memory storage is local by default. There is no network latency on retrieval, no cloud storage cost, and no privacy concern about sending interaction history to a remote server. The agent can combine persisted memory with live screen context - what the user is looking at right now - to make decisions that are both informed by history and grounded in current state.

6. Practical Implementation

Building persistent memory into an agent is not a single feature - it is an architectural decision that touches every part of the system. Here are the practical considerations that matter most.

Memory Write Strategy

Agents need to decide what to remember and when. Writing every interaction to memory creates noise. Writing nothing means starting over each session. The middle ground is having the agent evaluate each interaction for "memory-worthy" information - new user preferences, new procedures learned, corrections to existing knowledge, new entity relationships. This evaluation can happen at the end of each session (batch) or after each significant interaction (streaming). Batch is simpler but loses information if a session crashes. Streaming is more resilient but adds latency.

Memory Read Strategy

At session start, the agent needs to load relevant context without blowing up its context window. A practical approach is tiered loading: always load the core config (semantic memory), load episodic memories only when they match the current task (lazy retrieval), and load procedural memories only when a known workflow is triggered. This keeps the baseline context small and grows it on demand.

Conflict Resolution

Memory conflicts are inevitable. The user told the agent they prefer Slack for communication three months ago, but last week they started using Discord for a new project. Does the agent update the global preference, or create a context-dependent rule? The safest approach is to treat newer information as higher priority within the same scope but allow scope-specific overrides. "Prefers Slack" remains the default, but "uses Discord for Project X" takes precedence when the user is working on Project X.

Memory Decay and Forgetting

An agent that never forgets will eventually drown in irrelevant memories. Implementing decay - reducing the retrieval weight of memories that have not been accessed or reinforced - keeps the memory store focused. Episodic memories that are never recalled should eventually be archived or deleted. Semantic memories should be periodically reviewed and pruned. Procedural memories should be updated when workflows change rather than accumulating stale versions.

Real-world example - accessibility targeting

A desktop automation agent needs to click a "Send" button in a mail client. The first time, it locates the button by traversing the accessibility tree and records a logical key: mail:compose:send-button. On subsequent sessions, it resolves the logical key against the live accessibility tree. If the app updates and the button moves, the resolution logic searches by role and label rather than position. The logical key persists; the resolution adapts.

Ambient Awareness and Screen Context

The most capable persistent memory systems do not just store past interactions - they combine stored memory with live context. A local-first agent that monitors the screen through accessibility APIs can merge what it knows (from memory) with what it sees (from the current screen state) to make decisions that are both historically informed and contextually grounded.

For example, Fazm uses local memory files combined with real-time screen context to maintain awareness across sessions. When a new session starts, it loads the user's stored preferences and project context, then immediately reads the current screen state to understand what the user is working on right now. This combination of persistent memory and ambient awareness means the agent does not just remember who you are - it understands what you are doing and can act accordingly without requiring a detailed prompt.

The key insight is that memory and perception are complementary. Memory without perception means the agent knows your preferences but not your current state. Perception without memory means the agent sees your screen but has no history to contextualize it. The combination is what makes an agent feel like it actually knows you and your work.

Try an agent that remembers

Fazm combines persistent local memory with real-time screen awareness. Your preferences, workflows, and project context survive across every session.

Get started with Fazm