Three Layers of Agent Memory - Working, Session, and Long-Term

Matthew Diakonov


Agent memory is not one thing. Treating it as a single flat file leads to predictable failures: short-term task notes crowd out important long-term facts, permanent user preferences get deleted during cleanup, or the agent starts every session having forgotten things it learned months ago.

The fix is separating memory into three distinct layers with different retention policies, different relevance characteristics, and different management strategies.

Why Three Layers?

The three-layer model maps to how the information actually differs. Working memory is high-relevance, temporary, and fully automatic - the model manages it. Session memory is medium-relevance, semi-persistent, and structured - you or the agent generates it at session end. Long-term memory is variable-relevance, persistent, and manually curated - it changes rarely but applies broadly.

Mixing these into a single store causes problems in both directions. Important long-term facts get treated as temporary and pruned. Temporary task notes get promoted to permanent storage and never cleaned up. Separating the layers makes the right policy obvious for each piece of information.

Layer 1: Working Memory

Working memory is the context window - the current task, recent tool outputs, conversation history for this session. Everything here directly applies to what the agent is doing right now, which is why it is the most valuable memory despite being the most temporary.

Properties: Ephemeral (ends with the session), highest relevance, limited by model context window, automatically managed.

The practical constraint is size. Current context windows range from 32K to 200K tokens, which sounds large but fills quickly with tool outputs, intermediate reasoning, and file contents. The key management decision is what to include at session start - loading too much wastes the window on low-relevance context, loading too little forces the agent to rediscover information it already knew.
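That session-start decision can be sketched as a priority-ordered budget packer. This is a minimal sketch, not a model API: the 4-characters-per-token estimate is a rough heuristic (a real implementation would use the model's tokenizer), and the priority scheme is an assumption.

```python
# Sketch: pack session-start context into a token budget, loading
# highest-priority items first. Assumes ~4 characters per token,
# which is a rough heuristic rather than an exact tokenizer count.

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def pack_context(items: list[tuple[int, str]], budget: int) -> list[str]:
    """items are (priority, text) pairs; lower priority number loads first."""
    loaded = []
    used = 0
    for _, text in sorted(items, key=lambda it: it[0]):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip items that don't fit; smaller items may still fit
        loaded.append(text)
        used += cost
    return loaded

# usage: long-term facts (priority 0) fit the budget; the large
# session transcript (priority 1) is skipped rather than truncating facts
context = pack_context(
    [(0, "long-term facts " * 100), (1, "last session summary " * 500)],
    budget=600,
)
```

The point of the priority ordering is that low-relevance bulk (full transcripts, large file dumps) should never crowd out the small, high-relevance layers.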

Layer 2: Session Summaries

When a session ends, the key decisions, outcomes, and open questions get compressed into a summary. The next session loads these summaries to understand recent history without replaying entire conversations.

Properties: Semi-persistent (days to weeks), medium-relevance, generated at session end, stored as structured notes.

A session summary format that works in practice:

```markdown
## Session 2026-03-18

**Accomplished:**
- Implemented token refresh with TaskGate pattern to prevent reentrancy
- Fixed coordinate misalignment bug in UI automation (switched to center-point)

**Decisions made:**
- Using worktrees for agent isolation rather than branches alone
- Chose Whisper medium model over tiny for technical terminology accuracy

**Attempted but failed:**
- Tried shared-file approach for cross-agent context - caused race conditions
- Attempted to use SwiftUI accessibility hierarchy in electron app - not exposed

**Open:**
- Need to benchmark TaskGate overhead vs. baseline actor performance
- Token rotation schedule not yet implemented
```

The Mem0 research paper (arXiv:2504.19413) benchmarks this kind of structured session memory against six other approaches and finds it produces 26% higher response accuracy on the LOCOMO benchmark than OpenAI's built-in memory. The structure matters: a raw transcript summary is far less useful than a structured extraction of decisions and open items.
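Generating a summary in this structure at session end can be sketched as a small helper. The function name, the `sessions/` directory, and the parameter names are assumptions mirroring the format above, not a fixed convention.

```python
# Sketch: write a structured session summary to a dated markdown file.
# Section headings mirror the format shown above; empty sections are
# omitted rather than written as empty headings.
import tempfile
from datetime import date
from pathlib import Path

def write_session_summary(sessions_dir, accomplished, decisions, failed, open_items):
    today = date.today().isoformat()
    lines = [f"## Session {today}", ""]
    for heading, items in [
        ("**Accomplished:**", accomplished),
        ("**Decisions made:**", decisions),
        ("**Attempted but failed:**", failed),
        ("**Open:**", open_items),
    ]:
        if items:
            lines.append(heading)
            lines.extend(f"- {item}" for item in items)
            lines.append("")
    path = Path(sessions_dir) / f"{today}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines))
    return path

# usage: write one summary into a temporary sessions directory
summary_path = write_session_summary(
    tempfile.mkdtemp(),
    accomplished=["Implemented token refresh"],
    decisions=[],
    failed=[],
    open_items=["Benchmark TaskGate overhead"],
)
```

Having the agent fill these fields (rather than free-form summarizing) is what makes the next session's load step cheap and predictable.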

Layer 3: Long-Term Facts

These are stable truths about the project, user, and environment. Project architecture decisions, user preferences, API credential locations, known bugs and workarounds, recurring patterns. They change rarely and apply to many future sessions.

Properties: Persistent (months), variable relevance depending on task, manually curated, stored in CLAUDE.md or project notes.

Good candidates for long-term memory:

  • Architectural constraints that affect many decisions ("we are using SwiftUI, not AppKit")
  • User preferences that override defaults ("never use em dashes")
  • Discovered gotchas ("SwiftUI accessibility tree is not exposed in Electron apps")
  • Recurring context that would otherwise need re-explaining every session

Bad candidates for long-term memory:

  • Specific task state ("currently working on the login flow") - this belongs in session summaries
  • Temporary workarounds ("using this API endpoint until the new one ships") - date these explicitly
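Applied to these criteria, a long-term facts file might look like the fragment below. The contents are illustrative, drawn from the examples earlier in the post; note the dated temporary entry.

```markdown
# Project Facts

## Architecture
- SwiftUI, not AppKit - all new views must be SwiftUI
- Agent isolation uses git worktrees, not branches alone

## Preferences
- Never use em dashes in generated text

## Gotchas
- SwiftUI accessibility tree is not exposed in Electron apps
- Temporary (dated 2026-03): legacy API endpoint in use until the new one ships
```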

Promotion and Demotion

The hard design decision is the promotion and demotion criteria. When does a session observation become a long-term fact? When does a long-term fact become stale enough to remove?

Promotion heuristic: if you find yourself explaining the same thing to the agent in three different sessions, it should be a long-term fact. Three is a good threshold: once is a one-off, twice may be coincidence, three times is a pattern.

Demotion heuristic: if a long-term fact has not been relevant in the past 60 days, or if you know the underlying situation has changed, remove it. Stale long-term facts are actively harmful - the agent uses them with confidence and gets things wrong.
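Both heuristics are simple enough to encode directly. The data structures here are illustrative, not a fixed schema - a real system would track mentions and last-relevant dates however its session summaries are stored.

```python
# Sketch of the two heuristics: promote after three sessions mention
# a fact, demote after 60 days without relevance. Dates and counts
# would come from session summary metadata in a real system.
from datetime import date, timedelta

def should_promote(fact: str, session_mentions: dict[str, int]) -> bool:
    """Three-session rule: explained in 3+ sessions -> long-term fact."""
    return session_mentions.get(fact, 0) >= 3

def should_demote(last_relevant: date, today: date, max_age_days: int = 60) -> bool:
    """Staleness rule: not relevant in the past 60 days -> remove."""
    return today - last_relevant > timedelta(days=max_age_days)
```

Running the demotion check as a periodic sweep (rather than at session end) keeps cleanup from competing with summary writing.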

Implementation in Practice

Most developers implement this with three components:

Context window: Automatic. The model manages this. Your job is to be thoughtful about what you load at session start.

Sessions directory: A folder with dated markdown files, one per working session. At session end, write a summary using the structure above. At session start, load the last 2-3 relevant summaries.

CLAUDE.md: The project-level long-term facts file. This is the standard pattern in Claude Code. Keep it under 500 lines - if it is longer, you are storing too much and the relevance drops.
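Putting the three components together, a session-start loader reads the long-term facts file and the most recent dated summaries. The file layout matches the one described above; the function name and `sessions/` path are assumptions.

```python
# Sketch: assemble session-start context from CLAUDE.md plus the
# last few session summaries. Dated filenames (YYYY-MM-DD.md) sort
# chronologically, so a plain sort yields the most recent files last.
import tempfile
from pathlib import Path

def load_startup_context(project_dir, n_summaries: int = 3) -> str:
    root = Path(project_dir)
    parts = []
    claude_md = root / "CLAUDE.md"
    if claude_md.exists():
        parts.append(claude_md.read_text())  # long-term facts load first
    session_files = sorted((root / "sessions").glob("*.md"))[-n_summaries:]
    parts.extend(p.read_text() for p in session_files)
    return "\n\n".join(parts)

# usage: build a toy project layout and load its startup context
root = Path(tempfile.mkdtemp())
(root / "CLAUDE.md").write_text("Long-term facts")
(root / "sessions").mkdir()
for d in ["2026-03-16", "2026-03-17", "2026-03-18"]:
    (root / "sessions" / f"{d}.md").write_text(f"Session {d}")
context = load_startup_context(root, n_summaries=2)
```

Loading only the last two or three summaries keeps the working-memory cost of session history roughly constant no matter how long the project runs.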

The layers feed each other: working memory generates session summaries; session summaries surface patterns that become long-term facts; long-term facts preload into working memory at session start. A well-maintained three-layer system means the agent starts each session knowing what matters without being overwhelmed by everything that ever happened.

Fazm is an open-source macOS AI agent, available on GitHub.
