AI Agent Memory - The Unsolved Problem of What to Remember vs What to Forget

Fazm Team · 2 min read

Most conversations about AI agent memory focus on storage. Knowledge graphs, vector databases, embedding retrieval. These are solved problems. The unsolved problem is deciding what to remember and what to let decay.

The Unit of Knowledge

People think of memory as storing facts. "The user's name is Matt." "The project deadline is Friday." But the useful unit of knowledge is not a fact - it is a decision with context.

"The user chose React over Vue for this project because the team already knows React" is a decision. It contains the what (React), the why (team familiarity), and the context (this specific project). Six months later, for a different project with a different team, this memory should decay in relevance. The fact that the user once chose React should not bias every future recommendation.
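As a sketch, a decision-with-context record might look like the following. This is an illustrative data shape, not Fazm's actual schema; the field names and the `project:acme-dashboard` context key are hypothetical.

```python
from dataclasses import dataclass, field
import time

@dataclass
class DecisionMemory:
    """A decision with context, not a bare fact."""
    what: str     # the choice that was made
    why: str      # the rationale behind it
    context: str  # the scope the decision applies to (hypothetical key format)
    recorded_at: float = field(default_factory=time.time)

m = DecisionMemory(
    what="React over Vue",
    why="team already knows React",
    context="project:acme-dashboard",
)
```

Keeping the why and the context alongside the what is what lets a later relevance check ask "does this decision's scope match the current task?" instead of treating every stored preference as globally true.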

The Decay Problem

This is where things get hard. An agent that remembers everything will eventually drown in irrelevant context. An agent that forgets too aggressively will repeat mistakes and ask questions it should already know the answers to.

The approaches people try - time-based decay, access-frequency weighting, explicit user feedback - all have failure modes. Time-based decay forgets important but infrequent knowledge. Frequency weighting over-indexes on recent obsessions. User feedback requires the user to manage the agent's memory manually, which defeats the purpose.
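To make those failure modes concrete, here is a minimal sketch of the first two heuristics combined: exponential time decay multiplied by a log-scaled access-frequency weight. The half-life and the scoring formula are assumptions chosen for illustration, not a recommended policy.

```python
import math

def decay_score(age_seconds: float, half_life_days: float = 30.0) -> float:
    """Exponential time decay: the score halves every half_life_days."""
    half_life = half_life_days * 86400  # days -> seconds
    return 0.5 ** (age_seconds / half_life)

def frequency_weight(access_count: int) -> float:
    """Log-scaled access frequency, so heavy reuse saturates."""
    return math.log1p(access_count)

def retention_score(age_seconds: float, access_count: int) -> float:
    """Combine both heuristics (illustrative formula, not a real policy)."""
    return decay_score(age_seconds) * (1.0 + frequency_weight(access_count))
```

The failure modes fall straight out of the math: an important memory never accessed again still decays toward zero, and a memory hammered during one recent obsession keeps a high score long after it stops mattering.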

Task-Relevant Retrieval

The most promising direction is task-relevant retrieval. Instead of decaying memories globally, the agent evaluates which memories are relevant to the current task and loads only those. A memory about React preferences is highly relevant when choosing a framework and irrelevant when drafting an email.

This requires the agent to understand the semantic relationship between its current task and its stored memories - which is itself an LLM inference step. You are spending tokens to decide which memories to load before spending tokens on the actual task.
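A minimal sketch of that retrieval step, assuming each memory already has an embedding from some local model. The toy three-dimensional vectors, the threshold, and the helper names are all illustrative, not part of any real API.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def relevant_memories(task_vec, memories, k=2, threshold=0.3):
    """memories: list of (text, embedding). Return top-k texts above threshold."""
    scored = [(cosine(task_vec, emb), text) for text, emb in memories]
    scored = [(s, t) for s, t in scored if s >= threshold]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy example: a framework-choice task should pull the React memory
# and leave the commit-message memory behind.
memories = [
    ("chose React over Vue (team familiarity)", [0.9, 0.1, 0.0]),
    ("prefers terse commit messages",           [0.0, 0.2, 0.9]),
]
framework_task = [1.0, 0.0, 0.0]
loaded = relevant_memories(framework_task, memories)
```

In practice the embeddings would come from an on-device model and the similarity check would run against the local graph, but the shape of the step is the same: score every candidate memory against the task, then load only the ones that clear the bar.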

For local agents like Fazm, at least the retrieval stays on-device. The graph lives locally, the relevance scoring runs locally, and no memories ever leave your machine.

Fazm is an open source macOS AI agent, available on GitHub.
