Data Quality vs Data Volume for AI Agent Memories: Why Fewer High-Quality Memories Win
We ran into the classic data quality vs data volume problem while building memory for our AI agent. The agent extracts user memories from browser history - sites visited, articles read, products browsed, searches made. The naive approach is to store everything. That is exactly what we tried first, and it made the agent worse.
The Volume Trap
With 30 days of browser history, you get thousands of entries. Most of them are noise - quick Google searches, social media scrolling, ads clicked by accident. Dumping all of this into the agent's memory context means it spends most of its reasoning budget processing irrelevant data.
The agent started making bizarre associations. It would recommend restaurants based on a single search the user did for a friend. It would reference articles the user had opened but never actually read. More data was making the agent less useful, not more.
The Quality Filter
We switched to an extraction pipeline that scores each memory on three dimensions - intentionality (did the user deliberately engage with this?), recurrence (does this topic come up repeatedly?), and recency (is this still relevant?). Only memories that pass a threshold on all three dimensions get stored.
A user who searches for "best running shoes" once gets no memory stored. A user who visits three running shoe review sites over two weeks gets a memory tagged "interested in running shoes." The difference is intent signal strength.
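The filter described above can be sketched roughly as follows. The threshold values, field names, and scoring scale are illustrative assumptions - the post does not specify how each dimension is computed or where the cutoffs sit:

```python
from dataclasses import dataclass

# Hypothetical cutoffs -- the post only says memories must pass a
# threshold on all three dimensions, not what the thresholds are.
INTENT_THRESHOLD = 0.5
RECURRENCE_THRESHOLD = 0.5
RECENCY_THRESHOLD = 0.5

@dataclass
class Candidate:
    topic: str
    intentionality: float  # 0..1: did the user deliberately engage?
    recurrence: float      # 0..1: does the topic come up repeatedly?
    recency: float         # 0..1: is this still relevant now?

def passes_quality_filter(c: Candidate) -> bool:
    """Store a memory only if it clears the cutoff on ALL three dimensions."""
    return (c.intentionality >= INTENT_THRESHOLD
            and c.recurrence >= RECURRENCE_THRESHOLD
            and c.recency >= RECENCY_THRESHOLD)

# A single search: deliberate but not recurring, so it is filtered out.
one_search = Candidate("running shoes", intentionality=0.7,
                       recurrence=0.1, recency=0.9)
# Three review-site visits over two weeks: strong on every dimension.
repeat_visits = Candidate("running shoes", intentionality=0.8,
                          recurrence=0.7, recency=0.8)

print(passes_quality_filter(one_search))     # False -> no memory stored
print(passes_quality_filter(repeat_visits))  # True  -> "interested in running shoes"
```

The conjunctive check (all three must pass, rather than a weighted sum) is what keeps a single high-intent event - like searching for a friend - from producing a memory on its own.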
Practical Memory Architecture
Our memory layer now stores roughly 50 to 100 high-confidence memories per user instead of thousands of raw data points. Each memory has a confidence score that decays over time - similar to an Ebbinghaus forgetting curve. Memories that get reinforced by new behavior stay strong. Memories without reinforcement fade.
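A minimal sketch of that decay-plus-reinforcement behavior, assuming exponential decay with a made-up 14-day half-life (the class name, boost value, and rate are illustrative, not the actual implementation):

```python
class Memory:
    """A memory whose confidence decays over time, Ebbinghaus-style,
    and is strengthened when new behavior reinforces it."""

    HALF_LIFE_DAYS = 14.0  # assumed decay rate: confidence halves every 2 weeks

    def __init__(self, text: str, confidence: float = 1.0):
        self.text = text
        self.base_confidence = confidence

    def current_confidence(self, days_since_reinforcement: float) -> float:
        # Exponential forgetting curve: halves every HALF_LIFE_DAYS.
        decay = 0.5 ** (days_since_reinforcement / self.HALF_LIFE_DAYS)
        return self.base_confidence * decay

    def reinforce(self, boost: float = 0.2) -> None:
        # New supporting behavior strengthens the memory (capped at 1.0);
        # the caller would also reset the reinforcement timestamp.
        self.base_confidence = min(1.0, self.base_confidence + boost)

m = Memory("interested in running shoes")
print(m.current_confidence(0.0))    # 1.0  -- fresh memory
print(m.current_confidence(14.0))   # 0.5  -- one half-life, no reinforcement
print(m.current_confidence(28.0))   # 0.25 -- fading toward eviction
```

A memory whose confidence drops below some floor can simply be evicted, which is what keeps the store at 50 to 100 entries rather than growing without bound.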
This gives the agent a focused, accurate picture of the user instead of a noisy dump of everything they have ever browsed. The result is better personalization with far less context window usage.
Quality over quantity is not just a platitude - it is an engineering requirement for agent memory systems.
Fazm is an open source macOS AI agent, available on GitHub.