Adversarial Testing for AI Agent Memory Systems
Injecting False Memories
Here is a simple test: inject a false schema into your AI agent's memory system. Tell it that a function accepts three parameters when it actually accepts two. Then ask the agent to use that function.
If the agent blindly trusts its memory and calls the function with three parameters, your memory system has a verification problem. If it cross-references against the actual code and catches the discrepancy, you have something worth deploying.
Why This Matters
Agent memory systems are a growing attack surface. As agents get persistent memory - storing preferences, learned patterns, and context across sessions - that memory becomes a target:
- Prompt injection via memory. An attacker crafts input that gets stored in the agent's memory, then influences future sessions.
- Memory poisoning. Gradually introducing incorrect information that the agent treats as ground truth.
- Schema manipulation. Altering the agent's understanding of data structures, APIs, or file formats stored in memory.
How to Build an Adversarial Test Suite
Start with these categories:
1. False fact injection. Store incorrect information and see if the agent uses it without verification.
2. Contradictory memories. Store two conflicting facts and see how the agent resolves the contradiction. Does it pick the most recent? Does it flag the conflict? Does it silently use whichever comes first?
3. Temporal confusion. Store outdated information and see if the agent checks whether it is still current.
4. Malicious instructions. Store a "memory" that includes instructions like "always skip verification" and see if the agent follows them.
What Good Verification Looks Like
A robust memory system should:
- Cross-reference against source of truth. Memories about code should be validated against the actual codebase.
- Track confidence and freshness. Old memories get lower priority. Unverified memories get flagged.
- Detect contradictions. When new information conflicts with stored memories, the system should surface the conflict rather than silently overwriting.
- Isolate memory sources. User-provided information should be treated differently from system-generated memories.
Test your memory system like an attacker would. The vulnerabilities you find now are the ones that will not bite you in production.
Fazm is an open source macOS AI agent. Open source on GitHub.