
Embeddings vs Tokens - How AI Agent Memory Actually Works

Fazm Team · 2 min read
embeddings · tokens · agent-memory · vector-search · ai-fundamentals

Embeddings vs Tokens

There's a persistent confusion between tokens and embeddings. They're fundamentally different things, and understanding the difference matters if you're building or using AI agents with memory.

Tokens are the raw units of text that language models process. A word like "automation" might be one token or might get split into sub-word pieces. Tokens are discrete, countable, and directly map back to text.
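That discrete, reversible mapping can be sketched with a toy greedy tokenizer. The vocabulary here is hypothetical and tiny; real tokenizers (BPE and friends) learn their vocabularies from data, but the round-trip property is the same:

```python
# Toy sub-word tokenizer with a hypothetical, hand-picked vocabulary.
# Real models use learned vocabularies of tens of thousands of pieces.
VOCAB = {"auto": 0, "mation": 1, "deploy": 2, "ing": 3, " ": 4}
INV = {i: piece for piece, i in VOCAB.items()}

def encode(text):
    """Greedily match the longest vocabulary piece at each position."""
    ids, i = [], 0
    while i < len(text):
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                ids.append(VOCAB[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids

def decode(ids):
    """Tokens map directly back to text: concatenate the pieces."""
    return "".join(INV[i] for i in ids)

print(encode("automation"))          # one word, two sub-word tokens: [0, 1]
print(decode(encode("automation")))  # round-trips to the exact original text
```

The key property: token IDs are discrete and countable, and `decode(encode(x))` always recovers `x`. Embeddings, as the next section shows, deliberately give that up.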

Embeddings are something else entirely. They're dense numerical vectors - arrays of hundreds or thousands of floating point numbers - that capture the semantic meaning of text. You can't decode an embedding back into the original text. That's not what they're for.
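To make the shape concrete, here is a stand-in `embed` function. It is a deliberately fake, hash-based sketch: it shows only the *form* of an embedding (a fixed-length, unit-normalized float vector, regardless of input length), not the semantic behavior, which only a trained model provides:

```python
import hashlib
import math

def embed(text, dim=8):
    """Hypothetical stand-in for a real embedding model call.
    Derives floats from a hash, then unit-normalizes. Demonstrates
    the shape of an embedding only -- it carries no semantics."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Same fixed length whether the input is one word or a paragraph,
# and there is no inverse function from vector back to text.
print(len(embed("prod")))
print(len(embed("a much longer sentence about deploying to production")))
```

In practice `embed` would be a call to a provider API or a local model, and `dim` would be in the hundreds or thousands.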

Why This Matters for Agent Memory

When an AI agent needs to remember past interactions, it doesn't store raw conversation text and search through it keyword-by-keyword. Instead, each piece of information gets converted into an embedding and stored in a vector database.

When the agent needs to recall something relevant, the current query gets embedded too. Then it's a similarity search - find the stored vectors closest to the query vector in high-dimensional space. "Closest" here means semantically similar, not string-matching similar.

This is why an agent can find a relevant memory about "deploying to production" even when the original conversation used the phrase "pushing to prod." The embeddings for both phrases land in nearby regions of the vector space.
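The retrieval step described above can be sketched in a few lines. The vectors below are made-up 4-dimensional stand-ins for real embedding model output, and the similarity metric is cosine similarity, a common (though not the only) choice for vector search:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stored memories: text mapped to made-up embedding vectors.
# A real vector database would hold model-produced vectors and use an
# approximate nearest-neighbor index instead of a linear scan.
memory = {
    "pushing to prod": [0.8, 0.1, 0.5, 0.2],
    "lunch options nearby": [0.1, 0.9, 0.0, 0.3],
    "rotate API keys": [0.3, 0.2, 0.8, 0.6],
}

def recall(query_vec, k=1):
    """Return the k stored memories whose vectors are closest to the query."""
    ranked = sorted(memory, key=lambda text: cosine(query_vec, memory[text]),
                    reverse=True)
    return ranked[:k]

query = [0.7, 0.2, 0.6, 0.1]  # pretend embedding for "deploying to production"
print(recall(query))  # → ['pushing to prod']
```

No string in the query overlaps with "pushing to prod"; the match falls out of vector proximity alone, which is the whole point.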

The Practical Tradeoff

Embedding-based retrieval is powerful but not free. You pay for the embedding API call, the vector storage, and the retrieval latency. For a desktop agent running locally, this means you need to choose carefully what gets embedded and stored versus what gets discarded after a session ends.

The agents that feel smart and contextual are the ones with good memory retrieval. The ones that feel forgetful are usually the ones doing naive text matching or not persisting memory at all.

Fazm is an open source macOS AI agent, available on GitHub.
