Is RAG Dead? Bigger Context Windows Shift the Use Cases

Fazm Team · 2 min read


When context windows were 4K tokens, RAG was essential. You could not fit your knowledge base in context, so you retrieved the relevant chunks and hoped the embedding model found the right ones. Now context windows are 1 million tokens and growing. Does RAG still make sense?

What 1M Tokens Changes

A million tokens is roughly 750,000 words - on the order of ten average-length books. For many use cases - company documentation, codebases, personal knowledge bases - you can simply dump everything into context without retrieval.

No embeddings pipeline. No chunking strategy. No retrieval accuracy concerns. No "the model did not find the relevant chunk" failures. Just raw context with full document structure preserved.

Where RAG Still Wins

RAG is not dead, but its sweet spot is narrowing. It still wins in a few situations: when your knowledge base exceeds the context limit (rare, but real for large corpora); when you need to search across millions of documents; when cost matters, since 1M tokens of context is expensive per query; and when you need attribution, knowing exactly which document sourced each answer.

The cost argument is significant. Sending 1M tokens per query costs roughly $3-15 depending on the model. RAG with a targeted retrieval might send 10K tokens for $0.03. For high-volume applications, RAG is still the economical choice.
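The arithmetic behind that comparison is simple. A minimal sketch, assuming an illustrative price of $3 per million input tokens (actual rates vary by model and provider):

```python
# Back-of-envelope cost comparison: full-context vs. targeted RAG retrieval.
# The price below is an assumption for illustration, not a provider quote.

def query_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of sending `tokens` input tokens at a given rate."""
    return tokens / 1_000_000 * price_per_million

PRICE = 3.00  # assumed $ per 1M input tokens

full_context = query_cost(1_000_000, PRICE)  # dump the whole corpus in context
rag = query_cost(10_000, PRICE)              # targeted 10K-token retrieval

print(f"full context: ${full_context:.2f} per query")  # $3.00
print(f"RAG:          ${rag:.2f} per query")           # $0.03
print(f"ratio:        {full_context / rag:.0f}x")      # 100x
```

At 100x per query, even a modest query volume turns the context-window convenience into a real line item.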

The Hybrid Approach

The practical answer for most teams is hybrid. Use full-context for small to medium knowledge bases where accuracy matters most. Use RAG for large corpora where cost per query matters. Use both for applications that need to balance breadth of knowledge with query economics.
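That routing decision can be expressed directly. A minimal sketch, where the thresholds and function name are assumptions for illustration:

```python
# Hybrid routing sketch: small corpora go straight into context,
# large or high-volume workloads go through retrieval.

CONTEXT_LIMIT = 1_000_000       # assumed model context window, in tokens
FULL_CONTEXT_BUDGET = 200_000   # assumed cutoff below which we send everything

def choose_strategy(corpus_tokens: int, high_volume: bool) -> str:
    if corpus_tokens > CONTEXT_LIMIT:
        return "rag"            # cannot fit: retrieval is mandatory
    if high_volume or corpus_tokens > FULL_CONTEXT_BUDGET:
        return "rag"            # fits, but per-query cost dominates
    return "full_context"       # small corpus, accuracy matters most

print(choose_strategy(50_000, high_volume=False))    # full_context
print(choose_strategy(50_000, high_volume=True))     # rag
print(choose_strategy(5_000_000, high_volume=False)) # rag
```

The exact thresholds depend on your model's pricing and window size; the point is that the choice is a function of corpus size and query volume, not a religious commitment to one architecture.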

What This Means for Agents

For desktop AI agents, the shift is significant. An agent with 1M context can hold your entire project in memory - every file, every conversation, every preference. No retrieval layer needed. The agent just knows everything because it has read everything.

Fazm is an open source macOS AI agent, available on GitHub.
