Using Opus as Orchestrator, Delegating to Sonnet and Haiku

Matthew Diakonov · March 17, 2026 · 3 min read

opus sonnet haiku model-routing context-window cost-optimization

Most people assume the reason to use Opus as an orchestrator that delegates to Sonnet and Haiku is cost. The setup does save money, but cost is not the biggest win. The real advantage is context window management.

The Context Window Problem

Opus is the most capable model, but it can burn through 80% of its context window just understanding the codebase. It reads files thoroughly, explores dependencies, checks test files, and reviews git history, all before writing a single line of code. By the time it starts implementing, the context is nearly full and it begins losing track of earlier information.

This is not a bug. Opus is doing the right thing by being thorough. But it means that for a complex task requiring many steps, Opus runs out of room.

The Orchestrator Pattern

The solution is to split the roles:

Opus handles planning, architecture decisions, and code review. These tasks benefit from deep reasoning and do not need much context for execution.
Sonnet handles implementation. It gets a scoped task description from Opus, the relevant files, and clear acceptance criteria. It does not need to understand the entire codebase.
Haiku handles simple, repetitive tasks - formatting, renaming, boilerplate generation, test scaffolding. These need minimal context.

Each model operates within its optimal context range.
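The role split above can be sketched as a small routing table. This is a minimal illustration, not how any particular tool implements it: the task category names, the `Tier` enum, and the `route` function are hypothetical, and falling back to Opus for unrecognized task kinds is an assumption.

```python
from enum import Enum


class Tier(Enum):
    """Model tiers used as plain labels; these are not API model IDs."""
    OPUS = "opus"
    SONNET = "sonnet"
    HAIKU = "haiku"


# Hypothetical task categories, derived from the role split described above.
ROUTES = {
    "planning": Tier.OPUS,
    "architecture": Tier.OPUS,
    "code_review": Tier.OPUS,
    "implementation": Tier.SONNET,
    "formatting": Tier.HAIKU,
    "renaming": Tier.HAIKU,
    "boilerplate": Tier.HAIKU,
    "test_scaffolding": Tier.HAIKU,
}


def route(task_kind: str) -> Tier:
    """Return the tier for a task kind.

    Assumption: unknown kinds go back to the orchestrator (Opus) for triage.
    """
    return ROUTES.get(task_kind, Tier.OPUS)


if __name__ == "__main__":
    for kind in ("planning", "implementation", "renaming", "something_else"):
        print(f"{kind} -> {route(kind).value}")
```

Keeping the mapping in a plain dictionary makes the policy easy to inspect and adjust as you learn which tasks each tier handles well.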

The Real Numbers

On a typical feature implementation:

Opus solo: uses 180K tokens of context, loses early details, costs roughly $8-12 per task
Opus + Sonnet + Haiku: Opus uses 40K for planning, Sonnet uses 60K for implementation, Haiku uses 10K for cleanup. That is 110K in total, and no single model holds more than 60K, so context pressure is lower. Cost drops to roughly $3-5 per task

The cost savings are nice, but the quality improvement from better context management is what matters. Each model has enough room to do its job well.
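The arithmetic behind that comparison is easy to check. The sketch below uses only the figures quoted above; the `compare` function and its field names are made up for illustration.

```python
# Figures quoted in the comparison above (thousands of tokens, US dollars).
SOLO_CONTEXT_K = 180
SPLIT_CONTEXT_K = {"opus": 40, "sonnet": 60, "haiku": 10}
SOLO_COST = (8, 12)   # low and high estimate per task
SPLIT_COST = (3, 5)   # low and high estimate per task


def compare() -> dict:
    """Summarize context usage and cost for the solo and split setups."""
    total_k = sum(SPLIT_CONTEXT_K.values())
    peak_k = max(SPLIT_CONTEXT_K.values())
    return {
        "split_total_k": total_k,
        "split_peak_k": peak_k,
        # Share of the solo context that the busiest worker has to hold.
        "peak_share_of_solo": peak_k / SOLO_CONTEXT_K,
        # Smallest saving: cheapest solo run vs. priciest split run.
        "min_saving": SOLO_COST[0] - SPLIT_COST[1],
        # Largest saving: priciest solo run vs. cheapest split run.
        "max_saving": SOLO_COST[1] - SPLIT_COST[0],
    }


if __name__ == "__main__":
    s = compare()
    print(f"split total: {s['split_total_k']}K, peak per model: {s['split_peak_k']}K")
    print(f"peak is {s['peak_share_of_solo']:.0%} of the solo context")
    print(f"saving per task: ${s['min_saving']} to ${s['max_saving']}")
```

The number that matters for quality is the per-model peak (60K against 180K), not the combined total.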

How to Set This Up

Use Claude Code with sub-agents or a custom routing layer. Opus generates a task breakdown with specific file lists and acceptance criteria for each sub-task. Sonnet instances pick up individual tasks. Haiku handles post-processing.

The pattern works because the orchestrator knows what context each worker needs and sends only that context.
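As a rough sketch of what a task breakdown with scoped context might look like in a custom routing layer: everything here is hypothetical, including the `SubTask` fields, the `build_worker_prompt` helper, and the in-memory `repo` dictionary standing in for files on disk. It does not reflect how Claude Code represents sub-agent tasks internally.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    """One unit of work emitted by the orchestrator (hypothetical shape)."""
    description: str
    files: list[str]
    acceptance_criteria: list[str]
    worker: str = "sonnet"  # tier label, not an API model ID


def build_worker_prompt(task: SubTask, repo: dict[str, str]) -> str:
    """Assemble a worker prompt containing only the files the sub-task lists."""
    sections = [f"Task: {task.description}", "Acceptance criteria:"]
    sections += [f"- {criterion}" for criterion in task.acceptance_criteria]
    for path in task.files:
        # Files the task does not list are never added to the prompt.
        sections.append(f"--- {path} ---\n{repo[path]}")
    return "\n".join(sections)


if __name__ == "__main__":
    # Toy repository: path -> file contents.
    repo = {
        "auth/login.py": "def login(user): ...",
        "auth/tokens.py": "def issue_token(user): ...",
        "billing/invoice.py": "def create_invoice(order): ...",
    }
    task = SubTask(
        description="Add token refresh to the login flow",
        files=["auth/login.py", "auth/tokens.py"],
        acceptance_criteria=["Existing login tests still pass"],
    )
    print(build_worker_prompt(task, repo))
```

The worker never sees `billing/invoice.py`, which is the point: the orchestrator decides what is relevant, so the worker's context stays small.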

Fazm is an open source macOS AI agent, available on GitHub.

More on This Topic

Related Posts

Tips for Secondary Models - When to Use Haiku vs Opus in AI Agents

Use Sonnet for Grunt Work, Opus for Architecture

Multi-LLM Agent Routing - Using Different Models for Different Subtasks
