Tips for Secondary Models - When to Use Haiku vs Opus in AI Agents

Matthew Diakonov


Not every AI agent task needs the most powerful model. Running Opus for a simple button click confirmation is like hiring a surgeon to put on a bandage. It works, but you are burning money for no reason.

The key insight is that model selection should be task-driven, not workflow-driven. Within a single agent workflow, different steps have wildly different complexity requirements.

When Haiku Is Enough

Haiku handles the majority of agent micro-tasks perfectly well:

  • Screen element classification - "Is this a button or a text field?" Haiku gets this right 99% of the time.
  • Simple extraction - Pulling a name, date, or number from structured text.
  • Yes/no verification - "Does this page contain a success message?"
  • Status checks - Reading error codes, confirming page loads, checking element visibility.
  • Template filling - Inserting known values into predictable form fields.

These tasks are high-frequency and low-complexity. Routing them to Haiku can cut your per-workflow cost by 60% or more.
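The routing described above can be sketched as a simple lookup. The task-type names and model labels below are illustrative placeholders, not real API identifiers:

```python
# Micro-task categories that Haiku handles well, from the list above.
HAIKU_TASKS = {
    "classify_element",   # "Is this a button or a text field?"
    "extract_field",      # pull a name, date, or number from structured text
    "verify_yes_no",      # "Does this page contain a success message?"
    "status_check",       # read error codes, confirm page loads
    "fill_template",      # insert known values into predictable form fields
}

def pick_model(task_type: str) -> str:
    """Route high-frequency, low-complexity micro-tasks to the cheap tier;
    everything else defaults to the powerful tier."""
    return "haiku" if task_type in HAIKU_TASKS else "opus"
```

The point of the explicit set is that routing stays auditable: you can log which tier handled each task and measure the cost split directly.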

When You Need Opus

Opus earns its cost on tasks that require genuine reasoning:

  • Multi-step planning - Deciding the sequence of actions across multiple applications.
  • Error recovery - When something unexpected happens and the agent needs to figure out a new path.
  • Ambiguous UI interpretation - Complex layouts where the right action is not obvious from the accessibility tree alone.
  • Context-heavy decisions - Tasks that require remembering and synthesizing information from earlier steps.

The Handoff Problem

The tricky part is the handoff between tiers. If Haiku misclassifies a complex task, it might take a wrong action that Opus then has to recover from - costing more than just using Opus from the start.

A practical approach: start with Haiku, and if it returns low-confidence results or the task requires more than two reasoning steps, escalate to Opus automatically. This catches the 10-15% of tasks that genuinely need more power while keeping the other 85-90% cheap.
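That escalation heuristic can be written down in a few lines. This is a minimal sketch: the confidence threshold, the `Result` shape, and the `call_haiku`/`call_opus` callables are assumptions, not part of any real SDK.

```python
from dataclasses import dataclass

@dataclass
class Result:
    answer: str
    confidence: float      # model's self-reported confidence, 0.0-1.0 (assumed field)
    reasoning_steps: int   # number of reasoning steps the model reported

CONFIDENCE_FLOOR = 0.8     # threshold is a tunable assumption, not a fixed rule
MAX_CHEAP_STEPS = 2        # "more than two reasoning steps" triggers escalation

def needs_escalation(r: Result) -> bool:
    """Escalate on low confidence or on tasks that needed multi-step reasoning."""
    return r.confidence < CONFIDENCE_FLOOR or r.reasoning_steps > MAX_CHEAP_STEPS

def run_task(task, call_haiku, call_opus) -> Result:
    """Try the cheap tier first; fall through to the big model when the heuristic fires."""
    first = call_haiku(task)
    if needs_escalation(first):
        return call_opus(task)
    return first
```

Note the trade-off this encodes: escalated tasks pay for both calls, which is acceptable precisely because escalation is rare.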

The best model is not always the biggest one. It is the one that matches the task.

This post was inspired by a discussion on r/ClaudeCode.

Fazm is an open-source macOS AI agent, available on GitHub.
