Tips for Secondary Models - When to Use Haiku vs Opus in AI Agents

Matthew Diakonov


Not every AI agent task needs the most powerful model. Running Opus for a simple button click confirmation is like hiring a surgeon to put on a bandage. It works, but you are burning money for no reason.

The key insight is that model selection should be task-driven, not workflow-driven. Within a single agent workflow, different steps have wildly different complexity requirements.

When Haiku Is Enough

Haiku handles the majority of agent micro-tasks perfectly well:

  • Screen element classification - "Is this a button or a text field?" Haiku gets this right 99% of the time.
  • Simple extraction - Pulling a name, date, or number from structured text.
  • Yes/no verification - "Does this page contain a success message?"
  • Status checks - Reading error codes, confirming page loads, checking element visibility.
  • Template filling - Inserting known values into predictable form fields.

These tasks are high-frequency and low-complexity. Routing them to Haiku can cut your per-workflow cost by 60% or more.
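The routing described above can be sketched as a simple lookup. The task-type names and model labels below are illustrative placeholders, not real API identifiers:

```python
# Micro-task categories that Haiku handles well, from the list above.
HAIKU_TASKS = {
    "classify_element",   # "Is this a button or a text field?"
    "extract_field",      # pull a name, date, or number from structured text
    "verify_yes_no",      # "Does this page contain a success message?"
    "status_check",       # read error codes, confirm page loads
    "fill_template",      # insert known values into predictable form fields
}

def pick_model(task_type: str) -> str:
    """Route high-frequency, low-complexity micro-tasks to the cheap tier;
    everything else defaults to the powerful tier."""
    return "haiku" if task_type in HAIKU_TASKS else "opus"
```

The point of the explicit set is that routing stays auditable: you can log which tier handled each task and measure the cost split directly.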

When You Need Opus

Opus earns its cost on tasks that require genuine reasoning:

  • Multi-step planning - Deciding the sequence of actions across multiple applications.
  • Error recovery - When something unexpected happens and the agent needs to figure out a new path.
  • Ambiguous UI interpretation - Complex layouts where the right action is not obvious from the accessibility tree alone.
  • Context-heavy decisions - Tasks that require remembering and synthesizing information from earlier steps.

The Handoff Problem

The tricky part is the handoff between tiers. If Haiku misclassifies a complex task, it might take a wrong action that Opus then has to recover from - costing more than just using Opus from the start.

A practical approach: start with Haiku, and if it returns low-confidence results or the task requires more than two reasoning steps, escalate to Opus automatically. This catches the 10-15% of tasks that genuinely need more power while keeping the other 85-90% cheap.
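That escalation heuristic can be written down in a few lines. This is a minimal sketch: the confidence threshold, the `Result` shape, and the `call_haiku`/`call_opus` callables are assumptions, not part of any real SDK.

```python
from dataclasses import dataclass

@dataclass
class Result:
    answer: str
    confidence: float      # model's self-reported confidence, 0.0-1.0 (assumed field)
    reasoning_steps: int   # number of reasoning steps the model reported

CONFIDENCE_FLOOR = 0.8     # threshold is a tunable assumption, not a fixed rule
MAX_CHEAP_STEPS = 2        # "more than two reasoning steps" triggers escalation

def needs_escalation(r: Result) -> bool:
    """Escalate on low confidence or on tasks that needed multi-step reasoning."""
    return r.confidence < CONFIDENCE_FLOOR or r.reasoning_steps > MAX_CHEAP_STEPS

def run_task(task, call_haiku, call_opus) -> Result:
    """Try the cheap tier first; fall through to the big model when the heuristic fires."""
    first = call_haiku(task)
    if needs_escalation(first):
        return call_opus(task)
    return first
```

Note the trade-off this encodes: escalated tasks pay for both calls, which is acceptable precisely because escalation is rare.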

The best model is not always the biggest one. It is the one that matches the task.

This post was inspired by a discussion on r/ClaudeCode.

Fazm is an open-source macOS AI agent, available on GitHub.
