When Sonnet Outperforms Opus - Choosing the Right AI Model Tier

Matthew Diakonov

The instinct is always to use the best model available. Opus is smarter, so it should produce better code, right? In practice, Sonnet wins on a surprising number of daily coding tasks - and understanding when to use which model saves both time and money.

The Cost Gap Is Larger Than Most Realize

Claude 3.5 Sonnet costs $3/MTok input and $15/MTok output. Claude 3 Opus costs $15/MTok input and $75/MTok output - five times more expensive on both input and output. For a team running 10 million tokens per day through an agent pipeline, that difference is roughly $120/day versus $600/day, depending on the input/output split.

For individual developers using Claude Code or API access, the practical difference shows up in willingness to iterate. At Opus pricing, you think twice before asking for a rewrite. At Sonnet pricing, you run five variations and pick the best one.
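The arithmetic is easy to sanity-check. A minimal sketch, assuming a 2.5M-input / 7.5M-output daily split (an illustrative assumption, not a measured workload):

```python
# Published Claude 3.x rates in dollars per million tokens ($/MTok).
SONNET = {"input": 3.0, "output": 15.0}   # Claude 3.5 Sonnet
OPUS = {"input": 15.0, "output": 75.0}    # Claude 3 Opus

def daily_cost(rates, input_mtok, output_mtok):
    """Dollar cost for one day's traffic, given rates in $/MTok."""
    return rates["input"] * input_mtok + rates["output"] * output_mtok

# Assumed split of 10M tokens/day for illustration.
in_mtok, out_mtok = 2.5, 7.5
print(daily_cost(SONNET, in_mtok, out_mtok))  # 120.0
print(daily_cost(OPUS, in_mtok, out_mtok))    # 600.0
```

Whatever the exact split, the 5x ratio holds, because every rate differs by the same factor.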

Where Sonnet Excels

Speed-sensitive iteration loops. Sonnet responds in roughly 2-4 seconds for most coding requests. Opus takes 8-15 seconds for the same prompts. For tight edit-test-fix loops, that gap compounds. Running 20 small fixes takes 1 minute with Sonnet and 3-5 minutes with Opus. Over a workday, this is a meaningful difference in how many cycles you complete.

Well-defined mechanical tasks. Writing utility functions, generating boilerplate, creating test cases for existing code, converting between data formats, adding type hints to Python functions - these tasks have clear correct answers. Opus does not produce meaningfully better output than Sonnet on problems where the solution space is well-constrained.

High-volume batch operations. Processing a large codebase - adding docstrings, running lint fixes, normalizing import style - runs 5x cheaper with Sonnet at essentially identical quality. The tasks are repetitive and pattern-based, exactly where Sonnet's efficiency shines.

Straightforward debugging. "This function returns None when I pass an empty list" is a solved class of problem. Sonnet identifies common bugs, undefined variable references, off-by-one errors, and missing null checks just as reliably as Opus for the vast majority of cases.
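This bug class is concrete enough to show. A toy example (function names are hypothetical, chosen for illustration):

```python
# The classic "returns None for an empty list" bug: the if-branch
# falls through and Python implicitly returns None.
def average_buggy(values):
    if values:
        return sum(values) / len(values)

# The fix: handle the empty case explicitly instead of returning None.
def average(values):
    if not values:
        return 0.0  # or raise ValueError, depending on the contract
    return sum(values) / len(values)

print(average_buggy([]))   # None
print(average([]))         # 0.0
print(average([1, 2, 3]))  # 2.0
```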

Where Opus Justifies the Cost

Multi-constraint architecture decisions. Designing a system that needs to handle 10k concurrent users, maintain sub-100ms latency, operate within a $500/month infrastructure budget, and integrate with three legacy APIs requires holding many constraints simultaneously. Opus is measurably better at not dropping constraints when reasoning gets complex.

Debugging subtle race conditions and concurrency bugs. These require reasoning about time, state, and interleavings across multiple execution contexts. Opus produces more accurate analysis because it can maintain a more detailed mental model of what is happening.

Refactoring with deep dependency chains. When changing a function signature that is called in 40 places, and those callers have their own callers, and some of the intermediate callers are in test files that import from production code - Opus tracks this better and makes fewer mistakes at the leaves of the dependency tree.

First-attempt code for expensive iteration contexts. If you are writing a database migration that will run on production data, getting it right the first time matters. Spending $0.50 on Opus to get a migration that works beats spending $0.10 on Sonnet and then debugging why your migration corrupted 10,000 rows.

A Practical Routing Framework

The decision is fast once you have a framework:

Is the task mechanical or creative?
  Mechanical (write tests, fix lint, add types) → Sonnet
  Creative or open-ended (novel design, ambiguous requirements) → lean Opus

Is this a first attempt or an iteration?
  Iteration (fixing something Sonnet tried) → stay on Sonnet unless stuck
  First attempt on something critical → Opus

How long is the reasoning chain?
  Short (fix this bug, explain this function) → Sonnet
  Long (design this architecture, debug this distributed system) → Opus

Does the task require holding many constraints?
  Few constraints → Sonnet
  Many simultaneous constraints → Opus

In practice, 70-80% of daily coding tasks route to Sonnet. Opus gets the remainder - complex debugging, architecture, and anything where the first attempt needs to be reliable.
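The four questions above can be sketched as a simple router. The `Task` fields and the constraint threshold are assumptions for illustration, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    mechanical: bool          # tests, lint fixes, type hints, format conversion
    critical_first_try: bool  # e.g. a production migration
    long_reasoning: bool      # architecture design, distributed-system debugging
    constraints: int          # simultaneous constraints the model must hold

def pick_model(task: Task) -> str:
    """Route to Opus only when the task genuinely needs it."""
    if task.critical_first_try or task.long_reasoning or task.constraints >= 3:
        return "opus"
    return "sonnet"  # the 70-80% default

print(pick_model(Task(True, False, False, 1)))   # sonnet: mechanical, low stakes
print(pick_model(Task(False, True, False, 1)))   # opus: critical first attempt
```

The point of writing it down is that the default falls out naturally: everything routes to Sonnet unless a specific trigger fires.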

The Over-Engineering Problem

One underappreciated advantage of Sonnet: it tends to write simpler code. Opus sometimes adds unnecessary sophistication to simple problems. Ask both models to write a function that reads a CSV and returns a list of dicts, and Opus might introduce abstract base classes, context managers, and configurable error handlers. Sonnet returns 10 lines that do the job.
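For reference, the simple answer really is a handful of standard-library lines:

```python
import csv

def read_csv(path):
    """Read a CSV file and return its rows as a list of dicts,
    keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

No base classes, no configurable error handlers - `csv.DictReader` already does the work, and the code is trivial to review.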

This is not always true, but it shows up enough that developers building production systems sometimes explicitly prefer Sonnet for implementation tasks because the output is easier to review and maintain.

The Real Lesson

The best model is not the most powerful one - it is the one that solves your specific problem at the right speed and cost. Over-speccing AI model selection is the same mistake as running a high-traffic web server on the same machine as your database "because it's the most powerful setup." Match capability to problem.

Start with Sonnet as the default. Escalate to Opus when you hit a problem Sonnet genuinely struggles with - which happens less often than you would expect.

This post was inspired by a discussion on r/ClaudeCode.

Fazm is an open source macOS AI agent, available on GitHub.
