Using Multiple LLMs for Multi-Agent Workflows - Orchestration Patterns That Work
Running a single LLM for every task in an agent workflow is wasteful. Different models excel at different things, and the best multi-agent setups use the right model for each subtask instead of forcing one model to do everything.
The Orchestrator Pattern
The pattern that works best in practice is using Claude as the orchestrator - the central brain that plans, decides, and coordinates - while shelling out to other model CLIs for specific subtasks. Each subtask gets its own config, its own system prompt, and its own model optimized for that particular job.
For example, a workflow that processes documents might use Claude for understanding the overall task and planning steps, a fast local model for text extraction and classification, and a specialized model for code generation or data transformation. The orchestrator decides what to delegate and collects results.
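The delegation step can be sketched as a table of per-task configs plus a small dispatch function. This is a minimal illustration, not a real integration: the CLI names, model names, and flags below are placeholders, and actual model CLIs take different arguments.

```python
import subprocess

# Hypothetical per-task configs: the commands, model names, and system
# prompts here are illustrative stand-ins, not real tool invocations.
TASK_CONFIGS = {
    "plan":    {"cmd": ["claude-cli"], "model": "frontier-model",
                "system": "You are a planner. Break the task into steps."},
    "extract": {"cmd": ["local-model-cli"], "model": "small-local-model",
                "system": "Extract and classify the text in the input."},
    "codegen": {"cmd": ["codegen-cli"], "model": "code-model",
                "system": "Generate a data transformation script."},
}

def run_subtask(task: str, payload: str) -> str:
    """Shell out to whichever model CLI is configured for this subtask."""
    cfg = TASK_CONFIGS[task]
    # Exact flag syntax varies by CLI; treat this as a placeholder shape.
    cmd = cfg["cmd"] + [cfg["model"]]
    result = subprocess.run(cmd, input=f'{cfg["system"]}\n\n{payload}',
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The orchestrator calls `run_subtask("extract", document_text)` and so on, collecting each result before planning the next step.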
Environment Variable Overrides
The simplest way to swap models globally is through environment variables. Set one variable and every subprocess that reads it uses the specified model. This gives you a single knob for switching between development (cheap, fast models) and production (frontier models) without changing any code.
For per-task overrides, pass the model config as part of the subprocess invocation. The orchestrator knows which model each task needs and configures it at launch time.
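One way to configure the model at launch time is to inject it into the child process's environment, as in this sketch (again assuming the illustrative `AGENT_MODEL` variable; real CLIs may instead take a `--model` flag):

```python
import os
import subprocess

def launch_with_model(cmd: list, model: str) -> str:
    """Launch a subtask with its own model set in the child's environment."""
    env = dict(os.environ, AGENT_MODEL=model)  # AGENT_MODEL is illustrative
    return subprocess.run(cmd, env=env, capture_output=True,
                          text=True, check=True).stdout
```

Because the override lives in the child's environment, two subtasks launched from the same orchestrator can run different models concurrently without interfering.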
Why This Beats Single-Model Approaches
Cost drops dramatically when you route simple classification tasks to local models instead of sending everything to a frontier API. Latency improves because local inference skips the network round trip. And you get resilience - if one model provider goes down, you can reroute to alternatives without redesigning the workflow.
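The resilience point can be made concrete with a simple fallback router, sketched here under the assumption that each provider is wrapped in a callable that raises on failure:

```python
def route_with_fallback(providers, prompt):
    """Try each provider callable in order; fall back when one fails."""
    last_err = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as err:  # provider outage, timeout, bad response
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```

The workflow itself never changes; only the ordered list of providers does, which is exactly the separation the single-model approach lacks.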
The key insight is that orchestration logic and execution logic should be separate concerns. The orchestrator does not need to be fast. It needs to be smart. The executors do not need to be smart. They need to be reliable at their specific task.
Fazm is an open-source macOS AI agent, available on GitHub.