Using Multiple LLMs for Multi-Agent Workflows - Orchestration Patterns That Work
Running a single LLM for every task in an agent workflow is wasteful. Different models excel at different things, and the best multi-agent setups use the right model for each subtask instead of forcing one model to do everything.
The Orchestrator Pattern
The pattern that works best in practice is using Claude as the orchestrator - the central brain that plans, decides, and coordinates - while shelling out to other model CLIs for specific subtasks. Each subtask gets its own config, its own system prompt, and its own model optimized for that particular job.
For example, a workflow that processes documents might use Claude for understanding the overall task and planning steps, a fast local model for text extraction and classification, and a specialized model for code generation or data transformation. The orchestrator decides what to delegate and collects results.
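The delegation step can be sketched as a table of per-task configs plus a small dispatch function. This is a minimal illustration, not a real integration: the CLI names, model names, and flags below are placeholders, and actual model CLIs take different arguments.

```python
import subprocess

# Hypothetical per-task configs: the commands, model names, and system
# prompts here are illustrative stand-ins, not real tool invocations.
TASK_CONFIGS = {
    "plan":    {"cmd": ["claude-cli"], "model": "frontier-model",
                "system": "You are a planner. Break the task into steps."},
    "extract": {"cmd": ["local-model-cli"], "model": "small-local-model",
                "system": "Extract and classify the text in the input."},
    "codegen": {"cmd": ["codegen-cli"], "model": "code-model",
                "system": "Generate a data transformation script."},
}

def run_subtask(task: str, payload: str) -> str:
    """Shell out to whichever model CLI is configured for this subtask."""
    cfg = TASK_CONFIGS[task]
    # Exact flag syntax varies by CLI; treat this as a placeholder shape.
    cmd = cfg["cmd"] + [cfg["model"]]
    result = subprocess.run(cmd, input=f'{cfg["system"]}\n\n{payload}',
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The orchestrator calls `run_subtask("extract", document_text)` and so on, collecting each result before planning the next step.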
Environment Variable Overrides
The simplest way to swap models globally is through environment variables. Set one variable and every subprocess that reads it uses the specified model. This gives you a single knob for switching between development (cheap, fast models) and production (frontier models) without changing any code.
For per-task overrides, pass the model config as part of the subprocess invocation. The orchestrator knows which model each task needs and configures it at launch time.
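One way to configure the model at launch time is to inject it into the child process's environment, as in this sketch (again assuming the illustrative `AGENT_MODEL` variable; real CLIs may instead take a `--model` flag):

```python
import os
import subprocess

def launch_with_model(cmd: list, model: str) -> str:
    """Launch a subtask with its own model set in the child's environment."""
    env = dict(os.environ, AGENT_MODEL=model)  # AGENT_MODEL is illustrative
    return subprocess.run(cmd, env=env, capture_output=True,
                          text=True, check=True).stdout
```

Because the override lives in the child's environment, two subtasks launched from the same orchestrator can run different models concurrently without interfering.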
Why This Beats Single-Model Approaches
Cost drops dramatically when you route simple classification tasks to local models instead of sending everything to a frontier API. Latency improves because local inference skips the network round trip. And you get resilience - if one model provider goes down, you can reroute to alternatives without redesigning the workflow.
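The resilience point can be made concrete with a simple fallback router, sketched here under the assumption that each provider is wrapped in a callable that raises on failure:

```python
def route_with_fallback(providers, prompt):
    """Try each provider callable in order; fall back when one fails."""
    last_err = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as err:  # provider outage, timeout, bad response
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```

The workflow itself never changes; only the ordered list of providers does, which is exactly the separation the single-model approach lacks.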
The key insight is that orchestration logic and execution logic should be separate concerns. The orchestrator does not need to be fast. It needs to be smart. The executors do not need to be smart. They need to be reliable at their specific task.
Fazm is an open-source macOS AI agent, available on GitHub.