AI Agent Model Provider Failover: A Two-Day Pattern
Any AI-powered product that depends on a single model provider is one policy decision, one rate-limit tightening, or one account suspension away from downtime. The news cycle reminds us of this at least once a quarter. The fix is simple in concept and non-trivial in practice: abstract the model layer so you can hot-swap providers. With a clean abstraction and a few patterns, a small team can ship this in about two days. This guide walks through the architecture, the code patterns, and the practical gotchas.
1. Why Single-Provider Is a Production Risk
The list of ways a single-provider setup can fail is longer than it first appears. Rate-limit tightening during peak hours. Model deprecation on 60 days' notice. Pricing changes. Regional outages. Tool-use schema changes that break your integration. Account reviews that escalate into suspensions, sometimes with limited communication. The 2025 Anthropic suspension of the OpenClaw team's account is the high-profile example, but it is not the only one. The pattern repeats.
The cost of being down is almost always higher than the cost of spending a few days to build redundancy. Customers do not care whose API is broken. If your product does not work, you wear it. The cost of failover is also a one-time expense. The cost of an outage is a recurring hit to trust every time it happens.
The good news is that the major model providers have converged on a similar API surface. Messages with roles, tool definitions, structured outputs. The differences are in the wrapper, not the core semantics. That convergence is what makes a two-day failover implementation realistic in 2026.
2. The Model Abstraction Layer
The core of the failover pattern is a single interface your application code talks to. Your business logic never imports the Anthropic SDK or the OpenAI SDK directly. It talks to a local ModelClient type that you control.
A minimal interface covers the three shapes of call you will make: non-streaming completion, streaming completion, and completion-with-tools. Something like:
interface ModelClient {
complete(req: CompletionRequest): Promise<CompletionResponse>;
stream(req: CompletionRequest): AsyncIterable<StreamChunk>;
completeWithTools(req: ToolRequest): Promise<ToolResponse>;
}
interface CompletionRequest {
messages: Message[];
system?: string;
maxTokens?: number;
temperature?: number;
modelTier: "frontier" | "fast" | "cheap";
}

Notice the use of modelTier instead of a specific model ID. The abstraction is not about pinning to a specific model. It is about describing what quality level you need. Each provider implementation maps tiers to their specific models: frontier might be Claude Opus or GPT-5, fast might be Sonnet or GPT-5 Mini, cheap might be Haiku or GPT-5 Nano. The calling code does not care which.
The Message and CompletionResponse types should be your own, not vendor types. Each provider adapter translates to and from your internal shape. This is non-negotiable. The moment your business logic holds vendor-shaped objects, you have leaked the abstraction and failover becomes a refactor.
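To make the translation concrete, here is a minimal sketch of the response side of an adapter. The `VendorResponse` type below is a stand-in, not a real SDK type; it mimics the common pattern of returning content as an array of typed blocks. The point is that business logic only ever sees the internal `CompletionResponse`:

```typescript
// Internal types owned by the application, never vendor types.
interface CompletionResponse {
  text: string;
  stopReason: "end" | "max_tokens";
}

// Stand-in for a vendor response shape (hypothetical, for illustration):
// many providers return content as an array of typed blocks.
interface VendorResponse {
  content: { type: string; text?: string }[];
  stop_reason: string;
}

// The adapter owns the translation. If the vendor changes their shape,
// only this function changes; business logic is untouched.
function fromVendor(resp: VendorResponse): CompletionResponse {
  const text = resp.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
  return {
    text,
    stopReason: resp.stop_reason === "max_tokens" ? "max_tokens" : "end",
  };
}
```

The request side is symmetric: a `toVendor` function maps your internal `CompletionRequest` (including the tier-to-model lookup) into the provider's call shape.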
Libraries like Vercel's AI SDK, litellm, and OpenRouter offer some of this abstraction out of the box. Using one of them is a reasonable shortcut, with the caveat that you are now depending on their semantics matching across providers. Rolling your own thin adapter is 200 lines of code per provider and gives you complete control.
3. Routing and Failover Logic
Once you have an abstraction, routing is where the failover actually happens. The simplest pattern is a prioritized list of providers per tier, with automatic fallback on failure.
async function callWithFailover(req: CompletionRequest) {
const providers = routingTable[req.modelTier];
for (const provider of providers) {
try {
return await provider.complete(req);
} catch (err) {
if (isRetryable(err)) {
logFailover(provider.name, err);
continue;
}
throw err;
}
}
throw new Error("all providers failed");
}

The isRetryable function is where the interesting logic lives. You want to fail over on 429s (rate limits), 503s (overloaded), timeouts, and provider-specific suspension errors. You do not want to fail over on 400s (bad requests from your code), because those will fail identically on the next provider.
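A sketch of that classification, assuming each adapter normalizes its SDK errors into a small internal shape (the `ProviderError` fields and the `account_suspended` code are illustrative, not a real SDK's error format):

```typescript
// Normalized error shape each adapter throws (an assumption of this sketch).
interface ProviderError {
  status?: number;    // HTTP status, if the error carried one
  code?: string;      // provider-specific code, e.g. "account_suspended"
  isTimeout?: boolean;
}

// Provider-specific codes worth failing over on.
const RETRYABLE_CODES = new Set(["account_suspended", "overloaded"]);

function isRetryable(err: ProviderError): boolean {
  if (err.isTimeout) return true;
  // 429 (rate limit) and 503 (overloaded) are exactly what failover is for.
  if (err.status === 429 || err.status === 503) return true;
  // Other 4xx errors mean our request is malformed; it will fail everywhere.
  if (err.status !== undefined && err.status >= 400 && err.status < 500) {
    return false;
  }
  if (err.code !== undefined && RETRYABLE_CODES.has(err.code)) return true;
  // Remaining 5xx and unknown transport errors: worth trying the next provider.
  return true;
}
```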
Add a circuit breaker. If provider A has failed five times in the last minute, stop trying it for a few minutes. This prevents a sick provider from adding latency to every request while you wait for the timeout.
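A minimal per-provider breaker can be a few dozen lines. This sketch takes timestamps as parameters so the logic is testable; the thresholds (five failures per minute, a few minutes of cooldown) come straight from the rule of thumb above:

```typescript
// Per-provider circuit breaker: after `threshold` failures within `windowMs`,
// skip the provider for `cooldownMs` before trying it again.
class CircuitBreaker {
  private failures: number[] = [];
  private openUntil = 0;

  constructor(
    private threshold = 5,
    private windowMs = 60_000,
    private cooldownMs = 180_000,
  ) {}

  recordFailure(now: number = Date.now()): void {
    this.failures.push(now);
    // Keep only failures inside the sliding window.
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    if (this.failures.length >= this.threshold) {
      this.openUntil = now + this.cooldownMs;
      this.failures = [];
    }
  }

  // When open, the router skips this provider entirely.
  isOpen(now: number = Date.now()): boolean {
    return now < this.openUntil;
  }
}
```

In the failover loop, the only change is a guard at the top: `if (breakers[provider.name].isOpen()) continue;` and a `recordFailure()` call in the catch block.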
Log every failover event with enough detail to investigate later: which provider, which error, which request shape, which user. When something goes wrong at 3 AM you want a clear trail.
Consider traffic splitting for active health monitoring. Send 1 percent of traffic to each provider regardless of priority to keep them warm and to catch degradation early. If provider B is silently broken and you only discover it during a failover, you have a bad evening.
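One way to sketch that split, with an injectable random source for testability (the 1 percent rate and provider names are placeholders):

```typescript
// Route a small sample of traffic to each non-primary provider so
// degradation shows up in metrics before a real failover depends on them.
function pickProvider(
  providers: string[],
  sampleRate = 0.01,
  rand: () => number = Math.random,
): string {
  for (const candidate of providers.slice(1)) {
    if (rand() < sampleRate) return candidate;
  }
  return providers[0]; // default: highest-priority provider
}
```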
4. Handling Tool Use Across Providers
Plain completion failover is easy. Tool use is where the abstraction earns its keep, because the tool schemas differ between providers in small but real ways.
Anthropic uses an input_schema field nested inside a tool object. OpenAI uses function-calling with a parameters field. Google Gemini has its own shape. The differences are mostly naming. The hard part is when a provider supports a feature the others do not: parallel tool calls, forced tool choice, strict schema enforcement.
For failover, the sane move is to target the intersection of features across your chosen providers. Use parallel tool calls if all your providers support them. Otherwise serialize. This is a compromise. The alternative is writing different control flow per provider, which defeats the point of the abstraction.
Translate tool definitions in the adapter layer. Your business logic defines tools once, in your internal shape. Each provider adapter converts to the provider-specific format on the way out and converts tool-call responses back on the way in.
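A sketch of the outbound half of that translation. The internal `ToolDef` type is this article's own shape; the two target shapes follow the publicly documented conventions (Anthropic nests the JSON Schema under `input_schema`, OpenAI's function calling nests it under `parameters`), but verify against current provider docs before relying on them:

```typescript
// Internal tool definition, owned by the app and defined once.
interface ToolDef {
  name: string;
  description: string;
  schema: Record<string, unknown>; // JSON Schema for the tool's arguments
}

// Anthropic-style: schema lives under `input_schema`.
function toAnthropicTool(tool: ToolDef) {
  return {
    name: tool.name,
    description: tool.description,
    input_schema: tool.schema,
  };
}

// OpenAI-style function calling: schema lives under `parameters`.
function toOpenAITool(tool: ToolDef) {
  return {
    type: "function" as const,
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.schema,
    },
  };
}
```

The inbound half mirrors this: each adapter maps the provider's tool-call response blocks back into one internal tool-call shape before the agent loop sees them.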
Test the tool use path on every provider with the same agent loop. The common failure mode is "works on Anthropic, times out in a weird way on OpenAI because of a stop condition we never noticed." Find this in staging, not in production.
5. A Two-Day Rollout Plan
Shipping provider failover in roughly two days of focused work is realistic for a small team. Here is a schedule that works.
Day 1 morning: design the internal types. Message, CompletionRequest, CompletionResponse, ToolRequest, ToolResponse, StreamChunk. Keep them simple. Do not try to capture every provider feature; capture what your app actually uses.
Day 1 afternoon: implement two provider adapters. Anthropic and OpenAI are the easiest starting pair because documentation and SDKs are mature. Unit-test them against a shared fixture: same input, verify compatible output shape.
Day 1 evening: wire the failover function with a simple prioritized list. No circuit breaker yet, just try-next-on-retryable-error. Confirm both providers answer the same test prompts with similar quality at the same tier.
Day 2 morning: update the business logic to call the abstraction instead of the provider SDK directly. This is the bulk of the work if the codebase has grown. Grep for direct SDK imports and route them through the new client.
Day 2 afternoon: add observability. Metrics for requests per provider, failovers per minute, latency per provider, error rate per provider. Dashboards. Alerts. This is the difference between blind failover and visible failover.
Day 2 evening: add the circuit breaker and a small health-check job that pings each provider on a schedule. Configure a kill switch (an environment variable or config flag) that lets you manually force traffic to one provider in case of a bad rollout.
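The kill switch itself is a few lines. A sketch, assuming the forced value comes from an environment variable or config flag (the variable name and provider names are examples):

```typescript
// Kill switch: if a provider is forced via config (e.g. an env var such as
// FORCE_PROVIDER, name hypothetical), pin all traffic to it; otherwise use
// the normal priority list.
function effectiveProviders(providers: string[], forced?: string): string[] {
  if (forced !== undefined && providers.includes(forced)) return [forced];
  if (forced !== undefined) {
    console.warn(`kill switch names unknown provider: ${forced}`);
  }
  return providers;
}
```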
Optional day 3: add a third provider (Gemini or a hosted open model) for true diversity. Having three providers in rotation handles the case where one is down and the second is degraded at the same time.
6. Pitfalls and What Not to Bother With
Do not try to paper over quality differences with smart routing logic on day one. "Use the best provider for each kind of request" is a fun engineering project and a very boring one to debug. Start with a flat priority order and only specialize once you have data.
Do not obsess over exactly matching output. Different providers will answer slightly differently. If your product tolerates that variance when you manually switch, it will tolerate it during an automatic failover.
Do not fail over on 400-class errors. A malformed request on Anthropic will be malformed on OpenAI. Fail fast to your own error handling.
Do not forget about streaming. Streaming semantics are the area where providers diverge the most (chunk shapes, end signals, tool-call delta streaming). Decide early whether your app needs streaming failover or only completion failover, because the first is meaningfully more code.
Do not pick every provider. Two is enough to start. Three is plenty. Supporting every provider on day one is a distraction.
The core payoff is durable. Once the abstraction is in place, your code is not tied to any single vendor's fortunes. You can move a percentage of traffic to a new provider the day a cheaper or better option ships, without touching any business logic. And when the next suspension or outage happens, you find out from your metrics dashboard rather than from your users.