A local AI agent fallback model that actually re-runs your query instead of showing you an error
Most fallback-model writeups describe abstract provider-routing SDKs. Fazm ships a narrower thing on a real Mac: when one path to Claude returns 'model not available,' the agent flips its bridge mode, re-sends the exact same message text on the other path, and the answer lands in the same chat. Two paths, three substring patterns, one auto-retry. Everything below is in the public Swift source.
What the SERP misses about local agent fallbacks
If you searched 'local ai agent fallback model' you got the same three articles I did: provider-router SDKs, Hermes fallback chains, and the SLM-default LLM-fallback pattern. They all describe a generic shape: try provider A, on failure try provider B. None of them describes a desktop app where the failover is between two credentials for the same model family, with a hard cost cap on one side and a model-entitlement check on the other.
That gap is what this page is about. Fazm's fallback isn't Claude-to-GPT. It's bundled-key Claude to subscription Claude, in either direction, with an automatic replay of the message text you already sent.
The shape of the fallback
2 bridge modes · 3 error patterns · 1 auto-retry
Two access paths, three substring matches that flip the path, one re-execution of the failed query. The reverse trigger (builtin to personal) is a single >= comparison against builtinCostCapUsd.
Numbers you can verify yourself
All four — two bridge modes, three error patterns, one auto-retry, and the $10 cap — are literal Swift values from the public source tree. Clone the repo, open ChatProvider.swift, and the lines match.
Anchor fact: the matcher and the auto-retry, in real Swift
The whole behavior is three short blocks of code in one file. The matcher is dumb on purpose: a lowercase + substring check so any of Anthropic's wording variations across endpoints still triggers the same flip. The retry is a single flag plus one recursive call after the bridge has restarted.
“auto-retrying query after model access fallback to builtin”
Desktop/Sources/Providers/ChatProvider.swift:2970 (literal log line)
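The matcher can be sketched in a few lines of Swift. This is a reconstruction from the three patterns described on this page, not a verbatim copy of the block at ChatProvider.swift:1362:

```swift
// Reconstruction of the isModelAccessError matcher described on this page:
// lowercase the message, then test three substring pairs. Deliberately dumb,
// so wording variations across Anthropic endpoints still trigger the flip.
func isModelAccessError(_ message: String) -> Bool {
    let m = message.lowercased()
    return (m.contains("may not exist") && m.contains("not have access"))
        || (m.contains("model") && m.contains("not found"))
        || (m.contains("model") && m.contains("not available"))
}
```

Substring matching over exact string equality is the point: the same function catches "model claude-sonnet-4-6 may not exist or you do not have access to it" and any future rewording that keeps those fragments.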
What the bridge sees, end to end
The agent's chat layer talks to a Node-based ACP bridge, which talks to Claude. When the bridge throws a BridgeError.agentError with a 'not have access' string in it, the chat layer's catch block intercepts, switches the bridge's mode, and replays the original user text against the new bridge.
Auto-retry across a bridge-mode flip
The full sequence, step by step
Five stages from the original sendMessage call to the answer arriving in the chat after the mode flip. The middle stages are invisible to the user.
User sends a message in personal mode
bridgeMode is 'personal', so the ACP bridge is talking to claude.ai through the user's OAuth. The selected model is whatever the floating bar's model picker is set to. The message text is captured into pendingRetryMessage at the top of sendMessage so it can be retried later if anything fails.
Anthropic returns a model-access error
The personal account isn't entitled to that model. The error message comes back as a BridgeError.agentError with a string like 'model claude-sonnet-4-6 may not exist or you do not have access to it.' isModelAccessError matches it on substring patterns at ChatProvider.swift:1362.
Catch block flips two flags and clears the surface error
pendingBridgeModeSwitch is set to 'builtin' and retryAfterModelFallback is set to true. errorMessage is set to nil so the UI does not flash the error to the user. pendingRetryMessage is intentionally NOT cleared so the retry has the original text to send.
Bridge restart applies in the new mode
applyPendingBridgeRestart and applyPendingBridgeModeSwitch tear down the personal-mode bridge and bring up a builtin-mode one with the bundled Anthropic API key. Per-mode UserDefaults keep the session IDs separate so a stale personal session doesn't get resumed under the API key.
Auto-retry fires the original message
Right after the mode switch, the post-query handler checks retryAfterModelFallback. It reads the original text out of pendingRetryMessage, clears it, and calls sendMessage(retryText). That second call goes through the new bridge with the bundled key. The answer streams back into the same chat session.
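The five stages above can be compressed into a schematic Swift sketch. The BridgeError case and the stubbed bridgeSend are stand-ins, not Fazm's real types; only the flag choreography (pendingRetryMessage, retryAfterModelFallback, the cleared errorMessage) mirrors what this page describes:

```swift
enum BridgeError: Error { case agentError(String) }

final class ChatFlowSketch {
    var bridgeMode = "personal"
    var pendingRetryMessage: String?
    var retryAfterModelFallback = false
    var errorMessage: String?
    var replies: [String] = []

    // Stand-in for the real ACP bridge call: personal mode lacks the model.
    func bridgeSend(_ text: String) throws -> String {
        if bridgeMode == "personal" {
            throw BridgeError.agentError(
                "model claude-sonnet-4-6 may not exist or you do not have access to it.")
        }
        return "answer to: \(text)"
    }

    func isModelAccessError(_ message: String) -> Bool {
        let m = message.lowercased()
        return (m.contains("may not exist") && m.contains("not have access"))
            || (m.contains("model") && m.contains("not found"))
            || (m.contains("model") && m.contains("not available"))
    }

    func sendMessage(_ text: String) {
        pendingRetryMessage = text            // stage 1: capture for a possible retry
        do {
            replies.append(try bridgeSend(text))
            pendingRetryMessage = nil         // cleared only after success
        } catch BridgeError.agentError(let msg) where isModelAccessError(msg) {
            bridgeMode = "builtin"            // stages 3-4: flip mode (bridge restart elided)
            retryAfterModelFallback = true
            errorMessage = nil                // the user never sees the error
        } catch {
            errorMessage = "\(error)"
        }
        // Stage 5: the post-query handler replays the original text on the new mode.
        if retryAfterModelFallback, let retryText = pendingRetryMessage {
            retryAfterModelFallback = false
            pendingRetryMessage = nil
            sendMessage(retryText)
        }
    }
}
```

Note that the retry is just a second call into the same sendMessage, which is why pendingRetryMessage must survive the catch block: it is the only copy of the original text.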
What goes in, what comes out
Three inputs feed the decision: the user's original message, the error string from the failed call, and the current bridge mode. The router has three possible outcomes, and only one of them shows the user a problem.
Fallback decision flow
The reverse trigger: the $10 cost cap
The forward path (personal to builtin) is a reaction. The reverse path (builtin to personal) is a budget. Each Fazm user has a single hard cap on the bundled Anthropic key: ten dollars of cumulative spend, tracked locally and seeded from Firestore on startup. When that cap is reached, the next query goes out on the user's personal Claude account instead.
This check happens twice: once before the query (line 2272), so a user already over the cap goes straight to personal mode, and once after the query (line 2814), so a single expensive query that crosses the threshold flips the mode for everything that follows.
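A minimal sketch of that double check, assuming the value names this page cites (builtinCostCapUsd, builtinCumulativeCostUsd); the two routing functions are hypothetical simplifications, not Fazm's real code:

```swift
let builtinCostCapUsd = 10.0
var builtinCumulativeCostUsd = 0.0
var bridgeMode = "builtin"

// Pre-query guard: a user already over the cap goes straight to personal mode.
func routeBeforeQuery() {
    if bridgeMode == "builtin" && builtinCumulativeCostUsd >= builtinCostCapUsd {
        bridgeMode = "personal"
    }
}

// Post-query check: a single expensive query that crosses the threshold
// flips the mode for every query that follows.
func recordQueryCost(_ usd: Double) {
    guard bridgeMode == "builtin" else { return }
    builtinCumulativeCostUsd += usd
    if builtinCumulativeCostUsd >= builtinCostCapUsd {
        bridgeMode = "personal"
    }
}
```

The reverse trigger really is just that one >= comparison, applied at two moments in the query lifecycle.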
What it looks like in the log
If you tail Fazm's log file while the fallback fires, you can watch the transition happen. The 'auto-retrying' line is the literal log statement at ChatProvider.swift:2970, no rephrasing.
How this differs from a generic provider router
The standard recipe described in most 'fallback model' articles assumes you own the model account on both sides. Fazm's setup is different because one of the two paths is the user's own consumer subscription, not a developer key, and the agent has to be polite about which one it spends.
| Feature | Generic provider router | Fazm local agent |
|---|---|---|
| What 'fallback' means | Swap to a different provider (OpenAI, Mistral, local Llama) | Swap which credential pays for the same Claude call |
| Trigger condition | Generic 5xx, timeout, or refusal heuristic | Three substring patterns matched in isModelAccessError, plus a $10 cost cap |
| Auto-retry of the failed query | Sometimes; often surfaces a 'try again' button | Yes, automatic. retryAfterModelFallback flag, sendMessage(retryText) on the new mode |
| What the user sees while it happens | Visible error then a re-prompt, or a model-name change | Loading state through the bridge restart, then the answer in the same chat |
| Where state lives | Cloud session, server-side router | Local Swift class, AppStorage flags, per-mode session IDs in UserDefaults |
| Cold-start experience | User adds API keys before they can try the agent | Bundled key gives free first $10 of usage. Personal mode is opt-in |
| Reverse direction | Rare. Failover is usually one-way | Yes. Hitting the $10 cap on builtin auto-switches to personal |
| Conversation continuity | Often a new session in the new provider | Per-mode session IDs and local SQLite restore preserve history across the flip |
Why this matters for a Mac agent specifically
On a desktop agent, the user's relationship with the model vendor is not abstract. They probably already pay for Claude Pro or Max. A failed local action that says 'model not available' is a bad first impression because the user has no idea whether the problem is their account, the agent, or the app's bundled key. A bridge that quietly swaps which credential is used and replays the same query is the difference between an agent that feels brittle and an agent that feels like it just works.
Fazm pushes this further by making the bundled key a free $10 of cold-start usage. New users do not have to enter an API key or sign into anything to get past the first message. After they hit the cap, the same fallback machinery moves them onto their own subscription. The user does not have to know any of this is happening.
Try the agent that doesn't show you the error
Install Fazm. The bundled $10 of bridge usage covers most users for the first week or two. After that, sign in with your Claude account and the fallback runs in reverse.
Download Fazm for Mac →
Building this yourself
If you want to replicate the pattern in another local agent project, the four pieces you need are: a per-mode session registry so a stale session from one mode does not get resumed on another, a single 'pendingRetryMessage' slot that is set on send and cleared on success, a substring matcher (or your language's equivalent) for the model-access error shape, and a post-query block that runs after the bridge restart and decides whether to replay.
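The first of those four pieces, the per-mode session registry, can be as small as a key-naming convention. This sketch borrows the UserDefaults key shape this page cites (floatingACPSessionId_&lt;mode&gt;); the helper names are hypothetical:

```swift
import Foundation

// Per-mode session registry: one saved session ID per (surface, mode) pair,
// so a stale personal-mode session is never resumed under the builtin key.
func sessionKey(surface: String, mode: String) -> String {
    "\(surface)ACPSessionId_\(mode)"
}

func savedSessionId(surface: String, mode: String,
                    defaults: UserDefaults = .standard) -> String? {
    defaults.string(forKey: sessionKey(surface: surface, mode: mode))
}

func saveSessionId(_ id: String, surface: String, mode: String,
                   defaults: UserDefaults = .standard) {
    defaults.set(id, forKey: sessionKey(surface: surface, mode: mode))
}
```

Keying by mode rather than overwriting a single session ID is what lets the bridge restart resume the right conversation after a flip in either direction.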
Anything more elaborate than that (token bucket retries, exponential backoff, vendor-agnostic adapters) is overkill for the model-access case specifically. The error is deterministic. Either the account has the model or it doesn't. There's no value in retrying the same path more than once.
The non-obvious part is that the failed query's text has to survive the catch block. Fazm sets pendingRetryMessage at the top of sendMessage and only clears it after a successful completion or an unrecoverable error. That's why the auto-retry can read it back at line 2968 and call sendMessage again with the original text.
Frequently asked questions
What does 'fallback model' actually mean for a local AI agent on a Mac?
In Fazm's case it does not mean swapping families like Claude to GPT to Gemini. It means swapping the access path to the same model. Fazm has two bridge modes: builtin (a bundled Anthropic API key, capped at $10 of usage per user) and personal (the user's own Claude Code OAuth, which uses their Pro or Max subscription). When one path returns a model-access error or runs out of credit, the local agent transparently flips to the other and re-runs the failed query. The user sees the answer, not the error.
Where is this fallback logic actually defined?
Desktop/Sources/Providers/ChatProvider.swift in the public Fazm source tree. The matcher that detects a model-access error is at line 1362 (isModelAccessError). The catch block that triggers the fallback is at lines 2943 to 2953. The auto-retry that re-runs the user's exact message after the bridge mode switches is at lines 2967 to 2972. The reverse trigger, the $10 cost cap that pushes builtin users to their personal account, is at line 476 (builtinCostCapUsd = 10.0).
What error strings actually count as a model-access error?
isModelAccessError lowercases the message and returns true if any of three patterns match: ('may not exist' AND 'not have access'), ('model' AND 'not found'), or ('model' AND 'not available'). That covers the common shapes Anthropic returns when CLI credentials exist but the account or plan does not include the requested model (for example, claude-sonnet-4-6 on a personal account that has not been entitled to it). Generic 5xx errors and rate limits go through different branches and do not trigger this fallback.
Does the user have to retry manually?
No. The flag retryAfterModelFallback is set inside the same catch block that requests the bridge mode switch. After applyPendingBridgeModeSwitch() finishes restarting the ACP bridge in the new mode, the post-query handler checks that flag, pulls the original message text out of pendingRetryMessage, and calls sendMessage(retryText) again. From the UI side, the original user message stays in the chat, the loading spinner keeps spinning through the bridge restart, and the answer arrives over the new mode.
What happens in the other direction, when the bundled API key runs out of credits?
The pre-query guard at ChatProvider.swift:2272 checks builtinCumulativeCostUsd against builtinCostCapUsd ($10.0). If the user has hit the cap, switchBridgeMode(to: 'personal') fires before the query goes out, the credit-exhausted alert is shown, and the query proceeds on the user's personal Claude account instead. The same check fires post-query at line 2814 to catch the moment the cap is crossed during a single query.
Why two access paths to the same model instead of two different model providers?
The Mac agent's strength is that the user already has Claude. Most Fazm users either have a Claude Pro or Max subscription, or they want to try the agent without setting up an Anthropic API account first. The bundled API key gives a cold-start experience with no signup. Once the cap is reached, the agent transitions the user to their own subscription. If their subscription account does not have access to the chosen model, it falls back the other way. The point is that the user never has to think about which credential is in use.
What about rate limits? Do those trigger the fallback too?
Rate limits are handled separately and do not flip the bridge mode. The credit-exhausted branch at ChatProvider.swift:2902 specifically checks for the pattern 'resets <something>' in the error message to distinguish a temporary five-hour or seven-day rate limit from real credit exhaustion. If it's a rate limit on the builtin account, the mode stays put and the user sees a wait-and-retry message. If it's an actual cap or hard credit failure, the mode switches and the user is moved to personal.
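That split can be sketched as a tiny classifier. The 'resets' substring heuristic is the one this page describes at ChatProvider.swift:2902; the enum and function names here are hypothetical:

```swift
enum BuiltinFailure { case rateLimited, creditExhausted }

// A temporary rate limit tells you when it resets; a hard credit
// failure does not. Only the latter flips the bridge mode.
func classifyBuiltinFailure(_ message: String) -> BuiltinFailure {
    message.lowercased().contains("resets") ? .rateLimited : .creditExhausted
}
```

A rate-limited result leaves the mode alone and shows a wait-and-retry message; a credit-exhausted result routes into the same mode-switch machinery as the cost cap.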
Is the user's chat history preserved across the fallback?
Yes. Per-mode session IDs live in UserDefaults under floatingACPSessionId_<mode> and mainACPSessionId_<mode>, and floating-chat history is restored from local SQLite before the bridge restarts (see the comment at ChatProvider.swift:1031). When the new bridge starts, it can either resume the existing ACP session for that mode or start a fresh one seeded with the conversation history that was just restored from disk. Either way, the next message lands in the same conversation the user was already in.
Can I see this fallback fire on a real run?
Yes. Set bridgeMode to 'personal' in your AppStorage, then ask Fazm to use a model your personal Claude account does not have access to (for instance, set the model selector to claude-sonnet-4-6 if you only have Pro). On the first send, the personal bridge will return a 'model may not exist or not have access' error. In the logs you will see two lines from ChatProvider: one about the model access error and one that begins 'auto-retrying query after model access fallback to builtin'. The query goes out a second time over the bundled API key and the answer comes back.
Does this work with non-Anthropic models?
Not today. Both bridge modes route through the Agent Client Protocol bridge to a Claude Code subprocess. The thinking model itself is always a Claude family model (Haiku, Sonnet, Opus). The fallback is between two ways of paying for that Claude call, not between Claude and a third party. The other model in the system, Gemini, is used for the proactive task observer and is not part of this code path.
Source links if you want to read it cold
Every line referenced on this page lives in Desktop/Sources/Providers/ChatProvider.swift inside the public Fazm desktop source tree. Look for isModelAccessError, retryAfterModelFallback, pendingRetryMessage, and builtinCostCapUsd to land on the relevant blocks.