VERIFIED 2026-05-08 / STATUS.CLAUDE.COM

Parallel Claude Code agents do not survive a Claude Code outage

They all hit the same upstream. When api.anthropic.com sneezes, every subagent, every worktree, every parallel session in your team stops at the same moment. Reducing concurrency does not help. Spawning more subagents does not help. The only thing that helps is pointing some of those agents at a different upstream, and that is one environment variable, ANTHROPIC_BASE_URL, that Fazm exposes as a single field in Settings. This guide covers verified May 2026 outage data, why subagents share a failure mode, and the exact source line that wires the override.

Matthew Diakonov
9 min read
DIRECT ANSWER (VERIFIED 2026-05-08)

Do parallel agents survive a Claude Code outage?

Not by default. Every Claude Code instance, every Task-spawned subagent, and every parallel worktree session points at api.anthropic.com. One degraded upstream takes them all down at once. Real mitigation is rerouting the client through a different base URL (corporate proxy, Copilot bridge, Bedrock or Vertex region) or switching to a different model family entirely. Source of truth for incidents is status.claude.com; on May 8, 2026 it logged three separate incidents in one day.

The shape of the failure mode

When you run parallel Claude Code agents, what you have is one orchestrator process spawning many client connections, all bearing the same API key, all resolving the same hostname, all terminating at the same TLS frontend at Anthropic. There is no transparent fan out across providers. There is no hot standby. The redundancy you imagine you bought by spinning up four worktrees and a Task tool fanned out three deep is, from the upstream's point of view, a single tenant pushing a load-balanced burst at a single endpoint.

When that endpoint sneezes, all of your agents sneeze. The status page on May 8, 2026 logged three incidents inside one work day. Inside one of those windows, your fleet of parallel agents was serializing on the recovery, not chewing through tickets. Most of the advice that ranks for this query is some version of “wait, retry, reduce concurrency.” That advice describes how to be well-behaved under rate limits. It does not describe how to keep working when the upstream itself is broken.

The thing that helps is breaking the shared dependency. You do that by pointing the client at a different upstream. Two paths: a different gateway that speaks the Anthropic message format (corporate proxy, Copilot bridge, local LLM bridge, Bedrock or Vertex region), or a different model family entirely (GPT through Codex). The first path is one environment variable (ANTHROPIC_BASE_URL). The second is a backend toggle.

What helps and what does not

Six things you can do during an outage, ranked by whether they actually decouple your agents from the broken upstream.

1. Default behaviour: every Claude Code instance, every Task subagent, every parallel worktree session, all using api.anthropic.com. The outage hits the upstream once; every agent stops once.

2. Concurrency knobs: CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY, queue depth caps, smaller subagent counts. The real fix for 429s and rate limiting; zero help when the upstream is returning 5xx (the sketch after this list spells out the difference).

3. Model switching with /model: works when one Claude family is degraded and another is healthy (e.g. the May 7 Opus 4.7 incident). Fails when the incident is 'across Claude Models'.

4. Fallback CLIs in YAML: useful for transient errors, exponential backoff, and per-model retries. Still hits the same gateway if every entry points at api.anthropic.com.

5. Different upstream: ANTHROPIC_BASE_URL pointed at a corporate proxy, a GitHub Copilot bridge, or a non-Anthropic Bedrock or Vertex region. The single change that actually decouples your agents from the outage.

6. Different model family: Fazm's Codex backend toggle (advanced.codex.toggle in SettingsSidebar.swift). Adds GPT-5 family models through OpenAI's Codex CLI. Independent platform, independent status page.

Three incidents logged in a single day on status.claude.com, May 8, 2026: 'Elevated errors across Claude Models', 'Elevated errors on Claude Sonnet 4.6', and 'Elevated Errors on File Operations'.
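The 429-versus-5xx split above is the crux, so here is a minimal sketch of it. This is purely illustrative, not how Fazm or Claude Code handles retries internally; the UpstreamPolicy type is invented, and the fallback URL only reuses the shape of the Settings placeholder.

    import Foundation

    // Illustrative policy: back off on rate limits, switch upstreams on server errors.
    struct UpstreamPolicy {
        let primary = URL(string: "https://api.anthropic.com")!
        let fallback = URL(string: "https://your-proxy:8766")!   // placeholder shape from Settings
        private(set) var usingFallback = false
        var current: URL { usingFallback ? fallback : primary }

        // Returns how long to wait before retrying, or nil when no retry is warranted.
        mutating func handle(status: Int, attempt: Int) -> TimeInterval? {
            switch status {
            case 429:
                // Rate limited: lower concurrency and back off; the limiter recovers on its own.
                return min(pow(2.0, Double(attempt)), 60)
            case 500...599:
                // Upstream is broken: waiting does not help, reroute to the other endpoint.
                usingFallback.toggle()
                return 0
            default:
                return nil   // success, or a client error that a retry will not fix
            }
        }
    }

Concurrency knobs only ever touch the first branch; the base URL override and the Codex toggle exist for the second.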

The single line that does the rerouting

Fazm runs Claude Code under the hood through a small Node bridge called acp-bridge. When the Swift app spawns the bridge, it builds an environment for the child process. One conditional in ACPBridge.swift checks the user-set customApiEndpoint value in UserDefaults. If it is non-empty, the bridge process inherits a redirected ANTHROPIC_BASE_URL and the Anthropic SDK happily uses it for every subsequent request.

Desktop/Sources/Chat/ACPBridge.swift
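The only line in that file the rest of this guide depends on is the assignment quoted in the FAQ below, env["ANTHROPIC_BASE_URL"] = customEndpoint. Here is a minimal sketch of how the spawn path around it reads; apart from that assignment and the customApiEndpoint key, every name and detail below is an assumption, not the actual source.

    import Foundation

    // Sketch, not the real ACPBridge.swift: builds the child environment for acp-bridge.
    func launchBridge(nodePath: String, bridgeScript: String) throws -> Process {
        var env = ProcessInfo.processInfo.environment

        // Read the Settings field; an empty string means "use the default Anthropic API".
        let customEndpoint = UserDefaults.standard
            .string(forKey: "customApiEndpoint")?                // key name taken from the article
            .trimmingCharacters(in: .whitespaces) ?? ""

        if !customEndpoint.isEmpty {
            // The single line that does the rerouting: every request the Anthropic SDK
            // makes inside the bridge now goes to the custom upstream.
            env["ANTHROPIC_BASE_URL"] = customEndpoint
        }

        let bridge = Process()
        bridge.executableURL = URL(fileURLWithPath: nodePath)    // e.g. the bundled node binary
        bridge.arguments = [bridgeScript]
        bridge.environment = env
        try bridge.run()
        return bridge
    }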

The Settings UI for that field lives in Desktop/Sources/MainWindow/Pages/SettingsPage.swift around line 984. Placeholder text: https://your-proxy:8766. Help line: “Route API calls through a custom endpoint (e.g. local LLM bridge, corporate proxy, or GitHub Copilot bridge). Leave empty to use the default Anthropic API.” If you want to verify the wiring before trusting it during an actual outage, the customApiEndpoint string is searchable from the Fazm source tree at github.com/m13v/fazm.
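For orientation, a generic SwiftUI sketch of a field with that placeholder and help line. The view name, layout, and @AppStorage binding are invented; only the label, placeholder, and help text come from the article.

    import SwiftUI

    // Generic sketch; the real SettingsPage.swift layout and property names will differ.
    struct CustomEndpointField: View {
        @AppStorage("customApiEndpoint") private var customApiEndpoint = ""

        var body: some View {
            VStack(alignment: .leading, spacing: 4) {
                Text("Custom API Endpoint").font(.headline)
                TextField("https://your-proxy:8766", text: $customApiEndpoint)
                    .textFieldStyle(.roundedBorder)
                Text("Route API calls through a custom endpoint (e.g. local LLM bridge, corporate proxy, or GitHub Copilot bridge). Leave empty to use the default Anthropic API.")
                    .font(.caption)
                    .foregroundStyle(.secondary)
            }
        }
    }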

Mitigation in four steps

What to do the next time the status page lights up while you are mid-task.

  1. Confirm it is upstream: check status.claude.com first. Three incidents on May 8 alone. If your retries are 5xx with the same correlation across machines, you are inside an Anthropic incident, not a local issue.

  2. Pick a different upstream: a corporate proxy you already operate, a GitHub Copilot Anthropic-compatible bridge, a self-hosted local LLM bridge, or a Bedrock or Vertex region in a different cloud. A quick probe for sanity-checking a candidate endpoint follows these steps.

  3. Set the override: in Fazm, Settings > Advanced > AI Chat > Custom API Endpoint. Paste the URL (the placeholder shows the shape: https://your-proxy:8766). Restart the bridge from the same panel.

  4. Or switch model family: if the alternative upstream is also flaky, flip the Codex backend toggle and pick a GPT-5 model from the dropdown. Same window, same conversation, different platform underneath.
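The probe mentioned in step 2, sketched below, assumes only the public Anthropic /v1/messages request shape; the baseURL, apiKey, and model string are placeholders you substitute with your own.

    import Foundation

    // Send a one-token request to a candidate upstream and check that it answers
    // like an Anthropic-format endpoint. Run this before the outage, not during it.
    func probeEndpoint(baseURL: String, apiKey: String) async throws -> Int {
        var request = URLRequest(url: URL(string: "\(baseURL)/v1/messages")!)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "content-type")
        request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
        request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")

        let body: [String: Any] = [
            "model": "claude-sonnet-4-6",                      // placeholder; whatever the endpoint serves
            "max_tokens": 1,
            "messages": [["role": "user", "content": "ping"]]
        ]
        request.httpBody = try JSONSerialization.data(withJSONObject: body)

        let (_, response) = try await URLSession.shared.data(for: request)
        // 200: the endpoint speaks the format. 401/403: format fine, credentials wrong.
        // Anything else: do not rely on this upstream mid-incident.
        return (response as? HTTPURLResponse)?.statusCode ?? -1
    }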

The Codex backend toggle

The harder failure mode is when none of the Anthropic-shaped upstreams are healthy and your alternative gateway is also flaky. That is the case when an inference layer bug propagates across fleets, which is what the May 6, 2026 “Elevated errors across multiple models” incident looked like. The honest answer there is to switch model families. Fazm has a separate setting for that.

In Desktop/Sources/MainWindow/SettingsSidebar.swift line 86, there is a search item registered with id advanced.codex.toggle: “Codex Backend, Route GPT-5 family models through OpenAI’s Codex CLI”. Enable it and the model picker on the floating control bar gains GPT-5 family entries. Pick one from the same dropdown you use for Sonnet or Opus. The session continues without a restart; the running tools, prompts, and conversation context stay attached. Underneath, the bridge has flipped from Anthropic to OpenAI. That is a real fallback to an independent platform with an independent status page, not a different SKU on the same fleet.
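As a rough picture of what that line 86 registration carries: the SettingsSearchItem type and its fields below are invented for illustration, while the id, title, description, and keywords are the ones quoted in this article and its FAQ.

    // Illustrative only; Fazm's actual search-item type and registration API will differ.
    struct SettingsSearchItem {
        let id: String
        let title: String
        let subtitle: String
        let keywords: [String]
    }

    let codexToggle = SettingsSearchItem(
        id: "advanced.codex.toggle",
        title: "Codex Backend",
        subtitle: "Route GPT-5 family models through OpenAI's Codex CLI",
        keywords: ["codex", "openai", "chatgpt", "gpt", "gpt-5"]
    )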

Combined with the customApiEndpoint override, you have two orthogonal moves: change the upstream (different gateway, same model family) or change the family (different vendor entirely). Most of the time the first one is enough. When it is not, the second one is.

Things this does not fix

Honest list. If the outage is at your edge (DNS resolver, captive portal, corporate proxy that you did not configure, a recently rotated GitHub IP allowlist), no amount of upstream redirection helps because the redirection itself cannot leave your network. The May 7, 2026 incident affecting orgs with IP-based GitHub access restrictions is exactly this shape: Anthropic changed its outbound IPs, downstream firewalls denied the new ones, everyone’s parallel agents stopped. The fix in that case is at the firewall, not at the client.

Likewise, if a parallel-agent run is failing because the model is producing bad output during a regression rather than because the API is timing out, switching upstreams just gives you the same bad output from a different host. That is what 4xx-level “model quality” incidents look like; the right answer is wait or downgrade to a known-good version, not reroute.

And rerouting through a public Copilot bridge under load can hit the bridge’s own rate limits. The bridge is somebody else’s server. Treat it as a peer to api.anthropic.com, not as a guaranteed-up alternative.

The one-sentence version

Parallel agents do not give you redundancy against an upstream outage; only a different upstream does. Fazm exposes that different upstream as a text field in Settings and, when the field is not enough, as a backend toggle that swaps the entire model family. It is a desktop app for people who do not want to be editing dotfiles in the middle of an incident.

Talk through your outage playbook

If you run parallel Claude Code agents in production, walk us through your current setup and we will sketch a redundancy plan in 20 minutes. No pitch, just a working session.

Frequently asked questions

Do parallel Claude Code agents survive a Claude Code outage?

No. By default, every Claude Code instance, every Task subagent, and every parallel worktree session points at the same upstream: api.anthropic.com. When that endpoint is degraded, all of them stop at the same time. Spawning more subagents does not buy you redundancy. The only thing that buys you redundancy is pointing some of those agents at a different upstream, which is what ANTHROPIC_BASE_URL is for.

What outages did Claude have in early May 2026?

On May 8, 2026 the public status page logged three separate incidents in a single day: 'Elevated errors across Claude Models' resolved at 11:40 UTC, 'Elevated errors on Claude Sonnet 4.6' resolved at 15:11 UTC, and 'Elevated Errors on File Operations' resolved at 15:17 UTC. The day before, 'Elevated errors on Claude Opus 4.7' resolved at 12:45 UTC, and there was a connection issue affecting orgs with IP-based GitHub access restrictions because Anthropic's outbound IPs changed. There was also a major incident on April 28 from 17:34 to 18:52 UTC where claude.ai was unavailable and the API saw elevated errors. All of these are visible at status.claude.com.

Why does reducing CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY not help during an outage?

It helps when you are getting rate-limited under your own load: you back off and the rate limiter recovers. It does not help when the upstream itself is failing requests, because every single one of your serial calls hits the same broken endpoint. Lower concurrency is the right answer for 429s under heavy fan-out and sub-agent storms; it is the wrong answer for 500s, 503s, gateway timeouts, and 'Elevated errors across Claude Models'. Treat them as different failure modes.

Can I just switch to /model sonnet from /model opus during an outage?

Sometimes. The status page often lists model-specific incidents (e.g. 'Elevated errors on Claude Opus 4.7' on May 7, 'Elevated errors on Claude Haiku 4.5' on April 30) where the platform is healthy but one model family is degraded. In those cases switching to a different Claude model with /model is enough. It is not enough when the incident is platform-wide ('Elevated errors across Claude Models', 'claude.ai and API unavailable'); for those you need a different upstream entirely.

What is the ANTHROPIC_BASE_URL override and where is it set in Fazm?

Fazm exposes a single text field in Settings labelled 'Custom API Endpoint' under Advanced > AI Chat. The placeholder text is 'https://your-proxy:8766'. When you fill it in, Fazm sets ANTHROPIC_BASE_URL on the acp-bridge process so every Claude Code session running through Fazm hits your endpoint instead of api.anthropic.com. The wiring is in Desktop/Sources/Chat/ACPBridge.swift around line 469: 'env["ANTHROPIC_BASE_URL"] = customEndpoint'. The Settings UI is in Desktop/Sources/MainWindow/Pages/SettingsPage.swift around line 984. The endpoint can be a corporate proxy, a local LLM bridge, or a GitHub Copilot Anthropic-compatible bridge: anything that accepts the Anthropic message format.

Why route through GitHub Copilot during a Claude outage?

Two reasons. One, Copilot's upstream is independent of api.anthropic.com, so a Copilot bridge often stays up while Anthropic is degraded. Two, if you already pay for Copilot you are not paying twice for tokens during the outage. The bridge has to translate Anthropic's message format on the wire (most popular bridges do this), and Fazm just respects whatever URL you put in the field. The same field also accepts a local LLM bridge if you have one running, though local models are usually a step down in capability.

Does Fazm have a different model family I can switch to without changing the endpoint?

Yes. Fazm ships a Codex backend toggle in the same Settings sidebar (Desktop/Sources/MainWindow/SettingsSidebar.swift line 86, advanced.codex.toggle, keywords 'codex, openai, chatgpt, gpt, gpt-5'). Enabling it adds GPT-5 family models to the picker through OpenAI's Codex CLI. When the Anthropic API is the broken thing, you keep the same Fazm window open and pick a GPT model from the dropdown; the session continues. This is genuinely a different upstream (OpenAI, not Anthropic), not just a different Claude model.

What about Bedrock and Vertex during outages?

Bedrock (AWS) and Vertex (Google Cloud) host Claude as managed services and have their own status pages, their own regional fleets, and their own outages that often do not correlate with api.anthropic.com. Claude Code supports both via environment variables and lets you fall back to a smaller model if a region is missing your selection. They are a real fallback, not a placebo. Caveat: when the underlying inference fleet itself has a bug, all three frontends (Anthropic, Bedrock, Vertex) can light up at once. The May 6, 2026 'Elevated errors across multiple models' incident is the kind of thing where any single Claude provider is suspect.
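The environment variables in question are Claude Code's documented Bedrock and Vertex switches. The names below are those switches as documented at the time of writing (verify against the current docs before relying on them); the spawn scaffolding around them is illustrative, not something Fazm ships, and the region values and binary path are placeholders.

    import Foundation

    // Illustrative scaffolding; only the CLAUDE_CODE_USE_* switch names come from Claude Code's docs.
    var env = ProcessInfo.processInfo.environment
    env["CLAUDE_CODE_USE_BEDROCK"] = "1"        // route Claude Code through AWS Bedrock
    env["AWS_REGION"] = "us-west-2"             // placeholder: pick a region whose fleet is healthy
    // Or, for Google Cloud:
    // env["CLAUDE_CODE_USE_VERTEX"] = "1"
    // env["CLOUD_ML_REGION"] = "us-east5"      // Vertex region, also a placeholder

    let claude = Process()
    claude.executableURL = URL(fileURLWithPath: "/usr/local/bin/claude")   // path is a placeholder
    claude.arguments = ["-p", "resume the interrupted run"]                // -p is Claude Code's print mode
    claude.environment = env
    try claude.run()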

How does this differ from a YAML failover CLI that retries on 4xx/5xx?

A failover CLI sits in front of the model and retries the same request against the next model in the list. That is useful for transient errors and per-model degradation. The thing it does not change: when the wall is api.anthropic.com itself, every entry in the YAML that points at api.anthropic.com fails. Routing some agents through a different base URL is complementary, not redundant. Use both: a fallback list inside the API client, and a different upstream when the platform itself is the problem.

Why does Fazm matter for this if it is a desktop agent and not a CLI?

Because the people typing 'claude code outage parallel agents' into search are not just CLI users. Many of them are running parallel Claude Code agents from a desktop app or IDE that wraps Claude Code, and the outage is freezing their day. Fazm is a macOS app that runs Claude Code (and Codex) under the hood with one user-visible setting for upstream and one toggle for backend. During an outage you can keep typing into the same window. That is the practical difference: you are not editing dotfiles in an emergency.