Traced from Anthropic's live rate_limit_event stream

How does Claude extra usage work: the five-state machine behind every session

Most explanations of this feature describe the billing side: you pay standard API rates after your Pro, Max 5x, or Max 20x included usage is gone. True, but shallow. Underneath, every Claude session is a five-state machine that Anthropic emits on the wire in real time. This guide traces the transitions end to end, from allowed through allowed_warning and into overage_allowed, with the exact field names a well-written client reads to know which state it is in.

Matthew Diakonov, Fazm maintainer

Published April 23, 2026 · 10 min read

Traced from the Fazm acp-bridge source

Five-state transition table, from the SDK

surpassedThreshold vs utilization, explained

Patched ACP entry forwards the dropped event

Works with your own Claude Pro or Max seat

Extra usage is a state machine, not a toggle

Five states, emitted in real time on rate_limit_event

allowed: inside the window

allowed_warning: 80 to 85 percent used

rejected: window exhausted, overage off

overage_allowed: isUsingOverage flips true

overage_rejected: monthly cap hit

The framing most articles miss

Most guides currently online treat extra usage as a billing feature: turn it on in settings, set a cap, pay standard API rates when you exceed your Pro or Max budget. That description is correct and useless. It tells you what is billed, not what actually happens on the wire when a session walks from healthy to exhausted to overage.

The useful framing is a state machine. Anthropic's server-side rate limiter tracks two budgets per seat, a rolling 5-hour session window and a 7-day weekly window (on Max plans the weekly budget is further split into separate Opus and Sonnet buckets), and emits the current state on every response via a structured event called rate_limit_event. The event carries eight fields. Three of them fully describe the state you are in: status, isUsingOverage, and overageStatus.

Once you see it as a state machine, every behavior that feels mysterious from the outside has an obvious explanation. Why did the chat keep going past my Max 20x limit? Because isUsingOverage flipped to true on the next event. Why did it suddenly stop? Because overageStatus turned to "rejected" with an overageDisabledReason of "monthly_cap_reached". No dashboard refresh needed.

“Every Claude session walks through up to five states in one timeline. The SDK emits the current state on every response. The stock ACP agent drops the event. A patched client forwards it.”

Fazm acp-bridge/src/patched-acp-entry.mjs, lines 131 to 152

The five states, in order

Any given request is in exactly one of these states. The transitions happen automatically based on your usage and your account configuration; you don't trigger them explicitly. What follows is the exact sequence a long-running session walks through, with the fields on the rate_limit_event that identify each state.

1. allowed

status = "allowed". utilization is any value below the warning threshold. isUsingOverage = false. You are inside your included budget with no cause for concern. For most users, the large majority of requests live in this state.

2. allowed_warning

status = "allowed_warning". utilization has crossed the early-warning threshold (typically 0.80 to 0.85). surpassedThreshold reports the exact boundary that was just crossed. Requests still succeed, but the client should show a visible indicator so the user isn't surprised when the window closes.

3. rejected

status = "rejected". utilization has hit 1.0 for the current rateLimitType. resetsAt carries a Unix timestamp telling you when the window rolls over. If extra usage is off, the chat stops here. If extra usage is on and has headroom, the very next event transitions to state 4.

4. overage_allowed

isUsingOverage flips to true. rateLimitType becomes "overage". overageStatus = "allowed". Requests continue, now metered against the monthly overage budget and billed at standard API rates. The utilization value now represents progress toward the monthly cap, not the 5-hour or weekly bucket.

5. overage_rejected

overageStatus flips to "rejected". overageDisabledReason carries a short string explaining why: no_payment_method, monthly_cap_reached, admin_disabled, billing_paused, or a generic fallback. Chat stops until the reason is resolved: raise the cap, add a card, wait for the next billing cycle.
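The five states above can be derived from three fields. The following is a minimal sketch, not Fazm's code: `SessionState` and `classifySessionState` are hypothetical names, and the five labels are this article's state names rather than a field that exists on the wire. It also assumes overageStatus stays null or "allowed" while you are inside the included budget; if the server reports "rejected" there too, overageDisabledReason would be needed to tell states 3 and 5 apart.

```typescript
// Only the three fields the article says identify the state.
interface StateFields {
  status: "allowed" | "allowed_warning" | "rejected" | "unknown";
  overageStatus: string | null;
  isUsingOverage: boolean;
}

// Hypothetical label type: the article's five state names plus a fallback.
type SessionState =
  | "allowed"
  | "allowed_warning"
  | "rejected"
  | "overage_allowed"
  | "overage_rejected"
  | "unknown";

function classifySessionState(m: StateFields): SessionState {
  // Overage fields take precedence: once the overage path is in play,
  // status alone no longer identifies the state.
  if (m.overageStatus === "rejected") return "overage_rejected";
  if (m.isUsingOverage && m.overageStatus === "allowed") return "overage_allowed";
  // Otherwise the status string is the state (states 1 to 3, or "unknown").
  return m.status;
}

console.log(
  classifySessionState({ status: "rejected", overageStatus: null, isUsingOverage: false })
);
```

Checking the overage fields first is a design choice: it keeps the function correct even if status does not change when a session moves from state 3 to state 4.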

What the events actually look like on the wire

Here is a real sequence of three consecutive rate_limit_event payloads as they transition from state 2 (allowed_warning) through state 3 (rejected) into state 4 (overage_allowed). These are the fields Anthropic actually emits, mapped one-to-one to the shape Fazm declares in acp-bridge/src/protocol.ts at line 244.

rate_limit_event stream, three consecutive events
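The payload listing did not survive in this copy of the article. As a stand-in, here is an illustrative reconstruction built only from the field descriptions in this guide. It is not a captured trace: the resetsAt timestamps, the utilization numbers, and the status value on the third event are placeholder assumptions. Only the surpassedThreshold sequence (0.80, 1.0, null) and the overage fields on the last event come from the text.

```typescript
// Same shape as the RateLimitMessage interface in acp-bridge/src/protocol.ts.
interface RateLimitMessage {
  type: "rate_limit";
  status: "allowed" | "allowed_warning" | "rejected" | "unknown";
  resetsAt: number | null;
  rateLimitType: string | null;
  utilization: number | null;
  overageStatus: string | null;
  overageDisabledReason: string | null;
  isUsingOverage: boolean;
  surpassedThreshold: number | null;
}

// Illustrative values only. Timestamps and utilization are made up.
const illustrativeEvents: RateLimitMessage[] = [
  { // Event N: state 2, allowed_warning
    type: "rate_limit", status: "allowed_warning", resetsAt: 1776990000,
    rateLimitType: "five_hour", utilization: 0.84, overageStatus: null,
    overageDisabledReason: null, isUsingOverage: false, surpassedThreshold: 0.8,
  },
  { // Event N+1: state 3, rejected
    type: "rate_limit", status: "rejected", resetsAt: 1776990000,
    rateLimitType: "five_hour", utilization: 1.0, overageStatus: null,
    overageDisabledReason: null, isUsingOverage: false, surpassedThreshold: 1.0,
  },
  { // Event N+2: state 4, overage_allowed (status value here is an assumption)
    type: "rate_limit", status: "allowed", resetsAt: null,
    rateLimitType: "overage", utilization: 0.01, overageStatus: "allowed",
    overageDisabledReason: null, isUsingOverage: true, surpassedThreshold: null,
  },
];

for (const e of illustrativeEvents) {
  console.log(e.status, e.rateLimitType, e.isUsingOverage, e.surpassedThreshold);
}
```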

The field no one writes about: surpassedThreshold

Look again at the three events above. Event N carries surpassedThreshold: 0.80. Event N+1 carries surpassedThreshold: 1.0. Event N+2 carries surpassedThreshold: null. That field is the programmatic answer to the question "did I just cross a boundary?" It is declared at acp-bridge/src/protocol.ts line 254, a 0-1 float, documented inline as the threshold level that was just surpassed.

Think of it as a complement to utilization. Utilization is continuous: it changes on every event. surpassedThreshold is discrete: it reports the last boundary crossed and fires at the moment of transition. A client that reads only utilization has to diff against the previous value to know a threshold was crossed. A client that reads surpassedThreshold gets the transition handed to it.
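To make the difference concrete, here is a small sketch of both approaches. The function names and the boundary list are assumptions for illustration (0.5, 0.8, 0.95, and 1.0 are the values this article's FAQ calls typical); the actual boundaries are whatever the server decides.

```typescript
// Assumed boundary values, for the diffing approach only.
const ASSUMED_THRESHOLDS: number[] = [0.5, 0.8, 0.95, 1.0];

// Approach 1: infer a crossing by diffing utilization against the previous event.
// Returns the highest boundary crossed between prev and curr, or null.
function crossedByDiff(
  prev: number | null,
  curr: number | null,
  thresholds: number[] = ASSUMED_THRESHOLDS
): number | null {
  if (prev === null || curr === null) return null;
  let crossed: number | null = null;
  for (const t of thresholds) {
    if (prev < t && curr >= t) crossed = t;
  }
  return crossed;
}

// Approach 2: read the field. No previous value and no threshold list needed.
function crossedByField(m: { surpassedThreshold: number | null }): number | null {
  return m.surpassedThreshold;
}

console.log(crossedByDiff(0.78, 0.84));                     // client-side inference
console.log(crossedByField({ surpassedThreshold: 0.8 }));   // handed over by the server
```

The diffing version needs the client to hold state and to guess the boundary list; the field-based version needs neither, which is the point of the paragraph above.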

Excerpt, acp-bridge/src/protocol.ts lines 244 to 256

/** Rate limit info from Claude API (forwarded from SDK rate_limit_event) */
export interface RateLimitMessage {
  type: "rate_limit";
  status: "allowed" | "allowed_warning" | "rejected" | "unknown";
  resetsAt: number | null;           // Unix timestamp (seconds)
  rateLimitType: string | null;      // "five_hour" | "seven_day" | etc.
  utilization: number | null;        // 0-1 float
  overageStatus: string | null;      // "allowed" | "rejected"
  overageDisabledReason: string | null;
  isUsingOverage: boolean;
  surpassedThreshold: number | null; // 0-1 float
  sessionId?: string;
}

How the event travels from Anthropic to your Mac

Anthropic emits the event. Your client has to receive it, parse it, and render it. In practice most clients never do, because the default path eats the event before the application ever sees it. Here is the journey in Fazm, which patches every hop in the chain so the state reaches the SwiftUI layer intact.

rate_limit_event delivery path

The patched entry point is the load-bearing piece. It lives at acp-bridge/src/patched-acp-entry.mjs and it explicitly re-emits any item whose value.type equals "rate_limit_event", copying the eight fields from value.rate_limit_info onto a session update of type rate_limit.
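The re-emit step can be sketched as a pure mapping function. This is not the contents of patched-acp-entry.mjs: `StreamItem`, `toRateLimitUpdate`, and the helper names are hypothetical, and the key names inside rate_limit_info are assumed to match the RateLimitMessage field names, which the real SDK payload may not do.

```typescript
interface RateLimitMessage {
  type: "rate_limit";
  status: "allowed" | "allowed_warning" | "rejected" | "unknown";
  resetsAt: number | null;
  rateLimitType: string | null;
  utilization: number | null;
  overageStatus: string | null;
  overageDisabledReason: string | null;
  isUsingOverage: boolean;
  surpassedThreshold: number | null;
}

// Hypothetical shape of one item on the SDK stream.
interface StreamItem {
  value: { type: string; rate_limit_info?: Record<string, unknown> };
}

const KNOWN_STATUSES = ["allowed", "allowed_warning", "rejected"] as const;

function asNumber(v: unknown): number | null {
  return typeof v === "number" ? v : null;
}
function asString(v: unknown): string | null {
  return typeof v === "string" ? v : null;
}

// Returns a rate_limit session update for rate_limit_event items, null otherwise.
function toRateLimitUpdate(item: StreamItem): RateLimitMessage | null {
  if (item.value.type !== "rate_limit_event") return null;
  const info = item.value.rate_limit_info ?? {};
  const rawStatus = asString(info["status"]);
  // Anything outside the known union collapses to "unknown".
  const status = KNOWN_STATUSES.find((s) => s === rawStatus) ?? "unknown";
  return {
    type: "rate_limit",
    status,
    resetsAt: asNumber(info["resetsAt"]),
    rateLimitType: asString(info["rateLimitType"]),
    utilization: asNumber(info["utilization"]),
    overageStatus: asString(info["overageStatus"]),
    overageDisabledReason: asString(info["overageDisabledReason"]),
    isUsingOverage: info["isUsingOverage"] === true,
    surpassedThreshold: asNumber(info["surpassedThreshold"]),
  };
}

console.log(toRateLimitUpdate({ value: { type: "assistant_message" } })); // null
```

A wrapper around the session update stream would call this on every item and emit the non-null results alongside the normal updates.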

A single request, traced through the machine

Here is what happens when a Mac desktop client sends a single query that happens to be the one that tips you into overage. Each step is an actual line in the Fazm source, traceable from the accessibility layer through the bridge into Anthropic and back.

One request, one state transition

What the client should actually do in each state

A state machine is only useful if the client acts differently in each state. Here is what Fazm does in each of the five.

allowed

No UI change. Utilization is logged to local telemetry, not surfaced. The user should never know this state exists.

allowed_warning

Heads-up indicator in the floating bar: 'session limit 84%, resets 7:42 PM'. Fazm sends a rate_limit_event with status=allowed_warning to PostHog for aggregate monitoring.

rejected

Visible banner showing the rateLimitType label ('session limit', 'weekly limit', 'Opus weekly limit', 'Sonnet weekly limit') and the reset time. Chat input disabled.

overage_allowed

'Extra usage' badge in the bar. Continues accepting queries. No blocking. An optional PostHog event fires for product analytics.

overage_rejected

Actionable copy derived from overageDisabledReason. 'Add a card', 'raise your monthly cap', or 'your admin disabled extra usage'. Not 'rate limited, try again later'.

what a well-behaved client does with rate_limit_event

Forwards the event past the stock ACP runner so the application layer can see it
Stores status, resetsAt, rateLimitType, and utilization on reactive state
Reads surpassedThreshold separately from utilization to detect boundary crossings
Maps rateLimitType to a human label so users see 'session limit' rather than 'five_hour'
Maps overageDisabledReason to actionable copy rather than a generic error
Formats resetsAt with the user's local timezone, not UTC
Logs allowed_warning to product telemetry for aggregate monitoring
Treats the SDK event as the source of truth, not the billing dashboard
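Two items on that checklist are plain lookups. The sketch below shows one way to write them. The label strings come from this article, but pairing each label with a rateLimitType value is an inference, the reason copy is paraphrased, and the names `limitLabel` and `overageCopy` are hypothetical rather than taken from ChatProvider.swift.

```typescript
// rateLimitType -> human label (pairing inferred from the labels in this article).
const LIMIT_LABELS = new Map<string, string>([
  ["five_hour", "session limit"],
  ["seven_day", "weekly limit"],
  ["seven_day_opus", "Opus weekly limit"],
  ["seven_day_sonnet", "Sonnet weekly limit"],
  ["overage", "extra usage limit"],
]);

// overageDisabledReason -> actionable copy (wording is illustrative).
const REASON_COPY = new Map<string, string>([
  ["no_payment_method", "Add a card to keep going."],
  ["monthly_cap_reached", "Raise your monthly cap to keep going."],
  ["admin_disabled", "Your admin disabled extra usage."],
  ["billing_paused", "Billing is paused on this account."],
]);

function limitLabel(rateLimitType: string | null): string {
  if (rateLimitType === null) return "usage limit";
  return LIMIT_LABELS.get(rateLimitType) ?? "usage limit";
}

function overageCopy(reason: string | null): string {
  if (reason === null) return "Extra usage is unavailable.";
  // Unknown reasons fall back to generic copy instead of leaking a raw code.
  return REASON_COPY.get(reason) ?? "Extra usage is unavailable.";
}

console.log(limitLabel("five_hour"), "/", overageCopy("monthly_cap_reached"));
```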

Watching the transitions happen, live

You can tail the Fazm dev log and watch the state machine walk through its transitions during a real session. Every log line corresponds to a single handleRateLimitEvent call in ChatProvider.swift at line 513.

tail -f /tmp/fazm-dev.log

What actually triggers the flip from rejected to overage

Three conditions have to be simultaneously true for a session to transition from state 3 (rejected) into state 4 (overage_allowed). If the budget is exhausted and either of the other two conditions fails, the chat stops. This is the part the official docs don't spell out as a conjunction.

overage trigger conditions, all three required

Budget exhausted

utilization = 1.0 on the current rateLimitType (five_hour, seven_day, seven_day_opus, or seven_day_sonnet)

Extra usage on

toggled on at settings.claude.com with a payment method on file and no admin_disabled flag

Cap has headroom

month-to-date overage spend is below the user-configured monthly cap

overage_allowed

next rate_limit_event arrives with isUsingOverage=true and overageStatus=allowed
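The conjunction can be written down as a predicate. This models the decision as the article describes it; the real check runs on Anthropic's side and none of these inputs are visible to a client. `OverageInputs` and its field names are invented for the sketch.

```typescript
// Hypothetical inputs: a client never sees these directly.
interface OverageInputs {
  utilization: number;        // 0-1, for the current rateLimitType
  extraUsageEnabled: boolean; // toggle in account settings
  hasPaymentMethod: boolean;
  adminDisabled: boolean;
  monthToDateSpend: number;   // same currency unit as monthlyCap
  monthlyCap: number;
}

// Returns the names of the conditions that are NOT met.
function failedOverageConditions(i: OverageInputs): string[] {
  const failed: string[] = [];
  if (i.utilization < 1.0) failed.push("budget_exhausted");
  if (!i.extraUsageEnabled || !i.hasPaymentMethod || i.adminDisabled) {
    failed.push("extra_usage_on");
  }
  if (i.monthToDateSpend >= i.monthlyCap) failed.push("cap_has_headroom");
  return failed;
}

// All three conditions must hold for the session to enter overage_allowed.
function entersOverage(i: OverageInputs): boolean {
  return failedOverageConditions(i).length === 0;
}

console.log(
  entersOverage({
    utilization: 1.0, extraUsageEnabled: true, hasPaymentMethod: true,
    adminDisabled: false, monthToDateSpend: 12, monthlyCap: 50,
  })
);
```

Returning the list of failed conditions rather than a bare boolean mirrors what overageDisabledReason does on the wire: it tells you which leg of the conjunction broke.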

Practical consequences for anyone using Claude hard

Once you understand this is a state machine, a few practical decisions fall out naturally.

First: long-running agents should check status and isUsingOverage between expensive steps. A multi-tool workflow that silently flips into overage in the middle of the session will bill per token at standard API rates (Opus rates, if Opus is serving the requests) without the user ever being told. Reading the event lets the agent show a confirmation before spending real money.
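One way an agent could implement that check is to watch for the moment isUsingOverage flips from false to true and pause for confirmation there. A minimal sketch with hypothetical names (`flippedIntoOverage`, `confirmationPoints`); how the confirmation is shown is left to the client.

```typescript
interface OverageSnapshot {
  isUsingOverage: boolean;
}

// True only on the transition into overage, not on every overage event.
function flippedIntoOverage(prev: OverageSnapshot | null, curr: OverageSnapshot): boolean {
  const wasUsing = prev !== null && prev.isUsingOverage;
  return !wasUsing && curr.isUsingOverage;
}

// Given the events seen between steps, return the indices where the agent
// should stop and ask before continuing.
function confirmationPoints(events: OverageSnapshot[]): number[] {
  const points: number[] = [];
  let prev: OverageSnapshot | null = null;
  events.forEach((curr, i) => {
    if (flippedIntoOverage(prev, curr)) points.push(i);
    prev = curr;
  });
  return points;
}

console.log(
  confirmationPoints([
    { isUsingOverage: false },
    { isUsingOverage: false },
    { isUsingOverage: true },
    { isUsingOverage: true },
  ])
);
```

Asking once at the transition, rather than on every overage event, keeps the prompt from becoming noise during a long task.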

Second: the resetsAt value is more actionable than "try again later". If the window resets in 12 minutes, the right UX is to tell the user that and optionally queue the request. Fazm formats this with formatResetTime at ChatProvider.swift line 549.
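Fazm's formatResetTime is Swift; the following TypeScript sketch only illustrates the same idea and is not a port of it. It assumes resetsAt is Unix seconds, as the protocol.ts comment states. The function names and the optional timeZone parameter (useful for tests) are assumptions.

```typescript
// Whole minutes until the window resets, never negative.
function minutesUntilReset(resetsAt: number, nowMs: number): number {
  return Math.max(0, Math.ceil((resetsAt * 1000 - nowMs) / 60000));
}

// Local wall-clock time of the reset, e.g. for "resets 7:42 PM".
// Omitting timeZone uses the user's local zone.
function formatResetLocal(resetsAt: number, timeZone?: string): string {
  return new Date(resetsAt * 1000).toLocaleTimeString("en-US", {
    hour: "numeric",
    minute: "2-digit",
    timeZone,
  });
}

const exampleResetsAt = 1700000720; // placeholder timestamp
const exampleNowMs = 1700000000 * 1000;
console.log(
  `resets ${formatResetLocal(exampleResetsAt)} (in ${minutesUntilReset(exampleResetsAt, exampleNowMs)} min)`
);
```

With the minutes value in hand, a client can decide between "wait 12 minutes" and "queue this request", which is the UX the paragraph above argues for.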

Third: on Max plans with differentiated Opus and Sonnet weekly caps, rateLimitType tells you which bucket is the bottleneck. A client can fall back from Opus to Sonnet on seven_day_opus rejection without switching plans, which saves money and preserves flow.
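A fallback rule along those lines might look like the sketch below. `ModelFamily`, `nextModel`, and the "opus"/"sonnet" labels are placeholders rather than real model identifiers, and the policy (fall back only on a seven_day_opus rejection, stop on anything else) is one conservative choice among several.

```typescript
type ModelFamily = "opus" | "sonnet"; // placeholder labels, not API model IDs

interface LimitSignal {
  status: string;
  rateLimitType: string | null;
}

// Returns the model to use for the next request, or null if the session
// should stop and wait for the window to reset.
function nextModel(current: ModelFamily, s: LimitSignal): ModelFamily | null {
  // Not rejected: keep using whatever is selected.
  if (s.status !== "rejected") return current;
  // Only the Opus weekly bucket is exhausted: Sonnet may still have headroom.
  if (s.rateLimitType === "seven_day_opus") return "sonnet";
  // Shared buckets (five_hour, seven_day) or the Sonnet bucket: no fallback here.
  return null;
}

console.log(nextModel("opus", { status: "rejected", rateLimitType: "seven_day_opus" }));
```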

Want to see the state machine live on your own Mac?

Book 20 minutes with the Fazm team. We'll walk you through the bridge, show the patched ACP entry, and tail the rate_limit_event stream on a real session.

Frequently asked questions

How does Claude extra usage work, in one paragraph?

Every request inside a paid Claude plan flows through a server-side rate limiter that tracks two budgets: a rolling 5-hour session window and a 7-day weekly window. On every response the SDK emits a rate_limit_event carrying the current state as a tuple of status, rateLimitType, utilization, resetsAt, overageStatus, overageDisabledReason, isUsingOverage, and surpassedThreshold. Until you exhaust your included budget, status is "allowed" and isUsingOverage is false. Crossing a warning threshold flips status to "allowed_warning". Running out flips status to "rejected" with a resetsAt timestamp. If extra usage is enabled and you still have headroom under your monthly cap, the next event arrives with isUsingOverage = true and rateLimitType = "overage", and requests keep flowing, billed at standard API rates.

What are the states the rate limiter actually walks through?

Five distinct states on the wire: allowed (inside the window, no warning), allowed_warning (inside the window but past the early-warning threshold, typically around 80 to 85 percent utilization), rejected (window exhausted and extra usage is off or capped), overage_allowed (pay-as-you-go path is active, isUsingOverage is true, overageStatus is allowed), and overage_rejected (you have hit your monthly overage cap, overageStatus is rejected with an overageDisabledReason string). Fazm's acp-bridge/src/protocol.ts at line 247 declares the first three, plus an "unknown" fallback, as literal string union values on the RateLimitMessage interface. The overage pair is carried on overageStatus, declared on line 251 of the same file.

What does surpassedThreshold actually mean and how is it different from utilization?

Utilization is a continuous 0-1 value on every event: it answers 'where am I in the window right now'. surpassedThreshold is a different 0-1 value that reports the last boundary you crossed: 0.5, 0.8, 0.95, and 1.0 are typical values. It fires at the moment of a transition, not on every event. A client that only tracks utilization has to infer that a threshold was just crossed by comparing to a previous value. A client that reads surpassedThreshold gets the answer handed to it. The field is declared at acp-bridge/src/protocol.ts line 254 and forwarded on patched-acp-entry.mjs line 146.

Why don't most Claude clients show this state in real time?

Because the stock Agent Client Protocol runner that ships with claude-code drops rate_limit_event before it reaches the application layer. It reads the event, uses it internally for retry decisions, and doesn't forward it downstream. Fazm patches this: patched-acp-entry.mjs lines 131 to 152 wrap the session update stream, detect any item with value.type === 'rate_limit_event', pull rate_limit_info off it, and re-emit the payload as a sessionUpdate of type rate_limit. Without that patch, a desktop or IDE client has no idea where the user is in the window until Anthropic returns a 429.

When exactly does extra usage 'flip on' from the user's perspective?

Three conditions have to be true at the same time. One, your current rateLimitType budget has been exhausted (utilization hits 1.0). Two, extra usage is enabled on your settings.claude.com account with a card on file. Three, your month-to-date extra usage spend is under the cap you set. When those three align, the next rate_limit_event arrives with rateLimitType = "overage", isUsingOverage = true, overageStatus = "allowed", and a utilization value that now meters against the overage budget rather than the Pro/Max budget. If the budget is exhausted but either of the other two conditions fails, you get status = "rejected" with a resetsAt Unix timestamp telling you when the included window rolls over.

What values does overageDisabledReason take when extra usage is off?

Anthropic emits a short string explaining why the overage path is unavailable. Typical values are no_payment_method (extra usage is enabled but no card is on file), monthly_cap_reached (you have hit your self-set monthly spend cap), admin_disabled (an org admin turned it off for this seat), billing_paused (the account is in a hold state), and a generic fallback when the SDK can't map the server reason. Fazm stores the raw string on ChatProvider.rateLimitStatus and maps it to copy in the floating bar so the user sees an actionable message instead of 'rate limited, try again later'.

Does extra usage bill the same way for Sonnet and Opus?

No. Extra usage is billed at standard API rates for whichever model served the request, so a response from Opus costs substantially more per token than the same response from Sonnet. The rate_limit_event doesn't carry the per-token price (the SDK exposes usage per message elsewhere), but rateLimitType can distinguish the two: seven_day_opus and seven_day_sonnet are separate weekly buckets on Max plans, which means a user can be at 100 percent on Opus while Sonnet still has headroom. The five rateLimitType values are listed at ChatProvider.swift line 537 onward.

How does the 5-hour window actually reset?

The 5-hour window is rolling, not calendar-aligned. Every accepted request pushes its consumed tokens into a bucket timestamped at the request time. Anthropic's limiter sums all tokens in the last 5 hours on each new request. That means your resetsAt timestamp is not a single fixed point; it's the time the oldest heavy request in the window ages out. On every rate_limit_event Anthropic emits the current resetsAt as a Unix seconds value. Fazm's ChatProvider stores it on @Published var rateLimitResetsAt (line 380) and SwiftUI reformats it into a local 'resets 7:42 PM' string via formatResetTime (line 549).
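The rolling behavior is easier to see in a toy model. The sketch below implements the mechanism as this answer describes it (sum everything in the last 5 hours; the reset time is when enough old usage ages out). It is not Anthropic's limiter: the token unit, the budget number, and every name here are assumptions.

```typescript
interface UsageEntry {
  atMs: number;   // request time, epoch milliseconds
  tokens: number; // tokens consumed by that request
}

const WINDOW_MS = 5 * 60 * 60 * 1000; // 5 hours

// Tokens counted against the window at time nowMs.
function tokensInWindow(history: UsageEntry[], nowMs: number): number {
  return history
    .filter((e) => e.atMs > nowMs - WINDOW_MS)
    .reduce((sum, e) => sum + e.tokens, 0);
}

// Earliest time at which usage drops below budget, or null if already below.
function resetsAtMs(history: UsageEntry[], nowMs: number, budget: number): number | null {
  const live = history
    .filter((e) => e.atMs > nowMs - WINDOW_MS)
    .sort((a, b) => a.atMs - b.atMs);
  let used = live.reduce((sum, e) => sum + e.tokens, 0);
  if (used < budget) return null;
  // Age out the oldest entries one by one until there is headroom again.
  for (const e of live) {
    used -= e.tokens;
    if (used < budget) return e.atMs + WINDOW_MS;
  }
  return null;
}

const HOUR_MS = 60 * 60 * 1000;
const sampleHistory: UsageEntry[] = [
  { atMs: 0, tokens: 600 },          // the "oldest heavy request"
  { atMs: 1 * HOUR_MS, tokens: 300 },
  { atMs: 2 * HOUR_MS, tokens: 100 },
];
console.log(tokensInWindow(sampleHistory, 3 * HOUR_MS));
console.log(resetsAtMs(sampleHistory, 3 * HOUR_MS, 1000));
```

In the sample, the window frees up exactly 5 hours after the 600-token request, not 5 hours after the most recent one, which is why resetsAt moves as older requests age out.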

Can extra usage be paused mid-task, or is it all-or-nothing for the session?

It can flip mid-task. The rate limiter evaluates on every request, so if your month-to-date extra spend crosses your cap between two tool calls, the next call gets overageStatus = rejected with overageDisabledReason = monthly_cap_reached. The task the agent was running stops wherever it was. Fazm maps this to the 'extra usage limit' label (ChatProvider.swift line 543) and surfaces an actionable upgrade prompt rather than a generic 429.

If I'm using the Claude API directly, does extra usage apply?

No. Extra usage is a subscription concept. API keys bill every call at API rates from the first token, so there is no included-plan window to exhaust and no overage to flip on. The rate_limit_event stream still exists on API calls (it carries organization-level quota and abuse limits), but the overage-specific fields (overageStatus, overageDisabledReason, isUsingOverage) are not meaningful in that mode. Fazm's bridge distinguishes these two paths explicitly: ACPBridge initializes in either personalOAuth or bundledKey mode depending on how the user authenticated.

What should a well-behaved client do when it sees status = allowed_warning?

Three things. One, surface a visible but non-blocking indicator in the UI so the user knows the window is running out before a task stalls. Two, log the event for telemetry so the product team can see aggregate warning distributions. Three, if the current session is doing expensive work (Opus calls, long tool chains), consider finishing the current step before starting another. Fazm logs the warning with 'Rate limit warning - X% of TYPE used' (ChatProvider.swift line 524) and ships the event to PostHog via AnalyticsManager.rateLimitEvent at line 1007.

Where does the truth live: the billing dashboard or the SDK event?

The SDK event is closer to ground truth for the current state because it is emitted in-line with the request that caused it. The billing dashboard is eventually consistent: it aggregates usage on a delay. Two practical consequences. One, a client that renders state from the rate_limit_event will always reflect the reality of the next request faster than a page refresh can. Two, month-end invoice reconciliation still happens on the server side; the event stream is the live signal, the invoice is the system of record.