Field notes from one shipping macOS agent

Agent persistent session state: the rollover trap nobody warns you about

Every guide on this topic walks you through picking a backend (SQLite, Redis, MongoDB) and a memory taxonomy (short-term, long-term, episodic). None of them warn you that the upstream session ID itself is not stable, and that any system built around the assumption that it is will silently lose context the first time a rate limit fires on a live conversation. This is what the missing layer looks like, written from the inside of one product that learned the hard way.

Matthew Diakonov
9 min read

Direct answer (verified 2026-05-06)

Two-layer the identity. The conversation has a stable, client-generated ID that never changes for the life of the conversation. The upstream agent SDK session ID is transient and will roll forward on rate limits, credit exhaustion, bridge restarts, and upstream expiry. Keep an append-only chain of every upstream ID the conversation has ever held, stamp messages with the current upstream ID, and replay the last N messages, loaded across the full chain, as a preamble on every send. Not just on detected resume failure. Every send.

Source: github.com/m13v/fazm, file Desktop/Sources/Providers/ChatProvider.swift, chain helpers around line 562 to 625, priorContext code path around line 2904 to 2960.

The trap

You build an agent. You give every conversation a session ID. You store messages in a row stamped with that session ID. You think you are done. The first month you run it against a live model, you watch a rate limit fire on someone’s conversation, and you watch the next reply come back as though the model never met them.

The reason is that the SDK you are running on top of did exactly what its docs said it would: it detected the session was unusable, rolled forward to a fresh session ID, and accepted a preamble of prior context. It just did not tell your message store that the rollover happened. Your priorContext lookup is filtering by the new session ID. The pre-rollover messages, stamped with the old one, are stranded in the same table, three rows away.

This is the rollover trap. It is the actual reason most agent persistent session state implementations look fine on a happy path and fall apart in a real production week. The fix is structural, not defensive: the upstream session ID is not the conversation’s identity. The conversation has its own identity, and the upstream ID is a transient handle the conversation has held.

What rollover actually looks like

Walk through one rate-limit scenario, two turns apart, on the same logical conversation. The host is a Swift app, the bridge is a Node process running the Anthropic Agent SDK, the model is Claude. The exact actors do not matter. The shape is what matters.

One rate limit, two turns apart

Host   → Bridge: send turn 1, sessionId=A
Bridge → Model:  stream(A)
Model  → Bridge: tokens for turn 1
Bridge → Host:   reply, sessionId=A
Host   → Bridge: send turn 2, sessionId=A
Bridge → Model:  stream(A)
Model  → Bridge: rate_limit_exceeded
Bridge:          create sessionId=B
Bridge → Model:  preamble + turn 2 on (B)
Model  → Bridge: tokens for turn 2
Bridge → Host:   reply, sessionId=B

The reply for turn 2 comes back stamped with B, and the host writes it that way. On turn 3, your priorContext lookup runs WHERE session_id = ? against the current upstream ID, which is B, and quietly leaves turn 1 (stamped A) out of the preamble. The model wakes up with the conversation half-amputated. It still replies, because models are good at filling in plausible context, but the user notices that the agent has somehow forgotten the thing it agreed to do four minutes ago.

The lookup is not wrong. The schema is not wrong. The data is all there, in the same table, on disk. The wrong thing is the assumption that the conversation and the upstream session share an identity. They do not.
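The trap is easy to reproduce in a dozen lines. A minimal Python sketch (an in-memory stand-in for the message table; nothing here is the app's actual code) shows the naive lookup stranding the pre-rollover turn while the chain-wide lookup recovers it:

```python
# Minimal reproduction of the rollover trap (illustrative, not the app's code).
# Each message row is stamped with the upstream session ID current at write time.
messages = [
    {"session_id": "A", "text": "turn 1: user asks agent to draft the email"},
    {"session_id": "A", "text": "turn 1 reply: draft agreed"},
    # -- rate limit fires here; the SDK rolls A -> B --
    {"session_id": "B", "text": "turn 2 reply: continues on fresh session"},
]

def prior_context_naive(current_session_id):
    # The buggy lookup: WHERE session_id = ?  (current upstream ID only)
    return [m["text"] for m in messages if m["session_id"] == current_session_id]

def prior_context_chained(chain):
    # The fix: WHERE session_id IN (chain)  (every ID the conversation has held)
    return [m["text"] for m in messages if m["session_id"] in chain]

print(len(prior_context_naive("B")))           # 1 row: turn 1 is stranded
print(len(prior_context_chained(["A", "B"])))  # 3 rows: full conversation
```

Both lookups run against the same rows on the same table. Only the filter differs, and that difference is the whole bug.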

The chain pattern

Once you accept the upstream ID as transient, the implementation is small. The conversation has a stable client-generated identity, written exactly once when the user opens a new chat. Alongside it, you keep an append-only list of every upstream session ID this conversation has ever owned. Every time the SDK hands the host a new upstream ID, you append. The chain does not care about the order of failures, only that it has them all.

// Desktop/Sources/Providers/ChatProvider.swift
private static let sessionChainMaxSize = 16
private static let sessionChainSuffix = "_chain"

private static func appendToSessionChain(_ id: String, storageKey: String) {
    let trimmed = id.trimmingCharacters(in: .whitespaces)
    guard !trimmed.isEmpty else { return }
    let chainK = chainKey(forStorageKey: storageKey)
    var chain = UserDefaults.standard.stringArray(forKey: chainK) ?? []
    if chain.last == trimmed { return }
    chain.removeAll { $0 == trimmed }
    chain.append(trimmed)
    if chain.count > sessionChainMaxSize {
        chain = Array(chain.suffix(sessionChainMaxSize))
    }
    UserDefaults.standard.set(chain, forKey: chainK)
}

private static func persistSessionId(_ id: String, storageKey: String) {
    guard !id.isEmpty else { return }
    UserDefaults.standard.set(id, forKey: storageKey)
    appendToSessionChain(id, storageKey: storageKey)
}

The chain is capped at sixteen entries because UserDefaults is not the right place for unbounded growth, and a single conversation rarely owns more than two or three IDs in a normal week. If your storage is a real database, the cap can be larger, but it should still be there: left unbounded, the chain becomes the slow query in the priorContext path on exactly the long-lived conversations you most want to be fast.

Reset only on explicit New Chat or sign-out. Never on a transient failure. The whole point of the chain is to outlive the failures.
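For readers outside Swift, the same append/dedupe/cap discipline, plus the reset rule, can be sketched in a few lines of Python (the function names and reason strings are illustrative, not the product's API):

```python
CHAIN_MAX = 16  # mirrors sessionChainMaxSize in the Swift helper above

def append_to_chain(chain, session_id):
    """Append a new upstream ID: dedupe, preserve order, cap the length."""
    sid = session_id.strip()
    if not sid or (chain and chain[-1] == sid):
        return chain                       # empty or already the head: no-op
    chain = [s for s in chain if s != sid]  # a re-appearing ID moves to the tail
    chain.append(sid)
    return chain[-CHAIN_MAX:]               # keep only the newest entries

def reset_chain(chain, reason):
    """Reset ONLY on explicit user actions; transient failures keep the chain."""
    if reason in {"new_chat", "sign_out"}:
        return []
    # rate_limit, bridge_restart, credit_exhaustion, ttl_expiry: chain survives
    return chain
```

The reset function existing at all is the point: every caller that clears the chain has to name a reason, and only two reasons are legal.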

PriorContext on every send, not on detected failure

The second part of the pattern is the rule that the host always sends a small preamble of recent messages with every turn, even when nothing appears to be wrong. The bridge ignores it on the happy path and only consults it for recovery: when the SDK reports a session resume failure, or when a prior turn returned empty text from a poisoned upstream session. Without unconditional priorContext, those recovery paths cannot replay history, and the recovery becomes a silent context loss.

// On every send, regardless of whether resume is being attempted:
let storageKey = "acpSessionId_\(key)_\(bridgeMode)"
var ids = Self.loadSessionChain(storageKey: storageKey)
if let head = UserDefaults.standard.string(forKey: storageKey),
   !head.isEmpty,
   !ids.contains(head) {
    ids.append(head)
}
let recent = await ChatMessageStore.loadMessages(
    context: ctx,
    sessionIds: ids.isEmpty ? nil : ids,
    limit: 20
)
// recent is forwarded to the bridge as priorContext on every turn.

One DB read of about twenty rows per send costs microseconds. The cost of skipping it on the turn that silently rolled is a model that wakes up with no memory of the conversation, mid-task, and a user who decides the agent is broken. The tradeoff is not close.

The five rules, in source order

Persistent session state checklist

  • Two identities per conversation: a stable, client-generated conversation ID for storage, and a transient upstream session ID for the SDK.
  • Append-only chain in durable storage, capped (16 works), recording every upstream session ID this conversation has ever held.
  • Message rows stamped with the upstream session ID, not the conversation ID. The chain spans, the rows are narrow.
  • PriorContext loaded across the full chain, sent on every turn, not only on detected resume failure.
  • Reset the chain only on explicit New Chat or sign-out. Never on rate limit, credit exhaustion, bridge restart, or any other transient failure.
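The five rules together fit in one small object. A hypothetical Python sketch, with storage reduced to in-memory lists for illustration (none of these names are the product's real API):

```python
import uuid

class Conversation:
    def __init__(self):
        self.conversation_id = str(uuid.uuid4())  # rule 1: stable, client-generated
        self.chain = []                           # rule 2: append-only, capped
        self.current_session = None
        self.rows = []                            # rule 3: rows stamped with upstream ID

    def on_new_upstream_session(self, session_id, max_chain=16):
        """Called whenever the SDK hands back a session ID, including rollovers."""
        self.current_session = session_id
        if not self.chain or self.chain[-1] != session_id:
            self.chain = [s for s in self.chain if s != session_id]
            self.chain.append(session_id)
            self.chain = self.chain[-max_chain:]

    def append_message(self, sender, text):
        self.rows.append({"session_id": self.current_session,
                          "sender": sender, "text": text})

    def prior_context(self, limit=20):
        # rule 4: load across the FULL chain, on every send
        span = [r for r in self.rows if r["session_id"] in self.chain]
        return span[-limit:]

    def new_chat(self):
        # rule 5: the only ordinary path that clears the chain
        self.__init__()
```

The upstream SDK never sees conversation_id; the message store never needs to know a rollover happened. Each layer stays ignorant of the other's failures.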

What this looks like in the storage layer

The shape of the message table itself is small. One row per message, with the stable conversation ID as taskId and the upstream session ID as session_id, plus the usual id / sender / text / timestamps. The session_id column is indexed so the priorContext load is a fast lookup against the chain.

-- Desktop/Sources/AppDatabase.swift, fazmV5 migration
ALTER TABLE chat_messages ADD COLUMN session_id TEXT;
CREATE INDEX idx_chat_messages_session
    ON chat_messages(session_id);

-- Insert each message with the current upstream session id:
INSERT OR REPLACE INTO chat_messages
    (taskId, messageId, sender, messageText,
     createdAt, updatedAt, backendSynced, session_id)
VALUES (?, ?, ?, ?, ?, ?, 0, ?);

-- Read across the full chain on every send:
SELECT messageId, sender, messageText, createdAt
FROM chat_messages
WHERE taskId = ?
  AND session_id IN (?, ?, ?, ...)  -- the chain
ORDER BY createdAt DESC
LIMIT 20;

Nothing exotic. The whole pattern is one extra column, one extra index, and a deduped string list in durable storage on the client. The cleverness is in the discipline of always loading by the chain and always sending priorContext, not in the data model.
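To make the shape concrete, here is the same schema exercised end to end with Python's built-in sqlite3 against an in-memory database. The CREATE TABLE is a stand-in for the app's pre-existing table (the real migration only adds the session_id column), and the helper names are invented for this sketch:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Stand-in for the existing chat_messages table plus the fazmV5 additions.
db.execute("""CREATE TABLE chat_messages (
    taskId TEXT, messageId TEXT PRIMARY KEY, sender TEXT, messageText TEXT,
    createdAt INTEGER, updatedAt INTEGER, backendSynced INTEGER,
    session_id TEXT)""")
db.execute("CREATE INDEX idx_chat_messages_session ON chat_messages(session_id)")

def insert_message(task_id, msg_id, sender, text, ts, session_id):
    # Stamp each row with the CURRENT upstream session ID at write time.
    db.execute("INSERT OR REPLACE INTO chat_messages VALUES (?,?,?,?,?,?,0,?)",
               (task_id, msg_id, sender, text, ts, ts, session_id))

def prior_context(task_id, chain, limit=20):
    # Read across the full chain on every send.
    placeholders = ",".join("?" * len(chain))
    rows = db.execute(
        f"""SELECT messageId, sender, messageText, createdAt
            FROM chat_messages
            WHERE taskId = ? AND session_id IN ({placeholders})
            ORDER BY createdAt DESC LIMIT ?""",
        [task_id, *chain, limit]).fetchall()
    return list(reversed(rows))  # oldest first, ready to send as a preamble

insert_message("t1", "m1", "user", "turn 1", 1, "A")
insert_message("t1", "m2", "agent", "reply 1", 2, "A")
insert_message("t1", "m3", "agent", "reply 2 after rollover", 3, "B")
print(len(prior_context("t1", ["A", "B"])))  # 3
```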

Why this matters more for a computer-use agent

A chat agent that loses session state has to re-greet the user. Annoying, not dangerous. A computer-use agent that loses session state can re-execute a destructive action it already executed: re-send an email, re-charge a card, re-write a file, re-post a Reddit comment. The model does not know it already did the thing, because the thing it did is gone from its preamble. The chain plus per-send priorContext is what lets the agent continue a multi-step task across an upstream rollover without doubling work.

Fazm is a macOS computer-use agent that drives the browser, edits documents, and operates Google Apps via accessibility APIs rather than screenshots. It runs against the Claude Agent SDK through a Node bridge. Every constraint above (rate limit on a live conversation, bridge OOM, OAuth token rotation) has fired in production, and the chain is the layer that made the difference between a continued conversation and an agent that re-sent the same email twice.


Adopting the pattern in five steps

1. Pick a stable conversation identity

Generate a UUID on the client when the user opens a new chat. Store it next to your transcripts. It never changes for the life of the conversation. This is your real identity, not whatever the SDK gives you.

2. Add a session_id column to the message table

One nullable text column. Index it. Stamp each row with the current upstream session ID at write time. Do not stamp with the stable conversation ID; you want narrow rows and a wide chain.

3. Keep the chain in durable client storage

Append-only string list, deduped on the head, capped at sixteen entries. UserDefaults works on macOS, IndexedDB on the web, Room on Android, SQLite on a server. The chain is the spine of recovery.

4. Load priorContext across the chain on every send

Roughly twenty rows, ordered by createdAt, loaded with WHERE session_id IN (chain). Forward it to the bridge or the model as a preamble. Cost: microseconds. Benefit: rollover-safe by default.

5. Reset only on explicit user actions

New Chat, sign-out, pop-out into a fresh window. Not on rate limit. Not on bridge restart. The whole point of the chain is to outlive the failures, so any reset path is a place to look for bugs first.

Where this differs from the framework guides

The published guides for LangGraph, OpenAI Agents SDK, Microsoft Agent Framework, Google ADK, and Databricks all describe persistence as a backend choice plus a memory taxonomy. Pick your storage (file SQLite, async SQLite, Redis, SQLAlchemy, MongoDB, Dapr). Pick your memory layer (short-term, long-term, episodic). Pass your session ID back on resume.

All of that is fine. None of it addresses the case where the framework’s own session abstraction loses identity under it. The chain pattern is not a replacement for any of those layers. It is the layer above them. The framework handles the persistence of one session. The chain handles the continuity of one conversation across the framework’s sessions.

If you have built on top of any of those frameworks for more than a few months and have a real conversation that has survived a rate-limit afternoon, you have probably already invented something close to this in your own code. This page is a name for that thing, plus the line numbers in one shipping product where the discipline is enforced.


Frequently asked questions

Short version: how do you keep agent session state across restarts?

Two-layer the identity. The conversation has a stable, client-generated ID that lives forever in your local DB and never changes during the conversation's life. The upstream agent SDK has its own session ID, which is a transient handle the SDK can lose at any time. Keep an append-only chain of every upstream session ID a conversation has ever owned, write messages stamped with the current upstream ID, and on every send replay the last N messages loaded across the entire chain as a preamble. The model sees a continuous conversation even when the upstream handle has rolled three times under it.

Why does the upstream session ID roll over at all?

Because the upstream session is a bounded, revocable resource, and any of these will invalidate the handle: hitting a rate limit, hitting the per-month credit cap on a Claude or OpenAI account, the bridge process getting OOM-killed by macOS, the SDK detecting a corrupted resume payload, the user's OAuth token rotating, or the upstream session simply expiring on a TTL. The SDK's reaction in every case is the same: create a fresh session ID, return it to the host, accept a priorContext block as a preamble, and continue. Without a chain, the host loses the handle to its own previous messages on the very next priorContext lookup.
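The roll-forward reaction can be sketched as a small wrapper (hypothetical names; upstream stands in for whatever call streams a turn to the SDK): any session-invalidating error mints a fresh handle, records it on the chain, and replays the preamble.

```python
import itertools

_fresh = itertools.count(1)  # stand-in for the SDK minting new session IDs

def send_turn(session_id, turn, chain, prior_context, upstream):
    """Hypothetical roll-forward sketch: on any session-invalidating error,
    create a fresh session, record it, and replay prior_context as a preamble."""
    try:
        return session_id, upstream(session_id, [turn])
    except RuntimeError:  # rate limit, credit cap, OOM-killed bridge, TTL expiry
        new_id = f"s{next(_fresh)}"   # fresh upstream handle
        chain.append(new_id)          # the host records every handle it is given
        return new_id, upstream(new_id, prior_context + [turn])
```

The chain.append is the single line that separates a continued conversation from a stranded one: without it, the host keeps the old handle and the next lookup misses everything before the error.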

Why send priorContext on every turn instead of only on detected resume failure?

Because the failure modes are not all loud. A poisoned upstream session can return empty text on a turn without raising a resume error. A bridge process can get killed and restarted between two consecutive sends without the host noticing. The cost of unconditional priorContext is one DB read of about 20 rows per send, which is microseconds. The cost of skipping it on a turn that silently rolled is the model waking up with no memory of the conversation, mid-task. The host code in this product runs the DB read on every send, skips the cleverness, and treats robustness as the default.

What is the right size for the chain cap?

Sixteen entries works in practice for a conversation that lives for hours. Each rollover is a discrete event, not continuous, so even a heavy day rarely produces more than two or three. Sixteen is enough headroom for a multi-day conversation that survives a bad rate-limit afternoon plus a bridge crash plus an OAuth rotation, without unbounded growth in UserDefaults. If your storage layer is something other than UserDefaults, the cap can be larger, but you still want one. An unbounded chain becomes the slow query in the priorContext load path on long-lived conversations.

Should the message rows store the upstream session ID or the stable conversation ID?

The upstream session ID, on whichever rows it covers. That way the chain mechanism is purely additive: the message store does not need to know about rollovers, and you can always reconstruct the full conversation by loading by the chain. Stamping with the stable conversation ID would also work but loses information you might want later (which messages were on which upstream session, which upstream session got rate-limited, where the rollovers fell). Keep the row stamp narrow and let the chain do the spanning.

When do you reset the chain?

Only on explicit user actions that mean a new conversation: the user clicks New Chat, the user signs out, the user pops a chat out of one window into another fresh window. Never on a transient failure. A rate limit is not a new conversation, it is a glitch in an old one, and resetting the chain on it is exactly the bug this whole pattern exists to prevent.

Where does this differ from frameworks like LangGraph, OpenAI Agents SDK, or Microsoft Agent Framework?

Those frameworks give you a session abstraction with a single ID. They handle persistence by writing turns to a backend (SQLite, Redis, MongoDB, Dapr) under that one ID, and they document resume by passing that one ID back. None of them, in their published guides, describes what happens when the underlying model session itself rolls. The session abstraction in those guides is treated as the stable layer. In a real shipping product on top of a frontier model, it is not. The chain pattern in this guide is what you bolt under those frameworks once you have run them long enough to see a rate limit on a live conversation.

Is this only relevant to chat agents, or does it apply to computer-use agents too?

It matters more for computer-use agents, not less. A chat agent that loses session state has to re-greet the user. A computer-use agent that loses session state can re-execute a destructive action it already executed (re-sending an email, re-charging a card, re-writing a file) because it does not know it already did. The chain plus per-send priorContext is what lets a computer-use agent continue a multi-step task across an upstream rollover without doubling work.

Where in the source can I read the actual implementation?

The chain itself is in Desktop/Sources/Providers/ChatProvider.swift in the public Fazm repo. The constant sessionChainMaxSize is defined at line 574, the appendToSessionChain helper at lines 586 to 598, the loadSessionChain helper at lines 601 to 604, the persistSessionId wrapper that writes both the head ID and the chain at lines 615 to 619, and the priorContext-on-every-send code path at lines 2904 to 2960. The schema for the underlying chat_messages table, including the session_id column, is in Desktop/Sources/AppDatabase.swift around the fazmV5 migration at line 990. All MIT-licensed at github.com/m13v/fazm.

What if my agent does not run a separate bridge process? Do I still need this?

Yes, with one simplification. Without a separate bridge, you do not have to worry about the bridge process restarting independently of the host, but you still face rate limits, credit exhaust, and upstream expiry on the model API itself. The pattern collapses to: one stable conversation ID, one chain of model-session IDs, priorContext on every send. If you are running directly against an Anthropic or OpenAI API and using their session abstractions, the chain is the layer above their session, not below it.