Argument
The agentic AI containment-action gap, viewed from the desktop layer
Most writing about the containment-action gap is about cloud agents and IAM. The harder version of the same problem lives on your laptop, where a computer-use agent already has your session. Here is what desktop containment actually looks like, with the Swift code that closes it.
Direct answer · Verified 2026-05-08
The containment-action gap is the spread between what an organization can observe about an AI agent and what it can actually stop while the agent is acting. Cisco’s 2026 agent-trust survey puts governance and monitoring adoption at 56 to 59 percent, with hard containment controls (purpose binding, kill switches, network isolation) at 37 to 40 percent. About 63 percent of respondents say they cannot enforce a purpose limit on an agent once it is running. The dashboards exist. The brake pedal does not.
Source: Cisco blog, “The Agent Trust Gap”.
The thesis
Almost every essay on the containment-action gap is written from a CISO’s desk. The agent in question is a service: it lives in someone’s cloud, it authenticates through an identity provider, it accesses data through APIs the security team can audit. The fix is policy: tighter scopes, kill switches at the IAM layer, runtime governance frameworks, action-based access models. That writing is not wrong. It is also not the worst version of the problem.
The worst version is on your laptop. A desktop computer-use agent runs inside your logged-in OS session. It does not authenticate to your apps because your apps already trust you. It drives the UI through the accessibility API, which gives it the same surface area a human gets, including the bits no API exposes. There is no identity provider in front of it. There is no token to revoke. The blast radius is everything you have a cookie for, every keychain entry, every SSH key, every browser tab that is already logged in. If you cannot stop the next tool call, you cannot contain the agent.
That changes the engineering problem. Containment on the desktop is not a policy plane sitting beside the workload. It is a piece of the runtime, embedded in whatever the agent uses to talk to the model. If the runtime does not have a real interrupt path that can stop the next click in flight, no amount of dashboarding above it matters.
What this looks like in Fazm’s source
Fazm is a macOS computer-use agent. The chat side is Swift. The model side is a Node process that speaks ACP (the agent client protocol) over stdio. Almost everything containment-shaped lives in one file: Desktop/Sources/Chat/ACPBridge.swift.
Two methods do the work. The first is the legacy global interrupt at line 1158:
/// Interrupt the running agent, keeping partial response.
func interrupt() {
    guard isRunning else { return }
    isInterrupted = true
    // Also mark every active per-session query as interrupted
    for key in sessionContinuations.keys {
        sessionInterrupted[key] = true
    }
    sendLine("{\"type\":\"interrupt\"}")
}

That is the broad halt. Useful, but if you are running three concurrent sessions (a foreground chat, a queued cron, a pop-out window), it cancels all three. The surgical version is right below it at line 1169:
/// Interrupt a specific session only. Other concurrent
/// sessions continue running.
func interrupt(sessionKey: String) {
    guard isRunning else { return }
    sessionInterrupted[sessionKey] = true
    let dict: [String: Any] = [
        "type": "interrupt",
        "sessionKey": sessionKey,
    ]
    if let data = try? JSONSerialization.data(withJSONObject: dict),
       let json = String(data: data, encoding: .utf8) {
        sendLine(json)
    }
}

The flag flip is local; the JSON line is the wire signal that reaches the Node bridge. Inside query(), around line 951, the loop checks the flag before each subsequent tool call:
// Per-session interrupt flag takes precedence;
// fall back to legacy global
let interrupted = sessionKey
    .flatMap { sessionInterrupted[$0] } ?? isInterrupted
if interrupted {
    log("ACPBridge: skipping tool call \(name) (interrupted)")
    // ...drain pending messages, return partial
}

That is the action gap closing in code. The model can still propose more tool calls, and the bridge can still receive them, but the runtime stops dispatching them as soon as the user flips the flag. The interrupt is not advisory and not best-effort. It runs on the same path that does the dispatching.
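All three excerpts lean on sendLine without showing it. A minimal sketch of what that transport plausibly looks like, assuming a Pipe wired to the Node child's stdin at spawn; the real sendLine lives in ACPBridge.swift and may differ:

import Foundation

// Hedged sketch of the stdio transport: one JSON object per line, with
// the newline as the message delimiter. The pipe setup is assumed, not
// copied from Fazm.
let bridgeStdin = Pipe()  // wired to the Node bridge's stdin at spawn

func sendLine(_ json: String) {
    if let data = (json + "\n").data(using: .utf8) {
        bridgeStdin.fileHandleForWriting.write(data)
    }
}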
What the user sees when the watchdog fires
The other half of containment is what happens when the user does not press stop, but the agent gets stuck. The bridge ships a watchdog tied to FAZM_TOOL_TIMEOUT_SECONDS (passed in via the env at line 473). When a tool exceeds it, the bridge emits a structured cancellation event. The Swift handler at line 1144 turns that into a log line and forwards it as a status event the UI renders as a system card.
Two things are worth pointing at. The watchdog uses a sliding activity window (toolActivityWindow, 60 seconds) so it does not trip while real work is happening, only when nothing has happened recently. And the cancellation surfaces as a typed event, not a generic exception, so the UI can show a card that says exactly which tool was canceled and why, instead of a spinner that just stops.
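The real watchdog lives in the Node bridge, but the policy is simple enough to sketch in Swift. ToolWatchdog and its members here are invented for illustration; only the two constants (the FAZM_TOOL_TIMEOUT_SECONDS budget and the 60-second activity window) come from the text above:

import Foundation

// Hedged sketch of the sliding-window policy.
final class ToolWatchdog {
    private let timeout: TimeInterval        // FAZM_TOOL_TIMEOUT_SECONDS
    private let activityWindow: TimeInterval // toolActivityWindow, 60s
    private var lastActivity = Date()

    init(timeout: TimeInterval, activityWindow: TimeInterval = 60) {
        self.timeout = timeout
        self.activityWindow = activityWindow
    }

    /// Streamed output, tool events, partial results: all count as activity.
    func recordActivity() { lastActivity = Date() }

    /// Trips only when the tool is over budget AND nothing has happened
    /// recently. A slow tool that is still producing output never fires.
    func shouldCancel(toolStartedAt start: Date, now: Date = Date()) -> Bool {
        now.timeIntervalSince(start) > timeout
            && now.timeIntervalSince(lastActivity) > activityWindow
    }
}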
How that compares to the typical cloud-agent stop
Most cloud-agent platforms expose stop as a state mutation in a control plane. You hit a /stop endpoint, the control plane writes “cancelled” to the run row, the worker eventually polls and gives up. There is real distance between “user pressed stop” and “tool stops executing.” On a desktop runtime that distance is fatal: the action you wanted to stop already fired clicks. The bridge model collapses the distance because the dispatcher and the interrupt flag live in the same process.
Stop-button shape
// Operator (cloud)
POST /v1/runs/{id}/cancel
  -> control_plane.runs.update(
       id, status="cancelling")
  -> queue.publish(
       "cancel", run_id=id)
  -> worker.poll()  // every 30s
  -> worker.cancel_current_step()
  -> control_plane.runs.update(
       id, status="cancelled")
// Latency: seconds to minutes
// Worker may complete one more step
// before receiving the message

The lifecycle, end to end
Five stages between “user presses stop” and “the model knows the turn is over.” None of them require a network round trip. All of them are observable in the bridge log.
Action lifecycle from interrupt to ack:

1. User press: Stop button or hotkey
2. Flag set: sessionInterrupted[key]
3. Wire signal: {"type":"interrupt"}
4. Drain: skip pending tool calls
5. Ack: result returned to UI
Three things that make desktop containment different
There is no IAM in front of the agent
The agent already runs inside a logged-in OS session. There is no token an admin can revoke; revoking the user’s own session also locks the user out. Containment has to be a runtime property of the client, not a policy on a token.
The action surface is the entire OS
Accessibility APIs give the agent the same surface a human gets. That includes apps that have no remote API. The brake pedal cannot be a per-API allowlist, because most of the actions are not API calls; they are AXPress, AXSetValue, key events. The brake has to live above the action dispatcher.
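A sketch of what "above the dispatcher" means in practice. AXUIElementPerformAction is the real macOS accessibility call; the dispatchPress wrapper and its isInterrupted hook are invented to show where the brake sits, before the click exists in the OS at all:

import ApplicationServices

// Hedged sketch: every UI action the agent can take funnels through one
// function, and the interrupt check runs before the action is emitted.
enum DispatchError: Error { case interrupted }

func dispatchPress(_ element: AXUIElement,
                   isInterrupted: () -> Bool) throws {
    // The brake: checked before the click leaves the process.
    guard !isInterrupted() else { throw DispatchError.interrupted }
    _ = AXUIElementPerformAction(element, kAXPressAction as CFString)
}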
Recovery has to be local and fast
If the watchdog cancels a hung tool, the user is sitting at the laptop watching a chat window. Cancellation has to surface as a structured card in the UI within the same session, not a backend log a CISO will read tomorrow. That is why Fazm’s tool_hang_canceled is a typed event, not a generic exception.
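What "typed event, not a generic exception" plausibly looks like on the Swift side. The field list is inferred from the log format described below, not copied from ACPBridge.swift:

// Sketch of a typed cancellation event. Because the fields are typed,
// the UI can render a card naming the tool, duration, and reason instead
// of catching an opaque error.
enum BridgeStatusEvent {
    case toolHangCanceled(
        tool: String,          // which tool was canceled
        toolUseId: String,     // correlates with the model's tool_use block
        durationSeconds: Double,
        reason: String         // human-readable, rendered on the card
    )
}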
“There is a 15 to 20 point gap between governance adoption and containment adoption. Most organizations can see what their agents are doing. Most cannot stop them in real time.”
Cisco, The Agent Trust Gap, 2026
What this does not solve
Honest part. The interrupt path stops the next tool call. It does not roll back the previous one. If the agent already shipped an email, deleted a row, or pushed a commit, that is gone. Closing the action gap is necessary; it is not the same as closing the consequences gap. The containment work that pairs with the interrupt is the boring stuff: dry-run modes for destructive tools, confirmation prompts on side-effect tools, structured logs of every dispatched action so a human can audit what happened. Those live in the bundled skills layer, not in the bridge.
The interrupt path also assumes the bridge is healthy. If the Node process is wedged hard enough that stdin is not being read, the JSON line lands in the kernel pipe buffer and goes nowhere. That is what the orphan sweep at startup (Self.sweepOrphanedBridges) and the global stop are for. They are the layer below the surgical interrupt and they exist because real processes do hang.
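A hedged sketch of what an orphan sweep can look like. Everything here, including the marker string, is invented; the article only names Self.sweepOrphanedBridges without showing it:

import Foundation

// Find bridge processes left over from a crashed run, matched by a
// command-line marker, and kill them before starting a fresh bridge.
func sweepOrphanedBridges(marker: String = "fazm-acp-bridge") {
    let pgrep = Process()
    pgrep.executableURL = URL(fileURLWithPath: "/usr/bin/pgrep")
    pgrep.arguments = ["-f", marker]
    let pipe = Pipe()
    pgrep.standardOutput = pipe
    do { try pgrep.run() } catch { return }
    pgrep.waitUntilExit()

    let output = String(
        data: pipe.fileHandleForReading.readDataToEndOfFile(),
        encoding: .utf8) ?? ""
    for line in output.split(separator: "\n") {
        if let pid = Int32(line) {
            kill(pid, SIGKILL)  // stdin is dead anyway; no graceful path
        }
    }
}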
None of this generalizes for free to other desktop agents. What matters is that one open-source implementation does it, because that is enough to stop the conversation from being abstract. Anyone arguing that desktop computer use is intractable on the containment side has to engage with running code, not a whitepaper.
Want to see the interrupt path live?
Book a 20 minute call. We will run a multi-step task on your machine, hit stop mid-tool-call, and walk you through what the log shows.
Frequently asked questions
What is the agentic AI containment-action gap, in one paragraph?
It is the spread between what an organization can observe about an AI agent and what it can actually stop in flight. Cisco's 2026 agent-trust survey reports governance and monitoring adoption at 56 to 59 percent, while containment controls (purpose binding, kill switches, network isolation) sit at 37 to 40 percent. Around 63 percent of respondents say they cannot enforce a purpose limit on an agent once it is running. The gap is not philosophical, it is a missing feature in the deployment stack: the agent has begun acting, the operator has dashboards, but the operator has no way to halt the action chain before damage is irreversible.
Why is this gap different on the desktop than in the cloud?
Cloud computer-use agents (server-hosted Operator-style services) sit behind an API the operator controls. The blast radius is whatever IAM grants. A desktop agent does not work that way. It runs as a process inside your logged-in macOS session, drives applications through the accessibility API at user-level permissions, and inherits every cookie, keychain item, and SSH key your session has. There is no IAM in front of it. Containment cannot mean revoking a token, because there is no token gating the action. It has to mean stopping the local process from emitting the next click, the next keystroke, the next tool call. That is a different engineering problem.
What does Fazm actually ship to close that gap?
A per-session interrupt path inside the ACP bridge, plus a watchdog that auto-cancels stuck tool calls. The path is in Desktop/Sources/Chat/ACPBridge.swift. interrupt(sessionKey:) at line 1169 marks one session interrupted in a sessionInterrupted dictionary, sends a JSON line of the form {"type":"interrupt","sessionKey":"..."} to the Node bridge over stdin, and the per-session query() loop checks the flag before each subsequent tool call. The watchdog runs server-side in the bridge, configured by the FAZM_TOOL_TIMEOUT_SECONDS environment variable, and emits a tool_hang_canceled event the Swift side renders as a structured cancellation card in the UI.
Why does the per-session flag matter? Is it not enough to kill the whole bridge?
If you kill the bridge process, you also kill every other session that happens to be running through it. Multiple chat windows, a queued cron, a side investigation in a pop-out, all gone. The legacy interrupt() at line 1158 does that broad halt, and Fazm still uses it for global stop. interrupt(sessionKey:) is the surgical version: it only flips the flag for one sessionKey, drains that session's pending tool calls, and leaves every other concurrent session running. Without that distinction, the user pays a productivity tax every time they want to abort one runaway action.
What does the watchdog look like in the log?
When a tool call exceeds the configured timeout, the bridge emits a tool_hang_canceled record naming the tool, the toolUseId, the duration in seconds, and a human-readable reason. ACPBridge.handle parses it at line 1424, the Swift logger prints "ACPBridge: tool_hang_canceled tool=<name> duration=<s>s reason=<reason>" at line 1145, and a structured ToolHangCanceled status event fires up to the UI. The user sees a system card in the chat saying the tool was canceled, why, and how long it ran. No silent hangs, no ten-minute spinners.
How is this not just an exception handler?
Two reasons. First, the interrupt is asynchronous and routed through the ACP bridge, not raised inside the Swift process: the Node side acknowledges the cancellation with a result message, and Fazm waits for that ack before resuming the user's next turn, so the model and the host stay in sync about what was actually canceled. Second, the watchdog runs on a sliding activity window (the toolActivityWindow constant, set to 60 seconds), so a brief gap between two tool calls does not trip a premature timeout, but a real stall does. That is policy, not error handling.
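The ack wait is the part worth sketching. Under the assumption that the bridge's result message resumes a stored continuation (names invented; the real plumbing in ACPBridge.swift may differ), the shape is roughly:

// Hedged sketch: the interrupt is sent, then the caller suspends until
// the bridge's result message arrives for that session.
actor AckWaiter {
    private var pendingAcks: [String: CheckedContinuation<Void, Never>] = [:]

    func waitForAck(sessionKey: String) async {
        await withCheckedContinuation { continuation in
            pendingAcks[sessionKey] = continuation
        }
    }

    /// Called by the message loop when the bridge's result line arrives.
    func ackReceived(sessionKey: String) {
        pendingAcks.removeValue(forKey: sessionKey)?.resume()
    }
}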
Does running locally make any of this safer?
It removes a class of containment problems and adds others. Removed: there is no shared agent that someone else's prompt could push out of scope, no remote sandbox to escape from, no third party with access to the action stream. Added: the agent has your session, your keychain, your filesystem, and your network, all without an IAM layer in front. Local does not mean safer by itself. It means the containment design has to live inside the client. The interrupt path, the tool watchdog, and the structured cancellation events are that design. The fact that the source is open at github.com/m13v/fazm is what makes the design auditable instead of a marketing claim.
Where else does Fazm enforce containment besides interrupt and the watchdog?
A few places worth knowing about. The bridge sweeps orphan processes from prior crashed runs (Self.sweepOrphanedBridges in start() around line 400), so a wedged Node + claude CLI from yesterday cannot quietly keep running today. cancelAuth() at line 1180 is a separate path so an in-flight OAuth dance can be aborted without killing the whole bridge. The deinit at line 372 resumes every pending continuation with BridgeError.stopped, so Swift concurrency continuations cannot leak when the bridge goes away. None of those are flashy; all of them are part of "can the user actually stop it."
Related
Adjacent reading on this site
Computer use AX tree action chain
Why an accessibility-tree dispatcher gives you a real interrupt boundary, and screenshot agents do not.
Computer use agent reliability
What actually breaks when the agent is local, and what containment buys you that cloud cannot.
Computer use multi-step action chain reliability
Where chains drop a step on real macOS apps, and what the bridge has to log to make it debuggable.