AI Agent Blast Radius: What It Is and How to Measure It
Every AI agent has a blast radius. It is the maximum scope of damage the agent can cause in a single failure, whether from a bug, a hallucination, a prompt injection, or a misconfigured permission. Understanding blast radius is the first step toward deploying agents that are safe enough to trust with real work.
This guide covers what blast radius means in the context of AI agents, how to measure it for different agent types, and what controls actually reduce it.
What Is Blast Radius in AI Agent Systems?
Blast radius is borrowed from infrastructure engineering, where it describes how far a failure propagates. In a microservice architecture, a single crashed service might take down one API endpoint (small blast radius) or cascade into a full outage (large blast radius).
For AI agents, blast radius answers one question: if this agent does the worst possible thing it is capable of, how bad is it?
The key word is "capable." An agent's blast radius is not determined by what it is instructed to do. It is determined by the union of all permissions, tools, and access it has. A coding agent told to "fix the typo in README.md" has a blast radius that includes every file on disk, every shell command it can run, and every network endpoint it can reach, because those are the capabilities it holds.
Blast Radius Categories by Agent Type
Different agent architectures produce different blast radius profiles. The table below maps common agent types to their typical blast radius surface.
| Agent Type | File System | Network | Shell/Code Exec | UI Control | Typical Blast Radius |
|---|---|---|---|---|---|
| Chat-only (no tools) | None | None | None | None | Minimal (bad advice only) |
| RAG agent | Read-only (indexed docs) | API calls to vector DB | None | None | Low (data exposure) |
| Coding agent (sandboxed) | Project directory only | Limited/none | Sandboxed shell | None | Medium (project-scoped damage) |
| Coding agent (unrestricted) | Full disk | Full network | Root shell | None | Critical (system-level compromise) |
| Desktop agent (UI only) | Via app UIs | Via browser | None | Full screen | High (anything on screen) |
| Desktop agent (UI + shell) | Full disk | Full network | Root shell | Full screen | Critical (total system access) |
| Cloud orchestrator | Cloud resources | Cross-service | Infrastructure APIs | None | Critical (multi-service blast) |
How to Measure Blast Radius
Measuring blast radius requires auditing what an agent can actually do, not what it is supposed to do. Here is a practical framework.
Step 1: Enumerate All Capabilities
List every tool, API, file path, and system resource the agent can access. Include transitive access: if the agent can run shell commands, it can access anything the shell user can access. If it can control a browser, it can reach any authenticated session in that browser.
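Transitive access can be enumerated mechanically once each capability is written down alongside what it implies. The sketch below is a minimal illustration, not a real framework API: the capability names and the `GRANTS` map are hypothetical, and a real audit would need to fill them in from the agent's actual configuration.

```python
from collections import deque

# Hypothetical capability graph: each key grants everything listed in its value.
# The names are illustrative and not taken from any real agent framework.
GRANTS = {
    "shell": {"filesystem:shell_user", "network:outbound", "credentials:ssh_keys"},
    "browser": {"sessions:authenticated", "network:outbound"},
    "sessions:authenticated": {"email:send", "cloud_console"},
    "file_read:project": set(),
}


def effective_capabilities(direct, grants=GRANTS):
    """Return the direct capabilities plus everything reachable through them."""
    seen = set(direct)
    queue = deque(direct)
    while queue:
        cap = queue.popleft()
        for implied in grants.get(cap, ()):
            if implied not in seen:
                seen.add(implied)
                queue.append(implied)
    return seen


if __name__ == "__main__":
    # An agent given only a browser and project read access.
    for cap in sorted(effective_capabilities({"browser", "file_read:project"})):
        print(cap)
```

In this example the agent was never handed an email tool, yet `email:send` appears in the result because the browser carries the user's authenticated sessions.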
Step 2: Identify the Worst Single Action
For each capability, ask: what is the most destructive single thing the agent could do with this? Not the intended use. The worst case.
Examples:
- File write access -> overwrite critical config, delete production data
- Shell access -> `rm -rf /`, install malware, exfiltrate credentials
- Browser control -> send emails as the user, make purchases, leak passwords from a password manager
- Git push access -> push malicious code to main, force-push and destroy history
Step 3: Map the Damage Propagation
A single bad action often cascades. Deleting a database does not just lose data; it breaks every service that depends on that database, triggers alerts, and may violate compliance requirements. Map the second and third-order effects.
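One way to map propagation is a breadth-first walk over a dependency graph, recording how many hops each affected resource is from the initial failure. The resource names and the `DEPENDENTS` map below are hypothetical; the sketch assumes you already know which systems depend on which.

```python
from collections import deque

# Hypothetical dependency map: resource -> things that break when it fails.
DEPENDENTS = {
    "orders_db": ["orders_api", "billing_job"],
    "orders_api": ["checkout_page"],
    "billing_job": ["compliance_report"],
}


def propagate(failed, dependents=DEPENDENTS, max_order=3):
    """Map each affected resource to its order (1 = the direct failure)."""
    order = {failed: 1}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        if order[node] >= max_order:
            continue  # do not look past the requested depth
        for child in dependents.get(node, []):
            if child not in order:
                order[child] = order[node] + 1
                queue.append(child)
    return order


if __name__ == "__main__":
    for resource, depth in propagate("orders_db").items():
        print(depth, resource)
```

Here deleting `orders_db` is the first-order effect, the API and billing job are second-order, and the checkout page and compliance report are third-order.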
Step 4: Score and Categorize
Use this scoring framework:
| Score | Category | Description | Recovery Time |
|---|---|---|---|
| 1-2 | Minimal | Wrong output, no system effects | Seconds |
| 3-4 | Low | Read-only data exposure, single-app scope | Minutes |
| 5-6 | Medium | Project-level file changes, scoped write access | Hours |
| 7-8 | High | System-wide access, credential exposure, multi-service impact | Days |
| 9-10 | Critical | Irreversible data loss, cross-org propagation, compliance breach | Weeks or never |
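The scoring bands translate directly into a lookup. The sketch below assumes one design choice that the framework does not spell out: the agent's overall score is taken as the maximum of its per-capability worst-case scores, on the reasoning that a single critical capability is enough to make the agent critical. The per-capability numbers in the usage example are made up for illustration.

```python
# Score bands copied from the table above: (low, high, category, recovery time).
BANDS = [
    (1, 2, "Minimal", "Seconds"),
    (3, 4, "Low", "Minutes"),
    (5, 6, "Medium", "Hours"),
    (7, 8, "High", "Days"),
    (9, 10, "Critical", "Weeks or never"),
]


def categorize(score):
    """Return (category, recovery time) for an integer score from 1 to 10."""
    for low, high, category, recovery in BANDS:
        if low <= score <= high:
            return category, recovery
    raise ValueError(f"score must be an integer from 1 to 10, got {score!r}")


def agent_score(worst_case_scores):
    """Assumption: the agent scores as its single worst capability, not an average."""
    return max(worst_case_scores.values())


if __name__ == "__main__":
    # Hypothetical worst-case scores assigned during Step 2.
    scores = {"file_read": 3, "git_push": 7, "shell": 9}
    print(categorize(agent_score(scores)))
```

Averaging would hide the outlier: the same agent would average about 6 ("Medium") even though its shell access alone puts it in the critical band.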
Five Controls That Actually Reduce Blast Radius
1. Least-Privilege Tooling
Give the agent only the tools it needs for the current task. A coding agent fixing a CSS bug does not need database access. A summarization agent does not need file write permissions.
This sounds obvious, but most agent frameworks default to giving agents everything. MCP servers, shell access, file system access, and browser control all get bundled together because it is easier to configure. The result is that every agent has the blast radius of the most powerful tool in its toolset.
Practical step: Before deploying an agent, list every tool it has access to. Remove any tool that is not required for its specific task. Review this list weekly.
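The pruning step can be expressed as a per-task allowlist. This is a minimal sketch: the task names, tool names, and the `TASK_TOOLS` map are hypothetical, and a real deployment would wire the result into whatever mechanism its framework uses to register tools.

```python
# Hypothetical task profiles: the tools each task type actually needs.
TASK_TOOLS = {
    "fix_css_bug": {"file_read", "file_write"},
    "summarize_docs": {"file_read"},
}


def tools_for_task(task, available):
    """Return (granted, removed) tool sets for a task.

    Unknown tasks raise KeyError, so nothing is granted by default.
    """
    required = TASK_TOOLS[task]
    missing = required - available
    if missing:
        raise ValueError(f"task {task!r} needs unavailable tools: {sorted(missing)}")
    removed = available - required
    return required & available, removed


if __name__ == "__main__":
    everything = {"file_read", "file_write", "shell", "database"}
    granted, removed = tools_for_task("summarize_docs", everything)
    print("granted:", sorted(granted))
    print("removed:", sorted(removed))
```

Returning the removed set alongside the granted set gives the periodic review something concrete to look at: any tool that keeps showing up as removed probably should not be installed at all.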
2. Process Isolation
Run each agent in its own sandboxed process with dedicated file system views, network rules, and resource limits. Container or microVM isolation (Docker, gVisor, Firecracker) provides strong boundaries. If an agent is compromised, the attacker is confined to that sandbox, without access to the host or to other agents, unless they can also exploit an escape vulnerability or a misconfiguration.
At Fazm, we use process-level isolation so that a desktop agent controlling one application cannot access data from another application, even if both are running on the same machine.
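For the container-based option, the sketch below builds a `docker run` command line that restricts the file system view, network, and resources. It is an illustration under assumptions, not a description of any particular product's setup: the image name, paths, and limit values are placeholders, and the function only constructs the argument list rather than launching anything.

```python
import shlex


def sandbox_command(image, workdir, agent_cmd):
    """Build a `docker run` argv that narrows what the agent process can touch."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                      # no network access at all
        "--read-only",                            # root file system is read-only
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation
        "--memory", "512m",                       # placeholder memory limit
        "--pids-limit", "128",                    # placeholder process limit
        "--volume", f"{workdir}:/workspace:rw",   # only the project is writable
        "--workdir", "/workspace",
        image,
        *agent_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical image name and project path.
    argv = sandbox_command("agent-image:latest", "/srv/project", ["python", "agent.py"])
    print(shlex.join(argv))
```

Passing the list to `subprocess.run` would start the container. An agent that genuinely needs network access would replace `--network none` with a restricted network rather than dropping the restriction entirely.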
3. Human-in-the-Loop for High-Risk Actions
Not every action needs approval, but destructive or irreversible ones should require human confirmation. Categorize actions by reversibility:
- Auto-approve: Read operations, file edits with undo, safe API queries
- Require confirmation: File deletion, git push, sending emails, running untested code
- Block entirely: Production deployments, credential management, financial transactions
This pattern is already built into tools like Claude Code, which asks for permission before running shell commands or writing files outside the project directory.
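The three tiers can be sketched as a small policy gate that sits between the agent and its tools. The action names and the `POLICY` map are hypothetical, and `ask_human` stands in for whatever confirmation prompt a real system would show. One assumption worth noting: actions missing from the policy fall back to requiring confirmation rather than being auto-approved.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    CONFIRM = "require_confirmation"
    BLOCK = "block"


# Hypothetical action names, grouped by the three tiers in the list above.
POLICY = {
    "file_read": Decision.AUTO_APPROVE,
    "api_query": Decision.AUTO_APPROVE,
    "file_delete": Decision.CONFIRM,
    "git_push": Decision.CONFIRM,
    "send_email": Decision.CONFIRM,
    "prod_deploy": Decision.BLOCK,
    "financial_transaction": Decision.BLOCK,
}


def gate(action, ask_human):
    """Return True if the action may run. Unknown actions need confirmation."""
    decision = POLICY.get(action, Decision.CONFIRM)
    if decision is Decision.BLOCK:
        return False  # blocked actions never reach the human prompt
    if decision is Decision.CONFIRM:
        return bool(ask_human(action))
    return True


if __name__ == "__main__":
    deny_all = lambda action: False  # stand-in for a human who declines
    for name in ("file_read", "git_push", "prod_deploy"):
        print(name, gate(name, deny_all))
```

Blocked actions return before the human is consulted, so an operator who clicks through prompts out of habit cannot approve them.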
4. Network Segmentation
An agent that can make arbitrary HTTP requests can exfiltrate data to any server on the internet. Restrict outbound network access to only the endpoints the agent needs. DNS-based filtering, firewall rules, or proxy configurations all work.
For desktop agents, this is especially important because the agent often runs with the same network access as the user, which means access to internal services, VPNs, and authenticated sessions.
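At the proxy or tool layer, an outbound allowlist can be as simple as an exact hostname match. The hostnames below are placeholders, and the sketch assumes every agent request is routed through this check. A check inside the agent's own process is not a security boundary by itself, so firewall or proxy rules should enforce the same list from outside.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the hostnames are placeholders.
ALLOWED_HOSTS = {"api.example.com", "vectors.example.internal"}


def egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Allow only HTTPS requests to exact hostnames on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowed_hosts


if __name__ == "__main__":
    for url in (
        "https://api.example.com/v1/search",
        "https://api.example.com.evil.test/",   # lookalike suffix
        "https://api.example.com@evil.test/",   # userinfo trick
        "http://api.example.com/",              # not HTTPS
    ):
        print(egress_allowed(url), url)
```

Matching the parsed hostname exactly, rather than searching the URL string for an allowed name, is what defeats the lookalike and userinfo variants in the example.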
5. Immutable Audit Logs
When damage cannot be prevented entirely, detection and recovery matter. Log every tool call, every file change, and every network request. Store logs outside the agent's access scope so a compromised agent cannot erase its tracks.
Good audit logs turn a "what happened?" investigation from days into minutes. They also make rollback practical for reversible changes: if you can see exactly what the agent changed, you know what to restore.
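One common way to make a log tamper-evident is to chain entries by hash, so that editing an earlier entry invalidates every later one. The sketch below keeps the log in a Python list for brevity; the event fields are hypothetical, and a real system would write entries to storage the agent cannot reach.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log):
    """Return True only if every entry still chains to the one before it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"tool": "file_write", "path": "README.md"})
    append_entry(log, {"tool": "shell", "cmd": "ls"})
    print(verify(log))
    log[0]["event"]["path"] = "/etc/passwd"  # simulate tampering
    print(verify(log))
```

Chaining detects edits, reordering, and deletions from the middle of the log, but not truncation of the most recent entries. Catching that requires recording the latest hash somewhere the agent cannot write.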
Blast Radius in Practice: A Desktop Agent Example
Consider a desktop AI agent that can:
- Read and interact with any application UI
- Access the file system through a file manager
- Control a web browser with saved sessions
- Run terminal commands
This agent's blast radius is effectively the entire computer. A prompt injection in a webpage the agent visits could instruct it to open the terminal, exfiltrate SSH keys, and push malicious code to every repository the user has access to.
Applying the controls above changes the picture:
- Least privilege: Remove terminal access if the task only requires UI interaction
- Process isolation: Run the agent in a separate user account with no access to the primary user's home directory
- Human-in-the-loop: Require approval for any action involving the browser or file manager
- Network segmentation: Block all outbound connections except the agent's required APIs
- Audit logs: Record every UI action with screenshots for review
The result is an agent that can still do useful work but whose blast radius after compromise is contained to a narrow set of pre-approved actions.
Common Mistakes in Blast Radius Assessment
Mistake 1: Measuring intended behavior instead of capability. An agent prompted to "only edit Python files" still has the blast radius of its full toolset. Prompts are not security boundaries.
Mistake 2: Ignoring transitive access. An agent with browser control has access to every authenticated session in that browser. That includes email, cloud consoles, banking, and anything else the user is logged into.
Mistake 3: Treating behavioral constraints as technical ones. "The system prompt says not to delete files" is not a blast radius control. Prompt injection can override behavioral instructions.
Mistake 4: Assuming sandboxing eliminates risk. Sandboxes reduce blast radius but do not eliminate it. A sandboxed agent can still cause damage within its sandbox scope. The question is whether that scope is acceptable.
Key Takeaways
- Blast radius is the maximum damage an AI agent can cause, determined by its capabilities, not its instructions
- Measure blast radius by enumerating all tools and access, then identifying the worst possible action for each
- The five most effective controls are least-privilege tooling, process isolation, human-in-the-loop approval, network segmentation, and audit logging
- Most agent setups have a larger blast radius than their operators realize because tools get bundled together by default
- The goal is not zero blast radius (that would mean zero capability) but conscious, measured trade-offs between capability and risk
Frequently Asked Questions
What does blast radius mean for AI agents?
Blast radius for AI agents is the total scope of damage an agent can cause when something goes wrong. It includes every file it can modify, every service it can reach, every action it can perform. The term comes from infrastructure engineering where it describes failure propagation across systems.
How is AI agent blast radius different from traditional software blast radius?
Traditional software fails along its code paths, which makes its failure modes comparatively predictable. AI agents can fail in unpredictable ways because they interpret instructions, generate novel actions, and may respond to adversarial inputs like prompt injections. This makes blast radius harder to bound and more important to constrain.
Can you eliminate blast radius entirely?
No. An agent with zero blast radius has zero capability. The goal is to right-size the blast radius for the task. A coding agent fixing a typo should have project-scoped access, not system-wide access. Match the permission surface to the actual task requirements.
What is the relationship between blast radius and YOLO mode?
YOLO mode removes human-in-the-loop approval for agent actions, which dramatically increases effective blast radius. Without confirmation gates, the agent can execute any action in its toolset without pause. This is useful for trusted workflows but dangerous for untested ones.
How often should you reassess an agent's blast radius?
Reassess whenever you add new tools, change permissions, update the agent's model, or modify its deployment environment. At minimum, review quarterly. Any change to the agent's capability surface changes its blast radius, even if the agent's instructions stay the same.