Safety Problems at the Execution Layer - Not in the Prompt

Fazm Team · 2 min read

Everyone focuses on prompt safety. System prompts full of "never do X" instructions. Guardrails that check the LLM's planned response before it goes out. Content filters on inputs and outputs.

That is important, but it is not where most real-world AI agent safety failures happen.

The dangerous part is the execution layer - the code that actually clicks buttons, runs shell commands, edits files, and sends messages. This is where an agent's intentions become actions, and where small bugs create big problems.

Why Execution Is Harder to Secure

A prompt-level safety check might catch "delete all files in the home directory" as a planned action. But execution-layer bugs are subtler:

  • Path traversal - The agent constructs a file path from user input without sanitization. It meant to write to ~/Documents/report.txt but a crafted input sends it to /etc/passwd.
  • Scope creep in loops - An agent iterating over files to rename them hits an error, retries with broader permissions, and ends up modifying files outside its intended scope.
  • Tool misuse - The agent calls a shell command with user-provided arguments that get interpreted as flags. rm report -rf is very different from rm "report -rf".
  • Race conditions - Two parallel agent actions interact in ways neither planned for, like both trying to write to the same config file.
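The first and third failure modes above can be blocked in code rather than in the prompt. Here is a minimal sketch in Python, assuming a hypothetical ALLOWED_ROOT sandbox directory (the names are illustrative, not part of any real API): resolve every user-supplied path and reject anything that escapes the root, and pass filenames to commands as argument lists with an end-of-options marker so they can never be parsed as flags.

```python
from pathlib import Path
import subprocess

# Hypothetical sandbox root for this example.
ALLOWED_ROOT = Path.home() / "Documents"

def safe_resolve(user_path: str) -> Path:
    """Resolve a user-supplied path and refuse anything outside ALLOWED_ROOT.

    resolve() collapses any ".." components, so a crafted input like
    "../../etc/passwd" becomes an absolute path we can check.
    """
    candidate = (ALLOWED_ROOT / user_path).resolve()
    if not candidate.is_relative_to(ALLOWED_ROOT.resolve()):
        raise ValueError(f"path escapes sandbox: {candidate}")
    return candidate

def safe_remove(filename: str) -> None:
    """Pass the filename as data, never as part of a shell string.

    Using an argument list avoids shell word-splitting, and the "--"
    end-of-options marker stops rm from reading the name as flags,
    so a filename like "report -rf" stays a filename.
    """
    subprocess.run(["rm", "--", filename], check=True)
```

The key design choice is that the check lives in ordinary code: even if the model is tricked into requesting a bad path, the guard rejects it deterministically.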

Building Execution-Layer Guardrails

Prompt safety is necessary but not sufficient. You also need:

  • Allowlists for commands and paths - Do not rely on the LLM to avoid dangerous operations. Enforce it in code.
  • Sandboxed execution - Run agent actions in containers or restricted user contexts.
  • Action logging with rollback - Record every side effect so you can undo mistakes.
  • Rate limiting on destructive operations - Slow down deletes, sends, and writes to give humans a chance to intervene.
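The first and last guardrails above can be combined into a single authorization gate. A minimal sketch, assuming a hypothetical command allowlist and a fixed cooldown between destructive operations (all names here are illustrative):

```python
import time

# Hypothetical policy: commands the agent may run at all,
# and the subset that is destructive and therefore rate-limited.
ALLOWED_COMMANDS = {"ls", "cat", "grep", "rm", "mv"}
DESTRUCTIVE = {"rm", "mv"}

MIN_INTERVAL = 5.0   # seconds between destructive operations
_last_destructive = 0.0

def authorize(command: str) -> None:
    """Raise unless the command is allowlisted and, if destructive,
    enough time has passed since the last destructive action."""
    global _last_destructive
    if command not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowlisted: {command}")
    if command in DESTRUCTIVE:
        now = time.monotonic()
        if now - _last_destructive < MIN_INTERVAL:
            raise RuntimeError("destructive operation rate-limited")
        _last_destructive = now
```

In a real agent this gate would run before every tool call; the cooldown buys a human operator time to notice and intervene before the next delete or send goes through.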

The prompt tells the agent what to do. The execution layer determines what it can do. Secure both, but if you have to pick one to invest more in, pick execution.

Fazm is an open source macOS AI agent, available on GitHub.
