Agent Safety
8 articles about agent safety.
Children and the Restraint Problem
Restraint is the hardest thing to teach an AI agent. When an agent can do everything, knowing when not to act is the most valuable skill.
93% No Scope. 0% Revocation.
Most agent integrations request broad permissions with no mechanism for revocation. No scope and no revocation is a terrifying combination.
Agent Security Audit: Full Filesystem Access Without Audit Trails
Most AI agents have unrestricted filesystem access with no audit logging - why snapshotting with git stash before risky operations and keeping proper audit trails are essential.
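The stash-before-risky-operations idea can be sketched as a small wrapper: stash any uncommitted work as a restore point, run the command, and roll back on failure. The function name and exact recovery flow here are illustrative assumptions, not taken from the article.

```shell
#!/bin/sh
# Sketch: protect uncommitted work by stashing it before an agent runs a
# risky command, and restoring the pre-operation state if the command fails.
set -u

with_stash_guard() {
  stashed=0
  # Only create a restore point if there is actually something to stash.
  if ! git diff --quiet || ! git diff --cached --quiet || \
     [ -n "$(git ls-files --others --exclude-standard)" ]; then
    git stash push --include-untracked --quiet -m "pre-op snapshot"
    stashed=1
  fi

  if "$@"; then
    status=0
  else
    status=$?
    # Risky command failed: discard whatever it left behind.
    git reset --hard --quiet
    git clean -fdq
  fi

  # Bring the stashed work back either way (on success this can conflict
  # with the command's changes; a real tool would handle that case).
  [ "$stashed" -eq 1 ] && git stash pop --quiet
  return $status
}
```

Usage: `with_stash_guard some-agent-command args...` returns the command's exit status, with the working tree restored on failure.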
Why Your Audit Store Cannot Be Inside the Process
Using git as an external append-only audit store for AI agents - why the thing being audited should never control the audit trail.
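A minimal sketch of a git-backed append-only audit store: a helper that owns a separate repository and records one commit per event, so git's hash chain makes silent tampering detectable. The repo path, helper names, and event format are assumptions for illustration; the key property from the article is only that this store sits outside the agent process.

```shell
#!/bin/sh
# Sketch: append-only audit log backed by a git repo the agent does not
# control directly. One commit per event yields a tamper-evident chain.
set -eu

AUDIT_REPO="${AUDIT_REPO:-$HOME/.agent-audit}"

audit_init() {
  mkdir -p "$AUDIT_REPO"
  git -C "$AUDIT_REPO" init --quiet
}

audit_append() {
  # $1 = event description. Append to the log, then commit, so every
  # event is anchored in git history rather than a mutable flat file.
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$AUDIT_REPO/events.log"
  git -C "$AUDIT_REPO" add events.log
  git -C "$AUDIT_REPO" commit --quiet -m "audit: $1"
}
```

In a real deployment the agent would not call these functions in-process; it would hand events to a separate service or user account that runs them, which is the point of the article's title.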
The Interlocutor Problem
An agent cannot reliably verify its own work. External verification is required because self-assessment shares the same biases as the original output.
How Do You Prevent JSON-Seppuku?
Agents that modify their own config files can corrupt themselves. Store config in git with auto-commits for instant rollback.
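The config-in-git-with-auto-commits pattern can be sketched as a write helper that commits every change, plus a one-command rollback. The validation gate (rejecting syntactically invalid JSON before it replaces the live file) is an added assumption, as are the helper names and the `python3 -m json.tool` check.

```shell
#!/bin/sh
# Sketch: route every config write through a helper that validates and
# auto-commits, so a self-corrupting edit is one rollback away.
set -eu

CONFIG_REPO="${CONFIG_REPO:-$HOME/.agent-config}"

config_write() {
  # $1 = config filename, stdin = new contents.
  tmpfile="$CONFIG_REPO/$1.tmp"
  cat > "$tmpfile"
  # Assumed validation gate for a JSON config: refuse to install a file
  # that does not parse - the "JSON-seppuku" failure mode.
  python3 -m json.tool "$tmpfile" > /dev/null || { rm -f "$tmpfile"; return 1; }
  mv "$tmpfile" "$CONFIG_REPO/$1"
  git -C "$CONFIG_REPO" add "$1"
  git -C "$CONFIG_REPO" commit --quiet -m "config: update $1"
}

config_rollback() {
  # Restore the previous committed version of a config file.
  git -C "$CONFIG_REPO" checkout --quiet HEAD~1 -- "$1"
  git -C "$CONFIG_REPO" commit --quiet -m "config: rollback $1"
}
```

Because every accepted write is a commit, rollback is instant and the full edit history of the agent's own configuration is preserved.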
The Observer Hierarchy: Building Layered AI Agent Safety Beyond First-Order Guardians
One guardian watching one agent is not enough. Build the observer hierarchy backwards - start from the worst-case failure mode, work up to simpler and more conservative checks. Here's the five-layer production pattern.
Position Sizing for Agents Without Human Override
Agents operating without human oversight need catastrophic loss prevention - the same way trading systems need position limits.