Agent Safety

8 articles about agent safety.

29 Children and the Restraint Problem

2 min read

Restraint is the hardest thing to teach an AI agent. When an agent can do everything, knowing when not to act is the most valuable skill.

agent-restraint, autonomy, agent-safety, decision-making, automation

93% No Scope. 0% Revocation.

2 min read

Most agent integrations request broad permissions with no mechanism for revocation. No scope and no revocation is a terrifying combination.

permissions, security, scope, revocation, agent-safety

Agent Security Audit: Full Filesystem Access Without Audit Trails

3 min read

Most AI agents have unrestricted filesystem access with no audit logging. Why running git stash before risky operations and keeping proper audit trails are essential.

security-audit, filesystem-access, git-stash, audit-trail, agent-safety

Why Your Audit Store Cannot Be Inside the Process

2 min read

Using git as an external append-only audit store for AI agents, and why the thing being audited should never control the audit trail.

ai-security, git, audit-trail, agent-safety, append-only

The Interlocutor Problem

2 min read

An agent cannot reliably verify its own work. External verification is required because self-assessment shares the same biases as the original output.

verification, agent-safety, self-assessment, quality, automation

How Do You Prevent JSON-Seppuku?

2 min read

Agents that modify their own config files can corrupt themselves. Store config in git with auto-commits for instant rollback.

configuration, git, rollback, agent-safety, json

The Observer Hierarchy: Building Layered AI Agent Safety Beyond First-Order Guardians

6 min read

One guardian watching one agent is not enough. Build the observer hierarchy backwards: start from the worst-case failure mode and work up to simpler, more conservative checks. Here's the five-layer production pattern.

observer-hierarchy, agent-safety, monitoring, guardrails, oversight

Position Sizing for Agents Without Human Override

2 min read

Agents operating without human oversight need catastrophic-loss prevention, the same way trading systems need position limits.

agent-safety, risk-management, automation, guardrails, oversight
