Agent Safety
8 articles about agent safety.
Children and the Restraint Problem
Restraint is the hardest thing to teach an AI agent. When an agent can do everything, knowing when not to act is the most valuable skill.
93% No Scope. 0% Revocation.
Most agent integrations request broad permissions with no mechanism for revocation. No scope and no revocation is a terrifying combination.
Agent Security Audit: Full Filesystem Access Without Audit Trails
Most AI agents have unrestricted filesystem access with no audit logging - why snapshotting with git stash before risky operations and keeping proper audit trails are essential.
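The stash-before-risky-operations idea can be sketched as a small wrapper: stash any uncommitted work as a restore point, run the command, and roll back on failure. The function name and exact recovery flow here are illustrative assumptions, not taken from the article.

```shell
#!/bin/sh
# Sketch: protect uncommitted work by stashing it before an agent runs a
# risky command, and restoring the pre-operation state if the command fails.
set -u

with_stash_guard() {
  stashed=0
  # Only create a restore point if there is actually something to stash.
  if ! git diff --quiet || ! git diff --cached --quiet || \
     [ -n "$(git ls-files --others --exclude-standard)" ]; then
    git stash push --include-untracked --quiet -m "pre-op snapshot"
    stashed=1
  fi

  if "$@"; then
    status=0
  else
    status=$?
    # Risky command failed: discard whatever it left behind.
    git reset --hard --quiet
    git clean -fdq
  fi

  # Bring the stashed work back either way (on success this can conflict
  # with the command's changes; a real tool would handle that case).
  [ "$stashed" -eq 1 ] && git stash pop --quiet
  return $status
}
```

Usage: `with_stash_guard some-agent-command args...` returns the command's exit status, with the working tree restored on failure.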
Why Your Audit Store Cannot Be Inside the Process
Using git as an external append-only audit store for AI agents - why the thing being audited should never control the audit trail.
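A minimal sketch of a git-backed append-only audit store: a helper that owns a separate repository and records one commit per event, so git's hash chain makes silent tampering detectable. The repo path, helper names, and event format are assumptions for illustration; the key property from the article is only that this store sits outside the agent process.

```shell
#!/bin/sh
# Sketch: append-only audit log backed by a git repo the agent does not
# control directly. One commit per event yields a tamper-evident chain.
set -eu

AUDIT_REPO="${AUDIT_REPO:-$HOME/.agent-audit}"

audit_init() {
  mkdir -p "$AUDIT_REPO"
  git -C "$AUDIT_REPO" init --quiet
}

audit_append() {
  # $1 = event description. Append to the log, then commit, so every
  # event is anchored in git history rather than a mutable flat file.
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$AUDIT_REPO/events.log"
  git -C "$AUDIT_REPO" add events.log
  git -C "$AUDIT_REPO" commit --quiet -m "audit: $1"
}
```

In a real deployment the agent would not call these functions in-process; it would hand events to a separate service or user account that runs them, which is the point of the article's title.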
The Interlocutor Problem
An agent cannot reliably verify its own work. External verification is required because self-assessment shares the same biases as the original output.
How Do You Prevent JSON-Seppuku?
Agents that modify their own config files can corrupt themselves. Store config in git with auto-commits for instant rollback.
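The config-in-git-with-auto-commits pattern can be sketched as a write helper that commits every change, plus a one-command rollback. The validation gate (rejecting syntactically invalid JSON before it replaces the live file) is an added assumption, as are the helper names and the `python3 -m json.tool` check.

```shell
#!/bin/sh
# Sketch: route every config write through a helper that validates and
# auto-commits, so a self-corrupting edit is one rollback away.
set -eu

CONFIG_REPO="${CONFIG_REPO:-$HOME/.agent-config}"

config_write() {
  # $1 = config filename, stdin = new contents.
  tmpfile="$CONFIG_REPO/$1.tmp"
  cat > "$tmpfile"
  # Assumed validation gate for a JSON config: refuse to install a file
  # that does not parse - the "JSON-seppuku" failure mode.
  python3 -m json.tool "$tmpfile" > /dev/null || { rm -f "$tmpfile"; return 1; }
  mv "$tmpfile" "$CONFIG_REPO/$1"
  git -C "$CONFIG_REPO" add "$1"
  git -C "$CONFIG_REPO" commit --quiet -m "config: update $1"
}

config_rollback() {
  # Restore the previous committed version of a config file.
  git -C "$CONFIG_REPO" checkout --quiet HEAD~1 -- "$1"
  git -C "$CONFIG_REPO" commit --quiet -m "config: rollback $1"
}
```

Because every accepted write is a commit, rollback is instant and the full edit history of the agent's own configuration is preserved.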
The Observer Hierarchy: Building Layered AI Agent Safety Beyond First-Order Guardians
One guardian watching one agent is not enough. Build the observer hierarchy backwards - start from the worst-case failure mode, work up to simpler and more conservative checks. Here's the five-layer production pattern.
Position Sizing for Agents Without Human Override
Agents operating without human oversight need catastrophic loss prevention - the same way trading systems need position limits.