Machine-Enforceable Policy

Fazm Team · 2 min read

Tell an AI agent "do not access the user's email" and it will comply. Probably. But there is no mechanism that prevents it from trying. Policy enforcement for agents is almost entirely honor-based.

The Honor System Problem

Current agent safety works like this: you write rules in a system prompt, and the model follows them because it was trained to. But system prompts can be overridden by clever inputs. Guardrails can be bypassed. And there is no OS-level enforcement that says "this process literally cannot open Mail.app."

The gap between stated policy and enforced policy is where incidents happen.

OS Sandboxing Falls Short

macOS has sandboxing. Apps can declare entitlements that restrict their access. But AI agents typically need broad permissions to be useful - accessibility access, file system access, network access. A sandboxed agent that cannot read files or interact with applications is an agent that cannot do its job.
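To make the coarseness concrete, here is roughly what a sandboxed app's entitlements file looks like. The keys are genuine App Sandbox entitlement identifiers; the combination shown is illustrative. Note that each grant is a broad category switch, not a scoped rule:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Opt in to the App Sandbox -->
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <!-- File access: only files the user picks in an open/save panel -->
    <key>com.apple.security.files.user-selected.read-write</key>
    <true/>
    <!-- Outbound network access: all-or-nothing -->
    <key>com.apple.security.network.client</key>
    <true/>
</dict>
</plist>
```

There is no entitlement that says "this directory but not that one," which is exactly the granularity an agent needs.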

The granularity is wrong. We need policies like "can read files in ~/Projects but not ~/Documents" or "can interact with Terminal but not Messages." Current sandboxing is too coarse for agent-specific policies.
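The path-scoped rule above can be sketched in a few lines. This is a hypothetical illustration, not a real macOS API: the `may_read` helper and the allow/deny roots are assumptions, and a real enforcement layer would have to run outside the agent's process. Deny takes precedence over allow:

```python
from pathlib import Path

# Illustrative policy: readable under ~/Projects, never under ~/Documents.
# These roots and the helper below are hypothetical, not a real OS API.
ALLOWED_ROOTS = [(Path.home() / "Projects").resolve()]
DENIED_ROOTS = [(Path.home() / "Documents").resolve()]

def may_read(path: str) -> bool:
    """Allow a read only if the path is under an allowed root
    and under no denied root. Deny wins over allow."""
    p = Path(path).expanduser().resolve()
    if any(p == root or p.is_relative_to(root) for root in DENIED_ROOTS):
        return False
    return any(p == root or p.is_relative_to(root) for root in ALLOWED_ROOTS)
```

Resolving paths before checking matters: without it, `~/Projects/../Documents/taxes.pdf` would slip past a naive prefix match.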

What Machine-Enforceable Means

Real enforcement requires:

  • Runtime capability checking - the OS verifies every action against a policy before executing it
  • Tamper-proof policy storage - the agent cannot modify its own restrictions
  • Audit logging - every action is recorded regardless of whether it was allowed
  • Revocation - permissions can be removed mid-session without restarting
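The four requirements above can be sketched as a single gate that every agent action must pass through. This is a toy in-process model, with the class name, capability strings, and grant table all invented for illustration; real enforcement would keep the grant table and the log outside the agent's reach, in the OS or a separate privileged process:

```python
import datetime

class PolicyGate:
    """Toy sketch of the enforcement layer: every action is checked
    against a grant table, logged whether or not it was allowed, and
    grants can be revoked mid-session."""

    def __init__(self, granted: set[str]):
        # In real enforcement this table would be tamper-proof,
        # stored where the agent cannot write to it.
        self._granted = set(granted)
        self.audit_log: list[tuple[str, str, bool]] = []

    def check(self, capability: str) -> bool:
        """Runtime capability check: verify, then log either way."""
        allowed = capability in self._granted
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.audit_log.append((stamp, capability, allowed))
        return allowed

    def revoke(self, capability: str) -> None:
        """Remove a grant mid-session; takes effect on the next check."""
        self._granted.discard(capability)

gate = PolicyGate({"fs.read:~/Projects", "net.fetch"})
gate.check("net.fetch")   # allowed, and logged
gate.check("mail.read")   # denied, still logged
gate.revoke("net.fetch")  # mid-session revocation
gate.check("net.fetch")   # now denied
```

The point of the audit log is that denials are recorded too: a pattern of denied `mail.read` attempts is itself a signal worth alerting on.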

Until we build this layer, agent safety is a gentleman's agreement. It works until it does not.

Fazm is an open-source macOS AI agent. The code is available on GitHub.
