Machine-Enforceable Policy

Fazm Team · 2 min read

Tell an AI agent "do not access the user's email" and it will comply. Probably. But there is no mechanism that prevents it from trying. Policy enforcement for agents is almost entirely honor-based.

The Honor System Problem

Current agent safety works like this: you write rules in a system prompt, and the model follows them because it was trained to. But system prompts can be overridden by clever inputs. Guardrails can be bypassed. And there is no OS-level enforcement that says "this process literally cannot open Mail.app."

The gap between stated policy and enforced policy is where incidents happen.

OS Sandboxing Falls Short

macOS has sandboxing. Apps can declare entitlements that restrict their access. But AI agents typically need broad permissions to be useful - accessibility access, file system access, network access. A sandboxed agent that cannot read files or interact with applications is an agent that cannot do its job.
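To make the coarseness concrete, here is roughly what a sandboxed app's entitlements file looks like. The keys are genuine App Sandbox entitlement identifiers; the combination shown is illustrative. Note that each grant is a broad category switch, not a scoped rule:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Opt in to the App Sandbox -->
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <!-- File access: only files the user picks in an open/save panel -->
    <key>com.apple.security.files.user-selected.read-write</key>
    <true/>
    <!-- Outbound network access: all-or-nothing -->
    <key>com.apple.security.network.client</key>
    <true/>
</dict>
</plist>
```

There is no entitlement that says "this directory but not that one," which is exactly the granularity an agent needs.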

The granularity is wrong. We need policies like "can read files in ~/Projects but not ~/Documents" or "can interact with Terminal but not Messages." Current sandboxing is too coarse for agent-specific policies.
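The path-scoped rule above can be sketched in a few lines. This is a hypothetical illustration, not a real macOS API: the `may_read` helper and the allow/deny roots are assumptions, and a real enforcement layer would have to run outside the agent's process. Deny takes precedence over allow:

```python
from pathlib import Path

# Illustrative policy: readable under ~/Projects, never under ~/Documents.
# These roots and the helper below are hypothetical, not a real OS API.
ALLOWED_ROOTS = [(Path.home() / "Projects").resolve()]
DENIED_ROOTS = [(Path.home() / "Documents").resolve()]

def may_read(path: str) -> bool:
    """Allow a read only if the path is under an allowed root
    and under no denied root. Deny wins over allow."""
    p = Path(path).expanduser().resolve()
    if any(p == root or p.is_relative_to(root) for root in DENIED_ROOTS):
        return False
    return any(p == root or p.is_relative_to(root) for root in ALLOWED_ROOTS)
```

Resolving paths before checking matters: without it, `~/Projects/../Documents/taxes.pdf` would slip past a naive prefix match.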

What Machine-Enforceable Means

Real enforcement requires:

  • Runtime capability checking - the OS verifies every action against a policy before executing it
  • Tamper-proof policy storage - the agent cannot modify its own restrictions
  • Audit logging - every action is recorded regardless of whether it was allowed
  • Revocation - permissions can be removed mid-session without restarting
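The four requirements above can be sketched as a single gate that every agent action must pass through. This is a toy in-process model, with the class name, capability strings, and grant table all invented for illustration; real enforcement would keep the grant table and the log outside the agent's reach, in the OS or a separate privileged process:

```python
import datetime

class PolicyGate:
    """Toy sketch of the enforcement layer: every action is checked
    against a grant table, logged whether or not it was allowed, and
    grants can be revoked mid-session."""

    def __init__(self, granted: set[str]):
        # In real enforcement this table would be tamper-proof,
        # stored where the agent cannot write to it.
        self._granted = set(granted)
        self.audit_log: list[tuple[str, str, bool]] = []

    def check(self, capability: str) -> bool:
        """Runtime capability check: verify, then log either way."""
        allowed = capability in self._granted
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.audit_log.append((stamp, capability, allowed))
        return allowed

    def revoke(self, capability: str) -> None:
        """Remove a grant mid-session; takes effect on the next check."""
        self._granted.discard(capability)

gate = PolicyGate({"fs.read:~/Projects", "net.fetch"})
gate.check("net.fetch")   # allowed, and logged
gate.check("mail.read")   # denied, still logged
gate.revoke("net.fetch")  # mid-session revocation
gate.check("net.fetch")   # now denied
```

The point of the audit log is that denials are recorded too: a pattern of denied `mail.read` attempts is itself a signal worth alerting on.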

Until we build this layer, agent safety is a gentleman's agreement. It works until it does not.

Fazm is an open-source macOS AI agent. The code is available on GitHub.
