The Real Test Is What an Agent Refuses to Do - Safe Defaults in AI

Fazm Team · 3 min read

Building the automation took two weeks. Building the refusal logic took four. That ratio tells you something important about AI agent development - the hard part is not making the agent do things. It is making it not do things.

An agent that can automate anything is dangerous. An agent that knows when to stop, when to ask, and when to refuse is useful. The refusal logic is what separates a tool from a hazard.

Why Refusal Is Harder Than Action

Action is straightforward. See button, click button. Read field, fill field. The model is good at this - it is pattern matching on UI elements.

Refusal requires judgment. Should the agent delete this file? It depends on what the file is, whether there is a backup, whether the user has done this before. Should the agent send this email? It depends on who the recipient is, what the content says, whether it looks like a draft or a final version.

Every refusal decision is a context-dependent judgment call. There is no universal rule.
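As a minimal sketch of what such a judgment call looks like in code (all names here are hypothetical, not Fazm's actual API), a delete check might weigh exactly the context above: backup status, file count, and the user's prior behavior.

```python
from dataclasses import dataclass

@dataclass
class FileContext:
    path: str
    file_count: int            # 1 for a single file, N for a directory
    has_backup: bool
    prior_user_approvals: int  # similar deletes the user approved before

def should_refuse_delete(ctx: FileContext) -> tuple[bool, str]:
    """Return (refuse?, reason). No universal rule: each check reads context."""
    if not ctx.has_backup and ctx.file_count > 100:
        return True, (f"refusing: {ctx.path} contains {ctx.file_count} files "
                      "and no backup exists")
    if ctx.prior_user_approvals == 0:
        return True, f"refusing: no prior approval for deletes like {ctx.path}"
    return False, "ok"
```

Note that the same action can be refused for two different reasons depending on context, which is why the reason string matters as much as the boolean.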

Designing Safe Defaults

The safest default is always to refuse. Then you selectively enable actions based on risk level. Low-risk actions - reading files, taking screenshots, navigating between apps - run automatically. Medium-risk actions - editing documents, filling forms - run with logging. High-risk actions - deleting data, sending communications, making purchases - require explicit confirmation.

This tiered model means the agent is useful for 80% of tasks without any friction, while the 20% of dangerous tasks get human oversight.
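One way to encode this tiering (a Python sketch; the action names and tier assignments are illustrative, not Fazm's real registry) is a default-deny dispatcher: unregistered actions are refused, low-risk actions run, medium-risk actions run with an audit entry, and high-risk actions run only past a confirmation callback.

```python
from enum import Enum
from typing import Callable

class Risk(Enum):
    LOW = 1     # read files, take screenshots, navigate between apps
    MEDIUM = 2  # edit documents, fill forms
    HIGH = 3    # delete data, send communications, make purchases

ACTION_RISK = {
    "read_file": Risk.LOW,
    "take_screenshot": Risk.LOW,
    "fill_form": Risk.MEDIUM,
    "delete_data": Risk.HIGH,
    "send_email": Risk.HIGH,
}

AUDIT_LOG: list[str] = []

def dispatch(action: str, confirm: Callable[[str], bool]) -> str:
    risk = ACTION_RISK.get(action)
    if risk is None:          # default-deny: unregistered actions never run
        return "refused"
    if risk is Risk.MEDIUM:   # medium risk: run, but leave a trace
        AUDIT_LOG.append(action)
    if risk is Risk.HIGH and not confirm(action):
        return "refused"      # high risk: explicit human confirmation required
    return "executed"
```

The important design choice is the `None` branch: safety comes from what is absent from the registry, so forgetting to classify a new action fails closed rather than open.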

The Rejection Log

In Fazm, we maintain a rejection log - every action the agent decided not to take. This log is more valuable than the action log. It shows you the boundaries of the system. It reveals edge cases you did not anticipate. It proves the agent is exercising judgment, not just executing blindly.

When users review the rejection log and disagree with a refusal, that is a signal to adjust the boundary. When they agree, that is a signal the safeguards are working.
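A rejection log needs only a few fields to support that feedback loop. In this sketch (the schema is assumed, not Fazm's actual format), each entry records the refused action, the reason shown to the user, and a feedback slot whose later value drives boundary adjustments.

```python
import time

def log_rejection(log: list, action: str, reason: str) -> dict:
    """Record an action the agent declined; return the entry for later review."""
    entry = {
        "ts": time.time(),
        "action": action,
        "reason": reason,
        "feedback": None,  # set to "agree" or "disagree" on user review
    }
    log.append(entry)
    return entry

def boundaries_to_revisit(log: list) -> set:
    """Actions users disagreed with refusing: candidates for loosening."""
    return {e["action"] for e in log if e["feedback"] == "disagree"}
```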

Building Trust Through Restraint

Users trust agents that say no more than agents that say yes to everything. A "no" with a clear reason - "I will not delete this folder because it contains 2,000 files and no backup exists" - builds more confidence than a hundred successful automations.

Fazm is an open-source macOS AI agent. The code is available on GitHub.
