
AI Agent Security Is Backwards - Why Input Validation Matters More Than Output Verification

Fazm Team · 2 min read

ai-safety · agent-security · input-validation · desktop-agent · prompt-injection

AI Agent Security Is Backwards

We spent weeks building output verification for Fazm - did the click land on the right element, did the text go into the correct field, did the file save to the expected location. Post-action verification screenshots, accessibility tree diffs, the whole stack.

Input side? Basically nothing. Any prompt could tell the agent to click anywhere.

The Verification Inversion

This is a pattern across almost every AI agent project right now. Teams obsess over output correctness while leaving inputs wide open. It makes intuitive sense - you want to make sure the agent did the right thing. But it gets the threat model exactly backwards.

If an attacker (or a badly constructed prompt) can feed arbitrary instructions to your agent, it does not matter how carefully you verify the output. The agent will faithfully execute the malicious instruction and your output verification will confirm it did so correctly.

What Input Validation Actually Looks Like

For desktop agents, input validation means:

  • Prompt signing. Only accept instructions from authenticated sources. A random clipboard paste should not be able to redirect the agent.
  • Intent classification. Before executing, classify the instruction against a set of allowed action categories. "Send this email" is allowed. "Delete all files in Documents" gets flagged.
  • Scope boundaries. The agent should have a declared scope per session. If the user asked it to organize files, it should not suddenly start browsing the web.

The Uncomfortable Truth

Output verification is easy to demo. Input validation is boring infrastructure work. But the attack surface of an AI agent that can control your entire desktop is the input pipeline, not the output pipeline. An agent that perfectly executes the wrong instruction is worse than one that fumbles the right instruction.

Building Fazm taught us to flip the priority. Validate inputs first, verify outputs second.

More on This Topic

Fazm is an open-source macOS AI agent. The code is on GitHub.
