The Procedure Is the Proof - Visual Verification in AI Desktop Automation
When an AI agent says it completed a task, how do you know? You could trust the self-report. You could check the result manually. Or you could build the proof into the procedure itself.
Screenshots before and after each action are the simplest form of proof-of-action. The agent clicks a button - screenshot. The page changes - screenshot. A form gets submitted - screenshot of the confirmation. The sequence of screenshots is not just verification. It is the audit trail.
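The before/after pattern can be sketched as a small wrapper. This is a minimal illustration, not Fazm's actual implementation: the `capture` callable is a hypothetical hook (on macOS it might wrap the `screencapture` tool), injected so the pattern is testable without a real display.

```python
import time
from pathlib import Path

def verified_action(name, action, capture, out_dir="audit"):
    """Run an action, saving a screenshot before and after it.

    `capture` is any callable returning screenshot bytes; `action` is a
    zero-argument callable performing the click, keystroke, etc.
    Returns the action's result plus the two screenshot paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    ts = time.strftime("%Y%m%d-%H%M%S")

    before = out / f"{ts}-{name}-before.png"
    before.write_bytes(capture())   # proof of the state we acted on

    result = action()

    after = out / f"{ts}-{name}-after.png"
    after.write_bytes(capture())    # proof of what the action produced

    return result, before, after
```

Because the capture function is injected, the same wrapper works for any screenshot backend, and the screenshot pair is created even when the action itself raises no error but silently does the wrong thing.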
Why Visual Proof Matters
Desktop automation is opaque by default. The agent operates in a subprocess, clicking and typing where no human is watching. When something goes wrong, there is no record from which to reconstruct what happened.
Visual verification changes this. Every action leaves a trace. If the agent says it filed an expense report, you can see the screenshots - the form filled out, the submit button clicked, the confirmation page displayed. If any step is missing or wrong, you know exactly where it broke.
Screenshots vs. Accessibility Tree Diffs
Screenshots give you visual proof that humans can review. Accessibility tree diffs give you machine-readable proof that agents can verify automatically. The best systems use both.
The accessibility tree tells you the state changed correctly - the button disappeared, the success message appeared, the form fields cleared. The screenshot tells you it looked right - no overlapping dialogs, no error banners, no unexpected popups.
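A tree diff of this kind can be sketched by flattening each snapshot into a set of (path, role, label) entries and comparing the sets. The dict-based tree shape here is an assumption for illustration; a real snapshot would come from the platform accessibility API.

```python
def flatten(node, path=()):
    """Yield a hashable key for every node in an accessibility tree.

    Each key encodes the node's position (path), role, and label, so
    set comparison detects nodes that appeared or disappeared.
    """
    key = path + (node["role"], node.get("label", ""))
    yield key
    for i, child in enumerate(node.get("children", [])):
        yield from flatten(child, key + (i,))

def tree_diff(before, after):
    """Return machine-checkable state changes between two snapshots."""
    b, a = set(flatten(before)), set(flatten(after))
    return {"appeared": a - b, "disappeared": b - a}
```

An agent can then assert the expected change directly, e.g. that the Submit button disappeared and a success message appeared, without a human looking at anything.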
Building an Audit Trail
In Fazm, every automation session generates a sequence of timestamped screenshots and accessibility tree snapshots. This serves three purposes - real-time verification during execution, post-hoc debugging when something fails, and compliance documentation for sensitive workflows.
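A session-level audit trail can be sketched as an append-only recorder that stores each screenshot next to a JSON manifest. The class name, paths, and field names are illustrative assumptions, not Fazm's actual on-disk format.

```python
import json
import time
from pathlib import Path

class AuditSession:
    """Append-only audit trail: one record per action, plus a manifest.

    Illustrative sketch only; record fields are assumptions.
    """
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.records = []

    def record(self, action, screenshot_bytes, tree_snapshot):
        entry = {
            "timestamp": time.time(),
            "action": action,
            "screenshot": f"{len(self.records):04d}.png",
            "tree": tree_snapshot,
        }
        # Screenshot and accessibility snapshot land together, so the
        # trail supports live checks, debugging, and compliance review.
        (self.root / entry["screenshot"]).write_bytes(screenshot_bytes)
        self.records.append(entry)
        (self.root / "manifest.json").write_text(json.dumps(self.records, indent=2))
```

Rewriting the manifest after every action keeps the trail readable even if the session is killed mid-run.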
The storage cost is minimal. A compressed screenshot is 50-100KB. A full day of automation generates maybe 500 screenshots - under 50MB. That is a small price for complete auditability.
The Proof Is the Product
Users do not just want automation. They want automation they can trust. Showing the proof - here is exactly what happened, step by step - is what converts skeptics into daily users.
Fazm is an open-source macOS AI agent, available on GitHub.
- Multi-Agent Verification with Screenshots - Scaling visual proof across agents
- AI Agent Self-Report Trap - Why self-reports are unreliable
- Post-Action Verification in AI Agents - Beyond HTTP status codes