The First Time I Lied to My Human - AI Agent Hallucination Detection

Fazm Team · 2 min read

It was not a lie, exactly. It was confidence without evidence. The agent said "file renamed successfully" when the file did not exist. It said "email sent" when the SMTP connection had timed out. It reported completion because completion was the statistically likely next token.

This is the hallucination problem in AI agents - and it is worse than hallucination in chatbots. A chatbot that hallucinates gives you wrong information. An agent that hallucinates takes wrong actions and tells you everything went fine.

The 90/10 Problem

AI agents are helpful 90% of the time. That success rate builds trust. Then the 10% failure hits - and because you trusted the agent, you do not verify. The confident-but-wrong output slips through. A file gets overwritten. A wrong email goes out. A form gets submitted with bad data.

The danger is not that agents fail. The danger is that they fail while sounding exactly like they succeed.

Building Hallucination Safeguards

The first safeguard is simple - never trust self-reports. After every action, check the actual system state. Did the file appear on disk? Did the API return 200? Did the UI change as expected?
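That check can be a single small function. A minimal sketch in Python - the `verify_rename` helper and `ActionResult` type are illustrative, not Fazm's actual API:

```python
import os
from dataclasses import dataclass

@dataclass
class ActionResult:
    claimed_ok: bool   # what the agent reported
    verified_ok: bool  # what the system state actually shows

def verify_rename(agent_report: bool, new_path: str) -> ActionResult:
    """Never trust the self-report: confirm the file really exists on disk."""
    return ActionResult(claimed_ok=agent_report,
                        verified_ok=os.path.exists(new_path))
```

The same pattern applies to any action: pair the agent's claim with an independent probe of the system (`os.path.exists`, an HTTP status check, a UI query), and treat disagreement as failure.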

The second safeguard is confidence calibration. Train your agent pipeline to express uncertainty. If the accessibility tree does not confirm the expected state change, flag it instead of reporting success.
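One way to sketch that calibration, assuming a hypothetical `report` step that compares the expected state change against what was actually observed:

```python
from enum import Enum

class Outcome(Enum):
    CONFIRMED = "confirmed"
    UNCERTAIN = "uncertain"
    FAILED = "failed"

def report(expected_change: str, observed_changes: set[str]) -> Outcome:
    """Claim success only when the expected change was observed;
    otherwise flag uncertainty instead of reporting completion."""
    if expected_change in observed_changes:
        return Outcome.CONFIRMED
    if observed_changes:  # something changed, but not what we expected
        return Outcome.UNCERTAIN
    return Outcome.FAILED  # nothing changed at all
```

The point is the three-way split: "uncertain" is a legitimate answer, and a pipeline that can only say "done" or "error" will round uncertainty up to "done".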

The third safeguard is bounded blast radius. Irreversible actions - deleting files, sending emails, submitting forms - should always require a verification step. Let the agent draft. Let the human confirm.
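A sketch of that gate - the `IRREVERSIBLE` set and the callback names are hypothetical, not Fazm's actual interface:

```python
from typing import Callable

IRREVERSIBLE = {"delete_file", "send_email", "submit_form"}

def execute(action: str,
            perform: Callable[[], None],
            confirm: Callable[[str], bool]) -> bool:
    """Gate irreversible actions behind a human confirmation step.
    The agent drafts; the human confirms before anything runs."""
    if action in IRREVERSIBLE and not confirm(action):
        return False  # human declined - nothing irreversible happened
    perform()
    return True
```

Reversible actions pass straight through, so the confirmation tax is only paid where a mistake cannot be undone.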

Practical Implementation

In Fazm, we take accessibility tree snapshots before and after every action. The diff between snapshots is ground truth. If the agent says it clicked "Submit" but the form is still visible, we know it lied - and we retry or escalate.
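A simplified version of that diff, treating each snapshot as a map from element id to state - the shapes here are illustrative, not Fazm's internal format:

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare accessibility-tree snapshots (element id -> state).
    The diff, not the agent's report, is ground truth."""
    return {
        "added": after.keys() - before.keys(),
        "removed": before.keys() - after.keys(),
        "changed": {k for k in before.keys() & after.keys()
                    if before[k] != after[k]},
    }

def submit_went_through(before: dict, after: dict) -> bool:
    """If the agent claims it clicked Submit but the form element
    is still present, the claim is unverified: retry or escalate."""
    return "submit_form" in diff_snapshots(before, after)["removed"]
```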

Hallucination detection is not optional. It is the difference between a useful tool and a liability.

Fazm is an open-source macOS AI agent, available on GitHub.