Building an Agent Journal That Catches Its Own Lies by Tracking Prediction Errors

Matthew Diakonov

The most interesting feature in desktop agent development right now is not better task execution. It is the journal - a system that catches the agent's own lies by tracking prediction errors.

What Is a Prediction Error Journal?

Every time an agent takes an action, it implicitly predicts an outcome. "I will click this button and a dialog will appear." "I will run this command and the build will succeed." "I will edit this file and the test will pass."

A prediction error journal records these predictions, then records what actually happened, and stores the delta. Over time, this builds a dataset of where the agent consistently gets things wrong.

This is not a theoretical concept. Research on the tau-bench benchmark - which tests tool-using agents on realistic customer service tasks - found that state-of-the-art agents succeed on fewer than 50% of tasks, and pass rates drop below 25% when the same task must succeed across eight independent runs (the pass^8 metric). The failures are not random noise. They cluster around specific action types. That clustering is what a prediction error journal is designed to expose.

Why Agents Lie

AI agents do not deliberately deceive. But they routinely report success when they have failed, describe UI states they cannot actually see, and claim to have completed tasks they only partially finished. This is not malice - it is pattern completion. The model predicts the most likely next token, and "success" is almost always more likely than "I failed and here is exactly how."

One specific failure mode is what practitioners call the "ghost action" - where an agent claims it performed an action that never actually occurred. Another is "metadata hallucination," where the agent hallucinates not facts about the world, but facts about its own operational state. It reports what a successful execution would look like, not what actually happened.

The prediction error journal catches these discrepancies by comparing what the agent said would happen against observable ground truth - screenshots, file diffs, process exit codes, and system state.
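
As a minimal sketch, collecting that ground truth might look like the following. The helper name and the specific checks are illustrative, not a fixed API:

import hashlib
import subprocess
from pathlib import Path

def observe_ground_truth(
    path: Path,
    pre_hash: str | None,
    result: subprocess.CompletedProcess,
) -> dict:
    """Collect observable facts the model cannot talk its way around."""
    # assumes the command was run with capture_output=True, text=True
    post_hash = hashlib.md5(path.read_bytes()).hexdigest() if path.exists() else None
    return {
        "file_exists": path.exists(),
        "file_hash_changed": pre_hash != post_hash,
        "exit_code": result.returncode,
        "stdout_tail": (result.stdout or "")[-500:],
    }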

The Journal Entry Format

A useful prediction error journal entry has a specific structure. Here is what a raw entry looks like before and after an action:

Before the action (prediction captured):

{
  "entry_id": "act_20260317_143022_001",
  "timestamp": "2026-03-17T14:30:22Z",
  "action_type": "file_write",
  "action_description": "Write updated config to ~/.config/app/settings.json",
  "prediction": {
    "expected_outcome": "File exists at path with new content",
    "expected_state_change": "settings.json modified, last_modified timestamp updated",
    "confidence": "high"
  },
  "observable_checks": [
    "file_exists",
    "file_hash_changed",
    "exit_code_zero"
  ]
}

After the action (observation and delta):

{
  "entry_id": "act_20260317_143022_001",
  "observation": {
    "file_exists": true,
    "file_hash_changed": false,
    "exit_code": 0,
    "actual_outcome": "File exists but content is identical to pre-action state"
  },
  "delta": {
    "prediction_matched": false,
    "failure_type": "silent_no_op",
    "notes": "Write command returned exit code 0 but file was not modified - likely a permissions issue or path resolution error"
  }
}

This entry captures something the agent would never voluntarily report: the operation appeared to succeed (exit code zero) but produced no actual change. Without the delta comparison, the agent moves on, confident the config was updated. Three steps later something breaks and nobody knows why.

Real-World Failure Examples That Journaling Catches

Here are four failure patterns that surface consistently once you start tracking prediction errors:

1. The optimistic file save. An agent writes a file and reports success. The prediction error journal compares the file hash before and after. In approximately 15% of cases across real desktop workflows, the file either did not change or was written to the wrong path - often because the application intercepted the write with its own auto-save and the agent's direct file write was silently dropped.

2. The phantom click. An agent clicks a UI element and predicts "dialog will appear." The accessibility tree snapshot after the click shows no new dialog. The agent reports success anyway because the instruction said to click and it issued the click event. The journal records a delta: predicted_state_change = "dialog opened", actual_state_change = "none".

3. The misleading exit code. Build scripts frequently return exit code 0 even when tests fail, because the wrapper script itself ran to completion and swallowed the test runner's exit status. An agent predicts "tests pass" based on exit code. The journal cross-references stdout for patterns like "FAILED" or "AssertionError" and records the mismatch (see the sketch after this list).

4. The context collapse. In long agent sessions, the model loses track of earlier state. It predicts that a variable it set 20 steps ago is still in scope. The journal tracks declared state against observed state and catches when the agent refers to objects that no longer exist.
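
As a concrete illustration of pattern 3, a check like the following refuses to trust exit code 0 on its own. The failure patterns are an assumption - tune them to whatever your test runner actually prints:

import re

# patterns that indicate failure even when the process exits with code 0;
# illustrative only - extend for your own toolchain
FAILURE_PATTERNS = [
    re.compile(r"\bFAILED\b"),
    re.compile(r"\bAssertionError\b"),
    re.compile(r"\b\d+ failed\b"),
]

def tests_actually_passed(exit_code: int, stdout: str) -> bool:
    """Cross-reference the exit code against stdout before trusting it."""
    if exit_code != 0:
        return False
    return not any(p.search(stdout) for p in FAILURE_PATTERNS)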

Implementing Prediction-Action-Delta Tracking

Here is a minimal Python implementation of the tracking loop:

import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path

@dataclass
class PredictionEntry:
    entry_id: str
    timestamp: str
    action_type: str
    prediction: dict
    observable_checks: list[str]
    observation: dict = field(default_factory=dict)
    delta: dict = field(default_factory=dict)

class AgentJournal:
    def __init__(self, journal_path: str = "/tmp/agent_journal.jsonl"):
        self.journal_path = Path(journal_path)
        self.entries: list[PredictionEntry] = []

    def pre_action(self, action_type: str, prediction: dict, checks: list[str]) -> str:
        """Record the prediction before the action runs; return an ID to close the loop."""
        entry_id = f"act_{datetime.now().strftime('%Y%m%d_%H%M%S_%f')[:20]}"
        entry = PredictionEntry(
            entry_id=entry_id,
            timestamp=datetime.now().isoformat(),
            action_type=action_type,
            prediction=prediction,
            observable_checks=checks,
        )
        self.entries.append(entry)
        return entry_id

    def post_action(self, entry_id: str, observation: dict) -> dict:
        """Record what actually happened and compute the prediction delta."""
        entry = next(e for e in self.entries if e.entry_id == entry_id)
        entry.observation = observation

        # compute delta: a check only counts if the prediction declared an
        # expectation for it, keyed as "expected_<check>"
        matched = all(
            observation.get(check) == entry.prediction.get(f"expected_{check}")
            for check in entry.observable_checks
            if f"expected_{check}" in entry.prediction
        )
        entry.delta = {
            "prediction_matched": matched,
            "failure_type": None if matched else self._classify_failure(entry),
        }

        # append to journal file
        with self.journal_path.open("a") as f:
            f.write(json.dumps(entry.__dict__) + "\n")

        return entry.delta

    def _classify_failure(self, entry: PredictionEntry) -> str:
        """Bucket a mismatch into a coarse failure type for later aggregation."""
        obs = entry.observation
        if obs.get("exit_code") == 0 and not obs.get("state_changed"):
            return "silent_no_op"
        if obs.get("exit_code") != 0:
            return "explicit_error"
        if obs.get("partial_completion"):
            return "partial_success"
        return "unknown_divergence"

    def get_error_summary(self) -> dict:
        failures = [e for e in self.entries if not e.delta.get("prediction_matched")]
        by_type = {}
        for f in failures:
            ft = f.delta.get("failure_type", "unknown")
            by_type[ft] = by_type.get(ft, 0) + 1
        return {
            "total_actions": len(self.entries),
            "total_failures": len(failures),
            "failure_rate": len(failures) / max(len(self.entries), 1),
            "by_type": by_type,
        }

Using it looks like this:

journal = AgentJournal()

# Before action
entry_id = journal.pre_action(
    action_type="file_write",
    prediction={
        "expected_file_exists": True,
        "expected_state_changed": True,
    },
    checks=["file_exists", "state_changed"]
)

# Run the action
path = Path("/tmp/config.json")
before_hash = hashlib.md5(path.read_bytes()).hexdigest() if path.exists() else None
path.write_text('{"debug": true}')
after_hash = hashlib.md5(path.read_bytes()).hexdigest() if path.exists() else None

# After action
delta = journal.post_action(entry_id, {
    "file_exists": path.exists(),
    "state_changed": before_hash != after_hash,
    "exit_code": 0,
})

print(delta)
# {'prediction_matched': True, 'failure_type': None}

The Feedback Loop

The real value is not just catching lies after the fact. It is feeding the error patterns back into the agent's context. When the journal shows that the agent consistently overestimates the success rate of a particular type of action, you can add that as a warning in the system prompt.

For example, if the journal reveals that the agent claims "file saved successfully" 100% of the time but the file actually changes only 85% of the time, you add a rule: "Always verify file changes with a diff after saving."
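
Closing that loop can be mechanical. Here is a minimal sketch that turns the journal summary into prompt warnings, assuming the AgentJournal class from the implementation above; the threshold and the warning wording are placeholders to tune:

def build_prompt_warnings(journal: AgentJournal, threshold: float = 0.10) -> list[str]:
    """Turn recurring failure patterns into system-prompt warnings."""
    summary = journal.get_error_summary()
    rules = []
    if summary["total_actions"] == 0:
        return rules
    for failure_type, count in summary["by_type"].items():
        rate = count / summary["total_actions"]
        if rate >= threshold:
            rules.append(
                f"Past runs show '{failure_type}' on {rate:.0%} of actions. "
                "Verify observable state before reporting success."
            )
    return rules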

Cleanlab's research on the tau-bench benchmark demonstrated that adding trust scoring and verification steps reduced agent failure rates by up to 50%. The mechanism is the same: you stop trusting the model's self-report and start checking observable state.

Regular updates based on logged error patterns can improve prediction accuracy by 18-32% compared to agents that receive no feedback from their own failure history. The teams that close this loop fastest - going from observing a failure to deploying a fix - produce agents that are measurably more reliable.

Reviewing the Journal: What to Look For

Once you have a few hundred journal entries, specific patterns become obvious. Here is how to read them:

  • High-frequency "silent_no_op" entries on a single action type - The agent is issuing commands that look correct but are not taking effect. Check environment state, permissions, or path resolution for that action category.
  • Prediction confidence "high" paired with prediction_matched = false - The agent is miscalibrated. It does not know what it does not know. Add explicit verification steps for any action this agent rates as "high confidence" (see the sketch after this list).
  • Clusters of failures at specific step numbers in multi-step workflows - Context degradation is the likely cause. The agent is losing track of state as the context window fills. Insert state-check prompts at the failure points.
  • "partial_success" entries that go undetected - The agent reported done when it had finished only step one of three. The journal shows the actual observable state. This is the most dangerous failure mode because it looks like success.
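
A sketch of the miscalibration query against the journal file, assuming entries carry the confidence field shown in the entry format above:

import json
from collections import Counter
from pathlib import Path

def miscalibrated_action_types(journal_path: str = "/tmp/agent_journal.jsonl") -> Counter:
    """Count high-confidence predictions that did not match, per action type."""
    counts: Counter = Counter()
    for line in Path(journal_path).read_text().splitlines():
        entry = json.loads(line)
        confident = entry.get("prediction", {}).get("confidence") == "high"
        missed = not entry.get("delta", {}).get("prediction_matched", True)
        if confident and missed:
            counts[entry.get("action_type", "unknown")] += 1
    return counts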

Why This Beats Logging Alone

Standard observability tools - Langfuse, LangSmith, Arize - log what the agent did and what it said. That is valuable. But it does not close the gap between claimed outcome and actual outcome.

A prediction error journal compares the model's expectation against ground truth at every step. The delta is the signal. Over time, that signal tells you exactly where your agent needs guardrails, better verification prompts, or explicit retry logic.

The agent cannot lie to a file diff. It cannot misreport a process exit code. It cannot claim a UI element appeared when the accessibility tree shows it did not. The journal is the part of the system that does not take the model's word for it.


Fazm is an open-source macOS AI agent, available on GitHub.
