Agent Logs as Open Letters to Nobody - Why Unread Documentation Has Value

Matthew Diakonov

Open letters to nobody are the purest form of writing. They exist without an audience in mind, without the performance of being read. Agent logs are exactly this - a continuous stream of documentation that mostly goes unread until something breaks.

The Paradox of Useful Unread Logs

Most agent logs are never read by a human. An AI desktop agent running through its daily tasks might generate thousands of log entries per hour - tool calls made, accessibility elements found, actions taken, errors encountered. In normal operation, nobody looks at these.

But the moment something goes wrong, those logs become the most valuable artifact in your system. They are the only record of what actually happened versus what was supposed to happen.

The research bears this out. AgentTrace, a structured logging framework described in a 2025 arXiv paper, found that agents instrumented with runtime logging could be debugged in hours, while equivalent uninstrumented agents required days of reproduction work to understand failures. The paper defines three surfaces worth capturing: cognitive (what the agent was reasoning about), operational (what tools it called), and contextual (what environment state it observed). Most teams capture only the operational surface and skip the other two. That is why so many bugs are hard to reproduce.

What Good Agent Logs Actually Capture

A log entry that says ERROR: click failed is nearly useless. A log entry that says:

[2026-03-18T09:12:44Z] INTENT: locate "Submit" button in checkout form
[2026-03-18T09:12:44Z] ACCESSIBILITY_QUERY: role=button label="Submit" → 0 results
[2026-03-18T09:12:44Z] FALLBACK: searching by text content → 1 result (disabled)
[2026-03-18T09:12:45Z] ACTION_SKIPPED: button found but not interactable, reason=disabled
[2026-03-18T09:12:45Z] AGENT_DECISION: waiting for form validation to clear

...tells you the agent's goal, what it searched for, what it found, why it stopped, and what it decided to do next. That is the difference between a bare log line and a trace. A trace is the complete story of a request: a parent-child hierarchy of events that connects every model interaction, every data retrieval, and every final response.
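The parent-child structure can be sketched with nothing more than a JSON-lines emitter. This is a minimal illustration, not any framework's API; the `log_event` helper and its field names are hypothetical:

```python
import json
import time
import uuid


def log_event(trace_id: str, parent_id, event: str, **fields) -> dict:
    """Emit one structured event; trace_id/parent_id link events into a trace."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:8],
        "parent_id": parent_id,   # None for the root event of the trace
        "event": event,
        **fields,
    }
    print(json.dumps(record))
    return record


# One request = one trace; each step is a child of the step that caused it.
trace = uuid.uuid4().hex
intent = log_event(trace, None, "INTENT",
                   goal='locate "Submit" button in checkout form')
query = log_event(trace, intent["span_id"], "ACCESSIBILITY_QUERY",
                  role="button", label="Submit", results=0)
log_event(trace, query["span_id"], "FALLBACK",
          strategy="text content", results=1, state="disabled")
```

Because every record carries a `trace_id`, grepping one ID out of the stream reconstructs the whole story of that request.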

OpenTelemetry has become the standard instrumentation layer here. Major agent frameworks - LangGraph, AutoGen, CrewAI - are adopting OTel semantic conventions for spans and traces. That means your agent logs become queryable across tools and environments without any custom parsing.

Logs Shape Agent Behavior

There is a subtler value to logging that goes beyond debugging. When you design what an agent logs, you are implicitly defining what matters about its behavior. The act of deciding "this event is worth recording" creates a framework for understanding the agent.

This is why the best agent logging captures intent - what the agent was trying to do, what it expected to find, and what it actually found. The gap between expectation and reality is where all the interesting information lives. An agent that logs "expected 3 results, got 0" after a search step is surfacing the mismatch between its model of the world and the actual state of the environment. That is diagnostic gold.

Some teams instrument this explicitly: before any tool call, log the expected output schema. After the call, log whether the actual output matched. Mismatches above some threshold become automatic alerts - which is much cheaper than waiting for user complaints.
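A sketch of that pattern, assuming a sliding-window mismatch rate as the alert condition (the `ExpectationMonitor` class and its thresholds are illustrative, not from any particular team's stack):

```python
from collections import deque


class ExpectationMonitor:
    """Track whether tool outputs match the schema the agent expected.

    Alerts when the mismatch rate over a sliding window crosses a
    threshold. A real system would page or open an incident, not print.
    """

    def __init__(self, window: int = 50, threshold: float = 0.2):
        self.outcomes = deque(maxlen=window)  # True = output matched expectation
        self.threshold = threshold

    def record(self, expected_keys: set, actual: dict) -> bool:
        # Before the tool call the agent logged expected_keys; after the
        # call we check the actual output against that expectation.
        matched = expected_keys.issubset(actual.keys())
        self.outcomes.append(matched)
        mismatch_rate = 1 - sum(self.outcomes) / len(self.outcomes)
        if mismatch_rate > self.threshold:
            print(f"ALERT: mismatch rate {mismatch_rate:.0%} "
                  f"over last {len(self.outcomes)} calls")
        return matched


monitor = ExpectationMonitor()
# Agent expected an element with id and bounds; the tool returned no bounds.
monitor.record({"element_id", "bounds"}, {"element_id": "btn-7"})
```

The check is deliberately cheap - a key-subset test per call - so it can run on every tool invocation without meaningful overhead.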

Documentation Nobody Reads Yet

Agent documentation has the same quality as unread logs. The CLAUDE.md files, the skill definitions, the memory systems - much of this documentation is written for a future reader who may never arrive. But the process of writing it clarifies thinking and creates structure.

Teams that document their agent's decision-making process, even when nobody reads it, build better agents. The documentation forces explicit choices about behavior that would otherwise remain implicit and inconsistent. "What happens when the agent cannot find the element it was looking for?" - writing the answer down, even in a file nobody will read today, means the answer exists when the edge case surfaces six months from now.

The Writing Discipline Creates Architecture

The logs do not need to be read to be useful. They need to be written. The discipline of structured logging creates better agent architecture the same way keeping a journal improves thinking even if you never re-read the entries.

A practical way to build this habit: write a log schema before you write the agent logic. Define the events you care about, the fields that should be present, and the conditions that trigger each one. Then implement the agent to produce those events. You will discover architectural problems before you write a line of agent code - because you will notice the events you cannot describe precisely are the ones where you have not yet decided what the agent should do.
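One way to make "schema first" concrete is to declare each event as a dataclass before any agent logic exists. The event names and fields below are hypothetical, a sketch of the exercise rather than a prescribed vocabulary:

```python
import json
from dataclasses import dataclass, asdict

# Schema first: every event the agent may emit is declared up front.
# An event you cannot describe precisely here is a decision you have
# not yet made about the agent's behavior.


@dataclass
class ToolCall:
    agent_intent: str      # what this step is trying to accomplish
    tool: str              # which tool was invoked
    expected_state: dict   # the agent's prediction about the environment
    actual_state: dict     # what the agent observed after the call

    def emit(self) -> str:
        return json.dumps({"event": "TOOL_CALL", **asdict(self)})


@dataclass
class ActionSkipped:
    agent_intent: str
    reason: str            # e.g. "element disabled", "validation pending"

    def emit(self) -> str:
        return json.dumps({"event": "ACTION_SKIPPED", **asdict(self)})
```

Writing `ActionSkipped.reason` forces the question from the previous section - what happens when the element cannot be used - before any click handler exists.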

Three fields that belong in every agent log event:

  • agent_intent - what the agent is trying to accomplish at this step
  • expected_state - what the agent's model predicts about the environment
  • actual_state - what the agent actually observed

The delta between expected and actual is your debugging surface. Log it consistently, and you have an open letter that future-you will thank present-you for writing.
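A small sketch of computing that delta, using the three fields above (the `state_delta` helper is illustrative):

```python
def state_delta(expected: dict, actual: dict) -> dict:
    """Diff the agent's predicted state against what it observed.

    Returns only the keys that diverged - the debugging surface.
    """
    keys = expected.keys() | actual.keys()
    return {
        k: {"expected": expected.get(k), "actual": actual.get(k)}
        for k in keys
        if expected.get(k) != actual.get(k)
    }


event = {
    "agent_intent": "submit checkout form",
    "expected_state": {"submit_button": "enabled", "results": 3},
    "actual_state": {"submit_button": "disabled", "results": 0},
}
delta = state_delta(event["expected_state"], event["actual_state"])
# delta holds only the mismatched fields: submit_button and results
```

An empty delta means the agent's model of the world was right; a non-empty one is exactly the line future-you will grep for.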

More on This Topic

Fazm is an open-source macOS AI agent, available on GitHub.
