Detecting Signals - Edge Cases in Production Agent Work
The hardest part of running agents in production is not handling errors. It is detecting that something is wrong when nothing has technically failed.
Weak Signals, Not Loud Failures
A loud failure is easy. The API returns a 500. The file does not exist. The process crashes. Your monitoring catches it, pages you, and you fix it.
Weak signals are different. Response times gradually increase by 50ms per day. The agent starts choosing a slightly different tool for the same task. Output quality drifts imperceptibly over hundreds of runs. Each individual data point looks fine. The trend tells a different story.
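One way to make a slow trend visible is to fit a line through a daily metric and alert on the slope rather than on any single value. The sketch below is illustrative only: the function names, the 14-day window, and the 10 ms/day threshold are assumptions, not values from a real system.

```python
from statistics import mean


def daily_slope(values):
    """Least-squares slope of values against their index (units per day)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def is_drifting(values, max_slope):
    """True when the fitted trend rises faster than max_slope per day."""
    return daily_slope(values) > max_slope


# Hypothetical daily median latencies in ms, growing by 50 ms per day.
latencies = [800 + 50 * day for day in range(14)]
flat = [800] * 14

print(is_drifting(latencies, max_slope=10))  # trend exceeds the threshold
print(is_drifting(flat, max_slope=10))       # no trend
```

No single day in `latencies` looks alarming next to the previous one, but the fitted slope does.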
Edge Cases That Look Normal
A customer name field that contains an emoji. A date in a timezone your agent has never seen. An email thread with 47 participants where the agent cannot determine who to reply to. A file path with spaces and special characters that works on macOS but breaks when synced to Linux.
These are not theoretical. These are the actual cases that break production agents. Each one passed testing because testing did not include that specific combination of inputs.
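A cheap complement to testing is to tag incoming inputs with the traits your test suite never covered, so you can see how often they occur in production. This is a rough sketch: the field names, the flag names, the "safe" path characters, and the participant limit are all assumptions. The checks only record traits; they do not reject valid input such as non-ASCII names.

```python
import string

# Assumed set of path characters that behave the same across platforms.
SAFE_PATH_CHARS = set(string.ascii_letters + string.digits + "/._-")


def input_flags(name, path, participant_count, max_participants=20):
    """Return a list of traits worth logging for this input."""
    flags = []
    if not name.isascii():
        flags.append("non_ascii_name")
    if set(path) - SAFE_PATH_CHARS:
        flags.append("unusual_path_chars")
    if participant_count > max_participants:
        flags.append("many_participants")
    return flags


print(input_flags("Ana 🎉", "/tmp/my file (1).txt", 47))
print(input_flags("Ana", "/tmp/file.txt", 2))
```

Counting these flags over time shows which untested combinations actually reach the agent.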
Building Detection Systems
The practical approach is statistical. Track the distribution of your agent's decisions over time. When the distribution shifts, investigate. If your agent normally takes 3-4 tool calls per task and suddenly starts taking 7-8, something changed in the input or the environment.
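As a minimal sketch of this idea, the code below compares the distribution of tool calls per task in a baseline window against a recent window using total variation distance. The window contents and the 0.2 threshold are made up for illustration; a real system would tune both.

```python
from collections import Counter


def distribution(items):
    """Relative frequency of each distinct value in items."""
    if not items:
        raise ValueError("cannot build a distribution from no data")
    counts = Counter(items)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0.0 for identical, 1.0 for disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def has_shifted(baseline, recent, threshold=0.2):
    """True when the recent window differs from baseline by more than threshold."""
    return total_variation(distribution(baseline), distribution(recent)) > threshold


# Hypothetical tool-call counts per task.
baseline_calls = [3, 4, 3, 4, 3, 4, 3, 4]
recent_calls = [7, 8, 7, 8, 7, 8, 7, 8]

print(has_shifted(baseline_calls, recent_calls))
```

The same functions work for categorical decisions, such as which tool the agent picked, because they only count distinct values.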
Monitor confidence scores, not just outcomes. A gradual decline in confidence across many tasks suggests environmental drift even when success rates stay high.
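The check described here might look like the following sketch. It assumes your agent reports a confidence value between 0 and 1 for each run; the `Run` type, the function name, and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    confidence: float  # assumed to be in [0, 1]
    succeeded: bool


def silent_drift(baseline, recent, max_conf_drop=0.1, min_success=0.95):
    """True when confidence has fallen while the success rate still looks healthy."""
    conf_drop = mean(r.confidence for r in baseline) - mean(r.confidence for r in recent)
    success_rate = mean(1.0 if r.succeeded else 0.0 for r in recent)
    return conf_drop > max_conf_drop and success_rate >= min_success


# Hypothetical windows: every run succeeds, but recent confidence is lower.
baseline_runs = [Run(0.9, True) for _ in range(100)]
recent_runs = [Run(0.7, True) for _ in range(100)]

print(silent_drift(baseline_runs, recent_runs))
```

An outcome-only monitor would see nothing here, because every run in both windows succeeded.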
Accepting Imperfection
You will never catch every edge case before it hits production. The goal is not prevention; it is fast detection and graceful degradation. An agent that says "I am not sure about this one" is more valuable than one that confidently processes everything, including the cases it should have flagged.
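Graceful degradation can be as simple as routing low-confidence results to a human instead of acting on them. The sketch below assumes a confidence value between 0 and 1; the function name, status strings, and 0.8 threshold are illustrative.

```python
def route(result, confidence, threshold=0.8):
    """Act on confident results; hand uncertain ones to a person."""
    if confidence < threshold:
        return {
            "status": "needs_review",
            "result": result,
            "message": "I am not sure about this one",
        }
    return {"status": "done", "result": result}


print(route("reply to thread", confidence=0.55)["status"])
print(route("rename file", confidence=0.93)["status"])
```

The flagged result is kept rather than discarded, so a reviewer can approve it quickly if the agent was right.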
Fazm is an open source macOS AI agent, available on GitHub.