What Half a Million Desktop Agent Actions Taught Us About Failure
After collecting telemetry on over 500,000 desktop agent actions, we finally have real numbers instead of guesses. Some confirmed what we suspected. Others surprised us.
The Top Failures
Clicking the wrong element accounts for 34% of all failures. Not "element not found" - the agent finds an element and clicks it, but it is the wrong one. A similarly labeled button, a nearby link, or a duplicate element that appeared after a dynamic page update. The agent is confident. The click lands somewhere wrong.
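One mitigation is to stop trusting the label alone and score candidates on several signals at once. Here is a minimal sketch of that idea; the `Element` fields, the weights, and the distance normalization are all illustrative assumptions, not anything from our telemetry pipeline:

```python
from dataclasses import dataclass

@dataclass
class Element:
    label: str
    role: str
    x: int
    y: int

def score_candidate(target_label: str, target_role: str,
                    expected_x: int, expected_y: int,
                    el: Element) -> float:
    """Score a candidate by combining label, role, and position
    agreement, so a similarly labeled nearby link loses to the
    button the agent actually meant to click."""
    score = 0.0
    if el.label == target_label:
        score += 0.5
    elif target_label.lower() in el.label.lower():
        score += 0.25  # partial label match is weaker evidence
    if el.role == target_role:
        score += 0.3
    # Penalize distance from where the element was last observed.
    dist = abs(el.x - expected_x) + abs(el.y - expected_y)
    score += 0.2 * max(0.0, 1.0 - dist / 500)
    return score

def pick_best(candidates, target_label, target_role, x, y):
    return max(candidates, key=lambda el: score_candidate(
        target_label, target_role, x, y, el))
```

With this scoring, a "Submit feedback" link far from the expected position scores well below an exact-label "Submit" button near it, even though both labels contain "Submit".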
Timeout errors come second at 22%. The agent waits for something to appear and it never does - usually because a previous action silently failed and the expected UI state never materialized.
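The fix for this class of timeout is to verify each action's effect before moving on, so a silent failure surfaces at the step that caused it rather than as a mysterious timeout later. A minimal sketch, with hypothetical helper names:

```python
import time

def wait_for(predicate, timeout=5.0, interval=0.1):
    """Poll until predicate() returns true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

def do_and_verify(action, verify, timeout=5.0):
    """Run an action, then confirm its expected effect appeared.
    Failing fast here prevents the next step from waiting forever
    on UI state that will never materialize."""
    action()
    if not wait_for(verify, timeout):
        raise RuntimeError("action ran but its expected effect never appeared")
```

The point of the pattern is that the error is raised at the action that actually failed, which turns an opaque downstream timeout into a direct, attributable failure.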
Stale element references are third at 18%. The agent identified the right element, but by the time it acted, the DOM or accessibility tree had changed.
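A common defense is to re-resolve the element immediately before acting and retry once or twice on staleness, rather than acting on a handle looked up earlier. A sketch under those assumptions (the exception type and retry parameters are illustrative):

```python
import time

class StaleElementError(Exception):
    """Raised when an element handle no longer matches the live tree."""

def act_with_refresh(resolve, act, retries=3, delay=0.2):
    """Re-resolve the element just before acting, retrying on
    staleness, since the DOM or accessibility tree may have
    changed since the last lookup."""
    for attempt in range(retries):
        el = resolve()  # fresh lookup on every attempt
        try:
            return act(el)
        except StaleElementError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # let the UI settle, then retry
```

This narrows the window between identification and action to near zero, which is exactly the gap that stale references exploit.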
The Top Successes
Form filling is the most reliable action type with a 94% success rate. Text fields are predictable, seldom move, and have clear boundaries. The agent types, the text appears. Simple.
Navigation actions - opening URLs, switching tabs - come in at 91%. These are deterministic operations with little room for ambiguity.
What This Means for Building Agents
If you are building a desktop agent, spend your engineering time on element identification accuracy. Better visual grounding, better element matching, better staleness detection. Wrong-element clicks (34%) and stale references (18%) alone account for 52% of failures, so improvements there would eliminate over half of all failures in our dataset.
Do not start by optimizing the planning layer or the prompt engineering. The model usually knows what to do. The execution layer is where things break. And within execution, precise element targeting is the highest-leverage fix you can make.
The data does not lie. Fix clicking first. Everything else improves downstream.
Fazm is an open source macOS AI agent, available on GitHub.