The 2AM Debugging Session - What AI Agent Development Actually Looks Like

Fazm Team · 2 min read

The demo video shows a polished 30-second clip of an AI agent filling out a form, clicking submit, and moving on. What it does not show is the 14 hours of debugging that made those 30 seconds possible.

Building AI agents that control a desktop is a different kind of engineering. You are not building a web app where the environment is predictable. You are building something that interacts with a living, changing OS where windows resize, menus shift, and elements render differently depending on system preferences.

What the Debugging Actually Involves

Last week, our screenshot pipeline broke because macOS changed how it handles retina scaling in a minor update. The agent was clicking 3 pixels to the left of every target. Three pixels. That is the kind of bug that takes hours to even identify because everything looks right in the logs.
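To make the scaling problem concrete, here is a minimal Python sketch of the conversion from screenshot pixels to screen points. It is not Fazm's actual pipeline: `CaptureInfo`, `pixel_to_point`, and `infer_scale` are hypothetical names, and the sketch assumes the capture's origin and scale factor are known (on macOS the scale would typically come from the display's backing scale factor). Deriving the scale from the image dimensions, rather than hardcoding 2.0, is one way to notice when an OS update changes the assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureInfo:
    """Hypothetical description of one screenshot (not a real macOS type)."""
    origin_x: float  # left edge of the captured region, in screen points
    origin_y: float  # top edge of the captured region, in screen points
    scale: float     # pixels per point, e.g. 2.0 on many retina displays


def pixel_to_point(px: float, py: float, capture: CaptureInfo) -> tuple[float, float]:
    """Map a pixel coordinate inside the screenshot to a screen point."""
    if capture.scale <= 0:
        raise ValueError("scale factor must be positive")
    return (capture.origin_x + px / capture.scale,
            capture.origin_y + py / capture.scale)


def infer_scale(image_width_px: int, region_width_pt: float) -> float:
    """Derive the scale from the image itself instead of trusting a constant."""
    return image_width_px / region_width_pt
```

A small error in `scale` or in the origin shifts every click by the same few pixels, which is why the logs can look correct while every target is missed.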

Accessibility tree parsing is another rabbit hole. You ask the OS for the element tree and sometimes you get a beautifully structured hierarchy. Other times you get a flat list of unlabeled buttons because the app developer did not bother with accessibility attributes. Your agent needs to handle both cases gracefully.
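One way to handle both shapes of tree is to walk whatever comes back and fall back to a positional identifier when a label is missing. The sketch below uses a hypothetical, simplified `AXNode` class rather than the real macOS accessibility types, and the role names are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class AXNode:
    """Hypothetical, simplified stand-in for an accessibility element."""
    role: str
    label: Optional[str] = None
    children: List["AXNode"] = field(default_factory=list)


def describe_targets(root: AXNode,
                     roles: Sequence[str] = ("button", "textfield")) -> List[str]:
    """Return one identifier per actionable element, labeled or not."""
    targets: List[str] = []
    unlabeled_seen: Dict[str, int] = {}

    def walk(node: AXNode) -> None:
        if node.role in roles:
            if node.label:
                targets.append(f"{node.role}:{node.label}")
            else:
                # No label: fall back to the role plus its position among
                # the unlabeled elements of that role, in document order.
                index = unlabeled_seen.get(node.role, 0)
                unlabeled_seen[node.role] = index + 1
                targets.append(f"{node.role}#{index}")
        for child in node.children:
            walk(child)

    walk(root)
    return targets
```

Positional identifiers are fragile if the app reorders its elements, so in practice they would likely be combined with other signals such as the element's frame or a screenshot.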

Then there is timing. Click too early and the UI has not rendered yet. Click too late and a loading spinner has replaced the button you were targeting. Every app has different animation speeds and transition behaviors.
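A common alternative to fixed delays is to poll the target until its position stops changing, then click. The following is a sketch under assumptions, not a description of how Fazm does it: `get_frame` is a hypothetical callback that returns the element's current `(x, y, width, height)` or `None` if it is not on screen, and the default thresholds are arbitrary.

```python
import time
from typing import Callable, Optional, Tuple

Frame = Tuple[float, float, float, float]  # x, y, width, height


def wait_until_stable(
    get_frame: Callable[[], Optional[Frame]],
    stable_reads: int = 3,
    interval: float = 0.05,
    timeout: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Frame]:
    """Return the frame once `stable_reads` consecutive polls agree, else None."""
    deadline = clock() + timeout
    last: Optional[Frame] = None
    streak = 0
    while clock() < deadline:
        frame = get_frame()
        if frame is not None and frame == last:
            streak += 1  # same frame as the previous poll
        else:
            # Element moved, appeared, or vanished: restart the count.
            streak = 1 if frame is not None else 0
        last = frame
        if streak >= stable_reads:
            return frame
        sleep(interval)
    return None  # timed out: the element never settled
```

Passing `clock` and `sleep` as parameters keeps the function testable without real waiting, and a `None` result gives the caller a clear signal to retry or re-capture instead of clicking blindly.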

The hardest part is not any single bug. It is that these bugs compound. A slightly wrong screenshot crop feeds bad data to the vision model, which generates a slightly wrong click coordinate, which triggers an unexpected dialog, which breaks the entire workflow. Debugging means tracing through four layers of abstraction at 2am with coffee that stopped working an hour ago.
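One way to shorten that trace is to validate each layer's output before handing it to the next, so a failure is attributed to the first layer that went wrong rather than the last one that visibly broke. The sketch below is illustrative only: `run_traced` and the stage names are hypothetical, and the checks stand in for whatever invariants a real pipeline can assert.

```python
from typing import Any, Callable, List, Optional, Tuple

# Each stage is (name, transform, check); check returns an error message or None.
Stage = Tuple[str, Callable[[Any], Any], Callable[[Any], Optional[str]]]


def run_traced(stages: List[Stage], value: Any) -> Tuple[Any, List[str], Optional[str]]:
    """Run stages in order and stop at the first one whose output fails its check."""
    trace: List[str] = []
    for name, transform, check in stages:
        value = transform(value)
        problem = check(value)
        trace.append(f"{name}: {value!r} -> {problem or 'ok'}")
        if problem:
            return value, trace, name  # name of the first failing layer
    return value, trace, None


# Hypothetical stages: the crop comes out 3 pixels too narrow, so its check
# fails and the later stage never runs on bad data.
stages: List[Stage] = [
    ("crop", lambda width: width - 3,
     lambda width: None if width == 100 else "unexpected width"),
    ("locate", lambda width: width / 2, lambda x: None),
]
_, trace, failed = run_traced(stages, 100)
```

Here `failed` is `"crop"` and the trace contains a single entry, so the bad crop is caught before it can produce a wrong click coordinate.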

This is what building AI agents actually looks like. It is not glamorous, but getting it right means building something that genuinely saves people hours of repetitive work every day.

Fazm is an open source macOS AI agent, available on GitHub.