The Reality of Long-Running AI Agents - What They Can and Cannot Do
There is a persistent fantasy that you can describe an app, walk away, and come back to a finished product. Every few months a demo goes viral showing exactly this. And every time, the reality is more nuanced.
What Does Not Work Yet
Let us be direct about the limitations:
- Full app generation from a description - the agent gets 60-70% right, but the remaining 30-40% requires human judgment calls about UX, edge cases, and business logic
- Multi-hour autonomous sessions - context degrades, the agent loses track of its own decisions, and errors compound
- Cross-system orchestration - an agent that needs to coordinate between a database, an API, a frontend, and a deployment pipeline will eventually make a wrong assumption that cascades
The failure mode is not that the agent crashes. It is that it keeps going confidently in the wrong direction, producing plausible-looking output that has subtle problems throughout.
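To see why errors compound over a long session, here is a rough back-of-the-envelope sketch. It assumes every step succeeds independently with the same probability, which is a simplification, and the 99% figure is purely illustrative rather than a measurement of any real agent. The function name `chain_success` is made up for this example.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and equally reliable (an illustrative
    simplification, not a model of any particular agent).
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Hypothetical per-step reliability of 99%.
    for steps in (10, 50, 100, 500):
        chance = chain_success(0.99, steps)
        print(f"{steps:>4} steps at 99% each: {chance:.1%} chance of no errors")
```

Under these assumptions, a hundred steps at 99% each finish without a single error only about 37% of the time. Real sessions are messier, since one early wrong assumption makes later mistakes more likely, not less.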
What Actually Works
Long-running agents succeed at bounded, well-defined tasks:
- Test generation - given a codebase, generate comprehensive tests. The scope is clear, the output is verifiable, and mistakes are caught by running the tests.
- Migration scripts - converting code from one pattern to another across many files. Repetitive, pattern-based work with clear success criteria.
- Documentation generation - analyzing code and producing docs. The agent can work through an entire codebase methodically.
- Data processing pipelines - transforming, cleaning, and organizing data according to defined rules.
The common thread: clear inputs, verifiable outputs, and bounded scope.
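That common thread can be sketched in a few lines. This is a minimal illustration, not how any particular agent framework works: `BoundedTask` and `run_bounded` are hypothetical names, `attempt` stands in for whatever the agent does, and `verify` stands in for an objective check such as running a test suite.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BoundedTask:
    """Hypothetical task with a clear input, a verifiable output, and a budget."""
    name: str
    attempt: Callable[[int], str]  # produces a candidate output for attempt n
    verify: Callable[[str], bool]  # objective check, e.g. "do the tests pass?"
    max_attempts: int = 3          # bounded scope: stop instead of drifting


def run_bounded(task: BoundedTask) -> Optional[str]:
    """Return the first output that passes verification, or None if the budget runs out."""
    for n in range(1, task.max_attempts + 1):
        candidate = task.attempt(n)
        if task.verify(candidate):
            return candidate
    return None  # hand the task back to a human rather than keep going


if __name__ == "__main__":
    # Toy stand-ins: the "agent" gets it right on the second try.
    toy = BoundedTask(
        name="toy",
        attempt=lambda n: "ok" if n >= 2 else "broken",
        verify=lambda out: out == "ok",
    )
    print(run_bounded(toy))
```

The important design choice is the `None` branch: when the check keeps failing, the loop stops and reports failure instead of producing more plausible-looking output.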
The Practical Middle Ground
The agents that deliver value today are not autonomous. They are collaborative. They work for 10-20 minutes, produce something, and check in with the human. The human provides course corrections, and the agent continues.
This is how Fazm approaches desktop automation. Instead of trying to run unsupervised for hours, it handles discrete tasks - send this email, organize these files, update this spreadsheet - and confirms before taking irreversible actions. Reliability beats ambition.
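The confirm-before-irreversible pattern is simple to sketch. The code below is a generic illustration and is not taken from Fazm's codebase: `Action`, `execute_with_confirmation`, and the `confirm` callback are hypothetical names, and a real agent would ask the user through its own interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical unit of work an agent wants to perform."""
    description: str
    run: Callable[[], None]
    irreversible: bool = False  # e.g. sending an email or deleting a file


def execute_with_confirmation(
    actions: List[Action],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Run actions in order, asking `confirm` before any irreversible one."""
    log: List[str] = []
    for action in actions:
        if action.irreversible and not confirm(action.description):
            log.append(f"skipped: {action.description}")
            continue
        action.run()
        log.append(f"done: {action.description}")
    return log


if __name__ == "__main__":
    plan = [
        Action("organize files into folders", run=lambda: None),
        Action("send email to the team", run=lambda: None, irreversible=True),
    ]
    # Stand-in for a real prompt: this demo declines every irreversible action.
    for line in execute_with_confirmation(plan, confirm=lambda description: False):
        print(line)
```

Reversible steps run without interruption, so the human is only pulled in where a mistake would be expensive.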
Fazm is an open source macOS AI agent, available on GitHub.