The Browser Is a Trap for Desktop AI Agents
If your desktop AI agent's primary interaction model is controlling a browser, you are building on quicksand. The browser was designed for humans, not for programmatic control, and it shows.
Dynamic DOM Is a Moving Target
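Modern web apps do not have stable DOM structures. React re-renders components, Angular runs change detection cycles, and single-page apps swap out large parts of the DOM on client-side navigation instead of loading a new page. A selector that works today can break when the app ships a new version tomorrow.
Shadow DOM makes it worse. Web components encapsulate their internal structure in a shadow tree, and standard DOM queries such as document.querySelector do not descend into it. An open shadow root can still be reached through the host element's shadowRoot property, but for a closed root that property is null. Your agent cannot click a button it cannot find.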
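To make the shadow boundary concrete, here is a minimal sketch in Python. It is a toy model rather than a browser API: the Node class, the light_dom and piercing walks, and the pay-widget element are all hypothetical names made up for illustration. The only behavior it mirrors is that a plain query skips shadow trees, and that an open root can be entered while a closed one cannot.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element; not a real browser object."""
    tag: str
    children: list[Node] = field(default_factory=list)
    shadow_root: Optional[Node] = None  # root of the attached shadow tree, if any
    shadow_open: bool = True            # loosely mirrors attachShadow's "open"/"closed" mode


def light_dom(node: Node) -> Iterator[Node]:
    """Walk regular children only, the way a plain document-level query does."""
    yield node
    for child in node.children:
        yield from light_dom(child)


def piercing(node: Node) -> Iterator[Node]:
    """Also descend into open shadow roots; closed roots stay hidden."""
    yield node
    if node.shadow_root is not None and node.shadow_open:
        yield from piercing(node.shadow_root)
    for child in node.children:
        yield from piercing(child)


def find(walk: Callable[[Node], Iterator[Node]], root: Node, tag: str) -> Optional[Node]:
    """Return the first node with the given tag under the chosen walk, or None."""
    return next((n for n in walk(root) if n.tag == tag), None)


# A page whose only button lives inside a custom element's shadow tree.
widget = Node("pay-widget", shadow_root=Node("div", children=[Node("button")]))
page = Node("body", children=[Node("h1"), widget])

print(find(light_dom, page, "button"))     # None: the walk stops at the shadow boundary
print(find(piercing, page, "button").tag)  # button

widget.shadow_open = False                 # a closed root defeats the piercing walk too
print(find(piercing, page, "button"))      # None
```

An agent that relies on the first walk never sees the button at all, and the second only helps for as long as the component author keeps the root open.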
The iframe Problem
Iframes create isolated browsing contexts. Under the same-origin policy, a cross-origin iframe is opaque to scripts in the parent page - they cannot read its content or interact with its elements. Payment forms, embedded widgets, and third-party integrations commonly use iframes.
An agent that needs to fill out a checkout form often has to navigate through multiple iframe boundaries, each with its own security restrictions. Some are simply impossible to automate from outside.
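The rule behind that opacity is an origin comparison, which can be sketched in a few lines of Python. This is a simplified illustration, not the browser's full algorithm: it compares only scheme, host, and port, and ignores cases such as sandboxed frames. The function names and the shop.example and pay.example URLs are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple that identifies a web origin."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 0)
    return (parts.scheme, parts.hostname or "", port)


def parent_can_read(parent_url: str, frame_url: str) -> bool:
    """A parent-page script may touch a frame's DOM only when the origins match."""
    return origin(parent_url) == origin(frame_url)


# Hypothetical checkout page with one first-party and one third-party frame.
parent = "https://shop.example/checkout"
frames = {
    "cart summary": "https://shop.example:443/cart",
    "card form": "https://pay.example/card-form",
}

for name, url in frames.items():
    verdict = "reachable" if parent_can_read(parent, url) else "opaque"
    print(f"{name}: {verdict}")
# cart summary: reachable
# card form: opaque
```

The card form is the frame the agent most needs to fill in, and in this sketch it is the one the parent page cannot reach.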
Why Native Desktop Is Better
The accessibility API on macOS exposes the interactive elements of any application that supports it - including the browser itself. Instead of fighting the DOM, you interact with the browser the same way a screen reader does: through a stable, well-defined interface.
The accessibility tree does not care about Shadow DOM. It does not break when a framework re-renders. It exposes elements by their role and label, not by their implementation details.
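The difference between matching on role and label and matching on structure can be shown with a small sketch. It uses an in-memory tree in Python, not the real macOS accessibility API: the AXNode class and both lookup functions are hypothetical, and the role strings are only borrowed from macOS naming conventions for flavor.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AXNode:
    """Hypothetical accessibility-tree node: a role, a label, and children."""
    role: str
    label: str = ""
    children: list[AXNode] = field(default_factory=list)


def find_by_role_and_label(root: AXNode, role: str, label: str) -> Optional[AXNode]:
    """Depth-first search that ignores how deeply the element is nested."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.role == role and node.label == label:
            return node
        stack.extend(reversed(node.children))
    return None


def find_by_path(root: AXNode, path: list[int]) -> Optional[AXNode]:
    """Brittle alternative: follow child indexes, like a positional selector."""
    node = root
    for index in path:
        if index >= len(node.children):
            return None
        node = node.children[index]
    return node


def form() -> list[AXNode]:
    return [AXNode("AXTextField", "Email"), AXNode("AXButton", "Submit")]


# Version 1 of a window, then a redesign that adds a toolbar and an extra wrapper.
v1 = AXNode("AXWindow", children=[AXNode("AXGroup", children=form())])
v2 = AXNode("AXWindow", children=[
    AXNode("AXToolbar"),
    AXNode("AXGroup", children=[AXNode("AXGroup", children=form())]),
])

print(find_by_path(v1, [0, 1]).label)                          # Submit
print(find_by_path(v2, [0, 1]))                                # None: the path no longer exists
print(find_by_role_and_label(v2, "AXButton", "Submit").label)  # Submit
```

The positional lookup encodes the layout, so the redesign breaks it. The role-and-label lookup encodes only what the element is, so it keeps working until the button itself is renamed or removed.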
The Practical Difference
A browser-first agent can break whenever a website ships a markup change. A desktop-first agent breaks only when the application fundamentally changes its UI structure - which happens far less often.
Build your agent on the stable layer, not the shifting one.
Fazm is an open source macOS AI agent, available on GitHub.