Forked Chrome for Agent Browsers - Snapshot Navigation vs Live DOM
Standard browsers were built for humans. When AI agents try to use them, the mismatch shows up fast. That is why teams are forking Chrome to build browsers optimized for agent control.
The Problem with Live DOM
When a human browses the web, the page is constantly changing: animations, lazy-loaded content, dynamic updates, hover tooltips. Humans process this naturally. AI agents do not.
An agent that reads the live DOM gets a moving target. By the time it processes the page structure, decides which element to click, and executes the click, the DOM may have shifted. A new notification banner pushed the target button down 40 pixels. A lazy-loaded image changed the layout. The element the agent targeted no longer exists at those coordinates.
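The race described above can be reproduced with a toy model. This is only an illustrative sketch: the Box class, the element names, and the pixel values are made up for the example and do not come from any real browser API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    name: str
    top: int     # y coordinate of the top edge, in pixels
    height: int  # in pixels


def element_at(page: List[Box], y: int) -> Optional[str]:
    """Return the name of the element covering vertical position y, if any."""
    for box in page:
        if box.top <= y < box.top + box.height:
            return box.name
    return None


# 1. The agent reads the live page and aims at the middle of the button.
page = [Box("submit_button", top=100, height=30)]
target_y = page[0].top + page[0].height // 2

# 2. While the agent is deciding, a banner appears and pushes the button down 40 px.
page = [
    Box("notification_banner", top=100, height=40),
    Box("submit_button", top=140, height=30),
]

# 3. The click lands at the old coordinate and hits the wrong element.
clicked = element_at(page, target_y)
print(clicked)  # notification_banner
```

The agent's decision was correct for the page it read, but the page it clicked on was a different one.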
Freeze-and-Snapshot Approach
Agent-optimized browsers solve this with a snapshot model. Instead of giving the agent live access to the DOM, they freeze the page state, generate an accessibility tree snapshot, and let the agent reason over a static representation.
The accessibility tree is already a simplified version of the DOM - it strips out visual noise and exposes the semantic structure. Elements have roles (button, link, text field), labels, and states (enabled, checked, expanded). This is exactly the information an agent needs to decide what to do, without the visual complexity that confuses it.
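To make the shape of that data concrete, here is a minimal sketch of an accessibility tree node and a text rendering an agent could reason over. The AXNode class, its field names, and the output format are assumptions chosen for illustration, not the format of any particular browser.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AXNode:
    role: str                                   # e.g. "button", "link", "text field"
    label: str = ""                             # accessible name shown to the agent
    states: Dict[str, bool] = field(default_factory=dict)
    children: List["AXNode"] = field(default_factory=list)


def render(node: AXNode, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, listing only states that are true."""
    active = [name for name, on in sorted(node.states.items()) if on]
    line = "  " * depth + node.role
    if node.label:
        line += f' "{node.label}"'
    if active:
        line += " [" + ", ".join(active) + "]"
    lines = [line]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


tree = AXNode("form", "Sign in", children=[
    AXNode("text field", "Email", {"enabled": True}),
    AXNode("checkbox", "Remember me", {"enabled": True, "checked": False}),
    AXNode("button", "Submit", {"enabled": True}),
])

print("\n".join(render(tree)))
# form "Sign in"
#   text field "Email" [enabled]
#   checkbox "Remember me" [enabled]
#   button "Submit" [enabled]
```

Four short lines carry everything needed to pick an action. Styling, layout, and pixel positions are absent by design.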
After the agent makes its decision, the browser unfreezes, executes the action, waits for the page to settle, and takes a new snapshot. Each interaction is a clean cycle: snapshot, reason, act, snapshot.
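The cycle can be sketched as a small loop. FakeBrowser below is a toy stand-in written for this example. Its method names (snapshot, act, wait_until_settled) and the node ids are hypothetical, not the API of any real agent browser, and choose_action stands in for whatever model does the reasoning.

```python
from typing import Callable, Dict, List, Optional

Snapshot = List[Dict[str, str]]


class FakeBrowser:
    """Toy browser with two pages; every method name here is hypothetical."""

    def __init__(self) -> None:
        self.page = "login"
        self.frozen = False
        self.log: List[str] = []

    def snapshot(self) -> Snapshot:
        self.frozen = True  # page state is held still while the agent reasons
        self.log.append("snapshot")
        if self.page == "login":
            return [{"id": "n1", "role": "button", "label": "Sign in"}]
        return [{"id": "n2", "role": "heading", "label": "Dashboard"}]

    def act(self, node_id: str) -> None:
        self.frozen = False  # unfreeze before executing the action
        self.log.append(f"act:{node_id}")
        if self.page == "login" and node_id == "n1":
            self.page = "dashboard"

    def wait_until_settled(self) -> None:
        self.log.append("settle")  # a real browser would wait for the page to go idle


def choose_action(snap: Snapshot) -> Optional[str]:
    """Stand-in for the agent: click "Sign in" if present, otherwise stop."""
    for node in snap:
        if node["role"] == "button" and node["label"] == "Sign in":
            return node["id"]
    return None


def run(browser: FakeBrowser,
        decide: Callable[[Snapshot], Optional[str]],
        max_steps: int = 5) -> Snapshot:
    snap = browser.snapshot()              # snapshot
    for _ in range(max_steps):
        node_id = decide(snap)             # reason over the static snapshot
        if node_id is None:
            break
        browser.act(node_id)               # act
        browser.wait_until_settled()
        snap = browser.snapshot()          # snapshot again
    return snap


browser = FakeBrowser()
final = run(browser, choose_action)
print(browser.log)  # ['snapshot', 'act:n1', 'settle', 'snapshot']
```

The agent refers to elements by node id rather than by position, so nothing it decided on can move between the decision and the action.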
Why Not Just Use Screenshots?
Vision-based agents that work from screenshots can handle any application, but they are slow and brittle. OCR introduces errors. Coordinate-based clicking breaks when resolution or scaling changes. Accessibility tree snapshots give you structured, reliable data at a fraction of the computational cost.
The tradeoff is that this freeze-and-snapshot model only works in browsers where you control the rendering pipeline. For native desktop apps, you need the operating system's accessibility API instead. But for web automation, a forked browser with snapshot-based navigation is currently the most reliable approach.
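The scaling problem with coordinate-based clicking comes down to simple arithmetic. The sketch below assumes a screenshot captured at the device's pixel density, so one page unit spans scale screenshot pixels. The function name and the numbers are illustrative only.

```python
from typing import Tuple


def screenshot_to_page(px: Tuple[int, int], scale: float) -> Tuple[float, float]:
    """Convert a screenshot pixel into page coordinates for a given scale factor."""
    x, y = px
    return (x / scale, y / scale)


# A click position recorded from a screenshot taken at scale factor 1.0.
recorded_click = (400, 300)

at_1x = screenshot_to_page(recorded_click, scale=1.0)
at_2x = screenshot_to_page(recorded_click, scale=2.0)

print(at_1x)  # (400.0, 300.0)
print(at_2x)  # (200.0, 150.0)
```

The same recorded pixel points at a different part of the page once the scale factor changes. A lookup by role and label in an accessibility tree snapshot does not depend on the scale factor at all.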
Fazm is an open source macOS AI agent, available on GitHub.