Manus Uses browser_use Under the Hood - Why Browser-Only Agents Hit a Ceiling
Manus built an impressive product, but under the hood it leans heavily on browser_use - an open source browser automation framework. This is not a criticism of their engineering. Browser automation is genuinely useful. The problem is that browser-only agents hit a hard ceiling when users need to automate things outside the browser.
What Browser Agents Cannot Do
Think about what you actually do on your computer in a given day. Yes, a lot happens in the browser. But a significant chunk happens in native applications that browser automation cannot touch.
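Figma's desktop app is an Electron wrapper with custom internals. Terminal and iTerm2 are native. Finder is native. System Settings (called System Preferences before macOS Ventura) is native. Xcode, VS Code, Slack's desktop app, Spotify, Preview, Notes - some are fully native and some are built on web technology such as Electron, but all of them run as desktop applications outside the browser, where a browser automation framework has zero access.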
When someone asks a browser-only agent to "organize my Downloads folder" or "change my display settings" or "run this terminal command," the agent simply cannot do it. It is limited to tasks that happen inside a web page.
The Native Integration Layer
Real desktop automation requires accessibility APIs. On macOS, this means the Accessibility API, which exposes an element tree for each running application once the user has granted the agent accessibility permission. Through these APIs, an agent can read button labels in Finder, click menu items in Xcode, type into Terminal, and navigate Figma's layer panel.
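This is fundamentally different from DOM manipulation in a browser. Accessibility APIs present every application through the same interface: a tree of elements with roles, titles, and actions. The agent does not need a separate integration for each app - it reads the same element tree structure regardless of whether it is interacting with Safari or System Settings. How much detail each app exposes in that tree still varies, so coverage is broad rather than perfectly even.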
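To make the "same element tree structure" idea concrete, here is a minimal sketch in Python. It is a toy model, not the macOS Accessibility API: the UIElement class and the walk and find helpers are hypothetical names invented for illustration, and the two example trees are made up. Only the role strings (AXApplication, AXWindow, AXButton, AXTextArea) follow the naming that macOS uses for accessibility roles.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class UIElement:
    """Hypothetical stand-in for one node of an accessibility tree."""
    role: str
    title: str = ""
    children: List[UIElement] = field(default_factory=list)


def walk(element: UIElement) -> Iterator[UIElement]:
    """Yield the element and all of its descendants, depth first."""
    yield element
    for child in element.children:
        yield from walk(child)


def find(root: UIElement, role: str, title: str) -> Optional[UIElement]:
    """Return the first element matching role and title, or None."""
    for element in walk(root):
        if element.role == role and element.title == title:
            return element
    return None


# Two made-up trees for two different apps. The same find() works on both,
# because both are described with the same node structure.
finder = UIElement("AXApplication", "Finder", [
    UIElement("AXWindow", "Downloads", [
        UIElement("AXButton", "Back"),
        UIElement("AXButton", "Forward"),
    ]),
])
terminal = UIElement("AXApplication", "Terminal", [
    UIElement("AXWindow", "zsh", [
        UIElement("AXTextArea", "shell"),
    ]),
])

back_button = find(finder, "AXButton", "Back")
shell_area = find(terminal, "AXTextArea", "shell")
```

The point of the sketch is that the lookup code never mentions Finder or Terminal. A real agent would build these trees from the operating system's accessibility interface rather than by hand.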
The Hybrid Approach
The strongest desktop agents combine both. Use browser automation for web tasks where DOM access gives you speed and precision. Use accessibility APIs for everything else. This way the agent can handle whatever the user throws at it instead of stopping at the browser boundary.
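One way to picture the hybrid design is as a small router in front of two backends. The sketch below is an assumption about how such a dispatcher could look, not a description of how any particular agent implements it. Task, run_in_browser, run_via_accessibility, and route are hypothetical names, and the two backends are stubs that only return a description of what they would do.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description handed to the agent."""
    description: str
    target: str  # "web" for browser tasks, otherwise a desktop app name


def run_in_browser(task: Task) -> str:
    """Stub for the browser automation backend (DOM access)."""
    return f"browser: {task.description}"


def run_via_accessibility(task: Task) -> str:
    """Stub for the accessibility backend (desktop element trees)."""
    return f"accessibility[{task.target}]: {task.description}"


def route(task: Task) -> str:
    """Send web tasks to the browser backend and everything else to accessibility."""
    if task.target == "web":
        return run_in_browser(task)
    return run_via_accessibility(task)


results = [
    route(Task("fill in the signup form", "web")),
    route(Task("organize my Downloads folder", "Finder")),
    route(Task("run this terminal command", "Terminal")),
]
```

With this shape, the browser path stays the fast path for web tasks, and anything that falls outside it is handled by the accessibility path instead of failing.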
Building for both surfaces is harder, but it is the only way to build an agent that truly automates a desktop rather than just a browser tab.
Fazm is an open source macOS AI agent, available on GitHub.