Web Agent SDKs Are Great - But They Only Cover One App
Browser Automation Has a Boundary Problem
Web agent SDKs have gotten remarkably good. Playwright, Puppeteer, and the newer AI-powered browser agents can navigate complex web apps, fill forms, click through multi-step flows, and extract data with impressive reliability.
But your workday doesn't live entirely in a browser.
The Apps That Aren't Web Pages
You work in the terminal, native email clients, local IDEs, desktop spreadsheets, design tools, and system settings. A browser automation framework can't touch any of these, because by design it only controls what runs inside the browser.
When your workflow spans "check this spreadsheet, then update the CRM, then send a Slack message, then commit code in the terminal," a browser-only agent can handle maybe two of those steps. The rest require you to step in manually.
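To make the gap concrete, here is a minimal sketch that tags each step of that example workflow by the surface it runs on and counts what a browser-only agent could reach. The `Step` type, the `reachable` helper, and the web/native labels are assumptions for illustration (your spreadsheet or Slack client may well be on the other side of the line).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    surface: str  # "web" or "native"; an assumed classification


# Assumed setup: CRM and Slack are used in the browser,
# the spreadsheet and terminal are native desktop apps.
WORKFLOW = [
    Step("check the spreadsheet", "native"),
    Step("update the CRM", "web"),
    Step("send a Slack message", "web"),
    Step("commit code in the terminal", "native"),
]


def reachable(steps, surfaces):
    """Return the steps whose surface the agent can control."""
    return [s for s in steps if s.surface in surfaces]


browser_only = reachable(WORKFLOW, {"web"})
manual = [s for s in WORKFLOW if s not in browser_only]

print(f"browser agent covers {len(browser_only)} of {len(WORKFLOW)} steps")
for step in manual:
    print("left for you:", step.description)
```

Under these assumptions the browser agent covers two of the four steps, and the spreadsheet check and the commit are left for you.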
Accessibility APIs Cover Everything
Desktop agents that use the operating system's accessibility APIs don't have this limitation. On macOS, the accessibility tree exposes the interactive elements of any application that supports accessibility, which covers most of what you use day to day: native or web, browser or terminal, system app or third-party tool.
A single agent can navigate your email client, switch to your IDE, open a browser tab, update a spreadsheet, and send a message - all through the same interface. No switching between different automation frameworks or hitting boundaries where one tool's scope ends.
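The idea of "the same interface" can be sketched with a toy model of an accessibility tree. This is not the macOS API itself: `Node`, `interactive_elements`, and the two sample trees are hypothetical stand-ins, and only the role strings (such as `AXButton` and `AXTextField`) follow macOS naming. The point is that one traversal works unchanged on a native mail client and on a web page inside a browser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one element in an accessibility tree."""
    role: str
    title: str = ""
    children: list[Node] = field(default_factory=list)


# Roles treated as interactive in this sketch (an assumed subset).
INTERACTIVE_ROLES = {"AXButton", "AXTextField", "AXCheckBox", "AXMenuItem", "AXLink"}


def interactive_elements(node, path=()):
    """Depth-first walk yielding (path, role) for each interactive element."""
    here = path + (node.title or node.role,)
    if node.role in INTERACTIVE_ROLES:
        yield " > ".join(here), node.role
    for child in node.children:
        yield from interactive_elements(child, here)


# Hypothetical trees: a native mail app and a web app inside a browser.
mail = Node("AXApplication", "Mail", [
    Node("AXWindow", "Inbox", [
        Node("AXButton", "Reply"),
        Node("AXStaticText", "3 unread"),
    ]),
])
browser = Node("AXApplication", "Browser", [
    Node("AXWindow", "CRM", [
        Node("AXWebArea", "", [
            Node("AXTextField", "Account name"),
            Node("AXLink", "Save"),
        ]),
    ]),
])

found = [hit for app in (mail, browser) for hit in interactive_elements(app)]
for path, role in found:
    print(role, "at", path)
```

A real agent would read these nodes from the operating system rather than build them by hand, but the traversal logic would not need to know which kind of app it is looking at.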
When Browser Agents Make Sense
Browser agents still make sense for focused web scraping, testing, or when your entire workflow genuinely lives in web apps. If you're automating a SaaS dashboard or running browser-based tests, a web agent SDK is the right tool.
But for personal productivity - where your work constantly crosses application boundaries - a desktop agent that sees and controls everything on your screen is the more complete solution. Your workflow shouldn't be limited by which apps your automation tool can reach.
Fazm is an open source macOS AI agent, available on GitHub.