Web browser automation tools · Cross-runtime chat, not single-runtime script · Verifiable from acp-bridge/src/index.ts:1266

Web browser automation tools stop at the browser. Fazm wires five runtimes into one chat.

Every listicle for "web browser automation tools" in April 2026 enumerates the same single-runtime choices: Playwright, Selenium, Puppeteer, Cypress, Bardeen, Browserbase, Stagehand, Multion. Different packaging, same shape: the automation runtime is the browser, and when a workflow needs to leave the browser you exit the tool and glue it together in Zapier. Fazm breaks that shape. Its chat session spawns five MCP-driven automation runtimes in parallel, proven by name at `acp-bridge/src/index.ts` line 1266, so a Cloudflare-gated browser scrape and a Numbers paste and a WhatsApp send land in a single turn without leaving the session.

Fazm
10 min read
Five builtin MCP runtimes, named in BUILTIN_MCP_NAMES at acp-bridge/src/index.ts:1266
Browser runtime uses @playwright/mcp with --extension attached to your real Chrome
Non-browser runtimes (macos-use, whatsapp) speak native accessibility APIs, not screenshots

The five-runtime model is not marketing. It is a single line in the bridge source: `const BUILTIN_MCP_NAMES = new Set(["fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"]);` at acp-bridge/src/index.ts:1266. Four of the names map to a real bundled binary or script: playwright to node_modules/@playwright/mcp/cli.js, macos-use to Contents/MacOS/mcp-server-macos-use, whatsapp to Contents/MacOS/whatsapp-mcp, google-workspace to Contents/Resources/google-workspace-mcp/. The fifth, fazm_tools, is the in-process tool surface. Run `rg -n BUILTIN_MCP_NAMES acp-bridge/src/index.ts` on a fresh clone and the output should match the line quoted here.

acp-bridge/src/index.ts:1266
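
As a rough illustration of the name-to-artifact mapping described above, here is a minimal TypeScript sketch. The set literal is quoted from this page; the `BUNDLED_LOCATION` table and the `missingLocations` helper are hypothetical names invented for this example and are not claimed to exist in the Fazm source.

```typescript
// Quoted from this page: the five builtin runtime names.
const BUILTIN_MCP_NAMES = new Set([
  "fazm_tools",
  "playwright",
  "macos-use",
  "whatsapp",
  "google-workspace",
]);

// Hypothetical lookup table (an assumption, not Fazm's code): the bundled
// location this page lists for each runtime. fazm_tools is described as
// in-process, so it has no on-disk location here.
const BUNDLED_LOCATION: Record<string, string | null> = {
  fazm_tools: null,
  playwright: "node_modules/@playwright/mcp/cli.js",
  "macos-use": "Contents/MacOS/mcp-server-macos-use",
  whatsapp: "Contents/MacOS/whatsapp-mcp",
  "google-workspace": "Contents/Resources/google-workspace-mcp/",
};

// Hypothetical helper: builtin names that have no entry in the table.
function missingLocations(): string[] {
  return Array.from(BUILTIN_MCP_NAMES).filter(
    (runtime) => !(runtime in BUNDLED_LOCATION),
  );
}
```

If the table and the set ever drift apart, `missingLocations()` returns the names that need attention.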

What the 2026 SERP calls a "web browser automation tool"

Read the top ten Google results for this keyword in April 2026 and the tools bucket into five types. Every one of them owns the browser and nothing else. When a workflow needs a local app, a chat platform, a Google Sheet, or a native OS action, you leave the tool and reach for a second system. The category is structurally incomplete for anyone automating real consumer workflows, because real consumer workflows do not live entirely inside one browser window.

Selenium, Playwright, Puppeteer, Cypress, Selenium IDE, Playwright Codegen, Cypress Studio, BrowserStack Low Code, UiPath Studio Web, Power Automate Desktop, Bardeen, Axiom.ai, Ui.Vision, TestMu AI, BugBug, Browserbase, Stagehand, Firecrawl /agent, Multion, Anthropic Computer Use, Fazm (five-runtime chat)

The five buckets every listicle cycles through

  • Developer framework (scripts, code-first): Playwright, Selenium, Puppeteer, Cypress
  • Recorder/replay IDE: Selenium IDE, Playwright Codegen, Cypress Studio, BrowserStack Low Code
  • Enterprise RPA canvas: UiPath Studio Web, Power Automate Desktop, Automation Anywhere
  • Consumer browser extension: Bardeen, Axiom.ai, Ui.Vision
  • AI browser agent: Stagehand, Multion, Browserbase, Firecrawl /agent, Anthropic Computer Use
  • Cross-runtime chat (Fazm): browser runtime + 4 more in one session

Five automation runtimes orbit the same chat

In a Selenium or Playwright script there is one runtime: the browser driver. In a Fazm chat there are five (plus however many you add via ~/.fazm/mcp-servers.json). Each runtime speaks MCP; the bridge multiplexes them into a single agent tool list, so the model can pivot runtime-to-runtime on a single user turn.

Fazm chat
ACP bridge, Node subprocess
playwright (real Chrome)
macos-use (AX APIs)
whatsapp (Catalyst)
google-workspace (OAuth)
fazm_tools (in-process)
~/.fazm/mcp-servers.json
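
The multiplexing step described above can be sketched in a few lines. This is a simplified illustration, not the bridge's code: the `ServerTools` shape and the `multiplexTools` function are hypothetical, and the only detail taken from this page is the `mcp__<server>__<tool>` naming visible in the tool calls it quotes.

```typescript
// Hypothetical shape for what one MCP server reports; not Fazm's actual types.
interface ServerTools {
  server: string; // e.g. "playwright"
  tools: string[]; // e.g. ["browser_navigate", "browser_click"]
}

// Flatten per-server tool lists into one namespaced list, following the
// mcp__<server>__<tool> naming that appears in the tool calls on this page.
function multiplexTools(servers: ServerTools[]): string[] {
  const merged: string[] = [];
  for (const { server, tools } of servers) {
    for (const tool of tools) {
      merged.push(`mcp__${server}__${tool}`);
    }
  }
  return merged;
}

const agentTools = multiplexTools([
  { server: "playwright", tools: ["browser_navigate", "browser_click"] },
  { server: "whatsapp", tools: ["whatsapp_send_message"] },
]);
```

Because every tool carries its server name as a prefix, the model sees one flat list and the bridge can still route each call back to the runtime that owns it.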

The anchor fact: the five runtime names are a single line of source

Most category pages wave their hand at "integrations" and link to a logo wall. Fazm's five-runtime claim is a single TypeScript set literal, pinned at line 1266 of the bridge source. Binary existence checks for each entry happen earlier in the same file, so if any one of the five fails to exist at spawn time the bridge logs it (line 2550) and continues without that runtime.

acp-bridge/src/index.ts, line 1266
acp-bridge/src/index.ts, lines 47 to 69
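
The "log it and continue" behavior can be approximated as follows. This is a sketch under stated assumptions: `Candidate`, `selectAvailable`, and the log message wording are all hypothetical, and the existence check and logger are injected as functions so the example runs without touching the disk.

```typescript
// Hypothetical candidate entry; the real bridge's data structures may differ.
interface Candidate {
  runtime: string;
  path: string | null; // null = in-process, nothing to check on disk
}

// Keep runtimes whose binary exists; log and skip the rest.
// `exists` and `log` are injected so the sketch stays self-contained.
function selectAvailable(
  candidates: Candidate[],
  exists: (path: string) => boolean,
  log: (message: string) => void,
): string[] {
  const available: string[] = [];
  for (const candidate of candidates) {
    if (candidate.path === null || exists(candidate.path)) {
      available.push(candidate.runtime);
    } else {
      log(`skipping ${candidate.runtime}: ${candidate.path} not found`);
    }
  }
  return available;
}

// Simulated disk state: only the macos-use binary is present.
const logLines: string[] = [];
const onDisk = new Set(["Contents/MacOS/mcp-server-macos-use"]);
const availableRuntimes = selectAvailable(
  [
    { runtime: "fazm_tools", path: null },
    { runtime: "macos-use", path: "Contents/MacOS/mcp-server-macos-use" },
    { runtime: "whatsapp", path: "Contents/MacOS/whatsapp-mcp" },
  ],
  (p) => onDisk.has(p),
  (m) => logLines.push(m),
);
```

In a Node process the `exists` argument could be `fs.existsSync`; the point of the design is that one missing binary removes one runtime rather than failing the whole session.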

The five-runtime claim in one row

Counts pulled directly from the bridge source. Each number is verifiable with a single ripgrep against a fresh clone.

5 builtin MCP runtimes in one chat
1266: line with the BUILTIN_MCP_NAMES set
1 file that wires them (acp-bridge/src/index.ts)
0 glue scripts needed for cross-runtime turns

How a chat turn fans out to five runtimes

The diagram is literal. The Fazm chat surface on the left hands turns to the ACP bridge in the middle, which fans out to each of the five MCP runtimes on the right. The model picks which runtimes to call based on the user's sentence, and it can call any combination in a single turn.

Fazm chat to five parallel MCP runtimes

User message
Floating control bar
Chat thread history
ACP bridge (Node)
playwright (real Chrome, CDP)
macos-use (AXUIElement)
whatsapp (Catalyst AX)
google-workspace (OAuth)
fazm_tools + user MCPs

A real cross-runtime chat turn, frame by frame

Not a mockup. A walkthrough of a turn I actually run on my own Mac most weekdays, with the real tool call names the model emits.

One user message, three runtimes, one turn

Turn begins

User types one sentence into the Fazm floating bar: "pull yesterday's Stripe payout, paste into a new Numbers sheet, DM the total to @ops on WhatsApp."

What a cross-runtime turn looks like in the tool call log

Every MCP tool call is logged to the chat thread with its ID, name, and input. Here is the abbreviated log for the Stripe to Numbers to WhatsApp turn described above. Notice the `mcp__` prefix switches runtime mid-turn without a new session, a new script, or a new permission prompt.

tool-call-log.txt (chat thread, one user turn)
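
The original log is not reproduced in this copy of the page, so instead of guessing at its contents, here is a small sketch of how the runtime switches could be read out of tool call names. The four names in the example are the ones this page quotes in its FAQ; `parseToolName` and `runtimeHops` are hypothetical helpers, not part of Fazm.

```typescript
// Split a namespaced tool name into runtime and tool.
// Returns null for names that do not follow mcp__<runtime>__<tool>.
function parseToolName(
  full: string,
): { runtime: string; tool: string } | null {
  const [prefix, runtime, ...rest] = full.split("__");
  if (prefix !== "mcp" || !runtime || rest.length === 0) return null;
  return { runtime, tool: rest.join("__") };
}

// Ordered list of runtimes touched in a turn, collapsing consecutive repeats.
function runtimeHops(calls: string[]): string[] {
  const hops: string[] = [];
  for (const call of calls) {
    const parsed = parseToolName(call);
    if (parsed && hops[hops.length - 1] !== parsed.runtime) {
      hops.push(parsed.runtime);
    }
  }
  return hops;
}

// Tool names as quoted in this page's FAQ walkthrough.
const turnHops = runtimeHops([
  "mcp__playwright__browser_navigate",
  "mcp__playwright__browser_click",
  "mcp__macos-use__macos-use_open_application_and_traverse",
  "mcp__whatsapp__whatsapp_send_message",
]);
```

For that turn, `turnHops` lists playwright, then macos-use, then whatsapp: four calls, three runtimes.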

Single-runtime tool vs five-runtime chat, task by task

Not a feature grid of who supports what. Concrete tasks a Mac user actually runs, scored against the two structural models. Single-runtime tools win on reproducibility, isolation, and CI pipelines. Five-runtime chat wins when the task crosses tools the way a human workday actually does.

| Feature | Single-runtime browser tool (Playwright, Selenium, Bardeen, Browserbase, Stagehand, Multion) | Fazm (five-runtime chat) |
| --- | --- | --- |
| Scrape a logged-in Stripe payout page | Scripted login needed, often gets blocked by Cloudflare | Yes, your real Chrome is already signed in |
| Paste scraped data into Numbers or Excel | Not possible, browser-only runtime | mcp__macos-use tool call in the same turn |
| DM a summary to a WhatsApp chat | Requires external Zapier or Make workflow | mcp__whatsapp tool call in the same turn |
| Read a Gmail thread, reply, mark as read | Possible inside Gmail web UI, not cross-app | mcp__google-workspace tool calls via OAuth server |
| Reproducible CI end-to-end test of a marketing site | Yes, this is exactly what these tools are for | Possible but not the design center |
| Add a Linear MCP and use it alongside the browser | Requires custom wrapper or separate agent stack | Drop into ~/.fazm/mcp-servers.json, loaded at line 1104 |
| Full audit trail of every action in one log | Split across script file, Zapier log, Slack audit | Chat thread is the log, tool call IDs queryable |

The six runtimes, named and wired

Five builtins (the ones in BUILTIN_MCP_NAMES) plus the sixth slot, which is whatever you configure in ~/.fazm/mcp-servers.json. The bridge treats all of them identically: spawn the server, read its tool list, surface those tools to the agent.

playwright

@playwright/mcp CLI, --extension flag, attached to your real Chrome over CDP. Exposes browser_navigate, browser_click, browser_type, browser_snapshot.

macos-use

mcp-server-macos-use, Rust binary in Contents/MacOS/. Drives any AX-compliant Mac app via accessibility APIs. Open app, traverse, click, type, press key.

whatsapp

whatsapp-mcp, also in Contents/MacOS/. Controls the WhatsApp Catalyst app via accessibility APIs. List chats, read messages, send, search, scroll.

google-workspace

Python server under Contents/Resources/google-workspace-mcp/. Uses a per-user OAuth client; credentials live in ~/.google_workspace_mcp/credentials.

fazm_tools

In-process tool surface for Fazm-native actions (database queries, skills, observer cards). Bridged via a Unix pipe at FAZM_BRIDGE_PIPE.

your own (user-defined)

Drop a Claude-Code-style config at ~/.fazm/mcp-servers.json and the bridge loads it at line 1104. Each entry supports an enabled flag, env vars, and args. User-defined servers compose with the five builtins in the same chat.

How to actually build a cross-runtime workflow in Fazm

Not a wiring diagram of a state machine. Five concrete steps from the install to a cross-runtime turn. The whole onboarding target is that step 4 is the only one that ever feels like work, and it feels like typing a sentence.

1. Install Fazm and install the Playwright MCP Bridge Chrome extension

Fazm opens the extension's Chrome Web Store URL via AppleScript so it lands in your real Chrome profile, polls ~/Library/Application Support/Google/Chrome/<profile>/Extensions/<id> until it appears, then asks you to paste the auth token. Validation is base64url, 20+ characters.
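
The token rule stated above (base64url, 20+ characters) is simple enough to sketch. The page places the real validation in the Swift setup flow, so this TypeScript version is only an illustration; the regex, the function name, and the decision to reject `=` padding are assumptions.

```typescript
// base64url alphabet is A-Z, a-z, 0-9, "-" and "_" (RFC 4648, section 5).
// Assumption: padding characters ("=") are not accepted.
const BASE64URL_TOKEN = /^[A-Za-z0-9_-]{20,}$/;

// Hypothetical helper: true when the pasted text looks like a valid token.
function looksLikeValidToken(input: string): boolean {
  return BASE64URL_TOKEN.test(input.trim());
}
```

A check like this catches the common paste mistakes (truncated tokens, standard base64 with `+` or `/`) before the bridge ever tries to connect.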

2. Bridge spawns five MCP servers on first chat session

getMcpServers() in acp-bridge/src/index.ts registers fazm_tools, playwright (with --extension if the token is present), macos-use if Contents/MacOS/mcp-server-macos-use exists, whatsapp if Contents/MacOS/whatsapp-mcp exists, google-workspace if the Python venv exists. Each server exposes its tool set to the agent.
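
To make the conditional `--extension` registration concrete, here is a hedged sketch of how the playwright entry might be assembled. `ServerSpec` and `playwrightSpec` are hypothetical, launching the CLI through `node` is an assumption, and how the token itself reaches the Playwright MCP process is deliberately left out because this page does not say.

```typescript
// Hypothetical spec shape; the real getMcpServers() may return something else.
interface ServerSpec {
  name: string;
  command: string;
  args: string[];
}

// Build the playwright entry, adding --extension only when a token is present.
// Assumption: the bundled cli.js is launched with the `node` executable.
function playwrightSpec(
  cliPath: string,
  extensionToken: string | undefined,
): ServerSpec {
  const args = [cliPath];
  if (extensionToken) {
    args.push("--extension");
  }
  return { name: "playwright", command: "node", args };
}

const withToken = playwrightSpec("node_modules/@playwright/mcp/cli.js", "example-token-value-123");
const withoutToken = playwrightSpec("node_modules/@playwright/mcp/cli.js", undefined);
```

Without the flag, Playwright MCP launches its own browser rather than attaching to the running Chrome through the bridge extension, which is why the token check gates it.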

3. Optionally drop your own MCP servers into ~/.fazm/mcp-servers.json

Same JSON schema as Claude Code. Each entry has command, args, env, and enabled. The bridge reads the file at line 1104, filters disabled entries, and appends to the server list. Linear MCP, Notion MCP, GitHub MCP, anything you want alongside the browser runtime.
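
A minimal sketch of the read, filter, and append behavior described above. It parses a JSON string rather than reading the file so it stays self-contained. The entry fields come from this page; `loadUserServers`, treating a missing `enabled` as enabled, and the placeholder package names in the example config are assumptions.

```typescript
// Entry fields as described on this page: command, args, env, enabled.
interface UserServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
  enabled?: boolean;
}

type NamedServer = UserServerEntry & { name: string };

// Parse the config text and drop disabled entries.
// Assumption: an entry without an `enabled` field counts as enabled.
function loadUserServers(jsonText: string): NamedServer[] {
  const parsed = JSON.parse(jsonText) as Record<string, UserServerEntry>;
  const servers: NamedServer[] = [];
  for (const key of Object.keys(parsed)) {
    const entry = parsed[key];
    if (entry !== undefined && entry.enabled !== false) {
      servers.push({ ...entry, name: key });
    }
  }
  return servers;
}

// Example config text; the package names are placeholders, not real packages.
const exampleConfig = JSON.stringify({
  linear: { command: "npx", args: ["-y", "placeholder-linear-mcp"], enabled: true },
  notion: { command: "npx", args: ["-y", "placeholder-notion-mcp"], enabled: false },
});
const userServers = loadUserServers(exampleConfig);
```

With that example, only the `linear` entry survives the filter and would be appended after the builtins.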

4. Type a cross-runtime task in natural language

No visual canvas, no script file, no trigger setup. The model plans tool calls across runtimes as needed. Every tool call logs to the chat thread for auditability. If you want to replay the same task, you click the retry icon on the message.

5. Inspect snapshots and tool call IDs for audit

Browser snapshots land in /tmp/playwright-mcp/ (PLAYWRIGHT_OUTPUT_DIR at line 712). macOS accessibility traversals return text trees saved in /tmp/macos-use/ per call. Every tool call ID and input is queryable in the chat log, so you can always answer 'what did Fazm do in my Chrome five minutes ago'.

When a single-runtime browser tool is still the right answer

This page argues the five-runtime model is structurally richer than the category. It does not argue Fazm replaces Playwright or Selenium for developer use cases. If you are writing a CI test suite for a web app, Playwright's sandboxed Chromium is exactly the right tool because reproducibility beats state. If you are scraping hundreds of public pages in parallel, Browserbase beats one real Chrome because parallelism beats fidelity.

The five-runtime chat model is for a different job: automate the work a specific human actually does on their specific Mac, where 60 percent of the steps are browser and 40 percent are native apps and chat platforms and Google Docs. That is the gap the SERP's browser-only tools leave open.

Frequently asked questions

What are the best web browser automation tools in April 2026?

The SERP splits cleanly into four buckets. Developer frameworks: Playwright (still the dominant choice, roughly 45 percent of QA adoption in 2026 surveys), Selenium, Puppeteer, Cypress. Low/no-code visual builders: Selenium IDE, Cypress Studio, BrowserStack Low Code, UiPath Studio Web, Power Automate Desktop. AI browser agents: Stagehand, Browserbase, Firecrawl /agent, Multion, Anthropic Computer Use. Consumer browser extensions: Bardeen, Axiom.ai, Ui.Vision. All of them share one structural choice: the automation runtime is the browser. When the workflow needs to leave the browser (open Numbers, paste into Mail, send a WhatsApp), you exit the tool and glue it together yourself.

Where does Fazm fit in the 'web browser automation tools' category?

Fazm is not strictly a web browser automation tool. It is a consumer Mac app in which the browser runtime is one of five parallel automation runtimes wired into the same chat session. The five are named explicitly in the source: `BUILTIN_MCP_NAMES = new Set(["fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"])` at `acp-bridge/src/index.ts` line 1266. You can verify this by cloning the repo and grepping. One chat turn can legitimately cross all five runtimes. Every other tool on the SERP ships only the browser runtime.

Which browser runtime does Fazm use?

The official `@playwright/mcp` Node package, pinned inside the app bundle at `acp-bridge/node_modules/@playwright/mcp/cli.js`, resolved at `acp-bridge/src/index.ts` line 47. Playwright MCP is launched with the `--extension` flag (line 1030), which tells it to attach to the user's real Chrome over the Chrome DevTools Protocol through the Playwright MCP Bridge extension rather than launch a fresh Chromium. That gives the browser runtime access to your existing cookies, logged-in sessions, Chrome fingerprint, and 2FA state.

What is the anchor fact that proves Fazm has five runtimes, not one?

A single line in the bridge source: `const BUILTIN_MCP_NAMES = new Set(["fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"]);` at `acp-bridge/src/index.ts:1266`. Each name corresponds to a real component bundled with the app: `fazm_tools` is the in-process tool surface (line 1016 registration), `playwright` is the `@playwright/mcp` Node CLI (line 1050), `macos-use` is the Rust binary `Contents/MacOS/mcp-server-macos-use` (line 63, registered at 1059), `whatsapp` is `Contents/MacOS/whatsapp-mcp` (line 64, registered at 1069), `google-workspace` is a Python server under `Contents/Resources/google-workspace-mcp/` (lines 67-69, registered at 1082). All five get announced to the agent in the same session.

How does Fazm connect the browser runtime to non-browser runtimes in one chat turn?

Each runtime exposes its tools through the Model Context Protocol (MCP). When the model decides to do something browser-shaped, it emits a `mcp__playwright__browser_*` tool call. When it decides to do something app-shaped, it emits a `mcp__macos-use__*` call. Both return into the same chat session. The bridge at `acp-bridge/src/index.ts` spawns all five MCP servers as child processes in the `getMcpServers()` function (line 977 onward) and multiplexes their tools into a single agent surface. There is no manual glue script. The model picks the right runtime per step.

What about sandboxing and reproducibility?

Sandboxed browser runtimes (fresh Chromium per run, no state carried over) are the right choice for CI test pipelines, public-page scraping, and isolation-critical workflows. Fazm's real-Chrome model is deliberately the opposite tradeoff: it wins on logged-in consumer flows (Stripe dashboard, Gmail, Notion, Figma, etc.) where the account state is the feature, not a bug. If your use case is a reproducible end-to-end test suite, use Playwright or Cypress directly. If your use case is 'do this thing I would otherwise do by hand across my real apps,' that is what Fazm's five-runtime model is for.

Can I add my own MCP runtimes beyond the five builtins?

Yes. The bridge reads `~/.fazm/mcp-servers.json` at startup (line 1104 of `acp-bridge/src/index.ts`) in the same schema Claude Code uses: `{ "name": { "command": "...", "args": [...], "env": {...}, "enabled": true } }`. Anything you add there shows up alongside the five builtins in the same chat session. Line 1267 classifies each server as builtin or user-defined via the `isUserMcpServer(name)` check, so the UI can distinguish them. This is how users wire in, say, a Linear MCP or a GitHub MCP and have it compose with the browser runtime.
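
The builtin versus user-defined split can be illustrated with a set-membership check. The set literal and the `isUserMcpServer` name come from this page; the function body below is an assumption about what such a check would look like, not a copy of the source.

```typescript
// Quoted from this page: the five builtin runtime names.
const BUILTIN_MCP_NAMES = new Set([
  "fazm_tools",
  "playwright",
  "macos-use",
  "whatsapp",
  "google-workspace",
]);

// Assumed implementation: anything not in the builtin set is user-defined.
function isUserMcpServer(serverName: string): boolean {
  return !BUILTIN_MCP_NAMES.has(serverName);
}

// Split a mixed server list so a UI could label each group.
const allServers = ["playwright", "linear", "whatsapp", "github"];
const userDefined = allServers.filter(isUserMcpServer);
```

Here `userDefined` keeps `linear` and `github`, the two names that are not in the builtin set.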

Why use accessibility APIs for the non-browser parts instead of screenshots?

The Rust binary `mcp-server-macos-use` talks to AXUIElement, macOS's native accessibility API. That returns a structured tree with `[Role] "text" x:N y:N w:W h:H visible` per element, pulled directly from the target app's view hierarchy. Screenshot-based approaches (OmniParser, GPT-4o vision, Anthropic Computer Use in screenshot mode) pay 500 ms to 2 s of vision inference per step and occasionally click the wrong pixel because the vision model misreads a rendered icon. Accessibility APIs are typically sub-100 ms per traversal, return the exact coordinates the OS itself uses for hit-testing, and do not hallucinate elements that are not actually on screen.
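
To show why a structured line is easier to act on than pixels, here is a sketch that parses one element line in the `[Role] "text" x:N y:N w:W h:H visible` template and computes a click point. The exact grammar (numeric format, quoting, the optional `visible` suffix) is an assumption beyond that template, and `parseAxLine` and `centerOf` are hypothetical helpers.

```typescript
interface AxElement {
  role: string;
  text: string;
  x: number;
  y: number;
  w: number;
  h: number;
  visible: boolean;
}

// Assumed grammar for one traversal line, based on the template on this page.
const AX_LINE =
  /^\[([^\]]+)\] "(.*)" x:(-?\d+(?:\.\d+)?) y:(-?\d+(?:\.\d+)?) w:(\d+(?:\.\d+)?) h:(\d+(?:\.\d+)?)( visible)?$/;

// Parse one line; returns null when the line does not match the template.
function parseAxLine(line: string): AxElement | null {
  const match = AX_LINE.exec(line.trim());
  if (!match) return null;
  const [, role = "", text = "", x = "0", y = "0", w = "0", h = "0", vis] = match;
  return {
    role,
    text,
    x: Number(x),
    y: Number(y),
    w: Number(w),
    h: Number(h),
    visible: vis !== undefined,
  };
}

// Center of the element's frame, a natural click target.
function centerOf(el: AxElement): { cx: number; cy: number } {
  return { cx: el.x + el.w / 2, cy: el.y + el.h / 2 };
}

const sendButton = parseAxLine('[AXButton] "Send" x:10 y:20 w:100 h:40 visible');
```

No vision model is involved: the role, label, and frame arrive as text, so picking a click target is arithmetic on the frame rather than inference on an image.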

What browser anti-bot gates does the real-Chrome model handle that sandbox tools cannot?

Cloudflare Turnstile, Google's 'this browser may not be secure' interstitial on OAuth consent screens, hCaptcha on logged-in admin panels, Akamai Bot Manager on banking sessions, Datadome on e-commerce checkouts. All five look at the fingerprint, session age, and navigator properties of the Chrome making the request. A year-old real Chrome profile that is signed into Google, has your extensions, and has been on that IP for months clears all of them without stealth plugins. A freshly-spun Chromium instance fails most of them out of the box.

Is Fazm open enough that I can verify these claims myself?

Yes. The relevant source files are all in the public repo structure referenced in this page: `Desktop/Sources/BrowserExtensionSetup.swift` (the Chrome extension install flow, token validation), `Desktop/Sources/Chat/ACPBridge.swift` (the Swift-side env var wiring for PLAYWRIGHT_USE_EXTENSION), and `acp-bridge/src/index.ts` (the five-runtime registration, the BUILTIN_MCP_NAMES set, the user MCP config loader). `rg -n 'BUILTIN_MCP_NAMES' acp-bridge/src/index.ts` prints line 1266 verbatim. `rg -n 'mcp-server-macos-use'` prints lines 63 and 1059. Nothing about the five-runtime structure is hand-wavy marketing.

How does Fazm compare to Anthropic Computer Use, Stagehand, and Multion specifically?

Anthropic Computer Use is screenshot-based and cross-app but cloud-hosted: each step round-trips a PNG to Claude and back. Stagehand is an AI layer on top of Playwright, still sandboxed Chromium, still browser-only. Multion is a browser agent (extension plus cloud planner) that only touches web surfaces. Fazm's differentiator against this cohort is the combination: local execution (no screenshot round-trip), your real Chrome (no sandbox), and the non-browser runtimes. You do not have to pick 'browser agent that is fast but browser-only' vs 'general agent that is slow but cross-app.'

What does a real cross-runtime chat turn look like in practice?

One user message: 'pull yesterday's Stripe payout as a CSV, paste it into a new Numbers sheet titled April 17 Payouts, and DM the summary to @ops in WhatsApp.' The model plans a sequence that crosses three runtimes: `mcp__playwright__browser_navigate` to dashboard.stripe.com (real Chrome, already logged in) and `mcp__playwright__browser_click` on the export button, then `mcp__macos-use__macos-use_open_application_and_traverse` to open Numbers and paste, then `mcp__whatsapp__whatsapp_send_message` via the bundled WhatsApp MCP. Four tool calls, three runtimes, one chat turn, one app. No glue script, no copy-paste, no Zapier.

Try the five-runtime model on your own Mac

Fazm runs locally, attaches to your real Chrome, and ships four more automation runtimes alongside the browser. Free to start, no card required. Verify the five-runtime claim in the source yourself: `rg -n BUILTIN_MCP_NAMES acp-bridge/src/index.ts`.

