Selenium web browser automation is one engine. Most real Mac tasks need two.
Selenium drives what lives inside a browser tab. That is a clean, well-defined surface: the HTML document, its frames, cookies, storage, and the browser's own chrome. It is also half of the story on a Mac. The other half is Finder, Mail, Slack, Preview, the menu bar, and every native dialog that shows up between the interesting web steps. This guide is about what happens when your workflow needs both.
The boundary Selenium was built for
Selenium is a W3C WebDriver implementation. The spec defines roughly sixty commands. Apart from session management and status, every one of them is scoped to a browsing context: a window, a tab, a frame, the document inside that frame. That is not a limitation to engineer around; it is the contract Selenium signs with you. If you ask WebDriver to click a menu item in Finder, the answer is not “try harder,” it is “that is out of scope.”
This is why accept_alert and dismiss_alert (WebDriver's Accept Alert and Dismiss Alert commands) handle JavaScript alerts but not OS modals. The JS alert is raised by the page and shown by the browser. The OS modal is drawn by the system UI service, in a different process, and is exposed only through the Accessibility tree, which Selenium cannot see. Asking Selenium to close a “Chrome wants to access Downloads” sheet is like asking a file-system driver to send email.
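To make the in-scope half concrete, here is a minimal Python sketch of closing a JavaScript alert. The helper name close_js_alert is hypothetical; it relies only on the switch_to.alert object that Selenium's Python bindings expose, with its text, accept(), and dismiss() members. The FakeDriver classes are stand-ins, not Selenium classes, so the sketch runs without a browser.

```python
def close_js_alert(driver, accept: bool = True) -> str:
    """Accept or dismiss the open JavaScript alert and return its text.

    `driver` is expected to behave like a Selenium WebDriver:
    driver.switch_to.alert exposes .text, .accept(), and .dismiss().
    """
    alert = driver.switch_to.alert
    text = alert.text
    if accept:
        alert.accept()
    else:
        alert.dismiss()
    return text


# --- Stand-ins so the sketch runs without a browser (not part of Selenium) ---
class _FakeAlert:
    def __init__(self, text):
        self.text = text
        self.closed_with = None

    def accept(self):
        self.closed_with = "accept"

    def dismiss(self):
        self.closed_with = "dismiss"


class _FakeSwitchTo:
    def __init__(self, alert):
        self.alert = alert


class FakeDriver:
    def __init__(self, alert_text):
        self.switch_to = _FakeSwitchTo(_FakeAlert(alert_text))


driver = FakeDriver("Delete this invoice?")
alert_text = close_js_alert(driver, accept=False)
```

With a real WebDriver instance in place of FakeDriver, the same helper would apply to a page-level alert. There is no comparable call for an OS-level sheet, which is the point of the paragraph above.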
What ships inside the Fazm .app
The anchor fact
If you pry open a shipped Fazm build you will find a Node bridge at acp-bridge/src/index.ts. The interesting part is how few lines it takes to register both a Selenium-class engine and a native one as peers. This is the actual shape, annotated with line numbers:
The two names playwright and macos-use are not special strings the model knows about. They are routing keys the bridge uses to attribute tool calls, track timeouts, and decide which logs to scrape. Both engines show up in the Claude Code ACP tool list with equal weight.
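The routing-key idea can be sketched in a few lines. This is not the bridge's TypeScript source; it is a Python illustration under two assumptions: that tool titles follow the mcp__<server>__<tool> shape seen in the tool names on this page, and that attribution is limited to the built-in server names quoted in the FAQ below. The function name engine_for_tool is hypothetical.

```python
from typing import Optional

# Built-in server names as quoted in the FAQ on this page.
BUILTIN_SERVERS = frozenset(
    {"fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"}
)


def engine_for_tool(title: str) -> Optional[str]:
    """Return the MCP server a tool title belongs to, or None if unrecognized.

    Assumes titles look like "mcp__<server>__<tool>".
    """
    parts = title.split("__", 2)
    if len(parts) != 3 or parts[0] != "mcp" or not parts[2]:
        return None
    server = parts[1]
    return server if server in BUILTIN_SERVERS else None


web_engine = engine_for_tool("mcp__playwright__click")
native_engine = engine_for_tool("mcp__macos-use__click_and_traverse")
```

Keying on the server segment rather than the tool name means a new tool added to either engine is attributed correctly without touching the routing code.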
Fazm bundles both engines behind a per-tool watchdog. When a step stalls, the agent gets a synthetic timeout error and can recover rather than hang, which is the failure mode most raw Selenium tests have to guard against with try/finally and explicit waits.
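A per-tool watchdog of this kind can be sketched with asyncio. This is an illustration of the pattern, not Fazm's implementation: the function name call_with_watchdog, the result dictionary shape, and the timeout values are all assumptions made for the example.

```python
import asyncio
import time


async def call_with_watchdog(tool_name, call, timeout_s):
    """Run one tool call; on stall, return a synthetic timeout error instead of hanging.

    `call` is a zero-argument coroutine function standing in for a tool invocation.
    """
    started = time.monotonic()
    try:
        result = await asyncio.wait_for(call(), timeout=timeout_s)
        outcome = {"tool": tool_name, "ok": True, "result": result}
    except asyncio.TimeoutError:
        # The stalled call is cancelled by wait_for; the caller gets an error it can act on.
        outcome = {
            "tool": tool_name,
            "ok": False,
            "error": f"timeout after {timeout_s}s",
        }
    outcome["elapsed_s"] = time.monotonic() - started
    return outcome


async def _fast_tool():
    return "snapshot"


async def _stalled_tool():
    await asyncio.sleep(1.0)  # simulates a step that never finishes in time
    return "never returned"


fast = asyncio.run(call_with_watchdog("mcp__playwright__click", _fast_tool, 0.5))
stalled = asyncio.run(
    call_with_watchdog("mcp__macos-use__click_and_traverse", _stalled_tool, 0.05)
)
```

Returning an error value rather than raising keeps the agent loop simple: every tool call yields a result the model can read, whether the step succeeded or timed out.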
How a cross-app run actually routes
A single chat turn can cross engines several times. The model does not announce the switch. You see it in the tool-call log.
You type the task in chat
"Open the latest Stripe invoice PDF in Chrome, save it to Desktop, then attach it to a reply to the last Mail thread from finance." Natural language, no step markers.
Model picks mcp__playwright__navigate
Chrome is a web target. The model calls the Playwright MCP tool. Same primitive Selenium would use. The tool returns the page snapshot.
Model picks mcp__playwright__click
Clicks the "Download invoice" button. Playwright's download handling saves the file to the configured download directory. No native dialog handling required here.
Workflow leaves the browser
Next step is Mail. The browser engine cannot help. The model picks mcp__macos-use__open_application_and_traverse with element "Mail".
Model picks mcp__macos-use__click_and_traverse
Clicks the finance thread in the Mail sidebar by AXRole=AXRow, AXTitle="Finance Team". Gets back the reading pane tree.
Model picks mcp__macos-use__press_key_and_traverse
Sends Cmd+R to start a reply, then uses type_and_traverse to write the body, then click_and_traverse on the attach button. All via AX, all in the same session.
Log shows both engines interleaved
Tail of /tmp/fazm-dev.log shows a mix of mcp__playwright__* and mcp__macos-use__* lines in a single conversation. One binary, two engines, one watchdog.
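The walkthrough above can be written down as a trace and collapsed into per-engine runs. The tool names are the ones listed in the steps; the helper names engine_of and engine_segments are hypothetical, and the mcp__<engine>__<tool> title shape is assumed from those names.

```python
from itertools import groupby

# Tool calls named in the walkthrough, in order.
TRACE = [
    "mcp__playwright__navigate",
    "mcp__playwright__click",
    "mcp__macos-use__open_application_and_traverse",
    "mcp__macos-use__click_and_traverse",
    "mcp__macos-use__press_key_and_traverse",
    "mcp__macos-use__type_and_traverse",
    "mcp__macos-use__click_and_traverse",
]


def engine_of(tool: str) -> str:
    """Extract "<engine>" from "mcp__<engine>__<tool>"."""
    return tool.split("__", 2)[1]


def engine_segments(trace):
    """Collapse consecutive calls on the same engine into (engine, call_count) runs."""
    return [(engine, len(list(calls))) for engine, calls in groupby(trace, key=engine_of)]


segments = engine_segments(TRACE)
print(segments)  # [('playwright', 2), ('macos-use', 5)]
```

Each boundary between runs is a point where the workflow left one engine for the other, so the number of segments minus one is the number of engine switches in the turn.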
The two engines, in sequence
A single chat turn, routed across both MCPs
The six native tools, in the same format WebDriver fans already like
The native engine exposes exactly six tool calls. Every one of them performs an action AND returns the refreshed accessibility tree in the same round trip. This is the single most important design decision in the native side, and the one that makes it feel like WebDriver without the stale-element problem.
open_application_and_traverse
Launches a macOS app by name and returns the full AX tree for its frontmost window. The parallel of driver.get() for native apps.
click_and_traverse
Clicks an element located by AXRole + AXTitle + AXValue and returns the refreshed tree in the same response. No stale references to babysit.
type_and_traverse
Focuses a text field, types a string, and returns the tree. Parallel of element.send_keys() for any app, not just the browser.
press_key_and_traverse
Sends a key or key combo (Cmd+S, Return, Escape). Handles the macOS Save dialog Selenium cannot see.
scroll_and_traverse
Scrolls an element and returns the new tree. Works on any scrollable AXRole, not just document body.
refresh_traversal
Returns the AX tree without performing an action. Use when you need to re-read state before deciding the next step.
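The act-and-return-the-tree pattern is easy to see in a toy model. The sketch below is not the macos-use API: AXNode, find, and ToyApp are hypothetical stand-ins, and the only things borrowed from the list above are the two method names and the idea of locating elements by role and title.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AXNode:
    role: str
    title: str = ""
    children: List["AXNode"] = field(default_factory=list)


def find(node: AXNode, role: str, title: str) -> Optional[AXNode]:
    """Depth-first search for the first node matching role and title."""
    if node.role == role and node.title == title:
        return node
    for child in node.children:
        hit = find(child, role, title)
        if hit is not None:
            return hit
    return None


class ToyApp:
    """Stand-in for a native app: clicking "Reply" opens a compose text area."""

    def __init__(self):
        self._compose_open = False

    def _snapshot(self) -> AXNode:
        # Rebuild the tree from current state on every call, so it is never stale.
        children = [AXNode("AXButton", "Reply")]
        if self._compose_open:
            children.append(AXNode("AXTextArea", "Message Body"))
        return AXNode("AXWindow", "Mail", children=children)

    def refresh_traversal(self) -> AXNode:
        """Read the tree without acting."""
        return self._snapshot()

    def click_and_traverse(self, role: str, title: str) -> AXNode:
        """Perform the click, then return the post-action tree in the same call."""
        if find(self._snapshot(), role, title) is None:
            raise LookupError(f"no {role} titled {title!r}")
        if (role, title) == ("AXButton", "Reply"):
            self._compose_open = True
        return self._snapshot()


app = ToyApp()
before = app.refresh_traversal()
after = app.click_and_traverse("AXButton", "Reply")
```

The caller never holds a reference across an action: the text area that appears after the click is found in the tree that the click itself returned.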
Apps the native engine reaches that the browser engine cannot
Selenium vs. a dual-engine Mac agent
| Feature | Selenium | Fazm |
|---|---|---|
| Scope | HTML document inside a browser tab | Any Mac app, including the browser |
| Element model | DOM nodes, CSS selectors, XPath | AXUIElement tree, role + title + value |
| Cross-app step (e.g. Chrome → Finder → Mail) | Stalls at the first native dialog | Model routes to macos-use, keeps going |
| Stale element handling | Explicit waits, WebDriverWait.until(...) | Every action tool is act_and_traverse, returns fresh tree |
| Install | Language SDK + chromedriver + glue code | Signed .app bundle, Accessibility permission once |
| Who writes the script | A developer, in Python/Java/C#/JS | The user, in plain English |
| Cross-browser matrix in CI | First-class, this is what Selenium is for | Not the goal; this is a workflow agent, not a test runner |
Selenium still wins when you want a cross-browser test matrix in CI. This comparison is for the other job: one-shot, cross-app, user-facing workflows on a single Mac.
What a log line from each engine looks like
The per-tool watchdog timestamps every call. During a mixed workflow the tail of /tmp/fazm-dev.log shows both engines interleaved. This is the single clearest signal that the routing is happening per step, not per session.
The part Selenium does best, the part it does not
The spec-compliant, cross-browser, CI-friendly surface area of Selenium is one of the most battle-tested pieces of software on the planet. If you are testing a web app on four browsers and two operating systems, Selenium is not the thing you should replace, and Fazm is not trying to. What Fazm does is recognize that the same structured-tree automation idea WebDriver invented for the DOM is sitting, unused, in the macOS accessibility API, and wire both into one agent so a chat turn can touch both worlds without a glue script.
The model calls mcp__playwright__click for web targets and mcp__macos-use__click_and_traverse for native targets. Your job stops at describing the task. The routing is the product.
“I stopped writing the glue script. The hand-off between my browser automation and the rest of the Mac used to be the flakiest part of my workflow. Now it is one chat turn.”
Frequently asked questions
What is Selenium web browser automation?
Selenium is a W3C WebDriver implementation that automates a real web browser by talking to a browser-specific driver process, which manipulates the HTML DOM and dispatches real input events. It drives Chrome, Firefox, Edge, and Safari across Windows, macOS, and Linux. Its scope is anything that lives inside the browser tab: the document, frames, cookies, storage, and browser-level prompts such as JavaScript alerts. File uploads work by sending a path to the file input element, not by driving the native file chooser. Anything drawn outside the browser window is out of scope by design.
Why is Selenium only half of most workflows on a Mac?
Because real tasks cross apps. You open a web report, export a CSV, drag it into Finder, attach it to a Mail message, and paste a summary into Slack. Selenium handles step one. Steps two through five are Finder, Mail, Slack, and the macOS menu bar, none of which have a DOM. You either script each app separately or you use an automation engine that can read the OS's own accessibility tree. The hand-off between the two is where scripts usually break.
What does Fazm do differently from Selenium?
Fazm bundles two automation engines in one Mac app. The first is Playwright MCP, registered in acp-bridge/src/index.ts around line 1050, which does everything Selenium does for the browser. The second is a native binary called mcp-server-macos-use, registered at lines 1056 to 1064 of the same file, which drives any macOS app via the system AXUIElement accessibility API. A model decides per tool call which engine to use. No glue code, no hand-off, no separate test harness.
Where in the Fazm source can I verify both engines are bundled?
Open /Users/matthewdi/fazm/acp-bridge/src/index.ts. Line 47 points playwrightCli at @playwright/mcp. Line 63 points macosUseBinary at Contents/MacOS/mcp-server-macos-use inside the signed app bundle. Lines 1028 to 1064 build the MCP server list that gets passed to the Claude Code ACP runtime: entry 1 is named "playwright", entry 2 is named "macos-use". Line 1266 defines the builtin allowlist: `new Set(["fazm_tools", "playwright", "macos-use", "whatsapp", "google-workspace"])`. Line 2317 shows the route matcher that accepts tool titles starting with either `mcp__playwright` or `mcp__macos-use`.
Does this mean you lose Selenium's cross-browser testing?
No, because Fazm is not a test runner. It is a workflow agent. If your job is to prove a checkout form works in Firefox on Windows and Safari on iOS, you still want Selenium or Playwright in CI with a grid. Fazm targets a different kind of task: the one where a human sits at one Mac and does a thing across six apps once a day. For that task, cross-browser coverage is irrelevant and cross-app coverage is the whole point.
Why does every native tool name end in _and_traverse?
Because accessibility tree references go stale the moment the UI changes. The classic Selenium flakiness, StaleElementReferenceException, has the same root cause on the Mac. Fazm's six native tools (open_application_and_traverse, refresh_traversal, click_and_traverse, type_and_traverse, press_key_and_traverse, scroll_and_traverse) collapse act-then-read into one call. You click, the screen updates, and the response already contains the new tree. Your next click is against current coordinates, not the ones you captured before the click.
How does the model decide between playwright and macos-use for a given step?
At tool-call time the model sees both toolsets in its available tools list, with prefixes `mcp__playwright__` and `mcp__macos-use__`. The step is just natural language ("go to sheets.google.com and paste my clipboard into A1" or "open the PDF in Preview and rotate page 2 clockwise"). The model picks a tool whose description matches the step. Fazm does not hard-code this routing anywhere, which is why a single chat can start in Chrome and end in Preview without you saying "switch engines".
Can I still see what tool ran if I want to debug?
Yes. Every tool call appears in the Fazm log with its MCP prefix. A tail of /tmp/fazm-dev.log during a workflow looks like a mix of mcp__playwright__navigate, mcp__playwright__click, mcp__macos-use__open_application_and_traverse, and mcp__macos-use__click_and_traverse lines. The per-tool watchdog logs the duration and result. This is the same observability shape you would write by hand around a Selenium + AppleScript pipeline, except it ships by default.
Is this a library I integrate into my code, or an app I install?
An app. Fazm is a signed .app bundle. You install it, grant Accessibility and Screen Recording permissions once in System Settings, and talk to it in chat. The MCP servers, the bridge, the tool routing, and the watchdog are implementation details. Selenium is a library you import; Fazm is a product where those same ideas are wrapped so a non-developer can use them.
When should I still reach for Selenium specifically?
When your task is a repeatable, assertable test of a web app, runs in CI, needs to pass on four browsers, and lives in a repo next to the app's code. Selenium is the right answer there, it has 15 years of tooling around it, and it is going to stay the right answer. This page is about everything else: the one-shot, cross-app, ad-hoc workflow where a test harness is overkill and a chat-driven agent with both engines is the right shape.