What WebDriver covers, and what begins on the other side of the window border

Selenium browser automation, and where it stops

Selenium drives a browser through the W3C WebDriver protocol. It does that well, and it will keep doing it well. This guide is about the line WebDriver draws around itself: the Chrome window border. Everything inside that rectangle is in scope. Finder, native Mail, the macOS menu bar, a system permission sheet, a Slack app notification, and a file-save dialog are not. Here is what that boundary actually looks like in code, and what lives on the other side of it.

Matthew Diakonov, Fazm maintainer

Published April 20, 2026 · 10 min read

Reviewed by the Fazm engineering team

Six accessibility tools, compiled native binary

Every tool returns the fresh AX tree in the same call

Works on Finder, Mail, Slack, System Settings

Not a Selenium replacement, a complement

The browser is an island

WebDriver is the ferry. It only docks at one port.

Selenium drives the browser tab

The OS around the tab is out of scope

macOS has a parallel tree for every app

Read it with AXUIElement, not pixels

Same idea as the DOM, wider surface


Selenium in one paragraph, on purpose

Selenium is a W3C WebDriver implementation. You write code in Python, Java, JavaScript, C#, Ruby, or a few other languages, and it speaks HTTP to a browser-specific driver (ChromeDriver, GeckoDriver, EdgeDriver, SafariDriver). The driver process translates those commands into real operations on a real running browser. You navigate URLs, find elements by CSS, XPath, or link text, read their text and attributes, click them, type into them, and assert on the resulting state. You handle waits by polling the DOM until a condition is true. Selenium Grid lets you fan that whole thing out across many browsers in parallel for testing at scale. That is the tool. That is all of it.
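To make "speaks HTTP to a browser-specific driver" concrete, here is a minimal sketch of the requests a client library builds for three W3C WebDriver commands. It only constructs the requests and never sends them. The `WireCommand` class and the session and element ids are illustrative placeholders, not part of Selenium's API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WireCommand:
    """One HTTP request a WebDriver client would send to the driver process."""
    method: str
    path: str
    body: dict

    def to_http(self) -> str:
        # Render roughly what goes over the wire (headers omitted for brevity).
        return f"{self.method} {self.path}\n{json.dumps(self.body)}"


def navigate_to(session_id: str, url: str) -> WireCommand:
    # W3C "Navigate To": POST /session/{session id}/url
    return WireCommand("POST", f"/session/{session_id}/url", {"url": url})


def find_element(session_id: str, css: str) -> WireCommand:
    # W3C "Find Element": POST /session/{session id}/element
    return WireCommand(
        "POST",
        f"/session/{session_id}/element",
        {"using": "css selector", "value": css},
    )


def element_click(session_id: str, element_id: str) -> WireCommand:
    # W3C "Element Click": POST /session/{session id}/element/{element id}/click
    return WireCommand(
        "POST", f"/session/{session_id}/element/{element_id}/click", {}
    )


if __name__ == "__main__":
    sid = "example-session"  # placeholder; a real id comes from New Session
    for cmd in (
        navigate_to(sid, "https://mail.example.com/compose"),
        find_element(sid, 'button[aria-label="Send"]'),
        element_click(sid, "example-element"),
    ):
        print(cmd.to_http())
```

Every path in the sketch starts with `/session/{session id}`, and a session is bound to a browser. That is the scope limit expressed at the protocol level: there is no path for a window the browser does not own.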

Everything below is about what this shape, on purpose, cannot do.

The boundary, drawn in code

Here is a composing-an-email workflow. It starts in Gmail on a web page. Selenium handles that part fine. Then Gmail opens a native file picker to attach a PDF, and the flow crosses into macOS itself. The WebDriver spec has nothing to say about a Finder sheet.

compose_email_selenium.py

The same idea, wider surface

WebDriver talks to the browser and gets back a tree of elements. The macOS Accessibility API talks to any running app and gets back the same shape, broader scope.
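The "same shape" claim can be sketched with a toy tree. The following is an illustrative model only: `AXNode`, `walk`, and `find` are hypothetical names, and the tree is hand-built rather than read from the real Accessibility API. The point is that looking up an element by role and title is the same kind of traversal as a DOM query.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class AXNode:
    """Illustrative stand-in for one element of an accessibility tree."""
    role: str                                   # e.g. "AXButton"
    title: str = ""                             # AXTitle
    value: Optional[str] = None                 # AXValue
    frame: tuple[float, float, float, float] = (0, 0, 0, 0)  # x, y, width, height
    children: list["AXNode"] = field(default_factory=list)


def walk(node: AXNode) -> Iterator[AXNode]:
    """Depth-first traversal, the same way one would walk a DOM subtree."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: AXNode, role: str, title: str) -> Optional[AXNode]:
    """Return the first node matching both role and title, or None."""
    return next(
        (n for n in walk(root) if n.role == role and n.title == title), None
    )


# A hand-built tree shaped like a Finder "Open" sheet (hypothetical contents).
sheet = AXNode("AXSheet", "Open", children=[
    AXNode("AXTextField", "Search", value=""),
    AXNode("AXButton", "Cancel", frame=(400, 520, 80, 24)),
    AXNode("AXButton", "Open", frame=(490, 520, 80, 24)),
])

if __name__ == "__main__":
    button = find(sheet, "AXButton", "Open")
    print(button.role, button.title, button.frame)
```

Matching on role and title together matters here: the sheet and its confirm button share the title "Open", and only the role tells them apart.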

One agent, six tools, every app on the Mac

The anchor: one binary, six tools, every action ends in `_and_traverse`

Fazm ships a compiled native binary called mcp-server-macos-use inside the app bundle at Contents/MacOS/mcp-server-macos-use. Its path is resolved at line 63 of acp-bridge/src/index.ts and it is registered as MCP server `macos-use` at lines 1056 to 1064. Five of the six tool names end in _and_traverse; the sixth, refresh_traversal, only re-reads the tree. That suffix is the architectural choice that matters: one call does the action AND returns the updated accessibility tree, so the model's next click is always against current coordinates, never a stale reference.

acp-bridge/src/index.ts

The six tools, in order of a real workflow

macos-use_open_application_and_traverse
macos-use_refresh_traversal
macos-use_click_and_traverse
macos-use_type_and_traverse
macos-use_press_key_and_traverse
macos-use_scroll_and_traverse

open_application_and_traverse

Launch or foreground an app, return its full accessibility tree. The entry point. The equivalent of driver.get(url) but for any process on the machine.

refresh_traversal

Re-read the current window's tree without doing anything. Use it when the UI updated on its own (a network response, a notification, an async render).

click_and_traverse

Click an element by role+title match or by exact AX coordinates. Returns the new tree so the next call sees the new state.

type_and_traverse

Focus an AXTextField or AXTextArea and synthesize keyboard events for the given string. Returns the new tree with the value set.

press_key_and_traverse

Send a raw key chord (Return, Tab, Cmd+W, etc.). Used to submit forms, dismiss sheets, trigger menu shortcuts. Also composes well with click_and_traverse via the pressKey param.

scroll_and_traverse

Scroll an AXScrollArea so a target element comes into view, then return the new tree. Necessary because AX frames outside the viewport are often reported as not-visible.
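The scroll step comes down to rectangle arithmetic. Here is a minimal sketch, assuming frames use a top-left origin with y growing downward. `Rect`, `is_vertically_visible`, and `scroll_delta` are hypothetical helpers for illustration, not functions exposed by the macos-use server.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float

    @property
    def bottom(self) -> float:
        return self.y + self.height


def is_vertically_visible(frame: Rect, viewport: Rect) -> bool:
    """True when the frame lies fully inside the viewport's vertical extent."""
    return frame.y >= viewport.y and frame.bottom <= viewport.bottom


def scroll_delta(frame: Rect, viewport: Rect) -> float:
    """Vertical distance to scroll so the frame becomes fully visible.

    Positive means the target is below the viewport, negative means it is
    above, and zero means no scroll is needed.
    """
    if frame.bottom > viewport.bottom:
        return frame.bottom - viewport.bottom
    if frame.y < viewport.y:
        return frame.y - viewport.y
    return 0.0


if __name__ == "__main__":
    viewport = Rect(0, 100, 800, 600)      # visible part of the scroll area
    target = Rect(20, 900, 200, 30)        # element below the fold
    print(is_vertically_visible(target, viewport), scroll_delta(target, viewport))
```

After scrolling by the computed delta, the tree has to be read again, because every frame in the scroll area has moved. That is why the tool returns the new tree rather than only scrolling.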

Scope, not quality

Selenium and Fazm are not the same thing trying to be better than each other. They are two different tools for two different domains. Here is what actually maps.

Feature	Selenium (WebDriver)	Fazm (macOS-use)
Target surface	HTML documents inside a browser tab	Every running app on macOS
Source of truth	DOM via CDP / WebDriver protocol	AXUIElement tree via Accessibility APIs
Element model	Tag, id, class, CSS selectors, XPath	AXRole, AXTitle, AXValue, AXFrame
File picker dialogs	Only JS-level file input elements	Native Finder sheets are regular AX elements
Menu bar	Out of scope	AXMenuBar > AXMenuItem traversable
System permission sheets	Out of scope	Reachable (with user consent)
Staleness handling	Explicit waits, StaleElementReference errors	Every action returns the fresh tree in the same call
Intended user	Developers writing test code	Consumers chatting in plain English

Selenium is the right tool for cross-browser web testing. This comparison is strictly about where the tool's contract ends.

Same workflow, different floor

Side by side: the Selenium version of the attach-and-send flow that stalls at the native file picker, and the Fazm version that keeps going into Mail.app itself.

Compose + attach + send

# Selenium, Python. Scoped to the browser tab.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://mail.example.com/compose")

# Fill the subject line
subject = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.NAME, "subject"))
)
subject.send_keys("Quarterly report")

# Click the attach button (hypothetical selector for this example)
driver.find_element(By.CSS_SELECTOR, 'button[aria-label="Attach files"]').click()

# ...now the native Finder file picker appears.
# Selenium cannot see it. The test stalls here, before Send is ever clicked.


What `_and_traverse` actually does, on the wire

A real sequence from a Fazm run. Note that the agent never polls. It clicks, it gets the new tree in the response, and the next decision is made against that tree. This is the structural answer to the Selenium `StaleElementReferenceException` problem.

One turn of an agent loop
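A toy model of that loop, as a sketch rather than Fazm's implementation: `FakeApp` is a hypothetical stand-in for a real application, and its `click_and_traverse` method only imitates the contract described above (perform the action, return the tree as it looks afterwards).

```python
from dataclasses import dataclass, field


@dataclass
class FakeApp:
    """Toy app whose UI changes after a click, to show why fresh trees matter."""
    screen: str = "compose"
    version: int = 0
    log: list[str] = field(default_factory=list)

    def tree(self) -> dict:
        # Each snapshot carries a version so a stale read would be detectable.
        buttons = {"compose": ["Attach", "Send"], "picker": ["Cancel", "Open"]}
        return {"version": self.version, "screen": self.screen,
                "buttons": buttons[self.screen]}

    def click_and_traverse(self, title: str) -> dict:
        """Perform the click, then return the tree as it looks afterwards."""
        if title not in self.tree()["buttons"]:
            raise LookupError(f"no button titled {title!r} on {self.screen}")
        self.log.append(title)
        if title == "Attach":
            self.screen = "picker"       # the native sheet replaces the form
        elif title in ("Open", "Cancel"):
            self.screen = "compose"      # the sheet closes again
        self.version += 1
        return self.tree()


def run(app: FakeApp, plan: list[str]) -> dict:
    """Drive the app; every step is checked against the latest tree."""
    tree = app.tree()
    for title in plan:
        if title not in tree["buttons"]:
            raise LookupError(f"{title!r} is not in tree v{tree['version']}")
        tree = app.click_and_traverse(title)  # action and fresh tree in one call
    return tree


if __name__ == "__main__":
    print(run(FakeApp(), ["Attach", "Open", "Send"]))
```

There is no wait or retry anywhere in `run`. The tree used to validate step N+1 is the return value of step N, so the loop never holds a reference older than the last action.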

Why not screenshots

There is a third category: tools that hand the model screenshots and ask it to guess pixel coordinates. Anthropic's computer-use demo, OpenAI's Operator, and a few open-source frameworks do this. It works, and for arbitrary closed-source apps on Windows it is sometimes the only option. On macOS, where accessibility coverage is deep and stable, screenshots are the slow and fragile path. Here is why, in four numbers.

AX tree read (ms) | Screenshot + vision (ms) | AX click accuracy (%) | Pixel click accuracy (%)

Indicative figures drawn from the existing macOS AI agent development guide. Your numbers will vary by hardware, model, and app.

What happens when a tool stalls

A Selenium flake shows up as a `StaleElementReferenceException` or a `TimeoutException`. Fazm has a parallel problem (an app stops responding, an AX query hangs), and the bridge handles it with a per-tool watchdog. The constants are in acp-bridge/src/index.ts lines 77 to 79:

acp-bridge/src/index.ts — tool timeout watchdog

When to still use Selenium

This is not a Selenium-is-dead post. Selenium is excellent at what it does. Pick it when:

Selenium is still the right answer

Cross-browser regression testing on CI. Selenium's WebDriver abstraction is the industry-standard shape for running the same suite on Chrome, Firefox, Edge, and Safari.
Headless browser workloads. You are scraping, rendering, or testing without a display, and Selenium Grid fans that out across many containers cleanly.
Pure web-app E2E tests where the whole flow lives inside one domain and one tab. No Finder, no native dialogs, no Slack popups.
Large existing test suites. Rewriting years of code to chase a marginally faster runner is almost never the right call. Selenium's trajectory into 2026 is fine.
Deterministic CI environments. CI runners rarely have accessibility permissions or a logged-in GUI session, and WebDriver's headless mode is the path of least resistance.

And when to reach for the accessibility-tree tool instead

Signals that you have crossed the browser border

Your flow opens a native file picker

The moment a Finder sheet appears, WebDriver's `send_keys` on a file-input element stops being a clean solution. The sheet has its own AX tree; you want a tool that can walk it.

macOS asks for a permission

Screen Recording, Automation, Accessibility, Microphone. These prompts are drawn by TCC, not the browser. They cannot be dismissed from JavaScript. An AX-based tool can click Allow.

You need to read a notification

A Slack or Mail banner in the top-right corner is not a DOM node. It is an AXWindow owned by the notification daemon. Selenium cannot see it; the AX tree can.

The handoff target is a native app

Drag a Gmail attachment into Notes. Quote-share from Safari Reader into Messages. Anything that crosses apps. The browser is not the destination.

You want end-users, not developers, to run the thing

Selenium is a library. It requires a language runtime, driver binaries, and code you maintain. If the user is meant to trigger the workflow themselves, you want a consumer surface like Fazm.

The three-line summary

1. Selenium automates browsers. That is a precise scope, not a bug. Use it for cross-browser testing and headless web work.
2. The same structural pattern (tree of elements, roles, titles, bounds) exists for every app on macOS via the Accessibility API. You just need a tool that speaks to it.
3. Fazm ships one: a compiled binary called mcp-server-macos-use, six tools that each return the fresh tree, one chat window. Not a replacement for Selenium. A complement to it, for the part of the workflow that is not on a web page.

Frequently asked questions

What is Selenium browser automation, in one sentence?

Selenium is a W3C WebDriver implementation that drives a real browser (Chrome, Firefox, Edge, Safari) by sending commands to a browser-specific driver process, which manipulates the HTML DOM and fires real input events inside the page. It is scoped to what lives inside a browser tab: the document, its frames, cookies, storage, and the dialogs the page itself raises (JavaScript alerts and prompts, HTML file inputs). Everything outside the browser window is out of scope by design.

Where exactly does Selenium stop working?

At the browser window border. Selenium has no concept of Finder, the macOS menu bar, Spotlight, System Settings, Mail, Slack, a native file-save dialog, or the 'Allow Screen Recording' permission sheet that appears after a fresh macOS install. The W3C WebDriver spec defines several dozen endpoints, all scoped to a session and its browsing context. The spec does not describe any non-browser target. This is not a limitation to work around. It is the contract.

Can I use Selenium to click an operating-system alert like 'Chrome wants to access Downloads'?

No. That alert is drawn by the macOS system UI (UserNotifications/TCC), not by Chrome. It is in a different process from the tab Selenium is driving. The WebDriver Accept Alert and Dismiss Alert commands (`driver.switch_to.alert.accept()` and `driver.switch_to.alert.dismiss()` in Python) handle JavaScript dialogs inside the page (alert, confirm, prompt, beforeunload), not OS modals. You need something that can reach across process boundaries, and the macOS primitive for that is the Accessibility API.

What is the actual architectural difference between Selenium and a macOS-native automation tool?

Selenium asks the browser's DevTools/WebDriver protocol for the DOM, then returns a structured tree of elements with roles, text, and bounds. A macOS-native tool asks the system's AXUIElement API for a parallel tree that includes every on-screen UI element in every app: AXButton, AXTextField, AXMenuItem, AXWindow, with roles, titles, values, and frames. The model is the same. The source of truth is different. One is HTML. The other is the running OS.

What is Fazm's equivalent of WebDriver, and where does it live?

Fazm bundles a compiled native binary called `mcp-server-macos-use` at `Contents/MacOS/mcp-server-macos-use` inside the app. Its path is defined at acp-bridge/src/index.ts line 63 (`const macosUseBinary = join(contentsDir, 'MacOS', 'mcp-server-macos-use')`) and it is enrolled as MCP server 'macos-use' at lines 1056 to 1064. It exposes six tools: `macos-use_open_application_and_traverse`, `macos-use_refresh_traversal`, `macos-use_click_and_traverse`, `macos-use_type_and_traverse`, `macos-use_press_key_and_traverse`, and `macos-use_scroll_and_traverse`. The five action tools end in `_and_traverse` because each call performs an action AND returns the fresh accessibility tree in the same round trip; `refresh_traversal` returns the tree without acting.

Why do Fazm's action tools end in `_and_traverse`?

Because element references go stale the instant the UI changes. WebDriver has the same problem (stale element references are a common flakiness source, fixable with explicit waits). Fazm's design collapses act-then-read into one call: you click, the screen updates, you get the new tree back, and your next click is against current coordinates. It eliminates the polling loop you would otherwise write around `WebDriverWait(driver, 10).until(EC.presence_of_element_located(...))` for every action.
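For comparison, the polling loop that an explicit wait performs looks roughly like the sketch below. `wait_until` is a hypothetical stdlib-only helper written for this illustration; it is not Selenium's `WebDriverWait`, though it follows the same idea of re-checking a condition at a fixed interval (Selenium's default poll frequency is 0.5 seconds) until a deadline.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


if __name__ == "__main__":
    attempts = []

    def element_ready():
        # Pretend the element only shows up on the third look at the page.
        attempts.append(1)
        return "subject-field" if len(attempts) >= 3 else None

    print(wait_until(element_ready, timeout=2.0, poll_interval=0.1), len(attempts))
```

Every action in a polled design pays for this loop, plus the choice of timeout and interval. Returning the tree from the action itself removes the loop for changes the action caused; changes that arrive on their own still need a re-read, which is what refresh_traversal is for.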

Is Fazm a developer framework like Selenium, or a consumer app?

A consumer app. You install a signed .app bundle, grant macOS Accessibility permission once in System Settings, and ask it in plain English. The six tools are called by the model, not by you. Selenium is a library you write code against; Fazm is a finished product where the automation engine is an implementation detail the user never touches. The same accessibility-tree pattern is available either way, but the surface area is a chat window, not a WebDriver session.

Can Selenium and a macOS-native tool coexist in one workflow?

Yes, and this is a good pattern. Use Selenium/Playwright for the parts that are web pages, where CSS selectors and network interception are first-class. Hand off to the native tool when the flow crosses into the OS: a Finder dialog, a native Mail attachment, a Slack app notification, a macOS permission sheet. Fazm happens to bundle both under one roof (Playwright MCP for browser, macos-use MCP for native), but conceptually the two are separate primitives for separate domains.

What does a real Fazm tool call look like in the logs?

It shows up as `mcp__macos-use__click_and_traverse` with a title-matched element (e.g. `element: "Send"`) and a small JSON response containing the new accessibility tree. The bridge in acp-bridge/src/index.ts tracks these with a per-tool watchdog: 2-minute timeout for MCP tools (TOOL_TIMEOUT_MCP_MS = 120000), 5-minute timeout for everything else. When a tool exceeds the budget, the bridge synthesizes a completion with a timeout error so the agent can recover rather than hang, which is the failure mode a raw Selenium test typically has to surround with try/finally and explicit waits.

Does this mean Selenium is obsolete?

No. If your task is testing a web app across Chrome, Firefox, Edge, and Safari on CI, Selenium is still the right answer, and it will remain so. The point of this page is narrower: Selenium is scoped to the browser, and many real workflows cross that boundary. When they do, you need accessibility APIs, not pixels and not more browser automation.