Guide, updated April 19, 2026

Browser automation beyond Selenium: attach to the Chrome you already use, then drive every Mac app in the same chat turn.

Every Selenium tutorial teaches you to spawn a fresh chromedriver-controlled Chromium with an empty profile, write CSS selectors, and accept that anything outside the viewport is out of scope. This guide is about the shape of automation those tutorials structurally cannot describe: hooking into the browser the human is already signed into, and driving the rest of the operating system around it in the same session.

Matthew Diakonov
10 min read
Attaches to your running Chrome via CDP
Same session drives native Mac apps
No chromedriver, no venv, no selectors

What top Selenium results all skip

If you search for "browser automation selenium", you will reliably see selenium.dev, BrowserStack, LambdaTest, Guru99, TutorialsPoint, and ZenRows on the first page. Every one of them teaches the same thing: install chromedriver, match it to your Chrome version, pick a language binding, learn locators, learn waits, optionally stand up Grid for parallelism. These are excellent guides for writing deterministic QA test suites.

None of them can answer the two questions most non-QA users actually have: "how do I automate a site I am already logged into?" and "how do I hand the result off to an app that is not a browser?" The answer for Selenium is that you cannot, in both cases, not because the authors forgot to write the chapter but because the WebDriver spec architecturally forbids it.

The first is forbidden because WebDriver mandates a driver-owned browser with a known command path (chromedriver, geckodriver). The second is forbidden because the scope of the WebDriver wire protocol is a browsing context, and there is no primitive for reading a Mail message, opening a Finder window, or sending a WhatsApp message.

The same task in both stacks

Consider a concrete task a small-business owner asked us to automate last month: find the most recent Stripe invoice for a customer, save the PDF to Documents, and send it to the AP contact on WhatsApp. Here is what each stack requires.

Stripe invoice to WhatsApp

# selenium_invoice_download.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

opts = Options()
# Spawn a brand-new Chromium profile. Your cookies, 2FA,
# WebAuthn keys, and Cloudflare trust cookies are NOT here.
opts.add_argument("--user-data-dir=/tmp/selenium-fresh-profile")
opts.add_argument("--no-sandbox")

service = Service("/usr/local/bin/chromedriver")  # version-matched to Chrome
driver = webdriver.Chrome(service=service, options=opts)

driver.get("https://dashboard.stripe.com/login")
# Log in again. Handle 2FA again. Solve Turnstile again.
driver.find_element(By.ID, "email").send_keys("me@example.com")
driver.find_element(By.ID, "password").send_keys("...")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
WebDriverWait(driver, 30).until(EC.url_contains("/dashboard"))

# Find and download the invoice.
driver.find_element(By.CSS_SELECTOR, "a[href*='/invoices/']").click()
driver.find_element(By.XPATH, "//button[contains(.,'Download')]").click()

# Unless you set Chrome's download.default_directory preference, the PDF
# lands in the default download folder, not in ~/Documents. Filing it and
# sending it to the AP contact on WhatsApp is YOUR problem: Selenium has
# no primitive for the WhatsApp step.
driver.quit()

The Selenium script is also incomplete. It still leaves the PDF wherever Chrome's download settings put it, and the "send it on WhatsApp" step has no Selenium primitive at all. You would write a second automation (maybe AppleScript, maybe pywhatkit, maybe a second VM) and a JSON payload bus between them.
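
The part of that hand-off that plain Python can cover is the file move. Here is a minimal sketch, assuming the invoice landed as a PDF in some download directory. The helper name `move_latest_pdf` and both directory arguments are hypothetical, and the WhatsApp step remains out of reach from here.

```python
import shutil
from pathlib import Path


def move_latest_pdf(download_dir: Path, dest_dir: Path) -> Path:
    """Move the most recently modified PDF from download_dir into dest_dir."""
    # Sort by modification time so the newest download is last.
    pdfs = sorted(download_dir.glob("*.pdf"), key=lambda p: p.stat().st_mtime)
    if not pdfs:
        raise FileNotFoundError(f"no PDF found in {download_dir}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / pdfs[-1].name
    shutil.move(str(pdfs[-1]), str(target))
    return target
```

A call such as `move_latest_pdf(Path.home() / "Downloads", Path.home() / "Documents")` would be a third moving part, alongside the Selenium script and whatever sends the message.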

The anchor fact

How the attach actually happens, in four lines

You can clone the Fazm repo and grep for PLAYWRIGHT_USE_EXTENSION yourself. The relevant code lives in acp-bridge/src/index.ts. The branch that flips the behavior is four lines long, lines 1028 to 1031. The register call for the native macOS accessibility binary follows in the same function, at lines 1057 to 1063. That is the entire story.

acp-bridge/src/index.ts (excerpt, lines 1027-1064)

The Playwright MCP server is pushed first. The macos-use server is pushed second onto the same servers array. That shared array is what the Agent Client Protocol session opens as one tool namespace, which is why a single LLM turn can call browser_click against a Stripe page and click_and_traverse against Finder in the same response.
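
The real code is TypeScript and is not reproduced here. To make the shape of the logic concrete, here is a Python sketch of the same three decisions: the opt-in flag, the playwright registration, and the conditional macos-use registration. The function signature and the `node_path`, `playwright_cli`, and `macos_use_binary` parameters are assumptions made for illustration. Only the environment variable, the `--extension` flag, and the two server names come from this guide.

```python
from pathlib import Path


def build_mcp_servers(env: dict, node_path: str, playwright_cli: str,
                      macos_use_binary: str) -> list:
    """Return MCP server specs in registration order: playwright, then macos-use."""
    servers = []

    playwright_args = [playwright_cli]
    # The four-line branch: attach to the running Chrome only on explicit opt-in.
    if env.get("PLAYWRIGHT_USE_EXTENSION") == "true":
        playwright_args.append("--extension")

    # Playwright MCP is registered first.
    servers.append({"name": "playwright", "command": node_path, "args": playwright_args})

    # The native accessibility server is registered only if its binary is on disk.
    if Path(macos_use_binary).exists():
        servers.append({"name": "macos-use", "command": macos_use_binary, "args": []})

    return servers
```

Because both entries end up in one list, whatever opens that list as a session sees both servers' tools side by side.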

What spawning looks like on a cold start

When Fazm boots a chat session, acp-bridge runs its buildMcpServers function and prints one line per server it registers. The whole sequence takes under a second.

fazm bridge boot

One LLM turn, two surfaces

[Diagram: the user prompt, system prompt, and session memory feed the model (Claude), which calls the playwright MCP, the macos-use MCP, and the whatsapp / gmail MCPs.]

Why the attach matters in concrete numbers

The fresh-profile penalty Selenium pays is not just philosophical. It shows up as concrete extra work on every single run.

  • 0 logged-in sites Selenium starts with
  • 4-week Chrome auto-update cycle you must track
  • 5 MCP servers Fazm registers at session start
  • 1 tool-call loop covering browser + desktop

The interesting number is the last one. Selenium cannot share a session across a browser step and a native step, because there is no native-side session to share. Fazm has exactly one.
5 MCPs

Counting unique MCP server names Fazm registers in a default session: fazm_tools, playwright, macos-use, whatsapp, google-workspace.

acp-bridge/src/index.ts line 1266, BUILTIN_MCP_NAMES

How Fazm hooks into your running Chrome

1. You install Fazm and run the Playwright MCP Bridge Chrome extension

The extension advertises itself to the app over a local socket. This is the only plumbing the user sees. From this point forward, your running Chrome is reachable over CDP.

2. acp-bridge reads PLAYWRIGHT_USE_EXTENSION at session start

The four-line branch at acp-bridge/src/index.ts lines 1028-1031 pushes --extension onto the Playwright MCP spawn args. No --extension, no attach. With --extension, Playwright speaks to the already-running Chrome instead of launching a fresh Chromium.

3. Playwright MCP is spawned as an MCP server named "playwright"

servers.push({ name: "playwright", command: process.execPath, args: playwrightArgs, env: playwrightEnv }) at index.ts line 1049. Every tool the server exposes (browser_navigate, browser_click, browser_snapshot) is now callable by the model via the ACP-wrapped MCP protocol.

4. mcp-server-macos-use is registered next to it

Lines 1057-1063: if the native binary exists on disk, push { name: "macos-use", command: macosUseBinary } onto the same servers array. Both MCPs are now in the same tool namespace for the current chat session.

5. ChatPrompts.swift tells the model which surface owns which task

Lines 56-65 of the system prompt in ChatPrompts.swift encode the routing policy: web pages to playwright, desktop apps to macos-use, WhatsApp to the whatsapp MCP, screenshots always via capture_screenshot (never browser_take_screenshot). The model picks per call, not per workflow.

6. Image and snapshot bloat is stripped before it reaches the model

Line 1033 passes --output-mode file --image-responses omit --output-dir /tmp/playwright-mcp. Screenshots and snapshots are written to disk, not inlined as base64, so a 500 KB screenshot does not blow the context window of whatever model is driving the turn.
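
The routing policy in step 5 is prose in a system prompt, not code, and the model applies it on each tool call. Purely as an illustration, the same policy can be written as a lookup table. The `Surface` categories, the `ROUTES` table, and the `route` helper are hypothetical names. The server and tool names are the ones given in the steps above.

```python
from enum import Enum


class Surface(Enum):
    WEB_PAGE = "web page"
    DESKTOP_APP = "desktop app"
    WHATSAPP = "whatsapp"
    SCREENSHOT = "screenshot"


# Which MCP server (or tool) owns each kind of task, per the policy in step 5.
ROUTES = {
    Surface.WEB_PAGE: "playwright",
    Surface.DESKTOP_APP: "macos-use",
    Surface.WHATSAPP: "whatsapp",
    Surface.SCREENSHOT: "capture_screenshot",  # never browser_take_screenshot
}


def route(steps):
    """Map each (surface, description) step to the server or tool that handles it."""
    return [(ROUTES[surface], description) for surface, description in steps]


# The invoice task from earlier in this guide, as one interleaved plan.
plan = route([
    (Surface.WEB_PAGE, "open the latest Stripe invoice and download the PDF"),
    (Surface.DESKTOP_APP, "move the PDF into Documents in Finder"),
    (Surface.WHATSAPP, "send the PDF to the AP contact"),
])
```

The routing is per step rather than per workflow, which is why a single plan can mix all three servers.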

Six things Selenium cannot reach on a Mac

These are not niche edge cases. They are the steps that show up in almost any real user workflow, five minutes after the Selenium part finishes.

Out of scope for WebDriver

  • Read or send a WhatsApp message in the native macOS WhatsApp app
  • Open a Keynote deck, edit a slide, export to PDF
  • Drive System Settings to grant a permission or add a network profile
  • Read your Mail.app inbox, open a message, save an attachment to Finder
  • Use your already-logged-in Chrome session (WebDriver spec forbids it)
  • Share a session across a browser step and a native macOS step

Already in scope for Fazm

  • Drive any macOS app through the accessibility tree with click_and_traverse
  • Send a WhatsApp message via the whatsapp MCP (search, open, verify, send)
  • Attach to your real running Chrome with --extension and keep every cookie
  • Interleave browser and desktop tool calls in one LLM turn
  • Query Gmail, Drive, Calendar, Sheets directly via google-workspace MCP
  • Run from a consumer-signed .dmg, no Docker, no venv, no chromedriver

What you actually ship with Fazm

Fazm is not a developer framework. It is a signed, notarized Mac app. The MCP plumbing is inside; the user surface is a chat.

No chromedriver install

Playwright MCP speaks CDP to Chrome directly, so there is no third-party driver binary to version-match. When Chrome auto-updates, nothing in Fazm needs to be rebuilt.

Real-profile attach

PLAYWRIGHT_USE_EXTENSION=true makes acp-bridge push --extension onto the MCP spawn args. Playwright hooks your running Chrome via the Chrome extension bridge, inheriting every cookie and passkey.

Overlay shows control

browser-overlay-init-page.ts registers addInitScript on every page the MCP opens. You always see a visual badge indicating the tab is currently being driven by Fazm.

Tree, not pixels

browser_snapshot returns an accessibility YAML tree with stable [ref=eN] tokens. The model refers to elements by ref, not by guessed pixel coordinates or brittle CSS selectors.

Tabs already open

The system prompt at ChatPrompts.swift line 63 tells the agent to list existing tabs with browser_tabs first and reuse one if the domain matches, so it never strands you on a fresh tab when you already had Stripe open.

macos-use on the same bus

The native binary at Fazm.app/Contents/MacOS/mcp-server-macos-use is registered in the same servers array at index.ts line 1058. The LLM can call browser_click and click_and_traverse in the same turn.

Selenium vs Fazm, point by point

Which browser the automation drives
  • Selenium: A brand-new, driver-owned Chromium spawned by chromedriver / geckodriver. The WebDriver protocol (W3C spec) mandates it. No way to hand Selenium an already-running browser.
  • Fazm: The Chrome you are already using. acp-bridge passes --extension to the Playwright MCP binary at index.ts line 1030 when PLAYWRIGHT_USE_EXTENSION is set. Playwright MCP then attaches over the Chrome DevTools Protocol to your real running instance.

Login state on turn one
  • Selenium: Empty. Cookies, 2FA device trust, WebAuthn passkeys, SSO sessions, and Cloudflare/Turnstile trust all missing. Every script starts by re-logging in, which breaks on TOTP, WebAuthn, and anti-bot gates.
  • Fazm: Whatever you are already logged into. If Stripe knows this device, the dashboard opens on turn one. If you have Chrome autofill, passkeys, or a signed-in Google account, those are usable immediately.

What it can reach when the browser is not enough
  • Selenium: Nothing. The WebDriver scope is the browser viewport. If the task needs Mail, Finder, Settings, WhatsApp Desktop, Messages, Numbers, or a sandboxed Save dialog, you are writing a second automation in a second tool.
  • Fazm: Every native Mac app. The same buildMcpServers function registers mcp-server-macos-use at index.ts lines 1057-1063 onto the same servers array, and ChatPrompts.swift lines 56-65 routes desktop actions to it. One chat turn, both surfaces.

How the agent finds elements
  • Selenium: CSS selectors, XPath, IDs. You write them by hand, you maintain them by hand. When the site ships a CSS redesign, the selectors break silently and your script returns stale data with no error.
  • Fazm: Playwright accessibility snapshots. browser_snapshot returns a YAML tree with [ref=eN] tokens that are re-generated on every call. macos-use returns the accessibility hierarchy as text with x/y/w/h. Both are far more resilient to redesigns.

Version-matching dance
  • Selenium: chromedriver must match your Chrome major version. Chrome auto-updates on a 4-week cycle. Every update risks 'session not created: This version of ChromeDriver only supports Chrome version X' in CI.
  • Fazm: None. Playwright MCP attaches over CDP; Chrome's own DevTools Protocol is the contract, not a third-party binary that must stay in lockstep with Chrome's minor versions.

What you hand to a non-engineer
  • Selenium: A Python or Java project. They need chromedriver installed, a venv, webdriver-manager, a selector strategy, wait conditions, and a headful vs headless decision. It is not a consumer tool.
  • Fazm: A signed, notarized .dmg. Open Fazm, type into the floating bar. There is no code file, no virtualenv, no driver binary, and no grid to stand up.

When a failure happens mid-run
  • Selenium: StaleElementReferenceException, NoSuchElementException, TimeoutException. Usually resolved by adding another explicit wait or re-finding the element. Every brittle spot earns its own try/except.
  • Fazm: Because snapshots are re-taken on every tool call and refs are regenerated, stale-element bugs are architecturally rare. The model sees the current tree, not a handle to a pre-render DOM node.
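
The version-matching dance is the one item in this comparison that a Selenium user can at least guard against before CI breaks. Here is a minimal sketch of such a guard. It assumes you have already captured the version strings that Chrome and chromedriver print when run with --version. The helper names and the sample version numbers are illustrative, not taken from any real release.

```python
import re


def major_version(version_output: str) -> int:
    """Extract the major version from output such as 'Google Chrome 124.0.0.0'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and chromedriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


# Illustrative strings only: Chrome has auto-updated, the driver has not.
chrome = "Google Chrome 125.0.0.0"
driver = "ChromeDriver 124.0.0.0 (illustrative-build-hash)"
needs_new_driver = not versions_match(chrome, driver)
```

A check like this turns the "session not created" failure into an earlier, clearer one. It does not remove the need to update the driver.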

Selenium is a great QA framework. Fazm is a consumer agent. Don't replace your CI suite; replace the part of your day where you were about to write a selector by hand.

When Selenium is still the right choice

This guide is not an argument that Selenium is obsolete. It is an argument that "browser automation Selenium" is the wrong frame for a growing number of real tasks. Keep Selenium when:

  • You are writing a regression test suite that must pass deterministically against a known staging build, for a single web app, in CI, across Chrome, Firefox, and Safari.
  • You need the W3C WebDriver protocol specifically, for example for contractual compliance or because your vendor only supports WebDriver-based automation.
  • You must run headless in a Linux container with no GUI, which Fazm explicitly does not target because it is a Mac consumer app.
  • You want to run hundreds of parallel browser sessions via Selenium Grid against a farm of browsers in the cloud.

For everything else, especially anything that involves your own logged-in session or anything outside a browser viewport, the Selenium shape is fighting you.

No chromedriver · Real logged-in Chrome · No CSS selectors · Accessibility snapshot refs · Drives native macOS apps · Bundled WhatsApp MCP · Google Workspace MCP · One chat turn, two surfaces

Have a Selenium job that keeps spilling out of the browser?

Book 20 minutes. We'll look at the workflow together and show you what it looks like when the browser step and the Mac step live in the same chat turn.


FAQ

If Selenium is the industry standard, why would I not just learn it?

Selenium is an excellent framework for QA engineers writing deterministic regression tests against a staging web app. It is the wrong shape for everyday consumer automation, where you want to drive a site you are already logged into, then hand the result off to a native Mac app. Selenium's spec mandates a fresh WebDriver-spawned browser with an empty profile, and its scope ends at the browser viewport. Fazm is the opposite: it attaches to your real Chrome over CDP via Playwright MCP's --extension flag, and the same chat session can drive Finder, Mail, Settings, WhatsApp, and any other Mac app through a separate accessibility-tree MCP server bundled in the app.

Concretely, where in the code does Fazm attach to my running Chrome instead of spawning a new one?

acp-bridge/src/index.ts, lines 1028-1031. A four-line if block: if (process.env.PLAYWRIGHT_USE_EXTENSION === "true") { playwrightArgs.push("--extension"); }. That single flag is what makes the Playwright MCP binary hook your running Chrome via the extension bridge over CDP instead of launching a fresh headless Chromium. The spawn then happens at line 1049 as servers.push({ name: "playwright", command: process.execPath, args: playwrightArgs, env: playwrightEnv }). You can grep for PLAYWRIGHT_USE_EXTENSION in the Fazm repo to see it yourself.

Does that mean Fazm can't work without Chrome already open?

No. PLAYWRIGHT_USE_EXTENSION=true is the preferred path because it keeps your logged-in state. If the flag is not set, acp-bridge spawns a normal Playwright-managed Chromium instead. Extension mode is an opt-in optimization, not a hard dependency. The rest of the stack (macos-use, whatsapp, google-workspace MCPs) is completely independent of how the browser is attached.

Selenium has Selenium Grid for parallelism. What is the Fazm equivalent?

They solve different problems. Grid is for running many test runs against many browsers in parallel, typically in CI. Fazm is not a test runner; it is a consumer agent that drives one user's Mac one session at a time. For parallel multi-tab work inside the same session, Playwright MCP exposes browser_tabs (list, select, close) and the system prompt at ChatPrompts.swift line 63 instructs the agent to reuse existing tabs when the domain matches.
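
The tab-reuse rule is an instruction to the model, not a function in the codebase. As a sketch of the decision it describes, the helper below picks an already-open tab for a target URL. The name `pick_existing_tab` is hypothetical, and treating "the domain matches" as exact hostname equality is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse


def pick_existing_tab(open_tab_urls: list, target_url: str) -> Optional[int]:
    """Return the index of the first open tab on the target's host, or None."""
    target_host = urlparse(target_url).hostname
    if target_host is None:
        return None
    for index, url in enumerate(open_tab_urls):
        if urlparse(url).hostname == target_host:
            return index
    return None


tabs = [
    "https://mail.google.com/mail/u/0/",
    "https://dashboard.stripe.com/invoices",
]
reuse = pick_existing_tab(tabs, "https://dashboard.stripe.com/login")
```

Here `reuse` is 1, so the agent would select the existing Stripe tab. A result of `None` would mean opening a new tab.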

How does clicking work without CSS selectors or XPath?

browser_snapshot returns a YAML accessibility tree with stable [ref=eN] element tokens. The model asks for a snapshot, reads the tree, finds the element by role and accessible name, and calls browser_click with the ref. On the native side, macos-use_refresh_traversal returns the macOS accessibility hierarchy as text with role, label, and x/y/w/h per element, and click_and_traverse takes either a text label or explicit coordinates. Neither path uses CSS selectors, which means a CSS redesign does not silently break the automation.
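
To show what "find the element by role and accessible name" amounts to, here is a small sketch that looks up a ref in a snapshot-like text. The sample snapshot is hand-written for illustration and the real output may be laid out differently. The `find_ref` helper and its regular expression are hypothetical. In Fazm the model reads the tree itself, with no parser in between.

```python
import re
from typing import Optional

# Hand-written sample in the spirit of an accessibility snapshot (illustrative only).
SAMPLE_SNAPSHOT = """\
- main [ref=e2]:
  - heading "Invoices" [level=1] [ref=e3]
  - link "INV-0042" [ref=e7]
  - button "Download" [ref=e12]
"""

# role, optional quoted accessible name, then the ref token somewhere after it.
LINE = re.compile(r'^\s*-\s+(?P<role>\w+)(?:\s+"(?P<name>[^"]*)")?.*?\[ref=(?P<ref>e\d+)\]')


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return the ref of the first element with this role and accessible name."""
    for line in snapshot.splitlines():
        m = LINE.match(line)
        if m and m.group("role") == role and m.group("name") == name:
            return m.group("ref")
    return None


download_ref = find_ref(SAMPLE_SNAPSHOT, "button", "Download")
```

The lookup key is what the element is and what it is called, not where it sits in the DOM, so a restyled page with the same button still resolves.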

What about anti-bot detection? Selenium is famously fingerprinted by Cloudflare and Akamai.

Selenium injects navigator.webdriver=true and loads chromedriver-specific CDP domains that anti-bot vendors fingerprint aggressively. Fazm's --extension attach goes through your real Chrome instance with its real extensions, your real user-agent, and your real device trust cookies, so to a Turnstile or hCaptcha challenge the session looks like your ordinary browsing rather than a freshly spawned automation profile. You still see CAPTCHAs when a site decides to show you one, but you are not penalized for being Selenium in particular.

Does Fazm send my browsing data or accessibility tree off-device?

Only what the active LLM needs to complete the current turn. The overlay init page (browser-overlay-init-page.ts) is injected locally into the page. Snapshots and screenshots are written to /tmp/playwright-mcp with --output-mode file and stripped of inline base64 via --image-responses omit before anything is sent to the model, so a 500 KB screenshot never leaves the machine as a payload. The Chrome extension bridge communicates over a local socket to the app.
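
The cost of inlining is easy to check with arithmetic: base64 emits 4 characters for every 3 input bytes. The sketch below uses a placeholder payload of zeros rather than a real screenshot. It makes no attempt to estimate tokens, which depend on the model.

```python
import base64

screenshot_bytes = 500_000           # a 500 KB image, as in the example above
raw = bytes(screenshot_bytes)        # placeholder payload of zeros, not a real PNG
inline = base64.b64encode(raw)       # what an inlined image response would carry

# Ratio of encoded size to raw size: roughly 4/3.
overhead = len(inline) / len(raw)
```

For this payload `len(inline)` is 666,668 characters, about a third more than the file on disk, and all of it would ride along in the response if the image were inlined.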

fazm. AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
