Every business process automation software reaches your apps through one of four input methods, and no listicle tells you which

Open any ranked list for this keyword. It will sort BPA tools by seats, by connectors, by industry, by whether they have a recorder. It will not tell you the single fact that decides what each one can physically touch: the input method the software uses to see the screen and click the button. There are four. Published API. Browser DOM. Screen pixels with OCR. Native accessibility tree. This page is that split, with the exact Swift calls that put Fazm in the fourth category.

Matthew Diakonov
12 min read
4.8 from 420+
  • Four input methods, one classification
  • Real AX API calls, not screenshots
  • Works with any native Mac app
  • Open source installer, MIT licensed

Zapier (API) · Make (API) · Workato (API) · n8n (API) · Browse AI (DOM) · Axiom.ai (DOM) · Browserbase (DOM) · UiPath (pixels) · Automation Anywhere (pixels) · Blue Prism (pixels) · Power Automate Desktop (pixels) · Fazm (accessibility tree)
4 input methods, 1 honest classification

Fazm calls AXUIElementCreateApplication(frontApp.processIdentifier) then AXUIElementCopyAttributeValue(appElement, kAXFocusedWindowAttribute, &focusedWindow) at AppState.swift lines 439 and 441. Those are the same Apple Accessibility framework functions VoiceOver uses to narrate native Mac apps to blind users. The rest of the automation is built on that primitive.

AppState.swift in the Fazm desktop repo, verified 2026-04-21

What every BPA listicle is still comparing

The top five results for "business process automation softwares" are a tier chart, then a feature grid, then a pricing table, then an industry matrix. Useful if you already know the input method you want. Actively misleading if you do not, because most of the tools in those grids cannot do the thing a buyer assumes they can.

A team that picks a BPA tool off one of those grids and finds out later that the product cannot see a native Mac app has already spent a quarter on a failed rollout. A team that picks the input method first and the feature grid second lands on the right category in about five minutes. The rest of this page is written for the second team.

Method 1: published APIs

The software calls documented HTTP endpoints. Zapier, Make, Workato, n8n, Tray, Pipedream. Cheapest, fastest, most durable. Can only reach what has an API. Every hop in the workflow must expose one.

Method 2: browser DOM

The software loads a page in a headless or attached Chromium and walks the DOM. Browse AI, Axiom.ai, Browserbase. Can reach anything inside a browser tab. Cannot reach native apps. Breaks on React rerenders if selectors are brittle.

Method 3: screen pixels and OCR

The software takes a screenshot and uses vision or template matching to find a button, then synthesizes a mouse event. UiPath image mode, Automation Anywhere, Blue Prism, most agent frameworks. Universal in theory. Fragile, slow, expensive in practice.

Method 4: native accessibility tree

The software reads the OS-provided accessibility tree directly. Every element has a role, a title, a description, live coordinates. Semantic. Fast. Robust across UI refreshes. macOS is the strongest host. Fazm sits here.

Where each input method goes, on your actual Mac

API-only BPA → Stripe, Notion API → your workflow
DOM-only BPA → web apps in a tab → your workflow
Pixel-OCR BPA → any pixel, label guessed → your workflow
Accessibility BPA → any native Mac app → your workflow

All four terminate at "your workflow." Only one reaches into native Mac apps with semantic labels intact.

The anchor fact: the exact Swift that puts Fazm in Method 4

It is one function call to ask the macOS Accessibility framework for the front app's element handle, and a second call to pull its focused window. Everything downstream in Fazm is built on those two lines. If you can read Swift, you can confirm this yourself by opening Desktop/Sources/AppState.swift in the Fazm desktop repo and scrolling to line 439.

Desktop/Sources/AppState.swift
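A minimal sketch of those two calls, assuming only AppKit and the Accessibility framework. Variable names here are illustrative, not copied from Fazm's source; the real code lives at the path above.

```swift
import AppKit
import ApplicationServices

// Front app: whatever the user is currently looking at.
guard let frontApp = NSWorkspace.shared.frontmostApplication else { exit(1) }

// First call: ask the Accessibility framework for the app's element handle.
let appElement = AXUIElementCreateApplication(frontApp.processIdentifier)

// Second call: pull the focused window element off that handle.
var focusedWindow: CFTypeRef?
let result = AXUIElementCopyAttributeValue(
    appElement,
    kAXFocusedWindowAttribute as CFString,
    &focusedWindow
)

if result == .success, let window = focusedWindow {
    // `window` is an AXUIElement: role, title, children, and live
    // coordinates are all readable from here, no pixels involved.
    print(window)
}
```

Running this requires a macOS session with Accessibility permission granted; it is the same entry point VoiceOver uses.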

Method 3 vs Method 4, side by side

Pixel-OCR (Method 3) and the accessibility tree (Method 4) both claim to work with "any app." They are not equivalent. Screens are an output of the app. The accessibility tree is the app's own semantic representation of its UI, published for the OS.

Same task: click the Save button in a native app

# Typical screenshot-OCR RPA approach
# (UiPath image mode, most agent frameworks today)

# 1. Take a screenshot of the whole screen.
screenshot = capture_screen()

# 2. Run a vision model to find the Save button.
#    The model sees pixels. It has to guess:
#    - is that a button?
#    - what does the label say?
#    - did the UI refresh and move it?
result = vision_model.find_element(
    screenshot,
    description="Save button"
)

# 3. Click at the guessed pixel coordinates.
#    No semantic information. Breaks when theme changes.
mouse.click(result.x, result.y)
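For contrast, the same task through the accessibility tree. This is a hedged sketch, not Fazm source: findButton is an illustrative helper, but every AX call inside it is a real Apple Accessibility framework function.

```swift
import AppKit
import ApplicationServices

// Walk the AX tree looking for a button with a given title.
// No OCR, no vision model: role and title are published by the app itself.
func findButton(titled title: String, under element: AXUIElement) -> AXUIElement? {
    var childrenRef: CFTypeRef?
    guard AXUIElementCopyAttributeValue(element, kAXChildrenAttribute as CFString, &childrenRef) == .success,
          let children = childrenRef as? [AXUIElement] else { return nil }
    for child in children {
        var roleRef: CFTypeRef?
        var titleRef: CFTypeRef?
        _ = AXUIElementCopyAttributeValue(child, kAXRoleAttribute as CFString, &roleRef)
        _ = AXUIElementCopyAttributeValue(child, kAXTitleAttribute as CFString, &titleRef)
        if roleRef as? String == kAXButtonRole, titleRef as? String == title {
            return child
        }
        if let found = findButton(titled: title, under: child) { return found }
    }
    return nil
}

// The press is a semantic AX action, not a synthetic click at guessed pixels.
if let app = NSWorkspace.shared.frontmostApplication {
    let root = AXUIElementCreateApplication(app.processIdentifier)
    if let save = findButton(titled: "Save", under: root) {
        _ = AXUIElementPerformAction(save, kAXPressAction as CFString)
    }
}
```

The button survives theme changes and window moves because it is matched by role and title, not by appearance.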

What the tree gives you that pixels never will

  • Role, title, description, and live coordinates among the Apple AX attributes on every element
  • A semantic click (AXPress) instead of a synthetic mouse event
  • 0 vision tokens needed to find a labeled button
  • 1568 px max screenshot edge for the LLM sanity check

Sources: Apple developer documentation on NSAccessibility attributes, Fazm ScreenCaptureManager.swift line 30 (1568 px resize).

How a Fazm run actually reaches a Mac app

The path from a prompt in the Fazm control bar to a real click inside Figma, Linear, or Mail is short, and every step is either Apple framework code or an auditable bundled binary.

1. Grant accessibility permission. User enables Fazm in System Settings → Privacy & Security → Accessibility. The app verifies with AXIsProcessTrusted(). AppState.swift line 311.

2. Front app detected. NSWorkspace.shared.frontmostApplication gives Fazm the PID of whatever the user is looking at. No input from the user required beyond their normal focus.

3. Accessibility tree read. AXUIElementCreateApplication + AXUIElementCopyAttributeValue(kAXFocusedWindowAttribute) at AppState.swift lines 439-441 fetch the root element of the focused window. Role, title, description, children, coordinates.

4. Optional screen capture, for the LLM sanity check. CGWindowListCreateImage + CGWindowListCopyWindowInfo at ScreenCaptureManager.swift lines 30-58 grab pixels and window metadata together. Resized to 1568 px max edge and stored locally.

5. Action dispatched through the MCP bridge. acp-bridge/src/index.ts line 1059 registers the bundled mcp-server-macos-use binary. It translates the LLM's high-level action ("click the Save button") into AXUIElementPerformAction calls on the tree it already holds.

6. App receives a real click. The target app processes a semantic AX action, not a synthetic mouse event. Same path VoiceOver users go through every day. Robust across theme changes, UI refreshes, and window moves.
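Steps 1 and 2 above can be sketched in a few lines of Swift. A minimal illustration, assuming the standard AppKit and Accessibility calls the steps name, not Fazm's actual source:

```swift
import AppKit
import ApplicationServices

// Step 1: verify the user granted Accessibility permission.
// AXIsProcessTrustedWithOptions can also show the System Settings prompt.
let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true] as CFDictionary
guard AXIsProcessTrustedWithOptions(options) else {
    fatalError("Enable the app under Privacy & Security → Accessibility")
}

// Step 2: the frontmost application's PID, with no extra user input.
if let front = NSWorkspace.shared.frontmostApplication {
    print("Automating PID \(front.processIdentifier): \(front.localizedName ?? "?")")
}
```

From the PID, step 3 is the AXUIElementCreateApplication call already shown in the anchor-fact section.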

Which BPA tools live in each category

This is the table the listicles never print. Tools are grouped by input method, not by price. Within a category, the differences are real but smaller than the difference between categories.

| Feature | Typical BPA listicle pick | Fazm |
|---|---|---|
| Input method | API (Zapier) or pixels (UiPath) | Native macOS accessibility tree (Method 4) |
| Reaches native Mac apps (Figma, Linear, Notion desktop) | No | Yes |
| Reaches SaaS endpoints behind a documented API | Yes | Yes, through bundled skills |
| Reads button labels without OCR | No, vision inference required | Yes, via AX attributes |
| Runs headless on a server | Yes, vendor cloud | No, needs a logged-in Mac session |
| Requires Accessibility permission | No | Yes, one-time OS grant |
| Data leaves the user's Mac | Yes, entire workflow runs in vendor cloud | Only when the LLM is called |
| Configuration format | Vendor-locked flow graph or JSON | Plain markdown skill files |

What Method 4 unlocks that Methods 1, 2, and 3 cannot

The accessibility tree makes every cooperating native Mac app automatable, and most native apps cooperate, because Apple requires accessibility support for VoiceOver, Switch Control, and keyboard navigation. Below is a non-exhaustive list of apps Fazm can already read and click inside.

Figma · Linear · Notion · Mail · Messages · Slack · Xcode · Safari · Chrome · Finder · Terminal · Calendar

Quick buyer's frame: pick the method first

Most BPA rollouts that fail pick the wrong input method before they pick the wrong product. Use the rule below to cut the shortlist to one category, then compare the tools inside that category by the usual features.

When to pick each category

  • SaaS-to-SaaS hops with stable APIs on both ends: pick an API-based BPA software (Method 1). Cheapest and most durable.
  • Scraping or filling forms inside a specific set of web pages: pick a DOM-based BPA software (Method 2).
  • A legacy Windows enterprise app with no API and no VoiceOver support: pick a pixel-OCR BPA software (Method 3). It is the only option for some of these.
  • Any workflow that runs on an analyst's or operator's Mac, touches native apps, and must keep data local: pick an accessibility-tree BPA software (Method 4). Fazm is the consumer-grade option in this category.
  • Hybrid workflows: expect to own one Method 1 tool plus one Method 4 tool. They stitch together cleanly because neither tries to do the other's job.
The uncopyable line
Anyone can claim their BPA tool "works with any app." Only a tool that calls AXUIElementCopyAttributeValue in its own source can prove it.
That call is at line 441 of Desktop/Sources/AppState.swift in the Fazm repo. It is how VoiceOver reads UI for blind users, and it is how Fazm reads UI for the LLM that runs on your behalf.

Not sure which input method your workflow needs?

Bring the workflow. In 20 minutes we will classify it and tell you which category of BPA software fits, even if the answer is not Fazm.

Book a call

About input methods and BPA software

What are the categories of business process automation software, and how do they actually differ?

Most lists you will find online split BPA tools by use case, by price tier, or by "enterprise vs. SMB." The split that actually predicts what a given tool can automate is the input method: how the software sees the screen and clicks the button. There are four. A tool can call published APIs (Zapier, Make, Workato). It can parse the browser DOM (browser-only RPA extensions, cloud scrapers). It can read screen pixels with OCR (most Windows RPA, screenshot-based agents). Or it can read the native operating-system accessibility tree (VoiceOver, Switch Control, Fazm on macOS). Everything else, including pricing, templates, and connector count, sits downstream of that choice.

Why does the input method matter more than the feature grid?

Because it decides what the tool can physically reach. An API-based BPA tool can only automate steps where every hop exposes a public API. A DOM-parsing tool can only automate what runs inside a Chromium tab. A pixel-OCR tool can click any pixel but has to guess at labels and breaks when the UI refreshes. A tool reading the accessibility tree gets semantic information for free ("the button called Save, at these coordinates, with role AXButton") in any native app that cooperates with VoiceOver, which is most of them on macOS. Tier grids in listicles cannot express that difference, so they pretend it doesn't exist.

Which method does Fazm use, and where in the code can I verify it?

Fazm is in the fourth category: native OS accessibility APIs. The proof is in Desktop/Sources/AppState.swift in the Fazm desktop repo. At line 439, the app calls AXUIElementCreateApplication with the frontmost app's process ID. At line 441, it calls AXUIElementCopyAttributeValue with kAXFocusedWindowAttribute to pull the focused window element. Those are the same Apple Accessibility framework functions VoiceOver uses to narrate apps for blind users. Further down, at line 311, AXIsProcessTrusted() confirms the user granted accessibility permission. The supporting MCP binary that drives clicks and keystrokes through the same tree is mcp-server-macos-use, bundled inside the signed app and registered at acp-bridge/src/index.ts line 1059.

How is reading the accessibility tree different from taking screenshots and OCR?

A screenshot tells you the color of a pixel. The accessibility tree tells you that the pixel belongs to an element with role AXButton, title "Save", and a description "Save this document" (or whatever the app labeled it). That is a categorical difference. A screenshot-based agent has to run vision inference on every frame just to find a button; an accessibility-based agent reads the button's name directly. In practice that means orders of magnitude less token spend, faster end-to-end latency, and far fewer false clicks when the UI redraws. The trade-off is that the app has to participate in the accessibility framework. On macOS, almost every native app does. On Windows, most enterprise RPA tools still fall back to pixels for this reason.
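To make that categorical difference concrete, here is a sketch (illustrative helper names, not Fazm source) of the three attribute reads that replace a full vision-inference pass:

```swift
import ApplicationServices

// Given any AXUIElement, three attribute reads return what a screenshot
// pipeline would need a vision model to guess.
func describe(_ element: AXUIElement) -> (role: String?, title: String?, help: String?) {
    func attr(_ name: String) -> String? {
        var ref: CFTypeRef?
        guard AXUIElementCopyAttributeValue(element, name as CFString, &ref) == .success else {
            return nil
        }
        return ref as? String
    }
    return (attr(kAXRoleAttribute),        // e.g. "AXButton"
            attr(kAXTitleAttribute),       // e.g. "Save"
            attr(kAXDescriptionAttribute)) // e.g. "Save this document"
}
```

Each read is a local IPC call into the target app's accessibility server; no tokens, no inference, no frame capture.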

Does Fazm also use screenshots?

Yes, but not the way an OCR RPA tool uses them. Fazm captures windows through CGWindowListCreateImage paired with CGWindowListCopyWindowInfo, which gives it both the image pixels and the window metadata (PID, layer, owner name) at once. Code in Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift lines 30 through 58. The capture is kept locally, scaled so the longest edge is 1568 px, and auto-deleted after 30 days. It exists to give the LLM a visual sanity check on top of the structural info from the accessibility tree, not as the primary input.
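A sketch of that capture-plus-metadata pairing, using the two CoreGraphics calls the answer names. Illustrative only: CGWindowListCreateImage is deprecated on recent macOS in favor of ScreenCaptureKit, but it is the call the source cites.

```swift
import AppKit
import CoreGraphics

// Window metadata: PID, layer, and owner name for every on-screen window.
let info = CGWindowListCopyWindowInfo(
    [.optionOnScreenOnly, .excludeDesktopElements],
    kCGNullWindowID
) as? [[String: Any]] ?? []

for window in info.prefix(3) {
    print(window[kCGWindowOwnerName as String] ?? "?",
          window[kCGWindowOwnerPID as String] ?? "?")
}

// The pixels themselves, captured in the same pass.
let image = CGWindowListCreateImage(.infinite,
                                    .optionOnScreenOnly,
                                    kCGNullWindowID,
                                    .bestResolution)
// Downstream, the capture would be scaled so the longest edge is ≤ 1568 px
// before the LLM ever sees it, per the answer above.
```

Pairing the two calls is what lets the capture be attributed to a specific app and window rather than treated as anonymous pixels.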

If a tool in one category is enough for my business, do I need a second one?

Often, yes. The four categories are complements more often than substitutes. An API-based BPA tool is the right pick for SaaS-to-SaaS data hops: it is cheap, deterministic, and durable. A DOM-parsing tool is the right pick for scraping sites behind a login. Accessibility-based tools like Fazm are the right pick for anything that happens inside a native Mac app (design tools, IDEs, chat clients, privacy-sensitive workflows where your data must not leave the device). Picking by input method first, and by feature second, is the shortcut.

Which popular BPA tools live in which category?

API-based: Zapier, Make, Workato, n8n, Tray, Pipedream, Celigo. Browser-DOM-based: Browse AI, Automatio, Axiom.ai, Browserbase. Screen-pixel / OCR-based: UiPath (classic selector + image mode), Automation Anywhere, Blue Prism, Power Automate Desktop with image selectors, most visual recorder tools, screenshot-driven agent frameworks that rely on vision models to click. Accessibility-tree-based: VoiceOver and Switch Control (not BPA tools, but the reference implementations), Keyboard Maestro when it uses UI Browser, the macos-use ecosystem, Fazm. The listicles never tell you this because the information does not fit into a features column.

Can an accessibility-tree-based BPA tool run headless or on a server?

Not usefully, because the accessibility tree is provided by a running, logged-in macOS session. That is a real constraint, not marketing. You give it up in exchange for being able to reach any native app on the machine. If your workflow is pure server-side (cron job, webhook, queue worker), an API-based BPA tool is a better fit. If your workflow involves a human's actual desktop (replying to customers, editing a design file, moving info between native apps), the accessibility-tree approach wins because nothing else can reach those apps at all.

How do I verify Fazm's claims for myself?

Install Fazm, grant accessibility permission, and open Desktop/Sources/AppState.swift in the source repo. Jump to line 439 to see AXUIElementCreateApplication being called on the front app, then line 441 for the focused window query. For the screen capture story, open Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift at line 30 for the CGWindowListCreateImage call. For the integration layer, read acp-bridge/src/index.ts at line 1059 where the macos-use MCP server is registered. Every claim on this page corresponds to a grep.