Every business process automation company gets compared on the wrong axis

The top results for this keyword stack the same vendors into a table and score them on RPA, OCR, process mining, connectors, and agent orchestration. That is a feature grid. The variable that actually predicts whether your automation will still be running in six months is the one nobody names: does the agent see the UI through the operating system, or through pixels? This page is that comparison, with the Swift code that makes Fazm the consumer answer to it.

Matthew Diakonov, Written with AI

Published April 20, 2026 · 12 min read

4.8 from 420+

Reads any Mac app via AXUIElement

12 whitelisted AX roles

Bundled Swift MCP server

Open source under MIT

Two perception layers, one category

Why the BPA vendor choice is really an architecture choice

Pixel path: screenshot, OCR, guess coordinates

Tree path: read AXRole, AXValue, click by label

UiPath, Power Automate, Blue Prism default to pixels

Fazm defaults to the tree, on every Mac app

The tree survives UI refreshes. Pixels do not.

UiPath, Automation Anywhere, Blue Prism, Pega, Appian, Nintex, Workato, Power Automate, Camunda, SS&C, ProsperSpark, Fazm (tree path)

$33.4B BPA market by 2032, 11.9% CAGR

“Gartner projects 40% of enterprise applications will include task-specific AI agents by 2026. The shortlist of companies building those agents to read the operating system, not a screenshot, is tiny.”

Gartner, aggregated 2026 BPA market outlook

The perception-layer split, named

Every business process automation company is building one of three things, and the marketing never says which. The category was carved twenty years ago when RPA meant a macro that took screenshots, and the vocabulary never updated.

Below is the honest split. Use it when reading any vendor deck. The feature list they give you will sit on top of one of these three foundations, and that foundation is what you are actually buying.

Pixel path

Screenshot the app, run OCR or template matching, guess coordinates, click. The default desktop-automation recorder in UiPath, Power Automate Desktop, Automation Anywhere, and Blue Prism. Breaks on dark mode, resolution changes, and UI refreshes.

API path

Skip the UI entirely. Go service-to-service. Zapier, Workato, Make. Cleanest when it works. Inapplicable when any step in the chain does not expose an API, which is true of most native Mac apps and most legacy enterprise clients.

Tree path

Read the OS accessibility tree. AXButton, AXTextField, labels, coordinates, all structured. Survives dark-mode toggles, resolution changes, and minor UI rewrites. Fazm is the clearest consumer example. macOS VoiceOver and Microsoft Narrator use the same layer.

Where each one wins

Pixel path: Windows-only enterprise clients from 2011 that expose no accessibility information. API path: SaaS-to-SaaS hops. Tree path: everything a screen reader can read, which is roughly every modern Mac and Windows app, plus the browser via the DOM.

12 AX roles: interactive whitelist in Fazm

5s: per-element AX timeout

$33.4B: BPA market size, 2032

40%: Gartner, enterprise apps with AI agents by 2026

The anchor: how Fazm reads a Mac app

Inside the signed Fazm app bundle there is a second binary. Not a companion utility, not a helper app. A full MCP server written in Swift, shipped at Contents/MacOS/mcp-server-macos-use. When the model decides it needs to look at Mail, Numbers, or your CRM desktop client, it calls this server. The server reads the macOS accessibility tree. That is the mechanism. Here is the exact code.

main.swift

Every traversal sets a per-element messaging timeout so one misbehaving app cannot hang the whole agent. Then the tree gets filtered to the roles a human could actually click or type into. That filter is twelve exact strings, and the list is the other anchor fact worth remembering.

main.swift (lines 916-925)

When the Fazm installer assembles the app bundle, the MCP binary gets copied in alongside the Swift app. If you grep the repo you can see the exact command.

Bundling the accessibility MCP server

fazm/run.sh

What an automation run actually looks like

A local agent, a markdown skill file, an MCP server, and the target Mac app. No cloud recorder replaying your click stream, no VDI, no bot farm. The diagram below is the runtime, not a pitch slide.

Fazm runtime, Mac-local

How the BPA company market maps onto the three paths

Here is the shortlist of companies every listicle names, sorted by what they actually do under the marketing. Not an insult to any of them. Each path solves a real category of problem. The error is buying one assuming it does the job of another.

Feature	Default enterprise RPA (pixel path)	Fazm (tree path)
Perceives UI by	Screenshot + OCR + template matching	macOS accessibility tree (AXUIElement)
Breaks on dark mode / resolution change	Yes. Template matching breaks often	No. Roles and labels, not pixels
Runs locally on user's machine	No. VDI / Citrix / bot farm	Yes. Mac-native Swift app
Process artifact the user edits	.xaml, .atmx, JSON flow	Plain-English markdown (.skill.md)
Requires IT / developer to author	Yes. RPA certification required	No. Anyone who can write an SOP
Works on native Mac apps without APIs	Only via pixel recorder	Yes. Any AX-enabled app
Open source	No. Proprietary	Yes. MIT license
Typical time to first automation	2 to 8 weeks of consulting	Minutes

Decision frame: which path do you actually need

Before you shortlist any business process automation company, answer these four questions in order. The answers will tell you which path to buy, and therefore which vendor list to read.

Pick your path in four questions

Does every app in the process have a public API?

If yes, go API path. Zapier, Workato, Make. You do not need a desktop automation vendor at all.

Are you automating for one person's workflow on their own Mac?

If yes, go tree path with a consumer agent (Fazm). An enterprise RPA license and a two-month consultancy engagement is wildly disproportionate here.

Is the process spread across 20+ seats with compliance review?

Go tree path at scale with an enterprise RPA that actually uses accessibility APIs, or go pixel path with a top-tier vendor that has the governance to keep flaky recorders working.

Is the target a Windows-only thin client from 2011?

Pixel path is your only option. Pick a BPA company that specializes in legacy surface automation, budget for quarterly rework, and accept that UI refreshes will break things.
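The four questions above can be read as a short decision function. The sketch below is one possible encoding, written in Python for portability rather than the project's Swift. The function name, the enum, and the rule that the first "yes" wins are assumptions made for illustration; the article only says to answer the questions in order.

```python
from enum import Enum


class Path(Enum):
    API = "API path"
    TREE_CONSUMER = "tree path with a consumer agent"
    ENTERPRISE = "tree path at scale, or a governed pixel path"
    PIXEL_LEGACY = "pixel path with a legacy-surface specialist"
    UNDECIDED = "no question matched"


def pick_path(every_app_has_api: bool,
              one_person_on_own_mac: bool,
              many_seats_with_compliance: bool,
              legacy_windows_thin_client: bool) -> Path:
    """Walk the four questions in order and stop at the first 'yes' (assumed rule)."""
    if every_app_has_api:
        return Path.API
    if one_person_on_own_mac:
        return Path.TREE_CONSUMER
    if many_seats_with_compliance:
        return Path.ENTERPRISE
    if legacy_windows_thin_client:
        return Path.PIXEL_LEGACY
    return Path.UNDECIDED


# A solo operator whose Mac apps do not all expose an API.
solo_operator = pick_path(False, True, False, False)
# A 2011 Windows thin client with no API and no accessibility data.
legacy_client = pick_path(False, False, False, True)
```

Because the order encodes priority, a process where every step has an API never reaches the desktop-automation questions at all, which matches the first answer above.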

Where the Fazm ecosystem fits

The Fazm agent runs locally, but its reach is the same as whatever a screen reader can see. Every orbit below is something the agent controls today through either the accessibility tree or the DOM, without a separate vendor connector. This is what a tree-path automation company looks like in practice.

Fazm agent, connected to: Apple Mail, Numbers, Finder, Chrome (DOM), Slack, Notion, SQLite

What the tree path buys you, as a checklist

Built into the architecture, not bolted on

UI refreshes do not break automations because roles are stable even when labels change
Dark mode, Retina vs standard, and DPI scaling are invisible to an AX-tree agent
Token cost per step is an order of magnitude lower than sending screenshots to a vision model
A misbehaving app hits the 5-second AXUIElementSetMessagingTimeout and returns partial data instead of hanging
Process artifacts are plain markdown, so the operator can read, edit, and diff them
The agent runs on the user's Mac, so data never leaves the device unless the skill explicitly sends it
Open source code means you can verify every AX call before you trust it with a business process

Anchor fact

12 AX roles, 5s per-element timeout, one bundled Swift binary

Those numbers are verifiable in the open repo. The twelve-role whitelist is in Sources/MCPServer/main.swift at lines 916 to 920. The 5-second per-element timeout is set with AXUIElementSetMessagingTimeout(window, 5.0) on line 253. The bundling step, which copies the debug binary into Contents/MacOS/mcp-server-macos-use inside the signed app, is in run.sh at lines 212 to 220. No marketing deck on a BPA vendor site has this level of source-linked detail because for most vendors the perception layer is proprietary and hidden.

The honest recommendation

If you are a Fortune 500 ops leader with 200 analysts automating a loan-origination pipeline across a VDI fleet, buy from the incumbents. UiPath, Automation Anywhere, and Blue Prism have the governance, audit trails, and SOX-ready artifacts you need. Their pixel path is a liability that costs you ongoing rework, and they know it; what you are buying is the stability scaffolding around the shaky layer.

If you are a founder, operator, or analyst trying to remove a few hours a week of busywork on your own Mac, every minute you spend on a vendor comparison grid is wasted. Download Fazm, write one skill in English, and watch it read the same accessibility tree your screen reader reads. The perception layer was the only thing that was going to matter in a year anyway.

Want help mapping one of your processes to a skill file?

15 minutes with the founder. We will walk through one workflow and decide whether the tree path covers it or whether you genuinely need a BPA company.

FAQ

What actually differentiates one business process automation company from another?

On every top-10 SERP listicle for this keyword (First Page Sage, codedistrict, Fortune Business Insights, Gartner Peer Insights, Appian) the vendors get compared on the same five axes: RPA, OCR, process mining, agent orchestration, and connectors. That is a feature grid, not an architecture decision. The variable that actually predicts whether an automation will still work in six months is the perception layer. Either the agent reads the UI through the operating system's accessibility tree (structured, semantic, stable across UI refreshes) or it reads through pixels (OCR, template matching, coordinate guessing). Every listicle skips that distinction because nearly every vendor it ranks defaults to the pixel path.

Why does the perception layer matter more than the feature checklist?

Because every other variable is fixable by buying more of something. Connectors you can add. Agent orchestration you can reconfigure. Dashboards you can rebuild. But a pixel-based automation breaks on a dark-mode toggle, a Retina-to-standard resolution change, a button renamed without its role changing, or a padding tweak in the next app update. When it breaks, you are back to the RPA vendor for another statement of work. Accessibility-tree automation does not care about those changes because it reads roles (AXButton, AXTextField) and labels, not coordinates and colors.
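A small, platform-neutral sketch makes the difference concrete. It is written in Python rather than the project's Swift, and the `Element` type, the `find` helper, and the coordinates are all hypothetical. It models one "Send" button before and after a padding tweak, then compares replaying a recorded coordinate against looking the button up by role and label.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    role: str      # accessibility role, e.g. "AXButton"
    label: str     # visible text or accessibility title
    x: float       # window-relative origin
    y: float
    width: float
    height: float

    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def find(elements: List[Element], role: str, label: str) -> Optional[Element]:
    """Return the first element whose role and label both match, else None."""
    for element in elements:
        if element.role == role and element.label == label:
            return element
    return None


# The same button before and after an app update shifts it by a few points.
before = [Element("AXButton", "Send", 400, 300, 80, 24)]
after = [Element("AXButton", "Send", 412, 316, 80, 24)]

# Pixel-style replay: a coordinate recorded against the old layout.
recorded_point = find(before, "AXButton", "Send").center()
replay_still_hits = after[0].contains(*recorded_point)

# Tree-style lookup: resolve the button again from the current layout.
fresh_point = find(after, "AXButton", "Send").center()
```

In this toy layout the recorded point falls just above the moved button, while the role-and-label lookup returns the new center. The sketch covers only position changes; a renamed label would need a different matching rule.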

Which BPA companies fall into which bucket?

Enterprise RPA vendors (UiPath, Automation Anywhere, Blue Prism, Pega, Nintex, Power Automate Desktop) all started with pixel-based desktop recorders and image matching. They have added accessibility support over time, but their default recording path for a desktop app is still screenshot capture plus OCR. Integration-first platforms (Workato, Zapier) avoid the UI entirely by going API-to-API, which is cleaner but only works when every tool in the chain has an API. Fazm is the other pattern: a consumer-grade Mac app that uses the macOS accessibility tree for every app, native or Electron, with no scripting and no IT. Zapier wins when every step has an API. Fazm wins when one of them does not.

What is the concrete mechanism Fazm uses?

Fazm ships an MCP server written in Swift called `mcp-server-macos-use` that gets bundled into `Contents/MacOS/mcp-server-macos-use` inside the signed Fazm app. When the model needs to look at an app, it calls `macos-use_refresh_traversal` with a PID. That tool calls `AXUIElementCreateApplication(pid)`, walks `AXWindows` and `kAXChildrenAttribute` down to every node, and returns a flat list of elements filtered through an `interactiveRolePrefixes` whitelist of twelve AX roles. Each element carries a role, a text label, and window-relative coordinates. The click tool takes x and y and dispatches a `CGEvent`. No screenshot is in the loop unless the model explicitly asks for one.
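The walk-then-filter step described above can be illustrated without macOS. The sketch below is Python, not the server's Swift, and it runs on a hand-built nested dictionary instead of live `AXUIElement` values. The node shape (`role`, `text`, `x`, `y`, `children`) and the function names are assumptions, and prefix matching is inferred only from the name `interactiveRolePrefixes`. The twelve role strings are the ones quoted elsewhere in this FAQ.

```python
from typing import Iterator, List

# The twelve roles the article lists for the interactive whitelist.
INTERACTIVE_ROLE_PREFIXES = (
    "AXButton", "AXLink", "AXTextField", "AXTextArea", "AXCheckBox",
    "AXRadioButton", "AXPopUpButton", "AXComboBox", "AXSlider",
    "AXMenuItem", "AXMenuButton", "AXTab",
)


def walk(node: dict) -> Iterator[dict]:
    """Depth-first walk over a node and all of its descendants."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def interactive_elements(root: dict) -> List[dict]:
    """Flatten the tree and keep only nodes whose role passes the whitelist."""
    flat = []
    for node in walk(root):
        role = node.get("role", "")
        if role.startswith(INTERACTIVE_ROLE_PREFIXES):  # assumed prefix match
            flat.append({
                "role": role,
                "text": node.get("text", ""),
                "x": node.get("x"),
                "y": node.get("y"),
            })
    return flat


# Hypothetical compose window: a group holding a button and a caption,
# followed by a text field.
sample_tree = {
    "role": "AXWindow", "text": "New Message", "x": 0, "y": 0,
    "children": [
        {"role": "AXGroup", "x": 0, "y": 0, "children": [
            {"role": "AXButton", "text": "Send", "x": 12, "y": 8},
            {"role": "AXStaticText", "text": "Draft", "x": 60, "y": 8},
        ]},
        {"role": "AXTextField", "text": "To", "x": 12, "y": 48},
    ],
}

elements = interactive_elements(sample_tree)
```

The window, the group, and the static text are dropped, leaving a flat list the model can act on. With prefix matching, a role such as `AXTabGroup` would also pass; whether the real server matches by prefix or by exact string is not confirmed here.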

Where can I verify the twelve roles?

In the open-source repo at github.com/mediar-ai/mcp-server-macos-use. File: Sources/MCPServer/main.swift, lines 916 to 920. The array is: `["AXButton", "AXLink", "AXTextField", "AXTextArea", "AXCheckBox", "AXRadioButton", "AXPopUpButton", "AXComboBox", "AXSlider", "AXMenuItem", "AXMenuButton", "AXTab"]`. The per-element timeout is set on line 253 with `AXUIElementSetMessagingTimeout(window, 5.0)`. The bundling step is in the Fazm main repo's `run.sh`, lines 212 to 220, under the comment `# Build and bundle mcp-server-macos-use`.

So should I hire a BPA company at all?

Hire one when the process lives across five or more teams, touches enterprise systems behind a VPN, has SOX or HIPAA implications, or requires a signed SOP artifact for audit. Those are discovery-heavy engagements where the value is in org-wide mapping, not in the build. Do not hire one when the process lives on one person's Mac, touches two to four native or web apps, and the operator can describe it in plain English. In that case an accessibility-tree agent reading a markdown skill file will cost less than the first week of a $200-an-hour discovery.

How do Fazm skills make this actionable for a non-developer?

A Fazm skill is a markdown file with a YAML frontmatter header. No .xaml, no .atmx, no JSON flow. The agent reads the skill, queries the live accessibility tree of the target app, and executes. The installer in the desktop source ships example skills (`Resources/BundledSkills/*.skill.md`) like `telegram.skill.md` and `google-workspace-setup.skill.md` that demonstrate patterns: read a notification, query a SQLite DB, reply. You author a skill the same way you write a runbook.
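To make the file shape concrete, here is a minimal Python sketch that splits a skill file into its frontmatter and its markdown body. The example skill, its `name` and `description` keys, and the `split_skill` helper are hypothetical; the article only says that a skill is markdown with a YAML frontmatter header. The parser handles flat `key: value` lines only, not full YAML.

```python
from typing import Dict, Tuple

# Hypothetical skill file; the field names are illustrative, not Fazm's schema.
EXAMPLE_SKILL = """\
---
name: reply-to-invoice-emails
description: Draft replies to invoice emails in Apple Mail
---
1. Open Apple Mail and select the newest message with "Invoice" in the subject.
2. Draft a reply that confirms receipt.
"""


def split_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Return (frontmatter, body). Frontmatter is empty if no header is found."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter of the header
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = split_skill(EXAMPLE_SKILL)
```

Everything below the header stays plain prose, which is what lets an operator read, edit, and diff a skill the way they would a runbook.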

Is the macOS accessibility tree fast enough for real automation?

Yes. On a developer's M-series Mac, a full app traversal runs in well under a second for most applications. The MCP server caps every element call at five seconds with `AXUIElementSetMessagingTimeout`, so a slow app never blocks the agent; it just returns what it has. For the operator the practical floor is the model latency, not the AX tree. Compared to sending a 2 MB PNG per step to a vision model, AX traversal is an order of magnitude cheaper per step.
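The "return what it has" behavior can be sketched outside macOS. The Python below is a stand-in technique, not the server's code: it gives each read a time budget using a thread pool, whereas the real server relies on the OS-level `AXUIElementSetMessagingTimeout`. The reader functions, the names, and the short timings are hypothetical and chosen so the example finishes quickly.

```python
import concurrent.futures
import time
from typing import Callable, Dict


def read_all(readers: Dict[str, Callable[[], str]], timeout_s: float) -> Dict[str, str]:
    """Call each reader with its own time budget and skip any that exceed it."""
    results = {}
    # One worker per reader, so a stuck call cannot starve the calls after it.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(readers)))
    try:
        for name, reader in readers.items():
            future = pool.submit(reader)
            try:
                results[name] = future.result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                pass  # slow element: leave it out and keep going
    finally:
        pool.shutdown(wait=False)  # do not block on readers that are still running
    return results


def read_send_button() -> str:
    return "AXButton Send"


def read_unresponsive_field() -> str:
    time.sleep(1.0)  # simulates an app that is slow to answer
    return "AXTextField To"


def read_cancel_button() -> str:
    return "AXButton Cancel"


partial = read_all(
    {
        "send": read_send_button,
        "to": read_unresponsive_field,
        "cancel": read_cancel_button,
    },
    timeout_s=0.2,
)
```

The slow read is dropped and the two fast ones come back, so the caller gets a partial result instead of a hang. A thread-based budget cannot cancel the slow call itself, which is one reason a timeout enforced by the OS messaging layer is the stronger mechanism.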

What about Windows or the browser?

Fazm is macOS only today. For the browser it uses a separate MCP server (`@playwright/mcp`) that hooks into the user's actual running Chrome via an extension, so browser automations also work through the DOM rather than through screenshots. If your business process is Windows-only, you are still in enterprise RPA territory and the vendor list is shorter.

Is Fazm open source?

The desktop app is MIT-licensed at github.com/mediar-ai/fazm. The accessibility MCP server is open at github.com/mediar-ai/mcp-server-macos-use. The SDK that does the actual tree walks (`MacosUseSDK`) is a separate Swift package. Everything in this article is verifiable in public code.

Related guides on the same shelf

Keep reading

Guide: Business process automation consultant. What a BPA engagement actually produces and the slice a skill file now replaces.

Guide: Advantages of business process automation. Where BPA pays off and where it turns into ongoing maintenance.

Guide: Selenium browser automation. Browser-only automation: where DOM-based tools still beat screenshot-based ones.
