Every business process automation company gets compared on the wrong axis
The top results for this keyword stack the same vendors into a table and score them on RPA, OCR, process mining, connectors, and agent orchestration. That is a feature grid. The variable that actually predicts whether your automation will still be running in six months is the one nobody names: does the agent see the UI through the operating system, or through pixels? This page is that comparison, with the Swift code that makes Fazm the consumer answer to it.
“Gartner projects 40% of enterprise applications will include task-specific AI agents by 2026. The shortlist of companies building those agents to read the operating system, not a screenshot, is tiny.”
Gartner, aggregated 2026 BPA market outlook
The perception-layer split, named
Every business process automation company is building one of three things, and the marketing never says which. The category was carved twenty years ago when RPA meant a macro that took screenshots, and the vocabulary never updated.
Below is the honest split. Use it when reading any vendor deck. The feature list they give you will sit on top of one of these three foundations, and that foundation is what you are actually buying.
Pixel path
Screenshot the app, run OCR or template matching, guess coordinates, click. The default desktop-automation recorder in UiPath, Power Automate Desktop, Automation Anywhere, and Blue Prism. Breaks on dark mode, resolution changes, and UI refreshes.
API path
Skip the UI entirely. Go service-to-service. Zapier, Workato, Make. Cleanest when it works. Inapplicable when any step in the chain does not expose an API, which is true of most native Mac apps and most legacy enterprise clients.
Tree path
Read the OS accessibility tree. AXButton, AXTextField, labels, coordinates, all structured. Survives dark-mode toggles, resolution changes, and minor UI rewrites. Fazm is the clearest consumer example. macOS VoiceOver and Microsoft Narrator use the same layer.
Where each one wins
Pixel path: Windows-only enterprise clients from 2011 that expose no AX info. API path: SaaS-to-SaaS hops. Tree path: everything a screen reader can read, which is roughly every modern Mac and Windows app, plus the browser via DOM.
The anchor: how Fazm reads a Mac app
Inside the signed Fazm app bundle there is a second binary. Not a companion utility, not a helper app. A full MCP server written in Swift, shipped at Contents/MacOS/mcp-server-macos-use. When the model decides it needs to look at Mail, Numbers, or your CRM desktop client, it calls this server. The server reads the macOS accessibility tree. That is the mechanism.
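The traversal described above can be sketched as follows. This is an illustration, not the listing from the Fazm repository: `AXNode`, `attribute`, `walk`, `collectNodes`, and the `maxDepth` cap are names and choices made up for this example, and the code assumes the calling process has already been granted Accessibility permission. Only the `AXUIElement…` functions and `kAX…Attribute` constants are real macOS API.

```swift
import ApplicationServices

/// Illustrative record for one node in the tree; not a type from the Fazm codebase.
struct AXNode {
    let role: String
    let label: String?
    let depth: Int
}

/// Reads one attribute and returns nil on any AX error, including a timeout.
func attribute(_ element: AXUIElement, _ name: String) -> CFTypeRef? {
    var value: CFTypeRef?
    let status = AXUIElementCopyAttributeValue(element, name as CFString, &value)
    return status == .success ? value : nil
}

/// Depth-first walk that records every node whose role can be read.
func walk(_ element: AXUIElement, depth: Int, maxDepth: Int, into nodes: inout [AXNode]) {
    guard depth <= maxDepth else { return }
    if let role = attribute(element, kAXRoleAttribute) as? String {
        let label = (attribute(element, kAXTitleAttribute) as? String)
            ?? (attribute(element, kAXDescriptionAttribute) as? String)
        nodes.append(AXNode(role: role, label: label, depth: depth))
    }
    let children = attribute(element, kAXChildrenAttribute) as? [AXUIElement] ?? []
    for child in children {
        walk(child, depth: depth + 1, maxDepth: maxDepth, into: &nodes)
    }
}

/// Walks every window of the app with the given PID and returns a flat list.
func collectNodes(pid: pid_t, maxDepth: Int = 50) -> [AXNode] {
    let app = AXUIElementCreateApplication(pid)
    let windows = attribute(app, kAXWindowsAttribute) as? [AXUIElement] ?? []
    var nodes: [AXNode] = []
    for window in windows {
        // Five seconds, matching the timeout value this article cites.
        _ = AXUIElementSetMessagingTimeout(window, 5.0)
        walk(window, depth: 0, maxDepth: maxDepth, into: &nodes)
    }
    return nodes
}
```

A failed attribute read returns nil rather than throwing, so an unresponsive subtree is skipped and the caller still gets whatever was collected before it.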
Every traversal sets a per-element messaging timeout so one misbehaving app cannot hang the whole agent. Then the tree gets filtered to the roles a human could actually click or type into. That filter is twelve exact strings, and the list is the other anchor fact worth remembering.
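A minimal sketch of that role filter, using the twelve strings quoted in the FAQ further down. The `TraversedElement` struct and the two helper functions are hypothetical, and prefix matching is an assumption read off the array's name (`interactiveRolePrefixes`); the repository may match differently.

```swift
/// The twelve roles this article quotes from the open-source MCP server.
let interactiveRolePrefixes: [String] = [
    "AXButton", "AXLink", "AXTextField", "AXTextArea",
    "AXCheckBox", "AXRadioButton", "AXPopUpButton", "AXComboBox",
    "AXSlider", "AXMenuItem", "AXMenuButton", "AXTab",
]

/// Hypothetical element record for illustration only.
struct TraversedElement: Equatable {
    let role: String
    let text: String
}

/// True when the role starts with one of the whitelisted strings.
func isInteractive(_ role: String) -> Bool {
    interactiveRolePrefixes.contains { role.hasPrefix($0) }
}

/// Keeps only the elements a person could click or type into.
func interactiveOnly(_ elements: [TraversedElement]) -> [TraversedElement] {
    elements.filter { isInteractive($0.role) }
}

let sample = [
    TraversedElement(role: "AXWindow", text: "Inbox"),
    TraversedElement(role: "AXButton", text: "Send"),
    TraversedElement(role: "AXStaticText", text: "Subject:"),
    TraversedElement(role: "AXTextField", text: ""),
]
let clickable = interactiveOnly(sample)
print(clickable.map(\.role))  // ["AXButton", "AXTextField"]
```

One consequence of matching by prefix: "AXTab" would also admit roles such as "AXTabGroup" and "AXTable". If exact matching is wanted instead, `interactiveRolePrefixes.contains(role)` is the one-line change.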
When the Fazm installer assembles the app bundle, the MCP binary gets copied in alongside the Swift app. If you grep the repo you can see the exact command.
What an automation run actually looks like
A local agent, a markdown skill file, an MCP server, and the target Mac app. No cloud recorder replaying your click stream, no VDI, no bot farm. The diagram below is the runtime, not a pitch slide.
Fazm runtime, Mac-local
How the BPA company market maps onto the three paths
Here is the shortlist of companies every listicle names, sorted by what they actually do under the marketing. Not an insult to any of them. Each path solves a real category of problem. The error is buying one assuming it does the job of another.
| Feature | Default enterprise RPA (pixel path) | Fazm (tree path) |
|---|---|---|
| Perceives UI by | Screenshot + OCR + template matching | macOS accessibility tree (AXUIElement) |
| Breaks on dark mode / resolution change | Yes. Template matching breaks often | No. Roles and labels, not pixels |
| Runs locally on user's machine | No. VDI / Citrix / bot farm | Yes. Mac-native Swift app |
| Process artifact the user edits | .xaml, .atmx, JSON flow | Plain-English markdown (.skill.md) |
| Requires IT / developer to author | Yes. RPA certification required | No. Anyone who can write an SOP |
| Works on native Mac apps without APIs | Only via pixel recorder | Yes. Any AX-enabled app |
| Open source | No. Proprietary | Yes. MIT license |
| Typical time to first automation | 2 to 8 weeks of consulting | Minutes |
Decision frame: which path do you actually need
Before you shortlist any business process automation company, answer these four questions in order. The answers will tell you which path to buy, and therefore which vendor list to read.
Pick your path in four questions
Does every app in the process have a public API?
If yes, go API path. Zapier, Workato, Make. You do not need a desktop automation vendor at all.
Are you automating for one person's workflow on their own Mac?
If yes, go tree path with a consumer agent (Fazm). An enterprise RPA license and a two-month consultancy engagement are wildly disproportionate here.
Is the process spread across 20+ seats with compliance review?
If yes, go tree path at scale with an enterprise RPA that actually uses accessibility APIs, or go pixel path with a top-tier vendor that has the governance to keep flaky recorders working.
Is the target a Windows-only thin client from 2011?
Pixel path is your only option. Pick a BPA company that specializes in legacy surface automation, budget for quarterly rework, and accept that UI refreshes will break things.
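The four questions above amount to an ordered rule list where the first "yes" wins. A small sketch of that reading, with `ProcessProfile`, `AutomationPath`, and `recommendPath` as illustrative names invented here:

```swift
/// The outcomes the four questions can lead to.
enum AutomationPath: Equatable {
    case api            // Zapier, Workato, Make
    case treeConsumer   // a local accessibility-tree agent such as Fazm
    case enterpriseRPA  // tree path at scale, or a governed pixel path
    case pixelLegacy    // legacy surface automation
    case undecided      // none of the four questions applied
}

/// Answers to the four questions, in the order they are asked.
struct ProcessProfile {
    var everyAppHasPublicAPI = false
    var onePersonOnTheirOwnMac = false
    var twentyPlusSeatsWithCompliance = false
    var windowsOnlyLegacyThinClient = false
}

/// Evaluates the questions in order; the first "yes" decides the path.
func recommendPath(for profile: ProcessProfile) -> AutomationPath {
    if profile.everyAppHasPublicAPI { return .api }
    if profile.onePersonOnTheirOwnMac { return .treeConsumer }
    if profile.twentyPlusSeatsWithCompliance { return .enterpriseRPA }
    if profile.windowsOnlyLegacyThinClient { return .pixelLegacy }
    return .undecided
}

let soloOperator = ProcessProfile(onePersonOnTheirOwnMac: true)
print(recommendPath(for: soloOperator))  // treeConsumer
```

Because the order matters, a process where every app has an API resolves to the API path even if it also runs on one person's Mac.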
Where the Fazm ecosystem fits
The Fazm agent runs locally, but its reach is the same as whatever a screen reader can see. Anything within that reach is something the agent controls today through either the accessibility tree or the DOM, without a separate vendor connector. This is what a tree-path automation company looks like in practice.
What the tree path buys you, as a checklist
Built into the architecture, not bolted on
- UI refreshes do not break automations because roles are stable even when labels change
- Dark mode, Retina vs standard, and DPI scaling are invisible to an AX-tree agent
- Token cost per step is an order of magnitude lower than sending screenshots to a vision model
- A misbehaving app hits the 5-second AXUIElementSetMessagingTimeout and returns partial data instead of hanging
- Process artifacts are plain markdown, so the operator can read, edit, and diff them
- The agent runs on the user's Mac, so data never leaves the device unless the skill explicitly sends it
- Open source code means you can verify every AX call before you trust it with a business process
Anchor fact
12 AX roles, 5s per-element timeout, one bundled Swift binary
Those numbers are verifiable in the open repo. The twelve-role whitelist is in Sources/MCPServer/main.swift at lines 916 to 920. The 5-second per-element timeout is set with AXUIElementSetMessagingTimeout(window, 5.0) on line 253. The bundling step, which copies the debug binary into Contents/MacOS/mcp-server-macos-use inside the signed app, is in run.sh at lines 212 to 220. No marketing deck on a BPA vendor site has this level of source-linked detail because for most vendors the perception layer is proprietary and hidden.
The honest recommendation
If you are a Fortune 500 ops leader with 200 analysts automating a loan-origination pipeline across a VDI fleet, buy from the incumbents. UiPath, Automation Anywhere, and Blue Prism have the governance, audit trails, and SOX-ready artifacts you need. Their pixel path is a liability you pay for in ongoing rework, and they know it; what you are buying is the stability scaffolding around the shaky layer.
If you are a founder, operator, or analyst trying to remove a few hours a week of busywork on your own Mac, every minute you spend on a vendor comparison grid is wasted. Download Fazm, write one skill in English, and watch it read the same accessibility tree your screen reader reads. The perception layer was the only thing that was going to matter in a year anyway.
Want help mapping one of your processes to a skill file?
15 minutes with the founder. We will walk through one workflow and decide whether the tree path covers it or whether you genuinely need a BPA company.
Book a call →
FAQ
What actually differentiates one business process automation company from another?
On every top-10 SERP listicle for this keyword (First Page Sage, codedistrict, Fortune Business Insights, Gartner Peer Insights, Appian) the vendors get compared on the same five axes: RPA, OCR, process mining, agent orchestration, and connectors. That is a feature grid, not an architecture decision. The variable that actually predicts whether an automation will still work in six months is the perception layer. Either the agent reads the UI through the operating system's accessibility tree (structured, semantic, stable across UI refreshes) or it reads through pixels (OCR, template matching, coordinate guessing). Every listicle skips that distinction because every vendor defaults to the pixel path.
Why does the perception layer matter more than the feature checklist?
Because every other variable is fixable by buying more of something. Connectors you can add. Agent orchestration you can reconfigure. Dashboards you can rebuild. But a pixel-based automation breaks on a dark-mode toggle, a Retina-to-standard resolution change, a button renamed without its role changing, or a padding tweak in the next app update. When it breaks, you are back to the RPA vendor for another statement of work. Accessibility-tree automation does not care about those changes because it reads roles (AXButton, AXTextField) and labels, not coordinates and colors.
Which BPA companies fall into which bucket?
Enterprise RPA vendors (UiPath, Automation Anywhere, Blue Prism, Pega, Nintex, Power Automate Desktop) all started with pixel-based desktop recorders and image matching. They have added accessibility support over time, but their default recording path for a desktop app is still screenshot capture plus OCR. Integration-first platforms (Workato, Zapier) avoid the UI entirely by going API-to-API, which is cleaner but only works when every tool in the chain has an API. Fazm is the other pattern: a consumer-grade Mac app that uses the macOS accessibility tree for every app, native or Electron, with no scripting and no IT. Zapier wins when every step has an API. Fazm wins when one of them does not.
What is the concrete mechanism Fazm uses?
Fazm ships an MCP server written in Swift called `mcp-server-macos-use` that gets bundled into `Contents/MacOS/mcp-server-macos-use` inside the signed Fazm app. When the model needs to look at an app, it calls `macos-use_refresh_traversal` with a PID. That tool calls `AXUIElementCreateApplication(pid)`, walks `AXWindows` and `kAXChildrenAttribute` down to every node, and returns a flat list of elements filtered through an `interactiveRolePrefixes` whitelist of twelve AX roles. Each element carries a role, a text label, and window-relative coordinates. The click tool takes x and y and dispatches a `CGEvent`. No screenshot is in the loop unless the model explicitly asks for one.
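The last step, turning an element's coordinates into a click, might look like the sketch below. It is not the server's own click tool: `screenCenter` and `click` are hypothetical helpers, and the conversion from window-relative to screen coordinates is an assumption about how the two coordinate spaces relate. The `CGEvent` initializer and `post(tap:)` are real Core Graphics API, and posting synthetic events requires the usual macOS Accessibility permission.

```swift
import CoreGraphics

/// Converts a window-relative element frame to the screen point at its center.
/// Assumes `windowOrigin` is the window's top-left corner in global display coordinates.
func screenCenter(of frame: CGRect, windowOrigin: CGPoint) -> CGPoint {
    CGPoint(x: windowOrigin.x + frame.midX, y: windowOrigin.y + frame.midY)
}

/// Posts a left mouse down/up pair at the given screen point.
/// Returns false if either event could not be created.
func click(at point: CGPoint) -> Bool {
    guard
        let down = CGEvent(mouseEventSource: nil, mouseType: .leftMouseDown,
                           mouseCursorPosition: point, mouseButton: .left),
        let up = CGEvent(mouseEventSource: nil, mouseType: .leftMouseUp,
                         mouseCursorPosition: point, mouseButton: .left)
    else { return false }
    down.post(tap: .cghidEventTap)
    up.post(tap: .cghidEventTap)
    return true
}

// Example: a 100 x 40 button at (10, 20) inside a window whose origin is (200, 300).
let target = screenCenter(of: CGRect(x: 10, y: 20, width: 100, height: 40),
                          windowOrigin: CGPoint(x: 200, y: 300))
// click(at: target) would then post the click at (260, 340).
```

The point is computed from structured data the tree already returned, which is why no screenshot is needed to decide where to click.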
Where can I verify the twelve roles?
In the open-source repo at github.com/mediar-ai/mcp-server-macos-use. File: Sources/MCPServer/main.swift, lines 916 to 920. The array is: `["AXButton", "AXLink", "AXTextField", "AXTextArea", "AXCheckBox", "AXRadioButton", "AXPopUpButton", "AXComboBox", "AXSlider", "AXMenuItem", "AXMenuButton", "AXTab"]`. The per-element timeout is set on line 253 with `AXUIElementSetMessagingTimeout(window, 5.0)`. The bundling step is in the Fazm main repo's `run.sh`, lines 212 to 220, under the comment `# Build and bundle mcp-server-macos-use`.
So should I hire a BPA company at all?
Hire one when the process lives across five or more teams, touches enterprise systems behind a VPN, has SOX or HIPAA implications, or requires a signed SOP artifact for audit. Those are discovery-heavy engagements where the value is in org-wide mapping, not in the build. Do not hire one when the process lives on one person's Mac, touches two to four native or web apps, and the operator can describe it in plain English. In that case an accessibility-tree agent reading a markdown skill file will cost less than the first week of a $200-an-hour discovery engagement.
How do Fazm skills make this actionable for a non-developer?
A Fazm skill is a markdown file with a YAML frontmatter header. No .xaml, no .atmx, no JSON flow. The agent reads the skill, queries the live accessibility tree of the target app, and executes. The installer in the desktop source ships example skills (`Resources/BundledSkills/*.skill.md`) like `telegram.skill.md` and `google-workspace-setup.skill.md` that demonstrate patterns: read a notification, query a SQLite DB, reply. You author a skill the same way you write a runbook.
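To make "a markdown file with a YAML frontmatter header" concrete, here is a rough sketch of splitting such a file into its header and body. The field names (`name`, `description`) and the sample content are invented for illustration; the actual schema of a Fazm skill file is not shown in this article. The parser handles only flat `key: value` lines, not full YAML.

```swift
import Foundation

/// Hypothetical in-memory form of a skill file.
struct SkillFile: Equatable {
    let frontmatter: [String: String]
    let body: String
}

/// Splits a document into flat `key: value` frontmatter and a markdown body.
/// Returns nil when the text does not start with a `---` delimited header.
func parseSkill(_ text: String) -> SkillFile? {
    let lines = text.components(separatedBy: "\n")
    guard lines.first == "---",
          let end = lines.dropFirst().firstIndex(of: "---") else { return nil }
    var meta: [String: String] = [:]
    for line in lines[1..<end] {
        guard let colon = line.firstIndex(of: ":") else { continue }
        let key = line[..<colon].trimmingCharacters(in: .whitespaces)
        let value = line[line.index(after: colon)...].trimmingCharacters(in: .whitespaces)
        meta[key] = value
    }
    let body = lines[(end + 1)...].joined(separator: "\n")
    return SkillFile(frontmatter: meta, body: body)
}

// Invented sample; the field names are assumptions, not the Fazm schema.
let sampleSkill = """
---
name: weekly-report
description: Compile the weekly numbers and email them
---
Open Numbers, copy the totals row, and paste it into a new Mail draft.
"""

if let skill = parseSkill(sampleSkill) {
    print(skill.frontmatter["name"] ?? "unnamed")  // weekly-report
    print(skill.body)
}
```

The body stays plain prose, which is the part an operator would read, edit, and diff like a runbook.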
Is the macOS accessibility tree fast enough for real automation?
Yes. On a developer's M-series Mac, a full app traversal runs in well under a second for most applications. The MCP server caps every element call at five seconds with `AXUIElementSetMessagingTimeout` so a slow app never blocks the agent; it just returns what it has. For the operator the practical floor is the model latency, not the AX tree. Compared to sending a 2 MB PNG per step to a vision model, AX traversal is an order of magnitude cheaper per step.
What about Windows or the browser?
Fazm is macOS only today. For the browser it uses a separate MCP server (`@playwright/mcp`) that hooks into the user's actual running Chrome via an extension, so browser automations also work through the DOM rather than through screenshots. If your business process is Windows-only, you are still in enterprise RPA territory and the vendor list is shorter.
Is Fazm open source?
The desktop app is MIT-licensed at github.com/mediar-ai/fazm. The accessibility MCP server is open at github.com/mediar-ai/mcp-server-macos-use. The SDK that does the actual tree walks (`MacosUseSDK`) is a separate Swift package. Everything in this article is verifiable in public code.