Business process automation, honestly defined: the part of BPA that is really about what your software can see.

Every top-ranked definition of business process automation is basically the same sentence. Software. Repeatable processes. Multiple IT systems. Less manual work. All true. All incomplete. None of them name the one thing that decides whether a given process can actually be automated at all: the substrate the automating software is allowed to read. This page is about that missing variable, and about the layer most BPA tools never reach.

Matthew Diakonov, Written with AI

Published April 20, 2026 · 10 min read

4.9 from macOS agent, April 2026 build

Bundles a Swift MCP server at Contents/MacOS/mcp-server-macos-use

Reads the AXUIElement tree of any running Mac app

Consumer app, not a developer framework

BPA, redefined by substrate

What the automation reads is what the automation can touch.

API level: only processes with an API.

DOM level: only processes inside a browser.

Pixel level: only what a vision model can see.

Accessibility tree: any app on your Mac.

Fazm lives on the last layer.

The definition you already know

If you look up "business process automation meaning" and open the first ten results (IBM, Red Hat, TechTarget, ServiceNow, Wikipedia, SAP, Atlassian, Splunk, Indeed, Pipefy), you will see minor rewordings of the same definition. Here it is, boiled down:

10 / 10 definitions

“Business process automation is the use of software to automate complex, repeatable business processes that touch multiple IT systems, reducing manual work and human error and freeing people for higher value tasks.”

Composite of the top 10 organic results on Google, April 2026

Accurate. Useful. And also missing the part that matters most for anyone who has ever tried to automate a real small business.

The hidden assumption

Every one of those definitions describes the work. None of them describe the interface the automation actually touches. Implicit in "touches multiple IT systems" is the assumption that every system in question exposes a web API, a SaaS webhook, a SQL endpoint, or at worst a form inside a browser. That is the assumption that quietly draws a line around what BPA can reach. Re-read the definition with the following question in mind: what if the process lives in an app that has none of those things?

The SaaS half

CRM updates, invoice routing, approvals, webhooks. Lives behind an API or a browser form. This is where every standard BPA vendor plays.

The desktop half

QuickBooks Desktop, Excel workbooks on your Mac, Finder renames, Preview PDFs, Mail.app, legacy practice-management clients, local log files. No API, no DOM. This is the half most BPA definitions pretend does not exist.

The reality

In most small and mid-size businesses, the desktop half is where the mess actually lives. It is also where the hours actually go.

The gap

A definition of BPA that ignores the desktop half is a definition that lets vendors claim victory over the easy half and quietly skip the hard half.

A better definition, organized by substrate

Here is a definition that actually predicts which processes you can automate: BPA is the act of moving a step in a business process from human hands to software, where the reach of automation is bounded by the substrate the software is allowed to read. Stack the substrates from narrow to wide, and you get a picture every top-ten page skips.

The substrate stack

Layer 1 — API

Your automation calls REST, GraphQL, or gRPC endpoints. Cleanest, fastest, most reliable. Limitation: only covers processes whose underlying systems expose an API, which in most SMBs is a minority of the stack.

Layer 2 — DOM

Your automation drives a browser: Playwright, Selenium, Chrome extensions, browser-native RPA. Can reach any website. Limitation: stops at the edge of the browser. QuickBooks Desktop is not a website, and no browser driver can turn it into one.

Layer 3 — Pixels

Your automation takes screenshots and asks a vision model to reason about the pixels. Covers anything visible. Limitation: brittle, expensive, slow, lossy. A theme change or DPI change kills templates. Token costs balloon on long workflows.

Layer 4 — Accessibility tree

Your automation reads the structured tree the operating system exposes for screen readers: AXWindow, AXButton, AXTextField, AXRow, roles, frames, labels. Covers any app that supports accessibility, which on macOS is almost all of them. Robust, text-based, cheap to prompt over, preserves invisible state.
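
To make the stack concrete, here is a minimal sketch of the idea that reach is bounded by substrate. Everything in it is an assumption made for illustration: the `ProcessStep` fields, the `reachableSubstrates` helper, and the two sample steps are hypothetical and are not part of Fazm or any BPA product.

```typescript
// Hypothetical model of the four-layer substrate stack; all names are illustrative.
type Substrate = "api" | "dom" | "pixels" | "accessibility";

interface ProcessStep {
  name: string;
  hasApi: boolean;               // the underlying system exposes an API
  runsInBrowser: boolean;        // the step happens inside a web page
  visibleOnScreen: boolean;      // a vision model could see it in a screenshot
  exposesAccessibility: boolean; // the app publishes an accessibility tree
}

// Returns every substrate that can reach the step, in stack order (layer 1 first).
function reachableSubstrates(step: ProcessStep): Substrate[] {
  const layers: Array<[Substrate, boolean]> = [
    ["api", step.hasApi],
    ["dom", step.runsInBrowser],
    ["pixels", step.visibleOnScreen],
    ["accessibility", step.exposesAccessibility],
  ];
  return layers.filter(([, ok]) => ok).map(([layer]) => layer);
}

// Assumed example: a SaaS CRM update is reachable on every layer.
const crmUpdate: ProcessStep = {
  name: "Update CRM record",
  hasApi: true,
  runsInBrowser: true,
  visibleOnScreen: true,
  exposesAccessibility: true,
};

// Assumed example: a desktop export has no API and no DOM.
const desktopExport: ProcessStep = {
  name: "Export report from a desktop accounting app",
  hasApi: false,
  runsInBrowser: false,
  visibleOnScreen: true,
  exposesAccessibility: true,
};

console.log(reachableSubstrates(crmUpdate));
console.log(reachableSubstrates(desktopExport));
```

A tool that only speaks the first two layers returns an empty answer for the second step, which is the desktop half in one line.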

What each substrate actually looks like

The difference is not academic. Here is the raw signal a model would receive when you ask it to click a button labeled "Save" in the same QuickBooks window, on each substrate.

layer-1-api.http

layer-3-pixels.prompt

layer-4-accessibility.txt
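
As a rough illustration of the accessibility-tree signal, here is a minimal sketch that renders a small tree as indented text. The `AXNode` shape, the `renderTree` helper, and the sample window are assumptions made for this example; the real output format of the Fazm server is not shown on this page.

```typescript
// Hypothetical, simplified shape of one accessibility element.
interface AXNode {
  role: string; // e.g. "AXWindow", "AXButton"
  title?: string;
  enabled?: boolean;
  frame?: { x: number; y: number; w: number; h: number };
  children?: AXNode[];
}

// Depth-first walk that renders one indented line per element.
function renderTree(node: AXNode, depth = 0): string[] {
  const parts: string[] = [node.role];
  if (node.title !== undefined) parts.push(`"${node.title}"`);
  if (node.enabled !== undefined) parts.push(node.enabled ? "enabled" : "disabled");
  if (node.frame) {
    parts.push(`frame=(${node.frame.x},${node.frame.y},${node.frame.w},${node.frame.h})`);
  }
  const lines: string[] = ["  ".repeat(depth) + parts.join(" ")];
  for (const child of node.children ?? []) {
    lines.push(...renderTree(child, depth + 1));
  }
  return lines;
}

// Assumed sample window; titles and coordinates are made up.
const invoiceWindow: AXNode = {
  role: "AXWindow",
  title: "Create Invoice",
  children: [
    { role: "AXTextField", title: "Customer", enabled: true },
    { role: "AXButton", title: "Save", enabled: false, frame: { x: 640, y: 480, w: 80, h: 24 } },
  ],
};

console.log(renderTree(invoiceWindow).join("\n"));
```

The point of the text form is that a model can read "AXButton Save disabled" directly, instead of inferring a greyed-out button from pixels.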

The anchor fact: a 1,917-line Swift MCP server, bundled

This is the part of the story every BPA definition leaves out. Fazm does not bolt accessibility on as a future feature. It bundles a native Swift MCP server inside the app bundle, and the server is the thing the agent uses on every turn to see your screen. You can verify it yourself without running a single command.

The binary lives at Fazm.app/Contents/MacOS/mcp-server-macos-use, next to the main Fazm executable. Its source is about 1,917 lines of Swift that call accessibility APIs directly. The ACP bridge registers it under the MCP name macos-use at acp-bridge/src/index.ts, lines 1056 to 1063, so the chat agent can call it just like any other tool.

Fazm.app/Contents/MacOS/mcp-server-macos-use (Swift source, excerpt)

How Fazm actually sees your screen

When you ask Fazm to do something in a Mac app, this is the chain of calls that happens behind the scenes. No screenshot is sent to the model unless one is genuinely required (for a chart image, a PDF preview, etc.). Everything else is plain, structured text.

One turn of an accessibility-tree agent
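
A single turn can be sketched as read, ask, validate, act. The code below is a simplified model under stated assumptions: `UIElement`, `Action`, `runTurn`, and the `fakeModel` stand-in are hypothetical names, and the real tool calls and payloads used by Fazm are not reproduced here.

```typescript
// Hypothetical types; real tool names and payloads are not shown in this article.
interface UIElement {
  id: number;
  role: string;
  title: string;
  enabled: boolean;
}
type Action = { kind: "click"; elementId: number } | { kind: "done" };
type Model = (task: string, screenText: string) => Action;

// Serialize the elements as plain text, one line each, for the model prompt.
function toScreenText(elements: UIElement[]): string {
  return elements
    .map((e) => `[${e.id}] ${e.role} "${e.title}" ${e.enabled ? "enabled" : "disabled"}`)
    .join("\n");
}

// One turn: read the tree, ask the model, validate the action against the tree.
function runTurn(task: string, elements: UIElement[], model: Model): string {
  const action = model(task, toScreenText(elements));
  if (action.kind === "done") return "done";
  const wantedId = action.elementId;
  const target = elements.find((e) => e.id === wantedId);
  if (!target) return "error: unknown element";
  if (!target.enabled) return `error: "${target.title}" is disabled`;
  return `clicked "${target.title}"`; // a real agent would perform the click here
}

// Stand-in for the LLM: picks the first line whose title matches the task.
const fakeModel: Model = (task, screenText) => {
  const wanted = task.replace(/^click /i, "");
  for (const line of screenText.split("\n")) {
    const match = /^\[(\d+)\]/.exec(line);
    if (match && line.includes(`"${wanted}"`)) {
      return { kind: "click", elementId: Number(match[1]) };
    }
  }
  return { kind: "done" };
};

const sampleScreen: UIElement[] = [
  { id: 1, role: "AXTextField", title: "Customer", enabled: true },
  { id: 2, role: "AXButton", title: "Save", enabled: true },
];

console.log(runTurn("click Save", sampleScreen, fakeModel));
```

Note that the model only ever receives text in this sketch. The validation step after the model call is a design choice: the agent checks the proposed action against the tree it just read rather than trusting the model's output blindly.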

What the substrate actually buys you

Swapping substrates is not a stylistic choice. It changes what is automatable, what it costs to automate, and how often the automation breaks on a Tuesday morning when someone bumps the font size. Here is the rough picture on a typical small-business workflow.

About 1,917 lines of Swift in the MCP server bundled with Fazm

0 px of screenshots required for a typical AX turn

4 layers in the substrate stack

Almost all Mac apps expose AX for VoiceOver

Feature	Typical BPA / RPA tool	Accessibility-tree BPA (Fazm)
Reaches desktop apps with no API	Only with brittle screen-scraping recorders	Yes, through native AXUIElement APIs
Handles theme, font, or DPI changes	Pixel templates break; rerecord required	Unaffected; the AX tree is independent of render
Knows if a button is disabled or focused	No, that state is invisible in a PNG	Yes, AX exposes enabled / focused / role
Token cost per turn	High (image tokens dominate)	Low (plain-text tree, grep-friendly)
Authoring model	Studio, workflow YAML, bot licenses	Natural language in a consumer Mac app
Runs where	Cloud VM or Windows bot runner	Your Mac, your login, your cookies
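
The rows about render changes and disabled state can be illustrated with a small lookup. This is a sketch, assuming a simplified `AXNode` shape and a hypothetical `findElement` helper; the two snapshots are invented to mimic the same window before and after a font-size bump.

```typescript
// Hypothetical, simplified element shape for this illustration.
interface AXNode {
  role: string;
  title?: string;
  enabled?: boolean;
  frame: { x: number; y: number; w: number; h: number };
  children?: AXNode[];
}

// Find the first element matching role and title, ignoring where it is drawn.
function findElement(root: AXNode, role: string, title: string): AXNode | undefined {
  if (root.role === role && root.title === title) return root;
  for (const child of root.children ?? []) {
    const hit = findElement(child, role, title);
    if (hit) return hit;
  }
  return undefined;
}

// Assumed snapshot before the font size changes.
const beforeResize: AXNode = {
  role: "AXWindow",
  title: "Create Invoice",
  frame: { x: 0, y: 0, w: 800, h: 600 },
  children: [
    { role: "AXButton", title: "Save", enabled: false, frame: { x: 640, y: 520, w: 80, h: 24 } },
  ],
};

// Assumed snapshot after: the button moved and grew, its role and title did not.
const afterResize: AXNode = {
  role: "AXWindow",
  title: "Create Invoice",
  frame: { x: 0, y: 0, w: 800, h: 600 },
  children: [
    { role: "AXButton", title: "Save", enabled: false, frame: { x: 610, y: 505, w: 110, h: 32 } },
  ],
};

const saveBefore = findElement(beforeResize, "AXButton", "Save");
const saveAfter = findElement(afterResize, "AXButton", "Save");
console.log(saveBefore?.enabled, saveAfter?.enabled);
```

The same query succeeds on both snapshots and reports the disabled flag each time, whereas a pixel template recorded against the first layout would have nothing to match in the second.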

Where the desktop-half processes live

If the definition of BPA is broad enough to include desktop apps, the surface area explodes. Here is a non-exhaustive sample of the apps a Mac agent can drive through the accessibility tree, each one opaque to web-only BPA.

Finder, Mail.app, Numbers, Excel for Mac, Word, Preview, Keynote, QuickBooks Desktop, Xero Desktop, Safari, Chrome, Figma desktop, Notion desktop, Slack desktop, Calendar, Reminders, Photos, iMessage, Terminal, Xcode

Why one local agent beats N cloud connectors

A 60-second definition you can actually use

If you are writing an RFP or picking a tool, forget the textbook sentence. Use this checklist instead. Anything that misses layer 4 is only automating half of your business, no matter how much it charges.

A working 2026 definition of BPA

Moves a step in a business process from human hands to software.
Reaches every layer of substrate where those steps actually live: API, DOM, pixels, and accessibility tree.
Does not require every system to expose an API before it can be touched.
Treats accessibility-tree automation as a first-class substrate, not an afterthought.
Runs locally under the user's own login when the data should not leave the machine.
Is authorable by the person who runs the process, not only by a developer.

What a real run looks like in the terminal

The MCP tool calls that produce the screen-context string return a plain text file with the AX tree, plus a summary. This is what you see if you tail the dev logs on a typical workflow.

fazm-dev.log
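
For a sense of what a tree summary could contain, here is a minimal sketch that counts elements by role. The `summarize` and `formatSummary` helpers and the output format are assumptions made for this example, not the actual Fazm log format.

```typescript
// Hypothetical, minimal element shape: only what the summary needs.
interface AXNode {
  role: string;
  children?: AXNode[];
}

interface TreeSummary {
  total: number;
  byRole: Record<string, number>;
}

// Iterative walk over the tree that counts elements per role.
function summarize(root: AXNode): TreeSummary {
  const summary: TreeSummary = { total: 0, byRole: {} };
  const stack: AXNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as AXNode;
    summary.total += 1;
    summary.byRole[node.role] = (summary.byRole[node.role] ?? 0) + 1;
    for (const child of node.children ?? []) stack.push(child);
  }
  return summary;
}

// Render the summary as one line, with roles sorted for stable output.
function formatSummary(summary: TreeSummary): string {
  const roles = Object.keys(summary.byRole)
    .sort()
    .map((role) => `${role}=${summary.byRole[role]}`)
    .join(", ");
  return `${summary.total} elements (${roles})`;
}

// Assumed sample tree: one window, one text field, two buttons.
const sampleTree: AXNode = {
  role: "AXWindow",
  children: [{ role: "AXTextField" }, { role: "AXButton" }, { role: "AXButton" }],
};

console.log(formatSummary(summarize(sampleTree)));
```

A one-line summary like this lets the agent decide whether the full tree is worth reading before spending tokens on it.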

Why this is a consumer app, not a developer framework

The last assumption the canonical BPA definition hides is that automation is someone else's job. IT. The RPA team. A consultant. It is a fine assumption at enterprise scale and a bad one everywhere else. Fazm inverts it: the person who actually runs the process is the person who describes it, in plain English, in a chat box on their Mac. The MCP server is bundled. The accessibility walker is bundled. Permissions take one click. The tool meets the user at their actual skill level: normal.

The practical consequence: you can take a process that lives in, say, QuickBooks Desktop plus Mail plus a folder of PDFs in Finder, and turn it into a one-sentence workflow without writing code, without picking connectors, and without waiting for the vendor to ship an API.

Want to see your desktop-half process automated?

Book a call. We will walk through the steps that live in your Mac apps and show how the accessibility tree turns them into a single prompt.

FAQ on the meaning of business process automation

What is the standard textbook definition of business process automation?

The shared definition across IBM, Red Hat, TechTarget, ServiceNow, Wikipedia, SAP, Atlassian, Splunk, Indeed and Pipefy is roughly the same sentence: BPA is the use of software to automate complex, repeatable business processes that touch multiple IT systems, with the goal of reducing manual work, cutting errors, and freeing people for higher value work. It is a correct definition. It also hides a big assumption: that every step already lives inside a system with an API or a web form.

What is missing from that definition?

The substrate. None of the top 10 definitions say anything about what the automation actually reads to see a screen. That matters, because the substrate is what sets the ceiling. An API-driven BPA tool can only touch processes that expose an API. A DOM-scraping BPA tool can only touch processes inside a browser. A screenshot-driven agent can only touch what its vision model can see well enough to click. A BPA tool that reads the operating system accessibility tree can touch basically any app on the machine.

So what is a more honest 2026 definition?

Business process automation is the act of moving a step in a business process from a human's hands to software, where the reach of automation is bounded by the substrate the automating software is allowed to read. BPA is not one thing. It is a stack, layered by substrate: API-level, DOM-level, pixel-level, and accessibility-tree-level. Each layer covers more surface than the one below it. Most vendors only sit on one layer.

How does Fazm fit that definition?

Fazm is a consumer Mac app that ships a Swift MCP server inside its application bundle, at Contents/MacOS/mcp-server-macos-use. That binary calls AXUIElementCreateApplication(pid) and then AXUIElementCopyAttributeValue with attributes such as kAXWindowsAttribute, kAXMainWindowAttribute, kAXChildrenAttribute, and kAXRoleAttribute, and it checks for the AXSheet role, to walk the accessibility tree of any running Mac app. The LLM sees the same structured tree VoiceOver uses. There is no screenshot OCR, no image template matching, no headless browser. If an app exposes accessibility (almost every Mac app does), Fazm can drive it.

How can I verify the accessibility-tree claim myself?

Right-click Fazm in /Applications, choose Show Package Contents, open Contents/MacOS. You will see an executable named mcp-server-macos-use sitting next to the Fazm binary. That is the Swift MCP server. The source behind it is approximately 1,917 lines, centered on AX calls like AXUIElementCopyAttributeValue(window, kAXChildrenAttribute as CFString, &children). The ACP bridge registers it under the MCP name 'macos-use' in acp-bridge/src/index.ts, lines 1056 to 1063.

Why is screenshot-based automation weaker than accessibility-tree automation?

Three reasons. First, screenshots require a vision model to re-interpret the screen on every turn, which burns tokens and time. Second, pixel matching is brittle: a theme change, a font change, a DPI change, a macOS version bump, and the template is dead. Third, screenshots are lossy. An accessibility tree preserves invisible state like roles, enabled/disabled flags, screen-reader labels, scroll offsets, and sheet detection. A PNG does not.

Is Fazm a developer framework or a consumer app?

Consumer app. You download a signed, notarized DMG, drag it into /Applications, grant Accessibility and Screen Recording permissions once, and then describe the workflow you want in plain English. There is no SDK to learn, no YAML to write, no workflow studio. The MCP server is bundled. The accessibility walker is bundled. The LLM is wired in. You just talk to it and watch it drive your apps.

Which kinds of business processes does an accessibility-tree tool unlock that a web-only BPA tool cannot?

Anything that currently ends with a human clicking through a Mac app. Export a report out of QuickBooks Desktop, rename a folder in Finder based on content, pull a row out of an Excel sheet and paste it into Mail, reconcile a Numbers workbook against a web dashboard, process a batch of PDFs in Preview, trigger an action inside a legacy practice-management client that has no API, watch a local log file and alert in Slack when something fires. Every one of those is a normal small-business BPA ask. None of them work cleanly on web-only BPA platforms.

How does this compare to RPA tools like UiPath or Automation Anywhere?

Traditional RPA also touches the desktop, but it is built for enterprise IT: heavy Windows-first installers, studio-based workflow authoring, bot licenses, and screen-scraping recorders. Fazm is the consumer-Mac inversion of that: no studio, no bot licenses, no screen recorders, no enterprise onboarding. You describe the task in natural language and it walks the same accessibility tree a screen reader would walk, which is more robust than the image-based approach most RPA tools still fall back to on non-standard UIs.

Is it safe to run this kind of automation locally?

Running on your own Mac, under your own login, is usually safer than handing cookies and credentials to a cloud RPA bot. Your data does not leave the machine unless a specific tool call sends it. The browser fingerprint is your real Chrome, so logged-in sites do not flag you as a bot. Turnstile and captchas pass because the session is the session. The main thing to be careful of is the usual: review the first run of any new workflow before letting it run unattended, especially anything that writes or sends.
