The real advantage of business process automation is the 80% of work that your BPA tool cannot see.

Every top guide lists the same six benefits: cost, accuracy, compliance, productivity, scale, standardization. They are all real, and they all describe the same thing: automating the slice of work that already lives in a web form. This page is about the advantage nobody writes about, because nobody else can touch it. The desktop apps. The native dialogs. The Finder windows. The legacy GUIs with no API. Read on.

Matthew Diakonov, Fazm

Published April 20, 2026 · 9 min read

4.9 from macOS agent, April 2026 build

Bundles a Swift MCP server for native accessibility

Reads AXUIElement trees, not screenshots

Drives any Mac app, not just the browser

BPA that reads the OS

Not screenshots. Not screen recordings. Not OCR.

Web-only BPA automates your forms.

Fazm also automates your desktop apps.

It reads the macOS accessibility tree.

AXUIElement, kAXChildrenAttribute, kAXRoleAttribute.

The same structure VoiceOver uses.


What the top 5 guides all say

I pulled the top five organic results for this keyword and read them end to end. Every one of them centers the same benefit list. It is a completely fair list. It is also a completely incomplete list.

Feature	What every top guide covers	What is missing
Cost savings	Processing costs drop up to 50%. Save 240+ hours per employee per year.	Only counted against SaaS work. Time lost on desktop apps never shows up because it never gets measured.
Accuracy	Automated systems do not get tired. Each form entry is identical. Fewer typos in the CRM.	Screenshot-based automation introduces its own errors: pixel drift after OS updates, OCR misreads, stale templates.
Compliance and audit trails	Workflow runs are logged. Every step has an owner. Auditors can trace approvals.	Only the web portion is traced. The desktop steps your team still does by hand vanish from the audit log.
Employee productivity	Staff move off repetitive tasks and onto strategic work.	Not if the repetitive task is in your accounting desktop client that no BPA vendor integrates with.
Scalability	Add volume without adding headcount.	Unless the bottleneck is a native app dialog a human still has to click, which it usually is.
Standardization	Every run follows the same rules.	Only across the 20% of your stack that is already a SaaS form. The rest still drifts by person.

None of the top five guides mention the accessibility tree. None of them mention desktop apps. None of them mention native automation APIs. That is the gap.

The anchor fact, in one paragraph

Open the Fazm app bundle on your Mac: right-click the app icon in Applications, choose Show Package Contents, then open Contents/MacOS. Next to the Fazm binary is an executable named mcp-server-macos-use. That is a Swift MCP server bundled with the app. The ACP bridge registers it at startup, and it is how Fazm controls native macOS apps.
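For a sense of what that registration amounts to, here is a minimal TypeScript sketch. It is not the code from acp-bridge/src/index.ts. The McpServerEntry shape, the bundledServerEntry helper, the server name macos-use, and the Fazm.app bundle path are assumptions made for illustration. Only the binary name and its location inside the bundle come from the text above.

```typescript
// Hypothetical description of one MCP server a bridge could launch over stdio.
// This shape is an assumption for illustration, not Fazm's actual type.
interface McpServerEntry {
  name: string;                 // label the agent uses to address the server
  command: string;              // absolute path to the executable
  args: string[];               // extra command-line arguments, none assumed here
  env: Record<string, string>;  // extra environment variables, none assumed here
}

// Build the entry for the server that ships inside the app bundle.
// Only "Contents/MacOS/mcp-server-macos-use" is taken from the article.
function bundledServerEntry(appBundlePath: string): McpServerEntry {
  const trimmed = appBundlePath.replace(/\/+$/, ""); // drop trailing slashes
  return {
    name: "macos-use", // assumed from the "macos-use_" tool-name prefix
    command: `${trimmed}/Contents/MacOS/mcp-server-macos-use`,
    args: [],
    env: {},
  };
}

// Assumed install location; adjust if the app lives elsewhere.
const entry = bundledServerEntry("/Applications/Fazm.app");
console.log(entry.command);
```

The point of the sketch is that nothing is downloaded or configured by hand: the path is derived from the bundle itself, so the server version always matches the app version.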

acp-bridge/src/index.ts (lines 63, 1056-1063)

The server itself is a thin wrapper around MacosUseSDK, which does the actual accessibility work. The code that reads a running app's window tree looks like this:

mcp-server-macos-use/Sources/MCPServer/main.swift

No CGWindowListCreateImage. No OCR. No vision model ingesting pixels. The LLM reads a structured tree of roles, labels, positions, and values, the same tree VoiceOver reads when a blind user navigates your Mac.
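To make "a structured tree of roles, labels, positions, and values" concrete, here is a small TypeScript sketch of what the model gets to work with. The AXNode shape and the findElement and clickPoint helpers are hypothetical and simplified; they are not the MacosUseSDK output format. The role strings (AXWindow, AXButton, AXTextField) are standard macOS accessibility roles.

```typescript
// Hypothetical, simplified shape of one node in a serialized accessibility tree.
// Field names are illustrative, not the SDK's wire format.
interface AXNode {
  role: string;      // e.g. "AXButton", "AXTextField"
  title?: string;    // human-readable label, if the element has one
  value?: string;    // current value for text fields, checkboxes, and so on
  enabled: boolean;  // disabled state is explicit, not inferred from pixels
  frame: { x: number; y: number; width: number; height: number };
  children: AXNode[];
}

// Depth-first search for the first node with a matching role and title.
function findElement(root: AXNode, role: string, title: string): AXNode | undefined {
  if (root.role === role && root.title === title) return root;
  for (const child of root.children) {
    const hit = findElement(child, role, title);
    if (hit) return hit;
  }
  return undefined;
}

// Center of an element's frame, the point a click would be sent to.
function clickPoint(node: AXNode): { x: number; y: number } {
  return {
    x: node.frame.x + node.frame.width / 2,
    y: node.frame.y + node.frame.height / 2,
  };
}

// A made-up "New Bill" window with one text field and one button.
const sampleWindow: AXNode = {
  role: "AXWindow", title: "New Bill", enabled: true,
  frame: { x: 100, y: 100, width: 600, height: 400 },
  children: [
    { role: "AXTextField", title: "Vendor", value: "", enabled: true,
      frame: { x: 120, y: 160, width: 300, height: 24 }, children: [] },
    { role: "AXButton", title: "Save", enabled: true,
      frame: { x: 400, y: 300, width: 100, height: 30 }, children: [] },
  ],
};

const save = findElement(sampleWindow, "AXButton", "Save");
if (save && save.enabled) {
  console.log(clickPoint(save)); // center of the Save button: x 450, y 315
}
```

The lookup is by role and label, so it keeps working if the button moves. The coordinates are read from the tree at click time instead of being guessed from an image.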

Screenshot-based automation vs. accessibility-based automation

This is the quiet technical gap behind the quiet content gap. Almost every "AI automates your Mac" product you have seen in the last year screenshots the screen, sends the PNG to a vision model, and asks it to figure out where the button is. Fazm does not.

How the agent decides where to click

Capture the screen. Downscale to something the model can handle. Send to a vision model. Ask for coordinates. Click there. Pray the UI did not shift three pixels since the last tick.

Re-OCRs the same labels on every tick, burning tokens
Breaks on dark mode, OS updates, retina scaling changes
Cannot read disabled state, focus, scroll position, accessibility role
Image compression can turn a checkbox into a blur
Works on any OS in theory, reliable on none

What this changes about the "advantages" list

Same six benefits the other guides list. Different scope, because the agent can touch work the other tools cannot.

Cost savings, including the invisible costs

Standard BPA measures minutes saved on the forms it can already reach. The real saving, the one that never shows up in any ROI deck, is the 30 minutes a bookkeeper spends every day in a desktop accounting client that the SaaS BPA vendor does not integrate with. An agent that reads the accessibility tree reaches into that window and does the work, so the time actually disappears from the ledger.

Accuracy that survives OS updates

Accessibility contracts are stable. A button is the same button after a dark-mode toggle, a retina scaling change, or a macOS point release. Screenshot templates rot. Accessibility references do not.

Compliance that covers desktop work

Audit trails only matter if they cover the whole process. The agent logs every tool call: which app, which window, which element, which value. The desktop half of the process finally lands in the same log as the SaaS half.
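As a rough illustration of a log that records app, window, element, and value per tool call, here is a TypeScript sketch. The AuditEntry fields and the AuditLog class are hypothetical; they show the idea, not Fazm's actual log format. The tool name in the usage lines is one of the names listed later on this page.

```typescript
// Hypothetical audit record: one entry per tool call the agent makes.
interface AuditEntry {
  timestamp: string; // ISO 8601, set when the call is recorded
  tool: string;      // which tool was called
  app: string;       // which application was targeted
  window: string;    // which window inside that app
  element: string;   // which element, described by role and label
  value?: string;    // value typed or set, if any
}

class AuditLog {
  private entries: AuditEntry[] = [];

  // Stamp the call with a time and append it to the log.
  record(entry: Omit<AuditEntry, "timestamp">, now: Date = new Date()): AuditEntry {
    const full: AuditEntry = { timestamp: now.toISOString(), ...entry };
    this.entries.push(full);
    return full;
  }

  // One JSON object per line, a common format for append-only logs.
  toJsonLines(): string {
    return this.entries.map((e) => JSON.stringify(e)).join("\n");
  }
}

const log = new AuditLog();
log.record({
  tool: "macos-use_click_and_traverse",
  app: "Mail", window: "Inbox", element: "AXButton[Get Mail]",
});
log.record({
  tool: "macos-use_type_and_traverse",
  app: "Numbers", window: "Invoices", element: "AXTextField[Vendor]", value: "ACME Corp",
});
console.log(log.toJsonLines());
```

Because desktop steps and browser steps would go through the same record call, an auditor reads one file instead of stitching two together.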

Productivity gains that actually free the person

The tasks people hate most are the ones stuck inside desktop apps nobody has automated yet. Pulling those off a human is the step that actually unlocks the strategic work the benefit lists promise.

Scale that does not require a new SaaS seat

Because the agent runs under a logged-in user on a Mac, volume does not trigger per-seat license upcharges or new API quotas. The Mac is the quota.

Standardization across the whole stack

A single agent, writing to a single log, touching both Chrome and native apps, gives you one consistent representation of how a process actually runs across the whole company. Not one standard for web, a different standard for desktop, and a gray zone for the legacy client. One standard.

Who this advantage actually matters for

If everything you do is already a SaaS form, traditional BPA is fine. These are the shops where the other advantage kicks in.

0% Share of SMB workflows with at least one desktop-app step

0s AXUIElementSetMessagingTimeout per element (Fazm default)

2 automation surfaces: Chrome + native macOS, one agent

0 screenshots required on the primary code path

Workflow types this unlocks

Invoice ingestion with a desktop accounting client

Read unread invoice emails in Mail.app, extract line items, switch to the accounting desktop app, fill a new bill window, save, file the PDF in Finder. None of these steps exist in a web form.

Practice management in healthcare, legal, and accounting

Most vertical desktop tools have no usable API. The accessibility tree is often the only programmatic way in, and nearly every mainstream macOS app exposes one.

Media and design ops on Photoshop, Figma desktop, Premiere

Batch-rename layers, export assets, pull metadata into a sheet, kick to Drive. Accessibility exposes layer lists, panels, export dialogs.

Finder-heavy data handling

Rename, sort, move, compress files based on content the agent just read from another app. This is half of bookkeeping and legal workflows and gets zero coverage from standard BPA.

Hybrid Chrome + desktop flows

Scrape an ISP site in your signed-in Chrome via Playwright MCP, paste rows into Numbers via accessibility. One agent, one log.

How a single run actually looks

Rough shape of what happens when you ask Fazm to "pull today's unpaid invoices out of Mail.app, enter them in QuickBooks Desktop, and file the PDFs by vendor". This is the level of glue that SaaS BPA cannot produce.

One agent, two automation surfaces

Invoice ingestion, step by step

Every message labeled AX in the diagram is a call into AXUIElementCopyAttributeValue or its siblings. Your SaaS BPA vendor cannot make those calls, because their agent is not running on your Mac.

See the MCP server yourself

Want to verify the anchor fact before you trust any of this? It is one terminal command. Run this on your Mac after installing Fazm.

verify mcp-server-macos-use is bundled

If you want to poke at the SDK itself, the source is public at github.com/mediar-ai/MacosUseSDK. The tool names the agent calls are macos-use_open_application_and_traverse, macos-use_click_and_traverse, and macos-use_type_and_traverse. Each returns the updated accessibility tree in the response so the LLM can see what changed.
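Returning the updated tree after every action means the agent can compare it with the previous one. The TypeScript sketch below shows one simple way to do that comparison. The TreeNode shape and the flatten and diffTrees helpers are hypothetical; the SDK's real response format may differ, and nothing here is taken from its source.

```typescript
// Hypothetical, simplified node of a returned accessibility tree.
interface TreeNode {
  role: string;
  title?: string;
  value?: string;
  children: TreeNode[];
}

// Flatten a tree into "path -> value" so two snapshots can be compared.
// Limitation of this sketch: siblings with the same role and title share a key.
function flatten(node: TreeNode, path: string = "", out: Record<string, string> = {}): Record<string, string> {
  const here = `${path}/${node.role}[${node.title ?? ""}]`;
  out[here] = node.value ?? "";
  node.children.forEach((child) => flatten(child, here, out));
  return out;
}

// List elements that were added, removed, or whose value changed.
function diffTrees(before: TreeNode, after: TreeNode): string[] {
  const a = flatten(before);
  const b = flatten(after);
  const changes: string[] = [];
  Object.keys(b).forEach((key) => {
    if (a[key] === undefined) changes.push(`added ${key}`);
    else if (a[key] !== b[key]) changes.push(`changed ${key}: "${a[key]}" -> "${b[key]}"`);
  });
  Object.keys(a).forEach((key) => {
    if (b[key] === undefined) changes.push(`removed ${key}`);
  });
  return changes;
}

// Made-up snapshots taken before and after typing a vendor name.
const beforeTyping: TreeNode = {
  role: "AXWindow", title: "New Bill", children: [
    { role: "AXTextField", title: "Vendor", value: "", children: [] },
  ],
};
const afterTyping: TreeNode = {
  role: "AXWindow", title: "New Bill", children: [
    { role: "AXTextField", title: "Vendor", value: "ACME Corp", children: [] },
    { role: "AXButton", title: "Save", children: [] },
  ],
};

diffTrees(beforeTyping, afterTyping).forEach((line) => console.log(line));
```

A diff like this is text, so confirming that a field took its value costs a string comparison, not another image sent to a vision model.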

100% native apps

“The accessibility API is the one stable contract across every mainstream macOS app. Screen readers rely on it. That makes it the only sane automation surface that covers both a modern Electron app and a 15-year-old Cocoa desktop client.”

From the MacosUseSDK README

What a good BPA benefit list should actually include

Works on apps that never had an API
Does not re-OCR the screen on every tick
Survives OS and app updates without template repair
Runs under your login, so private data does not leave the Mac
Reads the same tree VoiceOver reads, so coverage is near universal
One agent, both Chrome and native macOS, one audit log

Works with the same apps you already use

Mail

Numbers

Excel

QuickBooks Desktop

Photoshop

Figma desktop

Finder

Preview

Safari

Chrome

Notion

Slack

Zoom

Xero Desktop

Salesforce Mac

AutoCAD viewer

Any macOS app that exposes standard NSAccessibility or AXUIElement attributes works. Almost all of them do, because screen readers require it.

The short version

The standard "advantages of business process automation" list is correct but narrow. Traditional BPA automates the slice of your work that already lives in a web form. The bigger advantage, the one every other top guide ignores, is automating the desktop and native work that makes up the rest of the day. That only becomes possible when the agent reads the OS accessibility tree instead of screenshots, and that is the exact thing Fazm ships inside Contents/MacOS/mcp-server-macos-use. Everything else in the benefit list is a consequence of that one choice.

Run your first accessibility-driven workflow with a human on the call

Bring one process that currently ends in a desktop app. We will automate a round trip live and show you the audit log.

Book a call →

Frequently asked questions

Why do most business process automation guides feel the same?

Because almost every top result (HighGear, IBM, Microsoft Power Automate, NetSuite, Kissflow) scopes BPA to SaaS-to-SaaS orchestration: approvals, invoice routing, CRM updates, data pushed between web systems. They list the same six advantages (cost, accuracy, compliance, productivity, scale, standardization). None of them talk about the apps that never had an API: your accounting software, your CAD tool, your practice management system, your Finder, the legacy desktop client your team still runs. That is the half of the work that most BPA tools silently skip.

What is different about a local Mac agent like Fazm?

Fazm registers a Swift MCP server, bundled inside the app at Contents/MacOS/mcp-server-macos-use, that reads the macOS accessibility tree directly. It calls AXUIElementCreateApplication(pid) and AXUIElementCopyAttributeValue(appElement, kAXChildrenAttribute, ...) to pull a structured list of windows, buttons, text fields, and labels from any running app. No screenshot OCR, no pixel guessing, no brittle image matching. The LLM sees the same structure the OS gives a screen reader.

How do I verify the accessibility approach myself?

Open the Fazm app bundle: right-click Fazm in /Applications, Show Package Contents, then Contents/MacOS. You will see an executable named mcp-server-macos-use next to the Fazm binary. That is the Swift MCP server the ACP bridge registers at acp-bridge/src/index.ts, lines 1056 to 1063. If you want to read the source, the SDK it depends on, MacosUseSDK, is open source at github.com/mediar-ai/MacosUseSDK.

Why does screenshot-driven automation keep breaking?

Three reasons. First, pixel matching is fragile: a macOS version bump, a dark-mode toggle, a resolution change, or a minor UI redesign blows up every template. Second, models reading screenshots have to re-OCR every frame to know what a button says, which costs tokens and adds latency. Third, screenshots are lossy: invisible state (accessibility labels, accessibility roles, enabled/disabled flags, scroll offsets) disappears in a PNG. Reading the accessibility tree preserves all of that for free.

Does this really work on legacy desktop apps with no API?

Yes, as long as the app exposes standard NSAccessibility or AXUIElement attributes, which almost every macOS app does, because screen readers require it. That includes Excel, Word, Numbers, Photoshop, QuickBooks, Xero Desktop, AutoCAD viewers, Salesforce for Mac, Finder, Preview, Mail, Safari, Chrome, Figma desktop, Notion desktop, and basically every mainstream macOS app. If VoiceOver can announce a button, Fazm can click it.

What does Fazm do that a pure web RPA tool cannot?

A pure web RPA tool can fill forms inside Chrome. Fazm can also open Mail.app, search for a specific thread, pull an attached PDF into Preview, export it as text, switch to Numbers, paste it into a new sheet, trigger a Finder rename, then kick the file over to Google Drive. All in one workflow, all driven by the accessibility tree, without a single screenshot prompt. That is the advantage web-only BPA cannot match.

Is this safer than running browser scraping on a cloud server?

Generally yes, for two reasons. The automation runs on your Mac under your login, so data never leaves the machine unless a specific tool call sends it. And because the agent drives your real, signed-in browser through macOS accessibility APIs instead of presenting a headless browser fingerprint, logged-in sites do not flag the session as a bot. Turnstile and other captchas pass because the session comes from your own Mac and your own signed-in profile.

What are realistic use cases small businesses actually run?

Reading every unread invoice email, extracting line items, pasting them into QuickBooks Desktop, and filing the PDF in a Finder folder named after the vendor. Pulling the overnight orders from Shopify, opening the shipping label tool, printing labels, and updating the sheet. Batch-renaming product photos based on SKU. Comparing weekly bookkeeping totals between two accounting systems. Pulling status out of a legacy practice management client that has no API. All of these fail as pure web BPA; all of them work when the agent can read the accessibility tree of any app.
