Business process automation examples

Grouped by UI layer, not by industry

Layer 3 covered with real Swift + TS source

Examples of business process automation, organized by the UI layer the target app lives in

Every other 2026 list of BPA examples groups them by department or by vendor. That hides the only dimension that decides whether you can actually build the thing: the UI layer the target app exposes. Map every example to one of three layers (API-native, browser-UI, or desktop-only) and the feasibility map stops lying. This guide spends most of its word count on the third layer, because that is where most real Mac small-business work runs, and it is where every other listicle goes quiet.

Matthew Diakonov, Written with AI

Published April 18, 2026 · 14 min read

Six accessibility-tree primitives cover every Layer 3 example on this page

Tree lines look like [AXButton (button)] "Open" x:680 y:520 w:80 h:30 visible, dumped to /tmp/macos-use

mcp-server-macos-use binary wired at acp-bridge/src/index.ts line 1057 inside Fazm

BPA examples usually stop at Layer 1

The interesting work lives down in Layer 3

Layer 1: target has an API — covered by every listicle

Layer 2: target has a web UI — partly covered

Layer 3: target is a desktop app with no API — almost never covered

Accessibility-tree automation is the only honest answer for Layer 3

Six primitives cover every Layer 3 example you will ever see

6 primitives

“The anchor fact this page is built around: every Layer 3 action on a Mac reduces to one of six primitives (open, click, type, press, scroll, refresh), each dumps a structured accessibility tree to /tmp/macos-use/ where every line has the format [AXButton (button)] "Open" x:680 y:520 w:80 h:30 visible, and the server that provides them is wired into the Fazm session as mcp-server-macos-use at acp-bridge/src/index.ts line 1057. Six tools. One dump format. That covers Numbers, Keynote, Preview, Mail, Finder, Outlook, QuickBooks Desktop, Figma desktop, Xcode, and the other forty-plus Mac business apps that ship no public API.”

mcp-server-macos-use llms.txt + acp-bridge/src/index.ts L1057

The three UI layers every BPA example actually sits in

Pick any BPA example from any 2026 listicle. It maps to exactly one of three layers, determined by what automation surface the target app exposes. The layer decides everything: which vendor can implement it, which tool surface it uses, which failure modes it hits. Department and industry are downstream of this, not upstream.

The decision tree for any BPA example

API or connector?

Layer 1. Use Zapier, Make, n8n, Workato, Power Automate Cloud, or direct SDKs. Rate limits and auth are the main failure modes. ~25% of daily business apps fit.

Web UI, no API?

Layer 2. Playwright, Selenium, browser-RPA. Cloudflare and Turnstile are the main failure modes; both evaporate when the automation attaches to a real logged-in Chrome. ~40-50% of daily business apps fit.

Accessibility tree?

Layer 3. Accessibility-API driver (macOS: AXUIElement via mcp-server-macos-use). Six primitives cover every action. Brittle only to raw Canvas/WebGL, rare in business software.

Pixel fallback

Last resort. Pixel matching and vision models. Brittle to theme, scale, and UI refreshes. Avoid when the tree is available; every Mac business app this page discusses exposes the tree.
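The four questions above can be read as an ordered check. The sketch below is only an illustration of that ordering: the `TargetApp` shape, its field names, and the layer labels are hypothetical and do not come from Fazm or any vendor API.

```typescript
// Hypothetical description of a target app; field names are illustrative only.
interface TargetApp {
  appName: string;
  hasApiOrConnector: boolean;        // documented API or iPaaS connector
  hasWebUi: boolean;                 // reachable and drivable in a browser
  exposesAccessibilityTree: boolean; // OS accessibility tree is populated
}

type Layer =
  | "layer1-api"
  | "layer2-browser"
  | "layer3-accessibility"
  | "pixel-fallback";

// Walk the questions in the same order as the decision tree and
// return the first layer that applies.
function classifyLayer(app: TargetApp): Layer {
  if (app.hasApiOrConnector) return "layer1-api";
  if (app.hasWebUi) return "layer2-browser";
  if (app.exposesAccessibilityTree) return "layer3-accessibility";
  return "pixel-fallback";
}

// Example: a desktop-only accounting app with no API and no web UI.
const quickBooksDesktop: TargetApp = {
  appName: "QuickBooks Desktop",
  hasApiOrConnector: false,
  hasWebUi: false,
  exposesAccessibilityTree: true,
};
const quickBooksLayer = classifyLayer(quickBooksDesktop);
```

The order matters: an app that has both an API and a desktop client lands in Layer 1, which matches the rule that the accessibility tree is a fallback for apps without an endpoint.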

Three layers, side by side

Each card is a feasibility claim, not a marketing claim. Every sentence is something you can check against a real app you already own.

Layer 1 — API-native

The target app ships a documented REST or GraphQL API, or a vendor-supported connector in Zapier, Make, n8n, Workato, Power Automate Cloud. This is where every '10 business process automation examples' article spends 100% of its word count: new Stripe charge triggers a Slack message, new Typeform submission writes a Google Sheet row, new Salesforce opportunity posts to #sales. High reliability, vendor-maintained connectors, rate limits as the main failure mode.

Layer 2 — Browser UI

The target is a web app with no public API, or with an API so incomplete that the web UI is the only realistic integration surface. Legacy internal admin tools, bank dashboards, state tax portals, municipal permit systems, supplier ordering portals, airline booking back-offices. Drivable through Playwright, Selenium, Puppeteer, browser-first RPA (Bardeen, Browserbase). Breaks on Cloudflare, Turnstile, and Google security gates unless the automation attaches to a real logged-in browser.

Layer 3 — Desktop-only, no API

The target is a native Mac or Windows app with no public automation API and often no browser equivalent. iWork (Numbers, Pages, Keynote), macOS Mail, Finder, Preview, QuickBooks Desktop, Figma desktop, Xcode, Adobe CC, many accounting and legal desktop tools. The only honest automation surface is the OS-level accessibility tree. This is where most 2026 BPA listicles end the conversation; it is where most real small-business daily work begins.

Fazm covers all three layers in one chat turn

One chat can scrape a bank portal in Chrome (Layer 2), parse the CSV in Python (Layer 1), open Numbers to categorize (Layer 3), export a PDF via Preview (Layer 3), and attach the PDF to a Mail draft (Layer 3) — all dispatched through a single model over a single MCP tool surface. The routing rules are in `Desktop/Sources/Chat/ChatPrompts.swift` line 60.

Six primitives cover Layer 3 end-to-end

`open_application_and_traverse`, `click_and_traverse`, `type_and_traverse`, `press_key_and_traverse`, `scroll_and_traverse`, `refresh_traversal`. Spec at `/Users/matthewdi/mcp-server-macos-use/llms.txt` lines 19 to 75. Every Layer 3 example on this page decomposes into a short sequence of these calls.

Tree format, not pixels

Each traversal writes a text file where every line is `[AXButton (button)] "Open" x:680 y:520 w:80 h:30 visible`. Clicks target elements by role + partial text, not by pixel. Theme changes, monitor scale, and UI shifts that would break OCR-based RPA do not invalidate role-based selectors.
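To make the role + partial text idea concrete, here is a minimal sketch of such a selector over already-parsed elements. The `AxElement` shape, the case-insensitive match, and the visible-only filter are assumptions made for illustration; the server's actual matching rules may differ.

```typescript
// Hypothetical in-memory form of one tree line.
interface AxElement {
  role: string;   // e.g. "AXButton"
  label: string;  // e.g. "Open"
  x: number;
  y: number;
  width: number;
  height: number;
  visible: boolean;
}

// Select by role and partial label text; coordinates are never part of the query.
function findByRoleAndText(
  elements: AxElement[],
  role: string,
  text: string,
): AxElement | undefined {
  const needle = text.toLowerCase();
  return elements.find(
    (e) => e.visible && e.role === role && e.label.toLowerCase().includes(needle),
  );
}

// The click point is derived from whatever frame the tree currently reports.
function centerOf(e: AxElement): { x: number; y: number } {
  return { x: e.x + e.width / 2, y: e.y + e.height / 2 };
}

// The same button before and after the window is moved: the selector is unchanged.
const beforeMove: AxElement[] = [
  { role: "AXButton", label: "Open", x: 680, y: 520, width: 80, height: 30, visible: true },
];
const afterMove: AxElement[] = [
  { role: "AXButton", label: "Open", x: 120, y: 90, width: 80, height: 30, visible: true },
];
const hitBefore = findByRoleAndText(beforeMove, "AXButton", "open");
const hitAfter = findByRoleAndText(afterMove, "AXButton", "open");
```

Because the query never mentions pixels, moving the window only changes the computed click point, not whether the element is found.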

Layer 3 is wider than most BPA articles admit

A partial list of Mac apps with no public automation API. Every one exposes an accessibility tree that can be traversed, clicked, typed into, and diffed. Every one is invisible to the Zapier/Make/Power Automate connector catalog.

Numbers, Keynote, Preview, macOS Mail, Outlook for Mac, Finder, System Settings, TextEdit, Calendar, Contacts, QuickBooks Desktop, Figma desktop, Slack desktop, Xcode, Terminal, Notes, Reminders, Photos, Pages, Safari (for non-browser UI), Notion desktop, Discord desktop, Zoom, iMovie

The anchor fact: what a Layer 3 action looks like on disk

Every `macos-use` action writes a plain-text accessibility dump to `/tmp/macos-use/`. Every line follows the same format. That format is the contract. A real dump from opening the Numbers app and scrolling to a row looks like the block below.

/tmp/macos-use/1713481500_open_application_and_traverse.txt (excerpt)

Role, label, coordinates, visibility. That is every piece of information the automation needs to click the right cell without a screenshot, without OCR, without a vision model. The same format covers every Mac app. The spec is documented in /Users/matthewdi/mcp-server-macos-use/llms.txt line 115.
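As a rough sketch of how a consumer might read that format, the parser below handles lines shaped exactly like the example `[AXButton (button)] "Open" x:680 y:520 w:80 h:30 visible`. The regular expression and the `TreeElement` type are assumptions derived from that single example; real dumps may contain variations this sketch does not handle, in which case it returns `null`.

```typescript
interface TreeElement {
  role: string;            // e.g. "AXButton"
  roleDescription: string; // e.g. "button"
  label: string;           // e.g. "Open"
  x: number;
  y: number;
  width: number;
  height: number;
  visible: boolean;
}

// Pattern inferred from the one documented example line (an assumption).
const TREE_LINE =
  /^\[(\w+) \(([^)]*)\)\] "(.*)" x:(-?\d+(?:\.\d+)?) y:(-?\d+(?:\.\d+)?) w:(\d+(?:\.\d+)?) h:(\d+(?:\.\d+)?)( visible)?$/;

function parseTreeLine(line: string): TreeElement | null {
  const m = TREE_LINE.exec(line.trim());
  if (m === null) return null;
  const [, role = "", roleDescription = "", label = "", x = "0", y = "0", w = "0", h = "0", flag] = m;
  return {
    role,
    roleDescription,
    label,
    x: Number(x),
    y: Number(y),
    width: Number(w),
    height: Number(h),
    visible: flag !== undefined,
  };
}

const parsedOpenButton = parseTreeLine(
  '[AXButton (button)] "Open" x:680 y:520 w:80 h:30 visible',
);
```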

Where Fazm wires the Layer 3 binary in

The binary is bundled inside the Fazm app at Contents/MacOS/mcp-server-macos-use. The ACP bridge registers it as an MCP server named macos-use, which is how the model calls it in chat as mcp__macos-use__*. Two short snippets from the real source.

acp-bridge/src/index.ts, lines 63 and 1056-1064

Desktop/Sources/Chat/ChatPrompts.swift, line 60 (tool routing rules)

Twelve desktop-only BPA examples the other listicles skip

Each of these lives mostly in Layer 3. Each involves at least one app with no public API, no Zapier connector, and no Power Automate Desktop activity. Each decomposes into a short sequence of the six accessibility-tree primitives. Nothing here is hypothetical; every target app ships on a regular Mac.

1. Statement reconciliation (bank PDF → Numbers → accountant email)

Download the monthly PDF from Chase/Mercury/Wise (Layer 2, Playwright inside real Chrome), open it in Preview, extract the table, paste it into Numbers, run a category lookup against a master sheet, export a categorized PDF, and attach it to a Mail draft addressed to the accountant. Three Layer 3 apps (Preview, Numbers, Mail), none with a public API.

2. Quote → invoice in QuickBooks Desktop

Open an approved quote PDF, pull line items, open QuickBooks Desktop for Mac, click Customers → Create Invoice, paste each line item, save, print to PDF, email. QuickBooks Desktop has no web API; QuickBooks Online's API does not cover Desktop. Layer 3 only.

3. Slack desktop → Linear ticket

Watch for a message with a reaction emoji in a Slack channel. When it fires, read the thread, create a Linear ticket via the Linear API (Layer 1), and post back a link in the thread via the Slack desktop app's accessibility tree (Layer 3 — the bot doesn't have write-scope on the workspace, but the desktop app does).

4. New hire onboarding in Mac-only tools

Create the user in 1Password, add them to Calendar invites, add a Contacts entry with the right company tag, share a Finder folder via right-click → 'Share'. None of these apps has a multi-user admin API, but each exposes an accessibility tree.

5. Real-estate MLS listing photo batch

Open Photos, select 28 images matching a listing tag, right-click → Export → JPEG quality 0.9, rename to `<address>_<n>.jpg`, move to `~/Dropbox/<listing>/`, then open a property MLS portal in the browser and bulk-upload. Photos on Mac has no public export API; the accessibility tree drives the right-click menu.

6. Paralegal PDF redaction + renaming

Open a discovery PDF in Preview, navigate page by page, redact marked text with Tools → Annotate → Rectangle, save, rename from `scan-4392.pdf` to `2026-04-18_DiscoveryResponse_Smith-v-Jones.pdf`, file in the correct Finder folder. Preview has no scripting API beyond a few AppleScript surface commands that don't cover Annotate.

7. Expense report assembly in Numbers + Mail

Drag last month's receipts out of Mail into a Finder folder, open Numbers, paste a SUMIF-ready ledger, import receipt file paths as images, export the final PDF, then reply-all to the original expense-policy thread with the PDF attached. Six actions across Mail, Finder, Numbers, Preview.

8. Recurring Outlook for Mac triage

Outlook for Mac ignores most of the exchange-web-services integrations iPaaS tools rely on. Move messages matching a sender pattern into a folder via drag-and-drop in the desktop tree, mark-as-read, forward the summary to a team channel. Layer 3 drives it directly; iPaaS has to fall back to OWA with its own gaps.

9. Figma desktop → Linear spec

Open a Figma file, walk the layer tree of a selected frame, extract text and child component names, format into a Markdown spec, push via Linear API. The Figma REST API covers published files; unpublished drafts in the desktop app do not round-trip without this.

10. Cashier close-out in a legacy POS

Every small restaurant has a POS terminal with a macOS binary from 2014. Daily close-out is a menu walk (File → Reports → End of Day → Print → Save as PDF) that would take seven minutes if your eyes wandered. Accessibility tree executes it in under ten seconds; the PDF lands in the shared Drive folder for the bookkeeper.

11. Calendar conflict cleanup

Walk the Calendar desktop app's week view, detect overlapping blocks, open each event, reschedule the lower-priority one based on a rule table, send an update to attendees. Calendar's CalDAV API covers core fields; priority metadata stored in the desktop app's note field does not round-trip cleanly, so the desktop tree is the simpler path.

12. Weekly backup into a specific Finder folder

Open System Settings → iCloud → Drive → Manage, list folders over a size threshold, copy any matching a prefix into a local Finder folder, then open Disk Utility to verify space. System Settings exposes no programmable API for iCloud management; the accessibility tree is the only route.

Screenshot RPA vs accessibility-tree RPA

Before the accessibility-tree approach, the only way most RPA vendors could touch Layer 3 apps was pixel matching and OCR. It worked on a demo; it broke on the user's real machine the moment the theme changed, the user moved the window, or macOS shipped a subtle Mail redesign.

Pixel-OCR RPA vs accessibility-tree automation

Take a screenshot, run a template match or an OCR pass, guess a click coordinate, hope the window hasn't moved. Recording a workflow bakes in the screenshot reference. Retry logic is 'take another screenshot and try again'. Dark mode breaks it. A macOS point release with a new control shape breaks it. Localisation breaks it.

Template matches break on theme/scale/OS update
Every step re-ships a screenshot to the model (thousands of tokens)
No structured element identity; click targets are pixels
Hard to debug when a step misfires (no tree snapshot)

How one chat turn can hit all three layers

The diagram is the real data flow when a user asks Fazm to run a multi-layer workflow. Sources on the left converge on the ACP bridge (the Node subprocess spawned by the desktop app). The bridge dispatches to the right MCP server for each layer. Every step leaves an on-disk artefact for audit.

One chat turn, three layers, one tool router

Layer 3 safety: InputGuard + Escape-cancel + 30s watchdog

Running automation inside a user's real Mac session is unforgiving: a stray keystroke from the user mid-click can land in the wrong field. `mcp-server-macos-use` ships three guardrails out of the box. InputGuard blocks user keyboard and mouse input for the duration of each action. A 30-second watchdog guarantees you can never be locked out longer than that if the server crashes. Pressing Escape cancels the current action immediately and releases the guard. All three are documented in llms.txt lines 112 to 114.
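The interaction between the three guardrails can be modeled in a few lines. This is a toy, in-process model of the behavior described above, not the server's implementation (which blocks input at the OS level); the class name, method names, and injectable clock are all hypothetical.

```typescript
const WATCHDOG_MS = 30 * 1000; // the 30-second limit described above

class InputGuardSketch {
  private engagedAt: number | null = null;
  private readonly now: () => number;

  // The clock is injectable so the timeout can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Called when an automated action starts.
  engage(): void {
    this.engagedAt = this.now();
  }

  // Called when the action finishes normally.
  release(): void {
    this.engagedAt = null;
  }

  // Escape cancels the current action and releases the guard.
  handleKey(key: string): void {
    if (key === "Escape") this.release();
  }

  // User input is blocked only while engaged and inside the watchdog window.
  isBlocking(): boolean {
    if (this.engagedAt === null) return false;
    if (this.now() - this.engagedAt >= WATCHDOG_MS) {
      this.release(); // watchdog: never stay locked past the limit
      return false;
    }
    return true;
  }
}
```

The watchdog is checked lazily here for simplicity; a real guard would more likely arm a timer when it engages so the release does not depend on anyone asking.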

A concrete Layer-3-heavy example, step by step

Example 1 from the list above: monthly reconciliation. Seven steps. Four distinct apps touched. Only the first two steps run in Chrome (Layer 2); every later step is desktop-only (Layer 3). This is the decomposition a Fazm session walks through when the user asks it to "run last month's reconciliation".

1. Chrome, logged in, Playwright attaches via extension

`browser_navigate` to the bank dashboard tab that is already open and signed in. The Playwright MCP Bridge extension passes the CDP control through, so the automation runs inside the real session with cookies and 2FA already there.

2. Click Statements → April 2026 → Download PDF

Three `browser_click` calls targeting aria-labeled elements. The PDF lands in `~/Downloads/statement-apr-2026.pdf`.

3. Switch to Preview, walk the table by accessibility tree

`open_application_and_traverse` with `identifier: "Preview"`. Reads the table structure from the tree; each row comes back as `[AXCell (cell)] "2026-04-03 STRIPE PAYOUT $3,421.00" x:... y:... visible`.

4. Switch to Numbers, paste the rows

`open_application_and_traverse "Numbers"`, `click_and_traverse element: "Sheet 1"`, `type_and_traverse` per row. Six calls total for a 32-row statement. The tree refreshes after each call so the cursor position is never guessed.

5. Apply category lookup via VLOOKUP against the master sheet

`click_and_traverse element: "Category", text: "=VLOOKUP(...)", pressKey: "Return"`. The whole formula enters in one call because `click_and_traverse` accepts `text` and `pressKey` simultaneously. No drag, no double-click simulation.

6. Export categorized PDF, attach to a Mail draft

`press_key_and_traverse keyName: "p", modifierFlags: ["Command"]` opens Print; `click_and_traverse element: "PDF", pressKey: "Return"` saves to disk; `open_application_and_traverse "Mail"`; `click_and_traverse element: "New Message"`; `type_and_traverse` the subject and body; the PDF is attached by clicking the attach button rather than by simulating drag-and-drop.

7. Every action logged to /tmp/macos-use/ for audit

The `.txt` accessibility dump and the `.png` screenshot for each action sit in `/tmp/macos-use/`. Any post-hoc review can open them to see what the tree looked like at the moment of the action. That is not possible with pixel-OCR automation.
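The steps above can be written down as plain data before anything is executed. The sketch below does only that: the tool names and the `identifier`, `element`, `text`, `pressKey`, `keyName`, and `modifierFlags` parameters are taken from the steps, while the `PlannedCall` type, the `layer` tag, the bank URL, and the browser argument shapes are assumptions for illustration. Per-row `type_and_traverse` calls are left out because their count depends on the statement.

```typescript
type MacosUseTool =
  | "open_application_and_traverse"
  | "click_and_traverse"
  | "type_and_traverse"
  | "press_key_and_traverse"
  | "scroll_and_traverse"
  | "refresh_traversal";

type BrowserTool = "browser_navigate" | "browser_click";

// Hypothetical plan entry: which layer a call belongs to, and its arguments.
type PlannedCall =
  | { layer: 2; tool: BrowserTool; args: Record<string, unknown> }
  | { layer: 3; tool: MacosUseTool; args: Record<string, unknown> };

const reconciliationPlan: PlannedCall[] = [
  // Steps 1-2: browser (placeholder URL, illustrative argument names)
  { layer: 2, tool: "browser_navigate", args: { url: "https://bank.example/dashboard" } },
  { layer: 2, tool: "browser_click", args: { element: "Statements" } },
  { layer: 2, tool: "browser_click", args: { element: "April 2026" } },
  { layer: 2, tool: "browser_click", args: { element: "Download PDF" } },
  // Steps 3-6: desktop apps through the accessibility tree
  { layer: 3, tool: "open_application_and_traverse", args: { identifier: "Preview" } },
  { layer: 3, tool: "open_application_and_traverse", args: { identifier: "Numbers" } },
  { layer: 3, tool: "click_and_traverse", args: { element: "Sheet 1" } },
  { layer: 3, tool: "click_and_traverse", args: { element: "Category", text: "=VLOOKUP(...)", pressKey: "Return" } },
  { layer: 3, tool: "press_key_and_traverse", args: { keyName: "p", modifierFlags: ["Command"] } },
  { layer: 3, tool: "click_and_traverse", args: { element: "PDF", pressKey: "Return" } },
  { layer: 3, tool: "open_application_and_traverse", args: { identifier: "Mail" } },
  { layer: 3, tool: "click_and_traverse", args: { element: "New Message" } },
];

// Count how many planned calls fall in each layer.
function countByLayer(plan: PlannedCall[]): { layer2: number; layer3: number } {
  let layer2 = 0;
  let layer3 = 0;
  for (const call of plan) {
    if (call.layer === 2) layer2 += 1;
    else layer3 += 1;
  }
  return { layer2, layer3 };
}

const planCounts = countByLayer(reconciliationPlan);
```

Typing the Layer 3 tool as a union of the six primitive names means a plan that references any other desktop tool fails at compile time.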

Where iPaaS / Zapier / Power Automate stops, and where accessibility-tree automation keeps going

Not a feature war. Seven concrete checks on the kinds of BPA examples this guide is organized around. The iPaaS column is the collective best case across Zapier, Make, n8n, Workato, Power Automate Cloud. The product column is Fazm using `mcp-server-macos-use` plus Playwright in real Chrome.

Feature	iPaaS / Zapier / Power Automate	Fazm + accessibility-tree automation
Reconciliation bookkeeper flow (8 apps, 4 without API)	Requires 3-4 vendor products stitched together	Full coverage in one chat turn
Reach into QuickBooks Desktop for Mac	No Zapier/Make/Power Automate connector	Yes, via accessibility tree
Reach into Numbers / Pages / Keynote	No; iPaaS uses Google Sheets as substitute	Yes, all three (no APIs exist)
Reach into native macOS Mail / Outlook for Mac	Only via Exchange Web Services; attachment gaps	Yes, full tree incl. attachments
Token cost per automation step	Thousands of tokens (screenshot) for vision RPA	Tens of tokens (tree line)
Reliability when macOS theme changes	Pixel-OCR RPA templates often break	Unchanged (role+label selectors)
Setup time to first automation	Studio install + bot runner + activity packages	Install app, grant AX permission, describe in chat

What a Fazm Layer 3 session looks like from the outside

Inside the chat thread you see natural-language updates. Inside /tmp/macos-use/ you see the real trace. Listing that directory after a reconciliation run looks like this.

ls -1t /tmp/macos-use/ | head -14
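File names in that directory follow the `<timestamp>_<action>.txt` pattern, for example `1713481500_open_application_and_traverse.txt`. A small sketch of reading a listing back into structured records follows; the regular expression, the assumption that `.png` screenshots use the same naming, and the sample names in the usage note are illustrative rather than taken from the server's spec.

```typescript
interface DumpFile {
  timestamp: number;   // leading number in the file name
  action: string;      // e.g. "open_application_and_traverse"
  kind: "txt" | "png"; // tree dump or screenshot
}

// Pattern inferred from the example file name (an assumption).
const DUMP_NAME = /^(\d+)_([a-z_]+)\.(txt|png)$/;

function parseDumpName(fileName: string): DumpFile | null {
  const m = DUMP_NAME.exec(fileName);
  if (m === null) return null;
  const [, ts = "", action = "", ext = ""] = m;
  return { timestamp: Number(ts), action, kind: ext === "png" ? "png" : "txt" };
}

// Newest first, similar in spirit to `ls -1t`, but ordered by the timestamp
// embedded in the name rather than by file modification time.
function newestFirst(fileNames: string[]): DumpFile[] {
  const parsed: DumpFile[] = [];
  for (const f of fileNames) {
    const d = parseDumpName(f);
    if (d !== null) parsed.push(d); // skip names that do not fit the pattern
  }
  return parsed.sort((a, b) => b.timestamp - a.timestamp);
}
```

Calling `newestFirst(["1713481500_open_application_and_traverse.txt", "1713481512_click_and_traverse.txt"])` would put the `click_and_traverse` entry first, which is the order an audit would usually start from.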

Why this reframing matters

The four numbers below are what makes the three-layer map actionable. Six primitives cover Layer 3 exhaustively. The long tail of Mac business apps with no API is large enough to matter. The safety defaults are small and concrete. The token cost of a tree step is low enough that you can stack dozens of them per minute without choking on the model bill.

6 Layer 3 primitives (open/click/type/press/scroll/refresh)

40+ Mac business apps with no public API

30 s maximum input lockout on crash

Tens of tokens of overhead per tree step

Questions

Frequently asked questions

Why organize business process automation examples by UI layer instead of by industry?

Because industry grouping (sales, HR, finance, supply chain) is useful for picking motivation but useless for picking tooling. Two 'accounts payable' automations can look identical on paper and have completely different feasibility: one runs entirely in QuickBooks Online (API-native, any iPaaS works), one runs in QuickBooks Desktop for Mac (no public API, no Zapier connector, no UiPath activity pack that reliably reads the tree). The UI layer decides whether the automation can be built at all. The department it sits inside does not.

What are the three UI layers in this guide?

Layer 1 is API-native: the target app has a documented public API or a vendor-supported connector in Zapier, Make, n8n, Workato, or Power Automate Cloud. Layer 2 is browser-UI: the target is a web app with no public API, but you can drive its HTML through Playwright, Selenium, or a browser-first RPA like Bardeen. Layer 3 is desktop-only: the target is a native macOS or Windows app with no public API and often no web equivalent. Layer 3 is where most 2026 BPA listicles go quiet, and it is where most real small-business workflows actually run.

What is a real example of a Layer 3 process that no other BPA listicle mentions?

A reconciliation bookkeeper pulls a statement PDF from an online banking dashboard (Layer 2, browser), opens it in Preview, copies the transaction table into Numbers, applies a category lookup against a master sheet, then exports a PDF of the categorized report and attaches it to a Mail message to the accountant. None of the three desktop apps in that flow (Preview, Numbers, macOS Mail) ships a public automation API. All of them expose a full accessibility tree. The accessibility tree is what `mcp-server-macos-use` (the binary wired into Fazm at `acp-bridge/src/index.ts` line 1057) reads and drives. No screenshots, no OCR, no pixel guessing.

How exactly does accessibility-API automation work on these desktop apps?

Apple's AXUIElement API exposes every onscreen element as a tree node with a role (AXButton, AXTextField, AXMenuItem, AXRow, AXCell, AXStaticText), a label, x/y coordinates, width/height, and visibility. The `mcp-server-macos-use` Swift binary (version 0.1.17, BSL 1.1, source at github.com/mediar-ai/mcp-server-macos-use) traverses that tree on demand, writes a structured dump to `/tmp/macos-use/<timestamp>_<action>.txt`, and lets the model find elements by role and partial text match rather than by pixel coordinate. Each line has the format `[AXButton (button)] "Open" x:680 y:520 w:80 h:30 visible` (documented in `llms.txt` line 115). Input goes through CGEvent, indistinguishable from a real mouse or keyboard at the OS level.

What six primitives cover every Layer 3 example on this page?

Six tools: `open_application_and_traverse` (launch or focus an app, read its tree), `click_and_traverse` (click a button, then re-read the tree, optionally typing and pressing a key in the same call), `type_and_traverse` (type into the focused field), `press_key_and_traverse` (fire a key with optional modifiers), `scroll_and_traverse` (scroll by N lines at a coordinate), and `refresh_traversal` (re-read without acting). Every example in the section 'Twelve desktop-only BPA examples the other listicles skip' decomposes into a short sequence of these six calls. The spec is pinned in `/Users/matthewdi/mcp-server-macos-use/llms.txt` lines 19 to 75.

Why not screenshot-based RPA like pixel OCR or vision models?

Two failure modes. First, brittleness: a 1-pixel UI shift, a theme change, a language change, or a different monitor scale invalidates the template match. Accessibility trees are addressed by role and label, not by pixel location, so none of those invalidate the match. Second, token cost: returning a screenshot to an LLM for every step costs thousands of tokens. Returning a structured tree line (`[AXButton (button)] "Open" x:680 y:520 w:80 h:30 visible`) costs dozens. That is roughly a 100x cost difference per step, and it adds up quickly over a ten-step workflow. The Fazm chat prompt routes intentionally: `macos-use` tools for desktop apps, Playwright for web pages, `capture_screenshot` only when the user literally asks 'what does this look like'. The routing rule is in `Desktop/Sources/Chat/ChatPrompts.swift` line 60.
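A quick back-of-the-envelope version of the token argument, using illustrative numbers: 2,000 tokens per screenshot step and 20 tokens per tree-line step are assumptions chosen to stand in for "thousands" and "dozens", not measurements.

```typescript
// Illustrative per-step costs; both values are assumptions, not measurements.
const SCREENSHOT_TOKENS_PER_STEP = 2000; // stands in for "thousands of tokens"
const TREE_LINE_TOKENS_PER_STEP = 20;    // stands in for "dozens of tokens"

function workflowTokens(stepCount: number, tokensPerStep: number): number {
  return stepCount * tokensPerStep;
}

const stepCount = 10;
const screenshotTotal = workflowTokens(stepCount, SCREENSHOT_TOKENS_PER_STEP); // 20000
const treeTotal = workflowTokens(stepCount, TREE_LINE_TOKENS_PER_STEP);        // 200
const costRatio = screenshotTotal / treeTotal;                                 // 100
```

The ratio is independent of the step count; what grows with workflow length is the absolute gap (19,800 tokens over ten steps under these assumed figures).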

How does the automation avoid stepping on the user while it runs?

Four mechanisms, all in `mcp-server-macos-use`. InputGuard blocks keyboard and mouse input from the user during the action so a stray click does not land in the middle of an automated click sequence. A 30-second watchdog prevents a permanent lockout if the server crashes mid-action, and pressing Escape cancels immediately. A floating overlay tells the user what is being driven, and the mouse cursor position is saved and restored around each action so the pointer does not visibly jump. All four are documented in `llms.txt` lines 112 to 114.

Is there a cross-app handoff when an automated click in one app opens another?

Yes, and it is the single feature that makes multi-app BPA examples practical. When `click_and_traverse` causes a different app to become frontmost (a classic case: clicking a mailto: link opens Mail, clicking a PDF attachment opens Preview, clicking 'Open in Numbers' opens Numbers), the server detects the new frontmost process, re-reads the tree of the new app automatically, and returns the new tree. The caller never has to track process IDs by hand. Documented in `llms.txt` line 112 under 'Cross-app handoff'.

Does this mean Fazm is RPA?

Functionally yes, packaging no. The primitive set (click, type, press, scroll, read) matches every RPA vendor's primitive set. The packaging is a consumer Mac app: install the .app, grant accessibility permission once, then describe the workflow in chat. No activity library to learn, no studio canvas to wire, no orchestrator to deploy, no bot runner license to buy. The trade is expressiveness for reach: you give up the visual canvas of UiPath Studio and get a chat surface that covers every app on the Mac including the ones UiPath never built a connector for.

What is the honest answer about reliability versus API automation?

API automation is more reliable when an API exists. If Salesforce or Stripe or Linear exposes an endpoint, use the endpoint. Accessibility-API automation is strictly a fallback for apps that have no endpoint. Within that fallback, it is more reliable than pixel-based RPA because the tree is a structured contract the OS maintains, not an image the UI designer can accidentally break with a theme refresh. The failure modes are different: APIs fail on rate limits and auth; accessibility automation fails when an app renders its UI in Canvas or WebGL and skips the accessibility tree entirely (rare; some games, some chart-heavy analytics tools). For the long tail of business apps (Office for Mac, iWork, Mail, Finder, accounting desktop software, Figma desktop, Slack, Discord, Outlook, Adobe CC), the accessibility tree is populated and usable.

Can I verify the accessibility tree format on my own machine?

Yes. Grant accessibility permission to any AX-aware tool (Script Editor, Accessibility Inspector, or Fazm itself), trigger a traversal, and inspect `/tmp/macos-use/`. Every file there is plain text, one element per line, in the documented format. Run `rg -n AXButton /tmp/macos-use/*.txt` after a Fazm session and you will see dozens of matches. Same format documented in the public llms.txt at github.com/mediar-ai/mcp-server-macos-use.

Where does Fazm fit on this three-layer map?

Fazm is a consumer Mac app whose chat surface routes every request to the right layer. Web requests go to Playwright (attached to real Chrome via the Playwright MCP Bridge extension, ID `mmlmfjhmonkocbjadbfplnigmagldckm`). Desktop requests go to `mcp-server-macos-use`. API requests go to the skill for the relevant service (`telegram` skill uses the telethon API, `google-workspace` skill uses a bundled Python MCP, etc.). The routing rules are explicit in `Desktop/Sources/Chat/ChatPrompts.swift` starting at line 55 under the `<tools>` block. One session can use all three layers in a single chat turn.

Try the Layer 3 examples in an actual Mac chat

Install Fazm, grant accessibility permission, and ask it to do example #1 (monthly reconciliation), #6 (Preview PDF redaction), or #10 (legacy POS close-out). The six-primitive surface is the same for each. Nothing to configure, no studio to wire.

Download Fazm for Mac →