Business process automation · No workflow designer · Source-verified, April 20 2026

A business process automation tool that picks the process to automate by watching 60 minutes of your screen.

Every top SERP result for this keyword is a workflow designer (UiPath, Camunda, Appian, Power Platform, FlowForma, Nintex, Cflow). They assume you already know which process to automate and will model it in a BPMN canvas before the first run. Fazm inverts that. The screen observer records the focused window at 2 FPS, buffers 60 minutes of activity, and asks Gemini Pro to pick the single most impactful task to take off your plate. The prompt at GeminiAnalysisService.swift line 13 reads verbatim: “Your job is to identify the ONE most impactful task an AI agent could take off their plate.”

Fazm · 13 min read
  • Screen observer buffers 60 min at 2 FPS before Gemini analysis fires (GeminiAnalysisService.swift line 69)
  • Tasks persist to the local SQLite observer_activity table (AppDatabase.swift line 960)
  • Executes against any Mac app via the bundled macos-use MCP server (acp-bridge line 1057)
  • 60 min — target duration of buffered screen activity before Gemini analysis fires (GeminiAnalysisService.swift:69)
  • 1 — tasks per analysis; the prompt says 'identify the ONE most impactful task' (line 13)
  • 5 — MCP servers spawned at app launch: fazm_tools, playwright, macos-use, whatsapp, google-workspace
  • 0 — workflow designer screens; there is no canvas. The observer writes the task; the agent runs it
  • 1 task/hr — the effective suggestion rate limit

The anchor fact no competitor page mentions. Fazm's screen observer hands Gemini Pro a prompt that opens with 'You are watching ~60 minutes of a user's screen recording. Each video clip captures the active window of whatever app the user was using at that moment. Your job is to identify the ONE most impactful task an AI agent could take off their plate.' That text is lines 12 to 14 of /Users/matthewdi/fazm/Desktop/Sources/GeminiAnalysisService.swift. The prompt then forces the model at line 30 to run two mandatory SQL queries against the user's local SQLite DB (observer_activity and chat_messages) before suggesting anything. The 60-minute threshold is enforced by 'private let targetDurationSeconds: TimeInterval = 3600' on line 69 of the same file. TASK_FOUND results land in the observer_activity table created by migration fazmV4 at AppDatabase.swift line 960. That entire pipeline, from screen chunks to a row in a local table, is what this page documents.

/Users/matthewdi/fazm/Desktop/Sources/GeminiAnalysisService.swift:12-65 + AppDatabase.swift:960

The inversion, in one toggle

Traditional BPA tools ask “which process?” on day one and spend months getting to the first automated run. Fazm asks a different question: watch the user, then tell them.

From designer-first to observation-first

Business analyst gathers requirements. Process architect draws BPMN. Developers implement activities and exception branches. QA tests. Ops deploys. Support runs the bots. Weeks to first run.

  • BPMN canvas, gateway nodes, compensating-transaction modeling
  • Center of Excellence (CoE) team is table stakes
  • Persistent workflow file lives in a platform tenant
  • Admin role separate from the user who runs it

From 60 minutes of screen to one row in a local table

The pipeline is linear and auditable. Each arrow is a real function in the Fazm source, not a conceptual step. The hub is the prompt that forces one suggestion, or none.

How the observer turns screen activity into an automatable task

Focused window → 60-second chunks → buffer index JSON + user context → Gemini Pro analysis prompt → VERDICT: TASK_FOUND / NO_TASK / UNCLEAR → Discovered Tasks tab (TASK_FOUND only)

What makes the discovery loop safe

Four design choices in the prompt and the service keep suggestion volume low and precision high. Suggestion spam is what kills BPA tools that try this approach without these guardrails.

60 minutes of buffered screen activity

GeminiAnalysisService.swift line 69 sets targetDurationSeconds = 3600. The observer recorder encodes the focused window at 2 FPS (SessionRecordingManager.swift line 77), buffers chunks locally to Application Support, and only fires analysis once total buffered duration crosses one hour.
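The gate described above can be sketched in a few lines. This is an illustrative TypeScript rendering, not the Swift source: the constants are the values quoted on this page (3600 s threshold, 300 s retry cooldown), while the `BufferState` shape and the function name are assumptions.

```typescript
// Sketch of the analysis gate: fire only when a full hour is buffered,
// no analysis is already in flight, and no failure cooldown is active.
const targetDurationSeconds = 3600; // one hour of buffered screen activity
const retryCooldownSeconds = 300;   // back-off armed after a failed analysis

interface BufferState {
  bufferedSeconds: number;   // sum of chunk durations currently on disk
  analysisRunning: boolean;  // an analysis turn is already in flight
  cooldownUntil: number;     // epoch seconds; 0 when no cooldown is armed
}

function shouldTriggerAnalysis(state: BufferState, nowSeconds: number): boolean {
  if (state.analysisRunning) return false;         // never run two analyses at once
  if (nowSeconds < state.cooldownUntil) return false; // respect the failure cooldown
  return state.bufferedSeconds >= targetDurationSeconds;
}
```

A 59-minute buffer never fires, and a failed call blocks retries for five minutes even if the buffer keeps growing.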

One candidate task per hour, not ten

The prompt on line 13 reads: 'identify the ONE most impactful task an AI agent could take off their plate.' Gemini Pro is forced to pick a single high-impact candidate. Spam kills adoption. One per hour is the product-level rate limit baked into the prompt, not a config knob.

Checks you haven't already suggested it

Line 31 of the prompt mandates a SELECT against the user's local observer_activity table (LIMIT 10 most recent). Line 34 tells the model: 'Err on the side of NO_TASK when in doubt about similarity.' Same app + same category of work = dropped.

Verdict is one of three words

VERDICT: NO_TASK, TASK_FOUND, or UNCLEAR. Only TASK_FOUND persists to the observer_activity SQLite table (AppDatabase.swift line 960, migration fazmV4). UNCLEAR is first-class. The prompt says 'A wrong suggestion is worse than no suggestion' on line 38.
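The three-verdict contract can be made concrete with a small parser. The verdict strings are the ones quoted from the source; the `parseVerdict` helper is an assumption about how a `VERDICT: X` line could be extracted from model output, sketched in TypeScript rather than the app's Swift.

```typescript
// Only three verdicts are legal; anything else is treated as malformed.
type Verdict = "NO_TASK" | "TASK_FOUND" | "UNCLEAR";

// Pull a "VERDICT: X" line out of the model's full text reply.
function parseVerdict(modelOutput: string): Verdict | null {
  const match = modelOutput.match(/^VERDICT:\s*(NO_TASK|TASK_FOUND|UNCLEAR)\s*$/m);
  return match ? (match[1] as Verdict) : null;
}

// Only TASK_FOUND leads to a persisted row; NO_TASK, UNCLEAR, and a
// malformed reply all write nothing.
function shouldPersist(verdict: Verdict | null): boolean {
  return verdict === "TASK_FOUND";
}
```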

The prompt that runs the discovery

This is the shape of the thing no competitor page has. Not a workflow, not a BPMN shape library. A prompt that watches you work and picks one thing to take off your plate.

Desktop/Sources/GeminiAnalysisService.swift (excerpt, real lines)

Where a Fazm “process” lives on disk

A workflow in UiPath is a .xaml. A flow in Power Automate is JSON in a Dataverse tenant. A process in Fazm is a row in this SQLite table, created by the fazmV4 migration and owned by the user on their own Mac.

Desktop/Sources/AppDatabase.swift (excerpt)

How a TASK_FOUND verdict becomes a visible suggestion

After Gemini returns TASK_FOUND, persistAndShowOverlay writes the row and paints the AnalysisOverlayWindow below the floating bar. Everything after “VERDICT: TASK_FOUND” is a single function.

Desktop/Sources/GeminiAnalysisService.swift (excerpt)
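As a hedged sketch of what that function writes (the Swift excerpt is not reproduced here), the row could be shaped like this, using the field names described elsewhere on this page: type 'gemini_analysis', a JSON content blob of {task, description, document, chunks_analyzed, tokens}, and a status that starts at 'pending'. The TypeScript types and the builder function are illustrative.

```typescript
// Illustrative shape of a discovered-task row, per the description on this page.
interface DiscoveredTaskRow {
  type: "gemini_analysis";
  status: "pending" | "shown" | "acted" | "dismissed";
  content: string; // JSON blob: { task, description, document, chunks_analyzed, tokens }
  createdAt: string;
}

function buildTaskRow(task: string, description: string, document: string,
                      chunksAnalyzed: number, tokens: number): DiscoveredTaskRow {
  return {
    type: "gemini_analysis",
    status: "pending", // every new suggestion starts unseen
    content: JSON.stringify({ task, description, document,
                              chunks_analyzed: chunksAnalyzed, tokens }),
    createdAt: new Date().toISOString(),
  };
}
```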
  • 2 fps — observer recorder frame rate, chosen to keep CPU cost low on always-on capture (SessionRecordingManager.swift:77)
  • 120 — hard cap on buffered chunk count, prevents disk runaway (GeminiAnalysisService.swift:68)
  • 300 s — cooldown after a failed analysis before retrying Gemini (retryCooldown, line 78)
  • 3 — verdict values the analysis can return: NO_TASK, TASK_FOUND, UNCLEAR

What the run looks like from outside the app

Simulated trace of one full discovery cycle, observed from a terminal watching the Fazm log and the fazm.db SQLite file. This is what a BPA “workflow” looks like when the workflow file is a database row.

Fazm discovery + run, live

What runs when the user clicks Run

The task is a sentence. The agent picks which of five bundled MCP servers handles each step. Parallel steps are just sequential tool calls batched in one turn.

macos-use MCP server, bundled

acp-bridge/src/index.ts line 1057 spawns the mcp-server-macos-use binary from Contents/MacOS. It reads AXUIElement trees from the macOS Accessibility APIs, the same machinery VoiceOver uses. Finder, Mail, Notes, Figma, Xcode, Slack, Notion, any Mac app.

Playwright MCP attached to your real Chrome

Not a headless Chromium downloaded to a cache. The bridge attaches over CDP through the Chrome Web Store extension (ID mmlmfjhmonkocbjadbfplnigmagldckm). Your cookies, 2FA state, and SSO sessions are live on turn one, so SOC 2 flows through Okta do not re-auth.

Google Workspace MCP, Python bundled

Gmail, Calendar, Drive, Docs, Sheets. acp-bridge/src/index.ts line 1076 spawns the Python server with PYTHONHOME pointing at Fazm.app/Contents/Resources/google-workspace-mcp/.venv. No pip install. No user-facing Python at all.

WhatsApp, Telegram, custom user servers

The whatsapp MCP is bundled (acp-bridge line 1052). Telegram is a bundled skill. Custom MCP servers declared at ~/.fazm/mcp-servers.json are loaded alongside on startup. One sentence can chain a browser_click, a Python call, and an AXPress in the same turn.
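The loading order above (bundled servers plus user-declared ones) can be sketched as a config merge. The on-disk format of ~/.fazm/mcp-servers.json is an assumption here, modeled on the common MCP client convention of a name-to-{command, args} map; the command strings below are placeholders, not Fazm's real launch commands.

```typescript
// ASSUMED config shape, borrowed from the common MCP client convention.
interface McpServerSpec { command: string; args?: string[] }
type McpServerMap = Record<string, McpServerSpec>;

// Placeholder commands for illustration only.
const bundledServers: McpServerMap = {
  fazm_tools: { command: "fazm-tools" },
  playwright: { command: "npx", args: ["@playwright/mcp", "--extension"] },
  "macos-use": { command: "mcp-server-macos-use" },
  whatsapp: { command: "whatsapp-mcp" },
  "google-workspace": { command: "python", args: ["-m", "google_workspace_mcp"] },
};

// Custom entries win on a name collision, so a user could override a
// bundled server from ~/.fazm/mcp-servers.json.
function mergeServers(bundled: McpServerMap, custom: McpServerMap): McpServerMap {
  return { ...bundled, ...custom };
}
```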

Versus every other business process automation tool on the first SERP page

Workflow-designer BPA is mature and well-suited to large, regulated processes with a CoE and a platform budget. Observation-first BPA is a different shape, suited to one person on one Mac who wants the repetitive thing gone.

Feature by feature: typical workflow-designer BPA (UiPath, Camunda, Appian, Power Platform, FlowForma, Nintex) versus Fazm (observation-first, Mac-native).

How the user tells the tool what to automate
  • Workflow-designer BPA: The user opens a BPMN-style designer, drags shapes, wires triggers to actions, maps fields, handles exception branches, and publishes a workflow. Weeks of discovery workshops before the first automated run in most UiPath / Appian / Camunda deployments.
  • Fazm: They don't. Fazm's screen observer analyzes 60 minutes of active-window recordings with Gemini Pro and surfaces the task itself in the Discovered Tasks tab. The user clicks 'Run' on a suggestion, not 'Build a workflow'. Source: GeminiAnalysisService.swift lines 12 to 65, DiscoveredTasksSection.swift.

Shape of the thing you author
  • Workflow-designer BPA: A persistent workflow artifact: a .bpmn file, a Power Automate flow, a UiPath .xaml, a Nintex form. It lives in a central repository, has a version, and requires an admin to edit.
  • Fazm: A single-sentence task description in English. The agent decides at runtime which MCP server handles which step. No pre-drawn diagram. No condition nodes. The turn is the workflow.

What it reads to decide what to click
  • Workflow-designer BPA: RPA platforms (UiPath, Automation Anywhere, Blue Prism) use a mix of OCR, image templates, and UI Automation selectors. Screenshot-based agents (Anthropic Computer Use, browser-use variants) ask a vision model to guess x,y pixels, which breaks on the first redesign.
  • Fazm: AXUIElement accessibility trees via the bundled macos-use MCP server for native Mac apps, and browser_snapshot() labelled DOM nodes via Playwright MCP for web apps. The agent sends text refs (ref=eN), never pixel coordinates.

What apps it works with
  • Workflow-designer BPA: SaaS-only for most BPA suites (Zapier, Make, n8n). Desktop RPA (UiPath, Power Automate Desktop) works on Windows with heavy IT setup. Mac-native desktop BPA is essentially Fazm alone.
  • Fazm: Any app on your Mac. Browser and native. Finder, Mail, Notes, Xcode, Figma, Notion, Slack, WhatsApp, Linear, Stripe, Shopify, internal admin panels, anything the AXUIElement tree exposes. Non-Electron apps included.

What the 'process' artifact looks like on disk
  • Workflow-designer BPA: A versioned workflow file checked into a platform repository. Every change is a deploy. Every bug is a support ticket to the workflow admin team.
  • Fazm: A row in the local SQLite observer_activity table with type='gemini_analysis' and a JSON content blob of {task, description, document, chunks_analyzed, tokens}. Created by GeminiAnalysisService.swift line 1156. Status moves from pending to shown to acted to dismissed.

Who operates it
  • Workflow-designer BPA: A CoE (Center of Excellence) pattern is table stakes. Business analysts gather requirements, process architects model the flow, developers implement, QA tests, ops deploys, support runs the RPA bots on shared infrastructure.
  • Fazm: One person, the end user on their own Mac. There is no workflow admin role. The observer writes a suggestion, the user clicks run, the agent runs it. The whole loop happens without a second human.

Where the data lives
  • Workflow-designer BPA: Typically a cloud tenant: UiPath Orchestrator, Microsoft Power Platform environment, Appian cloud. Every run produces process telemetry for the platform. Some regulated industries cannot accept this tradeoff.
  • Fazm: Local. observer_activity, chat_messages, and ai_user_profiles live in ~/Library/Application Support/Fazm/fazm.db. The 60-minute screen recordings are cached locally and uploaded to Gemini's File API only at analysis time. Nothing writes to a shared workflow server.

What runs when the agent fires
  • Workflow-designer BPA: A BPMN engine executes sequential activities, each mapped to a connector or bot. Parallel steps need gateway nodes. Failure handling needs compensating-transaction modeling.
  • Fazm: Five MCP servers spawned at app launch: fazm_tools, playwright, macos-use, whatsapp, google-workspace (acp-bridge/src/index.ts line 2550). The agent picks which one handles which step. All speak stdio, nothing listens on a port.

The actual things that make it work

Every chip is a real symbol, constant, table, or file in the Fazm source tree, not a marketing phrase.

Gemini Pro 2.5 · macOS Accessibility APIs · AXUIElement tree · observer_activity table · GRDB SQLite · Playwright MCP · Chrome Web Store extension · MCP stdio transport · SessionRecorder 2 FPS · 60-second chunks · targetDurationSeconds 3600 · maxChunks 120 · retryCooldown 300s · VERDICT: NO_TASK · VERDICT: TASK_FOUND · VERDICT: UNCLEAR · DiscoveredTasksSection · AnalysisOverlayWindow

Six steps, from a blank screen observer to a finished automation

Steps one through four happen without the user authoring anything. Step five is the moment the discovered task appears in the main window. Step six is the run.

1. The observer records 60 minutes of your active window

SessionRecordingManager.swift line 77 configures the observer recorder at 2 FPS. It captures only the focused window, encodes 60-second chunks, and writes them to ~/Library/Caches/observer-recordings. The recording is local-only until analysis fires.

2. GeminiAnalysisService buffers and triggers

GeminiAnalysisService.swift line 213 checks whether the total buffered duration has crossed targetDurationSeconds (3600 s). Once it does, and there is no existing analysis running and no active failure cooldown, triggerAnalysis() runs.

3. Gemini Pro checks your local SQLite first

The agentic loop forces the model to run a SELECT against observer_activity (the last 10 discovered tasks) and chat_messages (the last 10 messages) before deciding. query_database(sql) is mediated by the service so only read-only SELECTs are allowed (index.ts line 534).
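A read-only gate like the one described for query_database can be sketched as follows. The exact checks in index.ts are not reproduced here; this shows one plausible way to allow only single SELECT statements, in TypeScript.

```typescript
// Conservative read-only SQL gate: single statement, must start with SELECT,
// and no write/DDL keywords anywhere in the text. Deliberately strict —
// a SELECT containing the literal word "update" would also be rejected,
// which is the safe direction for this kind of guard.
function isReadOnlySelect(sql: string): boolean {
  const trimmed = sql.trim().replace(/;\s*$/, ""); // allow one trailing semicolon
  if (trimmed.includes(";")) return false;          // reject multi-statement input
  if (!/^select\b/i.test(trimmed)) return false;    // must begin with SELECT
  return !/\b(insert|update|delete|drop|alter|create|attach|pragma)\b/i.test(trimmed);
}
```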

4. One sentence emerges, or the model returns UNCLEAR

The verdict is one of NO_TASK, TASK_FOUND, or UNCLEAR. Only TASK_FOUND writes to observer_activity. The prompt on line 38 tells the model 'A wrong suggestion is worse than no suggestion.' Quality over volume is enforced by the prompt, not by UX.

5. Discovered Tasks tab shows the candidate

DiscoveredTasksSection.swift polls observer_activity every 10 seconds (refreshTimer line 11). A wand-and-stars row appears in the main window. The user clicks 'Run' and the exact task becomes an agent turn, routed across the five bundled MCP servers.

6. The agent runs it against any Mac app

macos-use MCP handles native apps (Finder, Mail, Xcode, Notion, Slack). Playwright MCP --extension drives your real Chrome with your cookies intact. Python servers handle Gmail, Calendar, Drive. One user turn routes through whichever combination the task needs.

The boundary, point by point

Watching a user's screen to surface automation candidates is a bigger trust surface than a workflow canvas. These are the specific design choices that keep the observer loop contained.

What the screen observer does and does not touch on your Mac

  • Observer recordings live in ~/Library/Caches/observer-recordings, local-only until analysis fires (SessionRecordingManager.swift line 69).
  • The screen observer can be disabled in Settings. Line 168 of GeminiAnalysisService.swift short-circuits handleChunk when shortcut_screenObserverEnabled is false.
  • Gemini File API upload happens only at analysis time, and only for the buffered clips that crossed the 60-minute threshold. Chunks are deleted after successful analysis (line 282).
  • observer_activity and chat_messages live in ~/Library/Application Support/Fazm/fazm.db, an on-device SQLite file, not a cloud workflow tenant.
  • The agentic loop only allows read-only SELECT queries against the local DB. query_database's description on index.ts line 534 says 'Only SELECT queries are allowed.'
  • A failed analysis keeps the buffer intact and sets a 5-minute cooldown (retryCooldown = 300) so a broken API key does not spam Gemini (lines 78, 286).
  • The FoundationModel-style 'wrong suggestion is worse than no suggestion' rule is in the prompt itself (line 38), not a post-hoc filter.
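The success/failure handling in the bullets above can be sketched as a state transition: a successful analysis deletes the analyzed chunks, while a failure keeps the buffer intact and arms the 5-minute cooldown. The function and field names below are illustrative TypeScript, not the Swift implementation.

```typescript
const retryCooldownSeconds = 300; // 5-minute back-off after a failure

interface AnalysisState {
  chunkPaths: string[];  // buffered .mp4 chunks awaiting analysis
  cooldownUntil: number; // epoch seconds; 0 when no cooldown is armed
}

function onAnalysisResult(state: AnalysisState, succeeded: boolean,
                          nowSeconds: number): AnalysisState {
  if (succeeded) {
    // Analyzed chunks are deleted; the buffer starts over from zero.
    return { chunkPaths: [], cooldownUntil: 0 };
  }
  // Failure: keep every chunk so no recording is lost, and back off
  // so a broken API key cannot spam Gemini with retries.
  return { ...state, cooldownUntil: nowSeconds + retryCooldownSeconds };
}
```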

Business process automation that picks the process for you.

Fazm is a Mac app that watches 60 minutes of your work at 2 FPS, uses Gemini Pro to surface the single most impactful repetitive task, writes it to a local SQLite row, and runs it through the macOS Accessibility APIs against any Mac app. No BPMN canvas. No CoE. Free to download.

Download Fazm

See observation-first BPA running on a real Mac

Book 20 minutes and we'll walk through the discovery prompt, the observer_activity table, and one automation you pick live on the call.

Book a call

Business process automation tools, answered against the Fazm source

How is this different from UiPath, Appian, Power Automate, Camunda, or any other top business process automation tool?

Those tools assume the user already knows which process to automate. The user (or a process architect) opens a workflow designer, drags BPMN nodes, wires actions, maps fields, publishes the workflow, then runs it on the platform. Fazm inverts the order. The screen observer records 60 minutes of active-window activity at 2 FPS (SessionRecordingManager.swift line 77), uploads those chunks to Gemini Pro at the one-hour mark (GeminiAnalysisService.swift line 213), and the prompt instructs the model to pick the single most impactful task that is currently eating the user's time (line 13 of the same file). The task is saved to the local observer_activity SQLite table (AppDatabase.swift line 960) and surfaced in the Discovered Tasks tab. The user clicks 'Run,' not 'Build.' The artifact is a sentence, not a .bpmn file.

Does Fazm require a workflow designer or BPMN canvas?

No. There is no canvas, no nodes to wire, no gateways to model. The 'workflow' is a single-sentence task that the agent interprets at runtime. The agent decides per step which of the five bundled MCP servers (fazm_tools, playwright, macos-use, whatsapp, google-workspace; spawned at acp-bridge/src/index.ts line 2550) should handle the step. Parallel steps are just sequential tool calls the agent batches. There is no persistent workflow file to version or deploy. If you need a versioned workflow, Fazm is the wrong tool. If you want the thing done, Fazm is the shape built around that.

What does 'works with any app on your Mac, not just the browser' actually mean?

It means the bundled macos-use MCP server (acp-bridge/src/index.ts line 1057 spawns mcp-server-macos-use from Contents/MacOS) reads AXUIElement trees from the macOS Accessibility APIs, the same APIs VoiceOver uses. That gives the agent programmatic access to every button, text field, and role in every native Mac app, including Finder, Mail, Notes, Xcode, Figma, Slack, Notion, Linear, and any Catalyst or SwiftUI or AppKit app. The browser is one MCP server among five, not the whole product. A BPA task that touches a Finder rename, then a Slack DM, then a Stripe invoice download, then a Google Sheet update lands as a single user turn chaining all four servers in order.

What is the accessibility API advantage over screenshot or OCR based RPA?

Three concrete differences. First, the agent reads structured data: role, label, value, children, with a stable ref token per element, so it never has to guess pixel coordinates. Second, it survives app redesigns that would break OCR templates and vision models that memorized pixel regions. Third, it is fast. An AXUIElement tree is microseconds to read; a screenshot pass through a vision model is hundreds of milliseconds per step. For a bookkeeping automation with 40 clicks, that difference compounds into minutes per run.
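The structured-tree advantage can be made concrete by flattening an accessibility-style tree into text lines with stable ref tokens (ref=eN), the shape this page says the agent sends instead of pixel coordinates. The node type and the walker below are an illustrative TypeScript sketch, not the macos-use server's actual output format.

```typescript
// Minimal accessibility-style node: role, optional label, optional children.
interface AxNode { role: string; label?: string; children?: AxNode[] }

// Depth-first flatten: each element gets a stable ref token the agent can
// send back ("click ref=e2") instead of guessing x,y coordinates.
function flattenTree(root: AxNode): string[] {
  const lines: string[] = [];
  let counter = 0;
  const walk = (node: AxNode, depth: number) => {
    const ref = `e${++counter}`;
    const label = node.label ? ` "${node.label}"` : "";
    lines.push(`${"  ".repeat(depth)}${node.role}${label} [ref=${ref}]`);
    for (const child of node.children ?? []) walk(child, depth + 1);
  };
  walk(root, 0);
  return lines;
}
```

Because refs are derived from tree position rather than pixels, a visual redesign that keeps the same control structure leaves the agent's targets intact.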

What is the anchor line in the prompt that controls which task gets suggested?

Line 13 of /Users/matthewdi/fazm/Desktop/Sources/GeminiAnalysisService.swift reads verbatim: 'You are watching ~60 minutes of a user's screen recording. Each video clip captures the active window of whatever app the user was using at that moment. Your job is to identify the ONE most impactful task an AI agent could take off their plate.' The word ONE is load-bearing. One task per analysis, not a dashboard of ten. The model is then forced at line 30 to run two mandatory SQL queries (observer_activity and chat_messages) before proposing anything, and the prompt at line 34 instructs the model to 'err on the side of NO_TASK when in doubt about similarity' to prior suggestions.

Is the observer always recording, and where do the recordings go?

The observer runs while the app is open and the screen observer setting is on. handleChunk (GeminiAnalysisService.swift line 166) short-circuits if shortcut_screenObserverEnabled is false (line 168). Chunks are 60 seconds of H.265 video of the currently focused window, written to ~/Library/Caches/observer-recordings, and copied into ~/Library/Application Support/Fazm/gemini-analysis/chunks on finalization. They stay on disk until the 60-minute buffer threshold triggers analysis (line 213), at which point they are uploaded to the Gemini File API, the model picks a task, and the successfully-analyzed chunks are deleted (line 282). A failed analysis keeps the buffer intact and imposes a 5-minute retryCooldown (line 78).

Why return UNCLEAR as a first-class verdict instead of always suggesting something?

Because suggestion spam kills adoption. The prompt at line 38 says literally 'A wrong suggestion is worse than no suggestion.' Line 42 adds: 'Only flag a task if ALL of these are true' followed by six conditions, including 'you can clearly see what the user is doing' and 'the task is NOT already being handled by Fazm's agent'. A model that returns UNCLEAR fifty percent of the time is more useful than one that always flags something, because the one it does flag is rarely garbage. This is a product decision encoded in the prompt, not in retrieval or post-filtering.

What does the output actually look like in the Discovered Tasks tab?

A row with a wand-and-stars icon, the one-sentence task title, a status badge (pending, shown, acted, dismissed), and an expand caret. Expanding reveals the description (3 to 5 sentences), the markdown document write-up with 'What Was Observed', 'The Task', 'Why AI Can Help', and 'Recommended Approach' sections (GeminiAnalysisService.swift line 61), and a 'Run' button. DiscoveredTasksSection.swift polls observer_activity every 10 seconds (line 11) so a newly-analyzed task appears without a manual refresh. Status transitions from pending to shown the moment the row is expanded (markAsRead, line 88).
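The status lifecycle described in that answer (pending when written, shown on expand, then acted or dismissed) can be sketched as a small state machine. The transition table is inferred from this page's wording; the guard function is an illustrative TypeScript sketch.

```typescript
type TaskStatus = "pending" | "shown" | "acted" | "dismissed";

// Inferred legal transitions for a discovered task.
const allowed: Record<TaskStatus, TaskStatus[]> = {
  pending: ["shown"],            // expanding the row marks it read
  shown: ["acted", "dismissed"], // the user runs it or dismisses it
  acted: [],                     // terminal
  dismissed: [],                 // terminal
};

function transition(from: TaskStatus, to: TaskStatus): TaskStatus {
  if (!allowed[from].includes(to)) {
    throw new Error(`illegal status transition ${from} -> ${to}`);
  }
  return to;
}
```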

How does this compare to process mining tools like Celonis that also 'discover' processes?

Process mining tools extract process event logs from ERP systems (SAP, Oracle, Salesforce) and reconstruct the canonical path through the process from timestamps and transaction IDs. Fazm does something different and smaller: it looks at your screen, not your database, and identifies one candidate task that the AI could take off your plate, not the whole topology of a process. Celonis tells an enterprise 'your procure-to-pay has 14 deviation paths, here are the bottlenecks.' Fazm tells one person 'you just spent 40 minutes copying invoice numbers from Stripe into a Google Sheet; want me to do that?' Different altitude, different customer, different price tag. Complementary, not competing.

Can I ignore the discovered tasks and just type my own automation commands?

Yes. The discovered-tasks pipeline is one entry point into the agent; the floating bar is the other. Typing 'open Stripe, pull the last three invoices, rename by date, drop in Dropbox' into the floating bar kicks off the same agent turn the observer would have, except the user authored the task instead of Gemini. The observer exists so that users who would not have thought to ask still get a suggestion. It is an onboarding slope, not a gate.

What happens to the business process automation artifact when I quit the app?

It persists. observer_activity rows live in ~/Library/Application Support/Fazm/fazm.db, which is not cleared on quit. The buffered chunks that have not yet been analyzed are also persisted to the buffer-index.json file at ~/Library/Application Support/Fazm/gemini-analysis/buffer-index.json (GeminiAnalysisService.swift line 138). On relaunch, restored entries that no longer have a file on disk are pruned (line 146). Orphaned .mp4 files are cleaned up (line 156). The buffer index survives a crash or a reboot, so a 45-minute recording does not become a 45-minute loss.
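The relaunch cleanup described there reduces to one reconciliation pass: drop index entries whose .mp4 is gone, and flag .mp4 files no entry references as orphans. The entry shape and function below are an illustrative TypeScript sketch, not the Swift implementation.

```typescript
interface ChunkEntry { path: string; durationSeconds: number }

function reconcile(entries: ChunkEntry[], filesOnDisk: Set<string>):
    { kept: ChunkEntry[]; orphans: string[] } {
  // Prune index entries that lost their backing file.
  const kept = entries.filter(e => filesOnDisk.has(e.path));
  // Files on disk that no surviving entry references are orphans to delete.
  const referenced = new Set(kept.map(e => e.path));
  const orphans = [...filesOnDisk].filter(p => !referenced.has(p));
  return { kept, orphans };
}
```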

How many MCP servers does the agent have access to and what does each one do?

Five, spawned at app launch by acp-bridge/src/index.ts and logged on line 2550: fazm_tools (the Fazm HTTP tools for file indexing, user profile, memory graph), playwright (the Playwright MCP with --extension, attached to your real Chrome via the Chrome Web Store extension), macos-use (AXUIElement-based native Mac app control), whatsapp (the bundled WhatsApp MCP binary), and google-workspace (the bundled Python MCP for Gmail, Calendar, Drive, Docs, Sheets). Any custom server declared in ~/.fazm/mcp-servers.json is also loaded on startup (line 1104). The agent picks which server per step, so one user turn can chain several.

Is this a consumer product or an enterprise product?

Consumer-shaped: signed, notarized Mac app downloaded from fazm.ai, one-click install, no admin role, no CoE. Enterprise pilots run through cal.com/matt364/fazm-enterprise-demo where we walk through the macos-use MCP, the observer's local-only posture, and what an enterprise deployment would look like for a team where each person runs Fazm on their own Mac but shares MCP server configs through ~/.fazm/mcp-servers.json. The architecture is per-user, so adding seats is copy-paste; the workflow-admin role common in UiPath / Appian deployments does not exist in Fazm.

fazm — AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
