A business process automation tool that picks the process to automate by watching 60 minutes of your screen.
Every top SERP result for this keyword is a workflow designer (UiPath, Camunda, Appian, Power Platform, FlowForma, Nintex, Cflow). They assume you already know which process to automate and will model it in a BPMN canvas before the first run. Fazm inverts that. The screen observer records the focused window at 2 FPS, buffers 60 minutes of activity, and asks Gemini Pro to pick the single most impactful task to take off your plate. The prompt at GeminiAnalysisService.swift line 13 reads verbatim: “Your job is to identify the ONE most impactful task an AI agent could take off their plate.”
Fazm's screen observer hands Gemini Pro a prompt that opens: 'You are watching ~60 minutes of a user's screen recording. Each video clip captures the active window of whatever app the user was using at that moment. Your job is to identify the ONE most impactful task an AI agent could take off their plate.' That text is lines 12 to 14 of /Users/matthewdi/fazm/Desktop/Sources/GeminiAnalysisService.swift. The prompt then forces the model, at line 30, to run two mandatory SQL queries against the user's local SQLite DB (observer_activity and chat_messages) before suggesting anything. The 60-minute threshold is enforced by 'private let targetDurationSeconds: TimeInterval = 3600' on line 69 of the same file. TASK_FOUND results land in the observer_activity table created by migration fazmV4 at AppDatabase.swift line 960. That entire pipeline, from screen chunks to a row in a local table, is what this page documents.
/Users/matthewdi/fazm/Desktop/Sources/GeminiAnalysisService.swift:12-65 + AppDatabase.swift:960
The inversion, in one toggle
Traditional BPA tools ask “which process?” on day one and spend months getting to the first automated run. Fazm asks a different question: watch the user, then tell them.
From designer-first to observation-first
Business analyst gathers requirements. Process architect draws BPMN. Developers implement activities and exception branches. QA tests. Ops deploys. Support runs the bots. Weeks to first run.
- BPMN canvas, gateway nodes, compensating-transaction modeling
- Center of Excellence (CoE) team is table stakes
- Persistent workflow file lives in a platform tenant
- Admin role separate from the user who runs it
From 60 minutes of screen to one row in a local table
The pipeline is linear and auditable. Each arrow is a real function in the Fazm source, not a conceptual step. The hub is the prompt that forces one suggestion, or none.
How the observer turns screen activity into an automatable task
What makes the discovery loop safe
Four design choices in the prompt and the service keep suggestion volume low and precision high. Suggestion spam is what kills BPA tools that attempt observation-first discovery without these guardrails.
60 minutes of buffered screen activity
GeminiAnalysisService.swift line 69 sets targetDurationSeconds = 3600. The observer recorder encodes the focused window at 2 FPS (SessionRecordingManager.swift line 77), buffers chunks locally to Application Support, and only fires analysis once total buffered duration crosses one hour.
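The trigger reduces to a duration check over the buffered chunks. A minimal sketch of that rule, with hypothetical type and function names (not Fazm's actual implementation):

```swift
import Foundation

// Hypothetical simplification of the observer's trigger rule: analysis
// fires only once the summed duration of buffered 60-second chunks
// crosses the one-hour target, and never while an analysis is already
// running or a failure cooldown is active.
let targetDurationSeconds: TimeInterval = 3600

struct BufferedChunk {
    let url: String
    let duration: TimeInterval   // ~60 s per encoded chunk
}

func shouldTriggerAnalysis(chunks: [BufferedChunk],
                           analysisInFlight: Bool,
                           inCooldown: Bool) -> Bool {
    guard !analysisInFlight, !inCooldown else { return false }
    let buffered = chunks.reduce(0) { $0 + $1.duration }
    return buffered >= targetDurationSeconds
}
```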
One candidate task per hour, not ten
The prompt on line 13 reads: 'identify the ONE most impactful task an AI agent could take off their plate.' Gemini Pro is forced to pick a single high-impact candidate. Spam kills adoption. One per hour is the product-level rate limit baked into the prompt, not a config knob.
Checks the task hasn't already been suggested
Line 31 of the prompt mandates a SELECT against the user's local observer_activity table (LIMIT 10 most recent). Line 34 tells the model: 'Err on the side of NO_TASK when in doubt about similarity.' Same app + same category of work = dropped.
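The dedup pass is a query plus a drop rule. A sketch of both, with an assumed query shape (the exact SQL and column names are illustrative, not quoted from the prompt):

```swift
import Foundation

// Assumed shape of the mandated lookback: the 10 most recent discovered
// tasks, read before any new suggestion is proposed. Column and table
// names other than observer_activity are illustrative.
let recentSuggestionsSQL = """
SELECT content FROM observer_activity
WHERE type = 'gemini_analysis'
ORDER BY id DESC
LIMIT 10
"""

struct PriorSuggestion { let app: String; let category: String }

// "Same app + same category of work = dropped." Erring toward treating
// a near-match as a duplicate mirrors the prompt's NO_TASK bias.
func isDuplicate(candidateApp: String, candidateCategory: String,
                 priors: [PriorSuggestion]) -> Bool {
    priors.contains { $0.app == candidateApp && $0.category == candidateCategory }
}
```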
Verdict is one of three words
VERDICT: NO_TASK, TASK_FOUND, or UNCLEAR. Only TASK_FOUND persists to the observer_activity SQLite table (AppDatabase.swift line 960, migration fazmV4). UNCLEAR is first-class. The prompt says 'A wrong suggestion is worse than no suggestion' on line 38.
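The verdict contract is small enough to sketch in a few lines. This is a hypothetical parser for the three-word protocol described above, not Fazm's actual code; the persistence rule (only TASK_FOUND writes a row) is from the text:

```swift
import Foundation

// Three possible verdicts; only TASK_FOUND leads to a persisted row.
// NO_TASK and UNCLEAR both drop the analysis result on the floor.
enum Verdict: String {
    case noTask = "NO_TASK"
    case taskFound = "TASK_FOUND"
    case unclear = "UNCLEAR"
}

// Hypothetical parser: scan model output for a "VERDICT: <word>" line.
func parseVerdict(_ modelOutput: String) -> Verdict? {
    for line in modelOutput.split(separator: "\n") {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        if trimmed.hasPrefix("VERDICT:") {
            let word = trimmed.dropFirst("VERDICT:".count)
                .trimmingCharacters(in: .whitespaces)
            return Verdict(rawValue: word)
        }
    }
    return nil
}

func shouldPersist(_ verdict: Verdict?) -> Bool {
    verdict == .taskFound
}
```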
The prompt that runs the discovery
This is the shape of the thing no competitor page has. Not a workflow, not a BPMN shape library. A prompt that watches you work and picks one thing to take off your plate.
Where a Fazm “process” lives on disk
A workflow in UiPath is a .xaml. A flow in Power Automate is JSON in a Dataverse tenant. A process in Fazm is a row in this SQLite table, created by the fazmV4 migration and owned by the user on their own Mac.
How a TASK_FOUND verdict becomes a visible suggestion
After Gemini returns TASK_FOUND, persistAndShowOverlay writes the row and paints the AnalysisOverlayWindow below the floating bar. Everything after “VERDICT: TASK_FOUND” is a single function.
What the run looks like from outside the app
Simulated trace of one full discovery cycle, observed from a terminal watching the Fazm log and the fazm.db SQLite file. This is what a BPA “workflow” looks like when the workflow file is a database row.
What runs when the user clicks Run
The task is a sentence. The agent picks which of five bundled MCP servers handles each step. Parallel steps are just sequential tool calls batched in one turn.
macos-use MCP server, bundled
acp-bridge/src/index.ts line 1057 spawns the mcp-server-macos-use binary from Contents/MacOS. It reads AXUIElement trees from the macOS Accessibility APIs, the same machinery VoiceOver uses. Finder, Mail, Notes, Figma, Xcode, Slack, Notion, any Mac app.
Playwright MCP attached to your real Chrome
Not a headless Chromium downloaded to a cache. The bridge attaches over CDP through the Chrome Web Store extension (ID mmlmfjhmonkocbjadbfplnigmagldckm). Your cookies, 2FA state, and SSO sessions are live on turn one, so SOC 2 flows through Okta do not re-auth.
Google Workspace MCP, Python bundled
Gmail, Calendar, Drive, Docs, Sheets. acp-bridge/src/index.ts line 1076 spawns the Python server with PYTHONHOME pointing at Fazm.app/Contents/Resources/google-workspace-mcp/.venv. No pip install. No user-facing Python at all.
WhatsApp, Telegram, custom user servers
The whatsapp MCP is bundled (acp-bridge line 1052). Telegram is a bundled skill. Custom MCP servers declared at ~/.fazm/mcp-servers.json are loaded alongside on startup. One sentence can chain a browser_click, a Python call, and an AXPress in the same turn.
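The page does not document the schema of ~/.fazm/mcp-servers.json, so the entry shape below is a guess modeled on the common stdio MCP config convention (command + args + env); treat both the field names and the sample values as hypothetical:

```swift
import Foundation

// Hypothetical entry shape for ~/.fazm/mcp-servers.json. The real Fazm
// schema may differ; this only illustrates the "declare a custom stdio
// server, get it loaded at startup" idea.
struct CustomMCPServer: Codable {
    let command: String
    let args: [String]
    let env: [String: String]?
}

let exampleJSON = """
{
  "my-crm": {
    "command": "/usr/local/bin/node",
    "args": ["/Users/me/mcp/crm-server.js"],
    "env": { "CRM_API_KEY": "placeholder" }
  }
}
"""

// Top level is a map from server name to its spawn config.
let servers = try! JSONDecoder().decode(
    [String: CustomMCPServer].self,
    from: Data(exampleJSON.utf8))
```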
Versus every other business process automation tool on the first SERP page
Workflow-designer BPA is mature and well-suited to large, regulated processes with a CoE and a platform budget. Observation-first BPA is a different shape, suited to one person on one Mac who wants the repetitive thing gone.
| Feature | Typical workflow-designer BPA (UiPath, Camunda, Appian, Power Platform, FlowForma, Nintex) | Fazm (observation-first, Mac-native) |
|---|---|---|
| How the user tells the tool what to automate | The user opens a BPMN-style designer, drags shapes, wires triggers to actions, maps fields, handles exception branches, and publishes a workflow. Weeks of discovery workshops before the first automated run in most UiPath / Appian / Camunda deployments. | They don't. Fazm's screen observer analyzes 60 minutes of active-window recordings with Gemini Pro and surfaces the task itself in the Discovered Tasks tab. The user clicks 'Run' on a suggestion, not 'Build a workflow'. Source: GeminiAnalysisService.swift lines 12 to 65, DiscoveredTasksSection.swift. |
| Shape of the thing you author | A persistent workflow artifact: a .bpmn file, a Power Automate flow, a UiPath .xaml, a Nintex form. It lives in a central repository, has a version, and requires an admin to edit. | A single-sentence task description in English. The agent decides at runtime which MCP server handles which step. No pre-drawn diagram. No condition nodes. The turn is the workflow. |
| What it reads to decide what to click | RPA platforms (UiPath, Automation Anywhere, Blue Prism) use a mix of OCR, image templates, and UI Automation selectors. Screenshot-based agents (Anthropic Computer Use, browser-use variants) ask a vision model to guess x,y pixels, which breaks on the first redesign. | AXUIElement accessibility trees via the bundled macos-use MCP server for native Mac apps, and browser_snapshot() labelled DOM nodes via Playwright MCP for web apps. The agent sends text refs (ref=eN), never pixel coordinates. |
| What apps it works with | SaaS-only for most BPA suites (Zapier, Make, n8n). Desktop RPA (UiPath, Power Automate Desktop) works on Windows with heavy IT setup. Mac-native desktop BPA is essentially Fazm alone. | Any app on your Mac. Browser and native. Finder, Mail, Notes, Xcode, Figma, Notion, Slack, WhatsApp, Linear, Stripe, Shopify, internal admin panels, anything the AXUIElement tree exposes. Non-Electron apps included. |
| What the 'process' artifact looks like on disk | A versioned workflow file checked into a platform repository. Every change is a deploy. Every bug is a support ticket to the workflow admin team. | A row in the local SQLite observer_activity table with type='gemini_analysis' and a JSON content blob of {task, description, document, chunks_analyzed, tokens}. Created by GeminiAnalysisService.swift line 1156. Status moves from pending to shown to acted to dismissed. |
| Who operates it | A CoE (Center of Excellence) pattern is table stakes. Business analysts gather requirements, process architects model the flow, developers implement, QA tests, ops deploys, support runs the RPA bots on shared infrastructure. | One person, the end user on their own Mac. There is no workflow admin role. The observer writes a suggestion, the user clicks run, the agent runs it. The whole loop happens without a second human. |
| Where the data lives | Typically a cloud tenant: UiPath Orchestrator, Microsoft Power Platform environment, Appian cloud. Every run produces process telemetry for the platform. Some regulated industries cannot accept this tradeoff. | Local. observer_activity, chat_messages, and ai_user_profiles live in ~/Library/Application Support/Fazm/fazm.db. The 60-minute screen recordings are cached locally and uploaded to Gemini's File API only at analysis time. Nothing writes to a shared workflow server. |
| What runs when the agent fires | A BPMN engine executes sequential activities, each mapped to a connector or bot. Parallel steps need gateway nodes. Failure handling needs compensating-transaction modeling. | Five MCP servers spawned at app launch: fazm_tools, playwright, macos-use, whatsapp, google-workspace (acp-bridge/src/index.ts line 2550). The agent picks which one handles which step. All speak stdio, nothing listens on a port. |
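The on-disk artifact from the table above ({task, description, document, chunks_analyzed, tokens}) is small enough to model directly. A sketch of decoding that content blob; the Swift type name and sample values are illustrative, the JSON keys come from the table:

```swift
import Foundation

// Sketch of the JSON content blob stored on an observer_activity row
// with type='gemini_analysis'. Field names follow the comparison table;
// the sample row below is invented for illustration.
struct DiscoveredTaskContent: Codable {
    let task: String
    let description: String
    let document: String
    let chunksAnalyzed: Int
    let tokens: Int

    enum CodingKeys: String, CodingKey {
        case task, description, document
        case chunksAnalyzed = "chunks_analyzed"
        case tokens
    }
}

let row = """
{
  "task": "Copy this week's Stripe invoice numbers into the tracking sheet",
  "description": "The user spent roughly 40 minutes moving invoice data by hand.",
  "document": "## What Was Observed\\n...",
  "chunks_analyzed": 60,
  "tokens": 48213
}
"""

let content = try! JSONDecoder().decode(
    DiscoveredTaskContent.self, from: Data(row.utf8))
```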
The actual things that make it work
Every chip is a real symbol, constant, table, or file in the Fazm source tree, not a marketing phrase.
Six steps, from a blank screen observer to a finished automation
Steps one through four happen without the user authoring anything. Step five is the moment the discovered task appears in the main window. Step six is the run.
The observer records 60 minutes of your active window
SessionRecordingManager.swift line 77 configures the observer recorder at 2 FPS. It captures only the focused window, encodes 60-second chunks, and writes them to ~/Library/Caches/observer-recordings. The recording is local-only until analysis fires.
GeminiAnalysisService buffers and triggers
GeminiAnalysisService.swift line 213 checks whether the total buffered duration has crossed targetDurationSeconds (3600 s). Once it does, and there is no existing analysis running and no active failure cooldown, triggerAnalysis() runs.
Gemini Pro checks your local SQLite first
The agentic loop forces the model to run a SELECT against observer_activity (the last 10 discovered tasks) and chat_messages (the last 10 messages) before deciding. query_database(sql) is mediated by the service so only read-only SELECTs are allowed (index.ts line 534).
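The read-only gate can be sketched as a predicate the bridge applies before any SQL reaches SQLite. This is a hypothetical version of that check, not the actual index.ts logic, which may be stricter:

```swift
import Foundation

// Hypothetical read-only guard: reject anything that is not a single
// plain SELECT, including attempts to smuggle a second statement in
// behind a semicolon.
func isAllowedQuery(_ sql: String) -> Bool {
    let trimmed = sql.trimmingCharacters(in: .whitespacesAndNewlines)
    guard trimmed.uppercased().hasPrefix("SELECT") else { return false }
    // Allow at most one trailing semicolon; none inside the statement.
    let body = trimmed.hasSuffix(";") ? String(trimmed.dropLast()) : trimmed
    return !body.contains(";")
}
```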
One sentence emerges, or the model returns UNCLEAR
The verdict is one of NO_TASK, TASK_FOUND, or UNCLEAR. Only TASK_FOUND writes to observer_activity. The prompt on line 38 tells the model 'A wrong suggestion is worse than no suggestion.' Quality over volume is enforced by the prompt, not by UX.
Discovered Tasks tab shows the candidate
DiscoveredTasksSection.swift polls observer_activity every 10 seconds (refreshTimer line 11). A wand-and-stars row appears in the main window. The user clicks 'Run' and the exact task becomes an agent turn, routed across the five bundled MCP servers.
The agent runs it against any Mac app
macos-use MCP handles native apps (Finder, Mail, Xcode, Notion, Slack). Playwright MCP --extension drives your real Chrome with your cookies intact. Python servers handle Gmail, Calendar, Drive. One user turn routes through whichever combination the task needs.
The boundary, point by point
Watching a user's screen to surface automation candidates is a bigger trust surface than a workflow canvas. These are the specific design choices that keep the observer loop contained.
What the screen observer does and does not touch on your Mac
- Observer recordings live in ~/Library/Caches/observer-recordings, local-only until analysis fires (SessionRecordingManager.swift line 69).
- The screen observer can be disabled in Settings. Line 168 of GeminiAnalysisService.swift short-circuits handleChunk when shortcut_screenObserverEnabled is false.
- Gemini File API upload happens only at analysis time, and only for the buffered clips that crossed the 60-minute threshold. Chunks are deleted after successful analysis (line 282).
- observer_activity and chat_messages live in ~/Library/Application Support/Fazm/fazm.db, an on-device SQLite file, not a cloud workflow tenant.
- The agentic loop only allows read-only SELECT queries against the local DB. query_database's description on index.ts line 534 says 'Only SELECT queries are allowed.'
- A failed analysis keeps the buffer intact and sets a 5-minute cooldown (retryCooldown = 300) so a broken API key does not spam Gemini (lines 78, 286).
- The FoundationModel-style 'wrong suggestion is worse than no suggestion' rule is in the prompt itself (line 38), not a post-hoc filter.
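The failure-cooldown bullet above reduces to one timestamp comparison. A minimal sketch with hypothetical function names; the 300-second constant is from the source citation:

```swift
import Foundation

// Sketch of the failure cooldown: a failed analysis records a timestamp,
// and no retry fires until retryCooldown seconds have passed. The
// buffered chunks stay on disk untouched in the meantime.
let retryCooldown: TimeInterval = 300  // 5 minutes

func canRetry(lastFailure: Date?, now: Date) -> Bool {
    guard let last = lastFailure else { return true }  // never failed yet
    return now.timeIntervalSince(last) >= retryCooldown
}
```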
Business process automation that picks the process for you.
Fazm is a Mac app that watches 60 minutes of your work at 2 FPS, uses Gemini Pro to surface the single most impactful repetitive task, writes it to a local SQLite row, and runs it through the macOS Accessibility APIs against any Mac app. No BPMN canvas. No CoE. Free to download.
Download Fazm →
See observation-first BPA running on a real Mac
Book 20 minutes and we'll walk through the discovery prompt, the observer_activity table, and one automation you pick live on the call.
Book a call →
Business process automation tools, answered against the Fazm source
How is this different from UiPath, Appian, Power Automate, Camunda, or any other top business process automation tool?
Those tools assume the user already knows which process to automate. The user (or a process architect) opens a workflow designer, drags BPMN nodes, wires actions, maps fields, publishes the workflow, then runs it on the platform. Fazm inverts the order. The screen observer records 60 minutes of active-window activity at 2 FPS (SessionRecordingManager.swift line 77), uploads those chunks to Gemini Pro at the one-hour mark (GeminiAnalysisService.swift line 213), and the prompt instructs the model to pick the single most impactful task that is currently eating the user's time (line 13 of the same file). The task is saved to the local observer_activity SQLite table (AppDatabase.swift line 960) and surfaced in the Discovered Tasks tab. The user clicks 'Run,' not 'Build.' The artifact is a sentence, not a .bpmn file.
Does Fazm require a workflow designer or BPMN canvas?
No. There is no canvas, no nodes to wire, no gateways to model. The 'workflow' is a single-sentence task that the agent decodes at runtime. The agent decides per step which of the five bundled MCP servers (fazm_tools, playwright, macos-use, whatsapp, google-workspace; spawned at acp-bridge/src/index.ts line 2550) should handle the step. Parallel steps are just sequential tool calls the agent batches. There is no persistent workflow file to version or deploy. If you need a versioned workflow, Fazm is the wrong tool. If you want the thing done, Fazm is the shape built around that.
What does 'works with any app on your Mac, not just the browser' actually mean?
It means the bundled macos-use MCP server (acp-bridge/src/index.ts line 1057 spawns mcp-server-macos-use from Contents/MacOS) reads AXUIElement trees from the macOS Accessibility APIs, the same APIs VoiceOver uses. That gives the agent programmatic access to every button, text field, and role in every native Mac app, including Finder, Mail, Notes, Xcode, Figma, Slack, Notion, Linear, and any Catalyst or SwiftUI or AppKit app. The browser is one MCP server among five, not the whole product. A BPA task that touches a Finder rename, then a Slack DM, then a Stripe invoice download, then a Google Sheet update lands as a single user turn chaining all four servers in order.
What is the accessibility API advantage over screenshot or OCR based RPA?
Three concrete differences. First, the agent reads structured data: role, label, value, children, with a stable ref token per element, so it never has to guess pixel coordinates. Second, it survives app redesigns that would break OCR templates and vision models that memorized pixel regions. Third, it is fast. An AXUIElement tree is microseconds to read; a screenshot pass through a vision model is hundreds of milliseconds per step. For a bookkeeping automation with 40 clicks, that difference compounds into minutes per run.
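The "stable ref token" point can be made concrete with a toy tree. The types and wire format below are hypothetical illustrations of the idea, not Fazm's actual data model: the agent targets an element by role and label and gets back a ref like e7, never an (x, y) that moves on the next redesign:

```swift
import Foundation

// Toy accessibility node: role, label, and a ref token that stays
// stable for the turn. Arrays give the struct its recursion.
struct AXNode {
    let ref: String      // e.g. "e7"
    let role: String     // e.g. "AXButton"
    let label: String    // e.g. "Download invoice"
    let children: [AXNode]
}

// Find a clickable target by meaning, not by pixel coordinates.
func findRef(in node: AXNode, role: String, label: String) -> String? {
    if node.role == role && node.label == label { return node.ref }
    for child in node.children {
        if let hit = findRef(in: child, role: role, label: label) { return hit }
    }
    return nil
}
```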
What is the anchor line in the prompt that controls which task gets suggested?
Lines 12 to 14 of /Users/matthewdi/fazm/Desktop/Sources/GeminiAnalysisService.swift read verbatim: 'You are watching ~60 minutes of a user's screen recording. Each video clip captures the active window of whatever app the user was using at that moment. Your job is to identify the ONE most impactful task an AI agent could take off their plate.' The word ONE is load-bearing. One task per analysis, not a dashboard of ten. The model is then forced at line 30 to run two mandatory SQL queries (observer_activity and chat_messages) before proposing anything, and the prompt at line 34 instructs the model to 'err on the side of NO_TASK when in doubt about similarity' to prior suggestions.
Is the observer always recording, and where do the recordings go?
The observer runs while the app is open and the screen observer setting is on. handleChunk (GeminiAnalysisService.swift line 166) short-circuits if shortcut_screenObserverEnabled is false (line 168). Chunks are 60 seconds of H.265 video of the currently focused window, written to ~/Library/Caches/observer-recordings, and copied into ~/Library/Application Support/Fazm/gemini-analysis/chunks on finalization. They stay on disk until the 60-minute buffer threshold triggers analysis (line 213), at which point they are uploaded to the Gemini File API, the model picks a task, and the successfully-analyzed chunks are deleted (line 282). A failed analysis keeps the buffer intact and imposes a 5-minute retryCooldown (line 78).
Why return UNCLEAR as a first-class verdict instead of always suggesting something?
Because suggestion spam kills adoption. The prompt at line 38 says literally 'A wrong suggestion is worse than no suggestion.' Line 42 adds: 'Only flag a task if ALL of these are true' followed by six conditions, including 'you can clearly see what the user is doing' and 'the task is NOT already being handled by Fazm's agent'. A model that returns UNCLEAR fifty percent of the time is more useful than one that always flags something, because the one it does flag is rarely garbage. This is a product decision encoded in the prompt, not in retrieval or post-filtering.
What does the output actually look like in the Discovered Tasks tab?
A row with a wand-and-stars icon, the one-sentence task title, a status badge (pending, shown, acted, dismissed), and an expand caret. Expanding reveals the description (3 to 5 sentences), the markdown document write-up with 'What Was Observed', 'The Task', 'Why AI Can Help', and 'Recommended Approach' sections (GeminiAnalysisService.swift line 61), and a 'Run' button. DiscoveredTasksSection.swift polls observer_activity every 10 seconds (line 11) so a newly-analyzed task appears without a manual refresh. Status transitions from pending to shown the moment the row is expanded (markAsRead, line 88).
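The status lifecycle described above (pending to shown on expand, then acted or dismissed) can be sketched as a tiny transition function. The transition table is my reading of the answer, not code quoted from DiscoveredTasksSection.swift:

```swift
import Foundation

// Assumed suggestion lifecycle: pending -> shown (markAsRead on
// expand) -> acted (on Run) or dismissed. Unknown events leave the
// status unchanged.
enum TaskStatus: String {
    case pending, shown, acted, dismissed
}

func nextStatus(_ current: TaskStatus, event: String) -> TaskStatus {
    switch (current, event) {
    case (.pending, "expand"):  return .shown
    case (.shown, "run"):       return .acted
    case (.shown, "dismiss"):   return .dismissed
    default:                    return current
    }
}
```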
How does this compare to process mining tools like Celonis that also 'discover' processes?
Process mining tools extract process event logs from ERP systems (SAP, Oracle, Salesforce) and reconstruct the canonical path through the process from timestamps and transaction IDs. Fazm does something different and smaller: it looks at your screen, not your database, and identifies one candidate task that the AI could take off your plate, not the whole topology of a process. Celonis tells an enterprise 'your procure-to-pay has 14 deviation paths, here are the bottlenecks.' Fazm tells one person 'you just spent 40 minutes copying invoice numbers from Stripe into a Google Sheet; want me to do that?' Different altitude, different customer, different price tag. Complementary, not competing.
Can I ignore the discovered tasks and just type my own automation commands?
Yes. The discovered-tasks pipeline is one entry point into the agent; the floating bar is the other. Typing 'open Stripe, pull the last three invoices, rename by date, drop in Dropbox' into the floating bar kicks off the same agent turn the observer would have, except the user authored the task instead of Gemini. The observer exists so that users who would not have thought to ask still get a suggestion. It is an onboarding slope, not a gate.
What happens to the business process automation artifact when I quit the app?
It persists. observer_activity rows live in ~/Library/Application Support/Fazm/fazm.db, which is not cleared on quit. Buffered chunks that have not yet been analyzed are also persisted, indexed by the buffer-index.json file at ~/Library/Application Support/Fazm/gemini-analysis/buffer-index.json (GeminiAnalysisService.swift line 138). On relaunch, restored entries that no longer have a file on disk are pruned (line 146), and orphaned .mp4 files are cleaned up (line 156). The buffer index survives a crash or a reboot, so a 45-minute recording does not become a 45-minute loss.
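That relaunch reconciliation is a set intersection plus a set difference. A sketch with hypothetical types: index entries whose file is gone get pruned, and files with no index entry are treated as orphans to delete:

```swift
import Foundation

// Hypothetical reconciliation between the persisted buffer index and
// the .mp4 files actually on disk after a relaunch.
struct BufferEntry { let path: String; let duration: Double }

func reconcile(index: [BufferEntry], filesOnDisk: Set<String>)
    -> (kept: [BufferEntry], orphans: Set<String>) {
    // Keep only entries whose backing file still exists.
    let kept = index.filter { filesOnDisk.contains($0.path) }
    // Files on disk that no entry claims are orphans to clean up.
    let indexed = Set(index.map { $0.path })
    let orphans = filesOnDisk.subtracting(indexed)
    return (kept, orphans)
}
```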
How many MCP servers does the agent have access to and what does each one do?
Five, spawned at app launch by acp-bridge/src/index.ts and logged on line 2550: fazm_tools (the Fazm HTTP tools for file indexing, user profile, memory graph), playwright (the Playwright MCP with --extension, attached to your real Chrome via the Chrome Web Store extension), macos-use (AXUIElement-based native Mac app control), whatsapp (the bundled WhatsApp MCP binary), and google-workspace (the bundled Python MCP for Gmail, Calendar, Drive, Docs, Sheets). Any custom server declared in ~/.fazm/mcp-servers.json is also loaded on startup (line 1104). The agent picks which server per step, so one user turn can chain several.
Is this a consumer product or an enterprise product?
Consumer-shaped: signed, notarized Mac app downloaded from fazm.ai, one-click install, no admin role, no CoE. Enterprise pilots run through cal.com/matt364/fazm-enterprise-demo where we walk through the macos-use MCP, the observer's local-only posture, and what an enterprise deployment would look like for a team where each person runs Fazm on their own Mac but shares MCP server configs through ~/.fazm/mcp-servers.json. The architecture is per-user, so adding seats is copy-paste; the workflow-admin role common in UiPath / Appian deployments does not exist in Fazm.
More on Mac-native automation and how the discovery loop fits
Neighboring guides
Accessibility API vs screenshot agents
Why Fazm reads labelled DOM and AXUIElement trees instead of asking a vision model to guess pixel coordinates from a screenshot.
AI tools for business process automation guide
The wider landscape: vertical SaaS AI, workflow platforms, and desktop AI agents, and where each one actually fits.
AI automation for small business, getting started
What the observation-first path looks like for a non-developer owner who just wants the repetitive thing gone.