Field notes from a Mac-native agent
An agent that clicks the apps on your Mac, not a dashboard you have to learn
Most pages written about AI agents for small business admin are listicles of SaaS web platforms. They cover Lindy, Zapier Agents, Athena, Smartsheet AI, Beam, Relevance, n8n. Every one of those is a browser dashboard fronted by connectors the vendor pre-built. None of them open QuickBooks Desktop, none of them touch a Numbers spreadsheet, none of them reply inside the real Mail app. Fazm is a consumer Mac app. It reads the live macOS accessibility tree of whatever window is in front of you and clicks the buttons a human would click, with seventeen task-shaped skills shipped inside the signed .app.
The split nobody else writes about
When a small business owner says they want an AI agent for admin, they are pointing at one of two very different things. The first is a cloud orchestration agent: it lives in a browser tab, it talks to Gmail and Stripe and HubSpot through OAuth connectors, and it shines when the work is moving JSON between SaaS systems. The second is a desktop agent: it lives on the Mac, it operates the apps already in the dock, and it shines when the work is the bookkeeping spreadsheet the accountant emailed, the QuickBooks desktop file, the Numbers sheet of unpaid invoices, the local PDF contract, the Mail inbox holding vendor receipts.
Almost every page already published on this topic conflates the two. The roundup-style guides list cloud orchestration tools and call the category done. The desktop half goes unmentioned, because most of the named tools cannot do it. Fazm is built for that half. The rest of this page is what that means in concrete terms: which files are bundled, which binary does the clicking, which apps the agent can operate out of the box.
“Forty to fifty turns into a session of QuickBooks Desktop reconciliation, the agent had clicked through three windows, edited eleven entries, and produced a clean PDF report. None of that uses a SaaS connector. None of it would be possible from a browser dashboard.”
Real workflow run by a one-person consulting practice on Fazm v2.4.1, April 23, 2026
The seventeen skills, marching past
Each one is a real .skill.md file at /Users/matthewdi/fazm/Desktop/Sources/BundledSkills/, baked into the app bundle at build time. The agent loads them by name on demand. None of them require a vendor account.
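As a rough illustration of what "loads them by name on demand" can look like, the sketch below indexes a folder of .skill.md files and reads one only when it is requested. It is written in Python for brevity and is not Fazm's loader; index_skills, load_skill, and the demo files are hypothetical names introduced here.

```python
from pathlib import Path
import tempfile

SKILL_SUFFIX = ".skill.md"


def index_skills(skills_dir: Path) -> dict[str, Path]:
    """Map each skill name to its file, e.g. 'xlsx' -> .../xlsx.skill.md."""
    return {
        path.name[: -len(SKILL_SUFFIX)]: path
        for path in sorted(skills_dir.iterdir())
        if path.is_file() and path.name.endswith(SKILL_SUFFIX)
    }


def load_skill(index: dict[str, Path], name: str) -> str:
    """Read a skill's text only when it is requested by name."""
    if name not in index:
        raise KeyError(f"no bundled skill named {name!r}")
    return index[name].read_text(encoding="utf-8")


# Demo against a throwaway directory standing in for the bundle (assumption).
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "xlsx.skill.md").write_text("# xlsx skill\n", encoding="utf-8")
    (demo_dir / "pdf.skill.md").write_text("# pdf skill\n", encoding="utf-8")
    (demo_dir / "README.txt").write_text("ignored\n", encoding="utf-8")
    skills = index_skills(demo_dir)
    skill_names = sorted(skills)
    xlsx_text = load_skill(skills, "xlsx")
```

Nothing is read from disk until a name is requested, which fits a bundle where only a few of the seventeen skills are needed in any one session.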
Want a new skill? The find-skills skill resolves missing capabilities at runtime by pulling from a registry. The existing seventeen are the ones that ship pre-loaded so the common cases work offline, on first launch, with no extra download.
One sentence, three Mac apps, no connector
What flows through Fazm when a small business owner asks for the kind of admin chain a SaaS dashboard cannot reach.
Owner: pull unpaid invoices and email reminders
None of those three Mac apps have a public API the agent is calling. The accessibility tree is what makes them operable. The skills are what make the operations domain-shaped.
Six concrete admin slots, mapped to skills
These are the requests Fazm hears most often from owners of operations with ten people or fewer. Each card names the underlying skill or binary so you can verify what runs on-device.
Receipts spreadsheet to categorized expense file
Owner drops a messy CSV from a card statement onto Fazm. The xlsx skill reads it, infers the columns, deduplicates, categorizes by vendor, totals by month, and writes a clean .xlsx your bookkeeper can ingest. The work happens against the file on disk, not against a SaaS dashboard, and the file does not leave the Mac unless the owner attaches it to an email.
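The cleanup steps described above (deduplicate, categorize by vendor, total by month) can be sketched with the Python standard library. This is an illustration, not the xlsx skill's code: the column names date, vendor, and amount, the CATEGORY_BY_VENDOR map, and the sample rows are all assumptions. Writing a real .xlsx file would need a spreadsheet library and is left out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical vendor-to-category map; a real run would build or infer this.
CATEGORY_BY_VENDOR = {"adobe": "Software", "shell": "Fuel", "staples": "Office"}


def clean_receipts(raw_csv: str):
    """Deduplicate rows, attach a category, and total spend by month."""
    seen = set()
    rows = []
    totals = defaultdict(float)
    for record in csv.DictReader(io.StringIO(raw_csv)):
        date = record["date"].strip()
        vendor = record["vendor"].strip()
        amount = round(float(record["amount"]), 2)
        key = (date, vendor.lower(), amount)
        if key in seen:  # exact repeat of an earlier line
            continue
        seen.add(key)
        category = CATEGORY_BY_VENDOR.get(vendor.lower(), "Uncategorized")
        rows.append(
            {"date": date, "vendor": vendor, "amount": amount, "category": category}
        )
        totals[date[:7]] += amount  # "YYYY-MM" prefix of an ISO date
    return rows, {month: round(total, 2) for month, total in totals.items()}


# Made-up sample statement with one duplicated line.
RAW = """date,vendor,amount
2026-03-02,Adobe,54.99
2026-03-02,Adobe,54.99
2026-03-15,Shell,40.00
2026-04-01,Corner Cafe,12.50
"""

cleaned_rows, monthly_totals = clean_receipts(RAW)
```

Vendors missing from the map land in "Uncategorized" instead of being guessed, so the bookkeeper can see what still needs a decision.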
Invoice draft in the local tax format
The docx skill produces a Word file with the layout your country requires. For a Chilean shop that is the SII digital stamp pattern; for a Spanish autónomo it is the IVA breakdown. The deep-research skill figures out the local rules on first use, and the pdf skill flattens the result.
Vendor mail triage in the real Mail app
The macos-use binary reads the accessibility tree of Mail.app, picks the AXRow with the highest priority signal, and lets the language model draft a reply that the owner approves with one click. No browser, no IMAP relay, no copy-paste between Gmail web and a chat window.
Google Workspace setup that does not require an admin
The google-workspace-setup skill walks the owner through a one-time OAuth flow against their own Google Cloud project, saves credentials locally, and stops asking. The credentials never travel to a vendor server.
Weekly social posts from a one-line outline
The social-autoposter skill drafts the LinkedIn carousel, the Reddit thread reply, and the X post, then opens each platform inside Fazm's persistent browser profile to publish on the owner's signal. Each platform login lives on the Mac.
Ad hoc spreadsheet questions in plain English
Open Numbers or Excel, then ask Fazm to compute the unit margin by SKU or to flag the rows where invoiced amount is more than ten percent above quoted. The xlsx skill writes the formula, runs it, and labels the results so the owner can audit them.
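The "more than ten percent above quoted" check is simple enough to write out. The sketch below is one way to express it in Python, with each flagged row labeled so it can be audited. The field names sku, quoted, and invoiced and the sample rows are assumptions for illustration.

```python
def flag_overbilled(rows, threshold=0.10):
    """Return rows whose invoiced amount exceeds quoted by more than threshold."""
    flagged = []
    for row in rows:
        quoted, invoiced = row["quoted"], row["invoiced"]
        if quoted > 0 and invoiced > quoted * (1 + threshold):
            overage = (invoiced - quoted) / quoted
            # Keep the original fields and add a label the owner can audit.
            flagged.append({**row, "overage_pct": round(overage * 100, 1)})
    return flagged


# Made-up sample rows.
SHEET = [
    {"sku": "A-100", "quoted": 200.0, "invoiced": 215.0},  # 7.5% over, not flagged
    {"sku": "B-220", "quoted": 100.0, "invoiced": 125.0},  # 25% over, flagged
    {"sku": "C-310", "quoted": 80.0, "invoiced": 80.0},    # on quote
]

flagged_rows = flag_overbilled(SHEET)
```

Attaching the percentage to each flagged row lets the owner check the arithmetic without trusting the filter.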
Cloud orchestrator vs Mac-resident agent
Both approaches are useful. They are not substitutes; they cover different halves of small business admin. This is where Fazm fits and where it does not.
| Feature | Typical cloud agent platform | Fazm, Mac-resident agent |
|---|---|---|
| Where it runs | In a browser tab, on a SaaS host | On the Mac, as a signed .app |
| What it can operate | Only apps with a vendor-built connector | Any app open on the Mac, via accessibility tree |
| Files on disk | Uploaded to the vendor's cloud first | Read in place; do not leave the Mac unless asked |
| Setup | OAuth each tool, configure each integration | Drag .app, grant Accessibility, sign in once |
| Cost shape | Per-seat or per-run, usually with API markup | Consumer subscription |
| Works without internet | Whole platform is offline if the cloud is down | Skill code runs locally; only the model call needs network |
| Works with QuickBooks Desktop, Numbers, Notes | No, no connector exists | Yes, by accessibility tree |
| Adds new capabilities | Wait for the vendor to ship a new connector | find-skills skill pulls in extras at runtime |
The binary that does the clicking, by file and line
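The reason any of this works on a Mac, with no SaaS connector, is one Swift binary that the agent calls as a local MCP server. It is wired into the codebase in acp-bridge/src/index.ts, where the path is resolved at line 63 and the server is registered as a built-in only if the file exists on disk.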
The binary is built as a universal arm64 plus x86_64 from github.com/mediar-ai/mcp-server-macos-use, signed with the same Developer ID that signs the .app, and dropped into Contents/MacOS at build time. Open any signed Fazm.app on a Mac, right-click, Show Package Contents, and the file is there.
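The pattern is: build the path under Contents/MacOS, then register the server only if the file is present. The sketch below restates that in Python. The real bridge is TypeScript and its code is not reproduced on this page; the function names and the shape of the server entry (command, args) are assumptions.

```python
import os
import tempfile


def resolve_macos_use(contents_dir: str) -> str:
    """Build the expected path of the helper binary inside the bundle."""
    return os.path.join(contents_dir, "MacOS", "mcp-server-macos-use")


def builtin_servers(contents_dir: str) -> dict:
    """Register the helper only when the binary is actually present."""
    servers = {}
    binary = resolve_macos_use(contents_dir)
    if os.path.exists(binary):
        # Hypothetical entry shape; the real registration format is not shown here.
        servers["macos-use"] = {"command": binary, "args": []}
    return servers


# Demo against a temporary directory standing in for Fazm.app/Contents.
with tempfile.TemporaryDirectory() as contents:
    missing = builtin_servers(contents)           # binary absent
    os.makedirs(os.path.join(contents, "MacOS"))
    open(resolve_macos_use(contents), "w").close()
    present = builtin_servers(contents)           # binary present
    registered_names = sorted(present)
```

Guarding on existence means a build that lacks the binary simply ends up without the macos-use tool, where an unguarded registration would fail at launch.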
Why this is the part competitors cannot replicate from a SaaS plane
A SaaS agent platform cannot ship a signed Mac binary that runs against your local accessibility tree. It would need a desktop app, code-signing infrastructure, and an architecture that pushes the work to the user's Mac instead of pulling data into a vendor cloud. The economic model behind every browser-resident agent platform is centralization. Fazm is the inverse: the expensive thing, the ad hoc click on a Mac app, runs where it costs the least to run, on the owner's hardware.
What a single chained admin run looks like
One sentence in the floating bar, three Mac apps touched, twenty unpaid invoices reconciled. Real timing measured on a one-person consulting practice on April 23, 2026 against Sonnet 4.6.
1. Owner opens Numbers and types a sentence into the floating bar
Hit Cmd+Shift+Space anywhere on the Mac. The bar floats over Numbers. The owner types: pull the unpaid invoices into a new sheet and email a reminder to each client.
2. Fazm reads the Numbers window via the accessibility tree
The macos-use binary calls AXUIElementCopyAttributeValue against the frontmost Numbers window. The result is roughly two kilobytes of structured text naming each row, column, and AXValue. The language model sees the data without a screenshot.
3. Skills get pulled in by name
The model picks the xlsx skill to write a filter formula, the docx skill to draft the reminder body, and the macos-use binary to open Mail and create a draft for each client. Each step is a tool call into a local skill file, not a network round trip to a SaaS connector.
4. Owner reviews, edits, sends
Fazm queues the drafts in Mail.app. The owner clicks Send on each one or asks Fazm to send all of them at once. The macos-use binary reads the send confirmations back from the accessibility tree so the next turn can update the Numbers sheet with the sent timestamps.
5. The Numbers sheet updates and the owner moves on
The xlsx skill writes the timestamps back into the spreadsheet. Fazm logs the run for the owner's records. Total elapsed time on a stack of twenty unpaid invoices, measured against Sonnet 4.6, is roughly two minutes.
The accessibility tree, raw
A redacted slice of what the language model sees when the macos-use binary returns the Numbers window holding the unpaid-invoice sheet. Roughly two kilobytes of structured text. Compare that to an eighty-kilobyte base64 PNG of the same window.
The model picks rows by stable AXRow identifier, not by pixel coordinate. If the owner reorders columns or resizes the window, the tree still names the same rows.
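To make the identifier-versus-coordinate point concrete, here is a small Python sketch. The node layout (role, id, value, frame) and every value in it are invented for illustration and are not actual output from the macos-use binary.

```python
def find_row(tree, identifier):
    """Locate a row by its identifier, ignoring where it sits on screen."""
    for node in tree:
        if node["role"] == "AXRow" and node["id"] == identifier:
            return node
    return None


# Hypothetical flattened tree for one window; frame is (x, y, width, height).
before = [
    {"role": "AXRow", "id": "row-7", "value": "INV-1042 unpaid", "frame": (40, 310, 900, 22)},
    {"role": "AXRow", "id": "row-8", "value": "INV-1043 paid", "frame": (40, 332, 900, 22)},
]

# The same window after a resize: frames and ordering differ, ids do not.
after = [
    {"role": "AXRow", "id": "row-8", "value": "INV-1043 paid", "frame": (20, 180, 640, 22)},
    {"role": "AXRow", "id": "row-7", "value": "INV-1042 unpaid", "frame": (20, 202, 640, 22)},
]

target_before = find_row(before, "row-7")
target_after = find_row(after, "row-7")
```

A click aimed at the old pixel position would now hit a different row, while the identifier lookup still returns the same invoice.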
Ten admin tasks an owner can hand off on day one
Each line maps to a skill or two and the macos-use binary. Pick three to delegate first, see what it feels like, then push more onto the agent as trust accrues.
Day-one delegation list
- Clean a CSV of card-statement receipts into a categorized .xlsx
- Draft a Word invoice in the local tax format and export to PDF
- Open Mail.app, triage vendor messages, draft replies
- Pull the unpaid rows from a Numbers sheet and queue reminder emails
- Set up Google Workspace OAuth without an IT person
- Draft a slide deck from a Notes outline using the pptx skill
- Post the same announcement to LinkedIn, X, and Reddit
- Cut a thirty-second clip from a longer video using video-edit
- Run a quick competitor digest with deep-research
- Update a docx contract template with the latest pricing tier
The economics, on one screen
Numbers measured on a five-click Mail reply chain against Sonnet 4.6 on April 22, 2026. The accessibility-tree path is what makes a Mac agent fit a consumer subscription instead of a developer's API budget.
A heavy admin session of forty to fifty turns lands on roughly two kilobytes of structured text per UI read instead of roughly eighty kilobytes of base64 PNG per UI read. That difference is why a flat consumer price covers a small business workload at all.
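The byte figures quoted elsewhere on this page (about two kilobytes of tree text versus about eighty kilobytes of base64 PNG per window, and about six versus about two hundred forty kilobytes across a three-turn Mail walkthrough) can be checked with a few lines of arithmetic. The sketch assumes one UI read per turn and uses a 45-turn session as a hypothetical midpoint. It counts bytes only, because token counts depend on the model and are not given here.

```python
KB = 1024

png_per_read = 80 * KB   # base64 screenshot of one window (figure from this page)
tree_per_read = 2 * KB   # accessibility-tree text for the same window
reads = 3                # turns in the Mail reply walkthrough, one read each (assumed)

png_total_kb = png_per_read * reads // KB
tree_total_kb = tree_per_read * reads // KB
ratio = png_per_read / tree_per_read

# Hypothetical heavy session: 45 turns, midpoint of forty to fifty.
session_turns = 45
session_saved_kb = (png_per_read - tree_per_read) * session_turns // KB

print(png_total_kb, tree_total_kb, ratio, session_saved_kb)
```

Under these assumptions the tree path sends about one fortieth of the bytes per read, which is the gap the flat-price argument rests on.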
Walk through your first three admin slots, on your own Mac
Bring an actual messy spreadsheet, an actual unpaid-invoice sheet, an actual stack of vendor emails. Fifteen minutes is enough to see whether the Mac-resident shape fits your admin work.
Frequently asked questions
What is actually different about Fazm compared to other AI agents pitched at small business admin?
Almost every option marketed as an AI agent for small business admin is a SaaS web platform. You log into a dashboard, you connect tools through OAuth, and the agent only operates the systems for which the vendor built a connector. That means it works for Gmail and HubSpot and Slack, and it doesn't work for the QuickBooks Mac client, the Excel file your bookkeeper sends weekly, the Numbers spreadsheet on your desktop, the Mail account you use for vendor receipts, or the local PDFs an accountant emailed you for review. Fazm runs on the Mac itself. It opens the apps already on your machine, reads what is on screen through the macOS accessibility API, and clicks the buttons a human would click. Seventeen specialized skills ship inside the signed .app, including ones for spreadsheets, Word documents, PDFs, slide decks, Google Workspace setup, and email. The list of files lives at /Users/matthewdi/fazm/Desktop/Sources/BundledSkills/ in the project repo and is bundled into the app at build time.
What does a small business owner actually use it for on day one?
The most common day one tasks reported by Fazm users from auto body shops, photography studios, ecom brands, and one-person consulting practices are: cleaning a messy spreadsheet of receipts into a categorized expense file (the bundled xlsx skill), drafting a Word invoice in the format the local tax authority requires (docx skill plus pdf skill), preparing a slide deck from a Notes outline (pptx skill), opening Mail and replying to a stack of vendor questions one by one (macos-use binary plus the natural-language tool calls), and setting up Google Workspace OAuth without copy-pasting credentials between three browser tabs (google-workspace-setup skill). All of those happen against apps already on the owner's Mac, not against a SaaS dashboard.
How does it operate the apps without taking screenshots all the time?
The binary that does the work is called mcp-server-macos-use. It is a Swift program built from github.com/mediar-ai/mcp-server-macos-use as a universal arm64 plus x86_64 binary, signed with the same Developer ID as the app, and dropped into Contents/MacOS of the .app bundle. When the language model needs to know what is on screen, the binary reads the macOS accessibility tree using the AXUIElementCopyAttributeValue C API. The result is structured text, roughly two kilobytes for a typical app window. The model picks an element by its stable accessibility identifier and the binary calls AXUIElementPerformAction to click it. There is no PNG in the chain unless the app is genuinely a canvas surface like Figma or a graphics-heavy web page, in which case Fazm falls back to a targeted screenshot through a separate Playwright server.
Does my small business need an IT person to set this up?
No. Setup is four steps: download a signed .dmg from fazm.ai, drag the app to Applications, sign in with Google or Apple, and grant Accessibility and Screen Recording permissions when prompted. The first launch checks the permission status with a real AXUIElementCopyAttributeValue call against the frontmost app, not a stale system flag, so the app can tell you whether the permission actually works rather than whether you clicked the right toggle in System Settings. The relevant code lives at /Users/matthewdi/fazm/Desktop/Sources/AppState.swift in the function testAccessibilityPermission at lines 433 to 463, with a Finder cross-check at 468 to 485 and an event-tap probe fallback at 490 to 504. The probe matters because macOS 26 Tahoe occasionally returns stale permission state from the standard API, and the event-tap probe checks the live TCC database instead.
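The layered check described above (a live read, then a Finder cross-check, then an event-tap probe) has the general shape of a fallback chain. The Python sketch below shows that shape only. How the real Swift function decides when to fall through is not described on this page, so the rule that None means inconclusive, the probe names, and the stand-in probe results are all assumptions.

```python
from typing import Callable, Optional

# A probe answers True or False, or None when it cannot tell (assumed convention).
Probe = Callable[[], Optional[bool]]


def permission_works(probes: list[tuple[str, Probe]]) -> tuple[bool, str]:
    """Return the first conclusive answer and the name of the probe that gave it."""
    for name, probe in probes:
        result = probe()
        if result is not None:
            return result, name
    return False, "no-probe-conclusive"


# Stand-in probes; real ones would call macOS APIs.
def frontmost_app_read() -> Optional[bool]:
    return None  # pretend the live read was ambiguous


def finder_cross_check() -> Optional[bool]:
    return None  # pretend the cross-check was ambiguous too


def event_tap_probe() -> Optional[bool]:
    return True  # pretend the last-resort probe succeeded


granted, decided_by = permission_works([
    ("frontmost-app-read", frontmost_app_read),
    ("finder-cross-check", finder_cross_check),
    ("event-tap-probe", event_tap_probe),
])
```

Returning the name of the deciding probe alongside the answer makes it easier to tell a stale-flag problem from a genuinely missing permission.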
What admin tasks are still a bad fit for an agent like this?
Three categories. First, anything that requires touching a payment rail. Fazm will draft an invoice, email it, file it, and reconcile it against a deposit notification, but it will not initiate a wire transfer, charge a card, or move funds between accounts. Second, anything where the underlying app has a generic accessibility tree, like a heavily customized Electron tool with everything labeled AXGenericElement. The fallback there is a targeted screenshot, which works but loses the speed and cost advantage. Third, anything that is genuinely a regulatory filing where the tax authority requires a human to attest. Fazm prepares the document, opens the portal, and walks you through the submission, but the click on the final attestation should be the owner's. Most other admin work, including spreadsheet cleanup, document drafting, calendar coordination, vendor email triage, and weekly reporting, is exactly what the bundled skills are shaped for.
How does the agent handle confidential data like client contracts and financials?
Two facts make this concrete. The skills run as local code on the Mac, not in a vendor cloud. When Fazm processes a spreadsheet, the file does not leave the machine; the model reads only the parts the skill chose to forward. When Fazm operates an app via the macos-use binary, the data crossing the wire to the language model is the structured accessibility tree of that app's window, which is text, never the file contents on disk unless the workflow specifically asks for them. The privacy language was tightened in version 2.3.2 on April 16, 2026, with a changelog entry that reads: "tightened privacy language in onboarding and system prompts to accurately say local-first instead of nothing leaves your device." The honest framing is local-first: most of the work is on-device, and the parts that travel to the model are the parts the skill explicitly chose.
What are the seventeen bundled skills and where are they on disk?
The complete list at /Users/matthewdi/fazm/Desktop/Sources/BundledSkills/, as of v2.4.1, is: ai-browser-profile.skill.md, canvas-design.skill.md, deep-research.skill.md, doc-coauthoring.skill.md, docx.skill.md, find-skills.skill.md, frontend-design.skill.md, google-workspace-setup.skill.md, pdf.skill.md, pptx.skill.md, social-autoposter-setup.skill.md, social-autoposter.skill.md, telegram.skill.md, travel-planner.skill.md, video-edit.skill.md, web-scraping.skill.md, xlsx.skill.md. They get baked into the app bundle by the build pipeline and surface inside the agent as named tool capabilities. The names are not random; each one corresponds to a real category of admin work the average small business owner does in a given week.
How is this different from connecting Zapier or Make to a hosted AI agent?
Zapier and Make are integration buses. They wire up SaaS systems that already expose APIs. They are extraordinary for moving data between Gmail, Slack, Stripe, HubSpot, and the long tail of cloud apps. They cannot, by design, do anything to an app that runs only on the Mac. Fazm is the inverse: it is built for the long tail of apps that exist on a Mac and have no public API, including Numbers, Notes, Messages, the QuickBooks desktop client, ScreenFlow, Photos, the local Mail accounts that hold vendor receipts, and many more. The two coexist. Fazm users frequently chain a Zapier-style cloud workflow with a Fazm Mac workflow on the same task, because the cloud step covers the SaaS hop and the Fazm step covers the desktop hop.
What does the cost model feel like for a small business?
Fazm sells as a consumer subscription. The architectural choice that makes that price possible is the accessibility-tree path: a Mail reply walkthrough used to ship roughly two hundred forty kilobytes of base64 PNG to the language model across three turns, and on the tree path it is about six kilobytes of structured text for the same five clicks. That changed inside the codebase on March 27, 2026 with version 1.5.0, whose changelog includes the line: "Screen capture and macOS automation now uses native accessibility APIs instead of browser screenshot." A small business owner running forty to fifty heavy turns in a session lands on a price that fits a consumer subscription rather than a developer's API budget.
Where do I verify the file paths and binary location?
Three places to check. The seventeen skill files live at /Users/matthewdi/fazm/Desktop/Sources/BundledSkills/ on the developer machine and inside the .app at runtime. The mcp-server-macos-use binary path is resolved at /Users/matthewdi/fazm/acp-bridge/src/index.ts line 63 with the expression join(contentsDir, "MacOS", "mcp-server-macos-use"). It is registered as a built-in MCP server between lines 1056 and 1064 only if existsSync returns true. The five-name BUILTIN_MCP_NAMES set is at line 1266: fazm_tools, playwright, macos-use, whatsapp, google-workspace. The accessibility-permission probe lives in /Users/matthewdi/fazm/Desktop/Sources/AppState.swift between lines 431 and 504 and includes a Finder cross-check plus a CGEvent.tapCreate fallback for macOS 26 Tahoe.