Notes from a shipping open source macOS agent

AI agent for desktop tasks, and the recurring-task primitive most guides skip

Most pages on this topic describe one-shot demos: ask the agent to do something, watch it do the thing, take a screenshot, close the window. The part that decides whether the agent ends up on your dock or in your trash a week later is the part almost none of those pages mention: the primitive for saying do this again, every weekday at 9, forever, without making the user open a settings panel. This is how that primitive is built in one shipping open source macOS agent, file names and line numbers included.

Matthew Diakonov, Written with AI

Published May 22, 2026 · 11 min read

Direct answer (verified 2026-05-22)

An AI agent for desktop tasks is a program that turns natural-language instructions into actions on your computer through a tool palette (browser, accessibility tree, local SQL, file scan, screenshot, native app drivers) plus an LLM that picks which tool to call next. The interesting variant is one that also lets the agent itself save a task as a routine the daemon re-fires later with the same tool palette and a resumed session. Fazm is one such open source macOS app; the routines primitive is defined in acp-bridge/src/fazm-tools-stdio.ts at lines 512 to 574, the schedule grammar in acp-bridge/src/schedule.mjs at lines 14 to 86.

Source: github.com/m13v/fazm. MIT-licensed.

The category is wider than the demos suggest

When most people say "AI agent for desktop tasks" they mean the one-shot demo on a tech announcement page: a model reads the screen, clicks something, fills a form, files an expense, and the recording ends. That is the first useful capability, and most products stop there. The category looks crowded but boring because ten different products all share the same shape: a chat box that can drive your browser once.

The agents that actually live on a dock past week three add one quiet primitive: they remember the task. Not just the conversation history. The task itself, as a thing the agent can call on its own to ask the daemon to run it again later. That primitive turns the agent from a clipboard for one-off LLM calls into something closer to a colleague who remembers what you asked for last Tuesday.

The rest of this page walks through that primitive in one shipping codebase, because the file paths are the part most other writeups gloss. The tool surfaces around it (browser, accessibility, Google Apps, WhatsApp, custom MCP) are also concrete, and listed by name lower down.

What the agent actually has on its tool belt

Before the routines part is interesting, the underlying tool palette has to be real. The seven MCP servers below are bundled with the app (declared in acp-bridge/src/index.ts as BUILTIN_MCP_NAMES); user-added servers are read from ~/.fazm/mcp-servers.json with the same shape Claude Code uses for mcpServers. So the agent that executes a routine has the same tool surface as the agent in the chat window. There is no second-class runtime for scheduled work.

acp-bridge/src/index.ts, BUILTIN_MCP_NAMES

Built-in MCP servers bundled with the app

fazm_tools

Local SQL, screenshots, file scan, permissions, browser profile, routines.

playwright

Drives the user's real Chrome via an extension; keeps logins and Turnstile state.

macos-use

Native macOS accessibility automation. Click and read in any app, not just the browser.

google-workspace

Docs, Sheets, Calendar, Gmail. Bundled Python server under Contents/Resources.

whatsapp

Sends and reads in the WhatsApp Catalyst app via accessibility, not Web.

browser-harness

Alternative managed Chromium with a persistent profile at ~/.fazm/browser-harness.

assrt

Optional QA testing tools (assrt_test / assrt_plan / assrt_diagnose). Gated by env var.

The macos-use server is the part most other desktop-task agents skip. It reaches the apps that do not have a public API and do not run inside a browser: Numbers, Keynote, Reminders, Notes, system settings, the file manager. Working from the accessibility tree (not pixel-matching on a screenshot) is the difference between an agent that does what you wanted and an agent that clicked the cell next to it.

The routine primitive, end to end

A routine is one row in the local cron_jobs SQLite table. It has a name, a prompt (phrased as if the user typed it), a schedule string, optional model id and workspace, and a session mode. The agent itself creates that row by calling the routines_create tool. The user never opens a settings panel. They say what they want, in English, in chat.
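To make the row concrete, here is a minimal sketch of how it could be modeled in TypeScript. Only session_mode, enabled, and next_run_at appear as column names on this page; the model and workspace field names, the millisecond timestamp, the enabled-by-default choice, and the buildRoutineRow helper are assumptions for illustration, not the schema in the repo.

```typescript
// Hypothetical model of one cron_jobs row; the real schema lives in the app's SQLite file.
type SessionMode = "new" | "resume";

interface RoutineRow {
  name: string;
  prompt: string;             // phrased as if the user typed it
  schedule: string;           // "cron:...", "every:..." or "at:..."
  model: string | null;       // optional model id (field name assumed)
  workspace: string | null;   // optional workspace (field name assumed)
  session_mode: SessionMode;
  enabled: boolean;
  next_run_at: number | null; // assumed to be epoch milliseconds
}

interface RoutineInput {
  name: string;
  prompt: string;
  schedule: string;
  model?: string;
  workspace?: string;
  session_mode?: SessionMode;
}

// Fill in defaults: session_mode falls back to "new", optional fields to null.
// Starting enabled is an assumption of this sketch.
function buildRoutineRow(input: RoutineInput, nextRunAt: number | null): RoutineRow {
  return {
    name: input.name,
    prompt: input.prompt,
    schedule: input.schedule,
    model: input.model ?? null,
    workspace: input.workspace ?? null,
    session_mode: input.session_mode ?? "new",
    enabled: true,
    next_run_at: nextRunAt,
  };
}

const disputesRow = buildRoutineRow(
  {
    name: "Stripe disputes",
    prompt: "Send me my open Stripe disputes",
    schedule: "cron:0 9 * * 1-5",
  },
  null,
);
console.log(disputesRow.session_mode);
```

The point of the shape is that everything the user said in chat ends up as a handful of plain columns; nothing about the action itself is stored.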

On fire, launchd is the thing that polls the table and spawns the headless runner, not the macOS app. So the routine fires even if the GUI is not open and the user has not signed in to the foreground session. The runner is acp-bridge/src/cron-runner.mjs, and the responsibilities are spelled out in the top-of-file docstring at lines 1 to 25: read the cron_jobs row, insert a running cron_runs row and a user chat_messages row (so the prompt shows up in conversation history while the agent is still working), spawn the bridge subprocess, send warmup then query, capture the first result line, write the assistant message and the ok status, recompute next_run_at.
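The responsibilities listed in that docstring can be sketched with in-memory stand-ins for the three tables. This is not the code in cron-runner.mjs: the real runner spawns a subprocess and talks to it asynchronously, while here a plain askAgent callback stands in for spawn, warmup, query, and reading the result. The error branch and every type name are assumptions.

```typescript
// In-memory stand-ins for the three tables the runner touches.
interface CronJob { id: string; prompt: string; schedule: string; next_run_at: number | null }
interface CronRun { job_id: string; status: "running" | "ok" | "error"; output: string | null }
interface ChatMessage { taskId: string; sender: "user" | "assistant"; text: string }
interface Tables { jobs: Map<string, CronJob>; runs: CronRun[]; chat: ChatMessage[] }

function runOneJob(
  db: Tables,
  jobId: string,
  askAgent: (prompt: string) => string, // stands in for the bridge subprocess
  nextRun: (schedule: string, nowMs: number) => number | null,
  nowMs: number,
): CronRun {
  const job = db.jobs.get(jobId);
  if (job === undefined) throw new Error(`no routine with id ${jobId}`);
  const taskId = `routine-${job.id}`;

  // 1. Record the run and the prompt before the agent starts working,
  //    so the prompt is visible in chat while the run is in progress.
  const run: CronRun = { job_id: job.id, status: "running", output: null };
  db.runs.push(run);
  db.chat.push({ taskId, sender: "user", text: job.prompt });

  try {
    // 2. Hand the prompt to the agent and store its reply.
    const reply = askAgent(job.prompt);
    db.chat.push({ taskId, sender: "assistant", text: reply });
    run.status = "ok";
    run.output = reply;
  } catch (err) {
    // Assumption: this page only describes the ok path.
    run.status = "error";
    run.output = err instanceof Error ? err.message : String(err);
  }

  // 3. Schedule the next fire from the stored schedule string.
  job.next_run_at = nextRun(job.schedule, nowMs);
  return run;
}
```

Writing the user row before the agent replies is the detail worth copying: a run that hangs still leaves a visible trace in the conversation.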

What the sequence looks like in one screenshot

One user turn that creates a routine, one launchd fire that runs it, one chat turn the user sees the next morning. The actors are the ones in the real source.

[Sequence diagram: From English to a daily run]

The thing worth lingering on is that the same actor labelled Agent shows up twice. Once during the user turn (deciding which tool to call to schedule the routine), and once during the daemon fire (executing the prompt with the full tool palette). It is the same ACP bridge subprocess pattern in both places, just spawned by different parents.

The schedule grammar is small enough to remember

Three flavors, all parsed in acp-bridge/src/schedule.mjs. The agent translates English into one of them at create time, but the grammar is small enough that you can type it directly in a conversation and skip the translation. Standard cron rule: when both day-of-month and day-of-week are restricted, either match qualifies (see lines 79 to 83 of schedule.mjs).

// acp-bridge/src/schedule.mjs

"cron:0 9 * * 1-5"            // 5-field cron, weekdays at 09:00 local
"cron:*/15 * * * *"           // every 15 minutes
"cron:30 8 1 * *"             // 08:30 on the 1st of every month
"every:1800"                  // every 1800 seconds (30 min)
"every:86400"                 // once a day
"at:2026-06-01T17:00:00Z"     // one-shot, ISO 8601 UTC

On validate, computeNextRun is the function that decides whether a schedule will ever fire. If it returns null, validateSchedule (line 43) translates that into a human-readable error: the agent sees 'absolute timestamp is in the past' or 'schedule string failed to parse' as the tool result and corrects the schedule on its next turn.
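A rough sketch of that validation step, assuming nothing about the real code beyond the three prefixes and the two error strings quoted above. The names parseSchedule and validateScheduleSketch are hypothetical, and the cron branch here only checks the field count rather than computing a next fire time.

```typescript
type ParsedSchedule =
  | { kind: "cron"; expr: string }
  | { kind: "every"; seconds: number }
  | { kind: "at"; atMs: number };

// Split "kind:rest" and check that the rest has the right form for its kind.
function parseSchedule(schedule: string): ParsedSchedule | null {
  const colon = schedule.indexOf(":");
  if (colon < 0) return null;
  const kind = schedule.slice(0, colon);
  const rest = schedule.slice(colon + 1).trim();
  if (kind === "cron") {
    // Simplification: only the field count is checked here.
    return rest.split(/\s+/).length === 5 ? { kind: "cron", expr: rest } : null;
  }
  if (kind === "every") {
    if (!/^\d+$/.test(rest)) return null;
    const seconds = Number(rest);
    return seconds > 0 ? { kind: "every", seconds } : null;
  }
  if (kind === "at") {
    const atMs = Date.parse(rest);
    return Number.isNaN(atMs) ? null : { kind: "at", atMs };
  }
  return null;
}

// Returns null when the schedule is usable, otherwise an error string for the agent.
function validateScheduleSketch(schedule: string, nowMs: number): string | null {
  const parsed = parseSchedule(schedule);
  if (parsed === null) return "schedule string failed to parse";
  if (parsed.kind === "at" && parsed.atMs <= nowMs) {
    return "absolute timestamp is in the past";
  }
  return null;
}

const verifiedAt = Date.parse("2026-05-22T00:00:00Z");
console.log(validateScheduleSketch("at:2026-04-30T18:00:00Z", verifiedAt));
console.log(validateScheduleSketch("every:1800", verifiedAt));
```

Returning the error as a plain string matters more than it looks: the string is the tool result, so the model can read it and retry with a corrected schedule.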

Five tools to manage routines, by name

The routines_* tool surface in fazm-tools-stdio.ts

routines_create (line 512) — create one. The tool description holds the schedule grammar; the model reads it on every cold session and learns the format.
routines_list (line 532) — list everything. SELECT from cron_jobs ordered by enabled DESC, COALESCE(next_run_at, 9e15) ASC, name ASC.
routines_update (line 537) — patch one. Pass only the fields you want to change; enabled=false pauses without deleting; passing a new schedule recomputes next_run_at on the same UPDATE.
routines_remove (line 555) — delete it permanently, including the cron_runs history rows.
routines_runs (line 564) — query the execution log. Filter by job_id; each row has status, output text, cost, tokens, duration, and started/finished timestamps. The agent uses this to answer 'why did my morning email routine fail'.
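Two of these behaviors are easy to picture in code: the ordering routines_list uses, and the patch semantics of routines_update. The sketch below mirrors the ORDER BY clause quoted above with an in-memory sort, and treats missing patch fields as "leave unchanged". The type and function names are hypothetical, and the nextRun callback stands in for whatever the real tool calls to recompute next_run_at.

```typescript
interface ListedRoutine {
  name: string;
  schedule: string;
  enabled: boolean;
  next_run_at: number | null;
}

// Same ordering as: enabled DESC, COALESCE(next_run_at, 9e15) ASC, name ASC.
function sortLikeRoutinesList(rows: ListedRoutine[]): ListedRoutine[] {
  const FAR_FUTURE = 9e15; // routines with no next run sort last within their group
  return [...rows].sort((a, b) => {
    const byEnabled = Number(b.enabled) - Number(a.enabled);
    if (byEnabled !== 0) return byEnabled;
    const byNext = (a.next_run_at ?? FAR_FUTURE) - (b.next_run_at ?? FAR_FUTURE);
    if (byNext !== 0) return byNext;
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
}

interface RoutinePatch { name?: string; schedule?: string; enabled?: boolean }

// Only the fields present in the patch change; a new schedule also refreshes next_run_at.
function applyRoutinePatch(
  row: ListedRoutine,
  patch: RoutinePatch,
  nextRun: (schedule: string) => number | null,
): ListedRoutine {
  const updated: ListedRoutine = {
    ...row,
    name: patch.name ?? row.name,
    schedule: patch.schedule ?? row.schedule,
    enabled: patch.enabled ?? row.enabled,
  };
  if (patch.schedule !== undefined) {
    updated.next_run_at = nextRun(patch.schedule);
  }
  return updated;
}
```

Pausing is then just applyRoutinePatch(row, { enabled: false }, nextRun): the row stays in the table and drops to the bottom of the list.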

The reason routines_runs deserves its own slot is that the agent itself queries its own execution history. A user does not have to dig through a logs folder; they ask the agent and the agent reads the table.
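As an illustration of what "the agent reads the table" amounts to, here is a small sketch that picks the most recent failed run for one routine from a list of cron_runs rows. The field names and the millisecond timestamps are assumptions; this page only lists which kinds of values a row carries, and only names 'running' and 'ok' as statuses.

```typescript
interface RunLogRow {
  job_id: string;
  status: string;     // "running" and "ok" come from this page; anything else is treated as a failure here
  output: string;
  started_at: number; // assumed epoch milliseconds
}

// Most recent run for this job that finished with something other than "ok".
function latestFailure(runs: RunLogRow[], jobId: string): RunLogRow | null {
  const failures = runs
    .filter((r) => r.job_id === jobId && r.status !== "ok" && r.status !== "running")
    .sort((a, b) => b.started_at - a.started_at);
  return failures[0] ?? null;
}
```

The output text of that row is what the agent would quote back when asked why the morning routine failed.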

New vs resume, and why the choice matters

session_mode is a single string column on cron_jobs. Two legal values: 'new' or 'resume'. The default is 'new' (set in routinesCreate at line 630). The difference between them is one of the more consequential choices in the whole primitive.

'new' means each run gets a fresh ACP session. The model wakes up with no memory of the previous run, sees only the prompt. The right shape for any task that is genuinely stateless: a daily summary of an external system, a polling check, a metrics roll-up. The benefit is that the model is not biased by yesterday's context, and the token cost per run is bounded by the prompt size.

'resume' carries the ACP session id forward, so the model continues the same conversation it was in on the previous run. The right shape for longer-running iterative work: a writing project the agent is continuing across days, a multi-step deal the agent is moving through stages. The tradeoff is that the conversation grows, which eventually runs into the session-rollover problem covered in our field notes on persistent session state, linked below.
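The choice between the two modes can be sketched as one small decision at fire time. The last_session_id field and the createSession callback are hypothetical; this page says only that 'resume' carries the ACP session id forward, not where that id is stored.

```typescript
type RoutineSessionMode = "new" | "resume";

interface JobSessionState {
  session_mode: RoutineSessionMode;
  last_session_id: string | null; // hypothetical: id saved by the previous run
}

interface SessionChoice { sessionId: string; resumed: boolean }

// "resume" reuses the previous session when one exists; everything else starts fresh.
function sessionForRun(job: JobSessionState, createSession: () => string): SessionChoice {
  if (job.session_mode === "resume" && job.last_session_id !== null) {
    return { sessionId: job.last_session_id, resumed: true };
  }
  return { sessionId: createSession(), resumed: false };
}
```

Note the first fire of a 'resume' routine: there is nothing to resume yet, so in this sketch it behaves like 'new' once and only diverges from the second run onward.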

The result of a run lands in chat, not in a logs folder

When a routine fires, the runner writes two rows into chat_messages stamped with taskId="routine-<id>": one for the prompt (sender 'user'), one for the agent's reply (sender 'assistant'). So the run appears in the main chat history as a fresh turn, not in a separate panel the user has to remember to check. The user opens the app in the morning and the digest is right there, in the same place they ask the agent questions during the day.

The execution metadata (status, output text, cost, tokens, duration) also lands in cron_runs as a parallel row, queried by the routines_runs tool. The Routines tab in the Swift app, defined in Desktop/Sources/MainWindow/Pages/RoutinesSection.swift (header comment at lines 4 to 12), is a read view onto those two tables. It refreshes every 8 seconds, and the explicit comment in source notes that users create routines by talking to the agent, so the view is mostly read-only on purpose.

The full create-to-result path, in steps

From English prompt to tomorrow morning's digest

1. User asks in English (no settings panel)
2. Agent picks schedule (cron / every / at)
3. INSERT cron_jobs (validate first)
4. launchd polls (while app is closed)
5. cron-runner.mjs fires (spawns ACP bridge)
6. Result written to chat (taskId=routine-<id>)

The whole path is six concrete pieces, none of them magic. The implementation that maps to each step is in a file you can open. The part that makes it feel different from a wrapper around cron is the first two steps: the user does not pick the schedule, the agent does, and the agent does it inside the same chat where the user asked. The user never thinks in cron syntax even though the row stored is a cron expression.

“Fazm wraps Claude Code (and Codex) via ACP in a native macOS UI. Persistent sessions survive a restart, one-click chat forking opens a new window with full prior context, no auto-compacting. Reaches beyond the terminal via accessibility APIs into the browser, native Mac apps, and Google Workspace. Free to start, fully open source, runs locally.”

github.com/m13v/fazm

Where this fits next to cron, Shortcuts, and Zapier

Cron and a shell script fix the action at write time. The shell script you wrote is the action that runs forever. If the underlying system changes (a column renamed, a button moved, an email layout tweaked), the script breaks. A routine fixes only the prompt and the schedule. The action is whatever the model decides to do at fire time, given the same tool palette it had during the original conversation. That flexibility costs you: you trust the model on every run, which is why routines write each result back into the visible chat history rather than executing silently.

Apple Shortcuts is a node graph you build at design time. Great for deterministic flows (plug in headphones, lower volume to 50%) and for things that need to be fast and offline. Bad for anything that requires reading natural-language content and deciding what matters inside it. The model is what makes the difference; Shortcuts does not have one.

Zapier is the same node-graph model at internet scale, with first-party integrations to a thousand SaaS apps. If the integration you need is in their catalog and the logic you need is a chain of fixed steps, Zapier wins on reliability and observability. Where a desktop agent routine wins is when the trigger or the result lives on your computer (not in a SaaS API), when the step you care about involves a model deciding what to do based on the content it sees, or when the app you want to drive does not have a public API at all (Numbers, Keynote, Notes, system settings, that 2009-era internal tool your finance team still uses).

Tasks that fit this shape

The pattern is the same in every case: the work is repetitive enough that you do not want to ask for it every day, the input changes enough that a static script would break, the output is short enough that you want it as a chat message.

· Every weekday at 9, look at my inbox, group the unread threads by counterparty, and write me a summary I can read in two minutes.
· Every hour, check whether the build went red on main, and if so, post the failing test and the last commit into chat.
· Every Monday morning, open the metrics dashboard in Chrome, screenshot the past week, and write a paragraph on what changed.
· Every 30 minutes during the workday, look at the Notes file my partner shares with me, and tell me if anything new was added.
· Every Friday at 5, walk my Google Calendar for next week, block focus time around the meetings that survived the cull, and tell me which ones I could probably skip.

The list is not exhaustive on purpose. The interesting thing about the routine primitive is that the user is the one inventing the tasks, in their own words, and the agent is the one figuring out which tools to call to do each one. A list this short under-sells the shape.

What it does not do

A routine is not a deterministic graph. The model can decide differently between runs. If the goal is exact-same-actions every time, write a shell script or a Shortcut. The model is the flexibility; remove it and you have removed the reason the primitive exists.

A routine is not a SaaS workflow tool with a multi-tenant audit log and approval gates. The audit trail is the chat history and the cron_runs table, both local to the user's machine. Good for individuals and small teams; not the right shape for finance ops that need a controller to sign off on every run.

A routine cannot run on a different machine than the one that created it. The whole point is that the agent has access to your desktop, your browser session, your local files, and your accessibility tree. Move the routine to a server and you have given up the things that made it useful.

Frequently asked questions

Short version: what is an AI agent for desktop tasks?

A program that turns natural-language instructions into actions on your computer through a tool palette (browser control, accessibility tree reads, SQL on a local database, file scans, screenshots, app drivers) plus an LLM that decides which tool to call next. The smallest interesting variant is a chat box that can take one shot at one task. The version that earns a place on your dock is one that also lets you say 'do this again every weekday at 9' and keeps the same tool palette and session memory across runs. The literal answer Fazm uses for that is its routines primitive, defined in acp-bridge/src/fazm-tools-stdio.ts at lines 512 to 574 (five routines_* tools) and acp-bridge/src/schedule.mjs at lines 14 to 86 (the schedule grammar).

What kinds of desktop tasks does it actually do on a Mac?

On Fazm specifically: read or click in any app via macOS accessibility APIs (the macos-use MCP server), drive the user's real Chrome with playwright via a browser extension (preserves logins and Cloudflare/Turnstile state), edit Google Docs / Sheets / Calendar / Gmail, send WhatsApp messages through the Catalyst app, run SELECT/INSERT/UPDATE/DELETE on the local fazm.db, capture a screenshot of the screen or active window, scan ~/Downloads, ~/Documents, ~/Desktop, ~/Developer, ~/Projects, and /Applications, and ask the user a question with quick-reply buttons. All of those are tool names, not marketing labels. They live in acp-bridge/src/fazm-tools-stdio.ts (ALL_TOOLS array) and the built-in MCP server list is in acp-bridge/src/index.ts (the BUILTIN_MCP_NAMES set).

What is the 'routine' primitive and why does it matter?

A routine is a row in the local cron_jobs SQLite table: a name, a prompt phrased as if the user typed it, a schedule string, optional model id and workspace, and a session_mode of 'new' or 'resume'. The agent itself creates that row by calling its routines_create tool (acp-bridge/src/fazm-tools-stdio.ts, line 512), which means the user never opens a settings panel. They say 'every weekday at 9 send me my open Stripe disputes', the agent picks the schedule string ('cron:0 9 * * 1-5'), validates it with computeNextRun (schedule.mjs:14), and INSERTs the row. From then on, launchd polls the table, and on fire it spawns cron-runner.mjs which spawns the same ACP bridge subprocess, sends the prompt as a fresh turn, and writes the result back into chat_messages so it shows up in the user's conversation history.

What does the schedule string actually look like?

Three flavors, all parsed in acp-bridge/src/schedule.mjs. 'cron:0 9 * * 1-5' is a standard 5-field cron expression (minute hour day-of-month month day-of-week) parsed by nextCronFire at line 55, with the standard rule that when both day-of-month and day-of-week are restricted, either match qualifies. 'every:1800' is an interval in seconds. 'at:2026-04-30T18:00:00Z' is a one-shot absolute ISO 8601 timestamp; if it is in the past, validateSchedule (line 43) returns 'absolute timestamp is in the past'. The agent does the translation from English to the schema, but the schema is small enough that you can type it directly.
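The day-of-month versus day-of-week rule is the one part of cron that surprises people, so here is a minimal sketch of it. This is not nextCronFire from the repo: it only answers whether a given local time matches a 5-field expression, it handles just *, single numbers, ranges, lists, and steps, and it does not treat 7 as Sunday. The function names are hypothetical.

```typescript
// Does one cron field (e.g. "1-5", "*/15", "1,15") accept this value?
function fieldMatches(field: string, value: number, min: number, max: number): boolean {
  return field.split(",").some((part) => {
    const slash = part.indexOf("/");
    const range = slash < 0 ? part : part.slice(0, slash);
    const step = slash < 0 ? 1 : Number(part.slice(slash + 1));
    if (range.trim() === "") return false;
    let lo = min;
    let hi = max;
    if (range !== "*") {
      const dash = range.indexOf("-");
      lo = Number(dash < 0 ? range : range.slice(0, dash));
      hi = dash < 0 ? lo : Number(range.slice(dash + 1));
    }
    if (!Number.isInteger(step) || step < 1 || !Number.isInteger(lo) || !Number.isInteger(hi)) {
      return false;
    }
    return value >= lo && value <= hi && (value - lo) % step === 0;
  });
}

// Does this local time match "minute hour day-of-month month day-of-week"?
function cronMatches(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  const at = (i: number): string => fields[i] ?? "*";
  const dom = at(2);
  const dow = at(4);
  const domOk = fieldMatches(dom, date.getDate(), 1, 31);
  const dowOk = fieldMatches(dow, date.getDay(), 0, 6);
  // Standard rule: when both day fields are restricted, either match qualifies.
  const dayOk = dom !== "*" && dow !== "*" ? domOk || dowOk : domOk && dowOk;
  return (
    fieldMatches(at(0), date.getMinutes(), 0, 59) &&
    fieldMatches(at(1), date.getHours(), 0, 23) &&
    fieldMatches(at(3), date.getMonth() + 1, 1, 12) &&
    dayOk
  );
}
```

With this rule, "0 9 1 * 1" fires at 09:00 on the 1st of the month and also on every Monday, not only on Mondays that fall on the 1st.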

What is the difference between session_mode 'new' and 'resume'?

'new' creates a fresh ACP session for each run. The model wakes up with no memory of the prior run and sees only the prompt. Use this when the task is genuinely stateless (a daily summary of an external system, a polling check). 'resume' carries the ACP session id forward so the model continues the same conversation it was in on the previous run; you can iterate on a longer-running piece of work by re-running it. The flag is the session_mode column on cron_jobs and defaults to 'new' in routinesCreate (acp-bridge/src/fazm-tools-stdio.ts, line 630).

Does the agent need permission to do this stuff?

Yes, and the permission gates are tools the agent calls explicitly, not magic. check_permission_status (fazm-tools-stdio.ts:325) returns JSON for the five macOS permissions: screen_recording, microphone, notifications, accessibility, automation. request_permission (line 334) triggers the actual macOS system dialog. The reason these are tools the agent has is that the agent runs them during onboarding (also as tools, with ask_followup buttons), so the user grants permissions inside the same chat where they later ask for the work. If a permission is missing when a routine fires, the corresponding tool returns an error string and the agent writes that into the next chat turn.
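For a sense of how a caller might use that status, here is a sketch that lists which of the five permissions are still missing. The JSON shape is an assumption (a flat object mapping each permission name to true when granted); the real tool's output format is not shown on this page, and missingPermissions is a hypothetical helper.

```typescript
const PERMISSION_NAMES = [
  "screen_recording",
  "microphone",
  "notifications",
  "accessibility",
  "automation",
] as const;

type PermissionName = (typeof PERMISSION_NAMES)[number];

// Assumed input: JSON like {"screen_recording": true, "microphone": false, ...}.
// Anything not explicitly true is reported as missing.
function missingPermissions(statusJson: string): PermissionName[] {
  const parsed: unknown = JSON.parse(statusJson);
  const status =
    typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)
      : {};
  return PERMISSION_NAMES.filter((name) => status[name] !== true);
}
```

An agent loop would call request_permission once per entry in that list, which is what keeps the grant flow inside the chat.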

Where does the result of a recurring run actually go?

Two places. First, into the chat_messages table as a row stamped with taskId='routine-<id>', which means it shows up in the user's main chat history as a fresh turn, with the prompt as the 'user' message and the agent's reply as the 'assistant' message. Second, into cron_runs as one row per execution, with status, output text, cost, token counts, duration, and started/finished timestamps. The routines_runs tool (fazm-tools-stdio.ts:564) queries cron_runs, so the agent can answer 'why did my morning email routine fail' by reading its own execution history. The Routines tab in the app (Desktop/Sources/MainWindow/Pages/RoutinesSection.swift) is essentially a read view onto those two tables, polled every 8 seconds.

How is this different from cron + a shell script?

Cron + a shell script fixes the action at write time. You decide what command to run, you write the command, the command runs. A routine fixes only the prompt and the schedule. The action is whatever the model decides to do at fire time, given the same tool palette it had during the first conversation. If the structure of the underlying data changes (Stripe adds a new column, the agent has to click a slightly different button this week, the email layout changed), the routine still works because the model re-reads the world and re-picks the tool calls. Cron + script breaks; the agent adapts. The cost of that flexibility is that you have to trust the model on every run, which is the reason routines write each result back into the visible chat history rather than executing silently.

How is this different from Apple Shortcuts or Zapier?

Shortcuts and Zapier are graphs you build at design time. Each node is a fixed action with fixed inputs. A routine is one prompt and a schedule, and the agent walks the tool palette at run time. The graph that runs is implicit and can be different across runs. The pragmatic difference: Shortcuts is great for 'when I plug in headphones, lower volume to 50%'; Zapier is great for 'when a Typeform fires, write to a Google Sheet'; an agent routine is the right shape for 'every weekday at 9, look at my inbox, decide which threads are about open disputes, summarize them by counterparty, and send me the digest'. That last one is hard to express as a graph because the decision of what to summarize lives inside the email body, which only the model can read.

Does it support custom MCP servers, or am I locked to the built-ins?

Custom is supported. The built-in set, declared in acp-bridge/src/index.ts as BUILTIN_MCP_NAMES, is: fazm_tools, playwright, macos-use, whatsapp, google-workspace, browser-harness, assrt. Anything else is read from ~/.fazm/mcp-servers.json, which uses the same format as Claude Code's mcpServers (name, command, args, env, enabled). Desktop/Sources/MCPServerManager.swift owns the load/save path. So you can plug in a Stripe MCP, a Slack MCP, a Notion MCP, or your own internal-tools MCP, and routines created after that can call them.
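A sketch of what reading that file could look like. The field names come from the list above; the layout (an object keyed by server name), the enabled-by-default behavior, and the choice to skip entries that reuse a built-in name are all assumptions of this example, not documented behavior of MCPServerManager.swift.

```typescript
// Names from the built-in set listed above.
const BUILTIN_NAMES = new Set<string>([
  "fazm_tools", "playwright", "macos-use", "whatsapp",
  "google-workspace", "browser-harness", "assrt",
]);

// Assumed entry shape, based on the fields named above.
interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
  enabled?: boolean; // assumed to default to on when omitted
}

// Assumed file layout: an object keyed by server name.
type McpServersFile = Record<string, McpServerEntry>;

// Names of user-added servers that would be started under these assumptions.
function userServersToLoad(file: McpServersFile): string[] {
  return Object.entries(file)
    .filter(([name, entry]) => entry.enabled !== false && !BUILTIN_NAMES.has(name))
    .map(([name]) => name)
    .sort();
}

const exampleFile: McpServersFile = {
  stripe: { command: "npx", args: ["-y", "example-stripe-mcp"], env: { STRIPE_KEY: "sk_test_placeholder" } },
  slack: { command: "example-slack-mcp", enabled: false },
};
console.log(userServersToLoad(exampleFile));
```

The package and command names in exampleFile are placeholders; substitute whatever MCP server you actually run.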

Where in the source can I read the actual implementation?

All MIT-licensed at github.com/m13v/fazm. The five routines_* tool declarations are in acp-bridge/src/fazm-tools-stdio.ts at lines 512 (routines_create), 532 (routines_list), 537 (routines_update), 555 (routines_remove), and 564 (routines_runs). The schedule grammar (cron, every, at) and the parser are in acp-bridge/src/schedule.mjs at lines 14 to 86. The headless run-one-job loop is in acp-bridge/src/cron-runner.mjs, see the top-of-file docstring at lines 1 to 25 for the exact responsibilities. The Swift UI that reads cron_jobs and cron_runs is Desktop/Sources/MainWindow/Pages/RoutinesSection.swift, with the explanation of the data flow in the header comment at lines 4 to 12. The built-in MCP server list is in acp-bridge/src/index.ts where BUILTIN_MCP_NAMES is defined.

Is this only for developers?

The audience that gets the most out of it is technical, because the bridge is real and the tool palette is real and you can extend it. But the routine primitive is a phrase-and-a-schedule interface, so a non-developer can absolutely create one by talking. The agent translates 'every Monday morning' into 'cron:0 9 * * 1', validates it, and writes the row. The reason developers stick around is that they realize they can plug in their own MCP server and have the same agent drive it.

Adjacent reading

How it works

AI desktop agent: the self-observing loop

The Gemini-powered observer that watches a 60-minute rolling buffer of your active window and refuses to suggest work the agent is already doing.

Field notes

Agent persistent session state, the rollover trap

What actually breaks when a rate limit fires on a live conversation, and the small append-only chain that fixes it for shipping computer-use agents.

Edge cases

AI desktop agent: edge cases that matter

Permissions denial mid-task, OAuth rotation under a running routine, and other failure modes you only see in week three.
