Proactive AI Agents on the Desktop: Observe, Suggest, Act
Most AI assistants are reactive. You ask, they answer. A proactive agent runs on its own heartbeat. It notices things happening, pokes at its memory, decides a few of them are worth surfacing, and shows up in your notifications without being asked. The idea is exciting. The execution is full of subtle choices that decide whether you end up with a thoughtful collaborator or a noisy interruption machine. This guide covers the observe-suggest-act gradient for proactive agents and the specific limits they hit once they try to do their work on a desktop.
“Fazm uses real accessibility APIs instead of screenshots, so it interacts with any app on your Mac reliably and fast. Free to start, fully open source.”
fazm.ai
1. Why Proactive Agents Are Different
A reactive agent has one loop: user prompt in, response out. Its success is measured by whether it answered the question well. A proactive agent has no incoming prompt to anchor it. It has to decide when to run, what to look at, whether anything worth doing is going on, and whether to interrupt the human. That last part is where most of the interesting design sits.
The cost structure is different too. A reactive agent that you only call ten times a day is cheap. A proactive agent running a heartbeat every five minutes makes 288 decisions a day (12 per hour, around the clock) about whether to do anything. If each decision hits a large model, you have a real bill. If it does not, you need a cheaper first-pass filter.
The failure modes are different. A reactive agent that gives a bad answer wastes your time for a few seconds. A proactive agent that fires off a bad notification at 11 PM trains you to ignore it, which destroys the whole value proposition. It only takes a handful of false positives for a proactive agent to become something the user silences.
And the value is different. A reactive agent that is 10 percent better than the alternative wins. A proactive agent that interrupts you appropriately 85 percent of the time and inappropriately 15 percent can still be a net negative, because the 15 percent erodes trust faster than the 85 percent builds it.
2. The Observe-Suggest-Act Gradient
There is a useful gradient for thinking about how much authority a proactive agent should have. It runs from pure observation at one end to autonomous action at the other. Most production systems pick a position on this gradient and stay there, because each level has a different cost-of-mistake profile.
Observe. The agent runs its heartbeat, pulls in signals (browser history, file system changes, calendar events, messages), maintains its own memory, and logs what it notices. It never surfaces anything to the user. The output is an internal journal. This sounds useless but is often the first stage of building anything more ambitious. You are validating that the signal extraction works before you risk interrupting anyone.
Suggest low. The agent writes its observations to a place the user can browse on their own schedule: a daily digest email, a panel in a sidebar, a Notion page. It does not push. The user opts in to reading it. This captures a lot of the value with almost none of the interruption cost.
Suggest high. The agent posts a notification or opens a dialog. This is the first level where a mistake has a real cost. The standard for precision has to go up. A notification every five minutes is rude. A notification once a day for something useful is a feature.
Act with approval. The agent proposes an action and waits for a click to confirm. "I have drafted a reply to this email, approve?" "I have prepared the invoice in QuickBooks, submit?" This is the sweet spot for many business workflows because the human stays in the loop for anything with a real consequence, but the boring parts happen on their own.
Act autonomously. The agent does the thing without asking. This should be reserved for narrow, well-tested workflows where the downside of a mistake is tolerable. Archiving clearly-spam email, for example. Not filing an invoice.
A real proactive agent does not sit at one level for everything. It mixes. The same agent might observe your Git commits silently, write a daily digest of your Slack mentions, suggest a reply to a high-importance email, and file receipts from a trusted vendor autonomously. The gradient is a tool for thinking about each capability separately.
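One way to make that per-capability mixing concrete is a small policy table mapping each signal to its maximum authority level. This is an illustrative sketch, not any particular product's API; the capability names and levels are assumptions for the example.

```python
from enum import IntEnum

# Authority levels mirroring the observe-suggest-act gradient.
class Authority(IntEnum):
    OBSERVE = 0            # log internally, never surface
    SUGGEST_LOW = 1        # write to a digest the user reads on their own schedule
    SUGGEST_HIGH = 2       # push a notification
    ACT_WITH_APPROVAL = 3  # propose an action, wait for a click
    ACT_AUTONOMOUS = 4     # do the thing without asking

# One agent, different levels per capability (hypothetical policy table).
POLICY = {
    "git_commits": Authority.OBSERVE,
    "slack_mentions": Authority.SUGGEST_LOW,
    "important_email": Authority.SUGGEST_HIGH,
    "trusted_vendor_receipts": Authority.ACT_AUTONOMOUS,
}

def allowed(capability: str, level: Authority) -> bool:
    """Permit a behavior only up to the capability's configured level.

    Unknown capabilities default to OBSERVE, the safest tier.
    """
    return level <= POLICY.get(capability, Authority.OBSERVE)
```

The useful property is the safe default: anything the policy table does not mention is capped at observation, so new signal sources start silent.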
3. Where Desktop System Access Becomes the Limit
The moment a proactive agent tries to move from observe to act, it runs into the same wall every desktop tool hits: the operating system is deliberately skeptical of automation. Both macOS and Windows have layers of permissions that gate what a third-party tool can see and do.
On macOS the relevant permissions include:
Accessibility. Required to read the UI tree and send synthetic input events to other apps.
Automation. Required to send AppleEvents to other apps.
Screen Recording. Required to take screenshots or read pixel data from other apps.
Full Disk Access. Required to read files outside the user's sandbox.
App-specific entitlements. Mail, Calendar, Contacts, Reminders, Photos, and so on.
None of these are granted by default. Each requires an explicit user action. Users get a dialog. They click Allow or Deny. If they click Deny, the agent has no recourse. This is by design. Apple built these prompts as a backstop against malware, and they treat any desktop agent the same way.
For a proactive agent this has real implications. You want the agent to observe a wide range of signals, and each signal often requires its own permission. Asking the user to click Allow five times in a row on first launch is a terrible onboarding experience. Modern macOS groups some of these prompts to reduce the friction, but the permission wall is still there.
The act side of the gradient is even more constrained. To interact with other apps, an agent usually needs Accessibility plus Automation. To open a file in an app it does not already have permission to, it may need additional approvals. A proactive agent that wants to autonomously archive an email and then open a related document in Preview has to have been granted access to both Mail and Preview ahead of time.
The sober implication is that truly autonomous proactive agents on desktop are going to operate within a zone of previously-granted permissions. You cannot design an agent that spontaneously extends its authority on the fly. You design one that works within the permissions a thoughtful user has agreed to.
4. Designing the Heartbeat
The heartbeat is the clock that makes the agent proactive instead of reactive. Designing it well is mostly about being conservative. A five-minute heartbeat generates 288 decision points per day. Most of those should be cheap no-ops.
Use a tiered model stack. The heartbeat itself runs a small local model, or even a simple rule engine, to decide whether anything worth escalating has happened. Only if the first-pass filter says "something interesting" does the agent engage a larger model. This is what keeps the monthly bill from getting out of hand.
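A minimal sketch of that tiered stack, with a keyword rule standing in for the cheap first pass and a stub standing in for the expensive model call; the keywords and the `call_large_model` function are assumptions for the example.

```python
# Cheap first-pass filter: a rule engine deciding whether to escalate.
INTERESTING_KEYWORDS = {"invoice", "deadline", "outage"}

def first_pass(events: list[str]) -> bool:
    """Return True only if any event mentions an interesting keyword."""
    return any(k in e.lower() for e in events for k in INTERESTING_KEYWORDS)

def call_large_model(events: list[str]) -> dict:
    # Stub for the expensive step; swap in your real model API here.
    return {"escalated": True, "events": events}

def heartbeat(events: list[str]):
    """One beat: most of the time this is a cheap no-op."""
    if not first_pass(events):
        return None  # no large-model call, no cost
    return call_large_model(events)
```

In practice the first pass might be a small local model rather than keywords, but the shape is the same: the expensive call sits behind a gate that says no almost every beat.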
Include memory read and write on every heartbeat. Update what the agent knows about the current state of the user's environment: which apps are open, what was the last significant event, what did the agent already decide about similar events today. Without this the agent re-decides from scratch every time, which is both expensive and error-prone.
Rate-limit notifications separately from the heartbeat. Just because the agent noticed something every five minutes does not mean it should be allowed to notify you every five minutes. A hard rule like "no more than three notifications per hour" or "only during working hours" pays for itself.
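A sketch of that separate notification budget, assuming a simple sliding one-hour window and a fixed working-hours range; both limits are example defaults, not recommendations.

```python
from collections import deque
from datetime import datetime, timedelta

class NotificationGate:
    """Caps pushes per hour and restricts them to working hours."""

    def __init__(self, max_per_hour: int = 3, start_hour: int = 9, end_hour: int = 18):
        self.max_per_hour = max_per_hour
        self.start_hour = start_hour
        self.end_hour = end_hour
        self.sent: deque = deque()  # timestamps of recent notifications

    def allow(self, now: datetime) -> bool:
        # Outside working hours, suppress unconditionally.
        if not (self.start_hour <= now.hour < self.end_hour):
            return False
        # Slide the window: drop sends older than an hour.
        while self.sent and now - self.sent[0] > timedelta(hours=1):
            self.sent.popleft()
        if len(self.sent) >= self.max_per_hour:
            return False
        self.sent.append(now)
        return True
```

Note that the gate owns its own state, independent of the heartbeat: the agent can keep noticing things every five minutes while the gate silently swallows everything over budget.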
Log everything. Every heartbeat, every decision, every notification or suppressed notification, every action. You will want this log when you review why the agent did or did not do something last Tuesday.
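One cheap way to get that audit trail is an append-only JSONL file, one record per event; the field names here are an assumption, not a standard schema.

```python
import json
from datetime import datetime, timezone

def log_decision(path: str, kind: str, detail: str, decision: str) -> dict:
    """Append one decision record so 'why did the agent (not) act last
    Tuesday?' has an answer. kind might be 'heartbeat', 'notification',
    'suppressed', or 'action'."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "kind": kind,
        "detail": detail,
        "decision": decision,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

JSONL keeps appends atomic enough for a single-process agent and stays trivially greppable, which matters more than query power for a log you mostly read during incident review.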
5. Current Approaches Including Fazm
The space of proactive agent tools is small and still forming. A few recognizable approaches:
Mobile-first personal assistants (things like Rewind, Granola, some of the newer always-listening projects). These are strong at observe and suggest. They tend to punt on act because mobile platforms are even more locked down than desktops.
Background task agents (the "AI employee" category). Strong at act, weaker at observe. They mostly operate on scheduled triggers (every hour, every day, every Monday at 9) rather than a truly reactive heartbeat.
Desktop agents that can be wrapped in a heartbeat loop. This is where Fazm and similar accessibility-API tools fit. On their own they are reactive automation engines. Paired with a scheduler or a separate observer agent, they become the act layer for a proactive system. The division of labor is often: a cheap model or rule engine handles the heartbeat and decides when to act; a desktop agent executes the action through accessibility APIs; a larger model handles unusual cases or edge-case reasoning.
IDE-native agents (Cursor's ambient modes, the various Claude Code harnesses). Strong for coding workflows. Limited to the IDE and whatever it can see in the filesystem. Not the right fit for a general proactive assistant.
Research projects with actual full-stack proactive behavior are rare. Most of what you see in launch videos is reactive automation dressed up with a trigger. That is not a criticism; it is the honest state of the field.
6. Pitfalls Worth Planning For
A short list of things that go wrong when people ship proactive agents:
Notification fatigue. Users silence anything that cries wolf more than once. Ship conservatively. Let users tune up, not down. Start with "notify me once a day" and let them ask for more.
Runaway cost. The heartbeat is in charge of spending. A bug in the decide step can keep escalating every single beat to the large model and run up a bill quickly. Hard cost caps and alerting are a must.
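A hard cap can be enforced in code rather than hoped for in behavior: a guard that refuses the escalation call once a daily budget is spent and fires an alert. The cap value and alert hook below are placeholders; wire in your real billing numbers.

```python
class CostGuard:
    """Refuses large-model calls once a daily spending cap is reached."""

    def __init__(self, daily_cap_usd: float, alert=print):
        self.cap = daily_cap_usd
        self.spent = 0.0
        self.alert = alert  # hook for paging/alerting; print is a stand-in

    def charge(self, estimated_usd: float) -> bool:
        """Return True if the call may proceed; alert and refuse past the cap."""
        if self.spent + estimated_usd > self.cap:
            self.alert(f"cost cap hit: spent={self.spent:.2f} cap={self.cap:.2f}")
            return False
        self.spent += estimated_usd
        return True
```

The guard fails closed: a bug that escalates every beat burns through the cap and then turns the expensive path off entirely, instead of quietly compounding for a month.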
Permission creep. The agent asks for one more permission each week to do one more thing. Users notice. Build the permission surface honestly from the start and resist the temptation to quietly expand it.
Memory drift. The agent builds up a picture of the user that is no longer accurate. Periodic memory review (automatic or surfaced to the user) keeps it honest.
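One simple mechanism for that periodic review is a time-to-live on memory entries: facts expire unless re-observed, which bounds how stale the agent's picture can get. The 30-day TTL is an arbitrary example value.

```python
from datetime import datetime, timedelta

class Memory:
    """Facts about the user that expire unless reconfirmed within a TTL."""

    def __init__(self, ttl_days: int = 30):
        self.ttl = timedelta(days=ttl_days)
        self.entries: dict = {}  # fact -> last time it was observed

    def remember(self, fact: str, now: datetime) -> None:
        # Re-observing a known fact refreshes its timestamp.
        self.entries[fact] = now

    def review(self, now: datetime) -> list:
        """Drop facts not reconfirmed within the TTL; return what was dropped
        so the agent (or the user) can audit the forgetting."""
        stale = [f for f, t in self.entries.items() if now - t > self.ttl]
        for f in stale:
            del self.entries[f]
        return stale
```

Returning the dropped facts matters: surfacing "I forgot you were on team alpha" to the user is itself a trust-building move, and a chance to correct a wrong expiry.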
Failure silence. The agent does something wrong and does not tell you. For anything in the act tier, a visible audit should be available to the user on demand. "What did my agent do today?" should have a clean answer.
Proactive agents are one of the more exciting frontiers of personal software. The ones that work will be the ones that are honest about the observe-suggest-act gradient, honest about the system-access ceiling they work under, and honest about their own memory and error modes. The novelty will wear off. What will remain is whether the agent is still useful next month.