The AI desktop agent that watches itself so it never suggests work it is already doing
Every other page for this keyword describes reactive agents: you type a command, the agent clicks and types. Fazm ships something no SERP page describes. A second agent, a Gemini observer, watches a 60-minute rolling video buffer of your active window, runs an agentic loop with three introspection tools, and refuses to interrupt you with a suggestion that duplicates work the first agent is already performing.
The wire values, in one marquee
Every chip below is a constant, table, or verdict that appears verbatim in Desktop/Sources/GeminiAnalysisService.swift. Nothing invented. If a competitor's AI desktop agent does not have these values in its source, it is not running this loop.
Four numbers that bound the loop
1,181 is the total line count of the observer service. 120 is the chunk ceiling. 3600 is the trigger duration in seconds. 5 is the agentic turn cap. Four numbers are all you need to argue that this is a bounded, auditable loop, not a freewheeling agent.
The core idea
60 min of rolling video, 3 self-introspection tools
A reactive agent answers prompts. A proactive observer watches work that already happened and decides whether to offer to take any of it off the user's plate. The three introspection tools (query_database, read_dev_log, get_active_sessions) are what let it do that without duplicating work the reactive agent has already handled.
Reactive desktop AI agent vs. a desktop AI agent with a self-observing loop
The table below is a toggle between the two mental models. One is how every SERP result describes an AI desktop agent. The other is what Fazm's source actually ships, with the concrete file:line anchors in the right column.
Two definitions of 'AI desktop agent'
Waits for a prompt. Has no knowledge of prior suggestions. Has no knowledge of what the reactive agent is doing right now. Every task suggestion is a best guess from a single moment.
- Trigger: one user message
- No dedup against prior agent output
- No self-awareness of concurrent agent activity
- Same suggestion can surface twice in one hour
Anchor fact: six constants, twelve lines
Six constants bound the observer loop. Five of them sit in the twelve-line block at lines 67 through 78 of GeminiAnalysisService.swift; the sixth, maxTurns, is a default argument at line 750, enforced in the agentic for-loop at line 760. Together they define every boundary of the loop: the model, the buffer ceilings, the upload split, the cooldown, and the turn cap.
“private let targetDurationSeconds: TimeInterval = 3600”
Desktop/Sources/GeminiAnalysisService.swift:69
Six constants, one bento
Each card is a named constant in the public source. Together they define the observer as a bounded process. If any of them were missing, the loop would be unbounded, uncooperative, or unsafe to ship.
maxChunks = 120
Hard cap on buffered chunks. Oldest are dropped from disk first. Prevents unbounded growth if the 60-minute trigger can't fire.
targetDurationSeconds = 3600
Gate for triggerAnalysis(). bufferedDuration sums endTimestamp - startTimestamp per ChunkEntry.
inlineSizeLimit = 1_500_000
Boundary between inline base64 and Gemini File API upload. Above 1.5 MB, the service uploads and waits for processing via waitForProcessing.
retryCooldown = 300
Five minutes between failed analysis retries. Buffer is preserved on failure so no video is lost during cooldown.
maxTurns = 5
The agentic loop for the observer is bounded to five tool-call turns. Beyond that, the service logs exhausted turns and returns the last raw output.
model = gemini-pro-latest
The observer is not Claude, it is Gemini. The Fazm runtime separates the chat agent (your primary assistant) from the observer agent that watches for unmet work.
What happens to every chunk
A chunk is one short MP4 of whichever app window had focus when it was recorded. Six steps take it from SessionRecordingManager to the observer's final verdict. Every step lives in the public source.
SessionRecordingManager finalizes a chunk
A short video of the active app window lands on disk. The chunk ships with app name, window title and frame count in ActiveAppInfo, so the observer knows which app was focused during which seconds of the clip.
handleChunk copies it to Application Support/Fazm/gemini-analysis/chunks
The original temp file is deleted after the session recorder is done with it, so the service reads the bytes and writes them to a stable path named chunk_<index>_<epoch>.mp4 before the recorder cleans up.
Buffer index is persisted as JSON on every append
ChunkEntry[] is encoded to buffer-index.json via atomic write. This survives app restarts, crashes, and reboots, so 60 minutes of observation is not lost if the Mac goes to sleep mid-buffer.
Buffer is capped at 120 chunks, oldest dropped with its file
The maxChunks = 120 cap is a safety ceiling. If the buffer ever exceeds it (e.g., after a long offline run), the oldest ChunkEntry files are removed from disk and the array is trimmed from the front.
Once bufferedDuration >= 3600s, triggerAnalysis() fires
bufferedDuration is the sum of (endTimestamp - startTimestamp) across every entry. At or above one hour, analysis starts unless the last failure was less than retryCooldown = 300 seconds ago.
On failed analysis, the buffer is kept; on success, only analyzed chunks are cleared
Success removes the Set of analyzedURLs from the buffer and deletes those files from disk. New chunks that arrived during analysis are preserved. Failure sets lastFailedAnalysis = Date() and waits out the 5-minute cooldown before retry.
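The six steps above can be sketched as a small value type. This is a minimal illustration, not the code in GeminiAnalysisService.swift: the three constants and the names bufferedDuration, lastFailedAnalysis and analyzedURLs come from the text, while ObserverBuffer, the url field on ChunkEntry, and the method names are hypothetical. Disk I/O and the JSON index are left out.

```swift
import Foundation

// Hypothetical stand-in for ChunkEntry; only the two timestamps are named in the text.
struct ChunkEntry {
    let url: URL
    let startTimestamp: Date
    let endTimestamp: Date
}

// Illustrative container; the real service's structure is not shown in the text.
struct ObserverBuffer {
    // Values quoted in the text.
    let maxChunks = 120
    let targetDurationSeconds: TimeInterval = 3600
    let retryCooldown: TimeInterval = 300

    var entries: [ChunkEntry] = []
    var lastFailedAnalysis: Date? = nil

    // Sum of (endTimestamp - startTimestamp) across every entry.
    var bufferedDuration: TimeInterval {
        entries.reduce(0) { $0 + $1.endTimestamp.timeIntervalSince($1.startTimestamp) }
    }

    // Appends a chunk and trims from the front when the cap is exceeded.
    // Returns the dropped entries so a caller could delete their files.
    @discardableResult
    mutating func append(_ entry: ChunkEntry) -> [ChunkEntry] {
        entries.append(entry)
        let overflow = max(0, entries.count - maxChunks)
        let dropped = Array(entries.prefix(overflow))
        entries.removeFirst(overflow)
        return dropped
    }

    // True once an hour is buffered and any failure cooldown has elapsed.
    func shouldTriggerAnalysis(now: Date) -> Bool {
        guard bufferedDuration >= targetDurationSeconds else { return false }
        if let failed = lastFailedAnalysis, now.timeIntervalSince(failed) < retryCooldown {
            return false
        }
        return true
    }

    // Success clears only the analyzed chunks; failure keeps the buffer intact.
    mutating func finishAnalysis(succeeded: Bool, analyzedURLs: Set<URL>, now: Date) {
        if succeeded {
            entries.removeAll { analyzedURLs.contains($0.url) }
        } else {
            lastFailedAnalysis = now
        }
    }
}
```

Passing the current time in as a parameter keeps the gate deterministic, which makes the cooldown boundary easy to check without waiting five minutes.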
Anchor fact 2: the three self-introspection tools
Most desktop AI agents expose tools to call out to the world (browser, shell, file system). Fazm's observer has three tools pointed inward. They exist to give Gemini enough context to refuse to suggest work that is redundant with the first agent's recent activity.
“"name": "get_active_sessions" — Check if Fazm's AI agent is currently processing any tasks.”
Desktop/Sources/GeminiAnalysisService.swift:555-562
What each tool is for
Five cards: the three tools, the mandatory pre-check SELECTs embedded in the prompt, and the structured verdict format. Together they are what the observer uses to introspect before saying anything.
query_database(sql)
Gemini can run SELECT (or WITH) against fazm.db. The exposed tables are chat_messages, observer_activity, ai_user_profiles, and indexed_files. Nine SQL keywords are denied before the query reaches SQLite: DROP, ALTER, TRUNCATE, CREATE, ATTACH, DETACH, VACUUM, REINDEX, PRAGMA. Multi-statement is also blocked.
read_dev_log(lines)
Gemini can tail Fazm's dev log (max 200 lines, default 50) to see whether the agent is currently in the middle of an ACP session. This is how the observer distinguishes automated-looking activity from activity the agent is already performing.
get_active_sessions()
No args. Returns whether Fazm's ACP bridge has any live sessions and what tools it recently called. The observer uses this as a second-order signal before suggesting a task: if the agent is already running, most suggestions collapse to NO_TASK.
Mandatory pre-checks in the prompt template
Before Gemini makes a verdict, the prompt forces two SELECTs: observer_activity (last 10 gemini_analysis rows) and chat_messages (last 10 messages). If the proposed task is similar to anything in observer_activity, the prompt tells the model to return NO_TASK.
Verdict format is structured, not free-form
The prompt locks the response to three verdicts: NO_TASK, TASK_FOUND, UNCLEAR. Only TASK_FOUND triggers persistAndShowOverlay, which writes the row into observer_activity and displays the overlay above the floating bar.
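A structured verdict is easy to gate on. The sketch below assumes the verdict token opens one of the lines of the model's reply; the text does not specify the exact reply layout, so parseVerdict and shouldPersistAndShowOverlay are hypothetical helpers, and only the three verdict strings are taken from the source.

```swift
import Foundation

// The three verdict strings named in the text.
enum Verdict: String {
    case noTask = "NO_TASK"
    case taskFound = "TASK_FOUND"
    case unclear = "UNCLEAR"
}

// Assumption: the first line that starts with a verdict token decides the verdict.
func parseVerdict(from reply: String) -> Verdict? {
    for line in reply.split(separator: "\n") {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        for verdict in [Verdict.noTask, .taskFound, .unclear]
        where trimmed.hasPrefix(verdict.rawValue) {
            return verdict
        }
    }
    return nil
}

// Only TASK_FOUND leads to persistence and an overlay.
func shouldPersistAndShowOverlay(_ verdict: Verdict?) -> Bool {
    verdict == .taskFound
}
```

Treating an unparseable reply the same as NO_TASK or UNCLEAR keeps the overlay quiet when the model drifts from the format.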
Inputs, hub, outputs
Three signals go into the observer. One runtime decides the verdict. Three outputs leave. The diagram below is a one-glance view of how the observer interacts with the rest of Fazm.
Observer signal flow
Wire-level trace for one analysis turn
The sequence diagram below traces one 60-minute buffer going through the observer end to end. Three tool calls happen before the verdict; two of them are the mandatory dedup SELECTs.
One observer turn, end to end
Anchor fact 3: the mandatory dedup SELECTs
Dedup is not a post-processing filter. It is enforced in the prompt that the observer reads on every run. Before any verdict, the observer must run two SELECTs and prefer NO_TASK when it sees similar prior tasks. This is the exact text from the prompt template.
Anchor fact 4: the SQL guard
The observer's query_database tool does not trust the LLM. Any query containing one of nine denied SQL keywords is rejected before execution; the query must start with SELECT or WITH; multi-statement input is rejected. The executor also auto-appends a LIMIT if one is missing. This is the whole surface area Gemini gets against the user's local database.
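A guard with these four rules could look roughly like the following. It is a sketch under stated assumptions rather than the code at lines 569-617: the keyword list is the one quoted in the text, but guardedQuery, SQLGuardError, the whole-word matching, and the default limit of 50 are all hypothetical. Whole-word matching is chosen here so that a column such as createdAt does not trip the CREATE entry.

```swift
import Foundation

// The nine denied keywords listed in the text.
let blockedSQLKeywords = ["DROP", "ALTER", "TRUNCATE", "CREATE", "ATTACH",
                          "DETACH", "VACUUM", "REINDEX", "PRAGMA"]

enum SQLGuardError: Error, Equatable {
    case notReadOnly
    case blockedKeyword(String)
    case multipleStatements
}

// Returns the query to execute, or throws if any rule is violated.
// `defaultLimit` is an assumed value; the text does not state the real one.
func guardedQuery(_ sql: String, defaultLimit: Int = 50) throws -> String {
    let trimmed = sql.trimmingCharacters(in: .whitespacesAndNewlines)
    let upper = trimmed.uppercased()

    // Rule 1: the query must start with SELECT or WITH.
    guard upper.hasPrefix("SELECT") || upper.hasPrefix("WITH") else {
        throw SQLGuardError.notReadOnly
    }

    // Rule 2: no denied keyword may appear as a whole word.
    let identifierCharacters = CharacterSet.alphanumerics
        .union(CharacterSet(charactersIn: "_"))
    let words = Set(upper.components(separatedBy: identifierCharacters.inverted))
    for keyword in blockedSQLKeywords where words.contains(keyword) {
        throw SQLGuardError.blockedKeyword(keyword)
    }

    // Rule 3: split on ';' and reject more than one non-empty statement.
    let statements = trimmed.split(separator: ";")
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }
    guard statements.count == 1 else {
        throw SQLGuardError.multipleStatements
    }

    // Rule 4: append a LIMIT when the query has none.
    let single = statements[0]
    return words.contains("LIMIT") ? single : "\(single) LIMIT \(defaultLimit)"
}
```

Splitting on ';' is deliberately conservative: a semicolon inside a string literal would also be rejected, which errs on the side of refusing the query.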
The buffer lifecycle in code
The buffer is capped, persisted, and gate-kept by the six constants above. The 20 lines below are the heart of the trigger logic, lifted verbatim from lines 196 to 215. Everything else in the 1,181-line file is upload, API, parsing, and overlay plumbing.
Three greps anyone can run to verify the page
This guide is not marketing. Every claim is independently checkable. The terminal session below is the simplest way to verify the anchor facts against the public Fazm source tree.
Side by side
Nine rows, each one anchored to a specific constant, tool, or line range in the observer source. The left column is how every SERP page for this keyword defines an AI desktop agent. The right column is what the observer actually does.
| Feature | Reactive AI desktop agent | Fazm observer |
|---|---|---|
| Trigger | User types a command in a chat box | 60 minutes of buffered video duration (targetDurationSeconds = 3600) |
| Input signal | A single prompt string | Up to 120 video chunks of the active app window, each tagged with app name and window title |
| Self-awareness | None. The agent does not know it is already running | Observer calls read_dev_log + get_active_sessions to check if the agent is already doing the suggested work |
| Dedup | Usually none. Same suggestion can surface repeatedly | Mandatory SELECT from observer_activity LIMIT 10 before verdict. Similar tasks return NO_TASK |
| SQL surface | Full read/write access, or no database access at all | SELECT or WITH only, 9-keyword denylist, single-statement enforced at lines 607-617 |
| Failure handling | Retry immediately, often in a loop | 5-minute retryCooldown, buffer preserved on failure, cleaned only on success |
| Upload path | Often ships full screenshots to a server | Files above 1.5 MB use Gemini File API resumable upload; smaller ones go inline base64 |
| Output surface | Chat message in a window | One row in observer_activity + a Discovered Tasks overlay above the floating bar |
| Model | Whatever the chat provider uses | gemini-pro-latest, with agentic function calling, maxTurns = 5 |
Grep-verifiable anchor checklist
Twelve claims, each greppable in the public Fazm tree at the exact line ranges cited. If any item fails, this page is wrong and should be corrected. If all pass, the page is a code tour, not a product pitch.
Twelve grep-verifiable claims
- model = "gemini-pro-latest" is declared at GeminiAnalysisService.swift line 67
- maxChunks = 120 is declared at GeminiAnalysisService.swift line 68
- targetDurationSeconds: TimeInterval = 3600 is declared at GeminiAnalysisService.swift line 69
- inlineSizeLimit = 1_500_000 is declared at GeminiAnalysisService.swift line 71
- retryCooldown: TimeInterval = 300 is declared at GeminiAnalysisService.swift line 78
- maxTurns: Int = 5 is the default arg of callGenerateContentAgentic at line 750, enforced in the for-loop at line 760
- query_database, read_dev_log, get_active_sessions are declared in toolDeclarations at lines 533, 544, 555
- The blockedSQLKeywords array (9 keywords) is at lines 569-571
- The SELECT-or-WITH guard is at line 607
- The multi-statement guard splits on ';' and rejects >1 statement at lines 612-617
- TASK_FOUND verdicts are persisted to observer_activity and shown as an overlay at lines 273-275
- Uploaded Gemini File API files are deleted after each run at the end of runAnalysis
Want to watch the observer refuse a redundant suggestion live?
Book a walkthrough. We show the 60-minute buffer triggering, the three introspection tools firing against a real observer_activity table, and the NO_TASK verdict that follows when the agent is already doing the work.
Book a call →
Frequently asked questions
What makes an AI desktop agent different from a chatbot with a Mac window?
A chatbot in a Mac window runs on a single trigger: the user types something. Fazm's desktop AI agent adds a second runtime that runs on a different trigger (60 minutes of buffered screen video, not a prompt), on a different model (gemini-pro-latest, not the chat model), with different tools (query_database, read_dev_log, get_active_sessions) that let it introspect the first agent before deciding whether to interrupt you. Verify at Desktop/Sources/GeminiAnalysisService.swift lines 67 through 78.
Why does the observer have read_dev_log and get_active_sessions as tools?
So the observer can refuse to suggest work the agent is already doing. The prompt at line 28 says: 'If you see terminal, IDE, or browser activity that looks automated (fast typing, command sequences, file edits happening rapidly), call read_dev_log or get_active_sessions FIRST to check whether Fazm's AI agent is already doing that work. Do NOT suggest automating something that is already being automated.' This is the most common false positive class, and the tools exist to eliminate it.
Why is the observer on Gemini and not the same model as the chat agent?
Gemini's multimodal File API can ingest up to 120 chunks of 1-minute MP4 video in one request. The alternative would be sending 120 frames as separate images, which is worse for temporal reasoning. The model constant at line 67 is gemini-pro-latest; the chat agent is a separate ACP-bridged model chosen by the user. Having a separate observer model is a deliberate split, not a convenience.
What does the observer actually query from the database?
Four tables, read-only, via SELECT or WITH only. Nine SQL keywords are denied (DROP, ALTER, TRUNCATE, CREATE, ATTACH, DETACH, VACUUM, REINDEX, PRAGMA) and multi-statement is blocked. The tables are chat_messages (recent conversation), observer_activity (past discovered tasks, for dedup), ai_user_profiles (the user's inferred profile), and indexed_files (a 500MB-filtered, depth-3 scan of Downloads, Documents, Desktop, Developer, Projects, Code, src, repos, Sites, and /Applications).
How does the agent avoid showing the same suggestion twice?
The prompt at lines 31-34 forces the observer to run SELECT content, status, createdAt FROM observer_activity WHERE type='gemini_analysis' ORDER BY createdAt DESC LIMIT 10 before it forms a verdict. If the proposed task is similar to anything in those 10 rows (same app + same category of work), the prompt requires NO_TASK. It is a dedup enforced at the LLM layer, not a post-hoc filter.
What happens when the Gemini call fails?
lastFailedAnalysis is set to Date(), the buffer is preserved intact, and no chunks are deleted from disk. On the next chunk arrival, the analyzer will see it is still inside the 300-second retryCooldown window and skip the retry. After cooldown, the next chunk that pushes bufferedDuration past 3600s will retry on the same buffer.
Why does Fazm split video chunks above 1.5 MB onto the Gemini File API instead of sending everything inline?
Gemini's generateContent endpoint has a hard inline size limit. The inlineSizeLimit = 1_500_000 constant at line 71 is the split point: smaller chunks are base64-encoded inline; larger ones go through the resumable upload flow and waitForProcessing polls until the file is ready. Uploaded files are deleted after each run so no residue remains on Google's servers.
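The split can be pictured as a single size check. In this sketch only inlineSizeLimit and its value come from the text; UploadPath and uploadPath(for:) are hypothetical names, the File API branch is a placeholder with no network code, and treating a chunk of exactly 1,500,000 bytes as inline is an assumption.

```swift
import Foundation

// Value quoted in the text.
let inlineSizeLimit = 1_500_000

// Hypothetical result type for the routing decision.
enum UploadPath: Equatable {
    case inlineBase64(String)   // payload embedded directly in the request
    case fileAPI                // placeholder for the resumable upload flow
}

// Chunks above the limit go to the File API; the rest are base64-encoded inline.
func uploadPath(for chunk: Data) -> UploadPath {
    if chunk.count > inlineSizeLimit {
        return .fileAPI
    }
    return .inlineBase64(chunk.base64EncodedString())
}
```

Base64 expands data by a factor of 4/3, so a chunk right at the limit becomes about 2,000,000 characters once encoded.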
Does the observer run in the cloud or on device?
The buffer, the chunk files, the buffer index JSON, and the SQL query execution all run on device. Only the MP4 chunks go to the Gemini API, and only for the turn the analysis covers. The uploaded files are deleted server-side after each run. The observer's verdict is written back to the local observer_activity table; the overlay UI reads it from the local SQLite file.
How do I turn the observer off?
handleChunk at line 148 checks UserDefaults.standard.object(forKey: "shortcut_screenObserverEnabled") and returns early if it is false. Toggling that setting disables chunk buffering immediately. Existing chunks are cleaned up by the orphan sweep on next init.
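As a rough illustration of that early return, the check can be reduced to a pure function over the stored value. The key string is the one quoted above; isObserverEnabled and shouldBufferChunk are hypothetical names, and treating an unset key as enabled is an assumption inferred from the use of object(forKey:) rather than a confirmed default.

```swift
import Foundation

// Key quoted in the text.
let observerEnabledKey = "shortcut_screenObserverEnabled"

// Assumption: only an explicit `false` disables the observer; an unset key
// (nil) or a non-boolean value leaves it enabled.
func isObserverEnabled(storedValue: Any?) -> Bool {
    (storedValue as? Bool) ?? true
}

// Hypothetical wrapper showing where the check would sit before buffering a chunk.
func shouldBufferChunk(defaults: UserDefaults = .standard) -> Bool {
    isObserverEnabled(storedValue: defaults.object(forKey: observerEnabledKey))
}
```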
How many turns can the observer take in one analysis?
Five. The callGenerateContentAgentic function is declared with maxTurns: Int = 5 at line 750 and the for-loop iterates 1...maxTurns at line 760. Each turn can contain multiple tool calls in parallel. If the model does not emit a verdict within five turns, the service logs 'exhausted agentic turns' and returns nil.
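The shape of such a turn-capped loop, with the model and the tools replaced by closures, might look like this. It is a sketch only: maxTurns, its default of 5, the 1...maxTurns range and the log message come from the text, while ModelStep, runAgenticLoop and both closure parameters are hypothetical. Returning nil when the turns run out is an assumption of the sketch, and tool calls run sequentially here for simplicity.

```swift
import Foundation

// Hypothetical model output for one turn: either tool requests or a final verdict.
enum ModelStep {
    case toolCalls([String])
    case verdict(String)
}

// Runs at most `maxTurns` turns and stops early on the first verdict.
func runAgenticLoop(maxTurns: Int = 5,
                    nextStep: (_ turn: Int) -> ModelStep,
                    runTool: (String) -> Void) -> (verdict: String?, turnsUsed: Int) {
    guard maxTurns >= 1 else { return (nil, 0) }
    for turn in 1...maxTurns {
        switch nextStep(turn) {
        case .verdict(let text):
            return (text, turn)
        case .toolCalls(let names):
            // A single turn may carry several tool calls.
            for name in names {
                runTool(name)
            }
        }
    }
    print("exhausted agentic turns")
    return (nil, maxTurns)
}
```

Because the cap is a loop bound rather than a prompt instruction, the model cannot talk its way into a sixth turn.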
What is the overlap with 'Computer Use' style desktop agents?
Computer Use agents are reactive: a command comes in, they plan clicks and keystrokes, they execute. Fazm's desktop agent has a reactive half too (the chat agent), but it pairs it with a proactive observer whose job is to detect unsolicited work on the screen and surface it as a Discovered Task. Both halves share the same fazm.db SQLite file, which is how the observer can see what the reactive agent has been doing.
Is the observer's output private?
Only TASK_FOUND verdicts are persisted. The prompt enforces UNCLEAR as the return value when the video is ambiguous, and explicitly prefers UNCLEAR over hallucinated suggestions. PostHog tracks verdict, chunks_analyzed, tool_call_count, turns_used, and input/output tokens, but not the raw video. The raw chunk files never leave the local Application Support directory except during an active Gemini call, after which uploaded files are deleted.
Adjacent guides on the two halves of a Mac desktop AI agent: the reactive one and the observer.
Keep reading
Desktop AI agent, structural primitives
The two Mac primitives (borderless NSWindow at .floating + Carbon RegisterEventHotKey) that separate a real desktop agent from a menu bar app.
Accessibility API vs screenshots
Why the primary Fazm agent reads the AX tree instead of screenshots, and why the observer is the one exception that uses screen video.
macOS AI agent development guide
The AX permission flow, the screen recording permission flow, and how to test the observer from a terminal command.