What actually makes something a desktop AI agent on the Mac
Search results describe a desktop AI agent by what it does: clicking buttons, reading the screen, running workflows. None of them answers the structural question: which two Mac primitives separate a desktop agent from a menu bar app or a browser extension? Fazm's answer is 22 lines of NSWindow init plus a 141-line Carbon hotkey manager, both in the public source.
The primitives, in one marquee
Every chip below is a specific Swift symbol that appears in the two files this page is anchored to. Nothing invented, nothing generic. If a competitor's desktop agent does not use the chips on this strip, it is probably a browser extension or a menu bar app in disguise.
The numbers that define the anchor
Four structural anchors, all of them greppable in the public Fazm tree. 2,188 is the total line count of FloatingControlBarWindow.swift. 141 is the full length of GlobalShortcutManager.swift. 0x46415A4D is the Carbon signature that spells FAZM in ASCII. 99 is the line in FloatingControlBarWindow.swift where both collectionBehavior flags are set.
Two primitives, one desktop agent
22 lines of NSWindow init plus 141 lines of Carbon hotkey
That is the entire structural definition of a desktop AI agent on macOS. Everything after those 163 lines, the accessibility reads, the tool invocations, the streaming model output, is the agent runtime that runs inside the window. The window and the hotkey are what make it a desktop agent at all.
Desktop agent vs menu bar app vs browser extension
These three things get called desktop AI agents in marketing copy. Only one of them actually is. The toggle below contrasts what most consumer AI agents on the Mac ship (the thin version) with what the structural primitives let Fazm do (the real version).
Which one is actually a desktop agent
Locked to the status bar or the browser tab. Disappears the moment the user goes fullscreen. Does not follow across Desktop Spaces. Requires the host app or browser to have focus to receive keystrokes. Cannot draw on top of Keynote or Final Cut.
- No .fullScreenAuxiliary flag, invisible during fullscreen work
- No .canJoinAllSpaces flag, pinned to one Desktop Space
- No global hotkey, must be clicked to activate
- Scoped to a single window or a single browser tab
Anchor fact: the 22-line NSWindow init
This is the block that defines the Fazm floating bar as a desktop agent. It lives at FloatingControlBarWindow.swift lines 82 to 103, plus two overrides at lines 139 and 140. Every line is a specific decision that, if missing, degrades the product into something else.
“self.collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]”
FloatingControlBarWindow.swift:99 (the single line that makes the bar a desktop agent)
Each NSWindow init line, decoded
Every non-obvious line in the init block is a deliberate choice that separates a desktop agent from every other kind of Mac UI.
.borderless styleMask
The default NSWindow style mask ships with a titlebar, traffic-light buttons, and an OS-drawn frame. A desktop agent needs none of that. The borderless style removes every chrome element so the bar renders as a free-floating pill that draws its own shape in SwiftUI.
isOpaque = false, backgroundColor = .clear, hasShadow = false
Opaque windows force the compositor to rasterize the whole rect. A transparent background lets the rounded-corner SwiftUI content draw without a rectangular bounding halo behind it. hasShadow = false kills the default AppKit drop shadow so the agent can draw its own.
level = .floating
NSWindow.Level.floating sits above .normal windows. A menu bar app is anchored to its status item, so its UI goes away when the dropdown closes. A regular app window uses .normal, which gets covered by whichever window comes to the front. .floating is the level a desktop agent needs so it does not get buried when you switch apps.
collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]
This is the line. .canJoinAllSpaces makes the window follow the user to every Desktop Space instead of staying pinned to one. .fullScreenAuxiliary makes it appear above fullscreen apps, which is the only way a desktop agent survives Keynote, Final Cut, or any borderless fullscreen game. Without these two flags the agent disappears the moment the user presses Cmd+Ctrl+F.
canBecomeKey / canBecomeMain override
Borderless windows are non-key by default. AppKit refuses to route keyboard input to them. Overriding canBecomeKey to true is how the floating bar accepts the user typing into the chat input without having to be the frontmost application. Without this override the user could see the bar but not type into it.
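The decoded lines above can be assembled into a minimal sketch. The class name `FloatingBarWindow` and the frame size used below are hypothetical; only the property values come from the lines described in this section, and the real file may structure its init differently.

```swift
import AppKit

/// Hypothetical stand-in for the floating bar window; not the Fazm source.
final class FloatingBarWindow: NSWindow {
    init(contentRect: NSRect) {
        super.init(contentRect: contentRect,
                   styleMask: [.borderless],   // no titlebar, no traffic lights
                   backing: .buffered,
                   defer: false)
        isOpaque = false            // allow transparent pixels
        backgroundColor = .clear    // no rectangular fill behind the pill
        hasShadow = false           // the content view draws its own shadow
        level = .floating           // sit above .normal windows
        collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]
    }

    // Windows without a title bar report false for both by default.
    override var canBecomeKey: Bool { true }
    override var canBecomeMain: Bool { true }
}
```

Creating one with `FloatingBarWindow(contentRect: NSRect(x: 0, y: 0, width: 420, height: 56))` and calling `orderFrontRegardless()` should show an empty transparent window; a SwiftUI view would normally be attached through an `NSHostingView` as the content view.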
Anchor fact 2: the Carbon FAZM signature
GlobalShortcutManager.swift line 88 declares the hotkey signature as FourCharCode(0x46415A4D). Those four bytes spell FAZM in ASCII. Carbon uses the signature to namespace hotkey IDs per app, which is what lets three hotkeys with IDs 1, 2, and 3 coexist without colliding with other apps' hotkeys of the same ID. The whole manager is 141 lines, MIT-licensed and buildable from the public source.
“EventHotKeyID(signature: FourCharCode(0x46415A4D), id: id.rawValue) // "FAZM"”
GlobalShortcutManager.swift:88
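To see why 0x46415A4D reads as FAZM, the packing can be reproduced in a few lines of plain Swift. The helper names `fourCharCode` and `fourCharString` are made up for this sketch; on Apple platforms `FourCharCode` is a type alias for `UInt32`, which is what the helpers use directly.

```swift
/// Packs four ASCII characters into a 32-bit code, first character in the high byte.
func fourCharCode(_ text: String) -> UInt32? {
    let bytes = Array(text.utf8)
    guard bytes.count == 4, bytes.allSatisfy({ $0 < 0x80 }) else { return nil }
    return bytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

/// Unpacks a 32-bit code back into its four characters.
func fourCharString(_ code: UInt32) -> String {
    let bytes = [24, 16, 8, 0].map { UInt8((code >> UInt32($0)) & 0xFF) }
    return String(decoding: bytes, as: UTF8.self)
}

print(String(fourCharCode("FAZM")!, radix: 16, uppercase: true)) // 46415A4D
print(fourCharString(0x46415A4D))                                 // FAZM
```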
How the hotkey routes from key press to window focus
Every Cmd+\ press goes through five layers before the user sees the bar open. Each layer is a small amount of code in the public source, but together they are what makes a desktop agent reachable from anywhere on the Mac.
User presses Cmd+\ anywhere on the Mac
The key event enters the kernel HID stack. If the user is in Safari, Xcode, Slack, or a fullscreen Keynote, it does not matter. Carbon Event Manager receives it before the foreground app.
Carbon matches the hotkey against signature 0x46415A4D (ASCII FAZM)
The FourCharCode is a 32-bit integer that Carbon uses to namespace hotkey IDs per app. 0x46415A4D is F, A, Z, M. Carbon finds the matching EventHotKeyRef in hotKeyRefs and dispatches to handleHotKeyEvent.
handleHotKeyEvent posts com.fazm.desktop.toggleFloatingBar
The Carbon callback translates the hotkey ID into a Cocoa NotificationCenter notification name. This is the bus between low-level Carbon and high-level AppKit. The bar subscribes to this notification and shows itself.
FloatingControlBarManager.shared.openAIInput() runs
The AppKit side wakes, the NSWindow becomes key, the SwiftUI input field gets focus, and the user can start typing. All of this happens without the agent process needing to be frontmost.
Once visible, Cmd+N is handled by the local monitor
For shortcuts that should work only when the bar is already focused, Fazm layers NSEvent.addLocalMonitorForEvents on top of Carbon. The local monitor fires before text fields consume the event, so Cmd+N starts a new chat even when the input field would otherwise capture the N keystroke.
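The first three layers can be sketched with the Carbon calls named on this page. This is an illustration under assumptions, not the contents of GlobalShortcutManager.swift: the `HotKey` namespace, the function names, and the choice of ID 1 for the toggle are all hypothetical.

```swift
import Carbon.HIToolbox
import Foundation

enum HotKey {
    static let signature: OSType = 0x46415A4D   // "FAZM"
    static let toggleID: UInt32 = 1             // assumed ID for the toggle hotkey
    static let toggleName = Notification.Name("com.fazm.desktop.toggleFloatingBar")
}

/// Installs one application-level handler for hotkey-pressed events.
@discardableResult
func installHotKeyHandler() -> OSStatus {
    var spec = EventTypeSpec(eventClass: OSType(kEventClassKeyboard),
                             eventKind: UInt32(kEventHotKeyPressed))
    // The handler is a C callback, so it may only touch globals and statics.
    return InstallEventHandler(GetApplicationEventTarget(), { _, event, _ in
        var pressed = EventHotKeyID()
        let status = GetEventParameter(event,
                                       EventParamName(kEventParamDirectObject),
                                       EventParamType(typeEventHotKeyID),
                                       nil,
                                       MemoryLayout<EventHotKeyID>.size,
                                       nil,
                                       &pressed)
        guard status == noErr,
              pressed.signature == HotKey.signature,
              pressed.id == HotKey.toggleID else {
            return OSStatus(eventNotHandledErr)
        }
        // Hand off to AppKit through the notification bus.
        NotificationCenter.default.post(name: HotKey.toggleName, object: nil)
        return noErr
    }, 1, &spec, nil, nil)
}

/// Registers Cmd+\ (virtual key code 42) as a system-wide hotkey.
func registerToggleHotKey() -> EventHotKeyRef? {
    var ref: EventHotKeyRef?
    let id = EventHotKeyID(signature: HotKey.signature, id: HotKey.toggleID)
    let status = RegisterEventHotKey(UInt32(kVK_ANSI_Backslash), UInt32(cmdKey), id,
                                     GetApplicationEventTarget(), 0, &ref)
    return status == noErr ? ref : nil
}
```

Both functions would be called once at launch from a running app, and the returned `EventHotKeyRef` kept so it can later be passed to `UnregisterEventHotKey`.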
Inputs, hub, outputs
Three entry paths into the agent, one NotificationCenter bus at the hub, three AppKit side effects. The Carbon hotkey fires regardless of frontmost app; the local NSEvent monitor fires only when the bar is key; DistributedNotificationCenter posts from a terminal command let any outside process trigger the same code paths, which is how Fazm's test hooks work.
Key press to floating bar
Wire-level trace for one Cmd+\ press
The sequence diagram below traces one global hotkey press from the moment the user hits the key in some other app, through Carbon, the notification bus, and AppKit, to the floating bar appearing with its input field focused. None of this requires the Fazm app to be frontmost when the key fires.
Cmd+\ press to input field focus
The Cmd+N local monitor, and why it is needed
Carbon handles the cross-app hotkey. NSEvent.addLocalMonitorForEvents handles the in-app case. A text field inside the bar would normally consume an N keystroke before any shortcut could fire, so Fazm registers a local monitor that runs before the text field and returns nil to consume the event. This layered approach is why Cmd+N works reliably as a new chat shortcut even when the input field is focused.
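A local monitor of this kind might look like the following sketch. The function names and the `onNewChat` callback are hypothetical, and scoping the monitor to one window via `event.window` is an assumption about how "only when the bar is focused" could be enforced.

```swift
import AppKit

/// True only for Cmd+N with no other modifier held (Caps Lock is ignored).
func isNewChatShortcut(modifiers: NSEvent.ModifierFlags, characters: String?) -> Bool {
    let held = modifiers.intersection([.command, .option, .control, .shift])
    return held == .command && characters?.lowercased() == "n"
}

/// Installs a local key-down monitor scoped to one window.
/// Keep the returned token and pass it to NSEvent.removeMonitor(_:) on teardown.
func installNewChatMonitor(for window: NSWindow,
                           onNewChat: @escaping () -> Void) -> Any? {
    NSEvent.addLocalMonitorForEvents(matching: .keyDown) { [weak window] event in
        guard let window, event.window === window,
              isNewChatShortcut(modifiers: event.modifierFlags,
                                characters: event.charactersIgnoringModifiers) else {
            return event   // pass every other key through untouched
        }
        onNewChat()
        return nil         // consume the event so the text field never sees it
    }
}
```

Keeping the key-matching logic in a separate pure function makes it checkable without synthesizing real key events.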
The six structural flags, in one grid
Six flags define whether a Mac app is a desktop agent or something narrower. Each card below is a specific Swift symbol that appears in the 163 lines this page is anchored to.
.borderless
styleMask without titlebar or traffic lights so SwiftUI draws the pill shape.
.floating
Window level that sits above normal app windows so the agent does not get buried.
.canJoinAllSpaces
Follows the user to every Desktop Space instead of pinning to one.
.fullScreenAuxiliary
Renders above fullscreen apps. The only way a desktop agent survives Keynote.
Carbon 0x46415A4D
Four-char signature that namespaces FAZM hotkeys so they fire from any frontmost app.
canBecomeKey override
Borderless windows refuse keyboard input by default. Override makes typing work.
What a terminal-driven test hook looks like
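Because the Carbon handler only posts a notification, anything else that posts the same name reaches the same subscribers. A DistributedNotificationCenter post from a terminal-launched script can do that from outside the process, provided the app also observes the distributed center, which is separate from the in-process NotificationCenter. That is how Fazm's programmatic test hooks trigger the bar without a physical key press.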
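One way such a hook could be wired is sketched below. `DistributedNotificationCenter.default()` is a separate object from `NotificationCenter.default`, so the sketch assumes the app observes both with one shared handler; `BarController`, `wireToggle`, and `postToggleFromOutside` are hypothetical names, not Fazm symbols.

```swift
import Foundation

let toggleName = Notification.Name("com.fazm.desktop.toggleFloatingBar")

/// Hypothetical controller standing in for the real bar manager.
final class BarController {
    private(set) var toggleCount = 0
    func toggleBar() { toggleCount += 1 }
}

/// Subscribes one handler to both centers and returns the observer tokens.
func wireToggle(to controller: BarController) -> [NSObjectProtocol] {
    let handler: (Notification) -> Void = { _ in controller.toggleBar() }
    return [
        // In-process path: what the Carbon callback posts to.
        NotificationCenter.default.addObserver(forName: toggleName, object: nil,
                                               queue: nil, using: handler),
        // Cross-process path: delivered on the main queue while the run loop runs.
        DistributedNotificationCenter.default().addObserver(forName: toggleName, object: nil,
                                                            queue: .main, using: handler),
    ]
}

/// What a separate script or test process would call to trigger the bar.
func postToggleFromOutside() {
    DistributedNotificationCenter.default().postNotificationName(
        toggleName, object: nil, userInfo: nil, deliverImmediately: true)
}
```

The distributed path only delivers while the receiving app's main run loop is running, which is always the case for a launched AppKit app.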
Grep-verifiable anchor checklist
Every item below is independently checkable in the public Fazm source. If any item fails, the guide is wrong and should be corrected. If all pass, the page is not marketing, it is a code tour.
Twelve grep-verifiable claims
- FloatingControlBarWindow.swift exists at Desktop/Sources/FloatingControlBar/ and is 2,188 lines long (wc -l)
- FloatingControlBarWindow.swift line 90: styleMask: [.borderless] strips every OS chrome element
- FloatingControlBarWindow.swift lines 95-97: isOpaque = false, backgroundColor = .clear, hasShadow = false
- FloatingControlBarWindow.swift line 98: self.level = .floating keeps the bar above normal windows
- FloatingControlBarWindow.swift line 99: collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]
- FloatingControlBarWindow.swift lines 139-140: canBecomeKey and canBecomeMain both overridden to true
- FloatingControlBarWindow.swift lines 108-114: Cmd+N local monitor via NSEvent.addLocalMonitorForEvents
- GlobalShortcutManager.swift is 141 lines at Desktop/Sources/FloatingControlBar/
- GlobalShortcutManager.swift line 88: EventHotKeyID(signature: FourCharCode(0x46415A4D), id:) // ASCII FAZM
- GlobalShortcutManager.swift lines 10-12: three Notification.Name("com.fazm.desktop.*") declared
- GlobalShortcutManager.swift line 60: registerHotKey(keyCode: 42, modifiers: Int(cmdKey), id: .toggleBar)
- GlobalShortcutManager.swift line 90: RegisterEventHotKey is the Carbon API, not AppKit, so it fires system-wide
Side by side
Nine rows, each one anchored to a specific Swift symbol. The left column is the kind of Mac AI app that gets labeled as a "desktop AI agent" in marketing copy; the right column is what the structural primitives let Fazm do.
| Feature | Menu bar app or browser extension | Fazm |
|---|---|---|
| Works on top of fullscreen apps | No, menu bar and browser extensions are scoped to their host | Yes, via .fullScreenAuxiliary collectionBehavior |
| Follows the user across Desktop Spaces | No, a normal app window stays pinned to one Space | Yes, via .canJoinAllSpaces collectionBehavior |
| Invokable when the agent is not frontmost | No, AppKit local shortcuts require the app to be active | Yes, via Carbon RegisterEventHotKey (signature 0x46415A4D) |
| Renders outside the browser viewport | No, browser extensions are clipped to the tab | Yes, borderless NSWindow draws anywhere on screen |
| Keeps working inside fullscreen Keynote or Final Cut | No, a normal app is hidden when another app goes fullscreen | Yes, .fullScreenAuxiliary places it above other fullscreen apps |
| Accepts keyboard input without stealing app focus | Requires the host app to be frontmost to receive keystrokes | canBecomeKey override lets the floating bar receive keys directly |
| Drawn chrome, no OS titlebar or traffic lights | Menu bar apps are locked to the status bar shape | styleMask [.borderless] with SwiftUI drawing the pill |
| Transparent, rounded, drop-shadow-free | Standard AppKit windows are opaque and rectangular | isOpaque=false, backgroundColor=.clear, hasShadow=false |
| Cleans up after itself on quit | Browser extensions linger in every tab until uninstalled | Single NSWindow, deallocated with the app process |
Want to see the floating bar run on your Mac workflow?
Book a live walkthrough. We show the 22 lines of NSWindow init and the Carbon hotkey in action on real apps, including fullscreen Keynote.
Book a call →
Frequently asked questions
What is a desktop AI agent, structurally?
A desktop AI agent is an app that can receive input and render output outside of a single application viewport, on top of every Space and every fullscreen context on the Mac. That is different from a browser extension, which is clipped to a browser tab, and from a menu bar app, which is pinned to the status bar. The structural primitives that define it on macOS are a borderless NSWindow with NSWindow.Level.floating and a collectionBehavior that contains .canJoinAllSpaces and .fullScreenAuxiliary, plus a Carbon RegisterEventHotKey so the agent can be invoked when it is not the frontmost app. Fazm implements both primitives at FloatingControlBarWindow.swift lines 82-103 and GlobalShortcutManager.swift lines 86-100.
Why is .canJoinAllSpaces the most important collectionBehavior flag?
A normal AppKit NSWindow is created on one Desktop Space and stays pinned there. If the user switches Spaces (Ctrl+Right Arrow), the window disappears from view. For a desktop agent that claims to work on any Mac workflow, that is fatal. The agent has to be present on every Space the user switches to, including Spaces created seconds ago, without the developer writing per-Space tracking code. .canJoinAllSpaces tells AppKit to make the window a member of every existing and future Space. Fazm sets it at FloatingControlBarWindow.swift line 99.
Why does .fullScreenAuxiliary matter?
When any app enters macOS fullscreen, the system moves that app to its own dedicated Space and hides everything else. Keynote, Final Cut Pro, Zoom, Safari video, and many games can all run in this mode. Without .fullScreenAuxiliary, a desktop agent is invisible during exactly the moments users care about most: during a presentation, during a recording, during screen share. .fullScreenAuxiliary tells AppKit to render this window on top of the fullscreen app's dedicated Space, not just the Space it was created on. Fazm sets it on the same line as .canJoinAllSpaces, FloatingControlBarWindow.swift line 99.
What is signature 0x46415A4D and why does it matter?
It is a FourCharCode, a 32-bit integer that Carbon Event Manager uses to namespace hotkey registrations per application. Each byte is an ASCII character: 0x46 is F, 0x41 is A, 0x5A is Z, 0x4D is M, so the signature spells FAZM. Carbon uses the signature to deduplicate hotkey IDs across applications. Without a unique signature, two apps registering a hotkey with the same ID would collide. The value is at GlobalShortcutManager.swift line 88. The reason it is a Carbon call rather than an AppKit local monitor is that Carbon hotkeys fire system-wide, even when Fazm is not the frontmost app. An AppKit local monitor only fires when the app has focus, which is the opposite of what a desktop agent needs.
Why use Carbon in 2026 instead of a modern AppKit API?
Because there is no modern AppKit API for system-wide hotkeys. NSEvent.addGlobalMonitorForEvents observes events but cannot consume them. NSEvent.addLocalMonitorForEvents only fires for events delivered to the app itself. Carbon's RegisterEventHotKey is the standard way on macOS to both observe and consume a global hotkey without installing an event tap, and Apple has kept it working while desktop utilities such as Alfred, Raycast, CleanShot, and 1Password depend on global hotkeys of this kind. Fazm calls it at GlobalShortcutManager.swift line 90.
How does Cmd+N work differently from Cmd+\?
Cmd+\ is the global toggle. It has to work even when the user is in Xcode, Safari, or Slack with those apps frontmost. That is Carbon territory. Cmd+N only has to work when the floating bar is already visible and focused, so it uses NSEvent.addLocalMonitorForEvents, an AppKit API, at FloatingControlBarWindow.swift lines 108-114. The local monitor is layered above the text field so it fires before the field consumes the N keystroke. Fazm returns nil from the monitor closure to consume the event instead of letting it reach the text field.
Why override canBecomeKey and canBecomeMain on a borderless window?
AppKit treats borderless NSWindow instances as non-key by default because most borderless windows are inspector panels, HUDs, or notifications that should not steal keyboard input from the frontmost application. A desktop agent is the opposite: it needs to accept keystrokes directly so the user can type into the chat without having to click first. Overriding canBecomeKey and canBecomeMain to true (FloatingControlBarWindow.swift lines 139-140) tells AppKit to route keyboard events to this window when it is visible.
How is this different from Raycast, Alfred, or Spotlight?
The window primitives are the same. Launchers like Raycast and Alfred also use borderless NSWindows with .floating level and a Carbon hotkey. The difference is what the window does after it appears. A launcher looks up commands and apps; a desktop AI agent reads the accessibility tree of the frontmost app, invokes tools in a background agent loop, and renders streaming model responses with in-place follow-up prompts. Fazm uses the same window class as a launcher for the visible layer, then hooks it to an agent runtime via acp-bridge. The launcher stops at the palette; the agent keeps going.
Does a desktop agent have to be a floating window?
No, but if it is not a floating window it is not really a desktop agent. A menu bar app is pinned to the status bar and can only render a dropdown; a browser extension is clipped to the browser tab; a normal app window gets buried as soon as the user switches apps with Cmd+Tab. An agent that wants to overlay any app, follow the user across Spaces, and keep working during fullscreen needs all four of .borderless, .floating, .canJoinAllSpaces, and .fullScreenAuxiliary. You can skip any one of them, but the result is a different kind of product.
What does the NotificationCenter bus actually do?
It is the seam between Carbon and AppKit. Carbon hotkey callbacks run on the main thread but use a C-level API that does not know about any of Fazm's Swift classes. Rather than pass closures into Carbon, Fazm posts a NotificationCenter notification inside the hotkey handler and subscribes to it from AppKit classes like FloatingControlBarManager. The three names are com.fazm.desktop.toggleFloatingBar, com.fazm.desktop.askAI, and com.fazm.desktop.newPopOutChat (GlobalShortcutManager.swift lines 10-12). This makes the Carbon layer testable: DistributedNotificationCenter posts with the same names fire the same code paths, which is how Fazm's programmatic test hooks work.
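The ID-to-name translation at that seam can be sketched in Foundation alone. The three IDs and the three names come from this page; which ID pairs with which name, and the names `HotKeyAction` and `dispatch`, are assumptions made for illustration.

```swift
import Foundation

/// Assumed pairing of hotkey IDs 1-3 with the three notification names.
enum HotKeyAction: UInt32, CaseIterable {
    case toggleBar = 1, askAI = 2, newPopOutChat = 3

    var notificationName: Notification.Name {
        switch self {
        case .toggleBar:     return Notification.Name("com.fazm.desktop.toggleFloatingBar")
        case .askAI:         return Notification.Name("com.fazm.desktop.askAI")
        case .newPopOutChat: return Notification.Name("com.fazm.desktop.newPopOutChat")
        }
    }
}

/// What a Carbon callback could do once it has read the numeric hotkey ID.
/// Returns false for IDs that do not belong to this app.
func dispatch(hotKeyID: UInt32, on center: NotificationCenter = .default) -> Bool {
    guard let action = HotKeyAction(rawValue: hotKeyID) else { return false }
    center.post(name: action.notificationName, object: nil)
    return true
}
```

Because the function takes the center as a parameter, a test can pass a private `NotificationCenter()` and observe exactly which name was posted without touching the app-wide bus.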
Can I verify these claims without installing Fazm?
Yes. The relevant files are in the public Fazm tree at Desktop/Sources/FloatingControlBar/. Run wc -l FloatingControlBarWindow.swift GlobalShortcutManager.swift and you should see 2,188 and 141 lines respectively. Grep for 'canJoinAllSpaces' inside FloatingControlBarWindow.swift and you will find it at line 99. Grep for '0x46415A4D' inside GlobalShortcutManager.swift and you will find it at line 88. Grep for 'com.fazm.desktop' in the same file and you will find the three notification names at lines 10-12. Everything on this page is a direct line-number reference to that tree.
What comes after the window primitives?
Once the window and hotkey are in place, the agent runtime lives behind them. Fazm wires the floating bar to acp-bridge/src/index.ts, which runs the ACP (Agent Client Protocol) loop, mounts MCP servers for macos-use, whatsapp, google-workspace, playwright, and fazm_tools, and streams model responses back to the SwiftUI view inside the NSWindow. The window and the hotkey are the desktop primitives. The accessibility tree reads and MCP tool invocations are the agent primitives. This guide is about the first half, because without it there is no place for the second half to surface.
Adjacent guides on what makes a desktop AI agent work in production.
Keep reading
Desktop AI agent, beyond demos
The permission recovery code that separates a demo-grade desktop agent from one that survives a macOS update.
Accessibility API desktop automation vs screenshots
Why Fazm reads the AX tree instead of sending screenshots to a vision model, with file:line anchors.
macOS AI agent development guide
What the AX permission flow looks like for a shipping Mac agent, end to end.