Why Native Swift Menu Bar Apps Are the Right UI for AI Agents
Most AI tools make you switch to a separate window. Open ChatGPT. Open Claude. Open a dedicated app. Context switch, type your question, get your answer, switch back.
For a desktop agent that is supposed to save you time, this interaction model defeats the purpose. If you have to leave your current app to use the agent, you have already lost the flow state you were trying to protect.
The Menu Bar Pattern
A floating bar that stays on top of your screen, activated by a keyboard shortcut (Option+Space), is the right pattern. It is:
- Always available. No app-switching. No window hunting. No Cmd+Tab.
- Never in the way. It appears when you need it and disappears when you do not.
- Context-aware. Because it is a native app, it can see which app is frontmost and, once you grant permission, what is on screen.
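As a rough illustration of the pattern, here is a minimal AppKit sketch of a floating bar toggled by Option+Space. The names `FloatingBarPanel`, `FloatingBarController`, and `isActivationHotkey` are hypothetical, and this is not how any particular app implements it. It also assumes the user has granted Accessibility access, without which a global monitor does not receive key events.

```swift
import AppKit

/// A borderless panel that floats above other windows and does not
/// activate the app when shown, so the user's current app stays frontmost.
final class FloatingBarPanel: NSPanel {
    init(contentRect: NSRect) {
        super.init(contentRect: contentRect,
                   styleMask: [.borderless, .nonactivatingPanel],
                   backing: .buffered,
                   defer: false)
        level = .floating   // stay above normal windows
        collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]
        hidesOnDeactivate = false
        isReleasedWhenClosed = false
    }

    // Borderless windows refuse key status by default; allow it so a
    // text field inside the bar can receive typing.
    override var canBecomeKey: Bool { true }
}

/// True when the key event is Option+Space (key code 49 is the space bar).
func isActivationHotkey(keyCode: UInt16, modifiers: NSEvent.ModifierFlags) -> Bool {
    let relevant = modifiers.intersection(.deviceIndependentFlagsMask)
    return keyCode == 49 && relevant == .option
}

/// Hypothetical controller: toggles the bar and remembers the frontmost app.
@MainActor
final class FloatingBarController {
    let panel: FloatingBarPanel
    private(set) var lastFrontmostApp: String?
    private var monitor: Any?

    init() {
        panel = FloatingBarPanel(contentRect: NSRect(x: 0, y: 0, width: 600, height: 56))
    }

    func startListening() {
        // A global monitor only observes events sent to other apps;
        // it cannot swallow the keystroke.
        monitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) { [weak self] event in
            guard isActivationHotkey(keyCode: event.keyCode,
                                     modifiers: event.modifierFlags) else { return }
            self?.toggle()
        }
    }

    func toggle() {
        if panel.isVisible {
            panel.orderOut(nil)
        } else {
            // Capture context before showing the bar.
            lastFrontmostApp = NSWorkspace.shared.frontmostApplication?.localizedName
            panel.center()
            panel.makeKeyAndOrderFront(nil)
        }
    }
}
```

Because the global monitor cannot consume the keystroke, the frontmost app still receives Option+Space. A shipping app would likely register the shortcut through a mechanism that can intercept it.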
Push-to-Talk Over Chat
The second insight is that voice input beats typing for agent interaction. If you are going to use an AI agent to control your computer, typing instructions into a chat box means your hands are busy doing something the agent should be handling.
Push-to-talk solves this: hold the hotkey, say what you need, release. The agent transcribes your speech locally and executes the actions. Apart from the one key you hold while speaking, your hands stay free.
This is the difference between a transcription tool (like Wispr Flow) and a desktop agent. Transcription tools convert speech to text. Desktop agents convert speech to actions: the agent actually controls your mouse, keyboard, and applications to complete the task.
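The hold, speak, release flow can be modeled as a small state machine. The sketch below uses hypothetical names (`PushToTalkSession`, `PushToTalkState`, `PushToTalkEvent`) and assumes the transcript arrives from some local speech model when the key is released. It covers only the control flow, not audio capture or action execution.

```swift
/// States of a hypothetical push-to-talk session.
enum PushToTalkState: Equatable {
    case idle
    case recording
    case executing(transcript: String)
}

/// Inputs that drive the session.
enum PushToTalkEvent {
    case hotkeyDown
    case hotkeyUp(transcript: String)   // text produced by a local speech model (assumed)
    case actionsFinished
}

struct PushToTalkSession {
    private(set) var state: PushToTalkState = .idle

    /// Applies one event and returns the resulting state. Events that do not
    /// fit the current state (such as key repeat while recording) are ignored.
    @discardableResult
    mutating func handle(_ event: PushToTalkEvent) -> PushToTalkState {
        switch (state, event) {
        case (.idle, .hotkeyDown):
            state = .recording
        case (.recording, .hotkeyUp(let transcript)):
            // An empty or whitespace-only transcript means nothing was said.
            state = transcript.allSatisfy(\.isWhitespace)
                ? .idle
                : .executing(transcript: transcript)
        case (.executing, .actionsFinished):
            state = .idle
        default:
            break
        }
        return state
    }
}
```

Keeping the transitions in a plain value type like this makes the flow easy to unit test without a microphone or a hotkey, which is one reason to separate it from the AppKit layer.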
Why Native Swift Matters
Building this as a native Swift/SwiftUI app instead of an Electron wrapper matters for three reasons:
- System integration. Native apps can use ScreenCaptureKit, accessibility APIs, and the Keychain directly. Electron apps need to go through Node.js bindings that are often incomplete or buggy.
- Performance. A menu bar app needs to be instant. Sub-100ms activation time. Electron cannot deliver this consistently.
- Resource usage. A floating overlay that runs all day cannot afford to consume 500MB of RAM. Native Swift apps typically use 30-50MB.
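To give a sense of what direct system integration looks like, here is a small sketch that checks the two permissions a screen-aware agent typically needs. `AgentPermissions` is a hypothetical type. `AXIsProcessTrusted` and `CGPreflightScreenCaptureAccess` are the system calls for querying Accessibility and Screen Recording access, and neither one prompts the user.

```swift
import ApplicationServices   // AXIsProcessTrusted
import CoreGraphics          // CGPreflightScreenCaptureAccess

/// Snapshot of the permissions a screen-aware agent typically needs.
struct AgentPermissions: Equatable {
    let accessibility: Bool
    let screenCapture: Bool

    /// Human-readable names of the permissions that are still missing.
    var missing: [String] {
        var names: [String] = []
        if !accessibility { names.append("Accessibility") }
        if !screenCapture { names.append("Screen Recording") }
        return names
    }

    /// Queries the system without showing any permission prompt.
    static func current() -> AgentPermissions {
        AgentPermissions(
            accessibility: AXIsProcessTrusted(),
            screenCapture: CGPreflightScreenCaptureAccess()
        )
    }
}
```

An app could call `AgentPermissions.current()` at launch and show onboarding only for the entries in `missing`.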
Fazm uses a native Swift menu bar interface with push-to-talk voice control. Open source on GitHub. Discussed in r/macapps.