Wearing a Mic So Your AI Agent Acts as Chief of Staff
A post about wearing a microphone throughout the day so an AI agent system could act as a "chief of staff" sparked a lot of discussion. The idea: speak naturally during your workday, and the agent picks up on actionable items - updating your CRM after a call, drafting follow-up emails, creating tasks, logging notes.
It sounds futuristic, but the pieces are all available today.
Voice as the Natural Input
Typing is great for precision work. But a huge amount of your workday involves things you could just say out loud. "Remind me to follow up with Sarah on Thursday." "Log that call with Acme Corp - they are interested in the enterprise plan." "Draft a reply to that email from marketing saying we will have the numbers by Friday."
With a local speech-to-text model like Whisper running on Apple Silicon, transcription happens on-device in real time. The transcript feeds into a language model that extracts intents and entities, then triggers actions through your actual apps.
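To make "intents and entities" concrete, here is a minimal sketch of the structured output the extraction step could produce for the three utterances above. It assumes a hypothetical `Intent` schema and swaps the language model for a few regular expressions so it runs offline. A real agent would prompt a model to emit this structure, since regexes only handle phrasings written in advance.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Intent:
    """One structured action extracted from an utterance (hypothetical schema)."""
    action: str
    entities: dict = field(default_factory=dict)


# Hypothetical regex stand-ins for what a language model would infer from free speech.
PATTERNS = [
    ("create_reminder",
     re.compile(r"remind me to (?P<task>.+?) on (?P<day>\w+day)", re.IGNORECASE)),
    ("log_call",
     re.compile(r"log that call with (?P<company>[\w .]+?) - (?P<note>.+)", re.IGNORECASE)),
    ("draft_reply",
     re.compile(r"draft a reply to that email from (?P<sender>\w+) saying (?P<body>.+)",
                re.IGNORECASE)),
]


def extract_intent(transcript: str) -> Optional[Intent]:
    """Return the first matching intent, or None if the utterance is not actionable."""
    text = transcript.strip().strip('"').rstrip(".")
    for action, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return Intent(action, match.groupdict())
    return None


for said in ["Remind me to follow up with Sarah on Thursday.",
             "Log that call with Acme Corp - they are interested in the enterprise plan."]:
    print(extract_intent(said))
```

Returning `None` for anything unrecognized matters for an always-listening system: most speech is not a command, and the safe default is to do nothing.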
The Chief of Staff Pattern
A human chief of staff listens in meetings, takes notes, identifies action items, and makes sure they get done. An AI version of this pattern does the same thing: it listens to your ambient speech, filters for actionable content, and executes through whatever tools you have connected.
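The filtering step can be sketched as a pass over the running transcript that keeps only sentences that look like requests. This is a simple heuristic under stated assumptions: the cue list is hypothetical, and plain substring matching stands in for the model-based classification a production agent would more likely use.

```python
import re

# Hypothetical cue phrases; a production agent would more likely ask a model to classify.
ACTION_CUES = ("remind me", "follow up", "log that", "draft", "send", "schedule", "add a task")


def split_sentences(transcript: str) -> list[str]:
    """Naive splitter: break after ., ? or ! when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", transcript.strip()) if s]


def actionable(transcript: str) -> list[str]:
    """Keep only sentences that contain at least one action cue (case-insensitive)."""
    return [s for s in split_sentences(transcript)
            if any(cue in s.lower() for cue in ACTION_CUES)]


ambient = ("Morning everyone. The demo went well. "
           "Remind me to follow up with Sarah on Thursday. Lunch was great!")
print(actionable(ambient))
```

Filtering before extraction keeps small talk away from the more expensive language-model step. The cost is false positives (any sentence containing "send" passes), so the later extraction step still has to be allowed to return nothing.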
The key difference from a voice assistant like Siri is that a chief of staff agent has persistent context. It knows your ongoing projects, your contacts, your priorities. When you say "send the proposal to David," it knows which David, which proposal, and which channel David prefers.
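One way to picture that persistent context is a small store of contacts, projects, and documents that the agent consults before acting. The sketch below assumes a hypothetical `Context` holding an "active project". It resolves "David" by preferring the David tied to that project, picks that project's proposal, and uses the contact's preferred channel. When the name stays ambiguous it returns `None` so the agent can ask instead of guessing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contact:
    name: str
    email: str
    projects: set[str]
    preferred_channel: str  # e.g. "email" or "slack" (hypothetical values)


@dataclass
class Context:
    """Hypothetical persistent memory the agent keeps between utterances."""
    contacts: list[Contact]
    active_project: str
    proposals: dict[str, str] = field(default_factory=dict)  # project -> file path

    def resolve_contact(self, first_name: str) -> Optional[Contact]:
        """Match a first name, preferring the contact tied to the active project."""
        matches = [c for c in self.contacts
                   if c.name.split()[0].lower() == first_name.lower()]
        for c in matches:
            if self.active_project in c.projects:
                return c
        # One match is unambiguous; zero or several means the agent should ask.
        return matches[0] if len(matches) == 1 else None

    def plan_send_proposal(self, first_name: str) -> Optional[dict]:
        """Turn "send the proposal to <name>" into a concrete, reviewable action."""
        contact = self.resolve_contact(first_name)
        proposal = self.proposals.get(self.active_project)
        if contact is None or proposal is None:
            return None
        return {"to": contact.email, "channel": contact.preferred_channel,
                "attachment": proposal}


ctx = Context(
    contacts=[Contact("David Kim", "dkim@example.com", {"acme"}, "slack"),
              Contact("David Ruiz", "druiz@example.com", {"globex"}, "email")],
    active_project="acme",
    proposals={"acme": "proposals/acme-enterprise.pdf"},
)
print(ctx.plan_send_proposal("David"))
```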
Making It Practical
The practical version of this today is a macOS agent with always-on voice input. Whisper handles transcription locally. The agent uses accessibility APIs to interact with your apps - adding entries to your CRM, creating calendar events, drafting emails in your actual mail client.
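Always-on input implies a continuous audio stream, but Whisper models work on 16 kHz audio in 30-second windows. A common way to bridge the two is a buffer that emits overlapping windows, so a word spoken across a boundary still appears whole in one of them. This is a minimal sketch under assumptions: the 5-second overlap is a hypothetical choice, and neither the microphone callback nor the transcription call is shown. Each window would be handed to a local model, for example `model.transcribe` in the openai-whisper package or a whisper.cpp binding. Real code would also use NumPy arrays rather than Python lists.

```python
SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30    # Whisper works on 30-second windows
OVERLAP_SECONDS = 5    # hypothetical overlap so a word at a boundary is not cut in half


class WindowBuffer:
    """Accumulate streamed microphone samples and emit overlapping fixed-size windows."""

    def __init__(self, window_s: int = WINDOW_SECONDS,
                 overlap_s: int = OVERLAP_SECONDS, rate: int = SAMPLE_RATE):
        if not 0 <= overlap_s < window_s:
            raise ValueError("overlap must be non-negative and shorter than the window")
        self.window = window_s * rate               # samples per window
        self.step = (window_s - overlap_s) * rate   # samples to advance between windows
        self.buffer: list[float] = []

    def push(self, samples) -> list[list[float]]:
        """Append new samples; return every complete window that is now available."""
        self.buffer.extend(samples)
        windows = []
        while len(self.buffer) >= self.window:
            windows.append(self.buffer[: self.window])
            del self.buffer[: self.step]            # keep the overlap for the next window
        return windows


# Toy numbers for readability: 2 samples/s, 3 s windows, 1 s overlap.
buf = WindowBuffer(window_s=3, overlap_s=1, rate=2)
print(buf.push(range(10)))  # [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9]]
```

Overlap means the same words can be transcribed twice, so a real pipeline would also need to de-duplicate text at the seams before handing it to the intent step.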
Privacy is handled by keeping everything local. Your voice data is transcribed on-device, processed on-device, and the actions happen on your own machine through your own apps.
The chief of staff is not a single app. It is an agent layer that sits on top of all your existing tools and responds to your voice.
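The "agent layer" idea can be sketched as a registry. Each connected tool registers the actions it can perform, and the layer routes every extracted intent to whichever tool claimed that action. The `AgentLayer` class and both connectors below are hypothetical placeholders that return strings instead of touching a real CRM or reminders app. The point is the shape: adding a tool means registering a handler, not rewriting the agent.

```python
from typing import Callable, Dict


class AgentLayer:
    """Routes intents to whichever connected tool registered that action."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], str]] = {}

    def register(self, action: str):
        """Decorator: connect a tool function as the handler for one action."""
        def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
            if action in self._handlers:
                raise ValueError(f"{action!r} already has a handler")
            self._handlers[action] = fn
            return fn
        return wrap

    def handle(self, action: str, entities: dict) -> str:
        """Dispatch to the registered tool, or report that nothing can handle it."""
        handler = self._handlers.get(action)
        if handler is None:
            return f"no connected tool for {action!r}"
        return handler(entities)


agent = AgentLayer()


@agent.register("log_call")
def crm_log_call(entities: dict) -> str:  # hypothetical CRM connector
    return f"CRM: logged call with {entities['company']}"


@agent.register("create_reminder")
def reminders_add(entities: dict) -> str:  # hypothetical reminders connector
    return f"Reminders: {entities['task']} ({entities['day']})"


print(agent.handle("log_call", {"company": "Acme Corp"}))
print(agent.handle("book_flight", {}))
```

Reporting an unhandled action, rather than forcing it onto the closest tool, keeps the agent from taking actions you never connected it to.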
Fazm is an open source macOS AI agent, available on GitHub.