Self-Hosted iOS Voice Keyboard for AI Agent Workflows
Voice Input Is Underrated for AI Workflows
Most people interact with AI agents by typing. Open a terminal, write a prompt, wait for output. But voice input changes the equation in ways that aren't obvious until you try it.
The Typing Bottleneck
When you type a prompt, you dedicate both hands and your full visual attention to the agent interaction. You can't do anything else. This turns AI assistance into a sequential task: you stop working, talk to the agent, then resume working.
Voice eliminates this. You speak a command while your hands stay on whatever you were doing. Review a PR while telling the agent to update the changelog. Browse documentation while directing a refactor. The interaction becomes parallel instead of sequential.
Why Self-Hosted Matters
Cloud-based speech-to-text adds latency and sends your audio to third-party servers. When you're dictating commands that include API keys, file paths, project names, and business logic, that's a privacy concern worth taking seriously.
Self-hosted voice recognition running on Apple Silicon handles this locally. Whisper.cpp on an M-series chip transcribes speech in near real-time without any network round trip. Your commands stay on your machine.
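As a rough sketch of what "local" means in practice, here is how a macOS helper might assemble a whisper.cpp invocation. The binary location, model file, and flag set are assumptions about a typical local build, not a prescribed setup:

```swift
import Foundation

// Sketch: build the argument list for a local whisper.cpp binary.
// Model path and audio path are placeholders for your own build.
func whisperArguments(model: String, audioPath: String) -> [String] {
    [
        "-m", model,        // ggml model file, e.g. models/ggml-base.en.bin
        "-f", audioPath,    // 16 kHz mono WAV input
        "--no-timestamps",  // plain text output, no [00:00 --> 00:02] markers
    ]
}

let args = whisperArguments(model: "models/ggml-base.en.bin",
                            audioPath: "command.wav")
print(args.joined(separator: " "))
```

On an M-series Mac you would hand these arguments to a `Process` pointed at the compiled whisper.cpp executable; nothing in the pipeline touches the network.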
iOS as a Remote Control
An iOS voice keyboard connected to your Mac agent creates a powerful setup. Walk around your house, dictate high-level instructions from your phone, and the desktop agent executes them on your Mac. It's like having a remote control for your development environment.
The key is building the keyboard as a native iOS extension that talks directly to your macOS agent over your local network. No cloud relay, no subscription, no third-party dependency.
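One way to picture that direct connection: the keyboard extension serializes each transcription and POSTs it to the agent on the LAN. The hostname, port, endpoint, and JSON shape below are illustrative assumptions, not a published API:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Sketch: a voice command payload sent from the iOS keyboard to the
// macOS agent. Host, port, and field names are assumptions.
struct VoiceCommand: Codable {
    let text: String
    let source: String
}

func buildCommandRequest(
    text: String,
    agentURL: URL = URL(string: "http://my-mac.local:8765/command")!
) throws -> URLRequest {
    var request = URLRequest(url: agentURL)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder()
        .encode(VoiceCommand(text: text, source: "ios-keyboard"))
    return request
}

// Fire from the extension once transcription settles:
// URLSession.shared.dataTask(with: try buildCommandRequest(text: "run the tests")).resume()
```

Because the request targets a `.local` hostname, the traffic never leaves your network; the same design works over Tailscale or any private address if your devices roam.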
Making It Practical
The trick is handling the messiness of natural speech. "Fix that build error from earlier" needs context about recent failures. "Deploy the thing I was working on yesterday" needs memory. Voice input only works well when the agent underneath is smart enough to resolve vague references.
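A minimal sketch of that resolution step, assuming the agent keeps a log of recent events: match the words of the utterance against tags on each event and take the most recent overlap. The event store and matching rule here are illustrative, far simpler than what a production agent would need:

```swift
import Foundation

// Sketch: resolve a vague spoken reference against recent agent context.
// AgentEvent and the tag-overlap rule are illustrative assumptions.
struct AgentEvent {
    let description: String   // e.g. "build failed: missing symbol in Parser.swift"
    let tags: Set<String>     // e.g. ["build", "error"]
}

func resolveReference(_ utterance: String, recent: [AgentEvent]) -> AgentEvent? {
    let words = Set(utterance.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty })
    // Most recent event whose tags overlap the utterance wins.
    return recent.last { !$0.tags.isDisjoint(with: words) }
}

let events = [
    AgentEvent(description: "deployed staging", tags: ["deploy"]),
    AgentEvent(description: "build failed: missing symbol in Parser.swift",
               tags: ["build", "error"]),
]
let match = resolveReference("fix that build error from earlier", recent: events)
```

Real resolution needs more than keyword overlap (recency weighting, the LLM itself as the resolver), but the shape is the same: vague speech plus structured memory yields a concrete target.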
Fazm is an open-source macOS AI agent, available on GitHub.