Building a Native Swift Voice App for macOS - Open Source Journey

Fazm Team · 2 min read
swift · voice-app · macos · open-source · whisperkit

The idea is simple: you talk to your computer and it does things. It does not just show you a chat response; it actually opens apps, fills in forms, sends messages, and organizes files. A voice interface for your entire desktop.

Why Native Swift

Electron would have been faster to prototype. We went with native Swift anyway because a desktop agent needs direct, low-latency access to system APIs that a web wrapper can reach only through extra native bridging code: the Accessibility APIs for reading screen content, Core Audio for microphone capture without a browser-style permission layer in between (macOS itself still asks the user once), and AppKit for window management. These are not nice-to-haves; they are the core functionality.
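To make the microphone point concrete, here is a minimal sketch of capturing audio natively. It uses AVAudioEngine, the AVFoundation layer built on top of Core Audio (the post does not say which layer the app actually uses), and converts samples to the 16 kHz mono format that Whisper models expect. `resampleLinear` and `MicrophoneCapture` are hypothetical names for illustration, and the linear-interpolation resampler has no anti-aliasing filter, so treat it as a stand-in for `AVAudioConverter`. A real app also needs an `NSMicrophoneUsageDescription` entry in its Info.plist; macOS shows the permission prompt the first time the app reads from the microphone.

```swift
#if os(macOS)
import AVFoundation
#endif

/// Naive linear-interpolation resampler (no low-pass filter, so it can alias).
/// Illustrative only; AVAudioConverter is the production-grade option.
func resampleLinear(_ input: [Float], from inRate: Double, to outRate: Double) -> [Float] {
    guard !input.isEmpty, inRate > 0, outRate > 0 else { return [] }
    let outCount = Int((Double(input.count) * outRate / inRate).rounded(.down))
    let step = inRate / outRate                 // input samples advanced per output sample
    var output: [Float] = []
    output.reserveCapacity(outCount)
    for i in 0..<outCount {
        let position = Double(i) * step
        let index = Int(position)
        let fraction = Float(position - Double(index))
        let next = min(index + 1, input.count - 1)  // clamp at the last sample
        output.append(input[index] * (1 - fraction) + input[next] * fraction)
    }
    return output
}

#if os(macOS)
/// Hypothetical capture helper: taps the default input device and
/// accumulates 16 kHz mono samples on the main queue.
final class MicrophoneCapture {
    private let engine = AVAudioEngine()
    private(set) var samples: [Float] = []

    func start() throws {
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)  // hardware format, e.g. 48 kHz
        input.installTap(onBus: 0, bufferSize: 4096, format: format) { [weak self] buffer, _ in
            // Use only the first channel; a fuller version might average all channels.
            guard let channel = buffer.floatChannelData?[0] else { return }
            let chunk = Array(UnsafeBufferPointer(start: channel, count: Int(buffer.frameLength)))
            let mono16k = resampleLinear(chunk, from: format.sampleRate, to: 16_000)
            DispatchQueue.main.async { self?.samples.append(contentsOf: mono16k) }
        }
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }
}
#endif
```

Resampling each tap buffer independently also leaves tiny discontinuities at buffer boundaries; that is tolerable in a sketch but another reason to prefer `AVAudioConverter` in production.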

SwiftUI handles the interface: a minimal overlay that shows transcription status and agent actions. The UI stays out of the way because the whole point is that you should not need to look at a screen to control your computer.
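One way to keep such an overlay minimal is to drive it from a single status value. The `AgentStatus` cases and the `StatusOverlay` view below are hypothetical; the post only says that the overlay shows transcription status and agent actions. The SwiftUI view compiles only where SwiftUI is available, so the state logic can be tested on its own.

```swift
#if canImport(SwiftUI)
import SwiftUI
#endif

/// Hypothetical overlay states covering transcription and agent activity.
enum AgentStatus: Equatable {
    case idle
    case listening
    case transcribing
    case acting(String)   // human-readable description of the current step
    case failed(String)   // short reason shown to the user

    /// The single line of text the overlay displays for each state.
    var label: String {
        switch self {
        case .idle: return "Ready"
        case .listening: return "Listening..."
        case .transcribing: return "Transcribing..."
        case .acting(let step): return step
        case .failed(let reason): return "Couldn't finish: \(reason)"
        }
    }
}

#if canImport(SwiftUI)
/// A small capsule-shaped status pill: one line of text, nothing to click.
struct StatusOverlay: View {
    let status: AgentStatus

    var body: some View {
        Text(status.label)
            .font(.callout)
            .foregroundColor(.white)
            .padding(.horizontal, 12)
            .padding(.vertical, 6)
            .background(Color.black.opacity(0.7))
            .clipShape(Capsule())
    }
}
#endif
```

Hosting a view like this in a borderless, non-activating `NSPanel` (through `NSHostingView`) is one common way to keep it floating above other windows without stealing keyboard focus; the post does not say how Fazm does it.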

WhisperKit for Local Transcription

Speech-to-text runs locally using WhisperKit, which runs OpenAI's Whisper models on-device and is optimized for Apple Silicon. No audio leaves your machine. Transcription latency on an M1 is under 500 ms for typical short commands, which feels close to real time.

Local transcription also means the app works offline and does not depend on a cloud service's uptime or pricing changes. The model runs on your own hardware, and that is the whole dependency.

Accessibility APIs for Control

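macOS accessibility APIs let the agent read what is on screen (button labels, text field contents, menu items) and interact with those elements programmatically. This is the same API that screen readers such as VoiceOver use, so it works across most macOS applications without app-specific integrations, provided the app exposes its interface to accessibility, as standard AppKit and SwiftUI controls do by default. The user grants the app Accessibility permission once in System Settings.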
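A minimal sketch of the read-then-act loop, assuming the app already has Accessibility permission (`AXIsProcessTrustedWithOptions` can check for it). The tree search is generic so it can be tested against a fake tree; the macOS-only part reads the role, title, and children attributes through `AXUIElementCopyAttributeValue` and presses a matching button with `kAXPressAction`. The helper names (`firstMatch`, `axString`, `axChildren`, `pressButton`) are hypothetical, and matching on the title alone is a simplification: some apps label buttons through the description attribute instead.

```swift
#if os(macOS)
import ApplicationServices
#endif

/// Depth-first search over any tree, given a way to list children and a match test.
/// Generic so the traversal can be exercised without a live UI.
func firstMatch<Node>(
    from root: Node,
    children: (Node) -> [Node],
    matches: (Node) -> Bool,
    maxDepth: Int = 30
) -> Node? {
    if matches(root) { return root }
    guard maxDepth > 0 else { return nil }  // stop descending past the depth limit
    for child in children(root) {
        if let hit = firstMatch(from: child, children: children, matches: matches, maxDepth: maxDepth - 1) {
            return hit
        }
    }
    return nil
}

#if os(macOS)
/// Reads a string-valued attribute (role, title, ...) from an accessibility element.
func axString(_ element: AXUIElement, _ attribute: String) -> String? {
    var value: CFTypeRef?
    guard AXUIElementCopyAttributeValue(element, attribute as CFString, &value) == .success else {
        return nil
    }
    return value as? String
}

/// Lists an element's children, or [] if it has none or the call fails.
func axChildren(_ element: AXUIElement) -> [AXUIElement] {
    var value: CFTypeRef?
    guard AXUIElementCopyAttributeValue(element, kAXChildrenAttribute as CFString, &value) == .success else {
        return []
    }
    return (value as? [AXUIElement]) ?? []
}

/// Finds a button by its title in the app with process ID `pid` and presses it.
/// Returns false if the app is not accessible to us or no button matches.
func pressButton(titled title: String, inProcess pid: pid_t) -> Bool {
    let app = AXUIElementCreateApplication(pid)
    guard let button = firstMatch(
        from: app,
        children: axChildren,
        matches: { axString($0, kAXRoleAttribute) == kAXButtonRole && axString($0, kAXTitleAttribute) == title }
    ) else { return false }
    return AXUIElementPerformAction(button, kAXPressAction as CFString) == .success
}
#endif
```

For example, `NSWorkspace.shared.frontmostApplication?.processIdentifier` supplies the `pid` of the app in front, so `pressButton(titled: "Send", inProcess: pid)` would press that app's Send button if it exposes one. The depth limit keeps a single lookup from walking an enormous tree such as a long web page.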

The combination of voice input, local transcription, and accessibility-based control creates something that feels genuinely different from typing commands into a chat box. You say "send the weekly report to the team Slack channel" and watch it happen.
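The three stages can be pictured as a small pipeline. This is a hypothetical sketch: the protocol names, the `AgentAction` cases, and the `Planner` stage (how a sentence becomes concrete steps is not described in this post) are illustrative assumptions, not Fazm's actual architecture. Keeping each stage behind a protocol lets WhisperKit, a planning model, and the accessibility code be swapped for fakes in tests.

```swift
/// A few illustrative UI actions an executor might carry out via accessibility.
enum AgentAction: Equatable {
    case activateApp(String)
    case pressButton(String)
    case typeText(String)
}

/// Hypothetical stage interfaces: the post names the stages, not their APIs.
protocol Transcriber {
    func transcribe(_ audio: [Float]) throws -> String
}

protocol Planner {
    func plan(for command: String) -> [AgentAction]
}

protocol Executor {
    func perform(_ action: AgentAction) -> Bool
}

/// Wires the three stages together for one spoken command.
struct VoiceAgent {
    let transcriber: any Transcriber
    let planner: any Planner
    let executor: any Executor

    /// Transcribes the audio, plans steps, and runs them in order,
    /// stopping at the first step that fails. Returns the text and the steps that succeeded.
    func handle(audio: [Float]) throws -> (command: String, completed: [AgentAction]) {
        let command = try transcriber.transcribe(audio)
        var completed: [AgentAction] = []
        for action in planner.plan(for: command) {
            guard executor.perform(action) else { break }
            completed.append(action)
        }
        return (command, completed)
    }
}
```

Stopping at the first failed step is one conservative choice for an agent that acts on your real desktop: it avoids, for instance, typing a message into whatever window happens to be focused after an app switch failed.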

The entire project is open source because we believe desktop agents should be transparent about what they do on your machine.

Fazm is an open source macOS AI agent, available on GitHub.