macOS Dictation With Your Own Model - Accessibility API for Text Insertion

Matthew Diakonov

Updated March 19, 2026

dictation byok accessibility-api macos speech-to-text local-models

Apple's built-in dictation works fine for quick notes. But if you want better accuracy, custom vocabulary, or a model that runs entirely on your machine, you need to build your own pipeline. The missing piece is usually not the speech model - it is getting the transcribed text into the right place.

The Accessibility API Solution

macOS provides the Accessibility API (built around the AXUIElement type), which lets an app discover which text field is focused and write text directly into it. This is the same mechanism screen readers use. In apps that expose the standard accessibility attributes, it lets you inject transcribed text without clipboard hacks or simulated keystrokes, which makes it the most reliable of the three approaches.

The flow looks like this:

1. Capture audio from the microphone.
2. Run a local speech-to-text model (Whisper, or a fine-tuned variant) on the audio.
3. Use the Accessibility API to identify the currently focused text element.
4. Insert the text at the cursor position by setting the element's AXSelectedText attribute. (Setting AXValue instead replaces the field's entire contents.)
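
The last two steps of the flow can be sketched in Swift roughly as follows. This is a minimal sketch, not a definitive implementation: the names InsertionError, focusedElement, and insertAtCursor are hypothetical, and it assumes the app has already been granted Accessibility permission. It writes to kAXSelectedTextAttribute, which replaces the current selection (or inserts at the caret when nothing is selected), rather than kAXValueAttribute, which would overwrite the whole field.

```swift
import ApplicationServices

/// Errors this sketch can report (hypothetical names, not part of any Apple API).
enum InsertionError: Error {
    case noFocusedElement
    case insertFailed(AXError)
}

/// Returns the UI element that currently has keyboard focus, system-wide.
func focusedElement() throws -> AXUIElement {
    let systemWide = AXUIElementCreateSystemWide()
    var focused: CFTypeRef?
    let status = AXUIElementCopyAttributeValue(
        systemWide, kAXFocusedUIElementAttribute as CFString, &focused)
    // Fails when nothing is focused or the process is not trusted.
    guard status == .success, let value = focused,
          CFGetTypeID(value) == AXUIElementGetTypeID() else {
        throw InsertionError.noFocusedElement
    }
    return value as! AXUIElement
}

/// Inserts `text` at the caret, replacing the current selection if there is one.
func insertAtCursor(_ text: String) throws {
    let element = try focusedElement()
    let status = AXUIElementSetAttributeValue(
        element, kAXSelectedTextAttribute as CFString, text as CFString)
    guard status == .success else {
        throw InsertionError.insertFailed(status)
    }
}
```

Some apps may reject the write or not expose the attribute at all, so a real app would likely want a fallback path when insertAtCursor throws.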

Why Local Models Matter

Cloud dictation sends every word you say to a remote server. For developers dictating code, lawyers reviewing case notes, or anyone working with sensitive information, that is a non-starter. Running Whisper locally on Apple Silicon can give you sub-second latency, depending on model size and hardware, with zero data leaving your machine.

The BYOK Angle

Bring-your-own-key dictation apps let you swap models without changing the insertion layer. The Accessibility API integration stays the same whether you are using Whisper base, large-v3, or a custom fine-tuned model. This separation of concerns - audio processing vs. text insertion - is what makes the architecture flexible.
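
One way to picture that separation is a pair of small protocols, sketched below. Every name here (Transcriber, TextInserter, DictationPipeline, and the two stand-in types) is hypothetical and chosen for illustration. A real Transcriber would wrap whichever Whisper runtime you use, and a real TextInserter would wrap the Accessibility API calls.

```swift
import Foundation

/// Hypothetical interface for any speech-to-text backend.
protocol Transcriber {
    func transcribe(_ samples: [Float]) throws -> String
}

/// Hypothetical interface for whatever puts text into the focused app.
protocol TextInserter {
    func insert(_ text: String) throws
}

/// Glue between the two layers. Swapping models only touches `transcriber`.
struct DictationPipeline {
    var transcriber: Transcriber
    let inserter: TextInserter

    func handle(_ samples: [Float]) throws {
        let text = try transcriber.transcribe(samples)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        // Skip silent or empty results so nothing is inserted by accident.
        guard !text.isEmpty else { return }
        try inserter.insert(text)
    }
}

/// Stand-in model that always returns the same text.
struct FixedTranscriber: Transcriber {
    let output: String
    func transcribe(_ samples: [Float]) throws -> String { output }
}

/// Stand-in inserter that records what it was asked to insert.
final class RecordingInserter: TextInserter {
    private(set) var inserted: [String] = []
    func insert(_ text: String) throws { inserted.append(text) }
}
```

With this shape, moving from Whisper base to large-v3 means assigning a different value to pipeline.transcriber, while the insertion code is left untouched.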

Getting Permissions Right

The hardest part is not the code. It is getting users through the macOS Accessibility permission prompt without confusion. Your app needs to be added to System Settings > Privacy & Security > Accessibility before it can insert text into other apps.
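
A minimal sketch of the permission check, assuming a helper named accessibilityGranted (a hypothetical name). It uses AXIsProcessTrustedWithOptions, which reports whether the process is trusted and, when asked, shows the system dialog that points the user to System Settings.

```swift
import ApplicationServices

/// Returns true if this process is already trusted for Accessibility.
/// When `prompt` is true and the process is not yet trusted, macOS shows
/// its standard dialog directing the user to System Settings.
func accessibilityGranted(prompt: Bool) -> Bool {
    let key = kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String
    let options = [key: prompt] as CFDictionary
    return AXIsProcessTrustedWithOptions(options)
}
```

A reasonable pattern is to call accessibilityGranted(prompt: true) once during onboarding, then call it with prompt: false before each insertion so the user is not interrupted by repeated dialogs.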

Once that is done, you have a dictation system that keeps audio on your machine and, with the right model, can be faster and more accurate than the built-in option.
