Voice layer notes

The 2026 model updates also broke our microphone. We fixed it with 19 words.

Every other piece of writing about 2026 LLM updates is a chronological list of who shipped what. This one is about a second-order consequence almost nobody else writes about, because almost nobody else ships a voice-first Mac agent that gets dictated at all day. When the model name is brand new, can your computer actually hear you say it? The answer for Fazm is a 19-string array compiled into the binary at Desktop/Sources/DeletedTypeStubs.swift line 631 and sent to Deepgram Nova-3 as bias terms on every voice session.

Matthew Diakonov
9 min read

The thing nobody writes about

You can ship a perfect model picker. You can wire up dynamic catalogs from the agent SDK so new Anthropic releases show up without a build. You can teach the floating bar to route Smart to Opus and Fast to Sonnet and Scary to Haiku. None of that helps if the user says “switch to Sonnet” into a microphone and the speech-to-text model returns “switch to sonnet” in lowercase, because the downstream parser is looking for the proper noun.

The 2026 LLM update cycle has been brisk enough that this is now a daily problem in production. Anthropic shipped Sonnet 4.6 as the everyday default. Opus 4.6 went GA, then Opus 4.7 followed weeks later. Haiku 4.5 is the Scary-button backing on most consumer Macs. Each of those names is, from the perspective of a 2024-era ASR model, a poem, a musical work, and a Japanese verse form. None of them is a model name yet. None of them is even a proper noun yet, on the priors that mattered when the speech model was trained.

The agent SDK fixed the model picker. Nothing fixes voice unless the agent fixes voice itself.

The fix is a 19-string array

At Desktop/Sources/DeletedTypeStubs.swift line 631 there is a Swift literal called systemVocabulary. It is a static let, so it is compiled into the binary at build time. There are 19 entries. Five of them are Anthropic-family LLM names. Two are protocols around those names. The rest are developer-adjacent product names that the team learned, by reading actual production voice queries, were getting heard as common English on the way in.

systemVocabulary, line 631 (every term shipped to Deepgram Nova-3 on every voice session):

  • Fazm: Product name. Spoken in onboarding.
  • Claude: Anthropic family. Heard hundreds of times a day.
  • Sonnet: Mid-tier label. Sounds like the poem.
  • Opus: Top-tier label. Easy mishear as a name.
  • Haiku: Fastest label. Sounds like the poem.
  • Anthropic: Vendor name. Rarely in plain English text.
  • MCP: Model Context Protocol. Acronym from late 2024.
  • ACP: Agent Client Protocol. Acronym from 2025.
  • Supabase: Backend tool. Often in dev queries.
  • Firestore: Database product. Compound word.
  • PostHog: Analytics product. Compound word.
  • Sentry: Error tracking. Common English homonym.
  • Stripe: Payments. Common English homonym.
  • Vercel: Hosting product. Coined name.
  • Deepgram: The transcriber itself. Self-referential.
  • Whisper: OpenAI ASR. Common English homonym.
  • Xcode: Apple IDE. Compound word.
  • SwiftUI: Apple framework. Compound word.
  • Tauri: Rust framework. Coined name.

Five LLM names. Two protocols. Twelve developer-adjacent product names. The list is curated, not generated. Nova-3 will accept up to 500 keyterms per session, but the doc comment above the array, lines 629 and 630, is honest about the practical drop-off: effectiveness falls past about 30 terms. So the list stays at 19. The cost of adding a name is a build, a notarization, a Sparkle update. The benefit is that the next time someone says it, the microphone hears it correctly.
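For orientation, here is a minimal reconstruction of what that declaration looks like, built from the 19 terms above. The enclosing type name is hypothetical, and the real file's doc comment and formatting may differ.

```swift
// Reconstruction of the declaration at DeletedTypeStubs.swift line 631,
// using the 19 terms listed above. The enclosing type is an assumption.
enum VoiceVocabulary {
    /// Nova-3 accepts up to 500 keyterms per session, but effectiveness
    /// drops past about 30 terms, so the built-in list stays small.
    static let systemVocabulary: [String] = [
        "Fazm", "Claude", "Sonnet", "Opus", "Haiku", "Anthropic",
        "MCP", "ACP",
        "Supabase", "Firestore", "PostHog", "Sentry", "Stripe", "Vercel",
        "Deepgram", "Whisper", "Xcode", "SwiftUI", "Tauri"
    ]
}
```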

What goes on the wire

The plumbing is at Desktop/Sources/TranscriptionService.swift. Line 94 sets private let model = "nova-3". Lines 294 through 296 are a three-line for-loop that emits one URL query item per term:

// Add keyterm parameters for custom vocabulary (Nova-3 uses "keyterm" not "keywords")
for term in vocabulary {
  queryItems.append(URLQueryItem(name: "keyterm", value: term))
}

The comment is doing real work. The older Deepgram keywords parameter accepted boost weights and could hallucinate the term if you pushed too hard. Nova-3 keyterm is designed to be safe at high recall: the model treats the listed strings as in-vocabulary for the session and false-positive risk stays low because the strings are not common English. The 19 query items get appended to the WebSocket URL on every voice session, every push-to-talk press, every dictation toggle. There is no per-utterance prompt. The bias is at the session level.
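Put together, the session-open request looks roughly like the sketch below. Only the model name and the keyterm loop are taken from the Fazm source; the endpoint path and the other query parameters (encoding, sample rate, interim results) are assumptions about a typical Deepgram streaming setup, and the function name is hypothetical.

```swift
import Foundation

// Sketch of the session-open URL build. Only "model=nova-3" and the
// keyterm loop come from the Fazm source; everything else is assumed.
func makeListenURL(vocabulary: [String]) -> URL {
    var components = URLComponents(string: "wss://api.deepgram.com/v1/listen")!
    var queryItems = [
        URLQueryItem(name: "model", value: "nova-3"),
        URLQueryItem(name: "encoding", value: "linear16"),    // assumed
        URLQueryItem(name: "sample_rate", value: "16000"),    // assumed, matches the 16 kHz chunks
        URLQueryItem(name: "interim_results", value: "true")  // assumed
    ]
    // One keyterm query item per vocabulary entry, exactly as in the
    // three-line loop above (Nova-3 uses "keyterm", not "keywords").
    for term in vocabulary {
        queryItems.append(URLQueryItem(name: "keyterm", value: term))
    }
    components.queryItems = queryItems
    return components.url!
}
```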

Voice path on every floating-bar query

  • Mic -> Fazm: 16 kHz audio chunk
  • Fazm -> Deepgram Nova-3: WS open + 19 keyterm params
  • Deepgram Nova-3 -> Fazm: interim text, biased decode
  • Fazm: append to floating-bar input
  • Fazm -> Agent: "Switch to Sonnet" -> model select
  • Picker re-renders, label = Fast

The five mishears the list is built to stop

The internal motivation for shipping this list, recorded in the 2026-04-28 changelog entry for version 2.6.2, is that “product names and tech terms (Claude, Sonnet, Opus, MCP, Tauri, Supabase, etc.)” were coming back wrong. Five of those failure modes are below. None of them is hypothetical. They were observed in actual voice queries during the run-up to the release.

Before the list

  • "Switch to Sonnet 4.6" gets transcribed as "Switch to sonnet 4.6" with the lowercase poem-form, then the model picker sees no proper noun to match and stays on whatever was selected last.
  • "Use Opus for this one" comes back as "use opus for this one" or worse, "use Oh-puss for this one" depending on accent, and the routing never reaches the heaviest model.
  • "Add an MCP server" gets normalized to "Add an em-cee-pee server" with hyphenation that the agent's command parser does not recognize as a known noun.
  • "Talk to Claude" turns into "Talk to Claud" or "Talk to clawed" on a fast utterance and the reference resolves to nothing.
  • "Connect to Anthropic" lands as a phonetic guess like "Anthrop-ic" with the wrong stress, and downstream search-and-replace rules do not catch it.

The fix is not magic. It is the same ASR model, the same audio, the same network round-trip. The only delta is 19 query items.

After the list

  • Claude, Sonnet, Opus, Haiku, Anthropic are sent to Nova-3 as five separate `keyterm` query items, biasing the decoder toward those tokens before the audio is processed.
  • MCP and ACP are sent as their bare letter sequences. Nova-3 keeps the casing and the agent's command parser sees the canonical acronym, not a phonetic spelling.
  • Deepgram and Whisper are biased so that troubleshooting questions about transcription itself stay readable in the chat log.
  • Vercel, Supabase, Firestore, PostHog, Sentry, Stripe, Xcode, SwiftUI, Tauri cover the rest of a developer-adjacent vocabulary the model otherwise hears as common English.
  • The full set is curated, not generated. Nova-3 caps total keyterms at 500 but accuracy drops past about 30 terms, so the list stays at 19 instead of dumping every product name in the world.

Two layers, two release cadences

Fazm absorbs the 2026 LLM update cycle on two surfaces, and they have different update rules. The picker is dynamic. The voice vocabulary is static. Both deserve to exist.

The picker reads from a @Published array fed at runtime by the Claude agent SDK. New Anthropic models show up without a Fazm build, get substring-matched against a four-row family map, and route to one of three labels. That is the dynamic layer. New names are absorbed the second they appear in the SDK.
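The exact rows of that family map are not reproduced in this post, so the sketch below is an assumption about its shape: substring checks on the incoming model ID, the three labels, and a fallback row.

```swift
// Sketch of the substring-based family map described above. The real
// table has four rows; the entries and the fallback here are assumptions.
enum PickerLabel: String {
    case scary = "Scary"   // routed to Haiku
    case fast  = "Fast"    // routed to Sonnet
    case smart = "Smart"   // routed to Opus
}

func label(forModelID id: String) -> PickerLabel {
    let lowered = id.lowercased()
    if lowered.contains("haiku")  { return .scary }
    if lowered.contains("sonnet") { return .fast }
    if lowered.contains("opus")   { return .smart }
    return .fast  // assumed fallback for unrecognized IDs
}
```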

The vocabulary is a Swift literal in source, compiled into the binary, shipped as part of a notarized release. New names are absorbed on the next point release. That is the static layer. The two layers do not contradict each other; they cover different costs. Adding a model to the picker is free for the Fazm side, and we want it to be free, because Anthropic ships fast and the user should not wait. Adding a name to the vocabulary is the cost of a release, and we accept that cost, because the wrong place to make a speech model decide what is a word is at request time, and the right place is at session-open time.


“Improved voice transcription accuracy for product names and tech terms (Claude, Sonnet, Opus, MCP, Tauri, Supabase, etc.) by biasing Deepgram with a built-in domain vocabulary.”

Fazm 2.6.2 release notes, 2026-04-28

The user can override

The Settings panel at SettingsPage.swift renders each built-in term with a small X button next to it. Click the X and the term is appended to a Set called disabledSystemVocabulary, which is persisted across launches. The effective list shown to Nova-3 is user-added terms first, then any active system terms not already present, deduplicated case-insensitively. A Restore built-in defaults button clears the disabled set in one click. The 2026-04-28 release notes for version 2.6.2 list this UI as a deliberate ship: “Added a Vocabulary section to Settings to view, disable, and restore the built-in transcription terms.”

A user can also add their own. The Dictionary section accepts a free-text term and appends to a published array called transcriptionVocabulary. That is the path for company-specific words: customer names, internal codenames, product lines that nobody outside the building has ever heard. The session-level keyterm cap is 500, so there is room.
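A sketch of that merge, assuming the ordering and deduplication behave exactly as described; the function name is hypothetical, while the parameter names mirror the properties mentioned above.

```swift
// Sketch of the effective-list computation: user-added terms first, then
// any built-in terms not disabled, deduplicated case-insensitively.
func effectiveVocabulary(
    userTerms: [String],       // transcriptionVocabulary
    systemTerms: [String],     // systemVocabulary
    disabled: Set<String>      // disabledSystemVocabulary (lowercased)
) -> [String] {
    var seen = Set<String>()
    var result: [String] = []
    for term in userTerms where seen.insert(term.lowercased()).inserted {
        result.append(term)
    }
    for term in systemTerms where !disabled.contains(term.lowercased()) {
        if seen.insert(term.lowercased()).inserted {
            result.append(term)
        }
    }
    return result
}
```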

Why this matters more in 2026 than it did in 2024

In 2024, the LLM you were saying out loud was almost always called GPT, with a number after it. There were maybe three product names that mattered for voice. The Whisper-class ASRs of the time hit them well enough on common pronunciations that the keyterm trick was a niche concern.

In 2026, the active vocabulary that a serious user touches in a normal week includes Claude, Sonnet, Opus, Haiku, GPT, the OpenAI numbered series, DeepSeek, Gemini, Llama, Mistral, plus MCP, ACP, and a half-dozen agent-protocol acronyms. Anthropic alone has shipped four named families with version numbers attached. The combinatorial space is larger, the confidence of any single ASR pass is lower, and the failure mode is no longer rare. It is the median voice query for an LLM-using developer.

The right time to teach a speech model about Sonnet 4.6 is the day Anthropic announces Sonnet 4.6. The wrong time is six months later, when the next version is already named Sonnet 4.7 and the original mistake has been quietly mistranscribed into a thousand chat logs.

Curious whether voice-first survives your real workflow?

Bring a recording or a workflow you keep dictating wrong, and we will run it through the same keyterm path live.

Common questions about voice transcription and 2026 LLM updates

Why does a list of 2026 model updates have anything to do with voice transcription?

Because the new model names are not English words. Speech-to-text systems are trained on text and audio that exists at training time. "Sonnet 4.6" did not exist at training time, and "Sonnet" by itself collides with the poetic form. The same pattern hits "Opus", "Haiku", "Anthropic", "MCP", and "ACP". A voice-first agent has to either accept that the spoken model name will be misheard about as often as it is heard, or it has to bias the transcriber against the news cycle. Fazm picked the second option, and the bias list lives in source as a 19-term Swift array.

Where exactly in the Fazm source is the vocabulary list, and what is in it?

Desktop/Sources/DeletedTypeStubs.swift line 631, in a static let called systemVocabulary. The 19 strings are: Fazm, Claude, Sonnet, Opus, Haiku, Anthropic, MCP, ACP, Supabase, Firestore, PostHog, Sentry, Stripe, Vercel, Deepgram, Whisper, Xcode, SwiftUI, Tauri. Five entries are Anthropic-family LLM names, two are protocols (MCP, ACP), and the rest cover developer-adjacent product names that show up in voice prompts. The list is curated by hand from production voice queries, not generated. Users can disable individual built-in terms in Settings; removals are tracked in a published Set called disabledSystemVocabulary and persisted across launches.

What does Fazm actually do with that list at runtime?

TranscriptionService.swift builds the Deepgram WebSocket URL and, in the for-loop at lines 294 through 296, appends one URLQueryItem(name: "keyterm", value: term) per vocabulary term. The model is hard-coded to "nova-3" at line 94. Nova-3 uses the keyterm parameter, not the older keywords parameter, which is why the comment above the loop calls that out. The 19 query items go on the wire on every voice session, every push-to-talk press, and every dictation toggle. There is no per-utterance prompt; the bias is at the session level.

Is this just keyword boosting, or is it doing something Nova-3 specific?

Nova-3 specific. The older Deepgram keywords parameter accepted simple boost weights but had to be balanced carefully or the model would hallucinate the term. Nova-3 keyterms are designed to be safe at high recall: the model treats them as in-vocabulary tokens for the duration of the session, and false-positive risk is low because the listed strings are not common English words. The doc comment in source notes the cap of 500 total keyterms and a practical effectiveness drop-off past about 30 terms, which is why the system list stays at 19 even though there is plenty of headroom for more product names.

Why are common-word homonyms like Sentry, Stripe, and Whisper on the list at all?

Because the model otherwise resolves them to the lowercased English meaning. "Did you check Sentry?" comes back as "did you check sentry" with no capital, and a downstream query parser that looks for the proper noun fails to find it. The keyterm bias keeps the casing and pushes the decoder toward the proper-noun spelling whenever the surrounding context is plausible. The tradeoff is that on a fully unrelated utterance about an actual castle sentry, the transcriber might still capitalize the word; in practice the surrounding context almost always disambiguates because Fazm voice queries are about computers, not medieval keeps.

How does this list get updated when a new model name lands?

By a release of the Mac app. The list is a Swift literal compiled into the binary, so a new entry means a build, a notarization, and a Sparkle update. The 2026-04-28 release notes for version 2.6.2 are explicit: "Improved voice transcription accuracy for product names and tech terms (Claude, Sonnet, Opus, MCP, Tauri, Supabase, etc.) by biasing Deepgram with a built-in domain vocabulary," plus "Added a Vocabulary section to Settings to view, disable, and restore the built-in transcription terms." That release shipped the list and the Settings UI together. Future model names are expected to land in subsequent point releases; the dynamic, no-build path is the model-picker substring table, not the keyterm list.

Can a user override the built-in vocabulary?

Yes, in two directions. Add: the Dictionary section in Settings (the SettingsPage.swift Dictionary case) accepts a free-text term and appends to a published array called transcriptionVocabulary, which Fazm merges with the system list before sending to Deepgram. Remove: the same panel renders each built-in term with an X button that calls disableSystemTerm on AssistantSettings, adding the lowercased term to disabledSystemVocabulary. The effective list shown to Nova-3 is user-added terms first, then any active system terms not already present, deduplicated case-insensitively. A Restore built-in defaults button clears the disabled set in one click.

Why not just use a generative LLM to clean up the transcript after the fact?

Because by the time the cleanup pass would run, the user has already seen the bad text in the floating bar input. The agent's command parser also runs against the live transcript, so a misheard "Sonnet" is a misrouted query before any post-process catches it. Biasing at the speech model is the only place to intervene early enough. There is a separate find-and-replace rule list at TranscriptionService.swift line 299 that cleans up English spoken-form patterns like "dot com" to ".com", but those rules only apply to English-mode and multi-language sessions, and they are explicitly not the place for product names.
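For contrast with the keyterm path, here is a sketch of what that spoken-form cleanup could look like. Only the "dot com" to ".com" example comes from the source; the other rules and the function name are illustrative.

```swift
import Foundation

// Illustrative spoken-form cleanup in the spirit of the rule list at
// TranscriptionService.swift line 299. Only the "dot com" pair is from
// the source; the other rules are hypothetical examples of the same kind.
let spokenFormReplacements: [(spoken: String, written: String)] = [
    (" dot com", ".com"),
    (" dot org", ".org"),   // assumed
    (" new line", "\n")     // assumed
]

func applySpokenForms(to transcript: String) -> String {
    var cleaned = transcript
    for rule in spokenFormReplacements {
        cleaned = cleaned.replacingOccurrences(
            of: rule.spoken, with: rule.written, options: [.caseInsensitive]
        )
    }
    return cleaned
}
```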

Does this change for non-English voice sessions?

The keyterms still go on the wire. Deepgram Nova-3 supports a multi-language mode for Romance, Germanic, Hindi, Russian, Japanese sessions and a few others, listed in the multiLanguageSupported set in source. The system vocabulary entries (Claude, Opus, MCP, etc.) are proper nouns in any language, so the bias still helps on a Spanish or French dictation that namechecks an Anthropic model. The find-and-replace rules are gated behind an English check at TranscriptionService.swift line 301; the keyterm loop at line 294 is not, so the LLM-name vocabulary travels with every language.

Where does this fit in the bigger picture of how Fazm absorbs the 2026 LLM news cycle?

There are two layers. The dynamic layer is the model picker: the floating-bar dropdown reads from a published array filled at runtime by the Claude agent SDK, with a 4-row substring table that maps any incoming model ID to one of three labels (Scary, Fast, Smart). New Anthropic models become selectable without a Fazm build. The static layer is the keyterm vocabulary: when Anthropic ships a name that the user is going to say out loud, that name gets a Swift literal entry in the next point release. The two layers solve different parts of the same problem. The picker absorbs the model. The vocabulary teaches the microphone to hear the model's name.