Local voice desktop agent
What a local, voice-first desktop agent actually shipped this past week
Search results for tools "released this week" stop at the announcement: one launch, one tidy feature list. That tells you a thing exists. It does not tell you whether the team is fixing the unglamorous parts that break a desktop agent in real use. So here is the opposite: an actual seven-day release log from one shipping local agent, Fazm, and what each fix reveals about voice input, action capture, and desktop automation that runs on your own machine.
Direct answer - verified June 21, 2026
The local desktop AI agent with voice input that shipped this past week is Fazm: a fully local, open source macOS agent with push-to-talk voice control. It cut six tagged releases between June 15 and June 21, 2026 (v2.9.66 through v2.9.71), with v2.9.71 alone landing 11 changes. One caveat most roundups skip: it is a desktop app for macOS 14 and newer, not a phone app, so if your search included "mobile", that part does not apply here. Voice input and action capture run on your Mac.
Source you can check yourself: github.com/m13v/fazm/releases.
"Released this week" should mean the release log, not the press release
When you search for an agent that shipped in the past week, the pages that come back describe a single moment: a public preview, a feature grid, a hero video. That is useful once. But a desktop agent lives or dies on the second week, when you hit the network it was not demoed on, the app it cannot quite drive, the crash on the brand-new OS. The signal you actually want is whether anyone is still shipping fixes for those. Here is the same product told both ways.
Two ways to read 'released this week'
A single launch post. 'Native voice-first local desktop agent. Drives your browser and Mac apps. Open source.' You learn the tool exists and what it claims. You learn nothing about whether it survives contact with your setup.
- One date, one feature list
- No visibility into reliability
- Demo-network, demo-apps, demo-OS
The actual seven-day log
Every entry below is a real tagged release from Fazm's changelog, in order. Read top to bottom and you can watch the cadence: same-day reships, paper-cut fixes, and then one large reliability release. This is what a local voice desktop agent looks like when it is being maintained, not just launched.
Fazm releases, June 15 to 21, 2026
v2.9.66 and v2.9.67 - June 15
Two releases in one day. v2.9.66 made the paywall header reflect whether a free trial is currently available. v2.9.67 restored the Opus option in the model picker so 'Smart' mode was selectable again. Both are small; the cadence is the point: same-day reships when something is off.
v2.9.68 - June 16
Two fixes that matter to power users. The conversation history list had been silently capped at 20 chats, hiding older ones; that cap is gone. And the Custom API Endpoint setting now warns when the selected model is not a Claude model (the endpoint only routes Claude traffic) and offers a one-tap switch back to a Claude model.
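A guard like that can be very small. This is a minimal Python sketch, not Fazm's code: it assumes a Claude model can be recognized by a `claude` prefix on its ID, and `FALLBACK_CLAUDE_MODEL`, `EndpointWarning`, and `check_endpoint_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model ID offered by the one-tap switch; not a real Fazm setting.
FALLBACK_CLAUDE_MODEL = "claude-sonnet"


@dataclass
class EndpointWarning:
    message: str
    suggested_model: str


def check_endpoint_model(model_id: str) -> Optional[EndpointWarning]:
    """Warn when a Claude-only custom endpoint is paired with a non-Claude model."""
    # Assumption: Claude model IDs start with "claude"; real routing rules may differ.
    if model_id.strip().lower().startswith("claude"):
        return None
    return EndpointWarning(
        message=f"'{model_id}' is not a Claude model, and this endpoint only routes Claude traffic.",
        suggested_model=FALLBACK_CLAUDE_MODEL,
    )
```

Returning a structured warning instead of raising keeps the setting usable: the UI can show the message and offer `suggested_model` as the one-tap fix.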
v2.9.69 - June 16
Local file paths in the agent's chat replies were not clickable, and opening one threw an application error. Fixed. This is exactly the kind of paper cut that never appears in a launch post but bites every day if you ask the agent to touch files.
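One common way to make paths in chat text clickable is to rewrite them as `file://` links before rendering. The sketch below is hypothetical, not Fazm's renderer: it assumes replies are rendered as Markdown and only handles absolute POSIX paths without spaces.

```python
import re
from urllib.parse import quote

# Assumption: only absolute POSIX paths without whitespace are linkified.
# The lookbehind skips slashes inside words ("and/or") and URLs ("https://").
PATH_RE = re.compile(r"(?<![\w/:])(/[^\s`'\"<>()\[\]]+)")


def linkify_paths(text: str) -> str:
    """Rewrite absolute file paths in chat text as Markdown links to file:// URIs."""

    def to_link(match: re.Match) -> str:
        raw = match.group(1)
        path = raw.rstrip(".,;:!?")  # sentence punctuation is not part of the path
        trailing = raw[len(path):]
        return f"[{path}](file://{quote(path)}){trailing}"

    return PATH_RE.sub(to_link, text)
```

Percent-encoding with `urllib.parse.quote` keeps characters such as `#` or `%` in a filename from being read as URL syntax.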
v2.9.70 - June 17
Removed the 'Chat with Our Founder' booking option from the subscription paywall. A product decision, shipped as its own release rather than bundled and forgotten.
v2.9.71 - June 21
The big one: eleven changes. Spoken replies fall back to Deepgram when the ElevenLabs TTS quota is exhausted. A push-to-talk crash that could freeze and then quit the app on macOS 26 (Tahoe) is fixed. The agent now uses mouse and keyboard control for web pages shown inside a remote desktop or screen share. The agent no longer refuses to edit documents open in Word; it now applies the edits. Plus an OAuth callback that listens on both IPv4 and IPv6 loopback, faster failure on an unreachable Custom API Endpoint, a graceful browser-extension handoff, and a code-signing fix that stopped an onboarding restart loop.
“Six tagged macOS releases in seven days, ending with an eleven-change reliability drop. Maintenance, not marketing, is the tell.”
Fazm changelog, June 15 to 21, 2026
What the week's fixes reveal about local voice automation
The fixes are not random. Each one points at a design choice that separates a local, action-capturing desktop agent from a chat box that can only talk. Read the four below as a cross-section of the architecture, exposed by the bugs it took to harden it.
Voice that degrades instead of dying
v2.9.71 added a Deepgram fallback for text-to-speech when the ElevenLabs quota runs out. A voice-first agent that goes silent the moment one vendor throttles you is a toy. This is the difference between a demo and something you talk to every day.
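The pattern behind that fix is an ordered provider list with a narrow failure trigger. A minimal Python sketch, assuming a hypothetical `QuotaExceeded` error; the two stub functions stand in for real clients and do not call the ElevenLabs or Deepgram APIs.

```python
from typing import Callable, Sequence


class QuotaExceeded(Exception):
    """Raised by a provider when its usage quota is exhausted (hypothetical)."""


Synthesizer = Callable[[str], bytes]


def speak_with_fallback(text: str, providers: Sequence[tuple[str, Synthesizer]]) -> tuple[str, bytes]:
    """Try each TTS provider in order, skipping any that report an exhausted quota.
    Returns the name of the provider that succeeded and its audio bytes."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except QuotaExceeded as exc:
            errors.append(f"{name}: {exc}")  # remember why we moved on
    raise RuntimeError("all TTS providers exhausted: " + "; ".join(errors))


# Stand-ins for real clients; the first simulates a throttled account.
def elevenlabs_stub(text: str) -> bytes:
    raise QuotaExceeded("character quota used up")


def deepgram_stub(text: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are audio
```

Only a quota error triggers the fallback here; any other exception still surfaces, so a genuine bug is not masked by quietly switching voices.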
Real browser, graceful handoff
Because Fazm drives your actual Chrome through an extension (not a headless instance), the connection can drop. This week's fix makes the agent hand control back to you when it does, instead of thrashing on screen state it has not verified.
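That behavior can be modeled as a small state machine: on disconnect, the agent marks what it last saw as unverified and gives you control, and it only takes control back after re-reading the page. A hypothetical Python sketch; the class and method names are illustrative, not Fazm's extension protocol.

```python
from enum import Enum, auto


class Control(Enum):
    AGENT = auto()
    USER = auto()


class BrowserSession:
    """Tracks who controls the browser and whether the agent's view is trustworthy."""

    def __init__(self) -> None:
        self.control = Control.AGENT
        self.connected = True
        self.state_verified = True

    def on_disconnect(self) -> str:
        # Stop acting on a page we can no longer observe, and hand over control.
        self.connected = False
        self.state_verified = False
        self.control = Control.USER
        return "Lost connection to the browser extension; you have control."

    def on_reconnect(self) -> None:
        # Reconnecting is not enough: the page may have changed in the meantime.
        self.connected = True

    def on_page_reread(self) -> None:
        if self.connected:
            self.state_verified = True

    def can_act(self) -> bool:
        return self.connected and self.state_verified and self.control is Control.AGENT

    def resume(self) -> bool:
        """User hands control back; succeeds only once the screen state is verified."""
        if self.connected and self.state_verified:
            self.control = Control.AGENT
        return self.can_act()
```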
Accessibility-driven edits
The agent had been refusing to edit documents open in apps like Word, claiming it could only generate copy-paste text. v2.9.71 makes it actually apply edits through the accessibility layer. Capture and act, not capture and describe.
Networks the demo never hits
Account connection was failing with ERR_CONNECTION_REFUSED on IPv6-preferring networks; the OAuth callback now listens on both IPv4 and IPv6 loopback. The boring half of shipping a local agent is the half that decides whether it works on your network, not the presenter's.
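A likely mechanism, consistent with that fix: the browser resolves `localhost` to `::1` first, and a callback server bound only to `127.0.0.1` refuses the connection. Here is a minimal Python sketch of listening on both loopback addresses with one port. The function names and the 120-second wait are assumptions, not Fazm's implementation.

```python
import selectors
import socket


def open_loopback_listeners(port: int = 0) -> list[socket.socket]:
    """Listen on 127.0.0.1 and ::1 with the same port, so a redirect to
    http://localhost:<port>/callback connects whichever address the browser tries."""
    v4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    v4.bind(("127.0.0.1", port))
    v4.listen()
    listeners = [v4]
    chosen = v4.getsockname()[1]  # reuse the kernel-assigned port for IPv6
    if not socket.has_ipv6:
        return listeners
    try:
        v6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return listeners  # kernel without IPv6 support
    try:
        v6.bind(("::1", chosen))
        v6.listen()
        listeners.append(v6)
    except OSError:
        v6.close()  # no IPv6 loopback on this host; IPv4 alone must do
    return listeners


def wait_for_callback(listeners: list[socket.socket], timeout: float = 120.0) -> bytes:
    """Accept the first connection on either socket; return the HTTP request line."""
    sel = selectors.DefaultSelector()
    for sock in listeners:
        sel.register(sock, selectors.EVENT_READ)
    try:
        events = sel.select(timeout)
        if not events:
            raise TimeoutError("no OAuth callback received")
        conn, _addr = events[0][0].fileobj.accept()
        with conn:
            request = conn.recv(4096)
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        return request.split(b"\r\n", 1)[0]
    finally:
        sel.close()
```

Binding IPv4 first with port 0 lets the kernel pick a free port, and reusing that number for `::1` means one redirect URL works on either stack.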
About the "mobile agent" part
If you arrived here wanting an agent that runs on your phone, the honest answer is that Fazm does not. It is macOS 14 and newer, full stop. The searches overlap because what people want from a mobile assistant (talk to it and have it do the work) is exactly what voice-first desktop control delivers, just aimed at the Mac you already work on. You hold a hotkey, you talk, and the same agent loop drives your real browser, your native Mac apps (through the accessibility layer), and your Google Workspace, instead of waiting for you to type.
That scope is a feature, not an apology. The action capture is precise because it reads the macOS accessibility tree, the same interface VoiceOver uses, instead of screenshotting pixels and guessing. The automation is durable because it persists your sessions across a restart and never auto-compacts the context out from under a long task. And it runs locally: the agent, your files, and your screen reads stay on your machine, with the model pointed at your own Claude account or a custom endpoint. The trade is platform reach for depth on one platform, and this week's log is full of the depth.
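Persisting sessions locally can be as plain as a SQLite file that is reopened on launch. A hypothetical Python sketch with a made-up schema and function names: reopening the same path after a restart sees every saved row, `list_sessions` has no `LIMIT` (the kind of cap v2.9.68 removed from the history list), and `load_transcript` returns the full conversation rather than a compacted summary.

```python
import sqlite3
import time

# Hypothetical schema: one row per chat, one row per message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    updated_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL
);
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the store; reopening the same file after a restart sees all prior rows."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def new_session(conn: sqlite3.Connection, title: str) -> int:
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO sessions (title, updated_at) VALUES (?, ?)", (title, time.time())
        )
    return cur.lastrowid


def save_message(conn: sqlite3.Connection, session_id: int, role: str, content: str) -> None:
    with conn:
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        conn.execute(
            "UPDATE sessions SET updated_at = ? WHERE id = ?", (time.time(), session_id)
        )


def list_sessions(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    # No LIMIT clause: every saved chat comes back, most recently active first.
    return conn.execute(
        "SELECT id, title FROM sessions ORDER BY updated_at DESC, id DESC"
    ).fetchall()


def load_transcript(conn: sqlite3.Connection, session_id: int) -> list[tuple[str, str]]:
    # The full transcript in order; nothing is summarized or dropped.
    return conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
```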
Frequently asked questions
Which local desktop voice AI agent released this past week?
Fazm, a fully local open source macOS agent with push-to-talk voice input, shipped six tagged releases between June 15 and June 21, 2026: v2.9.66 and v2.9.67 on the 15th, v2.9.68 and v2.9.69 on the 16th, v2.9.70 on the 17th, and v2.9.71 on the 21st. It is a desktop app (macOS 14 and newer), not a phone app, so if you searched for a 'mobile agent' the honest answer is that the voice and action-capture parts run on your Mac, not on iOS or Android. The releases are public at github.com/m13v/fazm/releases.
Is Fazm a mobile agent?
No. Fazm runs on macOS 14 or newer and nowhere else right now. The 'voice input' and 'capture' parts that overlap with what people expect from a mobile assistant are there (hold a hotkey, talk, the agent acts), but they act on your desktop: your browser, your native Mac apps, your Google Workspace, your local files. If you need an agent literally on a phone, Fazm is not it. If you want voice-driven automation of the Mac you already work on, that is exactly what shipped this week.
What does 'fully local' mean here, and what still leaves the machine?
The agent loop, the accessibility reads of your screen, the file scans, and the local SQLite database all run on your Mac. Fazm wraps Claude Code and Codex over the Agent Client Protocol, so the model inference itself runs against whatever account or endpoint you point it at (your own Claude Pro or Max plan, or a custom Anthropic-compatible endpoint). Voice transcription and text-to-speech are the one place this week's changelog shows a cloud dependency: v2.9.71 added a Deepgram fallback for when the ElevenLabs TTS quota is exhausted. So 'local' describes where the agent and your data live and act; the model and the optional spoken-reply voice are services you choose.
Why does a release log matter more than a launch announcement?
A launch tells you a tool exists. A week of releases tells you whether the team is fixing the unglamorous things that break a desktop agent in real use. This week's Fazm log is almost entirely that: a push-to-talk crash on macOS 26 Tahoe, an OAuth callback that failed on IPv6-preferring networks, a browser-extension drop that used to make the agent thrash, and the agent refusing to edit a document open in Word. None of that shows up in a feature grid, and all of it is what determines whether you keep the app on your dock after the demo.
What was actually in the big v2.9.71 release?
Eleven changes on June 21, 2026. The ones that matter for voice and desktop control: spoken replies now fall back to Deepgram when ElevenLabs TTS quota runs out; a push-to-talk crash that could freeze then quit the app on macOS 26 was fixed; the agent now drives web pages shown inside a remote desktop or screen share with mouse and keyboard control instead of the Chrome tools; and the agent stopped refusing to edit documents open in apps like Word and now applies the edits. The rest were reliability fixes (Custom API Endpoint warmup failing fast, OAuth on IPv4 and IPv6 loopback, browser-extension handoff, and a code-signing seal corruption that caused an onboarding loop).
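Failing fast on an unreachable endpoint mostly comes down to a short connect timeout on a warmup probe, so a bad URL surfaces in seconds rather than after a full request timeout. A minimal Python sketch; `probe_endpoint` and the 3-second default are assumptions, not Fazm's warmup logic.

```python
import socket
from urllib.parse import urlsplit


def probe_endpoint(url: str, connect_timeout: float = 3.0) -> tuple[bool, str]:
    """Try a bare TCP connect to the endpoint's host and port with a short timeout.
    Returns (reachable, detail); no HTTP request is sent."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False, f"not an http(s) URL: {url!r}"
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:  # e.g. a port outside 0-65535
        return False, f"invalid port in {url!r}"
    try:
        with socket.create_connection((parts.hostname, port), timeout=connect_timeout):
            return True, f"connected to {parts.hostname}:{port}"
    except OSError as exc:  # refused, unreachable, DNS failure, or timeout
        return False, f"{parts.hostname}:{port} unreachable ({type(exc).__name__})"
```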
How does Fazm capture 'actions' on the desktop without screenshots?
It reads the macOS accessibility tree, the same API that VoiceOver uses, rather than screenshotting the screen and running vision over pixels. That means it gets real element roles, labels, and values for native Mac apps and can act on them directly. For the browser it drives your actual Chrome through an extension, which keeps you logged in and preserves Cloudflare and Turnstile state, instead of spinning up a headless instance. The week's log reflects this: the browser-extension handoff fix in v2.9.71 is about gracefully returning control to you when that real-browser connection drops mid-task.
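To make the contrast with pixel matching concrete, here is a hypothetical Python sketch of the shape of the operation: find an element by role and label in a tree, then change its value. The `AXNode` objects are plain stand-ins for macOS accessibility elements, with roles named after real ones such as `AXButton` and `AXTextArea`; nothing here calls the actual accessibility API.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class AXNode:
    """Stand-in for an accessibility element: a role, a label, a value, children."""
    role: str                    # e.g. "AXWindow", "AXButton", "AXTextArea"
    title: str = ""              # accessible label
    value: Optional[str] = None  # editable content, if any
    children: list["AXNode"] = field(default_factory=list)

    def walk(self) -> Iterator["AXNode"]:
        yield self
        for child in self.children:
            yield from child.walk()


def find(root: AXNode, role: str, title: Optional[str] = None) -> Optional[AXNode]:
    """Depth-first search for the first element with a matching role (and title)."""
    for node in root.walk():
        if node.role == role and (title is None or node.title == title):
            return node
    return None


def replace_text(root: AXNode, old: str, new: str) -> bool:
    """Edit the document in place by rewriting its text area's value,
    instead of handing the user text to copy and paste."""
    area = find(root, "AXTextArea")
    if area is None or area.value is None or old not in area.value:
        return False
    area.value = area.value.replace(old, new, 1)
    return True
```

Because the match is on role and label rather than screen coordinates, it survives a window being moved or resized, which is one practical argument for the accessibility tree over screenshots.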
Is this open source, and can I read the changelog myself?
Yes on both. Fazm is open source at github.com/m13v/fazm, and the tagged releases (each one a v2.9.x+buildnumber-macos tag) are at github.com/m13v/fazm/releases. The user-facing notes also ship inside the app and are written for humans, not commit hashes, so you can read what changed without parsing git history.
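Because the tag format is stated, scripting against it takes one regular expression, for example to list which versions landed in a given week. A sketch that assumes the build number is purely numeric; `parse_tag` and `releases_between` are hypothetical helpers.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed shape: vMAJOR.MINOR.PATCH+BUILD-macos, with a numeric BUILD.
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)\+(\d+)-macos$")


class Release(NamedTuple):
    major: int
    minor: int
    patch: int
    build: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def parse_tag(tag: str) -> Optional[Release]:
    """Parse a release tag; return None for tags that do not match the format."""
    m = TAG_RE.match(tag)
    return Release(*map(int, m.groups())) if m else None


def releases_between(tags: Iterable[str], low: str, high: str) -> list[str]:
    """Versions in the inclusive range [low, high], oldest first."""
    lo = tuple(map(int, low.split(".")))
    hi = tuple(map(int, high.split(".")))
    parsed = [r for r in map(parse_tag, tags) if r is not None]
    return [str(r) for r in sorted(parsed) if lo <= r[:3] <= hi]
```

Given `["v2.9.71+1010-macos", "v2.9.65+990-macos", "v2.9.68+1003-macos"]` (made-up build numbers), `releases_between(tags, "2.9.66", "2.9.71")` returns `["2.9.68", "2.9.71"]`.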