The April 22 release where a consumer Mac AI agent stopped inventing workarounds for its own native features
Every other article about open source AI in this window covers OpenAI Codex Labs and the early-April foundation model wave. This one is about a single 10-line commit that landed at 01:55 PT on April 22, 2026, and shipped as the headline of Fazm v2.4.1 seventeen hours later. A session replay caught the agent telling a user to build a Telegram bot. The native feature had always been one click away.
What the coverage agrees on
If you read three different recaps of open source AI for April 21-22, 2026, you end up with the same short list. OpenAI launched Codex Labs and an enterprise GSI partner network. ChatGPT Images 2.0 was announced. The month's foundation-model drops (Llama 4 Scout on April 2, Qwen 3 72B on April 5, Codestral 2 on April 8) kept generating downstream stories. Hugging Face trending pages moved around a little. That is roughly the complete story as it is usually told.
What that story leaves out is what happened inside shipping consumer AI software. No new DMGs from the big names. No App Store push. But quite a bit of code landed in the smaller repos, and one commit in particular is a case study in the kind of bug you only find by watching sessions, not reading error logs.
The anchor fact
Fazm v2.4.1 shipped on April 22, 2026 at 18:56 PT. The single most interesting commit in the release landed at 01:55 PT the same day. Its message is eight words: Make AI aware of native Remote Control feature. Its diff is 10 lines added to one file.
The commit message does not say why. The commit body, which most public repo viewers hide, says everything.
“Session replay investigation (device aMo1gHjub3Q1RlzCDorRdHj3srB3) found the AI suggesting users build a custom Telegram bot for phone control, despite the native Remote Control feature (chat.fazm.ai + QR code) already existing. The main desktopChat system prompt had no mention of Fazm's own features, so the model invented workarounds.”
2c09d26a commit body, verbatim
The bug, before and after
This is what a user asking for phone control read from the agent before and after the commit.
The same user question, two versions of the prompt
Before the commit: You could build a small Telegram bot that talks to your Mac over SSH. Here's how: create a new bot with BotFather, install python-telegram-bot, write a handler that receives a message, runs a shell command on your Mac, and sends the output back. For always-on access, set up a reverse tunnel with ngrok or cloudflared. This takes about an afternoon once you have Python installed.
- Invented a new bot from scratch
- Asked the user to install Python and ngrok
- Never mentioned the native Remote Control feature
- The 'afternoon of work' estimate hid the real cost
After the commit, the same question gets a one-sentence answer: open Settings > Remote Control, scan the QR code, and chat with Fazm from chat.fazm.ai on your phone.
The 10 lines, verbatim
This block was inserted in Desktop/Sources/Chat/ChatPrompts.swift between the <user_context> and <mentor_behavior> sections of the desktopChat prompt. It is the entire change.
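The byte-for-byte text is one `git show 2c09d26a` away. Reconstructed here from the fragments quoted elsewhere on this page (the exact wording of the middle bullets is not quoted anywhere on this page, so treat this as a sketch of the block's shape, not the verbatim diff):

```text
<fazm_features>
- **Remote control from phone**: open Settings > Remote Control and scan the
  QR code to chat with Fazm from chat.fazm.ai.
- **Voice input**: push-to-talk.
(three more bullets: personal Claude account, referral, memory)
NEVER suggest building a custom Telegram bot, Discord bot, or SSH setup for
phone control, the native feature already exists.
If unsure whether a feature exists natively, say so and offer to check, don't
assume you need to build a workaround.
</fazm_features>
```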
Why this bug is easy to miss
A traditional bug throws a stack trace, or a wrong pixel, or a response that never arrives. This one did none of those things. The agent answered the user's question with a plan that was coherent, technically correct in isolation, and confidently delivered. The user, not knowing the Remote Control panel existed, had no reason to file it as a bug. The only way to find it was to watch the session.
The shape of the miss
The agent had no knowledge of Fazm's own features in its system prompt. When asked 'how do I control my Mac from my phone?', it fell back on general web knowledge (Telegram bots, SSH) because that was the best answer it had access to.
Why logs didn't catch it
The model did not crash. The tool calls it made (reading the knowledge graph, checking memory) all succeeded. The output was plausible. Every traditional signal was green.
Why users didn't report it
If you don't know Remote Control exists, a Telegram-bot suggestion reads as reasonable. Silent miss-recommendations don't show up in feedback queues.
Why session replay caught it
Device aMo1gHjub3Q1RlzCDorRdHj3srB3, real user, real conversation. Replay let the team see the gap between what the agent said and what the app could do. That gap is the bug.
Same-day timeline, end to end
From the session replay observation to the v2.4.1 release tag, every step happened on April 22, 2026, Pacific time. No overnight. No multi-day review.
April 22, 2026, all times Pacific
01:55:43 PT, commit 2c09d26a lands
+10 lines on Desktop/Sources/Chat/ChatPrompts.swift. The <fazm_features> block is introduced. Commit body records the session-replay origin and the Telegram-bot recommendation that triggered it.
12:32 PT, 24f609c7 ships a timer-refresh fix
Conversation history was churning from a timer refresh storm. Fix is unrelated to the prompt change but ends up in the same release.
16:19 PT, b5638f7c skips paywall during onboarding
ChatProvider.swift gains an isOnboarding guard so new users are not paywalled before they finish setup.
16:44-16:56 PT, Sonnet model label fix
Four commits refactor the model short-label lookup and change the fallback from Smart to Fast, so users on Sonnet stop seeing the wrong badge when Anthropic reports a partial model list.
16:47 PT, 21924aa7 handles revoked sign-ins
AuthService.swift gains 25 lines: on permanent refresh-token failure, stop the retry timer and sign the user out so they can sign back in.
18:11-18:55 PT, onboarding chat bubble fixes
A final run of commits closes out the thread on tool-call-driven bubble splitting, so the onboarding chat reads as one message instead of three.
18:56:18 PT, commit 11fb560a tags Release 2.4.1
CHANGELOG.json closes out the v2.4.1 block with the four items above. The fazm_features commit is the headline story of the release even though CHANGELOG.json does not give it its own line item.
19:57 PT, post-release Opus default shift
After the release tag, a767a55f flips the default Opus model ID back to the canonical one, unblocked by the ACP SDK 0.29+ fix that landed upstream. Ships in the v2.4.2 line.
The loop that caught it
The observability that led to this commit is a specific product pipeline, not a generic log query. It has a name (session replay), a sample identifier (the device ID in the commit body), and a destination (the system prompt file).
From a user session to a 10-line commit
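The replay pipeline itself is not public code, so the following is a hypothetical sketch of the triage step, not Fazm's implementation: for each native feature, pair the user intents it serves with the workaround phrases the model is known to reach for, and flag any session where the reply suggests a workaround instead of the feature. All names here (`flagMisses`, `NATIVE_FEATURES`) are invented for illustration.

```typescript
// Hypothetical sketch of the replay-triage step, not Fazm's actual pipeline.
interface Turn { role: "user" | "assistant"; text: string; }
interface Session { deviceId: string; turns: Turn[]; }

const NATIVE_FEATURES = [
  {
    name: "Remote Control",
    intents: [/control .* from (my|your) phone/i],
    workarounds: [/telegram bot/i, /discord bot/i, /ssh tunnel/i],
  },
];

function flagMisses(session: Session): string[] {
  const flags: string[] = [];
  for (let i = 0; i + 1 < session.turns.length; i++) {
    const q = session.turns[i];
    const a = session.turns[i + 1];
    if (q.role !== "user" || a.role !== "assistant") continue;
    for (const f of NATIVE_FEATURES) {
      const asked = f.intents.some((r) => r.test(q.text));
      const workedAround = f.workarounds.some((r) => r.test(a.text));
      // A hit: the user asked for something a native feature covers, the
      // model proposed a workaround, and never named the feature.
      if (asked && workedAround && !a.text.includes(f.name)) {
        flags.push(`${session.deviceId}: suggested workaround for ${f.name}`);
      }
    }
  }
  return flags;
}
```

The point of the sketch is that the signal lives in the gap between question and answer, not in any error channel; a log query has nothing to match on.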
Why this is not how most agents handle the same problem
The common patterns for "teach the agent what the product can do" fall into three buckets: a retrieval-augmented docs index, a tool call that returns feature descriptions at turn time, or nothing at all. Fazm's choice, inline in the system prompt, is the bluntest of the three and, for a product with a small feature surface, the most reliable.
| Dimension | Retrieval- or tool-call-based approaches | Fazm <fazm_features> block |
|---|---|---|
| Where the feature list lives | Vector DB, docs site, or a tool that hits an endpoint at turn time | Inline in the desktopChat prompt constant, Swift source |
| How the model accesses it | Retrieval call or tool call per turn; sometimes skipped when confidence is high | Every single turn, zero tool calls, zero latency |
| Update path | Re-index docs or re-deploy tool endpoint; versioning can drift from the app | Edit ChatPrompts.swift, cut a release |
| Cost of being wrong | Stale index silently persists; no single file to grep | Stale prompt ships with the DMG and is obvious in git blame |
| Ceiling on feature count | Scales to hundreds of features; needed for platforms | Five features today; fits in one paragraph |
| Correct choice | Platforms, marketplaces, or products with rapid feature churn | Consumer apps with small, slow-moving feature sets |
Rows describe Fazm as of the v2.4.1 release line. A large-surface-area product would need a different approach; the point is to match the technique to the product.
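The inline column of the table can be sketched in a few lines. This is a hypothetical TypeScript rendering, not the actual source: the real constant is Swift, in ChatPrompts.swift, and `buildSystemPrompt` and `FAZM_FEATURES` are invented names.

```typescript
// Illustrative sketch of the inline-prompt pattern. The feature list is a
// compile-time constant, so every turn carries it with zero retrieval calls,
// zero tool calls, and zero extra latency.
const FAZM_FEATURES = `<fazm_features>
- **Remote control from phone**: Settings > Remote Control, scan the QR code,
  chat from chat.fazm.ai.
- **Voice input**: push-to-talk.
If unsure whether a feature exists natively, say so and offer to check.
</fazm_features>`;

function buildSystemPrompt(userContext: string, mentorBehavior: string): string {
  // The features block sits between <user_context> and <mentor_behavior>,
  // mirroring where commit 2c09d26a inserted it.
  return [userContext, FAZM_FEATURES, mentorBehavior].join("\n\n");
}
```

The update path in the table follows directly from this shape: changing the feature list is an edit to one string constant, and `git blame` on that constant is the whole audit trail.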
What the rest of the v2.4.1 release looked like
For completeness, these are the four items that shipped alongside the prompt change. They are bug fixes that restore intended behavior, not new capabilities.
v2.4.1, CHANGELOG.json entries
- Fixed paywall blocking users mid-onboarding (commit b5638f7c, ChatProvider.swift)
- Fixed revoked sign-in making the app retry in the background; user is now signed out so they can sign in again (commit 21924aa7, AuthService.swift +25 lines)
- Fixed model label reading 'Smart' for Sonnet users when Anthropic reports a partial model list (commits 6b52425b, b740aac9, 064d1a2c, db9d80d9)
- Fixed onboarding chat messages splitting into multiple bubbles when tool calls occur (commits cbed7124, dd8c32fe, f3800147, aeacf863, 39ad5154, 8b41248f)
- Plus the April 22 01:55 PT fazm_features addition (commit 2c09d26a), which is the headline change even though it is not listed separately in CHANGELOG.json
The small numbers that define the release
10 added lines, 1 file touched, an 8-word commit message, 5 features described, 17 hours from commit to release tag.
Trace the whole thing yourself
Every assertion on this page is reproducible from the public repository. If you want to audit, this is the shell sequence.
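Assuming the repository is public at the URL cited in the FAQ below, a sketch of that sequence (the hashes are the ones discussed above):

```sh
git clone https://github.com/mediar-ai/fazm && cd fazm

# The prompt fix: 10 added lines and the session-replay commit body.
# Author date should read Wed Apr 22 01:55:43 2026 -0700.
git show 2c09d26a -- Desktop/Sources/Chat/ChatPrompts.swift

# The release tag commit and the CHANGELOG.json block it closes.
git show 11fb560a -- CHANGELOG.json

# Everything that landed on release day, in order.
git log --since="2026-04-22 00:00 -0700" --until="2026-04-23 00:00 -0700" \
  --oneline --reverse
```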
The rest of the April 21-22 landscape, in one strip
For context, every other writeup of the same window covers the Codex Labs launch, ChatGPT Images 2.0, and the early-April model drops recapped at the top of this page. Fazm v2.4.1 is the outlier, which is exactly why it is worth writing down.
The takeaway, if you are building anything similar
Don't assume the model knows your product. A capable general-purpose model has the entire internet as a prior. Your specific app is a rounding error in that prior. Unless you put your app's features in the prompt, the model will recommend the generic solution it read about somewhere else. That is not a prompt engineering taste question; it is a straightforward observation about where the model's knowledge comes from.
Watch sessions, not just errors. The Telegram-bot miss never generated an error. It never even generated a support ticket. The only way to find it was to open real conversations and notice the gap between what the agent said and what the app could do. Dashboards cannot see that gap; only people watching conversations can.
Ship the fix the same day. There is no queue between the session replay observation at midnight and the release tag at dinnertime. The commit is 10 lines. The test is reading it back. The release is a DMG. For a consumer product where the cost of a stale prompt is user churn, fast is correct.
“The hardest class of AI product bugs is the one where the agent gives a coherent wrong answer. The answer looks right. The user nods. The feature you shipped goes unused. You only find it by watching real sessions.”
One last thing
The <fazm_features> block closes with a sentence that reads a little differently from the rest:
If unsure whether a feature exists natively, say so and offer to check, don't assume you need to build a workaround.
That is the only escape hatch in the block. It covers the case where a future feature lands after the prompt was last edited. The agent is told: if you don't know, admit it, don't invent. In a release log full of revert-to-intent bug fixes, it is the one sentence that changes the agent's epistemics rather than its lookup table. That sentence is why I wrote this page.
Want to see the v2.4.1 behavior running on your Mac?
Book a 15-minute walkthrough. I'll demo the Remote Control feature, open the ChatPrompts.swift diff side-by-side, and answer any question about how the <fazm_features> block shapes the agent's recommendations.
Frequently asked questions
What were the big open source AI updates on April 21-22, 2026?
The stories every roundup for this window covers: OpenAI announcing Codex Labs for enterprise adoption with GSI partners (Accenture, Capgemini, Cognizant, Infosys, PwC, TCS), ChatGPT Images 2.0, and continued momentum on the early-April releases of Llama 4 Scout (Apr 2), Qwen 3 72B (Apr 5), and Codestral 2 (Apr 8). What no roundup covers is what shipped in the smaller corner of MIT-licensed consumer Mac AI agents. On April 22, 2026 at 18:56 PT, Fazm (github.com/mediar-ai/fazm) shipped v2.4.1. The headline change of that release traces back to a single commit that landed 17 hours earlier at 01:55 PT on the same day: 2c09d26a, Make AI aware of native Remote Control feature.
Why does that single commit matter more than a patch-note line?
Because it is a concrete example of a failure mode every production AI agent hits: the model does not know what the product around it can already do, so when the user asks for a capability the app has shipped, the model invents a workaround. In this case, the session replay from device aMo1gHjub3Q1RlzCDorRdHj3srB3 showed Fazm telling a user to build a custom Telegram bot for phone control, despite Fazm already having a native Remote Control feature (Settings > Remote Control, scan the QR code, chat with Fazm from chat.fazm.ai on your phone). The fix, commit 2c09d26a, added 10 lines to Desktop/Sources/Chat/ChatPrompts.swift: a new <fazm_features> XML block inlined directly in the desktopChat system prompt.
How can I verify the commit, the date, and the exact text myself?
Clone github.com/mediar-ai/fazm and run: git show 2c09d26a to see the full diff. The author date is Wed Apr 22 01:55:43 2026 -0700. The diff is against Desktop/Sources/Chat/ChatPrompts.swift and adds a block that begins with <fazm_features> and ends with </fazm_features>. The single most specific line in the block is the phrase: NEVER suggest building a custom Telegram bot, Discord bot, or SSH setup for phone control, the native feature already exists. Every claim on this page falls out of the public git log; no interpretation required.
What else shipped in Fazm v2.4.1 on April 22, 2026?
Four line items in CHANGELOG.json under the 2.4.1 release entry. 1) Fixed paywall blocking users mid-onboarding (commit b5638f7c at 16:19 PT added an isOnboarding guard to ChatProvider.swift). 2) Fixed a revoked sign-in making the app retry in the background; the user now gets signed out so they can sign in again (commit 21924aa7 at 16:47 PT added 25 lines to AuthService.swift). 3) Fixed the model label reading Smart for Sonnet users when Anthropic reports a partial model list (commit 6b52425b at 16:44 PT, followed by label refactors at 16:49 PT). 4) Fixed onboarding chat messages splitting into multiple bubbles when tool calls occur (a chain of commits from 16:25 PT through 18:55 PT). The release tag went out at 18:56:18 PT.
Why was the teach-the-agent-its-own-features fix the headline of the release?
Because the other three items are bug fixes that restore the prior intended behavior; this one actively changes what the agent says to users. A user who asked Fazm for phone control before April 22, 2026 would read a plausible-sounding plan involving a new Telegram bot, a new Discord channel, or an SSH tunnel. A user who asks on April 23 or later reads a single sentence: open Settings > Remote Control and scan the QR code. That is a product-quality change, not a bug fix. It is also the kind of change you only find by watching real sessions, because the AI's invented workaround was coherent enough that the user would never have reported it as a bug.
Is hardcoding product features into a system prompt the right pattern?
It is the pragmatic pattern when the feature set is small enough that the model can hold all of it in a single prompt section, and when the cost of a stale description is bounded. Fazm's <fazm_features> block lists five features (Remote Control, voice input via push-to-talk, personal Claude account, referral, memory) and nothing else. At that scale, a prompt block is cheaper and more reliable than a tool call that retrieves feature descriptions at turn time. The risk is that the block has to be updated whenever a feature ships or changes; the commit history of ChatPrompts.swift will be the source of truth.
What does this have to do with accessibility APIs and screen context?
Everything, indirectly. Fazm reads the user's focused window through the real macOS accessibility tree (AXUIElement, kAXFocusedWindowAttribute, and related APIs, used in AppState.swift around line 439), not screenshots. That read path is structured: the agent sees button labels, window titles, and element roles, not pixels. That lets the agent answer many questions (which Save button, which tab, which message) without asking at all. But accessibility context tells the agent what is in front of the user right now; it does not tell the agent what Fazm itself can do. The <fazm_features> block closes that second gap. Structured reads of the focused app plus a prompt-level description of the containing app are the minimum pair to stop the model from inventing workarounds.
Does the Remote Control feature actually exist or was this a stub?
It exists in production before the April 22 commit. The web half of the feature lives at chat.fazm.ai; the desktop half is surfaced in Settings > Remote Control with a QR code. The session replay that triggered commit 2c09d26a showed the AI recommending a Telegram workaround in a session where the user had that Settings panel one click away. The problem was not a missing feature; it was a model that did not know the feature existed. The 10-line prompt addition is the fix.
Where exactly does the new prompt block sit in the file?
Desktop/Sources/Chat/ChatPrompts.swift, the constant desktopChat, between the <user_context> block and the <mentor_behavior> block, starting at roughly line 25 of the post-commit file. The opening tag is <fazm_features>. The block uses markdown bullets (- **Remote control from phone**, - **Voice input**, and so on) inside an XML tag, a format the rest of the prompt follows. The final sentence before the closing tag is: If unsure whether a feature exists natively, say so and offer to check, don't assume you need to build a workaround.
Did anything else change in the app after the v2.4.1 release tag on April 22?
Yes. One commit landed 61 minutes after the release tag: a767a55f at 19:57 PT, Update Opus model ID to default for ACP SDK v0.29+ compatibility. That commit edited ShortcutSettings.swift to remove the Opus-to-Sonnet migration that v2.4.0 introduced, because the ACP SDK upgrade fixed the Opus rate-limit bug that had forced the migration in the first place. It ships in the v2.4.2 line rather than v2.4.1; if you cloned the repo at main on April 23 you would see it, but users on v2.4.1 do not have it yet.
How does Fazm's release cadence compare to the other April 21-22 stories?
Fazm shipped two numbered releases two days apart in April (v2.4.0 on April 20, v2.4.1 on April 22). OpenAI's Codex Labs launch was a business announcement, not a software release. The big foundation model drops (Llama 4, Qwen 3, Codestral 2) all predated this window. For a reader looking at what changed in MIT-licensed consumer desktop AI in the April 21-22 slice specifically, v2.4.1 is the shipping event. For a reader looking at what is interesting about v2.4.1 specifically, commit 2c09d26a is the answer.
Is Fazm a framework or a consumer product?
Consumer product. Signed, notarized macOS DMG. The repo is MIT-licensed and the Swift + Rust + TypeScript source is public, but the delivery format is an app you drag into /Applications, not a library you import or a CLI you install globally. That distinction is why the April 22 fix was urgent: a framework user would file a feature-misrecommendation as a prompt bug and work around it; a consumer would nod, try the Telegram suggestion, and churn. Fixing the prompt in a release beats adding a developer doc every time, for that audience.