48-hour window: April 12-13, 2026

The Two Days When Llama 4, Qwen 3 Audio, and One Boring Bug Fix All Shipped

Every AI roundup for April 12-13, 2026 lists the same model releases, papers, and GitHub stars. This guide adds the part the roundups skip: the runtime layer that decides whether any of those models can actually do anything on your Mac, and the boring patch Fazm shipped to keep that layer working.

Matthew Diakonov
11 min read
4.9 from 200+ users
Fazm v2.2.1 shipped April 12, 2026
Four-method accessibility probe in AppState.swift
Open source: github.com/mediar-ai/fazm

The 48-Hour Release Wall

Skim what shipped. Every item below is a real release, paper, or commit window from April 12 or 13, 2026.

  • llama.cpp Qwen3 audio (April 12)
  • Codex CLI 0.121 alpha 4 (April 13)
  • MemPalace 23k stars
  • NVIDIA Ising open models
  • Llama 4 Scout 10M context
  • Llama 4 Maverick 400B/17B MoE
  • Qwen 3 235B MoE
  • Gemma 3n 4B on-device
  • OLMo 2 32B fully open
  • DeepSeek V3 training paper
  • Gemma 3 license update (April 11)
  • Fazm v2.2.1 (April 12)

What Each Release Actually Means

Not a leaderboard. A short read on what changed for someone who wants to use these models, not just write about them.

llama.cpp gets Qwen3 audio (April 12)

Qwen3-Omni and Qwen3-ASR weights now load through llama.cpp's standard pipeline. The practical effect on a Mac: speech in, structured tokens out, fully local, no Python runtime.

OpenAI Codex CLI 0.121 alpha 4 (April 13)

Realtime V2 background-agent streaming. The agent runs in the background and streams partial output while you keep typing. The pattern: decouple chat from tool execution.
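The decoupling pattern can be sketched in a few lines of Swift concurrency. This is a generic shape, not Codex's implementation; the chunk strings are placeholders.

```swift
import Foundation

// Generic shape of background-agent streaming: the agent yields partial
// output on its own task while the caller keeps handling input.
// Pattern sketch only; the chunk values are placeholders.
func runBackgroundAgent(_ task: String) -> AsyncStream<String> {
    AsyncStream { continuation in
        Task.detached {
            for chunk in ["planning \(task)", "editing files", "done"] {
                continuation.yield(chunk)      // partial output, streamed live
            }
            continuation.finish()
        }
    }
}

// The consumer stays responsive by iterating the stream from its own task:
// for await update in runBackgroundAgent("codemod") { print(update) }
```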

MemPalace passes 23k stars

Cross-session persistent memory for LLMs. Launched April 6, still adding commits through April 13. The premise: your assistant should remember what you discussed yesterday.

NVIDIA Ising

First open AI models from NVIDIA aimed at accelerating useful quantum computing research. Niche today, but signals NVIDIA's open-model push extends past LLMs.

DeepSeek V3 training paper

Publishes training internals most labs keep proprietary. Useful even if you never train: the data-mixing and stability tricks travel to fine-tuning.

Gemma 3 license update (April 11)

Removes the previous user-count cap. If you build a consumer product on Gemma 3, you no longer need to negotiate above a usage threshold.

The Part No Roundup Covers

A model release is the start of a chain, not the end. After weights ship, someone has to wire that model into a runtime that can read your screen, click buttons in your apps, remember what happened yesterday, and not break when Apple silently rotates the TCC database. That runtime layer is invisible in benchmark charts.

On April 12, 2026, the AI world was looking at Llama 4, Qwen 3 audio, and the latest agent-orchestration framework. On the same day, Fazm shipped v2.2.1 with one line in the changelog: 'Fixed duplicate AI response appearing in pop-out and floating bar when sending follow-up messages.'

That bug is not glamorous. It is also exactly the kind of friction that decides whether a real consumer product feels broken or works. The rest of this page is about the layer underneath that bug, which is where the same news week's models actually live.

From Model Drop to Useful on Your Mac

The path from a Hugging Face release to a consumer product interaction looks like this. The model is one node in a graph of permission probes, capture surfaces, and tool calls.

The Mac AI runtime, as a graph: the user's question, the active window's pixels, and the AX permission state all feed a tool-calling LLM. From there the model can click, type, and scroll, read the app's structure, and return an answer in chat.
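In code, that graph collapses to a loop. A minimal sketch of the shape, where `AgentAction`, `callModel`, and `execute` are hypothetical stand-ins, not Fazm's actual API:

```swift
// Hypothetical action type and loop shape illustrating the runtime graph;
// callModel and execute are stand-ins, not Fazm's actual API.
enum AgentAction {
    case click(x: Double, y: Double)
    case typeText(String)
    case answer(String)
}

func callModel(_ transcript: [String]) -> AgentAction { .answer("…") } // stub
func execute(_ action: AgentAction) -> String { "ok" }                 // stub

func runAgentLoop(question: String, axGranted: Bool) -> String {
    var transcript = [question]                // the user question enters the graph
    while true {
        let action = callModel(transcript)     // window pixels + AX state ride along
        switch action {
        case .answer(let text):
            return text                        // terminal node: answer in chat
        case .click, .typeText:
            guard axGranted else {             // AX permission gates every tool call
                return "Grant accessibility permission first."
            }
            transcript.append(execute(action)) // tool result feeds the next turn
        }
    }
}
```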

The Four-Method Accessibility Probe

In macOS Sequoia, Apple introduced a per-process TCC cache that backs AXIsProcessTrusted(). The cache can go stale after a system update or app re-sign and start returning false for an app that the user already authorized. There is no public API to flush it.

Fazm's workaround lives in /Desktop/Sources/AppState.swift. It stacks four probes and trusts whichever one disambiguates the failure mode.

The fallback chain (line numbers from the open source repo)


1. AXIsProcessTrusted() (line 311)

The standard call. Cheap and fast. Returns true on the happy path. Returns a stale value when the cache desyncs.


2. testAccessibilityPermission() (line 433)

Real AXUIElementCopyAttributeValue call against the frontmost app's focused window. Distinguishes apiDisabled (system-wide off) from cannotComplete (this specific app does not implement AX, e.g. Qt, OpenGL, Python apps like PyMOL).
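The functional probe reduces to one real AX read. A simplified sketch; the repo's version targets the frontmost app's focused window, and the error handling here is condensed:

```swift
import ApplicationServices

// One real AX read against a target process. The AXError value, not a
// cached boolean, identifies which failure mode you are in.
func functionalAXProbe(pid: pid_t) -> AXError {
    let app = AXUIElementCreateApplication(pid)
    var focused: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(
        app, kAXFocusedWindowAttribute as CFString, &focused)
    // .success        -> permission actually works; any cached false was stale
    // .apiDisabled    -> accessibility is off for this process system-wide
    // .cannotComplete -> this app may simply not implement AX (Qt, OpenGL, ...)
    return err
}
```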


3. confirmAccessibilityBrokenViaFinder() (line 468)

If the previous probe returned cannotComplete, retry against Finder. Finder is known to be AX-compliant. If Finder also fails, the permission is truly broken. If Finder works, the original failure was app-specific and the permission is fine.


4. probeAccessibilityViaEventTap() (line 490)

Final tie-breaker. CGEvent.tapCreate(.cgSessionEventTap, ..., .listenOnly). Bypasses the per-process cache entirely because event-tap creation hits the live TCC database directly.
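The tie-breaker can be sketched as a listen-only tap that is created and immediately torn down. A reconstruction of the pattern, not the repo's code:

```swift
import CoreGraphics

// A listen-only session event tap succeeds or fails against the live TCC
// database, not the per-process cache. Create it, read the yes/no, tear
// it down immediately.
func eventTapProbe() -> Bool {
    let tap = CGEvent.tapCreate(
        tap: .cgSessionEventTap,
        place: .headInsertEventTap,
        options: .listenOnly,
        eventsOfInterest: CGEventMask(1 << CGEventType.keyDown.rawValue),
        callback: { _, _, event, _ in Unmanaged.passUnretained(event) },
        userInfo: nil
    )
    guard let tap else { return false }   // nil: accessibility truly denied
    CFMachPortInvalidate(tap)             // we only needed the answer
    return true
}
```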

The Code That Catches the Stale Cache

The original page embeds the snippet verbatim from the Fazm repo. The comment on line 308 is the short version of the entire problem.

Desktop/Sources/AppState.swift
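The embedded snippet does not survive in this text rendering. A minimal sketch of the shape of the check, using the probe functions named above with assumed signatures, not the verbatim code:

```swift
import ApplicationServices

// Stubs for the repo functions described above (signatures assumed).
func testAccessibilityPermission() -> AXError { .cannotComplete }
func confirmAccessibilityBrokenViaFinder() -> Bool { false }
func probeAccessibilityViaEventTap() -> Bool { true }

// AXIsProcessTrusted() can return stale data after macOS updates or app
// re-signs, so a functional AX test detects the broken state (the repo's
// line 308 comment makes the same point).
func hasWorkingAccessibility() -> Bool {
    if AXIsProcessTrusted() { return true }         // happy path, usually right

    switch testAccessibilityPermission() {          // probe 2: real AX read
    case .success:
        return true                                 // cache was stale; AX works
    case .apiDisabled:
        return false                                // off system-wide
    case .cannotComplete:
        if !confirmAccessibilityBrokenViaFinder() { // probe 3: Finder check
            return true                             // app-specific AX gap only
        }
        return probeAccessibilityViaEventTap()      // probe 4: live TCC check
    default:
        return probeAccessibilityViaEventTap()
    }
}
```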

What This Looks Like in Production Logs

When the workaround triggers, the runtime emits a recognizable transition log. This is what you would actually see tailing /tmp/fazm-dev.log after a macOS point update breaks accessibility.

fazm-dev.log

The stale-cache transition line is the one nobody enjoys writing. It says the standard API lied and the only reason the app keeps working is a four-step fallback that exists because we got tired of shipping support tickets.

April 12-13 in Numbers

4: permission probes in the AX fallback chain
23k+: MemPalace GitHub stars by April 13
10M: Llama 4 Scout context window (tokens)
1568px: Fazm screenshot max dimension before send

AXIsProcessTrusted() can return stale data after macOS updates or app re-signs, so we also do a functional AX test to detect the broken state.

Fazm AppState.swift, line 308 (open source)

The Three Layers a Mac AI Product Has to Get Right

April 12-13 was loud at the model layer. The runtime and product layers are where the work actually compounds.

Layer 1

The model

Llama 4. Qwen 3. Gemma 3n. DeepSeek V3. The thing the news cycle covers. Important. Replaceable.

Layer 2

The runtime

Permission probes. Window capture. Tool execution loops. Background agents. The boring code that decides if a model release matters in a real product.

Layer 3

The product

Pop-out windows. Streaming responses. Onboarding. The bug fixes you ship on the same day Llama 4 makes the front page.

What to Actually Try This Week

The release wall is wide. Here is what is worth a few hours of your time depending on what you want to do.

A short, opinionated punch list

  • Update llama.cpp and load a Qwen3-ASR weight to see real-time local speech transcription on Apple Silicon.
  • Pull the Codex CLI 0.121 alpha 4 build and try Realtime V2 background streaming on a long codemod task.
  • Star MemPalace and watch how cross-session memory matures past the first 30 days.
  • Skim the DeepSeek V3 training paper for data-mixing ideas applicable to your fine-tunes.
  • Install Fazm and ask it to summarize the document you currently have open. The runtime in this guide is what answers.

Run this stack against your own Mac

Fazm is the consumer-friendly side of the runtime described above. Install on macOS 13+, grant accessibility once, ask any question about the app you have open.

Download Fazm for macOS

Frequently Asked

The questions that came up most when researching the April 12-13 window.

Frequently asked questions

What AI models and open source releases shipped on April 12-13, 2026?

The 48-hour window included llama.cpp gaining Qwen3 audio support for omni and ASR models, OpenAI Codex CLI shipping the 0.121 alpha 4 build with Realtime V2 background-agent streaming, MemPalace continuing rapid development past 23,000 GitHub stars on cross-session LLM memory, NVIDIA announcing Ising (open AI models for quantum computing), and rolling activity around the Llama 4 Scout and Maverick releases, Qwen 3 family, Gemma 3n on-device variants, OLMo 2 32B from Allen AI, and the DeepSeek V3 paper detailing training internals most labs keep proprietary.

What did Fazm itself ship on April 12, 2026?

Fazm v2.2.1 shipped on April 12, 2026 with one line in the changelog: 'Fixed duplicate AI response appearing in pop-out and floating bar when sending follow-up messages.' This is the boring side of an AI release week. While the news cycle covered new foundation models, indie macOS developers shipped fixes that make existing models actually behave correctly inside a real product.

Why does the macOS accessibility permission keep breaking?

Starting in macOS Sequoia, the per-process TCC cache that backs AXIsProcessTrusted() can return stale data after a system update or app re-sign. The function returns false even though the user granted the permission, because the cache was never refreshed. Fazm works around this in AppState.swift line 308 with a comment that reads literally: 'AXIsProcessTrusted() can return stale data after macOS updates or app re-signs, so we also do a functional AX test to detect the broken state.' The workaround stack is four probes deep: AXIsProcessTrusted, then AXUIElementCopyAttributeValue against the focused window, then a Finder fallback to disambiguate, then a CGEvent.tapCreate listenOnly probe as final tie-breaker.

Can I run April 2026's open source models locally on my Mac?

Yes, for inference. llama.cpp and Ollama added day-of support for Qwen 3 audio, Llama 4 Scout (17B active parameters of a 109B MoE), and Gemma 3n's 4B-effective on-device variant. M-series Macs with 16GB RAM can run the smaller variants at usable speeds via Metal acceleration. The harder problem is wiring those local models into a tool-calling loop that actually controls macOS apps, which is what desktop assistants like Fazm exist to handle.

What is Realtime V2 background-agent streaming in OpenAI's Codex CLI 0.121 alpha 4?

Background-agent streaming lets the CLI hand off long-running tasks to an agent that streams partial output back to the terminal while continuing to run, so you keep typing instead of waiting for the agent to finish. The 0.121 alpha 4 build that shipped on April 13 wires this through the new Realtime V2 streaming protocol, which replaces the previous polling-based status checks. For desktop AI products like Fazm, the equivalent pattern is decoupling the chat loop from the tool-call execution loop so the user can keep talking while the agent finishes a click sequence.

How does Fazm get screen context if it does not use ScreenCaptureKit?

Fazm captures the active window via CGWindowListCopyWindowInfo and CGWindowListCreateImage, both of which are deprecated in macOS 14+. The choice is intentional. ScreenCaptureManager.swift line 28 carries the comment: 'CGWindowListCreateImage is deprecated in macOS 14+ but ScreenCaptureKit requires async setup and user prompts. This synchronous API still works and is intentional.' Captures are downscaled to a 1568px max dimension and JPEG-compressed under 3.5MB before being passed to the model. The active app PID is tracked via NSWorkspace notifications so the screenshot always lands on the focused window, not on Fazm itself.
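The capture call that answer describes is short. A sketch, assuming the window ID has already been resolved from CGWindowListCopyWindowInfo; the downscaling and JPEG compression steps are omitted:

```swift
import CoreGraphics

// Synchronous single-window capture via the deprecated-but-working API.
// CGRect.null asks for the tight bounds of the target window; no async
// setup, no ScreenCaptureKit permission prompt.
func captureWindow(_ windowID: CGWindowID) -> CGImage? {
    CGWindowListCreateImage(
        .null,
        .optionIncludingWindow,
        windowID,
        [.boundsIgnoreFraming]
    )
}
```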

Which models does Fazm actually ship with as of April 2026?

Three for chat: claude-haiku-4-5-20251001, claude-sonnet-4-6 as the default, and claude-opus-4-6. One for passive video session analysis: gemini-pro-latest, tracked in usage metrics as gemini-2.5-pro. The chat side is Anthropic because tool-calling reliability for accessibility-tree automation has not reached parity in open weights. The session-analysis side is Gemini because its multimodal video understanding lets the agent read 60+ minute screen recordings in one pass instead of frame-by-frame.

Where can I read the source for the four-method accessibility probe?

Fazm is open source. The probe stack lives in /Desktop/Sources/AppState.swift around lines 308 to 503 of the repo at github.com/mediar-ai/fazm. The relevant functions are checkAccessibilityPermission (line 310), testAccessibilityPermission (line 433), confirmAccessibilityBrokenViaFinder (line 468), and probeAccessibilityViaEventTap (line 490). Each one exists because the previous one fails on a specific real-world condition Apple introduced or never documented.

The 48-Hour Takeaway

April 12 and 13, 2026 will show up in benchmark charts as the window when Qwen 3 audio entered llama.cpp, Codex CLI got background-agent streaming, MemPalace crossed 23,000 stars, NVIDIA opened up Ising, and the DeepSeek V3 paper changed how people think about training-data mixing.

For people building consumer AI products on macOS, those same 48 hours included shipping a one-line bug fix at v2.2.1 and quietly relying on a four-method accessibility probe that exists because AXIsProcessTrusted() cannot be trusted on macOS Sequoia.

Both stories are real. The first one fills the news cycle. The second one is what makes the first one usable.

fazm. AI Computer Agent for macOS
© 2026 fazm. All rights reserved.
