The April 13-14, 2026 announcement no roundup covered, because it was merged, not posted.
Every SERP page for this query treats announcement as synonymous with blog post: a lab publishes, a summary cites the lab. Useful, but it undercounts the week by a lot. One open source AI agent quietly shipped 126 commits across those two days, no blog, under MIT license, at github.com/mediar-ai/fazm. And the architectural choice in those commits is exactly the thing loud announcements that week kept skipping: scope capture to the focused app window via NSWorkspace + AXUIElement, instead of OCR-ing the whole screen.
THE SERP, HONESTLY
What the top pages for this query ship
Run the query. Read the top ten. The shape repeats: a numbered list of lab names with the date of their April 13-14 post, a one-line quote from the post, a link out. All ten share the same editorial premise: announcements happen on stage. A project that merged 126 commits and never published a post on those dates is invisible to every page on the first SERP. That is the gap this guide fills, with a worked example.
| Feature | Typical announcement roundup | This page (announcement-as-code) |
|---|---|---|
| Definition of 'announcement' | Post, thread, release note | Merged code, tagged or not |
| Projects covered | 5 to 10 well-known labs | One deep, file and line cited |
| Architecture detail | None; just the headline claim | Capture chain, flags, sizes |
| Reproducibility | Trust the editor | One git log rebuilds the list |
| Coverage of the long tail | No | Yes, one sample at a time |
| OCR vs. AX distinction | Not mentioned | Central to the story |
THE ANCHOR FACT
App-scoped capture, not full-screen OCR
Most of the agents announced around April 13-14 sit on top of some version of the computer-use pattern: the model sees a screenshot of a whole display (or a headless browser viewport), OCRs it, and drives. Fazm's April 13-14 push commits to different plumbing. Focus detection happens first, through the operating system; capture is narrowed to that focused window; and the image is downscaled and compressed until the upload fits comfortably under the model's size limit. Every step is a real call in a real file.
THE CHAIN (file and line)
- FloatingControlBarWindow.swift:1030 — An observer on NSWorkspace.didActivateApplicationNotification stashes the frontmost app's PID as lastActiveAppPID.
- AppState.swift:439-441 — AXUIElementCopyAttributeValue(appElement, kAXFocusedWindowAttribute, ...) probes the AX API, confirming the app is reachable and the permission is live.
- ScreenCaptureManager.swift:14-44 — CGWindowListCreateImage(.null, .optionIncludingWindow, windowID, ...) captures just that one window, not the display, with .boundsIgnoreFraming and .bestResolution.
- ScreenCaptureManager.swift:106-178 — Downscale to a 1568px longest edge, then step JPEG quality from 0.7 down to 0.3 until the file is under 3_500_000 bytes: headroom for the multi-image 2000px API cap.
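The last link of that chain is simple enough to sketch. This is an illustrative reconstruction of the quality-stride logic only, not the repo's code: the JPEG encoder is injected as a closure so the stride stands alone, and the 1568px downscale happens before this step and is omitted.

```swift
import Foundation

/// Illustrative sketch of the adaptive compression step: try quality
/// 0.7 first, then stride down toward 0.3 while the encoded payload
/// is still over the cap. `encode` stands in for real JPEG encoding.
func compressUnderCap(encode: (Double) -> Data,
                      maxBytes: Int = 3_500_000) -> Data {
    var best = encode(0.7)
    // Re-encode at the next lower quality only while still over the cap.
    for quality in [0.6, 0.5, 0.4, 0.3] where best.count > maxBytes {
        best = encode(quality)
    }
    return best   // best effort; never drops below quality 0.3
}
```

With a real encoder, `encode` would wrap something like NSBitmapImageRep's JPEG representation at the given compression factor; the point of the sketch is just that quality is sacrificed adaptively, not fixed up front.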
THE PICTURE
Two system APIs, one merged capture
Accessibility tells you which app and which window. Core Graphics tells you the pixels for that specific window. Neither one on its own is enough. Pure-AX breaks on Qt, OpenGL, Electron apps that under-implement the tree. Pure-pixel gets every pixel on the display, including the wrong ones. The April 13-14 push hardens the merge.
Focus via AX, pixels via CG
OCR vs. AX
The default pattern vs. the April 13-14 pattern
This is what most reference agents do on a desktop, side by side with what Fazm's FloatingControlBarWindow actually runs. The difference is one flag on CGWindowListCreateImage, and a short detour through NSWorkspace.
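Taking the Fazm half of that side-by-side first, reduced to a sketch. This is illustrative rather than the repo's code: the PID lookup, largest-window heuristic, and capture flags follow the description on this page, error handling beyond guard-exits is stripped, and it only returns pixels on macOS with Accessibility and Screen Recording granted.

```swift
import AppKit
import CoreGraphics

/// Area of one CGWindowListCopyWindowInfo entry, from its bounds dict.
func area(_ info: [String: Any]) -> CGFloat {
    guard let bounds = info[kCGWindowBounds as String] as? [String: Any],
          let rect = CGRect(dictionaryRepresentation: bounds as CFDictionary)
    else { return 0 }
    return rect.width * rect.height
}

/// App-scoped capture: frontmost app -> its biggest window -> pixels.
func captureFrontmostAppWindow() -> CGImage? {
    // 1. The OS, not the model, names the frontmost app.
    guard let pid = NSWorkspace.shared.frontmostApplication?.processIdentifier
    else { return nil }

    // 2. Pick that PID's largest on-screen window (skips status items).
    let all = CGWindowListCopyWindowInfo(.optionOnScreenOnly, kCGNullWindowID)
        as? [[String: Any]] ?? []
    guard let win = all
        .filter({ ($0[kCGWindowOwnerPID as String] as? pid_t) == pid })
        .max(by: { area($0) < area($1) }),
          let windowID = win[kCGWindowNumber as String] as? CGWindowID
    else { return nil }

    // 3. Capture just that window; .boundsIgnoreFraming keeps the shadow
    //    for spatial context. Returns nil without Screen Recording.
    return CGWindowListCreateImage(.null, .optionIncludingWindow, windowID,
                                   [.boundsIgnoreFraming, .bestResolution])
}
```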
// full display -> OCR -> pray
let image = CGDisplayCreateImage(
CGMainDisplayID()
)
// send the whole screen to the model:
// menu bar, dock, other apps, notifications,
// a second monitor. model OCRs and guesses.
sendToModel(image)
VERIFY IT
The commit log, reconstructed from one command
The announcement-as-code premise of this page only works if the code is there. One git log invocation against the live MIT repo rebuilds the April 13-14 commit list, and its output is what the rest of this page cites. Nothing on this page is editorial paraphrase.
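The invocation, matching the verification steps in the FAQ at the bottom of this page (it assumes network access and a reachable github.com/mediar-ai/fazm):

```shell
# Rebuild the April 13-14 commit list from the live repo.
# tformat (not format) terminates the last line with a newline,
# so a later pipe through wc -l counts every commit.
git clone https://github.com/mediar-ai/fazm.git
cd fazm
git log --pretty=tformat:"%h %ad %s" --date=short \
  --after="2026-04-13 00:00:00" --before="2026-04-14 23:59:59"
```

Append `| wc -l` for the headline count, or tighten the `--after`/`--before` window to a single day for the per-day split.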
THE TWO DAYS
What the commits actually did
The 126 commits cluster around two themes, both of which extend the app-scoped capture idea into new surfaces. April 13 makes the capture chain reentrant across concurrent chat sessions. April 14 adds attachments and history so a popped-out conversation keeps running against the app under your cursor, not Fazm's own window.
April 13-14, 2026
April 13, AM: per-session concurrency lands
Commit f25272f1 threads sessionKey through ChatProvider, ACPBridge, and the acp-bridge TypeScript daemon. The old single isSending flag becomes sendingSessionKeys: Set<String> (ChatProvider.swift, around line 354). Without this, two chat windows running at once would race on the same flag and one would overwrite the other's response stream.
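The shape of that change fits in a few lines. A hedged sketch, not the repo's code: the type and member names here are illustrative, built around the sendingSessionKeys idea described above.

```swift
import Foundation

/// Illustrative sketch of the f25272f1 shape: one shared `isSending`
/// flag becomes a per-session set, so concurrent chats don't collide.
final class SendGate {
    private var sendingSessionKeys: Set<String> = []  // was: var isSending = false

    /// Returns false if this session already has a send in flight.
    func beginSend(for sessionKey: String) -> Bool {
        sendingSessionKeys.insert(sessionKey).inserted
    }

    func endSend(for sessionKey: String) {
        sendingSessionKeys.remove(sessionKey)
    }
}

// Two windows, two keys: neither blocks the other, but a
// double-send on the same key is refused instead of racing.
let gate = SendGate()
assert(gate.beginSend(for: "window-A"))
assert(gate.beginSend(for: "window-B"))   // concurrent, distinct key: ok
assert(!gate.beginSend(for: "window-A"))  // same key: already in flight
```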
April 13, PM: per-window lastActiveAppPID
FloatingControlBarWindow gains its own workspace observer (lines 1030-1046) so each detached window tracks the last app that was frontmost when that window was active, not a global value. This is what lets the app-scoped capture stay correct when multiple chats are open.
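A per-window observer of this shape looks roughly like the following. Illustrative names, not the repo's; the one load-bearing detail, per the description above, is that each instance keeps its own lastActiveAppPID rather than reading a global.

```swift
import AppKit

/// Sketch of a per-window focus tracker: each detached window owns one,
/// so concurrent chats can track different frontmost apps.
final class DetachedWindowFocusTracker {
    private(set) var lastActiveAppPID: pid_t?
    private var token: NSObjectProtocol?

    init() {
        token = NSWorkspace.shared.notificationCenter.addObserver(
            forName: NSWorkspace.didActivateApplicationNotification,
            object: nil, queue: .main
        ) { [weak self] note in
            let app = note.userInfo?[NSWorkspace.applicationUserInfoKey]
                as? NSRunningApplication
            // Ignore activations of our own process: we want the app
            // the user was actually working in, not the chat window.
            if let app,
               app.processIdentifier != ProcessInfo.processInfo.processIdentifier {
                self?.lastActiveAppPID = app.processIdentifier
            }
        }
    }

    deinit {
        if let token {
            NSWorkspace.shared.notificationCenter.removeObserver(token)
        }
    }
}
```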
April 14, AM: drop and paste attachments
Commit fdf93314 wires drag-and-drop on the floating control bar. 66634116 prevents binary files from being slurped into memory; 21ce1fb3 adds size limits and a path fallback for large blobs. 181db6c7 and f3a59ea8 fix the dropped-item parser to handle both URL and Data payloads from NSItemProvider.
April 14, PM: history + session resume
788dfe96 writes the ACP session ID into the conversation history row; 08096a0a does the same for detached chat. ba58c841 adds loadSessionId(). Clicking an old conversation in the settings sidebar now resumes the ACP session rather than starting a fresh agent. This is the first time Fazm's history is a first-class resume surface, not a log.
“66 commits on 2026-04-13. 60 commits on 2026-04-14. 126 total, under MIT, with zero accompanying blog posts. The announcement was the diff.”
git log on github.com/mediar-ai/fazm
THE PROBE
60 lines that most agents skip
The AX permission on macOS 26 (Tahoe) is unreliable in a specific way: the per-process cache behind AXIsProcessTrusted() can return the wrong answer after a TCC change until the next launch. Fazm's AppState.swift does not just call AXIsProcessTrusted. It runs three probes and crosses them: a real AX call, a fallback AX call against Finder, and a CGEvent tap as a tie-breaker that reads the live TCC database. That is what lets the app-scoped capture path trust its AX check.
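The cascade's control flow, sketched. This condensed version is not the repo's ~60 lines: it shows only the first probe and how its results are routed, with the Finder retry and the CGEvent tap reduced to comments.

```swift
import ApplicationServices

/// Sketch of a three-way AX permission check for a platform where
/// AXIsProcessTrusted() can cache a stale answer after a TCC change.
enum AXStatus { case granted, denied, ambiguous }

func checkAccessibility(frontmostPID: pid_t) -> AXStatus {
    // Probe 1: a real AX call against the frontmost app, not a
    // cached trust flag.
    var value: CFTypeRef?
    let app = AXUIElementCreateApplication(frontmostPID)
    let err = AXUIElementCopyAttributeValue(
        app, kAXFocusedWindowAttribute as CFString, &value)

    switch err {
    case .success:     return .granted
    case .apiDisabled: return .denied      // permission definitively off
    case .cannotComplete:
        // Ambiguous: the app may simply be AX-hostile (Qt, Electron).
        // Probe 2: retry the same call against Finder, which is
        //          AX-reliable; a second failure means truly broken.
        // Probe 3: if Finder isn't running, a CGEvent tap reads the
        //          live TCC state as the tie-breaker.
        return .ambiguous
    default:           return .ambiguous
    }
}
```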
THE NUMBERS
The two-day window, quantified
66 commits on 2026-04-13, spine commit f25272f1
60 commits on 2026-04-14, attachments and history resume
~60 lines of AX-permission probe logic no roundup mentions
WHAT OTHER ANNOUNCEMENTS MISS
Five architectural details the SERP doesn't surface
App selection is done by the OS, not the model
NSWorkspace.frontmostApplication gives a deterministic PID. The model never has to guess which app the user meant. That removes a whole class of agent error before the first token is generated.
The AX call is a liveness probe
testAccessibilityPermission() does a real AXUIElementCopyAttributeValue, not just AXIsProcessTrusted. It catches stale TCC state on macOS 26.
cannotComplete is treated as ambiguous
Because Qt/Electron under-implement AX, a single failure is not proof the permission is broken. Finder retry and CGEvent tap disambiguate.
Capture flags are not the defaults
.optionIncludingWindow + a specific windowID is the difference between 'this one app' and 'everything on screen.' .boundsIgnoreFraming includes the window shadow so the model gets spatial context, not a rectangular crop.
Downscale is adaptive, not fixed
1568px longest edge is only the first cut. JPEG quality strides from 0.7 down to 0.3 until the file is under 3.5 MB. That means a text-heavy window ships near lossless, and a busy photo window still clears the 5 MB base64 ceiling on the upload API.
Each detached window has its own PID watcher
Per-window lastActiveAppPID is why two concurrent chats don't race; it landed in April 13's push.
WHO IS FAZM FOR
Consumer app, not developer framework
One more thing the "announcement roundups" mostly miss: most of what they cite is a framework, SDK, or hosted model drop. Useful for builders, irrelevant for the Mac user who just wants an AI that sees what they're looking at. Fazm ships as a signed DMG. Download. Drag. Grant two permissions. The AX probe and the app-scoped capture described above are what actually runs the moment you hit the floating bar.
Want to see the app-scoped capture path live?
Book 15 minutes. I'll share a screen, open Fazm, and walk the NSWorkspace + AXUIElement chain against whatever app you pick. No slides, just the capture in your own session.
Book a call →
Questions about the April 13-14 window
What counts as an 'open source AI announcement' on April 13-14, 2026?
In the SERP sense, the word is used loosely. The roundups pile three distinct things under it: product launch posts (blogs, X threads), changelogs (tagged releases, CHANGELOG entries), and model drops (weights on Hugging Face, new cards). All three are announcements-as-copy. What they miss is announcements-as-code: the much larger population of projects that shipped meaningful work those two days via plain commits, with no accompanying post. Fazm (github.com/mediar-ai/fazm) is one of those projects. It pushed 126 commits across April 13 and April 14, 2026. No blog post went out. The architectural statement is the diff itself.
What is the specific architectural statement Fazm shipped on April 13-14?
That an AI agent on a Mac does not need to OCR the whole screen to know what the user is looking at. The focused app already exposes its structure through the operating system, and capture can be scoped to that window. Concretely: NSWorkspace tells you which app is frontmost, AXUIElement confirms the focused window is reachable, and CGWindowListCreateImage captures only that window (not the whole display). The April 14 push extended this to detached chat windows and file attachments so the same app-scoped context flows through popouts. This is the opposite of the default computer-use agent assumption, which is that the display is a black box and the only API is pixels.
Where exactly does this live in the code?
Three files. First, Sources/FloatingControlBar/FloatingControlBarWindow.swift lines 1030-1073: a workspace observer on NSWorkspace.didActivateApplicationNotification stashes lastActiveAppPID, and captureScreenshotEarly() dispatches to ScreenCaptureManager.captureAppWindow(pid:) before Fazm's own floating bar can cover the app underneath. Second, Sources/FloatingControlBar/ScreenCaptureManager.swift lines 14-44: CGWindowListCreateImage is called with .optionIncludingWindow and a specific windowID, not .optionOnScreenOnly with kCGNullWindowID. That flag is the difference between 'one app' and 'everything'. Third, Sources/AppState.swift lines 433-485: testAccessibilityPermission() calls AXUIElementCopyAttributeValue with kAXFocusedWindowAttribute as a live probe of the AX API, and confirmAccessibilityBrokenViaFinder() disambiguates cannotComplete errors by retrying against Finder.
How is that different from 'just take a screenshot'?
A screenshot of the whole display gives the model a pile of pixels with no structure: a menu bar, a dock, the wrong app's window because the user happened to be hovering it, background apps, notifications, a second monitor. The model then OCRs, guesses, and hallucinates. Fazm's chain trims the problem before the image is even taken. NSWorkspace.frontmostApplication gives a PID. AXUIElementCreateApplication(pid) proves the app is AX-reachable. CGWindowListCopyWindowInfo with a PID filter picks the main window for that PID (largest area, skipping toolbars and status items). CGWindowListCreateImage captures just that window with .boundsIgnoreFraming so the shadow comes along for spatial context. The model sees one focused thing.
So it still uses pixels. Is it really 'accessibility API based'?
It uses both, and each for its strength. The Accessibility API selects and probes. CGWindowList captures and downscales. In the capture chain, AX is the router; the pixels are the payload. The product claim is that this is better than OCR-first, not that it never uses pixels. Pure-AX agents exist; they are brittle against apps that under-implement the AX tree (Qt, OpenGL, Electron). Pure-pixel agents exist; they drown in screen chrome. The April 13-14 commits double down on the hybrid: app selection via AX, window capture via CG, then a tight downscale-and-compress path (max 1568px longest edge, JPEG quality stride 0.7 down to 0.3) to keep the uploaded image under 3.5 MB so the downstream model never rejects it for size.
What happens when the Accessibility API cannot give a clean answer?
The code treats 'cannotComplete' as ambiguous and runs a secondary probe. testAccessibilityPermission() (AppState.swift line 433) hits AXUIElementCopyAttributeValue on the frontmost app; if the result is .apiDisabled, the AX permission is stuck and the user is routed to the onboarding path. If the result is .cannotComplete, confirmAccessibilityBrokenViaFinder() (line 468) re-runs the same call against Finder; Finder is AX-reliable, so a second failure means the permission truly is broken, while a Finder success means the original app was AX-hostile and the permission is fine. If Finder is not running, probeAccessibilityViaEventTap() (line 490) falls back to creating a CGEvent tap, which checks the live TCC database on macOS 26 (Tahoe) where AXIsProcessTrusted() can go stale. This is 60 lines of careful disambiguation that most agents skip.
Why was this specific work clustered on April 13-14?
Because April 13 was the 'per-session concurrency' push (commit f25272f1 threads a sessionKey through ChatProvider and the ACP bridge), and the moment detached chat windows can run concurrently, app-scoped capture has to follow each window's own last-active-app instead of a single global. April 14 then extends the same chain: the FloatingControlBar gains drag-and-drop attachments (commit fdf93314), detached windows propagate attachments (2048aa36), and the pasted or dropped file travels alongside the app-scoped window capture as part of the same context payload. It is a two-day arc from single-session to multi-session, capture plus attachments, all using the same app-scoped idea.
Can I verify the commit count and dates myself?
Yes. git clone https://github.com/mediar-ai/fazm.git, then git log --pretty=tformat:"%h %ad %s" --date=short --after="2026-04-13 00:00:00" --before="2026-04-14 23:59:59" | wc -l returns 126 (tformat rather than format, because format omits the final newline and makes wc -l read one low). Split by day: 66 commits on 2026-04-13 and 60 on 2026-04-14. Every hash referenced on this page (f25272f1, fdf93314, 2048aa36, 788dfe96, 0845f95f, 66634116) shows up in that list, and every file:line citation (FloatingControlBarWindow.swift:1030-1073, ScreenCaptureManager.swift:14-44, AppState.swift:433-485) resolves in a checkout of the same repo.
Was anything formally announced or tagged on April 13-14?
No. There was no blog post, no X thread, no Hugging Face drop, no GitHub release tag for those dates. The CHANGELOG updates are the code itself. That is deliberate: for a tool this small, a post-per-push would be noise. The argument this page makes is that 'announcement' in the keyword should not be read as 'press release.' Code merging to main under MIT is already an announcement; it just has a narrower audience. Treating merged code as a first-class announcement is how roundups miss most of the actual motion in an ecosystem.
How do I try the app and see the architecture for myself?
Download the signed DMG from fazm.ai. Drag to Applications. Open. Grant Accessibility permission when prompted (that is what enables the AX probe path described above). Then grant Screen Recording (that is what lets CGWindowListCreateImage actually return pixels rather than .permissionDenied). Open any app, click Fazm's floating bar, and ask a question about what's in that window. The capture you are sending is scoped to that app's window only. Pop the chat out; the context still follows the app under your cursor, not Fazm's own window. Drop a PDF onto the floating bar; the attachment rides along with the capture. That whole flow is the April 13-14 shape.