An open source AI agent framework that is already a product
Search results for this keyword are mostly libraries: LangGraph, CrewAI, AutoGen, Mastra, VoltAgent, Microsoft Agent Framework. You install them, you write agent code, you host the runtime. Fazm sits in a different slot. It is MIT-licensed, open at github.com/mediar-ai/fazm, and it also happens to be a signed macOS app that reads your screen through the accessibility API and drives five MCP servers out of the box.
The gap on this SERP
Every result on page one of this search is a developer library. You read their docs, you pip install, you write an agent class, you plug in an LLM key, you set up logging, you build a UI on top, you deploy. Reading the code and building a product from it are two different projects.
Fazm inverts the order. The GUI and the framework were written in the same repo, so the framework decisions (which protocol the agent speaks, which tools it gets, how prompts are structured, how permission flows look) were made once and are visible end to end. If you fork, you get the library and the working app. If you only want the app, you never have to look at the framework. Both modes are first class.
How the framework is wired at runtime
The anchor fact: it reads your screen through accessibility, not screenshots
This is the piece of the framework that nothing else on this SERP talks about, because most of them do not ship a client at all. When Fazm needs to know what is on your screen, it does not take a PNG and send it to a vision model. It calls into the macOS accessibility APIs directly; the call lives in Desktop/Sources/AppState.swift around line 439.
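In place of the original listing, here is a simplified sketch of that call and the fallback tiers it feeds. The AX and CGEvent APIs are real macOS APIs, but the function and enum names below are ours for illustration, not the symbols in AppState.swift:

```swift
import AppKit
import ApplicationServices

// Illustrative sketch of the three-tier AX probe; names are ours.
enum AXProbe { case ok(AXUIElement), appIsAXBlind, noPermission }

func probeFocusedWindow(pid: pid_t) -> AXProbe {
    var value: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(
        AXUIElementCreateApplication(pid),
        kAXFocusedWindowAttribute as CFString, &value)
    if err == .success, let window = value {
        return .ok(window as! AXUIElement)          // tier 1: direct AX read
    }
    if err == .cannotComplete {
        // Tier 2: the frontmost app may simply not implement AX
        // (PyMOL, some Qt/OpenGL canvases). Probe Finder, which does.
        if let finder = NSRunningApplication
            .runningApplications(withBundleIdentifier: "com.apple.finder").first {
            var probe: CFTypeRef?
            if AXUIElementCopyAttributeValue(
                AXUIElementCreateApplication(finder.processIdentifier),
                kAXFocusedWindowAttribute as CFString, &probe) == .success {
                return .appIsAXBlind                // permission fine, app is AX-blind
            }
        }
        // Tier 3: a listen-only CGEvent tap is only granted when the live
        // TCC database allows accessibility access, so tap creation itself
        // is the permission check.
        let tap = CGEvent.tapCreate(
            tap: .cgSessionEventTap, place: .headInsertEventTap,
            options: .listenOnly,
            eventsOfInterest: CGEventMask(1 << CGEventType.keyDown.rawValue),
            callback: { _, _, event, _ in Unmanaged.passUnretained(event) },
            userInfo: nil)
        if tap != nil { return .appIsAXBlind }      // permission exists after all
    }
    return .noPermission
}
```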
Note the handling of cannotComplete. That is not cosmetic. Some macOS apps (PyMOL, certain Qt windows, OpenGL canvases) do not implement AX at all. A naive agent framework would conclude its permissions were broken and start a support loop with the user. Fazm instead falls back to probing Finder, and if that also fails, probes a CGEvent tap to check the live TCC database. Those three tiers of retry are what let the AX-first strategy actually work against real-world apps.
The five MCP servers, exactly as wired
Every agent framework has a tool layer. In most of them it is whatever you wire up yourself. In Fazm the tool layer is frozen in source: a single line in the ACP bridge decides which MCPs are considered built-in, and the app bundle ships the binaries next to the Swift app.
fazm_tools
Internal tools the agent uses for framework-level actions: state getters, workspace switching, follow-up routing. Implemented in fazm-tools-http.ts and fazm-tools-stdio.ts inside acp-bridge.
playwright
The official Playwright MCP package, but pinned to your real Chrome window through a Chrome Web Store extension (Playwright MCP Bridge) so logged-in sessions, cookies, and saved passwords come along for the ride.
macos-use
Native Mach-O binary mcp-server-macos-use. Speaks MCP over stdio and exposes the accessibility tree of whatever app is frontmost. This is the AX reader the hero section shows.
whatsapp
Controls the WhatsApp macOS Catalyst app via accessibility, not the web interface. Sends and reads messages without leaving the local machine.
google-workspace
Python MCP server for Gmail, Drive, Docs, Sheets, and Calendar. Shipped inside the bundle so you never run pip install to get it working.
Release 2.4.0 on 2026-04-20 added custom MCP servers through ~/.fazm/mcp-servers.json, so this list is the floor, not the ceiling. Anything you can reach with MCP you can hand to the same agent.
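The exact schema of ~/.fazm/mcp-servers.json is not documented here; assuming it follows the common MCP client convention of mapping server names to launch commands, a custom entry might look like the following (server name, paths, and env var are all hypothetical):

```json
{
  "my-notes-server": {
    "command": "node",
    "args": ["/Users/me/mcp/notes-server.js"],
    "env": { "NOTES_DIR": "/Users/me/Notes" }
  }
}
```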
What the .app actually looks like on disk
One of the clearest ways to see that the framework is real is to unzip the signed build and look at what sits next to the Swift binary; every path is verifiable with a local ls against a mounted Fazm.app.
Counts and paths come from acp-bridge/src/index.ts:1266, the Desktop/Sources/BundledSkills/ directory, and the repo README.
How a message flows through the framework
If you read the repo top-down, every message goes through the same seven stages. The boundaries between stages are real files, not conceptual diagrams, so each one is a place you can fork, patch, or replace.
User input
Voice is captured in FloatingControlBar/PushToTalkManager.swift or typed into AskAIInputView.swift. Voice is transcribed locally via TranscriptionService.swift.
Context gather
AppState.swift computes the frontmost process, its focused window, and optional file context. Screen state is pulled via AXUIElement calls, not pixels.
Prompt assembly
Chat/ChatPrompts.swift injects user name, timezone, bundled python path, goals, tasks, and AI profile into the desktopChat template.
ACP session
Chat/ACPBridge.swift spawns the acp-bridge Node process per session and proxies JSON-RPC messages to the Claude Code agent (protocol v0.29.2 as of 2.4.0).
Tool dispatch
The bridge decides which of the five built-in MCP servers to call; any extra servers from ~/.fazm/mcp-servers.json are also available. ChatToolExecutor.swift renders progress.
Native action
macos-use moves a window, playwright clicks a button in your real Chrome, google-workspace sends a doc edit. All actions originate on your machine.
Streamed reply
FloatingControlBar/AIResponseView.swift renders the streamed answer; optional voice output via the TTS toggle. SessionRecordingManager.swift can log the run for debugging.
Dev framework vs. ready-to-run framework
Everything on this SERP sits somewhere on this axis, and most rows in the table below are about shape, not quality.
| Feature | Dev-first frameworks (LangGraph, CrewAI, AutoGen, Mastra, VoltAgent, MS Agent Framework) | Fazm |
|---|---|---|
| License | MIT or Apache 2.0, all dev libs | MIT, single repo, no carve-outs |
| What you install | pip / npm / dotnet add package | Signed, notarized macOS .app (DMG) |
| Who builds the UI | You do. The framework ships no client. | SwiftUI floating bar + chat, already in the repo |
| Agent protocol | Usually custom per framework | ACP (Agent Client Protocol v0.29.2) |
| Tool layer | BYO. Wrap your own APIs. | 5 MCP servers bundled in the .app, custom MCPs via ~/.fazm/mcp-servers.json |
| Reading the user's screen | Not in scope; framework doesn't see a screen | AXUIElement calls (structured accessibility tree), screenshot fallback only when needed |
| Runs natively on | Any host you deploy to | macOS 14+ Apple Silicon / Intel |
| Forkability | High (it is a library) | High. Swift + Rust + TypeScript all in the same MIT repo |
| Can the end user run it without writing code? | No | Yes, double-click the .app |
What you get if you clone it
This is the part the keyword never spells out. “Open source” should mean you can run the thing. Here is what actually lands on disk when you git clone github.com/mediar-ai/fazm.
In the repo today
- Desktop/ - full SwiftUI macOS app: floating control bar, chat UI, onboarding, paywall, settings, shortcuts, push-to-talk, session recording.
- Backend/ - Rust service (Cargo.toml, Dockerfile, run.sh) that handles server-side tasks.
- acp-bridge/ - TypeScript ACP bridge: fazm-tools-http.ts, fazm-tools-stdio.ts, protocol.ts, ws-relay.ts, oauth-flow.ts, patched-acp-entry.mjs.
- Desktop/Sources/BundledSkills/ - seventeen .skill.md files loaded by the agent at runtime; add your own by dropping more markdown here.
- Desktop/Sources/Chat/ChatPrompts.swift - the actual system prompts, including desktopChat and the onboarding chat templates.
- run.sh / reset-and-run.sh - build the Swift app and launch locally, optionally with onboarding + permissions + UserDefaults reset.
- CHANGELOG.json - machine-readable release notes, useful if you want to track framework-level changes across versions.
Each of these has a direct code path in the repo, not a 'coming soon' note.
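Adding a skill is just dropping a markdown file into Desktop/Sources/BundledSkills/. Assuming the standard Claude Code skill layout (YAML frontmatter with a name and description, then the prompt body), a minimal custom skill might look like this; the skill name and steps are hypothetical:

```markdown
---
name: meeting-notes
description: Summarize the frontmost window into dated meeting notes.
---

When the user asks for meeting notes:
1. Read the focused window through the macos-use MCP server.
2. Extract decisions and action items.
3. Write a dated markdown file into ~/Notes/meetings/.
```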
Apps the bundled agent can drive today
Chrome
Playwright MCP Bridge extension, real session
Finder
AXUIElement navigation, confirmed in tests
WhatsApp
Native Catalyst app via whatsapp-mcp
Gmail
google-workspace MCP
Docs
google-workspace MCP
Sheets
google-workspace MCP
Calendar
google-workspace MCP
Any AX app
macos-use reads any accessible macOS app
“Control browser, write code, handle documents, operate Google Apps, and learn your workflow - all from your voice. Free to start. Fully open source. Fully local.”
Fazm README.md, root of the repo
Where this framework will disappoint you
If you want a pure library to embed into a backend service, Fazm is the wrong shape. It is built around a native macOS runtime, the code path assumes there is a user at a keyboard with AX permission granted, and the packaging story is a signed .app, not a pip wheel. For a pure library use case, LangGraph, CrewAI, Mastra, VoltAgent, or Microsoft Agent Framework are better fits.
Fazm shines when the agent needs to reach into software the user is already logged into (their Chrome session, their Gmail, their WhatsApp, their local files) and does not want to duplicate a second identity stack for the agent. That is the slot on the SERP that no one else is filling right now.
If you are shopping for an open source agent framework
Pick the shape first, then the name. If you need a library to drop into a backend, you want LangGraph-class projects. If you need a 100% MIT-licensed agent that ships as a Mac app and already knows how to read native UI through accessibility, Fazm is the one result on this SERP that matches the description.
Either way, the test is the same: does the thing you install actually do what the keyword promised? Clone the repo, run ./run.sh, grant accessibility permission once, and the rest is already on disk.
Want to see the framework driving your real Mac?
Fifteen minutes, shared screen. We point it at your Chrome, your Gmail, and a macOS app of your choice, and you watch the AX tree do the work.
Book a call →
Frequently asked questions
Is Fazm actually open source, and under what license?
Yes. The full source tree is at github.com/mediar-ai/fazm under the MIT license. The README at the root of the repo ends with the line 'MIT'. That covers the Swift desktop app under Desktop/, the Rust backend under Backend/, the TypeScript ACP bridge under acp-bridge/, the bundled skills under Desktop/Sources/BundledSkills/, and the installer assets. There is no dual-license carve-out and no server-side license variant.
What makes this different from LangGraph, CrewAI, AutoGen, or Microsoft Agent Framework?
Those are developer frameworks: you pip install or npm install them, you wire up your own LLM calls, you write the prompts, you host the runtime, you build the tool layer, and you ship a product on top. Fazm is an agent framework that is already a product. The same repo contains the prompts (Desktop/Sources/Chat/ChatPrompts.swift), the ACP bridge that speaks to Claude (acp-bridge/), the MCP servers for macOS automation (mcp-server-macos-use), browser control (Playwright MCP), Google Workspace, and WhatsApp, plus a signed and notarized macOS .app you can download and run today. You can either use it out of the box, or fork it and rewire any layer.
Why does Fazm read the screen through macOS accessibility APIs instead of taking screenshots?
Because the accessibility tree is structured data and a screenshot is a bag of pixels. AXUIElement, kAXRoleAttribute, kAXFocusedWindowAttribute, and friends expose the exact text, button labels, window hierarchy, and element bounds that the app reports to VoiceOver and other assistive tech. That is cheaper to pass to an LLM, more accurate, survives dark mode and high-DPI scaling, and avoids the fragility of OCR. Fazm still captures screenshots when a user asks for visual reasoning about something pixel-only (a PDF with images, a design canvas), but the primary read path is AX, and the bundled mcp-server-macos-use binary at Contents/MacOS/mcp-server-macos-use is what exposes that read path to the agent.
Which MCP servers are shipped inside the app bundle?
Five, hardcoded in acp-bridge/src/index.ts around line 1266 in the constant BUILTIN_MCP_NAMES. They are fazm_tools (the framework's own internal tools), playwright (browser control via the Playwright MCP Bridge extension), macos-use (the native accessibility-based automation binary), whatsapp (native WhatsApp Catalyst app control), and google-workspace (a bundled Python MCP server). You can add your own on top of these through ~/.fazm/mcp-servers.json, which was added in release 2.4.0 on 2026-04-20.
How does Fazm talk to Claude? Is there a custom protocol?
No. Fazm uses ACP, the Agent Client Protocol, which is the same protocol Zed's agent panel speaks. The repo's acp-bridge/ folder contains a TypeScript process that launches per chat session, proxies JSON-RPC messages between the Swift app and the Claude Code agent, and exposes the bundled MCP servers through it. ChatToolExecutor.swift in the Desktop source dispatches tool calls. Release 2.4.0 notes list 'Upgraded Claude agent protocol to v0.29.2', so ACP is the seam between the app and the model.
Can I run Fazm with a different model than Claude?
The shipping build uses Claude because ACP + Claude Code are what the bridge is built around, and release 2.4.0 added dynamic model discovery so new Claude model versions appear without an app update. The 'built-in' mode (managed agent) is Claude only. Because the source is MIT and the ACP bridge is a separate TypeScript process, swapping in a different ACP-speaking agent is a fork-level change, not a setting.
Is this Mac only? What about Linux and Windows?
Mac only today. The desktop app is SwiftUI, the accessibility reader is built on AXUIElement (Core Foundation), and the bundled macOS-use MCP binary is a native Mach-O executable. A Linux or Windows port would need a new accessibility reader (AT-SPI on Linux, UI Automation on Windows) and a different packaging path. The framework shape, ACP bridge plus MCP servers plus system prompts, would carry over.
How many bundled skills ship inside the app?
Seventeen Claude Code skills, stored as .skill.md files under Desktop/Sources/BundledSkills/. They cover ai-browser-profile, canvas-design, deep-research, doc-coauthoring, docx, find-skills, frontend-design, google-workspace-setup, pdf, pptx, social-autoposter, social-autoposter-setup, telegram, travel-planner, video-edit, web-scraping, and xlsx. These are plain markdown prompts the agent loads on demand, so a fork can add or strip skills by editing that directory.