Best open source AI agents, grouped by what they actually touch
Every page-one result for this query ranks agents by stars or downloads and treats "open source AI agent" as a synonym for "Python framework you wrap an LLM with." That skips the only decision a reader is trying to make: which agent can do the thing I need on my screen. This guide groups the best open source AI agents in 2026 into five surfaces, fills in the category every listicle misses (native desktop apps via accessibility APIs), and points at the exact lines of code where the category is wired up.
The gap on this SERP
I read the top ten results for "best open source AI agents" in April 2026. Nine of them are framework listicles: LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, Dify, Mastra, sometimes Aider and Cline, sometimes AutoGPT for nostalgia. One is a GitHub awesome-list. They rank by stars, downloads, or enterprise adoption, and they answer the question "which Python library has the biggest community."
The question almost nobody in those lists answers is "which open source agent can click a button inside Calendar, or send a WhatsApp message, or rename a Finder file, without me writing Python." That category exists. It is dominated by MCP servers that wrap the macOS accessibility APIs (AXUIElement, kAXFocusedWindowAttribute, kAXRoleAttribute). Those servers live on GitHub, they are open source, and they just do not show up when a listicle filters by "agent framework."
The rest of this page is that list, grouped by surface, with the missing category put back in its place.
Five surfaces, five categories of open source AI agent
Category 1: text surface
The agent reads prompts and returns text. No file writes, no UI clicks. You build the app around it.
This is the largest category and the one every listicle leads with. If your task is research, summarization, planning, or any form of "give me a better answer than a single LLM call," you want one of these.
LangGraph
Stateful graph-based agent framework inside the LangChain ecosystem. Strongest in enterprise deployments, 34.5M monthly downloads as of early 2026. Pick when you care about controllability, observability, and long-running multi-step workflows.
OpenAI Agents SDK
OpenAI's successor to the Assistants API, open sourced in 2025. Handoffs, guardrails, tracing. Pick when you are locked into the OpenAI stack and want first-party tool-calling semantics.
CrewAI
Role-based multi-agent orchestration. Each 'agent' has a backstory, tools, and a goal, and a 'crew' runs them together. Pick for content pipelines where you want the roles named explicitly.
AutoGen
Microsoft Research's conversational multi-agent framework. Pairs well with AgentChat on top. Pick when you want two or more agents to talk to each other and you want the dialog visible.
Google ADK
Google Agent Development Kit, released in 2025. Python and Java bindings, native Vertex AI integration. Pick if you are on GCP and want to run agents next to Gemini without bespoke glue.
Mastra
TypeScript-first agent framework with native workflows, RAG, and evals. Pick when your stack is Node, your team writes TS, and you do not want to context-switch to Python.
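Every framework in this category wraps the same core loop: accumulate a message history, call the model, feed the reply back in. The sketch below is framework-free and purely illustrative; `fake_llm` is a stand-in for a real provider call (OpenAI, Anthropic, Gemini), not any of the libraries above.

```python
# Minimal text-surface agent loop: prompt in, text out, no side effects.
# `fake_llm` is a stand-in for a real provider call, not a real API.

def fake_llm(messages: list[dict]) -> str:
    """Pretend model: echoes a plan for the last user message."""
    last = messages[-1]["content"]
    return f"PLAN: break '{last}' into steps, answer each, merge."

def run_agent(task: str, max_turns: int = 3) -> list[str]:
    """Accumulate a message history and call the model each turn."""
    messages = [{"role": "system", "content": "You are a planner."}]
    outputs = []
    for _ in range(max_turns):
        messages.append({"role": "user", "content": task})
        reply = fake_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        outputs.append(reply)
        task = "refine the previous plan"  # next-turn prompt
    return outputs

if __name__ == "__main__":
    for line in run_agent("summarize this guide"):
        print(line)
```

What distinguishes LangGraph, CrewAI, and the rest is everything around this loop: state persistence, role assignment, tracing, guardrails. The loop itself is the cheap part.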
Category 2: code surface
The agent edits source files, runs a terminal, reads a repo. It lives inside a code editor or a shell.
This is the second-most-covered category on the SERP, and the one where "open source" actually matters on a day-to-day basis. You want the diffs visible, the command approvals visible, and the model swap trivial.
Aider
Terminal-first coding agent with deep git integration. Multi-file edits, repo-map, any LLM. Pick when you want to drive an LLM from your shell and land commits, not chat windows.
Cline
VS Code extension. Plans multi-step tasks, edits files, runs terminal commands, confirms each step. The strongest choice for users who want the agent inside their editor, not outside it.
Continue
Open source autopilot for JetBrains and VS Code. Custom commands, context providers, local-model support. Pick when you already use Continue as autocomplete and want the agent in the same extension.
OpenHands
Full-repo coding agent with a sandboxed browser and editor. More heavyweight than Aider; pick for benchmark-shaped tasks (SWE-bench, HumanEval) where you want a boxed-in environment.
Kilo Code
Fork of Roo Code with multi-model routing. Pick for teams that want a Cline-like experience but with model A/B routing baked into the extension.
Claude Code (via ACP)
Anthropic's coding agent, open protocol, runs locally, speaks ACP. Not a listicle favorite because it is Claude-only, but the protocol itself is open, which is what Fazm, Zed, and a growing number of clients embed.
Category 3: browser surface
The agent opens a browser, navigates a DOM, clicks, fills forms, extracts text. Playwright or Chromium under the hood.
Every serious open source agent eventually picks up a browser. What varies is whether the browser is a fresh Chromium in a sandbox, your real Chrome with your real cookies, or a CDP connection to whatever already runs.
browser-use
Python library that wraps Playwright in an agent loop: the agent sees a DOM-derived structured snapshot, picks an element, clicks. Strongest open source browser agent by raw pull: clean abstractions, no vendor lock-in.
Playwright MCP
Microsoft's MCP server for Playwright. Exposes a browser surface to any MCP-speaking client. This is what Fazm ships at runtime in the 'playwright' entry of BUILTIN_MCP_NAMES.
Stagehand
Higher-level TypeScript library from Browserbase. Mixes natural-language instructions with deterministic Playwright calls. Pick when you want the LLM to handle fuzzy steps and the code to handle asserts.
Skyvern
Self-hosted workflow runner built on Playwright plus vision. Good for repeatable automations like invoice extraction; less useful for open-ended chat.
LaVague
Open source 'large action model' framework focused on browser tasks. Fits the same slot as browser-use but prefers a retrieval-augmented action pipeline.
playwright-extension (Chrome)
The pattern, not a project: MCP bridge that attaches to the user's real running Chrome so saved logins and cookies are already present. Fazm uses this pattern so Cloudflare Turnstile and OAuth consent flows pass through cleanly.
Category 4: native desktop via accessibility APIs
The agent opens a real app (Calendar, Messages, Finder, Notes, Figma, anything) and clicks or types inside it through the macOS accessibility tree.
This is the category every listicle skips. The projects below all exist, are on GitHub, are open source, and none of them are on a single "best open source AI agents" listicle I could find on page one of Google in April 2026. They do not fit the "Python framework" shape, so they get filtered out.
Anchor fact
Fazm hardcodes five bundled MCP servers in acp-bridge/src/index.ts line 1266, and ships mcp-server-macos-use as a native Mach-O binary at Contents/MacOS/mcp-server-macos-use inside the signed .app. The read path that binary uses, AXUIElementCreateApplication + kAXFocusedWindowAttribute, is visible in Desktop/Sources/AppState.swift line 441.
macos-use (mediar-ai)
MCP server in Swift that drives any macOS app through AXUIElement. Open source at github.com/mediar-ai/mcp-server-macos-use. This is the binary Fazm ships bundled; you can also wire it into Claude Desktop, Cline, or Zed directly.
MacOS-MCP (CursorTouch)
Lightweight MCP server for computer use on macOS. Covers file navigation, app control, UI interaction, and browser automation. Strong alternative when you want a minimal footprint.
macos-automator-mcp
steipete's server that runs AppleScript and JXA from an MCP client. Different slot: when you want to drive scripted Apple APIs (Mail rules, Music, Shortcuts) rather than hit buttons through AX.
mcp-remote-macos-use
Full remote Mac control via MCP, no extra API key. Useful when you want one agent to drive a Mac that is not the one you are currently on.
iMCP
mattt's macOS app that exposes Messages, Contacts, Reminders, and other first-party data as an MCP server. Narrower than macos-use; pick when you want Apple-ecosystem data rather than click-level control.
AutoMac MCP
Full-stack UI automation MCP server with mouse, keyboard, screen, and window control. Closer to a macro engine; good for deterministic scripts driven by an LLM.
Why this category needs the accessibility tree, not screenshots
A screenshot pipeline looks elegant in a demo. A vision model reads pixels, emits coordinates, the agent clicks. The problem arrives on the second run, when the user is in dark mode, or resized the window, or has a different system font, or moved a toolbar. Pixel coordinates are brittle.
The accessibility tree is not. It is the same hierarchical data structure VoiceOver reads to announce UI aloud: every button has a role, a title, a frame, a parent, a focused state. Agents that read this tree ask for an element by role and title, not by pixel offset, so they keep working when the UI moves.
Fazm probes the tree on the frontmost application to verify accessibility permission is not stale. The same AXUIElement + attribute lookup is what macos-use uses on every tool call.
Category 5: general runtimes
The agent is a self-contained runtime: it picks goals, executes shell commands, writes files, iterates. Surface is "whatever the runtime decides to touch."
This is the oldest open source agent category (AutoGPT kicked it off in 2023). It has matured: Open Interpreter runs code locally, Dify ships a full low-code platform, and AutoGPT itself is still maintained. Use this category when the task does not fit any single surface above.
Open Interpreter
Executes Python, bash, and browser commands directly on your machine. Local-first, conversational, still the cleanest general runtime for power users.
AutoGPT
The original 'agent picks its own subtasks' runtime. Kept maintained into 2026. Better as a reference implementation than as a daily driver.
Dify
Low-code platform for building LLM apps and agents. 129.8k GitHub stars as of early 2026; pick when you want a UI builder and a backend, not a library.
BabyAGI
Task-list-driven agent. Simple enough to read in a single sitting; useful as a starting point for a fork.
Flowise
Drag-and-drop LLM and agent builder. No-code. Pick when you want to hand a non-engineer a way to prototype agents.
AGiXT
Self-hosted agent platform with plugins and memory. Fits between Open Interpreter and Dify in weight.
The desktop category, running live under Fazm
Picking the right one: a decision tree
Do you want to write code?
If yes and you want the agent in your terminal, pick Aider. If yes and you want it in VS Code, pick Cline or Continue. If no, skip this branch entirely.
Do you want text answers, not actions?
Pick a text-surface framework (LangGraph if stateful, CrewAI if multi-role, AutoGen if conversational, Mastra if you write TypeScript). Use it to build your own app.
Do you want to automate a website?
Pick browser-use for Python, Stagehand for TypeScript, or Playwright MCP if you want it wired into an MCP-speaking client (Claude Desktop, Cline, Zed, Fazm).
Do you want to automate a native Mac app?
This is the category every listicle skips. Pick macos-use as the MCP server. Run it standalone with Claude Desktop, or run Fazm to get it bundled with ACP, Claude, accessibility permission plumbing, and 17 preloaded skills.
Do you want the agent to pick its own goals?
Open Interpreter for local-first power use, Dify for a UI-driven platform, AutoGPT if you want to fork the reference implementation.
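The branches above reduce to a lookup table. This is purely illustrative glue, mapping coarse answers to the project names this guide recommends:

```python
# The decision tree above as a function. Illustrative only: it maps
# coarse (surface, detail) answers to the recommendations in this guide.

def pick_agent(surface: str, detail: str = "") -> str:
    table = {
        ("code", "terminal"): "Aider",
        ("code", "editor"): "Cline or Continue",
        ("text", "stateful"): "LangGraph",
        ("text", "multi-role"): "CrewAI",
        ("text", "typescript"): "Mastra",
        ("browser", "python"): "browser-use",
        ("browser", "typescript"): "Stagehand",
        ("browser", "mcp-client"): "Playwright MCP",
        ("desktop", ""): "macos-use (standalone or bundled in Fazm)",
        ("general", ""): "Open Interpreter, Dify, or AutoGPT",
    }
    return table.get((surface, detail), "re-read the decision tree")

print(pick_agent("desktop"))
print(pick_agent("code", "terminal"))
```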
Desktop category vs framework category
Same keyword, different category, wildly different user experience.
| Feature | LangGraph / CrewAI / AutoGen | macos-use + Fazm |
|---|---|---|
| Install step | pip install, set API key, write agent code | Download signed .app, grant accessibility, done |
| Target surface | Whatever you build the tool loop for | Any macOS app that exposes AX (effectively all of them) |
| Click inside Calendar or Messages | No (you ship the integration yourself) | Yes (works out of the box via the accessibility tree) |
| Ship to non-technical user | Requires you to build the UI and runtime | Already a signed consumer app |
| License | Apache 2, MIT, varies | MIT (github.com/mediar-ai/fazm, github.com/mediar-ai/mcp-server-macos-use) |
| Swap in another LLM | Yes, provider-agnostic | Claude by default (ACP seam); fork to change |
| Typical user | Engineer building a product | End user who wants their Mac automated |
What to check before picking an open source agent
- Which surface does it touch (text, code, browser, desktop, general)
- Does it have a permission prompt you can audit
- Is the protocol it speaks open (MCP, ACP, bespoke)
- Can you swap the underlying model without a fork
- Is the read path structured (AX tree, DOM) or pixel-based
- Does it ship as a signed binary, or do you build it
- Where does user data go (local disk, hosted API, both)
“MCP servers bundled inside the signed Fazm .app before the user adds any of their own.”
acp-bridge/src/index.ts, BUILTIN_MCP_NAMES (line 1266)
Every one of these is also usable standalone with any MCP-speaking client.
MCP servers shipped bundled in Fazm
fazm_tools
Internal tool layer dispatched by ChatToolExecutor.swift (execute_sql, capture_screenshot, ask_followup, skills).
playwright
Browser control via Playwright MCP, reused via the real-Chrome extension pattern.
macos-use
Native accessibility-tree automation for any Mac app. Open at github.com/mediar-ai/mcp-server-macos-use.
whatsapp
Controls the WhatsApp Catalyst app through macOS accessibility APIs (search, open chat, send message).
google-workspace
Gmail, Calendar, Drive, Docs through a bundled Python MCP server with its own credentials dir.
~/.fazm/mcp-servers.json
Any additional MCP servers you register yourself. Support for this file shipped in release 2.4.0 on 2026-04-20.
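If you register your own server, the entry lands in that file. The schema below (a command/args/env object per server name) is an assumption modeled on the config shape common to MCP clients, not a documented Fazm format; the server name and package are hypothetical.

```python
# Sketch of registering an extra MCP server in ~/.fazm/mcp-servers.json.
# The schema (command/args/env per server) is an ASSUMPTION modeled on
# common MCP client configs; "my-notes-server" and the package are made up.
import json
from pathlib import Path

config = {
    "my-notes-server": {
        "command": "npx",
        "args": ["-y", "@example/mcp-notes"],
        "env": {"NOTES_DIR": str(Path.home() / "Notes")},
    }
}

path = Path.home() / ".fazm" / "mcp-servers.json"
# path.parent.mkdir(parents=True, exist_ok=True)  # uncomment to write
# path.write_text(json.dumps(config, indent=2))
print(json.dumps(config, indent=2))
```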
So which is the best open source AI agent
There is no single answer, and every listicle that pretends otherwise is optimizing for a ranking rather than a reader. The best open source AI agent is whichever one already speaks the surface your task lives on. LangGraph for text-shaped work. Aider or Cline for code. browser-use for the web. macos-use for any Mac app. Open Interpreter when the task crosses all four.
If you want one signed consumer app that speaks the last four surfaces together, Fazm is the packaging of that idea. If you want the open source pieces without the packaging, every component in this guide is on GitHub today.
The one thing to walk away with: find the equivalent of the BUILTIN_MCP_NAMES line in acp-bridge/src/index.ts in any agent you evaluate. If there is no equivalent, no hardcoded list of tools the agent can actually touch, you are looking at a library, not an agent.
Talk through which surface your agent needs
Fifteen minutes with the team that ships macos-use and Fazm. We will point you at the right open source starting point for your use case, whether you are writing your own framework or forking ours.
Book a call →
Frequently asked questions
Why group open source AI agents by surface instead of by GitHub stars?
Because stars measure attention, not capability. A 100k-star framework like LangChain does not click buttons in Calendar on its own, and a 2k-star project like macos-use can drive any Mac app through the accessibility tree. Those two things solve completely different problems. Ranking them on the same list, sorted by stars, buries the decision a reader is actually trying to make: what do I want this agent to touch. Group first by surface (text, code, browser, native desktop, general runtime), then pick the strongest projects inside each surface.
What is the 'native desktop via accessibility APIs' category, and which open source projects live in it?
An agent in this category opens a real app on your Mac (Finder, Calendar, Messages, Notes, Xcode, Figma, Notion, anything), reads its accessibility tree with AXUIElement calls, and clicks, types, or scrolls elements the same way VoiceOver would announce them. The open source projects that do this in 2026 are macos-use (github.com/mediar-ai/mcp-server-macos-use), MacOS-MCP by CursorTouch, macos-automator-mcp by steipete, mcp-remote-macos-use by baryhuang, iMCP by mattt, and AutoMac MCP. None of them appear on top 'best open source AI agents' listicles because those listicles index on Python frameworks, not MCP servers that touch native apps.
Where does Fazm fit in this list?
Fazm is not a Python framework, so it does not show up on the usual listicles either. It is an MIT-licensed consumer macOS app (github.com/mediar-ai/fazm) that bundles five MCP servers hardcoded in acp-bridge/src/index.ts line 1266: fazm_tools, playwright, macos-use, whatsapp, and google-workspace. The macos-use binary is shipped as a native Mach-O at Contents/MacOS/mcp-server-macos-use inside the signed .app, so the 'native desktop via accessibility' surface works on first launch without the user installing a Python agent, writing a tool loop, or configuring a client. It is the clearest packaging of the underrepresented category in this guide.
Is macos-use open source, and can I use it without Fazm?
Yes. macos-use is open source at github.com/mediar-ai/mcp-server-macos-use under the same org that publishes Fazm. It is a standalone MCP server. You can wire it into Claude Desktop, Cline, Zed, or anything that speaks MCP; Fazm just happens to be the client that bundles it end to end. Using it outside Fazm means you handle accessibility permissions yourself and you host your own agent loop; using it inside Fazm means the accessibility permission flow, AX retry logic, ACP bridge, and Claude agent are already wired.
What is the difference between reading the accessibility tree and taking a screenshot?
The accessibility tree is structured data: element role, title, value, focused state, frame, children. A screenshot is pixels. Passing the tree to an LLM costs a few hundred tokens per window and survives dark mode, high-DPI, and theme changes, because the app reports its own semantics. Passing a screenshot costs an image budget (and for Claude the image is downscaled to 1568 pixels on the longest edge before upload; see ScreenCaptureManager.swift line 153 in Fazm) and then relies on vision OCR. For most desktop automation, the tree wins on latency, cost, and reliability. Fazm still captures screenshots when visual reasoning is required (a PDF figure, a design canvas), but the primary read path is AX.
What does 'ACP' mean in Fazm's architecture, and why does it matter for this list?
ACP is the Agent Client Protocol, the same JSON-RPC protocol Zed's agent panel speaks. Fazm's acp-bridge/ folder contains a TypeScript process that launches per chat session, proxies messages between the Swift app and the Claude Code agent, and exposes MCP servers to it. The 2.4.0 release notes list 'Upgraded Claude agent protocol to v0.29.2' on 2026-04-20. It matters for this guide because any open source ACP-speaking agent (Claude Code, and by extension future ACP implementations) can reuse the same MCP server layer. Picking an open source agent is less about which framework and more about which protocol seam you are willing to commit to.
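Under the hood, an ACP exchange is JSON-RPC 2.0 framed over a local pipe. The envelope below is the standard JSON-RPC shape; the method name `session/prompt` is illustrative rather than a verified ACP method:

```python
# Sketch of the JSON-RPC 2.0 envelope an ACP client and agent exchange.
# "session/prompt" is illustrative, not a verified ACP method name.
import json

def jsonrpc_request(req_id: int, method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    })

msg = jsonrpc_request(1, "session/prompt", {"text": "rename the file"})
parsed = json.loads(msg)
print(parsed["method"], parsed["id"])
```

Because the framing is plain JSON-RPC, swapping the agent behind the bridge is a transport detail, which is the point of committing to a protocol seam rather than a framework.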
Which agents are safe to run with broad filesystem and app control?
In practice, only the ones that confirm destructive actions with you before running them. Cline gates terminal commands on user permission per step. Fazm gates macos-use tool calls through the standard macOS accessibility and screen recording prompts, and routes write-intent SQL through ChatToolExecutor.swift where the prompt explicitly requires confirmation-style phrasing. Open source agents that silently fan out to every tool they can discover are fine for read-heavy research but risky for anything that types into a user document. The AppState.swift code that detects stuck accessibility permission (lines 431 to 504 of Desktop/Sources/AppState.swift) is an example of the kind of defensive plumbing a consumer agent needs that most library-level frameworks leave to you.
I want to build my own. Where do I fork from?
Start from the surface you care about. For text-only agents (research, writing, summarization) fork LangGraph or OpenAI Agents SDK and skip the UI work. For code, fork Aider or Cline. For browser, fork browser-use. For native desktop on Mac, fork macos-use as the MCP server and Fazm as the client shell: you get the ACP bridge, the accessibility permission loop, the five bundled MCP servers, the seventeen bundled skills under Desktop/Sources/BundledSkills/, and a signed build pipeline via Codemagic. The first three categories expect you to build the app on top; the fourth category is the only one where the open source repo already is the app.