The 14-tool catalog of a shipped open source AI agent
Search "open source ai agent tools" and you get a wall of framework roundups. LangGraph. CrewAI. AutoGen. OpenAI Agents SDK. Every one of them is scaffolding for the agent you are supposed to build yourself. This guide is different. It walks the actual tool catalog that Fazm (an MIT-licensed consumer Mac agent) ships with: the ALL_TOOLS array in acp-bridge/src/fazm-tools-stdio.ts, lines 279 to 503, with every tool named, described, and cross-linked to its Swift executor.
“The entire tool surface of a shipped, consumer-facing open source AI agent fits in one TypeScript array, ALL_TOOLS in acp-bridge/src/fazm-tools-stdio.ts (lines 279-503), with 14 entries: execute_sql, capture_screenshot, check_permission_status, request_permission, extract_browser_profile, edit_browser_profile, query_browser_profile, scan_files, set_user_preferences, ask_followup, complete_onboarding, save_knowledge_graph, save_observer_card, speak_response. Plus one bundled MCP binary at /Contents/MacOS/mcp-server-macos-use for Accessibility-tree click and type on any other Mac app.”
github.com/mediar-ai/fazm, MIT license, verified 2026-04-19
What the top SERP actually ships
Every top-10 result for this keyword is either a framework or a framework roundup. The category is missing the thing an end user can download and use.
The entire tool catalog, in one array
This is the declaration block every LLM call lands against. Each entry is name, description, inputSchema. No decorators, no metaclass, no graph. When the acp-bridge starts, it filters this array by session type (onboarding, chat observer, regular) and returns what remains to Claude.
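The entry shape the text describes (name, description, inputSchema) and the session-type filter can be sketched in a few lines. The field names follow the article; the example entry's body and the filter helper are illustrative, not copied from Fazm's source.

```typescript
// Sketch of one ALL_TOOLS-style entry. Field names (name, description,
// inputSchema) match the text; the entry body here is illustrative.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the tool's arguments
}

const exampleTool: ToolDef = {
  name: "check_permission_status",
  description: "Returns JSON for the five macOS permissions the agent cares about.",
  inputSchema: { type: "object", properties: {}, required: [] },
};

// Filtering by session type, as described above: return only the
// entries whose names a given session is allowed to see.
function filterTools(all: ToolDef[], allowed: Set<string>): ToolDef[] {
  return all.filter((t) => allowed.has(t.name));
}
```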
The 14 tools, one card each
Every card below is one entry in ALL_TOOLS. Read them like a feature matrix; this is the full surface area a Fazm agent can touch on your Mac.
execute_sql: Runs one SELECT, INSERT, UPDATE, or DELETE against the local fazm.db SQLite file. DROP, ALTER, CREATE, PRAGMA, ATTACH, DETACH, VACUUM are blocked. UPDATE and DELETE without WHERE are rejected. Multi-statement queries are rejected.
ChatToolExecutor.swift lines 152-260
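The three guard rules on the card above (keyword blocklist, no UPDATE/DELETE without WHERE, no multi-statement queries) can be sketched as follows. The shipped guard lives in Swift (ChatToolExecutor.swift); this TypeScript version only mirrors the rules as the text states them and is not Fazm's implementation.

```typescript
// Keywords the card says are blocked outright.
const BLOCKED = ["DROP", "ALTER", "CREATE", "PRAGMA", "ATTACH", "DETACH", "VACUUM"];

// Returns a rejection reason, or null if the query passes the guard.
function rejectSql(sql: string): string | null {
  const upper = sql.toUpperCase();
  for (const kw of BLOCKED) {
    if (new RegExp(`\\b${kw}\\b`).test(upper)) return `blocked keyword: ${kw}`;
  }
  // Reject semicolon-joined multi-statement queries (a trailing ; is fine).
  if (sql.trim().replace(/;+\s*$/, "").includes(";")) return "multi-statement query";
  // UPDATE and DELETE must carry a WHERE clause.
  if (/^\s*(UPDATE|DELETE)\b/.test(upper) && !/\bWHERE\b/.test(upper)) {
    return "UPDATE/DELETE without WHERE";
  }
  return null;
}
```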
capture_screenshot: Grabs a JPEG of the whole display or just the frontmost window. Called only when the Accessibility tree cannot answer the agent's question. Explicitly documented as 'the ONLY way to see the user's desktop', not a replacement for widget reads.
fazm-tools-stdio.ts lines 295-314
check_permission_status: Returns JSON for the five macOS permissions the agent cares about: screen_recording, microphone, notifications, accessibility, automation. An LLM can branch on this before attempting a gated call.
fazm-tools-stdio.ts lines 316-324
request_permission: Triggers the real macOS permission dialog for one of the five permissions. Returns granted, pending, or denied. One permission per call.
fazm-tools-stdio.ts lines 325-339
extract_browser_profile: Walks browser autofill, logins, history, bookmarks. Writes a structured identity profile (name, emails, phones, addresses, payment cards last-4, accounts, top tools, contacts) into the local database. 100% on-device, nothing leaves the machine.
fazm-tools-stdio.ts lines 340-348
edit_browser_profile: Deletes or updates a single entry in the browser profile by fuzzy query. Useful when the user asks 'forget that phone number' or 'change the company name'.
fazm-tools-stdio.ts lines 349-361
query_browser_profile: Natural-language search over the extracted profile with optional tag filters (identity, contact_info, account, tool, address, payment, contact, work, knowledge). Auto-triggers extract_browser_profile if the cache is stale.
fazm-tools-stdio.ts lines 362-373
scan_files: Blocking scan of ~/Downloads, ~/Documents, ~/Desktop, ~/Developer, ~/Projects, /Applications. Returns file-type breakdown, project indicators, recent files, installed apps, plus any folders macOS denied access to.
fazm-tools-stdio.ts lines 374-382
set_user_preferences: Single tool for language, name, and TTS voice toggle. Language codes include en, es, ja, ko, ru, zh, fr, de, it, pt, ar, hi, pl, nl.
fazm-tools-stdio.ts lines 383-404
ask_followup: Renders two to four clickable quick-reply buttons in the chat. MUST be the last tool call of a turn; the LLM cannot call another tool or emit text after it. This is a hard rule, not a suggestion.
fazm-tools-stdio.ts lines 405-425
complete_onboarding: Ends the onboarding chat, logs analytics, enables launch-at-login, starts the background services. Called once, at the end of first run.
fazm-tools-stdio.ts lines 427-434
save_knowledge_graph: Persists 15 to 40 nodes and their edges to a local knowledge graph (local_kg_nodes, local_kg_edges). Node types are person, organization, place, thing, concept. Edges are free-form labels like works_on, uses, built_with.
fazm-tools-stdio.ts lines 436-475
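The node types and edge labels on the card above suggest a payload shape like the following. The exact schema is an assumption; only the five node types and the free-form edge labels come from the text.

```typescript
// Illustrative save_knowledge_graph payload. Node types come from the
// card above; the field names (id, label, from, to) are invented here.
type NodeType = "person" | "organization" | "place" | "thing" | "concept";

interface KgNode { id: string; type: NodeType; label: string; }
interface KgEdge { from: string; to: string; label: string; } // free-form, e.g. "works_on"

const payload: { nodes: KgNode[]; edges: KgEdge[] } = {
  nodes: [
    { id: "n1", type: "person", label: "Ada" },
    { id: "n2", type: "concept", label: "SQLite" },
  ],
  edges: [{ from: "n1", to: "n2", label: "uses" }],
};
```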
save_observer_card: Writes a single 'I noticed something' card to the observer feed (types: insight, pattern, skill_created, kg_update). Auto-accepted, user can deny to undo. This is the agent's memory UI, not a raw INSERT.
fazm-tools-stdio.ts lines 477-487
speak_response: Speaks a 1-to-3-sentence TTS summary via Apple's AVSpeechSynthesizer. Gated behind the voice-response toggle. Only called when the conversation language is in the supported list (English, Spanish, French, German, Italian, Dutch, Japanese).
fazm-tools-stdio.ts lines 489-501
The dispatcher is a 14-case switch
On the Swift side, every tool resolves through a single function. No agent graph, no orchestrator. Whatever name comes in, one case handles it. That is the whole interface.
How a user sentence becomes a real action
A Claude agent inside acp-bridge picks one of the 14 tool names, emits tools/call, and Fazm routes that call through the dispatcher and out to macOS.
user intent -> tool dispatcher -> macOS
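The dispatch shape the text describes (one function, one case per tool name, every result a plain string) looks roughly like this. The real dispatcher is the Swift ChatToolExecutor.execute(); this TypeScript sketch only shows the shape, and the handler bodies are placeholders.

```typescript
// One function, one case per name, string in/string out. Placeholder
// handler bodies; not Fazm's implementations.
function execute(name: string, args: Record<string, unknown>): string {
  switch (name) {
    case "execute_sql":
      return `ran: ${String(args.query)}`; // placeholder result
    case "check_permission_status":
      return JSON.stringify({ screen_recording: "granted" }); // placeholder
    // ...one case per remaining tool name...
    default:
      return `unknown tool: ${name}`;
  }
}
```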
The actuator: Accessibility API, not screenshots
The biggest structural choice in Fazm's tool catalog is not in the tool list at all. It is what happens when the LLM says "click Send". Most open-source agents in the SERP answer with a screenshot and a vision model. Fazm answers with a tree of typed widgets.
How the agent clicks a button in another app
A screenshot of the whole display is taken every turn. A vision model returns (x, y) for the Send button. The agent issues a click at (x, y). If the window moved, scaling changed, or the button shifted 6 pixels, the click misses. Retry, re-screenshot, rerun the model.
- 70-150 KB base64 per turn
- Breaks under occlusion and scaling
- Mostly browser-only
- Every turn eats a vision call
AX tree vs. screenshot, line by line
Specific dimensions on which the two actuator strategies diverge. Both are real techniques in open source today. They are not interchangeable.
| Feature | Screenshot-based agents | Fazm (AX-first) |
|---|---|---|
| Default actuator | Pixel coordinate derived from a vision model on a screenshot | AX tree via mcp-server-macos-use (click by role, write by field) |
| Works when the window is behind another | No, the target has to be visible to be captured | Yes, AXUIElement references are window-agnostic |
| Works at 4K, 5K, and weird scaling | Often misses by 5-30 pixels; retries eat tokens | Yes, coordinates do not matter |
| Works for disabled or hidden buttons | Model has to infer it from pixel color | AXEnabled exposes the state directly |
| Cost per action | A full base64 screenshot, often 70-150 KB | A few hundred tokens of AX tree text |
| Works outside the browser | Usually browser only; some frameworks tack on OS drivers | Any running Mac app: Mail, Reminders, Xcode, Figma, Terminal |
| When screenshots still fire | Every turn, because that is the agent | Only when the AX tree does not answer the question |
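The "click by role" column above amounts to resolving a widget in a typed tree instead of guessing a pixel. Here is a sketch of that resolution over a hypothetical node shape; the real server walks AXUIElement references through the macOS Accessibility API, not a TypeScript struct.

```typescript
// Hypothetical widget-tree node mirroring the AX attributes named in
// the table (role, title, enabled state).
interface AxNode {
  role: string;          // e.g. "AXButton"
  title?: string;        // e.g. "Send"
  enabled: boolean;      // mirrors AXEnabled
  children: AxNode[];
}

// Depth-first search for "the AXButton named Send": no coordinates,
// no screenshot, no frontmost-window requirement.
function findByRoleAndTitle(root: AxNode, role: string, title: string): AxNode | null {
  if (root.role === role && root.title === title) return root;
  for (const child of root.children) {
    const hit = findByRoleAndTitle(child, role, title);
    if (hit) return hit;
  }
  return null;
}
```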
The bundled MCP binary that drives other apps
On disk, at /Applications/Fazm.app/Contents/MacOS/mcp-server-macos-use, Fazm ships a standalone MCP server that speaks the Accessibility API. acp-bridge spawns it as a child process and the Claude agent calls it through the mcp__macos-use__* tool namespace.
What the bundled MCP server can do
- Enumerate a running app's widget tree via AXUIElementCreateApplication
- Find widgets by AXRole, AXTitle, AXDescription, AXIdentifier
- AXPress an AXButton / AXCheckBox / AXMenuItem
- Read AXValue, AXSelectedText, AXRole, AXChildren
- Write AXValue into an AXTextField or AXTextArea
- Focus a window, raise it, resize it, move it
- Walk the tree into sub-sheets and sheets-on-sheets
- Return a compact tree snapshot as a .txt file
What one tool call looks like in the log
Tail /tmp/fazm.log while you interact with the agent. You will see the tool name, the argument payload, and the string result come back, exactly as the LLM sees them.
A tool call's lifecycle, six steps
This is the entire pipeline from the user's words to an observable effect on macOS. Every hop is in the public repo.
You hold Option (or type in the floating bar)
Push-to-talk (PTT) flips idle -> listening. For text, the chat input sends directly to ChatProvider. Either way, a user utterance becomes a string.
The string enters acp-bridge
ACPBridge.swift launches Node + acp-bridge (agent-client-protocol) with a Claude credential. The LLM sees the tool list returned by TOOLS in fazm-tools-stdio.ts.
The LLM calls a tool, by name
JSON-RPC tools/call with one of the 14 names. fazm-tools-stdio is the MCP server; it forwards through a bridge pipe named FAZM_BRIDGE_PIPE.
Swift executes it
ChatToolExecutor.execute() dispatches by tool name. SQL runs against fazm.db. capture_screenshot calls SCStreamConfiguration. request_permission hits the macOS TCC API.
macos-use MCP handles widget clicks
For click/type on other apps, the Claude agent calls mcp__macos-use__* tools that the bundled binary /Contents/MacOS/mcp-server-macos-use handles by walking the AX tree. No screenshot in this path.
Results come back as strings
Every tool returns a plain string the LLM reads. Errors, row counts, permission state, file lists. The agent either answers the user or issues another tool call.
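Step 3 above is an MCP tools/call, which is plain JSON-RPC 2.0 on the wire. This is what such a request could look like; the id and argument values are illustrative.

```typescript
// A JSON-RPC 2.0 tools/call request as MCP defines it: method is
// "tools/call", params carry the tool name and its arguments.
const toolsCall = {
  jsonrpc: "2.0",
  id: 7, // illustrative request id
  method: "tools/call",
  params: {
    name: "execute_sql",
    arguments: { query: "SELECT COUNT(*) FROM local_kg_nodes" },
  },
};
```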
Six choices the catalog forces that a framework never would
A shipped tool catalog is opinionated in ways an SDK cannot afford to be. These are decisions a user-facing agent has to make to be useful out of the box.
14 tools, 1 array, 1 file
Every tool the LLM can call is declared in ALL_TOOLS inside acp-bridge/src/fazm-tools-stdio.ts. If you want to know what a Fazm agent can do, open one file. No scattered decorators, no agent-per-tool orchestration.
A Swift switch, not a framework
The Swift side dispatches in ChatToolExecutor.swift with a 14-case switch at lines 60-134. Readable, greppable, testable. No dependency injection container, no agent registry.
SQL is a first-class tool
execute_sql gives the agent read/write access to the local fazm.db SQLite database. Observer activity, chat history, knowledge graph, user settings, file index. The LLM writes the query; the executor runs it; bad keywords are blocked.
Permissions are tools too
check_permission_status and request_permission turn the macOS permission model into two callable tools. The agent can decide when to ask, in-context, instead of hitting a hardcoded modal on startup.
Quick replies are a hard last step
ask_followup is enforced as the last tool call of a turn. No other tool can run after it. This stops the classic 'agent writes a wall of text, then writes more text' failure mode frameworks leave open.
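One way a bridge could enforce the rule above: once ask_followup has fired in a turn, refuse every further tool call. This is a sketch of the enforcement idea, not Fazm's actual code.

```typescript
// Per-turn guard: ask_followup closes the turn for any later tool call.
class TurnGuard {
  private closed = false;

  // Returns true if the tool is allowed to run in this turn.
  allow(toolName: string): boolean {
    if (this.closed) return false;          // nothing runs after ask_followup
    if (toolName === "ask_followup") this.closed = true;
    return true;
  }
}
```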
Voice response is opt-in
speak_response is filtered out of the tool list when voice is disabled. The LLM never sees a tool it cannot call. This is gated at the tool-listing layer, not at the handler.
Framework vs. shipped app, the category split
Framework wins if you are shipping your own agent. Fazm wins if you want to use one today. Both are open source. Neither is a better version of the other.
| Feature | LangGraph / CrewAI / AutoGen / OpenAI Agents SDK | Fazm (consumer app) |
|---|---|---|
| Shape of the deliverable | pip install, npm install, docker compose up, README | Signed .dmg you download and launch |
| Where the tool catalog lives | Whatever you write in your graph/crew/agent config | ALL_TOOLS array, lines 279-503 of one .ts file |
| How a tool is exposed to the LLM | Framework-specific decorator, subclass, or config block | MCP server over JSON-RPC stdio, plus a Swift executor |
| Audience | Developers building their own product | End users. Download, use, done |
| What a new tool costs you | Add an orchestration node, wire it into the flow, ship | Add a block to ALL_TOOLS + a case in ChatToolExecutor |
| License | Usually Apache 2 or MIT, same spirit, different artifact | MIT at github.com/mediar-ai/fazm |
The tool catalog in numbers
- 14 tools in ALL_TOOLS
- 225 lines for the array (279 to 503)
- 1 switch statement that dispatches them
- 0 frameworks required to read it
Try the shipped agent, not another framework
Fazm is the only open source AI agent on this SERP you download as a signed .dmg. The 14-tool catalog described on this page is already wired in. Same MIT license as the frameworks, different artifact entirely.
Download Fazm for Mac →
Open source AI agent tools, answered against the source
What does 'open source ai agent tools' usually mean in the top SERP results, and how is this page different?
The top results for this keyword are framework roundups: LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, Mastra, Dify, plus coding agents like OpenHands, Aider, and Cline. All of them are Python or TypeScript SDKs that a developer assembles into a product. None of them let a non-engineer install a working agent with a tool surface already wired in. Fazm is the other shape: a signed Mac app with 14 tools already declared in one file (acp-bridge/src/fazm-tools-stdio.ts, ALL_TOOLS at lines 279-503) and a Swift dispatcher that executes them (ChatToolExecutor.swift, lines 60-134). This page walks that catalog tool by tool so you can see what a shipped consumer agent's tool surface actually looks like, not another framework tutorial.
What are the exact 14 tools the Fazm LLM can call?
In source order: execute_sql, capture_screenshot, check_permission_status, request_permission, extract_browser_profile, edit_browser_profile, query_browser_profile, scan_files, set_user_preferences, ask_followup, complete_onboarding, save_knowledge_graph, save_observer_card, speak_response. They are declared in acp-bridge/src/fazm-tools-stdio.ts in the ALL_TOOLS array (lines 279-503) and dispatched in Desktop/Sources/Providers/ChatToolExecutor.swift (switch at lines 60-134). On top of that, the bundled MCP binary at /Contents/MacOS/mcp-server-macos-use exposes Accessibility-tree click and type tools (mcp__macos-use__*) that the agent can call to drive any other Mac app.
Why is Accessibility API actuation better than screenshot-based actuation for an AI agent?
Screenshots force the LLM to solve a vision problem every turn (find the pixel, click the pixel) and they break when the target is occluded, offscreen, or rendered at a scaling factor the model was not trained on. Accessibility (AX) actuation treats the app as a tree of typed widgets and asks, 'find the AXButton named Send'. The reference returned is opaque and stable. You do not need coordinates. You do not need the window to be frontmost. You get AXEnabled, AXValue, AXSelectedText for free. Fazm's default path through the macos-use MCP is AX-first; capture_screenshot is explicitly documented as a fallback, not the default pipeline. That is why voice commands like 'mark the first reminder as done' land on the actual AXCheckBox, not a pixel guess.
How does the tool catalog change per session? I heard onboarding has different tools.
Yes. fazm-tools-stdio.ts builds the TOOLS constant by filtering ALL_TOOLS (lines 513-521). Chat observer sessions get only execute_sql, capture_screenshot, query_browser_profile, edit_browser_profile, and save_observer_card (CHAT_OBSERVER_TOOL_NAMES, lines 271-277). Regular floating-bar sessions get the full set minus onboarding-only tools. Onboarding sessions get the full set. speak_response is filtered out entirely unless the voice toggle is on (VOICE_RESPONSE_TOOL_NAMES, line 506). So the tool list the LLM sees is always the intersection of 'what this session needs' and 'what the user enabled'.
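The filtering the answer describes can be sketched as an intersection of the session's allow-list and the user's toggles. The constant names follow the text; the function below is a simplification, not the real TOOLS construction.

```typescript
// Allow-list for chat observer sessions, per the answer above.
const CHAT_OBSERVER_TOOL_NAMES = new Set([
  "execute_sql", "capture_screenshot", "query_browser_profile",
  "edit_browser_profile", "save_observer_card",
]);
// Tools that only exist when the voice toggle is on.
const VOICE_RESPONSE_TOOL_NAMES = new Set(["speak_response"]);

// The tool list a session sees: its allow-list (null = everything)
// minus anything the user has disabled.
function toolsForSession(
  all: string[],
  sessionAllowed: Set<string> | null,
  voiceOn: boolean,
): string[] {
  return all
    .filter((n) => sessionAllowed === null || sessionAllowed.has(n))
    .filter((n) => voiceOn || !VOICE_RESPONSE_TOOL_NAMES.has(n));
}
```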
How does a tool call actually travel from the LLM to macOS?
The Claude agent speaks ACP (agent client protocol) to acp-bridge. acp-bridge spawns fazm-tools-stdio.ts as a subprocess (MCP server over stdio). When Claude emits tools/call, the stdio MCP forwards the call through a named pipe (FAZM_BRIDGE_PIPE env var) back to the parent acp-bridge process, which hands it to the Swift side of the app through WebRelay / ACPBridge. Swift dispatches in ChatToolExecutor.execute() and the return value walks back up the same pipe. For macos-use tools, acp-bridge spawns a separate MCP server, mcp-server-macos-use, that speaks directly to the Accessibility API. None of this is a cloud round trip; only the LLM call itself leaves the machine.
Is execute_sql really safe? Letting an LLM write SQL sounds like a risk surface.
The guard has three layers. First, ChatToolExecutor.swift blocks a keyword list of DROP, ALTER, CREATE, PRAGMA, ATTACH, DETACH, VACUUM outright (lines 152-154). Second, UPDATE and DELETE without a WHERE clause are rejected (lines 241-243). Third, multi-statement queries (semicolon-joined) are rejected (lines 186-191). The agent cannot issue a DROP TABLE even with prompt injection. SELECT is auto-limited to 200 rows. It is local-only, too; the database is fazm.db in Application Support, never synced to a server.
Can I add my own tool to this catalog?
Yes, in two places. Add a block to ALL_TOOLS in acp-bridge/src/fazm-tools-stdio.ts with your tool's name, description, and JSON Schema. Then add a case in the switch in ChatToolExecutor.swift that implements the handler and returns a String. Rebuild and the LLM gets it on the next session. Because ALL_TOOLS is regular TypeScript, not a framework DSL, you can also add input validation, logging, or analytics wrappers in the dispatcher without fighting an orchestration layer.
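The first half of that change, the ALL_TOOLS block, could look like this. The tool name and schema below are invented for illustration; the matching Swift case in ChatToolExecutor.swift is the second half and is not shown here.

```typescript
// Hypothetical new tool entry in the name/description/inputSchema shape
// the answer describes. Everything about this tool is invented.
const myTool = {
  name: "count_screenshots",
  description: "Returns how many screenshots the agent has captured this session.",
  inputSchema: {
    type: "object",
    properties: {
      since: { type: "string", description: "ISO timestamp, optional" },
    },
    required: [],
  },
};
```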
How does this compare to a framework like LangGraph or CrewAI?
Frameworks are libraries you use to build an agent; Fazm is the agent. LangGraph gives you stateful graph orchestration. CrewAI gives you multi-role agents with shared goals. AutoGen gives you a conversation-based agent loop. All three are correct choices if you are shipping your own voice assistant, your own coding agent, or your own research tool. Fazm is open-source source code for a finished consumer product. You can fork it, swap execute_sql for your own SQL engine, or replace capture_screenshot with whisper + diffusion. You cannot fork LangGraph and hand the result to your mom. Both shapes are valid. They are not the same artifact.
Where is the tool catalog in the MIT-licensed source, exactly?
github.com/mediar-ai/fazm. The TypeScript declaration lives in acp-bridge/src/fazm-tools-stdio.ts; ALL_TOOLS starts at line 279 and ends at line 503. The Swift executor lives in Desktop/Sources/Providers/ChatToolExecutor.swift; execute() starts at line 57 and the switch runs lines 60-134. Every file path and line number on this page points to the public repo as of 2026-04-19.
Does the agent ever leave the machine?
Only for two things: the Claude LLM call itself, which is a commercial API round-trip, and Deepgram for speech-to-text when you use voice. All 14 tools run on-device. Browser profile extraction reads local browser SQLite files. execute_sql queries a local fazm.db. capture_screenshot takes a local screen capture. The macos-use MCP is a local binary. Fork the repo, swap the Claude call for Ollama or llama.cpp, and the whole stack is offline.