The 14-tool catalog of a shipped open source AI agent
Search "open source ai agent tools" and you get a wall of framework roundups. LangGraph. CrewAI. AutoGen. OpenAI Agents SDK. Every one of them is scaffolding for the agent you are supposed to build yourself. This guide is different. It walks the actual tool catalog that Fazm (an MIT-licensed consumer Mac agent) ships with: the ALL_TOOLS array in acp-bridge/src/fazm-tools-stdio.ts, lines 279 to 503, with every tool named, described, and cross-linked to its Swift executor.
“The entire tool surface of a shipped, consumer-facing open source AI agent fits in one TypeScript array, ALL_TOOLS in acp-bridge/src/fazm-tools-stdio.ts (lines 279-503), with 14 entries: execute_sql, capture_screenshot, check_permission_status, request_permission, extract_browser_profile, edit_browser_profile, query_browser_profile, scan_files, set_user_preferences, ask_followup, complete_onboarding, save_knowledge_graph, save_observer_card, speak_response. Plus one bundled MCP binary at /Contents/MacOS/mcp-server-macos-use for Accessibility-tree click and type on any other Mac app.”
github.com/mediar-ai/fazm, MIT license, verified 2026-04-19
What the top SERP actually ships
Every top-10 result for this keyword is either a framework or a framework roundup. The category is missing the thing an end user can download and use.
The entire tool catalog, in one array
This is the declaration block every LLM call lands against. Each entry is name, description, inputSchema. No decorators, no metaclass, no graph. When the acp-bridge starts, it filters this array by session type (onboarding, chat observer, regular) and returns what remains to Claude.
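The entry shape the text describes (name, description, inputSchema) and the session-type filter can be sketched in a few lines. The field names follow the article; the example entry's body and the filter helper are illustrative, not copied from Fazm's source.

```typescript
// Sketch of one ALL_TOOLS-style entry. Field names (name, description,
// inputSchema) match the text; the entry body here is illustrative.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema for the tool's arguments
}

const exampleTool: ToolDef = {
  name: "check_permission_status",
  description: "Returns JSON for the five macOS permissions the agent cares about.",
  inputSchema: { type: "object", properties: {}, required: [] },
};

// Filtering by session type, as described above: return only the
// entries whose names a given session is allowed to see.
function filterTools(all: ToolDef[], allowed: Set<string>): ToolDef[] {
  return all.filter((t) => allowed.has(t.name));
}
```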
The 14 tools, one card each
Every card below is one entry in ALL_TOOLS. Read them like a feature matrix; this is the full surface area a Fazm agent can touch on your Mac.
execute_sql: Runs one SELECT, INSERT, UPDATE, or DELETE against the local fazm.db SQLite file. DROP, ALTER, CREATE, PRAGMA, ATTACH, DETACH, VACUUM are blocked. UPDATE and DELETE without WHERE are rejected. Multi-statement queries are rejected.
ChatToolExecutor.swift lines 152-260
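The three guard rules on the card above (keyword blocklist, no UPDATE/DELETE without WHERE, no multi-statement queries) can be sketched as follows. The shipped guard lives in Swift (ChatToolExecutor.swift); this TypeScript version only mirrors the rules as the text states them and is not Fazm's implementation.

```typescript
// Keywords the card says are blocked outright.
const BLOCKED = ["DROP", "ALTER", "CREATE", "PRAGMA", "ATTACH", "DETACH", "VACUUM"];

// Returns a rejection reason, or null if the query passes the guard.
function rejectSql(sql: string): string | null {
  const upper = sql.toUpperCase();
  for (const kw of BLOCKED) {
    if (new RegExp(`\\b${kw}\\b`).test(upper)) return `blocked keyword: ${kw}`;
  }
  // Reject semicolon-joined multi-statement queries (a trailing ; is fine).
  if (sql.trim().replace(/;+\s*$/, "").includes(";")) return "multi-statement query";
  // UPDATE and DELETE must carry a WHERE clause.
  if (/^\s*(UPDATE|DELETE)\b/.test(upper) && !/\bWHERE\b/.test(upper)) {
    return "UPDATE/DELETE without WHERE";
  }
  return null;
}
```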
capture_screenshot: Grabs a JPEG of the whole display or just the frontmost window. Called only when the Accessibility tree cannot answer the agent's question. Explicitly documented as 'the ONLY way to see the user's desktop', not a replacement for widget reads.
fazm-tools-stdio.ts lines 295-314
check_permission_status: Returns JSON for the five macOS permissions the agent cares about: screen_recording, microphone, notifications, accessibility, automation. An LLM can branch on this before attempting a gated call.
fazm-tools-stdio.ts lines 316-324
request_permission: Triggers the real macOS permission dialog for one of the five permissions. Returns granted, pending, or denied. One permission per call.
fazm-tools-stdio.ts lines 325-339
extract_browser_profile: Walks browser autofill, logins, history, bookmarks. Writes a structured identity profile (name, emails, phones, addresses, payment cards last-4, accounts, top tools, contacts) into the local database. 100% on-device, nothing leaves the machine.
fazm-tools-stdio.ts lines 340-348
edit_browser_profile: Deletes or updates a single entry in the browser profile by fuzzy query. Useful when the user asks 'forget that phone number' or 'change the company name'.
fazm-tools-stdio.ts lines 349-361
query_browser_profile: Natural-language search over the extracted profile with optional tag filters (identity, contact_info, account, tool, address, payment, contact, work, knowledge). Auto-triggers extract_browser_profile if the cache is stale.
fazm-tools-stdio.ts lines 362-373
scan_files: Blocking scan of ~/Downloads, ~/Documents, ~/Desktop, ~/Developer, ~/Projects, /Applications. Returns file-type breakdown, project indicators, recent files, installed apps, plus any folders macOS denied access to.
fazm-tools-stdio.ts lines 374-382
set_user_preferences: Single tool for language, name, and TTS voice toggle. Language codes include en, es, ja, ko, ru, zh, fr, de, it, pt, ar, hi, pl, nl.
fazm-tools-stdio.ts lines 383-404
ask_followup: Renders two to four clickable quick-reply buttons in the chat. MUST be the last tool call of a turn; the LLM cannot call another tool or emit text after it. This is a hard rule, not a suggestion.
fazm-tools-stdio.ts lines 405-425
complete_onboarding: Ends the onboarding chat, logs analytics, enables launch-at-login, starts the background services. Called once, at the end of first run.
fazm-tools-stdio.ts lines 427-434
save_knowledge_graph: Persists 15 to 40 nodes and their edges to a local knowledge graph (local_kg_nodes, local_kg_edges). Node types are person, organization, place, thing, concept. Edges are free-form labels like works_on, uses, built_with.
fazm-tools-stdio.ts lines 436-475
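The node types and edge labels on the card above suggest a payload shape like the following. The exact schema is an assumption; only the five node types and the free-form edge labels come from the text.

```typescript
// Illustrative save_knowledge_graph payload. Node types come from the
// card above; the field names (id, label, from, to) are invented here.
type NodeType = "person" | "organization" | "place" | "thing" | "concept";

interface KgNode { id: string; type: NodeType; label: string; }
interface KgEdge { from: string; to: string; label: string; } // free-form, e.g. "works_on"

const payload: { nodes: KgNode[]; edges: KgEdge[] } = {
  nodes: [
    { id: "n1", type: "person", label: "Ada" },
    { id: "n2", type: "concept", label: "SQLite" },
  ],
  edges: [{ from: "n1", to: "n2", label: "uses" }],
};
```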
save_observer_card: Writes a single 'I noticed something' card to the observer feed (types: insight, pattern, skill_created, kg_update). Auto-accepted, user can deny to undo. This is the agent's memory UI, not a raw INSERT.
fazm-tools-stdio.ts lines 477-487
speak_response: Speaks a 1-to-3-sentence TTS summary via Apple's AVSpeechSynthesizer. Gated behind the voice-response toggle. Only called when the conversation language is in the supported list (English, Spanish, French, German, Italian, Dutch, Japanese).
fazm-tools-stdio.ts lines 489-501
The dispatcher is a 14-case switch
On the Swift side, every tool resolves through a single function. No agent graph, no orchestrator. Whatever name comes in, one case handles it. That is the whole interface.
How a user sentence becomes a real action
A Claude agent inside acp-bridge picks one of the 14 tool names, emits tools/call, and Fazm routes that call through the dispatcher and out to macOS.
user intent -> tool dispatcher -> macOS
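The dispatch shape the text describes (one function, one case per tool name, every result a plain string) looks roughly like this. The real dispatcher is the Swift ChatToolExecutor.execute(); this TypeScript sketch only shows the shape, and the handler bodies are placeholders.

```typescript
// One function, one case per name, string in/string out. Placeholder
// handler bodies; not Fazm's implementations.
function execute(name: string, args: Record<string, unknown>): string {
  switch (name) {
    case "execute_sql":
      return `ran: ${String(args.query)}`; // placeholder result
    case "check_permission_status":
      return JSON.stringify({ screen_recording: "granted" }); // placeholder
    // ...one case per remaining tool name...
    default:
      return `unknown tool: ${name}`;
  }
}
```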
The actuator: Accessibility API, not screenshots
The biggest structural choice in Fazm's tool catalog is not in the tool list at all. It is what happens when the LLM says "click Send". Most open-source agents in the SERP answer with a screenshot and a vision model. Fazm answers with a tree of typed widgets.
How the agent clicks a button in another app
A screenshot of the whole display is taken every turn. A vision model returns (x, y) for the Send button. The agent issues a click at (x, y). If the window moved, scaling changed, or the button shifted 6 pixels, the click misses. Retry, re-screenshot, rerun the model.
- 70-150 KB base64 per turn
- Breaks under occlusion and scaling
- Mostly browser-only
- Every turn eats a vision call
AX tree vs. screenshot, line by line
Specific dimensions on which the two actuator strategies diverge. Both are real techniques in open source today. They are not interchangeable.
| Feature | Screenshot-based agents | Fazm (AX-first) |
|---|---|---|
| Default actuator | Pixel coordinate derived from a vision model on a screenshot | AX tree via mcp-server-macos-use (click by role, write by field) |
| Works when the window is behind another | No, the target has to be visible to be captured | Yes, AXUIElement references are window-agnostic |
| Works at 4K, 5K, and weird scaling | Often misses by 5-30 pixels; retries eat tokens | Yes, coordinates do not matter |
| Works for disabled or hidden buttons | Model has to infer it from pixel color | AXEnabled exposes the state directly |
| Cost per action | A full base64 screenshot, often 70-150 KB | A few hundred tokens of AX tree text |
| Works outside the browser | Usually browser only; some frameworks tack on OS drivers | Any running Mac app: Mail, Reminders, Xcode, Figma, Terminal |
| When screenshots still fire | Every turn, because that is the agent | Only when the AX tree does not answer the question |
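The "click by role" column above amounts to resolving a widget in a typed tree instead of guessing a pixel. Here is a sketch of that resolution over a hypothetical node shape; the real server walks AXUIElement references through the macOS Accessibility API, not a TypeScript struct.

```typescript
// Hypothetical widget-tree node mirroring the AX attributes named in
// the table (role, title, enabled state).
interface AxNode {
  role: string;          // e.g. "AXButton"
  title?: string;        // e.g. "Send"
  enabled: boolean;      // mirrors AXEnabled
  children: AxNode[];
}

// Depth-first search for "the AXButton named Send": no coordinates,
// no screenshot, no frontmost-window requirement.
function findByRoleAndTitle(root: AxNode, role: string, title: string): AxNode | null {
  if (root.role === role && root.title === title) return root;
  for (const child of root.children) {
    const hit = findByRoleAndTitle(child, role, title);
    if (hit) return hit;
  }
  return null;
}
```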
The bundled MCP binary that drives other apps
On disk, at /Applications/Fazm.app/Contents/MacOS/mcp-server-macos-use, Fazm ships a standalone MCP server that speaks the Accessibility API. acp-bridge spawns it as a child process and the Claude agent calls it through the mcp__macos-use__* tool namespace.
What the bundled MCP server can do
- Enumerate a running app's widget tree via AXUIElementCreateApplication
- Find widgets by AXRole, AXTitle, AXDescription, AXIdentifier
- AXPress an AXButton / AXCheckBox / AXMenuItem
- Read AXValue, AXSelectedText, AXRole, AXChildren
- Write AXValue into an AXTextField or AXTextArea
- Focus a window, raise it, resize it, move it
- Walk the tree into sub-sheets and sheets-on-sheets
- Return a compact tree snapshot as a .txt file
What one tool call looks like in the log
Tail /tmp/fazm.log while you interact with the agent. You will see the tool name, the argument payload, and the string result come back, exactly as the LLM sees them.
A tool call's lifecycle, six steps
This is the entire pipeline from the user's words to an observable effect on macOS. Every hop is in the public repo.
You hold Option (or type in the floating bar)
Push-to-talk (PTT) flips idle -> listening. For text, the chat input sends directly to ChatProvider. Either way, a user utterance becomes a string.
The string enters acp-bridge
ACPBridge.swift launches Node + acp-bridge (agent-client-protocol) with a Claude credential. The LLM sees the tool list returned by TOOLS in fazm-tools-stdio.ts.
The LLM calls a tool, by name
JSON-RPC tools/call with one of the 14 names. fazm-tools-stdio is the MCP server; it forwards through a bridge pipe named FAZM_BRIDGE_PIPE.
Swift executes it
ChatToolExecutor.execute() dispatches by tool name. SQL runs against fazm.db. capture_screenshot calls SCStreamConfiguration. request_permission hits the macOS TCC API.
macos-use MCP handles widget clicks
For click/type on other apps, the Claude agent calls mcp__macos-use__* tools that the bundled binary /Contents/MacOS/mcp-server-macos-use handles by walking the AX tree. No screenshot in this path.
Results come back as strings
Every tool returns a plain string the LLM reads. Errors, row counts, permission state, file lists. The agent either answers the user or issues another tool call.
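Step 3 above is an MCP tools/call, which is plain JSON-RPC 2.0 on the wire. This is what such a request could look like; the id and argument values are illustrative.

```typescript
// A JSON-RPC 2.0 tools/call request as MCP defines it: method is
// "tools/call", params carry the tool name and its arguments.
const toolsCall = {
  jsonrpc: "2.0",
  id: 7, // illustrative request id
  method: "tools/call",
  params: {
    name: "execute_sql",
    arguments: { query: "SELECT COUNT(*) FROM local_kg_nodes" },
  },
};
```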
Six choices the catalog forces that a framework never would
A shipped tool catalog is opinionated in ways an SDK cannot afford to be. These are decisions a user-facing agent has to make to be useful out of the box.
14 tools, 1 array, 1 file
Every tool the LLM can call is declared in ALL_TOOLS inside acp-bridge/src/fazm-tools-stdio.ts. If you want to know what a Fazm agent can do, open one file. No scattered decorators, no agent-per-tool orchestration.
A Swift switch, not a framework
The Swift side dispatches in ChatToolExecutor.swift with a 14-case switch at lines 60-134. Readable, greppable, testable. No dependency injection container, no agent registry.
SQL is a first-class tool
execute_sql gives the agent read/write access to the local fazm.db SQLite database. Observer activity, chat history, knowledge graph, user settings, file index. The LLM writes the query; the executor runs it; bad keywords are blocked.
Permissions are tools too
check_permission_status and request_permission turn the macOS permission model into two callable tools. The agent can decide when to ask, in-context, instead of hitting a hardcoded modal on startup.
Quick replies are a hard last step
ask_followup is enforced as the last tool call of a turn. No other tool can run after it. This stops the classic 'agent writes a wall of text, then writes more text' failure mode frameworks leave open.
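One way a bridge could enforce the rule above: once ask_followup has fired in a turn, refuse every further tool call. This is a sketch of the enforcement idea, not Fazm's actual code.

```typescript
// Per-turn guard: ask_followup closes the turn for any later tool call.
class TurnGuard {
  private closed = false;

  // Returns true if the tool is allowed to run in this turn.
  allow(toolName: string): boolean {
    if (this.closed) return false;          // nothing runs after ask_followup
    if (toolName === "ask_followup") this.closed = true;
    return true;
  }
}
```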
Voice response is opt-in
speak_response is filtered out of the tool list when voice is disabled. The LLM never sees a tool it cannot call. This is gated at the tool-listing layer, not at the handler.
Framework vs. shipped app, the category split
Framework wins if you are shipping your own agent. Fazm wins if you want to use one today. Both are open source. Neither is a better version of the other.
| Feature | LangGraph / CrewAI / AutoGen / OpenAI Agents SDK | Fazm (consumer app) |
|---|---|---|
| Shape of the deliverable | pip install, npm install, docker compose up, README | Signed .dmg you download and launch |
| Where the tool catalog lives | Whatever you write in your graph/crew/agent config | ALL_TOOLS array, lines 279-503 of one .ts file |
| How a tool is exposed to the LLM | Framework-specific decorator, subclass, or config block | MCP server over JSON-RPC stdio, plus a Swift executor |
| Audience | Developers building their own product | End users. Download, use, done |
| What a new tool costs you | Add an orchestration node, wire it into the flow, ship | Add a block to ALL_TOOLS + a case in ChatToolExecutor |
| License | Usually Apache 2 or MIT, same spirit, different artifact | MIT at github.com/mediar-ai/fazm |
The tool catalog in numbers
- 14 tools in ALL_TOOLS
- 225 lines for the array (279 to 503)
- 1 switch statement that dispatches them
- 0 frameworks required to read it
Try the shipped agent, not another framework
Fazm is the only open source AI agent on this SERP you download as a signed .dmg. The 14-tool catalog described on this page is already wired in. Same MIT license as the frameworks, different artifact entirely.
Download Fazm for Mac →
Open source AI agent tools, answered against the source
What does 'open source ai agent tools' usually mean in the top SERP results, and how is this page different?
The top results for this keyword are framework roundups: LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, Mastra, Dify, plus coding agents like OpenHands, Aider, and Cline. All of them are Python or TypeScript SDKs that a developer assembles into a product. None of them let a non-engineer install a working agent with a tool surface already wired in. Fazm is the other shape: a signed Mac app with 14 tools already declared in one file (acp-bridge/src/fazm-tools-stdio.ts, ALL_TOOLS at lines 279-503) and a Swift dispatcher that executes them (ChatToolExecutor.swift, lines 60-134). This page walks that catalog tool by tool so you can see what a shipped consumer agent's tool surface actually looks like, not another framework tutorial.
What are the exact 14 tools the Fazm LLM can call?
In source order: execute_sql, capture_screenshot, check_permission_status, request_permission, extract_browser_profile, edit_browser_profile, query_browser_profile, scan_files, set_user_preferences, ask_followup, complete_onboarding, save_knowledge_graph, save_observer_card, speak_response. They are declared in acp-bridge/src/fazm-tools-stdio.ts in the ALL_TOOLS array (lines 279-503) and dispatched in Desktop/Sources/Providers/ChatToolExecutor.swift (switch at lines 60-134). On top of that, the bundled MCP binary at /Contents/MacOS/mcp-server-macos-use exposes Accessibility-tree click and type tools (mcp__macos-use__*) that the agent can call to drive any other Mac app.
Why is Accessibility API actuation better than screenshot-based actuation for an AI agent?
Screenshots force the LLM to solve a vision problem every turn (find the pixel, click the pixel) and they break when the target is occluded, offscreen, or rendered at a scaling factor the model was not trained on. Accessibility (AX) actuation treats the app as a tree of typed widgets and asks, 'find the AXButton named Send'. The reference returned is opaque and stable. You do not need coordinates. You do not need the window to be frontmost. You get AXEnabled, AXValue, AXSelectedText for free. Fazm's default path through the macos-use MCP is AX-first; capture_screenshot is explicitly documented as a fallback, not the default pipeline. That is why voice commands like 'mark the first reminder as done' land on the actual AXCheckBox, not a pixel guess.
How does the tool catalog change per session? I heard onboarding has different tools.
Yes. fazm-tools-stdio.ts builds the TOOLS constant by filtering ALL_TOOLS (lines 513-521). Chat observer sessions get only execute_sql, capture_screenshot, query_browser_profile, edit_browser_profile, and save_observer_card (CHAT_OBSERVER_TOOL_NAMES, lines 271-277). Regular floating-bar sessions get the full set minus onboarding-only tools. Onboarding sessions get the full set. speak_response is filtered out entirely unless the voice toggle is on (VOICE_RESPONSE_TOOL_NAMES, line 506). So the tool list the LLM sees is always the intersection of 'what this session needs' and 'what the user enabled'.
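The filtering the answer describes can be sketched as an intersection of the session's allow-list and the user's toggles. The constant names follow the text; the function below is a simplification, not the real TOOLS construction.

```typescript
// Allow-list for chat observer sessions, per the answer above.
const CHAT_OBSERVER_TOOL_NAMES = new Set([
  "execute_sql", "capture_screenshot", "query_browser_profile",
  "edit_browser_profile", "save_observer_card",
]);
// Tools that only exist when the voice toggle is on.
const VOICE_RESPONSE_TOOL_NAMES = new Set(["speak_response"]);

// The tool list a session sees: its allow-list (null = everything)
// minus anything the user has disabled.
function toolsForSession(
  all: string[],
  sessionAllowed: Set<string> | null,
  voiceOn: boolean,
): string[] {
  return all
    .filter((n) => sessionAllowed === null || sessionAllowed.has(n))
    .filter((n) => voiceOn || !VOICE_RESPONSE_TOOL_NAMES.has(n));
}
```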
How does a tool call actually travel from the LLM to macOS?
The Claude agent speaks ACP (agent client protocol) to acp-bridge. acp-bridge spawns fazm-tools-stdio.ts as a subprocess (MCP server over stdio). When Claude emits tools/call, the stdio MCP forwards the call through a named pipe (FAZM_BRIDGE_PIPE env var) back to the parent acp-bridge process, which hands it to the Swift side of the app through WebRelay / ACPBridge. Swift dispatches in ChatToolExecutor.execute() and the return value walks back up the same pipe. For macos-use tools, acp-bridge spawns a separate MCP server, mcp-server-macos-use, that speaks directly to the Accessibility API. None of this is a cloud round trip; only the LLM call itself leaves the machine.
Is execute_sql really safe? Letting an LLM write SQL sounds like a risk surface.
The guard has three layers. First, ChatToolExecutor.swift blocks a keyword list of DROP, ALTER, CREATE, PRAGMA, ATTACH, DETACH, VACUUM outright (lines 152-154). Second, UPDATE and DELETE without a WHERE clause are rejected (lines 241-243). Third, multi-statement queries (semicolon-joined) are rejected (lines 186-191). The agent cannot issue a DROP TABLE even with prompt injection. SELECT is auto-limited to 200 rows. It is local-only, too; the database is fazm.db in Application Support, never synced to a server.
Can I add my own tool to this catalog?
Yes, in two places. Add a block to ALL_TOOLS in acp-bridge/src/fazm-tools-stdio.ts with your tool's name, description, and JSON Schema. Then add a case in the switch in ChatToolExecutor.swift that implements the handler and returns a String. Rebuild and the LLM gets it on the next session. Because ALL_TOOLS is regular TypeScript, not a framework DSL, you can also add input validation, logging, or analytics wrappers in the dispatcher without fighting an orchestration layer.
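The first half of that change, the ALL_TOOLS block, could look like this. The tool name and schema below are invented for illustration; the matching Swift case in ChatToolExecutor.swift is the second half and is not shown here.

```typescript
// Hypothetical new tool entry in the name/description/inputSchema shape
// the answer describes. Everything about this tool is invented.
const myTool = {
  name: "count_screenshots",
  description: "Returns how many screenshots the agent has captured this session.",
  inputSchema: {
    type: "object",
    properties: {
      since: { type: "string", description: "ISO timestamp, optional" },
    },
    required: [],
  },
};
```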
How does this compare to a framework like LangGraph or CrewAI?
Frameworks are libraries you use to build an agent; Fazm is the agent. LangGraph gives you stateful graph orchestration. CrewAI gives you multi-role agents with shared goals. AutoGen gives you a conversation-based agent loop. All three are correct choices if you are shipping your own voice assistant, your own coding agent, or your own research tool. Fazm is open-source source code for a finished consumer product. You can fork it, swap execute_sql for your own SQL engine, or replace capture_screenshot with whisper + diffusion. You cannot fork LangGraph and hand the result to your mom. Both shapes are valid. They are not the same artifact.
Where is the tool catalog in the MIT-licensed source, exactly?
github.com/mediar-ai/fazm. The TypeScript declaration lives in acp-bridge/src/fazm-tools-stdio.ts; ALL_TOOLS starts at line 279 and ends at line 503. The Swift executor lives in Desktop/Sources/Providers/ChatToolExecutor.swift; execute() starts at line 57 and the switch runs lines 60-134. Every file path and line number on this page points to the public repo as of 2026-04-19.
Does the agent ever leave the machine?
Only for two things: the Claude LLM call itself, which is a commercial API round-trip, and Deepgram for speech-to-text when you use voice. All 14 tools run on-device. Browser profile extraction reads local browser SQLite files. execute_sql queries a local fazm.db. capture_screenshot takes a local screen capture. The macos-use MCP is a local binary. Fork the repo, swap the Claude call for Ollama or llama.cpp, and the whole stack is offline.