
Using MCP to Let AI Agents Control macOS via Accessibility APIs

Fazm Team · 2 min read
mcp · macos · accessibility · ghost-os · automation

The Model Context Protocol gives AI agents a standardized way to use tools. On macOS, the most powerful tool available is the accessibility API: the same system that powers VoiceOver and other assistive technologies. Wrap that API in an MCP server and any LLM-based agent can control your entire desktop.

How It Works

An MCP server exposes macOS accessibility functions as callable tools. The agent can query the UI tree of any running application, read element properties like labels and roles, click buttons, type into fields, select menu items, and navigate between windows. All through structured API calls, not pixel-level interactions.
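To make the shape of this concrete, here is a minimal sketch of the tool descriptors such a server might advertise in its `tools/list` response. The tool names, descriptions, and schemas are illustrative assumptions, not Fazm's actual API:

```python
# Hypothetical accessibility tools an MCP server could expose.
# Names and schemas are illustrative, not the real Fazm tool set.
ACCESSIBILITY_TOOLS = [
    {
        "name": "query_ui_tree",
        "description": "Return the accessibility tree of a running app",
        "inputSchema": {
            "type": "object",
            "properties": {"bundle_id": {"type": "string"}},
            "required": ["bundle_id"],
        },
    },
    {
        "name": "click_element",
        "description": "Perform a press action on a UI element",
        "inputSchema": {
            "type": "object",
            "properties": {"element_id": {"type": "string"}},
            "required": ["element_id"],
        },
    },
    {
        "name": "type_text",
        "description": "Type text into a text field",
        "inputSchema": {
            "type": "object",
            "properties": {
                "element_id": {"type": "string"},
                "text": {"type": "string"},
            },
            "required": ["element_id", "text"],
        },
    },
]

def list_tools() -> dict:
    """Shape of an MCP tools/list result: a list of tool descriptors."""
    return {"tools": ACCESSIBILITY_TOOLS}
```

Each descriptor gives the model a name, a human-readable description, and a JSON Schema for arguments, which is all the agent needs to decide when and how to call a tool.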

The key advantage is that accessibility APIs provide semantic information. The agent doesn't just see a rectangle at coordinates (340, 220); it sees "Send button, enabled, in Mail compose window." This makes tool selection reliable because the agent understands what elements actually are, not just where they appear on screen.
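A small sketch of what that semantic view looks like as data. The field names loosely mirror AX attributes like AXRole and AXTitle, but this is an illustrative model, not the real accessibility data structure:

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    """Semantic view of a UI element, roughly as the accessibility
    API reports it. Illustrative model, not the actual AX types."""
    role: str      # e.g. "AXButton"
    title: str     # e.g. "Send"
    enabled: bool
    window: str    # title of the containing window

    def describe(self) -> str:
        """Render the element the way an agent would reason about it."""
        state = "enabled" if self.enabled else "disabled"
        kind = self.role.removeprefix("AX").lower()
        return f"{self.title} {kind}, {state}, in {self.window}"

send = UIElement(role="AXButton", title="Send",
                 enabled=True, window="Mail compose window")
print(send.describe())  # Send button, enabled, in Mail compose window
```

Compare that description with a bounding box of raw pixels: the semantic record survives window moves, resizes, and theme changes, which is why structured calls beat pixel-level interaction.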

Voice as the Input Layer

Once the agent can control macOS through MCP, voice becomes the natural input. You speak a command, the agent maps it to a sequence of MCP tool calls, and the tools execute against the accessibility API. The entire chain runs locally - speech-to-text on Apple Silicon, LLM inference for planning, and MCP execution for actions.

This creates a hands-free automation layer for your entire desktop. "Move yesterday's downloads into the project folder" becomes a voice command that triggers Finder operations through accessibility APIs. "Reply to Sarah's last email with the updated timeline" chains Mail operations together.
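In the real system the LLM produces the plan; as a sketch of its output shape, here is a hand-written stand-in that maps one of the commands above to a sequence of MCP tool calls. Tool names and element IDs are hypothetical:

```python
def plan_for(command: str) -> list[dict]:
    """Stand-in for the LLM planner: returns the ordered MCP tool
    calls (name + arguments) the agent would issue. In practice the
    model generates this; tool names and IDs here are illustrative."""
    if command == "Reply to Sarah's last email with the updated timeline":
        return [
            {"name": "query_ui_tree",
             "arguments": {"bundle_id": "com.apple.mail"}},
            {"name": "click_element",
             "arguments": {"element_id": "reply-button"}},
            {"name": "type_text",
             "arguments": {"element_id": "compose-body",
                           "text": "Here's the updated timeline."}},
            {"name": "click_element",
             "arguments": {"element_id": "send-button"}},
        ]
    return []  # unrecognized command: no plan
```

The point is the contract, not the logic: the planner emits structured tool calls, and the MCP server executes each one against the accessibility API.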

Why MCP Matters Here

Without MCP, every agent builds its own tool interface. With MCP, the accessibility layer is a shared resource that any compatible agent can use. Build the server once, and it works with Claude, local models, or whatever comes next. The protocol handles the plumbing so the agent can focus on understanding what you want.

Fazm is an open-source macOS AI agent, available on GitHub.
