Building an MCP Server for macOS Accessibility API Control - Release Notes and Lessons

Fazm Team · 2 min read

We've been building an MCP server that lets AI agents control macOS apps through the accessibility API. Not screenshots. Not pixel matching. Direct interaction with the actual UI elements - buttons, text fields, menus, windows - the same way screen readers work.

Why Accessibility API Over Screenshots

Screenshot-based agents take a picture of your screen, send it to a vision model, and get back coordinates to click. This works but it's slow and fragile. A button that moves 10 pixels breaks the automation. A dark mode switch confuses the model. Resolution changes cause misclicks.

The accessibility API gives you structured data. You get the actual button label, its state (enabled, disabled, checked), its position, and its role in the UI hierarchy. No vision model needed for navigation. No guessing about what's on screen.
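To make that concrete, here's a minimal sketch in Python of the kind of structured record the accessibility tree yields per element. The `UIElement` class and its fields are illustrative stand-ins (the real API hands back `AXUIElement` references with attributes like `AXRole` and `AXTitle`), but the point is the same: the agent reasons over labels, roles, and state rather than pixels.

```python
from dataclasses import dataclass, field

@dataclass
class UIElement:
    """A simplified stand-in for one node of the accessibility tree."""
    role: str                  # e.g. "AXButton", "AXTextField"
    label: str                 # human-readable title, not pixels
    enabled: bool              # state comes directly from the API
    frame: tuple               # (x, y, width, height) if a click is still needed
    children: list = field(default_factory=list)

# An agent can target elements by role and label instead of coordinates,
# so a button that moves 10 pixels or changes theme is still the same button:
save_button = UIElement(role="AXButton", label="Save",
                        enabled=True, frame=(520, 310, 80, 28))
assert save_button.enabled and save_button.label == "Save"
```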

What We Learned Across Releases

v0.1.0 - v0.1.5: Getting the basics working. Initial click, type, and read operations. The biggest challenge was handling apps that don't properly expose their accessibility tree. Some Electron apps are particularly bad at this.

v0.1.6 - v0.1.10: Performance matters. Traversing the full accessibility tree is expensive. We learned to scope queries - ask for elements in a specific window or region instead of dumping the entire tree. Context window overflow was a real problem.
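The scoping idea can be sketched as a bounded traversal. This is illustrative Python over plain dicts, not our actual server code; the `max_depth` and `role_filter` parameters stand in for whatever scoping knobs a real query layer would expose.

```python
def collect_elements(root, max_depth=3, role_filter=None):
    """Breadth-first walk of an element tree that stops at max_depth,
    so a deep hierarchy can't blow up the response (and the agent's
    context window). role_filter keeps only the roles the agent asked for."""
    results = []
    queue = [(root, 0)]
    while queue:
        node, depth = queue.pop(0)
        if role_filter is None or node["role"] in role_filter:
            results.append(node)
        if depth < max_depth:
            queue.extend((child, depth + 1)
                         for child in node.get("children", []))
    return results
```

Scoping a query to one window with `max_depth=2` and `role_filter={"AXButton"}` returns a handful of elements instead of thousands, which is the difference between a usable tool response and a dumped tree.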

v0.1.11 - v0.1.14: Reliability. Edge cases dominate. Dropdown menus that disappear before you can read them. Modal dialogs that block the main window. Apps that change their accessibility labels between versions. Each release fixed a new class of failures.
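One pattern that handles a whole class of those transient failures, such as menus that haven't finished appearing, is polling with a deadline instead of a single read. A hedged Python sketch (the `find` callable and the timing defaults are illustrative, not our exact implementation):

```python
import time

def wait_for_element(find, timeout=2.0, interval=0.05):
    """Poll for a UI element that may not exist yet (or may briefly
    vanish), instead of failing on the first miss. `find` is any
    callable that returns the element, or None if it isn't there."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        element = find()
        if element is not None:
            return element
        time.sleep(interval)
    raise TimeoutError(f"element did not appear within {timeout:.1f}s")
```

Wrapping every read this way costs a few milliseconds in the happy path but turns flaky one-shot lookups into operations that only fail when the element genuinely never shows up.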

The Annoying Parts

Apple's accessibility API documentation is sparse. Some behaviors are undocumented. Different app frameworks - SwiftUI, AppKit, Electron, Qt - expose accessibility data differently. Testing requires actually running the target apps, which makes CI complicated.

Why It's Worth It

Despite the pain, accessibility-based control is fundamentally more reliable than screenshot-based approaches. Once you handle an app's quirks, the automation stays stable across visual changes, theme switches, and minor updates. That stability is what separates a demo from a daily-use tool.

Fazm is an open source macOS AI agent, available on GitHub.
