What v0.1.14 Taught Us About macOS Accessibility API Automation
Fourteen releases into building an MCP server for macOS accessibility control, the lessons are clear. The hard parts aren't where you'd expect them.
The API Is Deceptively Simple
Apple's accessibility API surface is small. You can query elements, read attributes, perform actions. The basics work in an afternoon. But production reliability takes months. Every app implements accessibility differently. SwiftUI apps expose a clean tree. AppKit apps vary wildly. Electron apps are a grab bag.
What Broke Between Versions
v0.1.8 broke Finder navigation. Apple changed how Finder exposes sidebar items in a minor macOS update. No documentation. No changelog mention. We found out from user bug reports.
v0.1.11 introduced a memory leak: the server retained accessibility element references without ever releasing them. It would grow to 2GB over a few hours of heavy use. Fixed by implementing proper reference counting in the element cache.
v0.1.13 fixed the menu bar race condition. Clicking a menu item sometimes failed because the menu would dismiss before the click registered. The fix was adding a small delay after menu expansion - ugly but necessary.
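The shape of that fix can be sketched as below. This is an illustration, not the server's actual code: `click_menu_item`, `expand_menu`, `click_item`, and the 0.1 second default are all hypothetical stand-ins, since the post only says the delay was "small". The two callables represent whatever accessibility calls open the menu and press the item.

```python
import time
from typing import Callable


def click_menu_item(
    expand_menu: Callable[[], None],
    click_item: Callable[[], bool],
    settle_delay: float = 0.1,  # assumed value; tune per app
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Expand a menu, pause briefly, then click an item inside it.

    The pause gives the menu time to finish opening so the click
    does not land while the menu is still settling.
    """
    expand_menu()
    sleep(settle_delay)
    return click_item()
```

Passing `sleep` as a parameter keeps the delay out of unit tests. A polling loop that waits until the item is actually present would be less fragile than a fixed delay, but it is a different technique from the one described here.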
Lessons for MCP Server Builders
Test with real apps, not synthetic UIs. Your test harness might have perfect accessibility support. Safari, Slack, and VS Code each have their own quirks that only show up in production.
Cache aggressively, invalidate carefully. The accessibility tree changes constantly. Caching element references speeds up traversal but stale references cause crashes. We settled on a 500ms cache TTL as a reasonable default.
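A minimal sketch of a time-based cache with the 500ms default mentioned above. The `TTLCache` class and its method names are illustrative assumptions, not the server's real element cache, and the cached values here are plain Python objects rather than accessibility element references.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Optional, Tuple, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Cache whose entries expire ttl_seconds after they were stored."""

    def __init__(
        self,
        ttl_seconds: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, T]] = {}

    def put(self, key: Hashable, value: T) -> None:
        # Record the insertion time alongside the value.
        self._entries[key] = (self._clock(), value)

    def get(self, key: Hashable) -> Optional[T]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            # Stale: drop it so the caller re-queries the live tree.
            del self._entries[key]
            return None
        return value

    def invalidate(self, key: Optional[Hashable] = None) -> None:
        # Drop one entry, or everything when no key is given.
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)
```

A caller would check `cache.get(key)` first and fall back to a fresh tree query on `None`, then `put` the result. Calling `invalidate()` after any action that changes the UI, such as a click, is one way to keep stale references from surviving the TTL window.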
Error messages matter. When an element can't be found, tell the agent why. "Button not found" is useless. "Button 'Submit' not found in window 'Settings' - window contains: Cancel, Apply, Reset" gives the agent enough context to recover.
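One way to build such a message is sketched below. The function name, its parameters, and the `limit` cap are hypothetical. The cap is an added assumption so that a window with hundreds of elements does not flood the agent's context.

```python
from typing import Sequence


def element_not_found_message(
    role: str,
    title: str,
    window: str,
    available: Sequence[str],
    limit: int = 10,  # assumed cap on how many siblings to list
) -> str:
    """Describe a failed lookup along with what the window does contain."""
    shown = list(available[:limit])
    listing = ", ".join(shown) if shown else "no matching elements"
    if len(available) > limit:
        listing += f", and {len(available) - limit} more"
    return (
        f"{role} '{title}' not found in window '{window}' "
        f"- window contains: {listing}"
    )
```

Calling `element_not_found_message("Button", "Submit", "Settings", ["Cancel", "Apply", "Reset"])` produces the message quoted above.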
Ship often. Each release fixes edge cases you didn't know existed. Users are running your server against apps you've never tested. Frequent releases build trust and surface issues faster.
The State of Desktop Automation
We're still early. The accessibility API is powerful but undersupported by app developers. As more AI agents rely on it, there's pressure for apps to improve their accessibility support - which benefits everyone, not just AI tools.
Fazm is an open source macOS AI agent, available on GitHub.