Building a Universal macOS Automation API
macOS has at least three distinct automation layers, and none of them covers everything:
- AppleScript/JXA - great for app-specific commands but limited to apps that implement scripting support
- Accessibility API - can interact with any UI element but requires understanding the view hierarchy
- Shell commands - powerful for system operations but cannot click buttons or read screen content
Building a useful desktop agent means unifying all three into a single interface that picks the right approach automatically.
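To show how different the entry points are, here is a rough sketch that builds the command line for each scriptable layer. It assumes Python and only constructs argv lists; on macOS you would pass them to `subprocess.run`. The accessibility layer is left out because it has no dedicated command-line entry point: it is the `AXUIElement` C API in the ApplicationServices framework, normally called from Swift, Objective-C, or PyObjC. The `Layer` and `build_command` names are hypothetical.

```python
from enum import Enum


class Layer(Enum):
    APPLESCRIPT = "applescript"
    JXA = "jxa"
    SHELL = "shell"


def build_command(layer: Layer, source: str) -> list[str]:
    """Return the argv that runs `source` through the given layer.

    Nothing is executed here; on macOS, pass the result to subprocess.run().
    """
    if layer is Layer.APPLESCRIPT:
        # osascript interprets -e scripts as AppleScript by default.
        return ["osascript", "-e", source]
    if layer is Layer.JXA:
        # -l picks the OSA language; "JavaScript" selects JXA.
        return ["osascript", "-l", "JavaScript", "-e", source]
    if layer is Layer.SHELL:
        # Run through /bin/sh so globs and pipes behave as in a terminal.
        return ["/bin/sh", "-c", source]
    raise ValueError(f"unsupported layer: {layer}")
```

For example, `build_command(Layer.APPLESCRIPT, 'tell application "Safari" to open location "https://example.com"')` and `build_command(Layer.SHELL, "ls ~/Downloads/*.pdf")` yield two quite different invocations for what a user would think of as equally simple requests.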
Why No Single Layer Works
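AppleScript can tell Safari to open a URL, but Safari's scripting dictionary offers little for working with page content beyond running JavaScript through the "do JavaScript" command, which Safari blocks unless "Allow JavaScript from Apple Events" is enabled. The accessibility API can click any button on screen, but it cannot tell Finder to move a file efficiently. Shell commands can manipulate files and processes, but they are blind to what is on screen.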
Each layer has blind spots that the others cover. A real automation system needs to combine them:
- Use AppleScript when an app has good scripting support (Finder, Mail, Safari, Calendar)
- Fall back to the accessibility API for apps without scripting dictionaries
- Use shell commands for file operations, process management, and system configuration
- Combine layers when a single task spans multiple approaches
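The rules of thumb above can be expressed as a small routing function. This is a minimal sketch: the `Task` shape, the `SCRIPTABLE_APPS` allowlist, and the task-kind strings are all hypothetical, and a real router would more likely detect whether an app is scriptable than hard-code app names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    APPLESCRIPT = "applescript"
    ACCESSIBILITY = "accessibility"
    SHELL = "shell"


# Hypothetical allowlist of apps assumed to have good scripting support.
SCRIPTABLE_APPS = {"Finder", "Mail", "Safari", "Calendar"}

# Hypothetical task kinds that shell commands handle best.
SHELL_KINDS = {"file", "process", "system_config"}


@dataclass
class Task:
    kind: str                  # hypothetical labels, e.g. "file" or "ui"
    app: Optional[str] = None  # target application, if the task has one


def route(task: Task) -> Layer:
    """Pick one layer for a single step, following the rules of thumb."""
    # Files, processes, system configuration: shell is fastest and most reliable.
    if task.kind in SHELL_KINDS:
        return Layer.SHELL
    # Apps with a scripting dictionary: prefer AppleScript.
    if task.app in SCRIPTABLE_APPS:
        return Layer.APPLESCRIPT
    # Any other app: drive its UI through the accessibility API.
    if task.app is not None:
        return Layer.ACCESSIBILITY
    raise ValueError(f"no layer can handle {task!r}")
```

The fourth rule, combining layers, falls out of calling `route` once per step rather than once per request: a task that spans several layers is split into steps, and each step is routed on its own.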
The Unified Interface
The API abstracts away which layer handles each request. When you say "move all PDFs from Downloads to the Documents folder," the system:
- Uses shell commands to find and move the files (fastest, most reliable)
- Uses the accessibility API to refresh the Finder window if one is open
- Uses AppleScript to show a notification when done
The caller does not need to know which layer executed each step. They describe the intent, and the API routes to the right mechanism.
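The PDF example can be sketched as separate steps, each owned by one layer. Two assumptions apply: Python's `shutil` stands in for the shell layer (doing roughly what `mv ~/Downloads/*.pdf ~/Documents/` would), and the Finder-refresh step is omitted because it needs the accessibility API. The notification uses AppleScript's `display notification` command run through `osascript`. The function names are hypothetical.

```python
import shutil
from pathlib import Path


def move_pdfs(src: Path, dst: Path) -> list[Path]:
    """Shell-layer step: move every *.pdf directly inside src into dst."""
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    # Non-recursive, and matches the lowercase ".pdf" extension only.
    for pdf in sorted(src.glob("*.pdf")):
        target = dst / pdf.name
        # On POSIX this may overwrite a same-named file already in dst.
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved


def applescript_string(text: str) -> str:
    """Quote text as an AppleScript string literal (escape \\ and ")."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def notification_command(message: str, title: str) -> list[str]:
    """AppleScript-layer step: argv that posts a macOS notification."""
    script = (
        f"display notification {applescript_string(message)}"
        f" with title {applescript_string(title)}"
    )
    return ["osascript", "-e", script]
```

On macOS, running `subprocess.run(notification_command("Moved 3 PDFs", "Downloads"), check=True)` after `move_pdfs(Path.home() / "Downloads", Path.home() / "Documents")` would complete the shell and AppleScript parts of the request.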
Making It Work for AI Agents
AI agents benefit most from this abstraction. Instead of teaching the model three different automation syntaxes (AppleScript, accessibility element queries, and shell commands), you give it one consistent tool interface. The agent states what it wants to accomplish, and the unified API figures out how.
This is the core architecture behind Fazm's macOS integration. The agent interacts with one API, and the routing layer handles the complexity of choosing between AppleScript, accessibility, and shell operations.
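On the agent side, "one consistent tool interface" can be as small as a single tool definition plus a dispatcher. The schema below is hypothetical and is not Fazm's actual interface; it follows the JSON Schema style that LLM tool-calling APIs generally accept, though the wrapper keys (for example `input_schema` versus `parameters`) differ by provider. The `choose_layer` and `backends` arguments are injected so any routing policy can plug in.

```python
import json
from typing import Any, Callable

# Hypothetical single tool exposed to the model.
AUTOMATE_TOOL = {
    "name": "automate",
    "description": "Perform a macOS task. Describe the intent; the API "
                   "chooses AppleScript, accessibility, or shell.",
    "input_schema": {
        "type": "object",
        "properties": {
            "intent": {"type": "string", "description": "What to accomplish"},
            "app": {"type": "string", "description": "Target app, if any"},
        },
        "required": ["intent"],
    },
}


def handle_tool_call(
    arguments_json: str,
    backends: dict[str, Callable[[dict], Any]],
    choose_layer: Callable[[dict], str],
) -> dict:
    """Parse the model's arguments, route them, and run one backend."""
    args = json.loads(arguments_json)
    required = AUTOMATE_TOOL["input_schema"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        return {"ok": False, "error": f"missing fields: {missing}"}
    layer = choose_layer(args)
    handler = backends.get(layer)
    if handler is None:
        return {"ok": False, "error": f"no backend for layer {layer!r}"}
    # The model never names a layer; it is reported back only for logging.
    return {"ok": True, "layer": layer, "result": handler(args)}
```

Keeping the model's view to one `intent` string means new layers or better routing can be added without changing the prompt or the tool definition.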
Fazm is an open source macOS AI agent, available on GitHub.