Building an AI Personal Assistant That Controls Your Phone and Mac Through Accessibility APIs
The dream of a personal AI assistant is not a chatbot you type to. It is an agent that controls your devices the way you do - opening apps, tapping buttons, filling out forms, and moving between your phone and laptop seamlessly.
On macOS, the accessibility API (AXUIElement) gives you programmatic control over every app's UI elements. You can read button labels, click elements, type into text fields, and navigate menus. This is the same API that screen readers use, which means it works with virtually every application.
The macOS Side
We built our agent on top of the macOS accessibility framework. The agent reads the accessibility tree to understand what is on screen, identifies interactive elements, and executes actions through the same API. No screenshots needed for most tasks - just structured data about every button, field, and label.
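The read-find-act loop above can be sketched in a few lines. This is a minimal illustration of the traversal logic only, with the accessibility tree stubbed as a plain data structure - a real implementation would obtain elements and attributes through AXUIElementCopyAttributeValue and trigger actions with AXUIElementPerformAction. The role names are real AX roles; the UIElement class and press helper are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class UIElement:
    """Stand-in for an AXUIElement: a role, a title, and child elements."""
    role: str                     # e.g. "AXButton", "AXTextField"
    title: str = ""
    children: list["UIElement"] = field(default_factory=list)
    pressed: bool = False

INTERACTIVE_ROLES = {"AXButton", "AXTextField", "AXMenuItem", "AXCheckBox"}

def find_interactive(root: UIElement) -> list[UIElement]:
    """Depth-first walk of the tree, collecting actionable elements."""
    found, stack = [], [root]
    while stack:
        el = stack.pop()
        if el.role in INTERACTIVE_ROLES:
            found.append(el)
        stack.extend(reversed(el.children))
    return found

def press(root: UIElement, title: str) -> bool:
    """Locate a button by title and 'press' it.
    Real code would call AXUIElementPerformAction(el, kAXPressAction)."""
    for el in find_interactive(root):
        if el.role == "AXButton" and el.title == title:
            el.pressed = True
            return True
    return False
```

Because the agent works on this structured tree rather than pixels, "click the Send button" is a lookup plus one API call, not an image-analysis round trip.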
This approach is fast. Reading the accessibility tree takes milliseconds, and clicking a button is effectively instantaneous. Compare that to the screenshot approach, where every action requires capturing an image, sending it to a vision model, waiting for analysis, and then executing. In our experience the accessibility API path is at least 10x faster.
The iPhone Challenge
iOS does not expose accessibility APIs to third-party apps the way macOS does. The workaround involves a combination of approaches - Shortcuts automation, local network communication between your Mac and phone, and in some cases using your Mac as a relay to control the phone through Xcode's Instruments tooling.
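The local-network piece can be as small as an HTTP endpoint on the Mac that an iOS Shortcut calls with its built-in "Get Contents of URL" action. A minimal sketch of that relay, using only the standard library - the port number and the command schema here are assumptions, not the actual protocol:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class RelayHandler(BaseHTTPRequestHandler):
    """Accepts JSON commands posted by an iOS Shortcut on the local network."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length) or b"{}")
        # A real agent would dispatch here - e.g. run a macOS accessibility
        # action, or queue a reply that triggers a Shortcut on the phone.
        reply = {"status": "accepted", "action": command.get("action")}
        body = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def serve(port: int = 8765):
    """Listen for commands from the phone (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), RelayHandler).serve_forever()
```

On the phone, a Shortcut posts `{"action": "book_ride"}` to `http://<mac-hostname>.local:8765` and reads the JSON reply - no App Store app required on either end.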
The iPhone access part is key because so much of daily life happens on mobile. Calendar invites, text messages, ride-hailing, food delivery - these are phone-first tasks that a desktop-only agent cannot reach.
The Cross-Device Vision
The real power comes from an agent that sees both your Mac and your iPhone as one unified workspace. "Book me a ride to the airport" should work regardless of which device has the Uber app open. The agent figures out which device to use and executes accordingly.
We are not fully there yet, but the foundation is accessibility APIs on both platforms, bridged by a coordinator that knows which device can handle each task.
Fazm is an open source macOS AI agent; the code is on GitHub.