Testing AI Agents with Accessibility APIs Instead of Screenshots
Screenshot-Based Testing Is Fragile
The most common approach to testing AI agents involves taking screenshots and comparing pixels. Did the agent click the right button? Take a screenshot and check. Did it fill in the correct field? Screenshot again.
This breaks constantly. A font change, a color update, a slightly different window size - any of these can cause pixel-based comparisons to fail. You end up spending more time maintaining test fixtures than actually testing agent behavior.
Accessibility APIs Give You Structure
Every macOS application built with standard controls exposes an accessibility tree - a structured hierarchy of UI elements with their roles, labels, values, and states. A button isn't a cluster of pixels. It's an element with role "button", label "Submit", and state "enabled."
When you test against the accessibility tree, you're testing against the semantic structure of the UI, not its visual appearance. Your test says "find the button labeled Submit and verify it was clicked" rather than "compare this region of pixels to a reference image."
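A semantic assertion like that can be sketched in a few lines. The `UIElement` dataclass and `find_element` helper below are illustrative, not a real API; the toy window tree stands in for an actual accessibility tree:

```python
# Minimal sketch of a semantic UI assertion: find by role and label,
# then check state - no pixels involved.
from dataclasses import dataclass, field

@dataclass
class UIElement:
    role: str
    label: str = ""
    enabled: bool = True
    children: list = field(default_factory=list)

def find_element(root, role, label):
    """Depth-first search for the first element matching role and label."""
    if root.role == role and root.label == label:
        return root
    for child in root.children:
        found = find_element(child, role, label)
        if found:
            return found
    return None

# A toy window: the test asserts structure and semantics, not appearance.
window = UIElement("window", "Checkout", children=[
    UIElement("button", "Submit"),
    UIElement("button", "Cancel"),
])

submit = find_element(window, "button", "Submit")
assert submit is not None and submit.enabled
```

The same assertion keeps passing whether the button is blue or gray, left-aligned or centered.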
Tests That Survive Redesigns
This is the real advantage. When the design team updates the app's color scheme, every screenshot-based test breaks. When they move a button from the left sidebar to a top toolbar, every coordinate-based test breaks. But if the button still has the same accessibility label, your accessibility-based tests pass without changes.
The accessibility tree also gives you information screenshots can't - like whether a checkbox is checked, whether a text field is focused, or whether a menu item is disabled. These states are invisible in a static screenshot but explicit in the accessibility data.
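Those states become directly assertable. A hypothetical sketch, with made-up names (`UIState`, `assert_state`) standing in for whatever your harness exposes:

```python
# Sketch: UI states that are invisible in a screenshot (checked, focused,
# enabled) become plain fields you can assert on.
from dataclasses import dataclass

@dataclass
class UIState:
    role: str
    label: str
    checked: bool = False
    focused: bool = False
    enabled: bool = True

def assert_state(el, **expected):
    """Compare each expected attribute against the element's actual state."""
    for attr, want in expected.items():
        got = getattr(el, attr)
        assert got == want, f"{el.label}: {attr}={got!r}, expected {want!r}"

# Did the agent actually check the box? The state says so explicitly.
newsletter = UIState("checkbox", "Subscribe to newsletter", checked=True)
assert_state(newsletter, checked=True, enabled=True)
```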
Practical Implementation
On macOS, you query the accessibility tree through the AXUIElement API. You can enumerate all elements in a window, filter by role and label, and verify states programmatically. It's more setup than taking a screenshot, but each test you write is dramatically more durable.
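The enumerate-and-filter pattern looks roughly like this. On macOS the real calls are AXUIElementCreateApplication and AXUIElementCopyAttributeValue with attributes like kAXChildrenAttribute, kAXRoleAttribute, and kAXTitleAttribute; a plain dict tree stands in here so the traversal logic is runnable anywhere:

```python
# Sketch of enumerating an accessibility tree and filtering by role/label.
# The dict tree is a stand-in for real AXUIElement queries.
def walk(element):
    """Yield every element depth first (cf. kAXChildrenAttribute)."""
    yield element
    for child in element.get("children", []):
        yield from walk(child)

def query(root, role=None, label=None):
    """Filter by role/label, like matching kAXRoleAttribute / kAXTitleAttribute."""
    return [el for el in walk(root)
            if (role is None or el.get("role") == role)
            and (label is None or el.get("label") == label)]

window = {"role": "AXWindow", "children": [
    {"role": "AXToolbar", "children": [
        {"role": "AXButton", "label": "Submit", "enabled": True},
    ]},
    {"role": "AXButton", "label": "Cancel", "enabled": True},
]}

assert len(query(window, role="AXButton")) == 2
assert query(window, role="AXButton", label="Submit")[0]["enabled"]
```

Note that this test doesn't care that Submit lives in a toolbar and Cancel doesn't; moving either button leaves both assertions green.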
For AI agent testing specifically, this means you can verify complex multi-step workflows without maintaining a library of reference screenshots that rot every time the UI changes.
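A multi-step check can be expressed as a list of (action, verification) pairs, re-querying UI state after each step. Everything here is hypothetical scaffolding; the dict mutated by the actions simulates what fresh accessibility-tree queries would return in a real harness:

```python
# Sketch: verify a multi-step agent workflow step by step, with semantic
# checks instead of reference screenshots. The state dict simulates the UI.
state = {"field:email": "", "button:Submit": {"enabled": False}, "dialog:Done": False}

def type_email():
    state["field:email"] = "test@example.com"
    state["button:Submit"]["enabled"] = True  # app enables Submit once filled

def click_submit():
    if state["button:Submit"]["enabled"]:
        state["dialog:Done"] = True

steps = [
    (type_email, lambda s: s["field:email"] == "test@example.com"),
    (click_submit, lambda s: s["dialog:Done"]),
]

# Fail fast, naming the exact step that broke.
for i, (action, check) in enumerate(steps, 1):
    action()
    assert check(state), f"step {i} failed verification"
```

When a redesign lands, only the action layer (how to find and drive elements) might need updating; the verification layer keeps describing what the workflow should accomplish.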
Fazm is an open source macOS AI agent, available on GitHub.