Testing iOS and Desktop Apps with Accessibility APIs vs Screenshots

A developer recently shared that they let an AI agent autonomously test their iOS app using the accessibility tree approach, and it found real bugs in under 8 minutes. The post sparked a broader conversation about how automated testing should actually perceive and interact with app interfaces. Should you use screenshots and vision models, or should you read the accessibility tree directly? The answer depends on what you are testing, how your components are built, and what trade-offs you are willing to accept. This guide walks through both approaches, covers the gaps in each, and explains how to handle the hardest cases like custom components with no accessibility labels.

~50ms per interaction

Fazm reads the accessibility tree to interact with apps quickly and reliably, instead of relying on fragile screenshot matching. This means consistent results across UI changes and themes.

fazm.ai

1. Two Paradigms for App Testing

Automated app testing has traditionally relied on two different ways of perceiving the interface under test. The first is visual: take a screenshot, compare it to a reference image, or feed it to a vision model that interprets what is on screen. The second is structural: read the UI hierarchy (the accessibility tree on iOS, macOS, or Windows) and interact with named elements programmatically.

Both approaches have been around for years. XCTest on iOS uses the accessibility hierarchy. Appium exposes it across platforms. Tools like Percy and Chromatic use screenshot comparison. What has changed recently is that AI models can now process either signal intelligently: a vision model can look at a screenshot and decide what to tap, while a language model can read a serialized accessibility tree and decide which element to activate.

This convergence is what makes the choice between the two approaches more nuanced than it used to be. It is no longer just "pixel diffing vs. XPath selectors." It is about which perception layer gives an AI agent the best information to test your app thoroughly, quickly, and without false positives.

2. Accessibility Tree Testing: How It Works

On iOS, every UIKit and SwiftUI view can expose accessibility properties: a label (what the element is called), a trait (button, header, image, etc.), a value (its current state), and a hint (what happens when you interact with it). These properties form a tree that mirrors the visual hierarchy but in a structured, machine-readable format. On macOS, the same concept exists through the Accessibility API, which exposes roles, subroles, and attributes for every element in every application.

When an AI agent tests an app using the accessibility tree, it serializes this tree into text. The serialized output might look something like: "Button: Submit Order, enabled; TextField: Email Address, value: empty; StaticText: Your cart has 3 items." The agent reads this text, understands the current state of the interface, and decides what to do next. If it wants to tap the Submit Order button, it references the element by its accessibility identifier or label, not by screen coordinates.
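The serialization step can be sketched in a few lines of Swift. The `AXNode` type and field names here are illustrative, not a real API; actual trees come from XCUIElement snapshots on iOS or the AXUIElement API on macOS:

```swift
import Foundation

// Hypothetical model of an accessibility node; real trees come from
// XCUIElement snapshots (iOS) or the AXUIElement API (macOS).
struct AXNode {
    let role: String        // e.g. "Button", "TextField", "StaticText"
    let label: String       // accessibility label
    let value: String?      // current value, if any
    let enabled: Bool
    let children: [AXNode]
}

// Serialize the tree into the compact text form a language model can read.
func serialize(_ node: AXNode, depth: Int = 0) -> String {
    var parts = ["\(node.role): \(node.label)"]
    if let value = node.value { parts.append("value: \(value)") }
    parts.append(node.enabled ? "enabled" : "disabled")
    let line = String(repeating: "  ", count: depth) + parts.joined(separator: ", ")
    let childLines = node.children.map { serialize($0, depth: depth + 1) }
    return ([line] + childLines).joined(separator: "\n")
}

let screen = AXNode(role: "Window", label: "Checkout", value: nil, enabled: true, children: [
    AXNode(role: "StaticText", label: "Your cart has 3 items", value: nil, enabled: true, children: []),
    AXNode(role: "TextField", label: "Email Address", value: "empty", enabled: true, children: []),
    AXNode(role: "Button", label: "Submit Order", value: nil, enabled: true, children: []),
])
print(serialize(screen))
```

The agent then reasons over this text and refers back to elements by role and label, never by pixel coordinates.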

The advantages are significant. Tests are resolution-independent, meaning they work regardless of screen size, orientation, or device model. They are theme-independent, so switching from light mode to dark mode does not break anything. They are fast, because reading the accessibility tree takes milliseconds compared to seconds for screenshot capture and vision model inference. And they are semantically precise: you know an element is a button, not just something that looks like a button.

The developer who found bugs in under 8 minutes was leveraging exactly this. The accessibility tree gave the AI agent a complete, structured map of every interactive element in the app. The agent could systematically explore every button, every input field, and every navigation path without ever needing to "look" at the screen. It found edge cases like buttons that did not respond correctly, fields that accepted invalid input, and navigation flows that left the user stuck.

Test and automate apps through accessibility APIs

Fazm uses native accessibility APIs on macOS for fast, reliable app interaction. No screenshot fragility, no coordinate guessing. Open source and free to start.

Try Fazm Free

3. Screenshot Testing: Strengths and Weaknesses

Screenshot testing is not without merit. There are genuine cases where visual verification catches problems that the accessibility tree misses entirely. Layout regressions are the clearest example: if a button overlaps another button, the accessibility tree shows two separate, functional buttons. Only a visual check reveals the overlap. Similarly, color contrast issues, truncated text, misaligned elements, and rendering artifacts are invisible in the accessibility tree.

Modern vision models have made screenshot testing more powerful than traditional pixel-diff approaches. Instead of comparing screenshots pixel by pixel (which breaks on any minor rendering change), a vision model can understand the intent of the interface. It can tell you whether a form looks correct, whether a chart displays the right trend, or whether an error message is actually visible to the user.

The weaknesses become apparent in CI/CD environments. Screenshot tests are slow: capturing, encoding, and processing each screenshot adds seconds per step. They are expensive: vision model API calls cost significantly more than text processing. They are flaky: font rendering differences across OS versions, subtle anti-aliasing changes, and animation timing can cause false failures. And they struggle with dynamic content: a timestamp, a user avatar, or a live data feed makes every screenshot slightly different.

For AI-driven exploratory testing (where an agent autonomously navigates an app looking for bugs), the latency of screenshot testing is particularly painful. An agent that needs 3 seconds per action to process a screenshot will take 5 minutes to explore 100 actions. The same agent using the accessibility tree can explore those 100 actions in under 30 seconds, giving it far more coverage in the same time budget.

4. The Custom Component Problem

The biggest weakness of accessibility tree testing is that it only works well when components actually expose accessibility information. Standard UIKit and SwiftUI components do this automatically: a Button is labeled, a TextField has a placeholder, a NavigationLink has a destination. But custom components are a different story.

Consider a custom-drawn chart component that renders data visualization using Core Graphics or Metal. To the accessibility tree, this might appear as a single opaque view with no children, no label, and no meaningful role. An AI agent using the accessibility tree would not even know the chart exists, let alone be able to test interactions like tapping a data point or pinch-zooming.

Game-like interfaces, custom gesture handlers, canvas-based drawing tools, and video players with custom controls all share this problem. They bypass the standard component hierarchy and render directly to the screen, leaving the accessibility tree incomplete. In practice, this means a significant portion of complex apps have accessibility gaps that make pure tree-based testing insufficient.

The solution is twofold. First, add accessibility labels and traits to custom components. On iOS, this means setting isAccessibilityElement = true, providing an accessibilityLabel, and defining appropriate accessibilityTraits. In SwiftUI, use the .accessibilityLabel() and .accessibilityAddTraits() modifiers. This is work that benefits both disabled users and AI testing agents.
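Concretely, the fix uses the real UIKit and SwiftUI accessibility APIs named above. The chart view itself and its labels are hypothetical examples; the properties and modifiers are genuine:

```swift
import UIKit
import SwiftUI

// Hypothetical custom chart drawn with Core Graphics: invisible to the
// accessibility tree until we describe it ourselves.
final class ChartView: UIView {
    var dataPointCount = 12

    override func draw(_ rect: CGRect) {
        // ... custom Core Graphics rendering ...
    }

    func configureAccessibility() {
        isAccessibilityElement = true
        accessibilityLabel = "Revenue chart, \(dataPointCount) data points"
        accessibilityTraits = [.image, .allowsDirectInteraction]
        accessibilityHint = "Tap a data point to see its value"
    }
}

// The SwiftUI equivalent, using the accessibility modifiers.
struct ChartCard: View {
    var body: some View {
        Rectangle() // stand-in for a custom-drawn chart shape
            .accessibilityLabel("Revenue chart, 12 data points")
            .accessibilityAddTraits(.isImage)
    }
}
```

With these properties set, the previously opaque view shows up in the serialized tree like any standard component, and both assistive technologies and testing agents can find it.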

Second, use a hybrid approach. For components with good accessibility support (which should be most of your app), use tree-based testing for speed and reliability. For components with limited accessibility information, fall back to screenshot-based verification. Several tools support this hybrid model, including Fazm for macOS desktop testing, Appium with image comparison plugins, and custom XCTest setups that combine accessibility queries with snapshot assertions.
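The routing decision in a hybrid setup can be a simple rule. This is a sketch with illustrative type names, not any tool's actual heuristic: elements with useful accessibility information go through the tree, and opaque custom views fall back to screenshot verification.

```swift
import Foundation

enum VerificationStrategy { case accessibilityTree, screenshot }

// Illustrative element summary pulled from the serialized tree.
struct UIElement {
    let role: String
    let label: String?
    let childCount: Int
}

func strategy(for element: UIElement) -> VerificationStrategy {
    // An unlabeled leaf with a generic role is likely a custom-drawn view
    // (chart, canvas, game surface) that the tree cannot describe.
    let isOpaque = (element.label == nil || element.label!.isEmpty)
        && element.childCount == 0
        && ["View", "Group", "Unknown"].contains(element.role)
    return isOpaque ? .screenshot : .accessibilityTree
}
```

A real implementation would tune the "opaque" heuristic per app, but the shape stays the same: prefer the fast structural path, and pay the screenshot cost only where the tree has nothing to offer.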

5. Building a Practical Testing Workflow

Based on how teams are successfully using AI-driven app testing today, a practical workflow looks like this. Start with an accessibility audit of your app. Run Xcode's Accessibility Inspector or the macOS Accessibility Inspector and identify components that are missing labels or have incorrect traits. Fix the most critical gaps first, focusing on primary navigation paths and key user flows.

Next, set up tree-based automated testing for your core user journeys. Tools like XCTest with accessibility identifiers, Appium, Detox for React Native, or AI-powered tools that read the accessibility tree can all drive these tests. The goal is to cover the happy paths (sign up, log in, complete a purchase, send a message) with fast, reliable, non-flaky tests.
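A tree-based happy-path test in XCTest looks like this. The accessibility identifiers ("email_field", "login_button", "home_screen") and credentials are hypothetical placeholders for your app's own identifiers; the XCUIApplication APIs are real:

```swift
import XCTest

final class LoginFlowTests: XCTestCase {
    func testLoginHappyPath() {
        let app = XCUIApplication()
        app.launch()

        // Elements are addressed by accessibility identifier, never by
        // coordinates, so the test survives layout and theme changes.
        let email = app.textFields["email_field"]
        XCTAssertTrue(email.waitForExistence(timeout: 5))
        email.tap()
        email.typeText("user@example.com")

        let password = app.secureTextFields["password_field"]
        password.tap()
        password.typeText("correct-horse-battery-staple")

        app.buttons["login_button"].tap()
        XCTAssertTrue(app.otherElements["home_screen"].waitForExistence(timeout: 5))
    }
}
```

Because every query goes through the accessibility hierarchy, the same test runs unchanged on any simulator size or appearance setting.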

Add screenshot testing specifically for visual regressions. This does not need to cover every screen. Focus on screens with complex layouts, custom-rendered content, or visual elements that the accessibility tree cannot capture. Use snapshot testing libraries that support threshold-based comparison (allowing minor rendering differences) to reduce false positives.
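The threshold idea is worth making concrete. This sketch compares two screenshots as flat grayscale pixel buffers (a simplification; real libraries work on encoded images) and tolerates both small per-pixel deltas and a small fraction of outright differing pixels:

```swift
import Foundation

// Two screenshots "match" when the fraction of meaningfully differing
// pixels stays under a tolerance, absorbing anti-aliasing and
// font-rendering noise. Pixels here are grayscale bytes.
func screenshotsMatch(_ a: [UInt8], _ b: [UInt8],
                      perPixelTolerance: Int = 8,
                      maxDifferingFraction: Double = 0.01) -> Bool {
    guard a.count == b.count, !a.isEmpty else { return false }
    let differing = zip(a, b)
        .filter { abs(Int($0.0) - Int($0.1)) > perPixelTolerance }
        .count
    return Double(differing) / Double(a.count) <= maxDifferingFraction
}
```

Both knobs matter: the per-pixel tolerance absorbs rendering noise, while the differing-fraction cap still catches a genuinely moved or missing element.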

For exploratory testing, let an AI agent loose on your app using the accessibility tree. Give it a goal ("find anything that seems broken") and let it systematically explore. The speed advantage of tree-based perception means the agent can cover far more of your app in a given time window. When it finds something suspicious, capture a screenshot for the bug report.
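The exploration itself is essentially a graph search over screens. This toy sketch (all names hypothetical) walks the app breadth-first, tries every labeled action once, and records anything that looks broken, here modeled as an action that leaves the screen unchanged:

```swift
import Foundation

// Illustrative model: a screen is a named set of tappable element labels.
struct Screen: Hashable {
    let name: String
    let actions: [String]
}

// Breadth-first exploration; `transition` stands in for actually tapping
// an element and re-reading the accessibility tree.
func explore(start: Screen,
             transition: (Screen, String) -> Screen) -> [String] {
    var visited: Set<Screen> = [start]
    var queue = [start]
    var suspicious: [String] = []
    while !queue.isEmpty {
        let screen = queue.removeFirst()
        for action in screen.actions {
            let next = transition(screen, action)
            if next == screen {
                suspicious.append("\(screen.name): '\(action)' had no effect")
            } else if visited.insert(next).inserted {
                queue.append(next)
            }
        }
    }
    return suspicious
}
```

A production agent would use richer oddness signals (crashes, error alerts, dead-end navigation), but the speed argument holds: each step is one tree read, not one screenshot round-trip.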

Finally, make accessibility quality a continuous concern, not a one-time audit. Add accessibility assertions to your CI pipeline. Flag new views that are missing labels. Treat accessibility completeness as a code quality metric alongside test coverage. The investment pays dividends three ways: compliance with accessibility standards, better AI testability, and eventually better compatibility with AI agents that your users might employ.
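A CI accessibility assertion can be as simple as walking the element tree and reporting interactive elements that lack labels. The `Element` type and role names here are illustrative stand-ins for whatever your tree dump produces:

```swift
import Foundation

struct Element {
    let role: String
    let label: String?
    let children: [Element]
}

// Roles that must carry a label for both assistive tech and test agents.
let interactiveRoles: Set<String> = ["Button", "TextField", "Link", "Switch"]

// Returns a human-readable path for every interactive element that is
// missing a label; a nonempty result fails the CI check.
func missingLabels(_ element: Element, path: String = "") -> [String] {
    let here = path.isEmpty ? element.role : "\(path) > \(element.role)"
    var problems: [String] = []
    if interactiveRoles.contains(element.role),
       element.label?.isEmpty != false {
        problems.append(here)
    }
    return problems + element.children.flatMap { missingLabels($0, path: here) }
}
```

Run against every new view in the pipeline, this turns accessibility completeness into a gating metric rather than an audit finding.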

The 8-minute bug discovery that started this conversation was not magic. It was the result of an app with good accessibility implementation meeting an AI agent with the right perception layer. That combination is reproducible: improve your app's accessibility, choose tools that read the accessibility tree, and you will find bugs faster too.

Automate app testing with accessibility APIs

Fazm is a free, open-source AI agent for macOS that uses native accessibility APIs for fast, reliable app interaction. No screenshots, no flaky coordinate matching.

Try Fazm Free

Free to start. Fully open source. Runs locally on your Mac.