Building UI/UX Testing Skills for Claude Code with Screenshots and Accessibility Trees

Fazm Team · 3 min read


Most AI coding agents can write UI code but cannot verify what it looks like. They generate SwiftUI views, React components, or CSS layouts and then hope for the best. The feedback loop is broken because the agent never sees the result.

The fix is surprisingly straightforward: give the agent two things, a screenshot of the rendered UI and the accessibility tree of the same screen.

Why Both Signals Matter

Screenshots alone are not enough. An LLM looking at a screenshot can tell you "there is a button in the top right" but it cannot reliably distinguish between a button that is 44 pixels wide and one that is 40 pixels wide. It cannot read small text consistently. It hallucinates element positions.

The accessibility tree fills the gap. It provides structured, machine-readable data about every element on screen - its role, label, position, size, and state. When Claude Code reads the accessibility tree, it knows exactly which elements exist, what they are called, and where they are located.

Combine the two and you get a verification system that actually works. The screenshot provides visual context - colors, layout, spacing. The accessibility tree provides precise element data - "this button has label 'Submit' at coordinates (200, 400) with size 120x44."
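As a concrete illustration of what the structured signal buys you, here is a minimal Python sketch. The element shape (`role`, `label`, `position`, `size`) is an assumption for illustration, not the real macOS accessibility schema, and the 44-point minimum comes from Apple's Human Interface Guidelines. The point is that this check is trivial against tree data and unreliable against pixels:

```python
# Hypothetical element records as they might be extracted from an
# accessibility tree. The field names here are illustrative assumptions.
MIN_TAP_TARGET = 44  # Apple HIG minimum tappable size, in points

def undersized_elements(elements):
    """Return labels of buttons smaller than the minimum tap target."""
    flagged = []
    for el in elements:
        width, height = el["size"]
        if el["role"] == "button" and (width < MIN_TAP_TARGET or height < MIN_TAP_TARGET):
            flagged.append(el["label"])
    return flagged

elements = [
    {"role": "button", "label": "Submit", "position": (200, 400), "size": (120, 44)},
    {"role": "button", "label": "Close",  "position": (360, 20),  "size": (40, 40)},
]
print(undersized_elements(elements))  # ['Close']
```

An LLM squinting at a screenshot cannot tell a 40-pixel button from a 44-pixel one; a four-line check over the tree can.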

Building the Skill

A Claude Code skill for UI testing follows this pattern: build the app, launch it, navigate to the target screen using accessibility API actions, capture both a screenshot and the accessibility tree, then pass both to the LLM for analysis.
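The steps above can be sketched as a single verification function. All of the helper names here (`build_app`, `launch`, `capture_screenshot`, `dump_tree`, `ask_llm`) are placeholders for whatever tooling the skill wires up, not a real API; the sketch only shows how the two signals are assembled into one prompt:

```python
import base64
import json

def verify_screen(build_app, launch, capture_screenshot, dump_tree, ask_llm):
    """Build, launch, capture both signals, and hand them to the LLM."""
    build_app()
    app = launch()
    png = capture_screenshot(app)   # raw PNG bytes of the rendered screen
    tree = dump_tree(app)           # structured per-element data
    prompt = [
        {"type": "image", "data": base64.b64encode(png).decode()},
        {"type": "text",  "data": json.dumps(tree)},
        {"type": "text",  "data": "Does the rendered UI match the intended design?"},
    ]
    return ask_llm(prompt)
```

Navigation to the target screen would happen between launch and capture, driven through the same accessibility APIs that produce the tree.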

The key insight is that accessibility APIs are not just for testing accessibility compliance. They are the best available interface for an AI agent to understand and interact with native application UIs. They were designed to describe UI elements to non-visual consumers, which is exactly what an LLM needs.
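Because accessibility trees are nested, a skill typically flattens them into a per-element list before handing them to the model. The node shape below (`role`, `label`, `frame`, `children`) is a loose assumption modeled on what accessibility APIs expose, not their actual schema:

```python
def flatten(node, out=None):
    """Depth-first flatten of a nested accessibility tree into element records."""
    if out is None:
        out = []
    out.append({k: node[k] for k in ("role", "label", "frame") if k in node})
    for child in node.get("children", []):
        flatten(child, out)
    return out

window = {
    "role": "window", "label": "Settings", "frame": (0, 0, 800, 600),
    "children": [
        {"role": "button", "label": "Submit", "frame": (200, 400, 120, 44)},
    ],
}
print(len(flatten(window)))  # 2
```

A flat list keeps the prompt compact and makes it easy for the model to answer questions like "which elements exist and where are they."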

Practical Results

With this approach, Claude Code can verify that a UI change looks correct, catch layout regressions, and confirm that interactive elements are properly wired up. It turns "I wrote the code, you go check it" into "I wrote the code and confirmed it renders correctly."

The dual-signal approach - visual plus structured - is more reliable than either signal alone. It is the same pattern that makes screen readers effective, applied to a different kind of non-visual consumer.

Fazm is an open source macOS AI agent, available on GitHub.
