Automating Hundreds of Screenshots with Desktop Accessibility APIs

Fazm Team · 2 min read

When you need to create hundreds of specific screenshots (different states, different screens, different configurations), doing it manually is not just tedious. It is error-prone. You miss states. You forget to resize windows. You accidentally capture your notification bar.

Why Pixel-Based Approaches Fail at Scale

The naive approach is to script mouse movements and clicks, then capture screenshots at fixed intervals. This breaks constantly. A dialog box appears one pixel to the left. The app takes 200ms longer to load. A system notification pops up mid-capture.

Screenshot automation needs to be state-aware, not position-aware.

Accessibility APIs Make It Reliable

On macOS, the accessibility API (AXUIElement) gives you the actual UI tree of a running app. You can query for specific elements by role, title, or identifier (the AXRole, AXTitle, and AXIdentifier attributes). Instead of "click at position 340, 220" you say "click the button labeled Submit." Instead of "wait 2 seconds" you say "wait until the loading spinner disappears."
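
To make the query-by-attribute idea concrete, here is a minimal sketch in Python. It makes no real accessibility calls: the nested dict is a hypothetical stand-in for the tree AXUIElement would expose, and `walk`, `find_element`, and `SAMPLE_TREE` are illustrative names, not part of any Apple API.

```python
from typing import Iterator, Optional

# Hypothetical stand-in for an accessibility tree: each node is a dict with
# a role, an optional title, and a list of children.
SAMPLE_TREE = {
    "role": "AXWindow",
    "title": "Settings",
    "children": [
        {"role": "AXStaticText", "title": "Account", "children": []},
        {
            "role": "AXGroup",
            "title": None,
            "children": [
                {"role": "AXButton", "title": "Cancel", "children": []},
                {"role": "AXButton", "title": "Submit", "children": []},
            ],
        },
    ],
}


def walk(node: dict) -> Iterator[dict]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_element(root: dict, role: str, title: Optional[str] = None) -> Optional[dict]:
    """Return the first element matching role (and title, if given), else None."""
    for node in walk(root):
        if node.get("role") == role and (title is None or node.get("title") == title):
            return node
    return None


# Look the button up by what it is, not by where it sits on screen.
submit = find_element(SAMPLE_TREE, role="AXButton", title="Submit")
missing = find_element(SAMPLE_TREE, role="AXButton", title="Delete")
```

The lookup never mentions coordinates, so it keeps working if the group holding the buttons is moved or reordered in the tree.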

This means your screenshot automation adapts to layout changes. If the button moves, the script still works. If the app takes longer to load, it waits for the right state before capturing.
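
Waiting for the right state instead of sleeping for a fixed time can be sketched as a small polling helper. This is one possible shape, not a prescribed implementation: `wait_until` and the fake `spinner_gone` check are hypothetical, and in a real script the condition would query the accessibility tree.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll condition until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical example: a spinner that is gone by the third poll.
polls = {"count": 0}


def spinner_gone() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


ready = wait_until(spinner_gone, timeout=2.0, interval=0.01)
```

Returning False on timeout, rather than blocking forever, lets the caller decide whether to skip the screenshot or retry.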

The Workflow

A practical screenshot automation pipeline looks like this:

  1. Launch the target app and wait for it to reach a known state
  2. Navigate to each screen using accessibility API element references
  3. Set up the specific state you need (toggle settings, enter data)
  4. Verify the expected elements are visible before capturing
  5. Capture with ScreenCaptureKit for pixel-perfect results
  6. Move to the next configuration

The key insight is verification before capture. Do not take the screenshot until you have confirmed the UI is in exactly the right state.
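
Verification before capture could look roughly like the sketch below. The accessibility query and the capture call are passed in as plain callables so the example stays self-contained. `capture_when_verified`, `StateNotReached`, and the sample titles are assumptions for illustration, and the real capture step (ScreenCaptureKit in the workflow above) is not shown.

```python
from typing import Callable, Iterable


class StateNotReached(Exception):
    """Raised when the expected elements are not all visible."""


def capture_when_verified(
    visible_titles: Callable[[], Iterable[str]],
    expected: Iterable[str],
    capture: Callable[[str], None],
    output_path: str,
) -> None:
    """Call capture(output_path) only if every expected title is visible."""
    missing = set(expected) - set(visible_titles())
    if missing:
        raise StateNotReached(f"missing elements: {sorted(missing)}")
    capture(output_path)


# Stand-ins: a fixed list of visible titles and a list that records captures.
captured = []
capture_when_verified(
    visible_titles=lambda: ["Settings", "Submit", "Cancel"],
    expected=["Settings", "Submit"],
    capture=captured.append,
    output_path="settings.png",
)
```

Because the check raises before `capture` is ever called, a screen in the wrong state produces an error rather than a misleading image.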

Scaling to Hundreds

Once you have the pattern, scaling is straightforward. Define your screenshot manifest as a list of states and screens. The automation walks through each one, verifying and capturing. If something fails, it logs the failure and moves on instead of corrupting the rest of the batch.
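
A manifest-driven batch with per-entry failure handling might be sketched as follows. The `Shot` fields, `run_manifest`, and `fake_take_shot` are hypothetical. A real `take_shot` would navigate, verify, and capture as described above.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger("screenshots")


@dataclass
class Shot:
    screen: str
    state: str
    output: str


def run_manifest(manifest: List[Shot], take_shot: Callable[[Shot], None]) -> List[Shot]:
    """Process every entry and return the entries that failed."""
    failed = []
    for shot in manifest:
        try:
            take_shot(shot)
        except Exception:
            # Log and continue so one bad state does not stop the batch.
            logger.exception("failed: %s / %s", shot.screen, shot.state)
            failed.append(shot)
    return failed


manifest = [
    Shot("settings", "dark-mode", "settings-dark.png"),
    Shot("settings", "light-mode", "settings-light.png"),
    Shot("onboarding", "step-1", "onboarding-1.png"),
]

done = []


def fake_take_shot(shot: Shot) -> None:
    # Simulate one entry whose verification fails.
    if shot.state == "light-mode":
        raise RuntimeError("expected elements not visible")
    done.append(shot.output)


failed = run_manifest(manifest, fake_take_shot)
```

Returning the failed entries makes it easy to rerun only those states after fixing the cause.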

Fazm is an open source macOS AI agent, available on GitHub.
