385ms Tool Selection Running Fully Local - No Pixel Parsing Needed

Fazm Team · 2 min read
speed · local-ai · accessibility-api · apple-silicon · performance

Most desktop AI agents follow the same cycle: take a screenshot, send it to a vision model, parse the response, figure out where to click, and then click. Each step adds latency. A single action can take two to five seconds, and multi-step workflows compound that delay into minutes of waiting.

There's a faster path. macOS accessibility APIs expose the UI structure of every application as structured data. Buttons have labels. Text fields have values. Menus have items. The agent doesn't need to look at pixels and guess - it reads the actual UI tree directly.

Why Structured Data Is Faster

Screenshot-based agents do three expensive things: capture a frame buffer, run vision inference to identify elements, and map pixel coordinates back to clickable targets. Each step costs compute, and each can introduce errors, such as a misread label or a click that lands a few pixels off target.

Accessibility API agents skip all three. They query the app's UI hierarchy, get back a structured tree of elements with their roles, labels, positions, and states, and select the right one by matching against the task. No image processing, no coordinate mapping, no ambiguity about what a button says.
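The selection step described above can be sketched in a few lines. This is an illustration only, not Fazm's actual implementation: `UIElement`, `walk`, and `select_element` are hypothetical names, the tree is hand-built rather than queried from a live app, and plain fuzzy string matching stands in for whatever matching the real agent uses. The role strings follow the style of macOS accessibility roles such as `AXButton`.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Iterator, Optional


@dataclass
class UIElement:
    """Hypothetical stand-in for one node of an accessibility tree."""
    role: str
    label: str = ""
    enabled: bool = True
    position: tuple[int, int] = (0, 0)
    children: list["UIElement"] = field(default_factory=list)


def walk(node: UIElement) -> Iterator[UIElement]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def select_element(
    root: UIElement,
    query: str,
    role: Optional[str] = None,
    threshold: float = 0.6,
) -> Optional[UIElement]:
    """Return the enabled element whose label best matches the query.

    Assumption: fuzzy label similarity is good enough for a sketch.
    Returns None when no label reaches the similarity threshold.
    """
    best: Optional[UIElement] = None
    best_score = 0.0
    for element in walk(root):
        if not element.enabled or not element.label:
            continue
        if role is not None and element.role != role:
            continue
        score = SequenceMatcher(None, query.lower(), element.label.lower()).ratio()
        if score >= threshold and score > best_score:
            best, best_score = element, score
    return best


# Hand-built tree standing in for what an accessibility query would return.
window = UIElement("AXWindow", "Compose", children=[
    UIElement("AXTextField", "Subject", position=(40, 80)),
    UIElement("AXButton", "Send", position=(40, 400)),
    UIElement("AXButton", "Save Draft", enabled=False, position=(120, 400)),
])

target = select_element(window, "send", role="AXButton")
```

Because the label and position come straight from the tree, there is nothing to infer from pixels: `target.position` is already the place to click, and a disabled element such as "Save Draft" can be skipped by checking its state.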

The result is tool selection in 385ms on Apple Silicon, running fully local. That figure covers everything from hearing the command to knowing which element to interact with. The interaction itself - clicking, typing, selecting - is near-instant after that.

What This Means in Practice

Sub-second responses change how you interact with the agent. Instead of saying a command and waiting, you speak and the action happens while you're still thinking about the next thing. Multi-step workflows that took 30 seconds with screenshot parsing finish in under five seconds.
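A rough back-of-the-envelope calculation shows how the per-action numbers add up. The step count of 10 is an assumption chosen for illustration, the per-action times are the ones quoted in this post, and the near-instant interaction time is ignored.

```python
def workflow_seconds(steps: int, per_step_seconds: float) -> float:
    """Total time for a workflow whose steps run one after another."""
    return steps * per_step_seconds


STEPS = 10  # assumed workflow length, for illustration only

screenshot_low = workflow_seconds(STEPS, 2.0)   # 2 s per action
screenshot_high = workflow_seconds(STEPS, 5.0)  # 5 s per action
local = workflow_seconds(STEPS, 0.385)          # 385 ms per selection

print(f"screenshot-based: {screenshot_low:.0f}-{screenshot_high:.0f} s")
print(f"accessibility-based: {local:.2f} s")
```

Under these assumptions the screenshot-based workflow lands between 20 and 50 seconds, while the accessibility-based one stays just under four.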

Speed is not just about performance benchmarks. It's about whether the agent feels like an extension of your thinking or an interruption to it. At 385ms, it feels like the former.

Fazm is an open-source macOS AI agent, available on GitHub.