Accessibility Tree Dumps Overflow LLM Context Windows - How to Fix It

Fazm Team · 3 min read
Tags: accessibility-tree, context-window, llm, macos, optimization, desktop-agent

A single accessibility tree dump from a macOS application can easily reach 24KB of raw data. Feed that into an LLM context window a few times during a task, and you have spent a large share of your available tokens on structural data that the model barely needs.

This is one of the most common performance killers for desktop AI agents.

The Problem

When an AI agent inspects a macOS application through the Accessibility API, it gets back a hierarchical tree of every UI element - buttons, text fields, labels, scroll views, table cells, and all their attributes. A moderately complex app like Slack or Chrome can produce thousands of nodes.

Most of those nodes are irrelevant to the task at hand, yet many agents dump the entire tree into their context by default to "understand" what is on screen. At 24KB each, three or four of these dumps burn through roughly 70 to 100KB of context on raw element descriptions.
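
To put those sizes in token terms, here is a rough back-of-the-envelope calculation. The 24KB figure comes from this post; the 4-bytes-per-token ratio is only an assumed rule of thumb, and real tokenizers may do worse on dense structural data.

```python
DUMP_BYTES = 24 * 1024   # per-dump size quoted in this post
BYTES_PER_TOKEN = 4      # ASSUMPTION: rough rule of thumb, varies by tokenizer


def context_cost(dumps: int, dump_bytes: int = DUMP_BYTES) -> tuple:
    """Return (total bytes, approximate tokens) for a number of raw dumps."""
    total_bytes = dumps * dump_bytes
    return total_bytes, total_bytes // BYTES_PER_TOKEN


for n in (1, 3, 4):
    total, tokens = context_cost(n)
    print(f"{n} dump(s): {total // 1024} KB, roughly {tokens:,} tokens")
```

Under that assumption, four raw dumps cost about 24,000 tokens before the agent has done any actual work.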

The Fix

The solution is straightforward: do not put the raw tree in the context window.

  1. Write to /tmp files - dump the full accessibility tree to a temporary file on disk for reference
  2. Return summaries - send back a 5-10 line summary of what is on screen: the key interactive elements, their states, and their positions
  3. Query on demand - when the agent needs details about a specific element, read just that subtree from the file
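
The three steps above can be sketched as follows. This is a minimal illustration, not Fazm's actual code: it assumes the tree has already been captured into nested dicts with hypothetical keys `role`, `title`, `enabled`, `frame`, and `children`, and the function names are invented for the example. Capturing the tree from the real Accessibility API is left out.

```python
import json
import tempfile

# ASSUMPTION: which roles count as "interactive"; adjust for your app.
INTERACTIVE_ROLES = {"AXButton", "AXTextField", "AXCheckBox", "AXMenuItem", "AXLink"}


def dump_tree(tree: dict) -> str:
    """Step 1: write the full tree to a temp file and return its path."""
    # Uses the system temp directory; pass dir="/tmp" to match the post exactly.
    with tempfile.NamedTemporaryFile(
        "w", prefix="ax-tree-", suffix=".json", delete=False
    ) as f:
        json.dump(tree, f)
        return f.name


def summarize(tree: dict, max_lines: int = 10) -> str:
    """Step 2: list up to max_lines interactive elements, one per line."""
    lines = []

    def walk(node: dict, path: tuple) -> None:
        if len(lines) >= max_lines:
            return
        if node.get("role") in INTERACTIVE_ROLES:
            state = "enabled" if node.get("enabled", True) else "disabled"
            where = "/".join(map(str, path)) or "root"
            lines.append(
                f'{node["role"]} "{node.get("title", "")}" {state} '
                f'at {node.get("frame")} [path {where}]'
            )
        for i, child in enumerate(node.get("children", [])):
            walk(child, path + (i,))

    walk(tree, ())
    return "\n".join(lines)


def read_subtree(dump_path: str, path: list) -> dict:
    """Step 3: load the dump and return only the node at a child-index path."""
    with open(dump_path) as f:
        node = json.load(f)
    for index in path:
        node = node["children"][index]
    return node
```

The agent only ever sees the output of `summarize`. The `[path ...]` suffix on each line is what it passes back to `read_subtree` when it needs more detail about one element.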

In practice, this approach reduced context consumption from 24KB per inspection to under 500 bytes, a reduction of roughly 98%.

Why This Matters for Agent Performance

Context window space is the scarcest resource for any LLM-powered agent. Every byte of irrelevant data:

  • Pushes out conversation history and task context
  • Increases inference cost on token-priced APIs
  • Reduces the model's ability to reason about the actual task
  • Slows down response generation

Treating the accessibility tree as a database to query rather than a document to read is the key insight. The agent does not need to see every button on screen - it needs to know which buttons are relevant.
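
One way to read "database rather than document" is a query function that filters nodes by attribute and returns only the matches. The sketch below assumes the tree is available as nested dicts with hypothetical `role`, `title`, and `children` keys; `find_elements` is an illustrative name, not part of any real API.

```python
from typing import Iterator, Optional


def find_elements(
    node: dict,
    role: Optional[str] = None,
    title_contains: Optional[str] = None,
    path: tuple = (),
) -> Iterator[dict]:
    """Yield compact records for nodes matching every filter that was given."""
    title = node.get("title") or ""
    role_ok = role is None or node.get("role") == role
    title_ok = title_contains is None or title_contains.lower() in title.lower()
    if role_ok and title_ok:
        # Return only a few fields, never the node's whole subtree.
        yield {"path": list(path), "role": node.get("role"), "title": title}
    for i, child in enumerate(node.get("children", [])):
        yield from find_elements(child, role, title_contains, path + (i,))
```

A call such as `list(find_elements(tree, role="AXButton", title_contains="send"))` then puts a handful of short records in the context instead of thousands of nodes.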

Practical Implementation

For macOS agents using the Accessibility API, write a thin wrapper that captures the full tree, saves it to /tmp, and returns a structured summary. Include element references so the agent can request details about specific nodes when needed.
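
A possible shape for such a wrapper is sketched below. It makes several assumptions: `capture` is a hypothetical callable supplied by the caller that stands in for the real Accessibility API code and returns the tree as nested dicts, and the `e0`, `e1`, ... reference scheme is invented for this example.

```python
import json
import os
import tempfile
from typing import Callable


class TreeInspector:
    """Keeps full trees on disk and hands the agent summaries plus element refs."""

    def __init__(self, capture: Callable[[], dict], dump_dir: str = tempfile.gettempdir()):
        self.capture = capture      # ASSUMPTION: returns the tree as nested dicts
        self.dump_dir = dump_dir    # pass "/tmp" to match the post exactly
        self.dump_path = None

    def inspect(self, max_elements: int = 10) -> dict:
        """Capture, index every node under a ref like "e7", save, and summarize."""
        index = {}

        def walk(node: dict, parent) -> None:
            ref = f"e{len(index)}"
            # Store the node without its children so each entry stays small.
            index[ref] = {k: v for k, v in node.items() if k != "children"}
            index[ref]["parent"] = parent
            for child in node.get("children", []):
                walk(child, ref)

        walk(self.capture(), None)
        fd, self.dump_path = tempfile.mkstemp(
            prefix="ax-index-", suffix=".json", dir=self.dump_dir
        )
        with os.fdopen(fd, "w") as f:
            json.dump(index, f)
        titled = [(r, n) for r, n in index.items() if n.get("title")]
        return {
            "total_nodes": len(index),
            "elements": [
                {"ref": r, "role": n.get("role"), "title": n["title"]}
                for r, n in titled[:max_elements]
            ],
        }

    def details(self, ref: str) -> dict:
        """Load one element's full attribute record from the saved index."""
        with open(self.dump_path) as f:
            return json.load(f)[ref]
```

Here the summary simply lists the first titled elements; a real agent would likely rank them by relevance to the task. `details` must be called after `inspect`, since that is what writes the index file.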

Fazm is an open source macOS AI agent, available on GitHub.
