Context Engineering · Claude Code + Mac

Claude Code context management is about the medium, not the command.

Every top guide covers the same five levers: CLAUDE.md, /compact, /clear, subagents, and plan mode. They all work. They also all assume text input. The moment Claude Code drives a graphical app, the variable that actually decides your token budget is whether you feed the screen as pixels or as an accessibility tree. This page walks through the protocol-level details, the file paths, and the compact_boundary events you can watch fire live.

Matthew Diakonov, Written with AI

Published April 18, 2026 · 11 min read

Try Fazm free


compact_boundary events surfaced in real time

MAX_IMAGE_TURNS = 20 per session (hard cap)

Accessibility tree beats screenshots 10x on tokens

The Sixth Lever

Claude Code context management when the task is not code

CLAUDE.md, /compact, /clear, subagents, plan mode. Five classic levers.

All five assume the input is text you type.

When the input is a GUI, the medium decides your budget.

Screenshots: ~2,000 tokens per turn. Accessibility tree: ~200.

Fazm caps images at 20 and streams compact_boundary events live.


What every top guide covers. And what they all miss.

Search "claude code context management" and you will find the same five techniques in different orders. Put persistent rules in CLAUDE.md. Run /compact when the window approaches capacity. Run /clear when a failed approach pollutes the session. Fan work out to subagents so the main thread sees only final answers. Stay in plan mode for exploration, flip to auto-accept for execution.

These are all correct. They also all describe a world where Claude Code talks to files and a shell. The advice does not cover the other thing people are increasingly doing with Claude Code: pointing it at a real graphical desktop app, asking it to act on behalf of a human who is not a developer, and expecting the session to last long enough to actually finish a task.

In that world, there is a sixth lever. It is not documented on code.claude.com. It dominates the other five combined. It is the choice of how you represent the screen.

How the screen becomes tokens

Every turn, the same physical screen becomes either two thousand tokens of images or two hundred tokens of text. The encoder is the lever.

The anchor: a single constant in acp-bridge/src/index.ts:793

Fazm ships an Agent Client Protocol bridge that sits between the desktop app and the Claude Code SDK. Inside that bridge there is exactly one constant whose name tells you the entire philosophy: MAX_IMAGE_TURNS. It caps how many turns per session may contain a screenshot at twenty. After the twentieth, the bridge stops attaching images even if the agent asks for one. The accessibility tree text keeps flowing, so the agent keeps working, but the context window stops growing at the rate images would push it.

[Code listing: acp-bridge/src/index.ts]

Twenty is not an arbitrary number. It is the point where Claude's stricter per-image size limit kicks in after a session has accumulated many images, and it is also roughly where a screenshot-only agent would trigger its first compact_boundary. Cap there, and the session stays stable for the rest of the task.
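A minimal sketch of how a per-session image cap like this could work. Only the constant name and its value come from the text above; the `ContentBlock` type, the `ImageTurnLimiter` class, and its method names are hypothetical and are not taken from the Fazm source.

```typescript
// Illustrative only: these types and this class are assumptions, not Fazm's code.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

const MAX_IMAGE_TURNS = 20; // value quoted in the article

class ImageTurnLimiter {
  // sessionId -> number of turns that have carried at least one image
  private imageTurns = new Map<string, number>();

  /** Returns the blocks to send, stripping images once the cap is reached. */
  filter(sessionId: string, blocks: ContentBlock[]): ContentBlock[] {
    const hasImage = blocks.some((b) => b.type === "image");
    if (!hasImage) return blocks;

    const used = this.imageTurns.get(sessionId) ?? 0;
    if (used >= MAX_IMAGE_TURNS) {
      // Cap reached: keep the text (for example the accessibility tree), drop the pixels.
      return blocks.filter((b) => b.type !== "image");
    }
    this.imageTurns.set(sessionId, used + 1);
    return blocks;
  }

  /** Forget the counter when a session is deleted. */
  reset(sessionId: string): void {
    this.imageTurns.delete(sessionId);
  }
}
```

The design point is that the cap degrades the turn rather than rejecting it: text-only turns never touch the counter, and a capped turn still goes out with its text blocks intact.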

One hundred turns, two futures

The same Mac task can be expressed two ways: the pixel path most computer-use agents take by default, and the accessibility tree path Fazm uses unless the UI is genuinely visual. The code is boring. The math is not.

Screenshot vs accessibility tree for a single turn

// Screenshot path: every turn sends 1,700 to 3,000 image tokens.
// captureScreen() and claude.message() are illustrative placeholders.
const screenshot = await captureScreen();       // Core Graphics
const base64     = screenshot.jpeg(0.7);         // 1568px max edge
await claude.message({
  content: [
    // Base64 image blocks need an explicit source type and media type.
    { type: "image", source: { type: "base64", media_type: "image/jpeg", data: base64 } }, // ~2,000 tokens
    { type: "text",  text: "click the sidebar" },                                          //    ~10 tokens
  ],
});
// Per turn:  ~2,010 tokens
// 100 turns: ~200,000 tokens -> compact_boundary fires ~4x
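For contrast, here is a sketch of what the accessibility-tree side of the same turn might look like. The `AXNode` shape and the line format are assumptions for illustration (the real macos-use MCP output may differ), and the token estimate uses the rough four-characters-per-token rule of thumb rather than a real tokenizer.

```typescript
// Hypothetical node shape: role + label + coordinates, as described in the article.
interface AXNode {
  role: string;
  label?: string;
  x: number;
  y: number;
  children?: AXNode[];
}

/** Flatten a tree into one indented line per element: role "label" @x,y */
function serializeTree(node: AXNode, depth = 0): string[] {
  const label = node.label ? ` "${node.label}"` : "";
  const lines = [`${"  ".repeat(depth)}${node.role}${label} @${node.x},${node.y}`];
  for (const child of node.children ?? []) {
    lines.push(...serializeTree(child, depth + 1));
  }
  return lines;
}

/** Rough estimate only: about 4 characters per token. */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const windowTree: AXNode = {
  role: "AXWindow", label: "Notes", x: 0, y: 0,
  children: [
    { role: "AXButton", label: "Sidebar", x: 12, y: 8 },
    { role: "AXTextArea", x: 240, y: 60 },
  ],
};

const treeText = serializeTree(windowTree).join("\n");
console.log(treeText);
console.log(`roughly ${estimateTokens(treeText)} tokens`);
```

The same "click the sidebar" instruction can then be sent as a single text block alongside this tree, with no image attached.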


The numbers that decide the bill

Every one of these is a concrete constant either in Fazm's source, Claude's API documentation, or observable via the compact_boundary event itself.

200,000: Claude context window (tokens)

20: MAX_IMAGE_TURNS per session

~2,000: Image tokens for a 1440x900 screenshot (avg)

~200: Median tokens for an AX-tree turn

Conservative cost delta, 100-turn GUI task

Typical compact_boundary events saved per hour

CLAUDE.md tweaks needed to get it

compact_boundary is the event you should actually watch

When the Claude Code SDK decides the context window is about to overflow, it automatically summarizes older turns and emits a system message with subtype compact_boundary. The payload has a trigger (auto or manual) and a preTokens count. The stock ACP agent.js drops that message. The patched entry point in Fazm intercepts it and forwards it to the downstream protocol so the UI can actually show it.
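A small sketch of how a client might recognize such a message before forwarding it. The flat `trigger` / `preTokens` layout follows the description above and is an assumption; the SDK's actual message type may name or nest these fields differently, so treat `parseCompactBoundary` as a hypothetical helper rather than the bridge's real code.

```typescript
// Assumed payload layout, based on the description in the text.
interface CompactBoundary {
  trigger: "auto" | "manual";
  preTokens: number;
}

/** Returns compaction details if `msg` looks like a compact_boundary system message, else null. */
function parseCompactBoundary(msg: unknown): CompactBoundary | null {
  if (typeof msg !== "object" || msg === null) return null;
  const m = msg as Record<string, unknown>;
  if (m["type"] !== "system" || m["subtype"] !== "compact_boundary") return null;

  const rawTrigger = m["trigger"];
  let trigger: "auto" | "manual";
  if (rawTrigger === "auto") trigger = "auto";
  else if (rawTrigger === "manual") trigger = "manual";
  else return null;

  const preTokens = m["preTokens"];
  if (typeof preTokens !== "number") return null;

  return { trigger, preTokens };
}

// Usage: forward only the messages that parse, ignore everything else.
const example = parseCompactBoundary({
  type: "system",
  subtype: "compact_boundary",
  trigger: "auto",
  preTokens: 187000,
});
if (example) {
  console.log(`compaction (${example.trigger}) at ${example.preTokens} tokens`);
}
```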

[Code listing: acp-bridge/src/patched-acp-entry.mjs]

The downstream handler then forwards it again to the Swift side so a user-visible indicator can fire:

[Code listing: acp-bridge/src/index.ts]

What a compaction actually looks like in flight

This is the full trip of a compact_boundary event from Claude back to your Mac screen. Every hop logs, so you can verify it yourself by tailing /tmp/fazm-dev.log.

[Diagram: compact_boundary event path, Anthropic to UI]

What you actually see in the log

Run a long workflow, then tail the dev log. Every compaction shows with a trigger and a preTokens count. The first fire tells you when your chosen input medium hit the wall. A well-configured accessibility-tree session usually never sees this line.

[Terminal recording: watching compact_boundary events live]

The six levers of Claude Code context management

Five are in every top guide. The sixth is only visible once Claude Code leaves the terminal. On any GUI task, it dominates the other five combined.

[Diagram: Context budget at the center, fed by six levers: CLAUDE.md, /compact, /clear, Subagents, Plan mode, Input medium]

Where your context actually goes on a Mac automation

Five pollutants eat context faster than anything else. Four of them have classic fixes. The remaining one, listed first below, is the one every guide skips.

Screenshot on every turn

1,700 to 3,000 image tokens per frame. Fifteen turns into a task, you have spent 30,000 tokens on visual signal alone, before any real reasoning tokens. This is the dominant pollutant on GUI work. Fix: accessibility tree first, screenshot only on genuinely visual UIs, cap with MAX_IMAGE_TURNS.

Unbounded session history

Every new turn replays the full conversation. Cache reads cover most of the cost, but the token count still grows. Fix: let compact_boundary fire, or start a fresh session.

Large file reads

A 10,000 line source read is 60,000 to 80,000 tokens and it never comes back out. Fix: read narrow sections, farm deep reads to subagents that only return a summary.

Verbose tool output

Dumping full shell output, long JSON, or CSV into context is the same problem as a big file read. Fix: pipe through a parser, emit only the shape or totals the agent needs next.

Static system prompt growth

Rules accumulate in CLAUDE.md. Each new rule lives in every turn. Fix: move narrow rules to subagent-scoped files and let the main CLAUDE.md keep only project-wide invariants.
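The "emit only the shape" fix for verbose tool output can be sketched as a small helper that describes a JSON value instead of echoing it. `describeShape` is a hypothetical name for illustration, not part of Claude Code or Fazm, and it samples only the first element of each array.

```typescript
/** Describe the structure of a JSON value without copying its contents. */
function describeShape(value: unknown, depth = 0, maxDepth = 3): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    if (value.length === 0) return "[] (0 items)";
    // Only the first element is sampled; mixed arrays are not detected.
    const inner = depth >= maxDepth ? "..." : describeShape(value[0], depth + 1, maxDepth);
    return `[${inner}] (${value.length} items)`;
  }
  if (typeof value === "object") {
    if (depth >= maxDepth) return "{...}";
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj).map(
      (key) => `${key}: ${describeShape(obj[key], depth + 1, maxDepth)}`,
    );
    return fields.length === 0 ? "{}" : `{ ${fields.join(", ")} }`;
  }
  return typeof value; // "string", "number", "boolean", ...
}

const toolOutput = {
  rows: [
    { id: 1, name: "alpha" },
    { id: 2, name: "beta" },
  ],
  total: 2,
};

console.log(describeShape(toolOutput));
// { rows: [{ id: number, name: string }] (2 items), total: number }
```

The summary length depends on the structure, not on the number of rows, so a 10,000-row result yields the same line as a 2-row one apart from the item count.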

Screenshot-first vs accessibility-first, 100 turns in

The same Mac task, a multi-app workflow (Xcode, Finder, Slack, Notion), run by two identical Claude Code agents that differ only in how they represent the screen.

Input medium, end to end

A full image of the current display arrives on every turn. The model reads pixels, infers role and position, pays image-token cost on each frame. By turn 90 the window has compacted three to four times, each compaction losing some fidelity.

~200,000 image tokens over 100 turns
3 to 4 compact_boundary events per run
Pixel hallucinations trigger retries
Cost delta: 10x to 30x on long workflows
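The arithmetic behind these figures can be checked with a few lines. This sketch counts only the per-turn screen input, using the approximate numbers quoted in this article (about 2,010 tokens for a screenshot turn, about 210 for an accessibility-tree turn with the same 10-token instruction). Real sessions also carry the system prompt, tool output, and model replies, so they fill sooner than this lower bound suggests.

```typescript
const CONTEXT_WINDOW_TOKENS = 200000; // window size quoted in the article

/** How many whole turns of a given size fit into the window, counting screen input only. */
function turnsUntilFull(tokensPerTurn: number, windowTokens = CONTEXT_WINDOW_TOKENS): number {
  return Math.floor(windowTokens / tokensPerTurn);
}

console.log(turnsUntilFull(2010)); // 99  (screenshot + instruction)
console.log(turnsUntilFull(210));  // 952 (accessibility tree + instruction)
```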

How Fazm actually manages context on a 100-step workflow

You never call /compact. You never call /clear. The medium carries the load, and the bridge enforces two or three safety caps so a long session finishes cleanly.

Accessibility tree is the default input

The bundled macos-use MCP server reads the focused window via the macOS AX API and returns a tree of role + label + coordinates. Median turn is under 200 tokens. No screenshots unless the agent asks.

Screenshot is a deliberate opt-in, not a default

When the agent decides the UI cannot be described in text (a canvas, a PDF viewer, an image), it calls the capture tool explicitly. ScreenCaptureManager.swift downscales to 1,568px longest edge and JPEGs to under 3.5 MB so a single image never exceeds Anthropic's per-image limit.

MAX_IMAGE_TURNS = 20 caps the pixel budget

A counter per session, reset on session delete. After twenty image-bearing turns, subsequent turns drop the image attachment even if the agent asked. Accessibility-tree text keeps flowing. The agent does not stall; the context window does.

compact_boundary is surfaced to the UI, not swallowed

The Claude Code SDK ships a stock agent entry point that drops system messages. Fazm's patched-acp-entry.mjs intercepts them and forwards compact_boundary and status_change so you see exactly when a compaction happened and how many tokens were in flight at the boundary.

Per-turn cost delta is visible in real time

total_cost_usd comes back on every result. Fazm subtracts the previous session cost to show the delta for just this turn. If an image-heavy turn costs five cents more than the previous AX-tree turn, you see it immediately and can adjust.
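The per-turn delta described in the last step reduces to one subtraction per result. A minimal sketch, assuming `total_cost_usd` is cumulative for the session as the text describes; the `CostTracker` class is hypothetical and only illustrates the bookkeeping.

```typescript
class CostTracker {
  private prevSessionCost = 0;

  /** Takes the cumulative session cost from a result and returns the cost of just this turn. */
  record(totalCostUsd: number): number {
    const delta = totalCostUsd - this.prevSessionCost;
    this.prevSessionCost = totalCostUsd;
    return delta;
  }
}

const tracker = new CostTracker();
const axTurn = tracker.record(0.5);     // first turn: the whole total is this turn's cost
const imageTurn = tracker.record(0.75); // second turn: only the increase since the last result
console.log(axTurn.toFixed(2), imageTurn.toFixed(2));
```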

The short version, for a Reddit thread.

Classic Claude Code context hygiene (CLAUDE.md, /compact, /clear, subagents, plan mode) is still right. It just stops being the bottleneck the moment you ask Claude Code to act on a graphical app. At that point the bottleneck becomes whether each turn is paying two thousand tokens of pixels or two hundred tokens of structured text. Cap the pixel turns, read the accessibility tree by default, and watch compact_boundary events to verify your choice is holding.

MAX_IMAGE_TURNS = 20 at acp-bridge/src/index.ts:793 is the one line in the Fazm source that captures the entire strategy.

Questions that actually come up

Frequently asked questions

What is the one Claude Code context management lever that every guide misses?

The input medium. All five widely cited levers (CLAUDE.md for persistent instructions, /compact to summarize history, /clear to reset, subagents for context isolation, plan mode for intent-only turns) assume text input. The moment Claude Code drives a GUI, the dominant factor becomes how you represent the screen. A single 1440x900 screenshot runs 1,700 to 3,000 image tokens per turn. The same screen rendered as an accessibility tree is under 200 tokens of text. Over 100 turns that difference is the full 200K context window.

What is compact_boundary and why does it matter for context management?

compact_boundary is a session update event the Claude Code SDK emits when it automatically compacts conversation history because the window is about to overflow. The payload carries a trigger field (auto, manual) and a preTokens count so the client knows how much context was consumed before compaction. Most tools silently drop this event. Fazm captures it at acp-bridge/src/index.ts:2367-2373 and forwards it to the UI so you see, in real time, when a summary replaced your history and how many tokens were in flight at the boundary.

What is MAX_IMAGE_TURNS = 20 in Fazm and why does it exist?

MAX_IMAGE_TURNS = 20 is a session-scoped cap defined at acp-bridge/src/index.ts:793. It counts how many turns in a given session include image input (screenshots). After 20 such turns the bridge stops sending images in subsequent turns, even if the agent asks for one. Accessibility-tree text still flows. The cap exists because Claude's API enforces a stricter per-image size limit once a session has accumulated many images, and because each additional screenshot pushes the context window closer to the compact_boundary trigger. Twenty is the point where cost and stability balance best for a typical Mac automation workload.

Does Fazm use screenshots at all, or only the accessibility tree?

It uses both, but asymmetrically. The accessibility tree (via the bundled macos-use MCP server) is the default medium for every turn. Screenshots are only requested when the agent explicitly decides the UI cannot be described in text (a canvas, a PDF viewer, an image). Even then, the screenshot goes through ScreenCaptureManager.swift, which downscales to a 1,568px longest edge and JPEG-compresses to under 3.5 MB so the image stays within Claude's per-image size limit. Screenshot turns still count against MAX_IMAGE_TURNS.
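The longest-edge downscale mentioned here is simple proportional scaling. The sketch below shows only the dimension math, in TypeScript for consistency with the other examples; it is not the Swift code in ScreenCaptureManager.swift, and `fitLongestEdge` is a hypothetical name.

```typescript
const MAX_EDGE_PX = 1568; // longest-edge limit quoted in the article

/** Scale dimensions down so the longest edge is at most maxEdge, preserving aspect ratio. */
function fitLongestEdge(
  width: number,
  height: number,
  maxEdge = MAX_EDGE_PX,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height }; // already small enough, never upscale
  const scale = maxEdge / longest;
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

console.log(fitLongestEdge(2880, 1800)); // { width: 1568, height: 980 }
console.log(fitLongestEdge(1440, 900));  // { width: 1440, height: 900 }
```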

How does Fazm track per-turn token cost to manage context?

The patched ACP entry point at acp-bridge/src/patched-acp-entry.mjs:45-49 captures total_cost_usd and usage on every result message. It computes _lastCostUsd = item.value.total_cost_usd - prevSessionCost so every UI turn has a dollar delta attached. The same path captures inputTokens, outputTokens, cache_read_input_tokens, and cache_creation_input_tokens so you can see when cache hits are saving you and when cache misses are inflating the bill. This is how context-management decisions become visible.

Can I still use /compact and /clear with Fazm like I do in the terminal?

The underlying Claude Code SDK exposes the same compaction mechanics, and compact_boundary events fire automatically when the window fills. /clear-equivalent behavior happens by starting a fresh session (the Mac app recreates the session rather than accumulating history). What changes is that you do not have to invoke either manually to keep a long Mac workflow alive, because MAX_IMAGE_TURNS prevents the pixel budget from dominating before compaction would normally trigger.

Where in the source code can I verify these claims?

The repo is github.com/mediar-ai/fazm. Four locations across three files matter for context management. acp-bridge/src/index.ts:793 defines MAX_IMAGE_TURNS. acp-bridge/src/index.ts:2367-2373 forwards compact_boundary events. acp-bridge/src/patched-acp-entry.mjs:65-72 patches the Claude Code SDK iterator to surface compaction at all (the stock SDK drops system messages). Desktop/Sources/FloatingControlBar/ScreenCaptureManager.swift handles the 1,568px downscale and 3.5 MB JPEG cap that keeps any screenshot turn inside Anthropic's per-image limit.

Does the accessibility tree approach actually survive apps with big complex UIs?

Yes, because the tree is hierarchical and the agent reads the subtree it needs. A full Xcode window might have 3,000 elements, but the tree walker descends only into the focused window and active panes. Median element text is eight to twelve tokens. Even a dense productivity app rarely exceeds 500 tokens for a full tree read, and narrow reads (one pane, one dialog) are closer to 50 tokens. This is why a 100-turn workflow on a Mac can finish inside a single context window when a screenshot-only agent would have compacted three times.

How is this different from Claude Computer Use or OpenAI Operator?

Both of those products are vision-first: the canonical input on every turn is a pixel screenshot. Anthropic sized Computer Use around that fact and publishes image-token pricing accordingly. Fazm inverts the default: structured text is the canonical input, and vision is a fallback. The consequence is 10x to 30x lower per-task context consumption, faster turns (less to compress), and fewer hallucinated clicks because the model is reading role and coordinate fields instead of inferring them from pixels.

Drive your Mac with Claude Code, without blowing the context window

Fazm runs Claude Code against real macOS accessibility APIs instead of screenshots. compact_boundary stays quiet. total_cost_usd stays low. Free to try, open source.


Context management beyond the terminal.

Five classic levers plus one that nobody writes about. Fazm implements all six. Read the source at github.com/mediar-ai/fazm or install the app and watch it work.