
Claude Code MCP tool results arrive in two wrapper shapes, and one of them is quietly eating your context window

If you embed Claude Code as a subprocess and run MCP servers like Playwright or macos-use behind it, tool results come back in two different JSON wrappers. Image items can carry 500 KB of base64 each. The standard --image-responses omit flag does not always strip them. Here is the extractor Fazm runs in its ACP bridge, what it saves, and why the fix has to live at the bridge rather than the server.

Fazm

Published April 18, 2026 · 11 min read



Dual-format extractor at acp-bridge/src/index.ts:2262 handles direct MCP and ACP-wrapped results

Playwright screenshot tool result drops from ~500 KB to ~691 chars once images are stripped

Works for five built-in MCP servers: fazm_tools, playwright, macos-use, whatsapp, google-workspace

MCP tool results, adult-supervised

Strip images at the extractor, not at the server

Claude Code + MCP returns content arrays of text and image items

ACP hosts re-wrap them as {type:'content', content:{...}}

Base64 screenshots can be 500 KB of useless conversation fuel

Fazm's bridge walks both shapes and drops everything that is not text

Screenshot result: 500,000 -> 691 chars. Same call.


What the extractor actually saves

These numbers are what the bridge logs when running the same Playwright and macos-use calls with and without the image-stripping extractor on the path. They are not benchmarks; they are the size of a single tool result as it flows back into the Claude Code conversation.

~500 KB: Playwright screenshot before the fix

~691 chars: the same screenshot result after

~60 KB: Playwright navigate before

3.8 KB: same navigate after (upper bound)

Source: the bridge's per-tool log line at acp-bridge/src/index.ts:2342, which writes Tool completed: <name> output=<N> chars for every completed MCP call.

The whole problem in one line

~500 KB per screenshot -> ~691 chars

When an MCP server returns an image content item inline, the base64 payload shows up directly in the tool result. If you do not filter it, every screenshot costs you half a megabyte of conversation. If you do, the model sees the screenshot path and metadata, which is what it actually wanted.

The two wrapper shapes Claude Code hands you

An MCP server always returns a content array. What gets handed to your relay depends on whether the caller is Claude Code directly or an ACP host embedding Claude Code as a subprocess. If you only handle one of the two shapes, half of your tool results arrive empty.

Direct MCP

[
  { "type": "text", "text": "snapshot saved" },
  { "type": "image",
    "data": "<500KB base64>",
    "mimeType": "image/png" }
]

What a bare MCP server emits. If you parse only this shape you will see empty results under an ACP host.

ACP-wrapped

[
  { "type": "content",
    "content": { "type": "text",
                 "text": "snapshot saved" } },
  { "type": "content",
    "content": { "type": "image",
                 "data": "<500KB base64>" } }
]

What the ACP host hands you when Claude Code is embedded as a subprocess. The actual MCP payload is one level deeper.

Anchor fact: the real extraction loop

This is the code that lives at acp-bridge/src/index.ts around line 2262. It treats both wrapper shapes as first-class in the same loop. Image items and unknown content types are silently dropped. The joined text is what gets forwarded back into the Claude Code conversation.


“ACP wraps MCP content items as {type:"content", content:{type:"text"|"image", ...}}. We extract only text items and skip images to keep context small.”

acp-bridge/src/index.ts:2263 (inline comment)

acp-bridge/src/index.ts (around line 2262)

Two passes. First over update.content, then over update.rawOutput if the first pass yielded nothing. In both passes, anything whose type is not text is dropped without a warning. This is the difference between a 500 KB screenshot result and a 691 char screenshot result.
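The two-pass rule described above can be sketched in a few lines of TypeScript. This is an illustration written from the description on this page, not a copy of the code in acp-bridge/src/index.ts: the names ToolUpdate, collectText, and extractToolText are hypothetical, treating rawOutput as an array of the same item shapes is an assumption, and so is joining with a newline.

```typescript
// Hypothetical shape of a tool-result update. Only the two field names
// (content, rawOutput) come from the article; their types are assumed.
interface ToolUpdate {
  content?: unknown;
  rawOutput?: unknown;
}

// Walk one array and keep text from both wrapper shapes.
function collectText(items: unknown): string[] {
  if (!Array.isArray(items)) return [];
  const out: string[] = [];
  for (const item of items) {
    if (typeof item !== "object" || item === null) continue;
    const rec = item as Record<string, unknown>;
    if (rec.type === "text" && typeof rec.text === "string") {
      // Direct MCP: {type:"text", text:"..."}
      out.push(rec.text);
    } else if (
      rec.type === "content" &&
      typeof rec.content === "object" &&
      rec.content !== null
    ) {
      // ACP-wrapped: {type:"content", content:{type:"text", text:"..."}}
      const inner = rec.content as Record<string, unknown>;
      if (inner.type === "text" && typeof inner.text === "string") {
        out.push(inner.text);
      }
    }
    // Image items and unknown types fall through and are dropped silently.
  }
  return out;
}

// First pass over content; second pass over rawOutput only if the first
// pass produced no text. The "\n" separator is an assumption.
function extractToolText(update: ToolUpdate): string {
  let parts = collectText(update.content);
  if (parts.length === 0) parts = collectText(update.rawOutput);
  return parts.join("\n");
}
```

Fed either of the two arrays shown earlier, extractToolText returns only "snapshot saved"; the base64 item never reaches the returned string.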

Why the Playwright flag alone is not enough

Playwright MCP ships a CLI flag, --image-responses omit, that is supposed to suppress inline image results. Fazm passes it. With @playwright/mcp@0.0.68 in extension mode (attached to a real Chrome so cookies and logins survive), the flag does not reliably block base64 payloads from reaching tool results. The bridge-side extractor is the actual backstop.

acp-bridge/src/index.ts:1033
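For orientation, here is a minimal sketch of how the Playwright MCP flags named on this page could be assembled before spawning the server. The flag names and the /tmp/playwright-mcp directory are the ones quoted in this article; the function and option names are hypothetical, and how the bridge actually launches the process (and how it selects extension mode) is not shown here.

```typescript
// Hypothetical options type; only outputDir is modeled.
interface PlaywrightLaunchOptions {
  outputDir: string;
}

// Build the CLI flags the article says are passed to Playwright MCP.
// The flag is a request, not a guarantee: per the article, image payloads
// can still appear in extension mode, so the bridge filters as well.
function playwrightMcpFlags(opts: PlaywrightLaunchOptions): string[] {
  return [
    "--image-responses", "omit",
    "--output-mode", "file",
    "--output-dir", opts.outputDir,
  ];
}

const flags = playwrightMcpFlags({ outputDir: "/tmp/playwright-mcp" });
console.log(flags.join(" "));
```

Keeping the flags in one function makes it easy to see that suppression is requested at the server and still enforced independently at the extractor.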

The full round trip from tool call to clean context

Five stops between the model asking for a screenshot and the model seeing a summary small enough to think with. No step looks at pixels, no step rehydrates the image, no step leaves base64 in the conversation.

Claude Code calls an MCP tool

The subprocess asks to run a tool like mcp__playwright__browser_take_screenshot or mcp__macos-use__macos-use_refresh_traversal. The call goes out over ACP to the bridge, which forwards it to the registered MCP server.

The MCP server returns a content array

Playwright returns something like [{type:"image", data:"<500KB base64>"}] or [{type:"text", text:"snapshot saved to /tmp/playwright-mcp/x.yml"}] depending on flags and mode. macos-use returns file paths plus a visible_elements sample. Either way, the result is a JSON-RPC response the bridge has to forward back to Claude Code.

The bridge wraps or unwraps the content

ACP hosts sometimes re-wrap MCP content items as {type:"content", content:{type:"text"|"image", ...}}. Direct MCP responses skip that step. The extractor has to treat both as first-class so no server has to change its emit format.

The extractor walks the array and drops images

For each item it pushes text onto a buffer when it matches either {type:"text"} or {content:{type:"text"}}. Image items and unrecognized shapes are silently dropped. The joined text is what flows back into the Claude Code conversation.

The bridge logs size and forwards a truncated preview

The bridge writes "Tool completed: <name> output=<N> chars" so you can watch for leaks. The client UI gets the first 2000 characters for display; the model gets the full extracted text. No base64 touches the conversation state.
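The last stop can be sketched as a small function that produces the log line, the UI preview, and the text for the model from one extracted string. The log format and the 2000-character preview limit come from the description above; the ToolCompletion type and the completeTool name are hypothetical, and measuring size with JavaScript's string length (UTF-16 code units) is an assumption.

```typescript
// Preview length for the client UI, as stated in the article.
const PREVIEW_LIMIT = 2000;

// Hypothetical result type for one completed tool call.
interface ToolCompletion {
  logLine: string;   // written to the bridge log
  preview: string;   // truncated text for the client UI
  modelText: string; // full extracted text forwarded to the model
}

function completeTool(name: string, extracted: string): ToolCompletion {
  return {
    // String length counts UTF-16 code units, which is close enough
    // for spotting a 500 KB leak.
    logLine: `Tool completed: ${name} output=${extracted.length} chars`,
    preview: extracted.slice(0, PREVIEW_LIMIT),
    modelText: extracted,
  };
}
```

Because the input is already image-free text, truncation only affects what the UI displays; the model still receives the whole extracted string.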

End-to-end dataflow through the bridge

Three inputs, one extractor, three outputs. The extractor is the only place that decides what the Claude Code subprocess gets to read.

MCP tool result pipeline

How this compares to the usual approach

A lot of Claude Code integrations assume one MCP wrapper shape and trust the server's image-suppression flag. Both assumptions break in extension mode or under an ACP host. The table below is what Fazm's bridge does differently, feature by feature.

Feature	Typical Claude Code + MCP relay	Fazm ACP bridge
Assumes a single MCP wrapper format	Yes, typically parses {type:"text"} directly	No, handles both direct MCP and ACP-wrapped {type:"content", content:{...}}
Handles image content items	Often forwards them unchanged, filling context with base64	Drops images at extraction; only text items survive
Relies on --image-responses omit	Assumes the flag works and stops there	Passes the flag AND strips at the bridge, because the flag is unreliable in extension mode
Screenshot tool result size	Roughly 500 KB of base64 per screenshot	Roughly 691 characters (path + metadata)
Navigate tool result size	Roughly 60 KB per page load	531 characters to 3.8 KB depending on the page
Error visibility	Silent; isError often swallowed by the relay	Logged as Tool ERROR or Tool soft-error with first 500 chars
Where the primary input comes from	Screenshots fed into a vision model	Accessibility tree (structured role+label text) from macos-use
Who has to change their server	Each MCP server must emit the expected shape	No MCP server has to change; the bridge adapts to both shapes

Six things the extractor actually does

All of these are small. None of them are clever. They are the rules the bridge follows so that tool results never turn into context-window bombs.

Two wrapper shapes, one loop

The extractor at acp-bridge/src/index.ts:2262 accepts {type:"text"} and {type:"content", content:{type:"text"}} in the same pass, so no MCP server has to care which host is on the other side.

Image items are dropped, not forwarded

Any content item whose type is not text is silently skipped. A single Playwright screenshot that would have been 500 KB of base64 becomes zero bytes of LLM context.

The flag AND the bridge

Playwright MCP is launched with --image-responses omit, but the extractor is what actually keeps the context clean when the flag misbehaves in extension mode.

Observable output sizes

Every tool completion logs output=<N> chars. If you start leaking images you see it in one grep.

Soft errors from Playwright and macos-use are surfaced

Tool text containing error, failed, connection closed, or timeout is logged as Tool soft-error so that silent failures do not disappear into the conversation.

Five built-in MCP servers

fazm_tools, playwright, macos-use, whatsapp, google-workspace. All of them flow through the same extraction pipeline.
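The soft-error rule from the list above can be sketched as a small classifier. The trigger words, the two watched tool-name prefixes, and the 500-character cap come from this page; the function names, the case-insensitive match, and the exact soft-error line format are assumptions.

```typescript
// Words that mark a quiet failure, as listed in the article.
// Case-insensitive matching is an assumption.
const SOFT_ERROR_WORDS = /error|failed|connection closed|timeout/i;

// Only these two MCP families are checked for soft errors.
const SOFT_ERROR_PREFIXES = ["mcp__playwright", "mcp__macos-use"];

type Verdict = "error" | "soft-error" | "ok";

function classifyResult(name: string, text: string, isError: boolean): Verdict {
  if (isError) return "error"; // explicit MCP error flag wins
  const watched = SOFT_ERROR_PREFIXES.some((p) => name.startsWith(p));
  if (watched && SOFT_ERROR_WORDS.test(text)) return "soft-error";
  return "ok";
}

// Build the log line; only the first 500 characters of the text are kept.
// The soft-error format mirrors the "Tool ERROR" one by assumption.
function errorLogLine(name: string, text: string, verdict: Verdict): string | null {
  if (verdict === "ok") return null;
  const label = verdict === "error" ? "Tool ERROR" : "Tool soft-error";
  return `${label}: ${name} error=${text.slice(0, 500)}`;
}
```

A plain substring match like this will also flag benign text that happens to contain the word "error", which is a reasonable trade for a log-only signal.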

What it looks like when it works

Tail the dev log and watch a handful of tool completions go by. Each line ends with output=<N> chars. If you are leaking images, N is in the 100,000 to 700,000 range. If the extractor is doing its job, N is two to four digits, as below.

/tmp/fazm-dev.log

Those four tool calls together are under 5 KB of context. The unfiltered version of the same four calls would be several hundred kilobytes, almost all of it base64 the model will never re-render.
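If you would rather not eyeball the log, the same check can be scripted. The sketch below scans log text for the completion line format quoted on this page and reports any result above a size cutoff. The findLeaks name, the Leak type, and the 50,000-character default are assumptions (the cutoff follows the rule of thumb given in the FAQ).

```typescript
// Matches "Tool completed: <name> ... output=<N> chars". Anything between
// the name and output= (such as id and status fields) is skipped.
const COMPLETED = /Tool completed: (\S+).*?output=(\d+) chars/;

// Assumed cutoff: results above this are treated as likely base64 leaks.
const LEAK_THRESHOLD = 50000;

interface Leak {
  tool: string;
  chars: number;
}

function findLeaks(log: string, threshold: number = LEAK_THRESHOLD): Leak[] {
  const leaks: Leak[] = [];
  for (const line of log.split("\n")) {
    const m = COMPLETED.exec(line);
    if (!m) continue; // not a completion line
    const chars = Number(m[2]);
    if (chars > threshold) leaks.push({ tool: m[1], chars });
  }
  return leaks;
}
```

Pass it the contents of /tmp/fazm-dev.log; an empty array means every logged result stayed under the cutoff.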

See it yourself on your own machine

Install Fazm, open the floating bar with Cmd + backslash, and ask it to browse somewhere or click a button in any Mac app. Watch the tool completion lines in /tmp/fazm-dev.log. Every result should stay in the range of a few hundred to a few thousand characters, not hundreds of kilobytes.


A note on why accessibility APIs make this easier

Most of the pressure on MCP tool result hygiene comes from screenshot-based agents. Every perceive-act cycle pushes another PNG into the conversation. Fazm's primary desktop automation MCP, macos-use, talks to the macOS accessibility tree through AXUIElement APIs. That means the model's primary input is a structured role+label tree written to a .txt file, not a pixel buffer. A full refresh_traversal result for a typical window is a few thousand characters. A screenshot of the same window is hundreds of kilobytes.

The extractor still exists, because screenshots still get taken occasionally (for visual verification), and because third-party MCP servers sometimes emit image items. But the base load on the context window is low by construction. Accessibility-first, screenshot-on-demand, is the cheap way out.

Frequently asked questions

What do Claude Code MCP tool results actually look like on the wire?

Two shapes. Direct MCP spec: a content array of items like {type: "text", text: "..."} or {type: "image", data: "<base64>", mimeType: "image/png"}. When Claude Code is wrapped by an ACP (Agent Client Protocol) host, those items are wrapped again as {type: "content", content: {type: "text" | "image", ...}}. If you parse only the direct shape you will see empty tool results; if you parse only the ACP shape you will miss servers that speak native MCP. Fazm's bridge handles both formats in the same loop.

Why do MCP tool results blow up the context window?

Because image content items carry raw base64 PNG/JPG data inline. A single Playwright screenshot returned by the browser MCP can be 400 to 700 KB of base64 text in one tool result. If you do not strip it before it flows into the LLM conversation, a two-minute browsing session can fill the entire context window with pixel bytes the model cannot usefully read. The same is true for macos-use accessibility traversal screenshots. The fix is to drop image items during extraction, not after.

Does --image-responses omit solve the problem on its own?

No, not reliably. Playwright MCP ships a CLI flag --image-responses omit that is supposed to suppress inline image responses. Fazm passes it at acp-bridge/src/index.ts:1033 together with --output-mode file and --output-dir /tmp/playwright-mcp. In practice, with @playwright/mcp@0.0.68 running in extension mode (attached to a real Chrome), the flag does not actually prevent base64 payloads from appearing in tool results. The bridge-side extraction fix is what actually keeps the context clean. If you rely only on the flag, you will still see 500 KB screenshots show up under some configurations.

Where is the exact file that fixes this?

acp-bridge/src/index.ts, starting at line 2262 with the comment "ACP wraps MCP content items as {type:"content", content:{type:"text"|"image", ...}}". The loop walks the content array, pushes text for direct-MCP {type: "text"} items, pushes text for ACP-wrapped {type: "content", content: {type: "text"}} items, and skips everything else. Line 2285 does a second pass on rawOutput with the same rule. That double-pass is what makes the bridge resilient to whichever wrapper a given MCP server happens to emit.

What sizes does this actually save?

Playwright screenshot tool result: roughly 500 KB of base64 before the fix, around 691 characters after. Playwright navigate tool result: roughly 60 KB before, 531 characters to 3.8 KB after depending on the page. Those numbers are observed from the bridge's tool result logs when running the same action with and without the extraction patch. They also match what you see in the bridge log message "output=<N> chars" at acp-bridge/src/index.ts:2344.

Do I need to do this if I am just using Claude Code in the terminal?

Usually not. The Claude Code CLI applies its own filtering to tool results and truncates obvious image payloads before they hit the context. The problem appears when you embed Claude Code as a subprocess, like Fazm does, and relay tool results through your own transport. ACP (Agent Client Protocol) changes the wrapper shape, and any MCP server that returns image content can leak through if your relay does not filter by content item type.

What MCP servers ship inside Fazm?

Five, hardcoded in BUILTIN_MCP_NAMES at acp-bridge/src/index.ts:1266. fazm_tools (in-process), playwright (browser automation), macos-use (a native Swift binary that talks to the macOS accessibility tree), whatsapp, and google-workspace. Any additional MCP servers the user installs are treated as user-space and also flow through the same result-extraction pipeline.

Why use accessibility APIs instead of screenshots at all?

Screenshots are large, noisy, and force the model to re-interpret pixels into UI semantics it could have gotten for free. The macOS accessibility tree is structured text: role, label, position, size, visible flag, children. One refresh_traversal call returns a .txt file the model can grep. Fazm's macos-use MCP reads AX directly, so the primary tool result is structured text, not a screenshot. That is the reason a single macos-use click round trip can be a few hundred characters where a screenshot-first agent would emit several hundred kilobytes.

How do I know my MCP tool results are leaking images into context?

Look at the bridge log line right after any tool completes. Fazm writes "Tool completed: <name> (id=<id>) status=completed output=<N> chars" at acp-bridge/src/index.ts:2342-2346. If a screenshot or navigate tool result shows output well above 50 KB you are leaking base64. If it shows a few hundred to a few thousand characters, the filter is working. You can also grep your transport logs for /^data:image\/png;base64,/ to see if raw payloads are making it through the relay.

What happens when an MCP tool actually returns an error?

The bridge checks update.isError from the MCP protocol and, separately, looks for error-ish words in the extracted text output. If either trips, the result is logged as "Tool ERROR: <name> error=<first 500 chars>" so it lands in Sentry breadcrumbs. Soft errors (no isError flag but the text contains error, failed, connection closed, or timeout) are logged as "Tool soft-error" specifically for mcp__playwright and mcp__macos-use, because those are the two MCP families that most often fail quietly and still return a 200. That logic lives around acp-bridge/src/index.ts:2300-2313.

Is Fazm a developer framework or a consumer app?

A consumer app. It is a signed, notarized Mac app you download and install. The Claude Code subprocess and the MCP bridge run locally. You do not wire up an SDK, you do not paste API keys at install time, and you do not deploy anything. The same plumbing described on this page (ACP wrapper handling, image stripping, dual-format extraction) runs the same way for every user.

Can I verify the numbers and code locations myself?

Yes. The acp-bridge source tree is public. Relevant lines: 1033 (Playwright MCP flags), 1266 (BUILTIN_MCP_NAMES), 2262 to 2298 (dual-format text extraction), 2342 to 2346 (output-size logging), 2542 (startup config log line). Open acp-bridge/src/index.ts and search for the inline comment that says ACP wraps MCP content items to land right on the fix.

Try the app that lives behind this page

Fazm is a consumer Mac app that runs Claude Code locally with five built-in MCP servers. The ACP bridge described here is what every tool result flows through. Free to install, no API keys at setup, every number on this page is pulled from the public source.
