Every OpenAI April 2026 feature hits the same ceiling first: 2000 pixels per image.
GPT-5.1, GPT-5.4 mini and nano, RBAC, 24-hour prompt cache, Sora 1080p, Assistants sunset, and computer-use-preview in the Responses API. Every roundup lists them. None of them tells a Mac agent consumer that a default Retina-MacBook Playwright screenshot is ~3024 by 1964 pixels and gets silently rejected before any of these features run. Fazm ships the fix: a 44-line file-watcher that uses sips to resize every oversize PNG or JPEG so its longer side is 1920 pixels, and an accessibility-tree-plus-grep-hint path that never needs a screenshot in the first place.
Every OpenAI April 2026 API changelog entry, tagged with the bridge-side file:line that survives it
The framing every other April 2026 OpenAI changelog page skips
Open any openai-api-changelog-april-2026-release-notes SERP page and the table is the same: GPT-5.1 with the new none reasoning default, GPT-5.1-Codex and GPT-5.1-Codex-mini in the Responses API, GPT-5.4 mini and nano in Chat Completions and Responses, RBAC, 24-hour prompt cache retention, the Sora API expansion to 20 seconds and 1080p and reusable character references, the Assistants API sunset plan, and computer-use-preview still in research preview for tiers 3-5. Useful if you are tracking what OpenAI shipped. Not useful if you are running a Mac agent that needs to call any of it tomorrow.
The gap the roundups leave is the wire-level cost of those features on the consumer side. Specifically: modern model APIs, OpenAI and Anthropic both, enforce a dimension ceiling on attached images. A default full-screen Playwright screenshot on a Retina MacBook reports roughly 3024 by 1964 pixels, which tips past 2000px on the width axis, and the Responses API returns a 400 on the turn that carries the image. An agent does not recover cleanly from that: the 400 surfaces as a generic error and the user restarts the session.
The April 2026 answer, for a Mac agent running on top of Fazm, is that the ceiling is enforced two turns upstream of the model call, and the dominant path avoids it entirely. The next three sections are the anchor facts.
Anchor fact 1: line 713 of acp-bridge pins the ceiling to 1920
Inside /Users/matthewdi/fazm/acp-bridge/src/index.ts, line 713 reads, verbatim: const MAX_SCREENSHOT_DIM = 1920; // stay under 2000px API limit. Three lines above, the prose comment names the failure mode: Playwright on Retina Macs produces screenshots over 2000px which hit Claude's multi-image dimension limit. The same ceiling applies to OpenAI's Responses API when images are attached to a computer-use-preview or any vision-tool turn. The 44-line function below the constant runs unconditionally on bridge startup and watches a single directory on disk.
“stay under 2000px API limit”
acp-bridge/src/index.ts line 713
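A minimal sketch of the arithmetic behind that constant, assuming the resize scales the longer side down to the cap and keeps the aspect ratio. The names `Size`, `needsResize`, and `fitWithin` are hypothetical and do not come from the bridge; only the value 1920 is quoted from it. The rounding is also an assumption, so sips may differ by a pixel.

```typescript
// Value quoted from the bridge source; everything else here is illustrative.
const MAX_SCREENSHOT_DIM = 1920;

interface Size {
  width: number;
  height: number;
}

// True when either axis is larger than the cap.
function needsResize(size: Size, max: number = MAX_SCREENSHOT_DIM): boolean {
  return size.width > max || size.height > max;
}

// Scales the image so its longer side equals `max`, preserving aspect ratio.
// Images already within the cap are returned unchanged.
function fitWithin(size: Size, max: number = MAX_SCREENSHOT_DIM): Size {
  if (!needsResize(size, max)) return { width: size.width, height: size.height };
  const scale = max / Math.max(size.width, size.height);
  return {
    width: Math.round(size.width * scale),
    height: Math.round(size.height * scale),
  };
}

// A Retina full-screen capture as described in this article.
const retina: Size = { width: 3024, height: 1964 };
const fitted = fitWithin(retina); // longer side becomes 1920
```

Under these assumptions the 3024 by 1964 capture comes out at 1920 by 1247, which leaves 80 pixels of headroom under the 2000px ceiling on the long side.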
The four numbers that hold the page together
All four come from the two files named in the first section. No benchmark, no vendor survey, no self-report. Grep-verifiable at a specific file and line.
What happens on a default Mac agent setup vs. behind the acp-bridge watcher
A full-screen Playwright screenshot lands in /tmp/playwright-mcp at 3024 by 1964 pixels and is attached to the next Responses API turn. The API responds with a 400 because the image exceeds the 2000px dimension cap. The error the agent surfaces is generic. The user retries and gets the same result. The turn consumed budget and returned nothing.
- 3024x1964 PNG written by Playwright
- Attached as base64 to the Responses API turn
- Silent 400 from the image dimension check
- Agent error surfaces as a generic tool failure
- No recovery path visible to the user
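One way a default setup could turn that generic failure into a readable one is a pre-flight check on the PNG header before the image is attached. This is a sketch of a different technique from the one the Fazm bridge uses (the bridge shells out to sips). It reads the width and height from the PNG IHDR chunk, and the names `readPngSize` and `describeOversize` are hypothetical. The 2000px figure is the ceiling quoted in this article.

```typescript
// Ceiling as quoted in this article; treat it as an assumption for your API.
const API_MAX_DIM = 2000;

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface PngSize {
  width: number;
  height: number;
}

// Reads width and height from the IHDR chunk, which follows the 8-byte
// PNG signature. Returns null if the bytes do not start with that signature.
function readPngSize(bytes: Uint8Array): PngSize | null {
  if (bytes.length < 24) return null;
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Bytes 8..11 hold the chunk length, 12..15 the type "IHDR",
  // 16..19 the width and 20..23 the height, all big-endian.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Returns a readable message naming the offending axis, or null if it fits.
function describeOversize(size: PngSize, max: number = API_MAX_DIM): string | null {
  const over: string[] = [];
  if (size.width > max) over.push(`width ${size.width}px`);
  if (size.height > max) over.push(`height ${size.height}px`);
  return over.length === 0 ? null : `image exceeds ${max}px limit: ${over.join(", ")}`;
}
```

With a check like this, the agent can report which axis is over the limit instead of passing a bare 400 up to the user.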
Anchor fact 2: line 793 caps a session at 20 image turns
The 2000-pixel ceiling applies to each image on its own. Modern model APIs additionally enforce per-session constraints on how many images a single session has ever seen. The Fazm bridge tracks this explicitly so it can stop attaching new screenshots before the API rejects the session outright.
The imageTurnCounts Map is keyed by session key (main, floating, observer, or a user-defined key) and is mutated at five specific line numbers in the bridge (1455, 1795, 1887, 1943, 1971) that correspond to session lifecycle events. When a session is deleted or recreated, its counter resets. When the counter would exceed MAX_IMAGE_TURNS, the bridge stops attaching screenshots to outbound prompts for that session.
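The counter described above can be sketched as a small class, assuming one increment per image-bearing turn and a reset on session delete or recreate. The class name `ImageTurnBudget` and its methods are hypothetical. Only the Map-per-session-key idea and the value 20 are taken from the bridge, and whether the real check uses "at" or "past" the limit is an assumption.

```typescript
// Value quoted from the bridge source.
const MAX_IMAGE_TURNS = 20;

class ImageTurnBudget {
  // Image-bearing turns seen so far, keyed by session key
  // (for example "main", "floating", "observer").
  private readonly counts = new Map<string, number>();
  private readonly maxTurns: number;

  constructor(maxTurns: number = MAX_IMAGE_TURNS) {
    this.maxTurns = maxTurns;
  }

  // Records the turn and returns true if the session may still attach an
  // image; returns false without recording once the budget is used up.
  tryAttach(sessionKey: string): boolean {
    const used = this.counts.get(sessionKey) ?? 0;
    if (used >= this.maxTurns) return false;
    this.counts.set(sessionKey, used + 1);
    return true;
  }

  // Called when a session is deleted or recreated so it starts clean.
  reset(sessionKey: string): void {
    this.counts.delete(sessionKey);
  }
}
```

Keeping the counter on the bridge side means the decision to drop a screenshot is made before the request is built, so the session keeps working in text-only mode instead of failing at the API.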
Anchor fact 3: line 1033 passes four flags to Playwright MCP
Before the file-watcher can rescue a PNG, the PNG has to land on disk at all. Playwright MCP by default emits base64-encoded images inline in the tool response. Those blobs are often larger than the eventual model context for the whole turn. The bridge passes flags that redirect screenshot output to /tmp/playwright-mcp and strip base64 from the response envelope.
With these flags, a typical Playwright screenshot tool response on this setup is around 691 characters of YAML: the snapshot filename, the URL, the title, the outline of actionable elements. An OpenAI Responses API turn that receives this as tool output does not pay the image-dimension cost at all. The model only sees a PNG if the agent decides to Read the file by path.
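As a rough sketch of that launch configuration, the three flags named in this article can be kept in one place and handed to whatever spawns the server. The flag names and values are as quoted here. The `McpServerEntry` shape and the `command` value passed in by the caller are assumptions for illustration, not the bridge's actual structure.

```typescript
// Directory quoted in this article; the resize watcher observes the same path.
const PLAYWRIGHT_OUTPUT_DIR = "/tmp/playwright-mcp";

// Hypothetical shape for a stdio MCP server entry.
interface McpServerEntry {
  command: string;
  args: string[];
}

// Builds the screenshot-related flags as an argv array.
function playwrightOutputArgs(outputDir: string = PLAYWRIGHT_OUTPUT_DIR): string[] {
  return [
    "--output-mode", "file",      // write snapshots and screenshots to disk
    "--image-responses", "omit",  // drop base64 image payloads from tool responses
    "--output-dir", outputDir,    // single directory for the watcher to observe
  ];
}

// Combines a caller-supplied command and base args with the output flags.
function playwrightServerEntry(command: string, baseArgs: string[] = []): McpServerEntry {
  return { command, args: [...baseArgs, ...playwrightOutputArgs()] };
}
```

Building the flags as an array rather than a single string keeps the output directory in one constant, so the launch arguments and the watcher cannot drift apart.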
How the April 2026 OpenAI features actually reach a Mac agent
Five concrete April 2026 changelog entries. One Fazm bridge process. Four specific defenses the bridge applies before the model ever sees the turn.
April 2026 OpenAI API feature -> acp-bridge -> model-safe input
Lifecycle of a Retina screenshot between Playwright and the Responses API
Agent calls browser_take_screenshot on the Playwright MCP server
Playwright captures the full window. On a 14-inch MacBook Pro at 2x scale factor, the PNG reports around 3024 by 1964 pixels.
Playwright writes to /tmp/playwright-mcp instead of base64
The bridge-launched Playwright process has --output-mode file --image-responses omit --output-dir /tmp/playwright-mcp (line 1033), so the tool response is a small YAML envelope, not a 500KB inline blob.
fs.watch fires inside startScreenshotResizeWatcher
Line 724 watches the directory. The handler filters to .png and .jpeg (line 725) and debounces for 200ms (line 754) to let the file finish writing.
sips -g pixelWidth -g pixelHeight reads the dimensions
Line 734 runs sips with the -g flag twice. The output is parsed with two regexes (pixelWidth and pixelHeight). If either is missing, the handler returns.
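The parsing step can be sketched as follows, assuming the usual `sips -g` output of a path line followed by indented `pixelWidth:` and `pixelHeight:` lines. The function name `parseSipsDimensions` and the exact regexes are hypothetical; the bridge's own patterns are not shown in this article.

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Extracts both dimensions from `sips -g pixelWidth -g pixelHeight <file>`
// output. Returns null if either value is missing, matching the
// "if either is missing, the handler returns" behavior described above.
function parseSipsDimensions(output: string): Dimensions | null {
  const w = output.match(/pixelWidth:\s*(\d+)/);
  const h = output.match(/pixelHeight:\s*(\d+)/);
  if (!w || !h) return null;
  return { width: Number(w[1]), height: Number(h[1]) };
}

// Sample text in the assumed output format.
const sampleOutput = [
  "/tmp/playwright-mcp/page.png",
  "  pixelWidth: 3024",
  "  pixelHeight: 1964",
].join("\n");

const parsed = parseSipsDimensions(sampleOutput);
```

Returning null on a partial match also covers the case where the watcher fires while the file is still being written and sips cannot read it yet.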
If either dimension exceeds MAX_SCREENSHOT_DIM, resample in place
Line 741 runs sips --resampleHeightWidthMax 1920 on the file. The resample preserves aspect ratio and writes in place. Line 742 logErr records from <W>x<H> to fit 1920px.
Resized set records the filepath to skip future fs.watch events
A rename or size-change event after resize would otherwise re-fire. Line 744 adds the path to resized. Line 746 caps the set at 100 entries and evicts the oldest.
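A bounded "already handled" set of this kind can be sketched with a plain `Set`, which iterates in insertion order, so the first value is always the oldest. The class name `RecentPaths` is hypothetical. The cap of 100 is the figure quoted above, and evicting exactly one entry per overflow is an assumption.

```typescript
class RecentPaths {
  private readonly seen = new Set<string>();
  private readonly limit: number;

  constructor(limit: number = 100) {
    this.limit = limit;
  }

  // True if this path was already resized and should be skipped.
  has(path: string): boolean {
    return this.seen.has(path);
  }

  // Records a path and evicts the oldest entry once the cap is exceeded.
  add(path: string): void {
    this.seen.add(path);
    if (this.seen.size > this.limit) {
      // Sets iterate in insertion order, so the first value is the oldest.
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
  }

  get size(): number {
    return this.seen.size;
  }
}
```

The cap keeps memory flat on a long-running bridge. The trade-off is that a very old file could be re-checked if it is touched again, which costs one extra dimension read.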
Agent attaches the image (or reads the file path) for the model turn
If the agent chooses to include the image inline, it is now at most 1920px on both axes and below the 2000px API ceiling. If it chose the accessibility-tree path via macos-use, no image is attached at all.
Model sees a turn that is cache-eligible and under every API limit
OpenAI's 24-hour prompt cache keeps stable prefix turns cache-hot. Claude's multi-image dimension check passes. MAX_IMAGE_TURNS = 20 (line 793) prevents the session from running past the per-session image budget.
agent asks a Responses API question about the UI -> AX path wins
Anchor fact 4: main.swift line 761 appends a literal grep hint
The bridge-side image defenses matter most when screenshots are unavoidable. For the majority of Mac UI targeting, screenshots are avoidable entirely. The bundled macos-use binary, mounted at acp-bridge/src/index.ts lines 1056 to 1064, returns a compact summary over stdio. Line 761 of /Users/matthewdi/mcp-server-macos-use/Sources/MCPServer/main.swift is an explicit invitation to the agent to grep instead of reading the tree into prompt context.
“lines.append("hint: grep -n 'AXButton' \(filepath) # search by role or text")”
mcp-server-macos-use/Sources/MCPServer/main.swift line 761
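What the hint asks the agent to do can be sketched as a line-numbered search over the dump file. The `grepLines` helper below mimics `grep -n` with a plain substring match, which gives the same result as grep for a literal like `AXButton`. The sample dump lines are hypothetical, since the real file format is not shown in this article.

```typescript
interface GrepHit {
  line: number; // 1-based, like grep -n
  text: string;
}

// Returns every line containing `pattern`, with its 1-based line number.
function grepLines(contents: string, pattern: string): GrepHit[] {
  const hits: GrepHit[] = [];
  contents.split("\n").forEach((text, index) => {
    if (text.includes(pattern)) hits.push({ line: index + 1, text });
  });
  return hits;
}

// Hypothetical dump contents, for illustration only.
const sampleDump = [
  '[AXWindow] "Untitled"',
  '  [AXButton] "Save" x:120 y:48 w:60 h:24',
  '  [AXTextField] "Search" x:200 y:48 w:180 h:24',
  '  [AXButton] "Cancel" x:190 y:48 w:60 h:24',
].join("\n");

const buttons = grepLines(sampleDump, "AXButton");
```

Only the matching lines enter the prompt, so the context cost scales with the number of hits rather than with the size of the tree.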
Every April 2026 OpenAI feature, re-framed from the Mac agent seat
Six cards, six April 2026 changelog entries. Each card names the exact bridge protection that makes the feature usable from a Mac agent that defaults to Retina screenshots.
GPT-5.1, default none reasoning
Faster responses when less thinking is required. Cache-hottest on stable prefix turns. Fazm pre-warms three sessions (main/floating/observer) at ChatProvider.swift line 1050 so prefix reuse is automatic.
GPT-5.1-Codex in the Responses API
Agentic-coding tuned. If the agent does a screenshot of an IDE for context, acp-bridge line 741 sips-resizes it before the Responses API sees it. Most IDE targeting goes through macos-use, not vision.
GPT-5.4 mini and nano (Chat + Responses)
High-volume. acp-bridge line 793 MAX_IMAGE_TURNS = 20 caps per-session image payload so high-volume runs never accumulate past the API's image-session budget.
24-hour prompt cache retention
Stable prefixes stay cache-hot for a day. Fazm's long-lived session map at line 763 reuses a sessionId across user turns instead of reopening, which keeps the cache key identical.
Sora API 1080p + 20s + character references
Higher-resolution outputs; Sora inputs can attach reference PNGs. startScreenshotResizeWatcher at line 715 fires on any .png or .jpeg in /tmp/playwright-mcp, not only screenshots, so a 2048px reference PNG is resampled before upload.
computer-use-preview (Responses API, tiers 3-5)
A screenshot-in-coordinates-out agent loop. On Fazm, the dominant UI-targeting path is macos-use plus grep, which returns element coordinates straight from the AX tree without a screenshot. The vision tool becomes an optional sanity check.
OpenAI API April 2026 consumer wiring: default agent vs. Fazm bridge
| Feature | Default Mac agent | Fazm bridge path |
|---|---|---|
| Handles Retina Playwright screenshots over 2000px before the model turn | No, 400 surfaces as a generic tool failure | Yes, sips --resampleHeightWidthMax 1920 at line 741 |
| Keeps base64 image blobs out of tool responses | No, inline blob per tool call | Yes, --image-responses omit at line 1033 |
| Caps per-session image payload for Responses API | No, session degrades after API-side image budget | Yes, MAX_IMAGE_TURNS = 20 at line 793 |
| UI targeting without a screenshot at all | No, every turn consumes a PNG | Yes, macos-use AX tree + grep hint at main.swift line 761 |
| 24-hour prompt cache locality across user turns | Partial, cache locality broken by per-turn screenshot delta | Full, pre-warmed session map at acp-bridge line 763 keeps sessionId stable |
| Zero-dependency fix for oversize PNGs | No, requires Pillow or sharp in the Node graph | Yes, sips is built into macOS, noted at line 733 inline comment |
| Works on macOS 14+ without code signing changes | Varies | Yes, bridge + macos-use ship inside the signed DMG |
| Still works if OpenAI promotes computer-use-preview to GA | Yes, but the 2000px ceiling and per-session caps still apply | Yes, all protections are model-agnostic and run upstream |
| File-level verifiability for every claim on this page | N/A | Yes, every anchor names an exact file:line in two MIT repos |
Independently grep-verifiable claims
- acp-bridge/src/index.ts line 713: const MAX_SCREENSHOT_DIM = 1920; // stay under 2000px API limit
- acp-bridge/src/index.ts line 710: 'Playwright on Retina Macs produces screenshots >2000px which hit Claude's multi-image dimension limit'
- acp-bridge/src/index.ts line 741: execSync(`sips --resampleHeightWidthMax ${MAX_SCREENSHOT_DIM} "${filepath}" 2>/dev/null`)
- acp-bridge/src/index.ts line 793: const MAX_IMAGE_TURNS = 20;
- acp-bridge/src/index.ts line 1033: Playwright args --output-mode file --image-responses omit --output-dir /tmp/playwright-mcp
- acp-bridge/src/index.ts lines 1056 to 1064: macos-use MCP server mount
- acp-bridge/src/index.ts line 2574: startScreenshotResizeWatcher() called on bridge startup
- mcp-server-macos-use/Sources/MCPServer/main.swift line 731: func buildCompactSummary(...)
- mcp-server-macos-use/Sources/MCPServer/main.swift line 761: lines.append("hint: grep -n 'AXButton' \(filepath) # search by role or text")
- wc -l acp-bridge/src/index.ts returns 2772; wc -l main.swift returns 1917
Wire your OpenAI-powered Mac agent in under 2000 pixels
Twenty minutes with the team walking through the bridge, the macos-use path, and how to drop in your own OpenAI-backed MCP server without losing any of the ceilings above.
Book a call →
Keep reading
Accessibility API AI agents vs. screenshots
Why reading the AX tree beats reading a PNG for UI targeting, and where screenshots still earn their prompt budget.
Notion releases April 2026: one Swift file, zero Notion strings
The same 1,917-line macos-use binary that sidesteps the screenshot ceiling also reaches every Notion April 2026 release without a binary change.
Open source LLM release April 2026 news
The bridge process that handles OpenAI's 2000px image ceiling also handles Llama 4, Qwen 3, and Gemma 4 through the patched ACP entry point.
Frequently asked questions
What did OpenAI ship in the April 2026 API changelog?
The April 2026 OpenAI API changelog lists: GPT-5.1 as the new flagship with a none reasoning setting by default for faster responses; GPT-5.1-Codex and GPT-5.1-Codex-mini in the Responses API, tuned for agentic coding; GPT-5.4 mini and GPT-5.4 nano in Chat Completions and Responses for high-volume workloads; organization-level RBAC across API and Dashboard; extended prompt cache retention up to 24 hours; Sora API expansion with reusable character references, up to 20 second generations, 1080p on sora-2-pro, video extensions, and Batch API support; the Assistants API sunset path, with all its features migrating to Responses before retirement; and computer-use-preview, a specialized model for the computer use tool, available as a research preview in the Responses API for developers on usage tiers 3 to 5.
What is the anchor fact in the Fazm codebase for the OpenAI April 2026 changelog?
Line 713 of /Users/matthewdi/fazm/acp-bridge/src/index.ts reads const MAX_SCREENSHOT_DIM = 1920; // stay under 2000px API limit, the only constant in the bridge file whose trailing comment pins it to a model-API dimension ceiling. Three lines above, at line 710, the prose comment reads verbatim Playwright on Retina Macs produces screenshots >2000px which hit Claude's multi-image dimension limit. The same 2000px ceiling applies to OpenAI's Responses API computer-use-preview tool when screenshots are attached. The fallback is a 44-line function startScreenshotResizeWatcher at lines 715 to 758 that fs.watches /tmp/playwright-mcp/, runs sips -g pixelWidth -g pixelHeight to check any new PNG or JPEG, and shell-execs sips --resampleHeightWidthMax 1920 on files that exceed 1920px in either dimension. The bridge binary runs this watcher unconditionally on startup at line 2574.
Why does the OpenAI Responses API care about a 2000 pixel dimension ceiling on images?
OpenAI's vision-capable models have platform-wide limits on the maximum dimension of uploaded images. Retina MacBooks render at 2x scale factor, so a full-screen Playwright screenshot on a 14-inch MacBook Pro reports pixelWidth around 3024 and pixelHeight around 1964. The height is just under the 2000px ceiling, but the width is well over it, and one oversize axis is enough to fail the check. The model API responds with a 400 Bad Request that surfaces as a generic agent error, not a clean image-too-large signal. Anthropic's Claude API enforces the same 2000px dimension limit on multi-image sessions, which is why Fazm's inline comment at acp-bridge/src/index.ts line 710 names Claude specifically; the file-watch runs for any downstream model the agent talks to. MAX_SCREENSHOT_DIM is set to 1920, a 4 percent margin of safety under 2000 that avoids rounding-error rejections.
How does Fazm avoid sending screenshots to OpenAI or Claude at all for UI targeting?
It bundles a second MCP server named macos-use, wired into the agent at acp-bridge/src/index.ts lines 1056 to 1064, whose source lives at /Users/matthewdi/mcp-server-macos-use/Sources/MCPServer/main.swift. The 1,917-line Swift binary reads the macOS accessibility tree (AXUIElement) of the target app and returns a compact summary over stdio. The function buildCompactSummary, declared at main.swift line 731, writes the full enriched tree to /tmp/macos-use/<timestamp>_<tool>.txt and returns only status, pid, app, filepath, file size plus element count, and on line 761 a verbatim literal line lines.append("hint: grep -n 'AXButton' \(filepath) # search by role or text"). The agent then calls a generic grep tool on the file to find the element it needs, by role or label. No screenshot is required for targeting. The screenshot only exists as a sanity check after the click.
What is the 20-image-per-session cap referenced on the OpenAI changelog page?
acp-bridge/src/index.ts line 793 declares const MAX_IMAGE_TURNS = 20;. The preceding doc block at lines 786 to 790 explains the constant: Tracks how many image-bearing turns each session key has had. Claude's API enforces a stricter 2000px/image limit once a session has many images. Resetting this counter on session delete ensures a fresh session starts clean. OpenAI's Responses API similarly imposes usage-based per-request and per-session image payload constraints. The bridge keeps a Map<string, number> called imageTurnCounts keyed by session key and deletes the entry on session lifecycle events (seen at lines 1455, 1795, 1887, 1943, 1971). Once the counter crosses MAX_IMAGE_TURNS, the bridge stops including screenshots in the prompt for that session to prevent API-side rejections.
Which Playwright MCP launch flags keep screenshot blobs out of the model context?
acp-bridge/src/index.ts line 1033 passes four flags verbatim to the Playwright MCP server at launch. Three of them do the work described here: --output-mode file, --image-responses omit, and --output-dir /tmp/playwright-mcp. output-mode=file writes Playwright snapshots and screenshots to disk rather than embedding them in the MCP response. image-responses=omit drops any base64 image payload from the tool response envelope. output-dir pins the on-disk location so the screenshot-resize watcher has one directory to watch. The inline comment at line 1032 reads Save snapshots to files and strip inline base64 screenshots to reduce context size. These flags mean a Playwright screenshot does not reach the OpenAI or Claude model prompt unless the agent explicitly chooses to Read the PNG file. A typical screenshot tool result on this setup is 691 characters of YAML, not a 500KB base64 blob.
Does Fazm use OpenAI computer-use-preview today?
Not by default. The bundled main, floating, and observer sessions are all warmed up with Claude Sonnet 4.6 via ACP, seen at Desktop/Sources/Providers/ChatProvider.swift lines 1048 to 1050. The architecture does not require computer-use-preview: the accessibility tree, the grep hint, and the six macos-use tools (open, click, type, press, scroll, refresh) cover what computer-use-preview covers with screenshots. If a user wires an OpenAI-backed agent into their own MCP server via ~/.fazm/mcp-servers.json (merged in at acp-bridge/src/index.ts lines 1102 to 1137), every protection described above still applies because the 2000px ceiling and the 20-image cap are enforced on the bridge side, upstream of whichever model the agent talks to.
What exact shell command does Fazm run to fix an oversize Retina screenshot in place?
acp-bridge/src/index.ts line 741 calls execSync(`sips --resampleHeightWidthMax ${MAX_SCREENSHOT_DIM} "${filepath}" 2>/dev/null`). sips is Apple's built-in Scriptable Image Processing System, shipped with every macOS since 10.3, which is why the comment at line 733 reads sips is built into macOS — no dependencies needed. The --resampleHeightWidthMax flag resamples the image so the larger of its width and height becomes 1920 while preserving aspect ratio. It writes in place, no output path argument. The logErr call at line 742 records Screenshot resized: <filename> from <W>x<H> to fit 1920px. After this runs, the PNG or JPEG under /tmp/playwright-mcp is safe to attach to any model API request.
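A hedged variation on that call: interpolating the path into a shell string relies on the filename containing no double quotes or other shell metacharacters. One alternative, sketched below, is to build argv arrays and pass them to Node's `execFileSync` from `node:child_process`, which does not go through a shell. The helper names are hypothetical; the sips flags are the ones quoted in this article.

```typescript
// Value quoted from the bridge source.
const MAX_SCREENSHOT_DIM = 1920;

// argv for reading both dimensions: sips -g pixelWidth -g pixelHeight <file>
function sipsQueryArgs(filepath: string): string[] {
  return ["-g", "pixelWidth", "-g", "pixelHeight", filepath];
}

// argv for the in-place resample: sips --resampleHeightWidthMax <max> <file>
function sipsResampleArgs(filepath: string, max: number = MAX_SCREENSHOT_DIM): string[] {
  return ["--resampleHeightWidthMax", String(max), filepath];
}

// Usage on macOS (not executed in this sketch):
//   import { execFileSync } from "node:child_process";
//   execFileSync("sips", sipsResampleArgs(filepath), { stdio: "ignore" });
const exampleArgs = sipsResampleArgs('/tmp/playwright-mcp/odd "name".png');
```

Because each argument is passed as its own array element, a path with spaces or quotes reaches sips unchanged and needs no escaping.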
How does OpenAI's Responses API 24-hour prompt cache interact with Fazm's session model?
OpenAI's extended prompt cache retention keeps cached prefixes active for up to 24 hours, so long-lived agent sessions reuse a stable prefix instead of re-billing it. Fazm's bridge registers three pre-warmed sessions on startup (main, floating, observer) and keeps them alive across user prompts rather than opening a new session per turn. This is visible at ChatProvider.swift line 1050 and at acp-bridge/src/index.ts line 763, which declares const sessions = new Map<string, { sessionId; cwd; model }>(). When screenshots are not attached (because macos-use handled the UI target via AX tree), the prompt prefix stays stable across turns and is cache-eligible for the full 24 hours. An agent that takes a screenshot per turn invalidates cache locality more often; Fazm's default path preserves it.
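The session-reuse idea can be sketched as a get-or-create registry over the same Map shape quoted above. The class name `SessionRegistry` and the injected factory are hypothetical; how the bridge actually opens a session is not shown in this article.

```typescript
interface SessionInfo {
  sessionId: string;
  cwd: string;
  model: string;
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionInfo>();
  private readonly create: (key: string) => SessionInfo;

  // `create` is a caller-supplied factory that opens a new session.
  constructor(create: (key: string) => SessionInfo) {
    this.create = create;
  }

  // Returns the existing session for this key, opening one only on first use.
  getOrCreate(key: string): SessionInfo {
    const existing = this.sessions.get(key);
    if (existing) return existing;
    const fresh = this.create(key);
    this.sessions.set(key, fresh);
    return fresh;
  }

  // Drops a session so the next call opens a fresh one.
  delete(key: string): void {
    this.sessions.delete(key);
  }
}
```

Reusing one sessionId per key is what keeps the leading part of each request identical from turn to turn, which is the condition a prefix cache needs.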
Why does the macos-use response always append a grep hint instead of returning the full tree?
A full AX tree of a dense macOS app (Notion 3.4 Part 2, Slack, Figma) is tens of kilobytes. Attaching it to a Responses API prompt burns that much context every turn and accumulates across a session. main.swift line 731 defines buildCompactSummary, which returns a small text envelope (status, pid, app, filepath, file size plus element count, the grep hint, and optional screenshot path). The grep hint at line 761 is a literal invitation to the agent: hint: grep -n 'AXButton' /tmp/macos-use/<timestamp>_<tool>.txt # search by role or text. The agent uses a separate grep tool on the on-disk file to find the one element it needs, by role or label, instead of reading the tree. The on-the-wire response is typically under 500 bytes regardless of how deep the target app's UI is.
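To make the size of that envelope concrete, here is a TypeScript sketch of a summary builder. The real function is Swift and its body is not shown in this article, so the field labels, the `SummaryInput` shape, and the line layout are all assumptions. Only the hint line follows the literal quoted above.

```typescript
// Hypothetical input shape covering the fields listed in this article.
interface SummaryInput {
  status: string;
  pid: number;
  app: string;
  filepath: string;
  fileSizeBytes: number;
  elementCount: number;
  screenshotPath?: string;
}

// Builds a short text envelope that points at the on-disk tree
// instead of embedding it.
function buildCompactSummary(input: SummaryInput): string {
  const lines = [
    `status: ${input.status}`,
    `pid: ${input.pid}`,
    `app: ${input.app}`,
    `file: ${input.filepath}`,
    `file_size: ${input.fileSizeBytes} bytes (${input.elementCount} elements)`,
    `hint: grep -n 'AXButton' ${input.filepath} # search by role or text`,
  ];
  if (input.screenshotPath) lines.push(`screenshot: ${input.screenshotPath}`);
  return lines.join("\n");
}

// Illustrative values only.
const summary = buildCompactSummary({
  status: "success",
  pid: 4242,
  app: "Notes",
  filepath: "/tmp/macos-use/1700000000_click.txt",
  fileSizeBytes: 48211,
  elementCount: 612,
});
```

The envelope's length depends on the path and app name, not on the element count, which is why the response stays small however deep the UI tree is.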
Where do I verify every number on this page?
Every anchor is in a public or user-local file with a specific line number. acp-bridge/src/index.ts line 713 (MAX_SCREENSHOT_DIM = 1920), line 710 (Playwright Retina comment), lines 715 to 758 (startScreenshotResizeWatcher), line 741 (sips --resampleHeightWidthMax), line 793 (MAX_IMAGE_TURNS = 20), line 1033 (Playwright MCP flags), lines 1056 to 1064 (macos-use mount), line 2574 (watcher start). mcp-server-macos-use/Sources/MCPServer/main.swift line 731 (buildCompactSummary), line 761 (grep hint append), line 1412 (Server name SwiftMacOSServerDirect). wc -l on the bridge returns 2772 and on the macos-use main.swift returns 1917. The macos-use repo is MIT-licensed at github.com/mediar-ai/mcp-server-macos-use; the Fazm repo is MIT-licensed at github.com/mediar-ai/fazm.
Does Fazm need any changes when OpenAI lifts computer-use-preview out of research preview?
No. The bridge does not depend on computer-use-preview being available. Its two protections against the 2000px image ceiling and the 20-image-per-session cap run unconditionally (startScreenshotResizeWatcher at line 2574, imageTurnCounts at line 791). Its alternative path, macos-use over stdio, is present whether or not a user wires an OpenAI-backed MCP server. When OpenAI promotes computer-use-preview to general availability, the bridge keeps stripping base64 images (line 1033), keeps resizing oversize PNGs (line 741), keeps capping image turns (line 793), and keeps offering the accessibility-tree-plus-grep-hint path as a screenshot-free alternative. No version bump on the bridge or the macos-use binary is required. The version string at main.swift line 1412 is SwiftMacOSServerDirect 1.6.0 and is expected to stay through the computer-use-preview promotion.