FIELD NOTES / MULTI-AGENT AX / 2026-05-13

The macOS accessibility API is single-tenant, and your second agent is the one who finds out

Run two AI agents on one Mac. Both call AXUIElementCopyAttributeValue against kAXFocusedWindowAttribute. Both post CGEvent mouse clicks. Within minutes the foreground is rotating between their two targets, the user's editor has lost focus four times, and one agent's click is landing in the other agent's window. The accessibility API is single-tenant per session. The fix is not a smarter agent. The fix is 30 lines of bash and a save-frontmost-before / restore-frontmost-after pair, wired as hooks. Notes from the desktop AI agent that lived through this.

Matthew Diakonov
9 min read

Direct answer (verified 2026-05-13)

macOS exposes one focused window and one frontmost app per session, so two agents driving the AX API at the same time will steal focus from each other and from you. There is no per-target accessibility scope as of macOS 26 (Tahoe), and async actor isolation in Swift only fixes the in-process race. The working cross-process fix is a per-tool serial mutex (mkdir-based atomic lock plus a JSON file holding session_id and timestamp, 30s TTL, 2s poll, 120s max wait) wrapped in PreToolUse and PostToolUse hooks, plus a focus-save-before / focus-restore-after pair. Apple's AX reference is AXUIElement.h.

Why the AX surface is single-tenant in the first place

The accessibility tree itself is a perfectly fine multi-reader surface. AXUIElementCreateApplication followed by a BFS walk over kAXChildrenAttribute is read-only and process-safe. Two agents can walk two trees at the same time and neither blocks the other. Tree reads are not where contention lives.
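You can convince yourself of this from two terminals at once. System Events drives the same AX surface, and two concurrent reads neither block each other nor move focus. A minimal check, not agent code (the terminal needs the Accessibility permission):

read-only AX walk, safe to run from two shells simultaneously
# Both invocations return, and the frontmost app never changes.
osascript -e 'tell application "System Events" to get name of every window of process "Finder"'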

Contention lives in the act of acting. The set of resources that AX automation actually drives is global by OS design:

Global resources every AX agent assumes it owns

  • NSWorkspace.frontmostApplication: returns one app, not a list. Whoever calls activate last wins.
  • kAXFocusedWindowAttribute: per-app, but the focused app rotates the moment any agent calls activate, so reads come back stale.
  • The mouse cursor: one cursor, one position, one click target.
  • The CGEvent post queue: synthesized events go into a single shared queue and dispatch into whatever window is frontmost at the moment of post, not the moment of capture.
  • Modal sheets and accessibility prompts: the OS routes a single keystroke to whichever sheet has focus; cross-process keypresses dismiss the wrong sheet.

None of these resources has a notion of multi-tenancy. macOS does not say "this CGEvent click belongs to agent A and that one to agent B." It says: here is the cursor, here is the foreground, here is the event queue, good luck.
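The last-writer-wins behaviour takes four lines of shell to reproduce. A minimal demonstration, not agent code:

one frontmost slot, last activate wins
osascript -e 'tell application "Safari" to activate'
osascript -e 'tell application "Notes" to activate'
sleep 0.3   # let the second activation land
osascript -e 'tell application "System Events" to get name of first process whose frontmost is true'
# Prints "Notes". Safari's activate left no trace anywhere.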

What actually breaks when two agents share one Mac

The failure modes are not theoretical. They are what happened the first time we tried to run a Reddit agent and a Twitter agent concurrently from the same Claude Code workspace, both driving real Chrome via the same MCP layer.

One Mac, two agents, before and after coordination

Agent A activates Reddit, starts walking the AX tree. 50ms later agent B activates Twitter. Agent A's first click, queued for the Reddit comment field, lands in Twitter's compose box because Twitter is now frontmost. The user, who was typing into VS Code, finds their cursor in Twitter as well. Both agents log success because the click 'happened.' Nothing logs the fact that the click landed in the wrong window. Three hours later the user notices the day's commit has half a tweet pasted into the middle of a function.

  • Foreground rotates mid-action, agents step on each other
  • User loses focus from their working app silently
  • Agents log success, the bug never surfaces
  • The wrong app receives keystrokes

The naive setup, and the 30-line mutex that fixes it

Most people who hit this for the first time try to fix it in agent code: they add an internal queue, they add a Swift actor, they add retry logic. None of that helps when the second offender is a completely different process. The mutex has to live OUTSIDE the agents, in a place every agent has to pass through to act. Hooks are that place.

before: each agent independently activates an app
#!/bin/bash
# What "running multiple agents" looks like before any contention thinking.
# Each agent independently decides it owns the foreground.

# Agent A
osascript -e 'tell application "Google Chrome" to activate'
# 50ms later, agent B fires its own activate against Slack.
osascript -e 'tell application "Slack" to activate'
# Agent A's CGEventCreateMouseEvent click now lands in Slack.
# Whatever the user was doing also got shoved into the background.
# Nobody logs anything. Nobody notices for 4 hours.
after: PreToolUse hook acquires a serial lock per tool family
#!/bin/bash
# ~/.claude/hooks/playwright-lock.sh (excerpt)
# PreToolUse hook. Runs before any mcp__playwright__* tool call.

LOCK_FILE="$HOME/.claude/playwright-lock.json"
MUTEX_DIR="$HOME/.claude/playwright-mutex.d"
LOCK_EXPIRY=30      # seconds before a held lock is stale
MUTEX_EXPIRY=5      # seconds before the fs mutex is stale
MAX_WAIT=120        # max seconds to wait for the lock
POLL_INTERVAL=2     # seconds between retries

# Save the user's frontmost app BEFORE we steal focus.
source "$HOME/.claude/hooks/focus-save.sh"

# Atomic acquire via mkdir (only one caller wins per directory).
acquire_mutex() {
  if mkdir "$MUTEX_DIR" 2>/dev/null; then return 0; fi
  # Stale mutex? Check directory mtime against MUTEX_EXPIRY.
  local age=$(( $(date +%s) - $(stat -f %m "$MUTEX_DIR") ))
  if [ "$age" -gt "$MUTEX_EXPIRY" ]; then
    rm -rf "$MUTEX_DIR" && mkdir "$MUTEX_DIR" && return 0
  fi
  return 1
}

# Lock holder is identified by SESSION_ID.
# Same session: refresh. Different session: poll.
# Held by a session that died without releasing: TTL expires after LOCK_EXPIRY.

# PostToolUse hook (playwright-unlock.sh) refreshes the timestamp,
# session-end hook releases the lock.

Three details matter and each one came from getting it wrong first. The atomic acquisition has to be mkdir, not a lockfile-then-write pair, because mkdir either creates the directory or fails in one atomic step; a check-then-create sequence leaves a window for a second caller to slip through. The lock JSON has to carry a timestamp so a crashed holder eventually expires (otherwise one OOM-killed agent deadlocks every other agent on the machine forever). And the mutex itself needs its own short TTL (5 seconds in our case) so a process crashed between mkdir and JSON write does not orphan the directory.
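For completeness, here is roughly what the poll loop the excerpt elides looks like. A hedged sketch rather than the shipped script: the JSON field names (session_id, timestamp) and the SESSION_ID variable follow the lock shape described above.

acquire_lock, sketched against the variables and acquire_mutex from the excerpt
acquire_lock() {
  local waited=0 holder ts now
  while [ "$waited" -lt "$MAX_WAIT" ]; do
    if acquire_mutex; then
      holder=""; ts=0
      if [ -f "$LOCK_FILE" ]; then
        holder=$(sed -n 's/.*"session_id" *: *"\([^"]*\)".*/\1/p' "$LOCK_FILE")
        ts=$(sed -n 's/.*"timestamp" *: *\([0-9]*\).*/\1/p' "$LOCK_FILE")
      fi
      now=$(date +%s)
      # Free, already ours, or the holder's TTL expired: take the lock.
      if [ -z "$holder" ] || [ "$holder" = "$SESSION_ID" ] \
         || [ $(( now - ${ts:-0} )) -gt "$LOCK_EXPIRY" ]; then
        printf '{"session_id":"%s","timestamp":%s}\n' \
          "$SESSION_ID" "$now" > "$LOCK_FILE"
        rmdir "$MUTEX_DIR"
        return 0
      fi
      rmdir "$MUTEX_DIR"   # held by a live session; back off and poll
    fi
    sleep "$POLL_INTERVAL"
    waited=$(( waited + POLL_INTERVAL ))
  done
  return 1   # timed out after MAX_WAIT; surface the error to the agent
}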

Four locks, one per agent

The lock is independent per agent: the Reddit and Twitter agents can run simultaneously under different sessions, but two sessions cannot drive the same agent concurrently.

~/.claude/skills/browser-lock/SKILL.md

The four hook positions, in order

Once you accept that this lives in hooks, the shape of the solution is just four positions in the agent lifecycle. Each one does one thing.

Hook positions for safe multi-agent AX

  1. PreToolUse: save focus

     Capture the user's current frontmost app (osascript via System Events) before the agent steals it. Skip if the frontmost is already the agent's browser.

  2. PreToolUse: acquire lock

     mkdir the mutex, then read the JSON lock file. Same session: refresh. Other session alive: poll every 2s up to 120s. Other session expired (>30s): take over.

  3. PostToolUse: refresh lock

     Bump the JSON timestamp so a multi-call chain (click, wait, screenshot) holds the lock without re-acquiring between calls. Refresh and release are sketched after this list.

  4. Session-end: release + restore

     Delete the lock JSON. Read the saved-frontmost tombstone and re-activate the user's original app on a 200ms delay so the agent's last action settles.
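Position 3 in script form, sketched under the same assumptions as the lock excerpt above (single-line JSON, SESSION_ID identifying the session):

~/.claude/hooks/playwright-unlock.sh, sketched
#!/bin/bash
# PostToolUse hook. Bump the timestamp so a click/wait/screenshot
# chain keeps the lock without re-acquiring between calls.
LOCK_FILE="$HOME/.claude/playwright-lock.json"

holder=$(sed -n 's/.*"session_id" *: *"\([^"]*\)".*/\1/p' "$LOCK_FILE" 2>/dev/null)
if [ -n "$holder" ] && [ "$holder" = "${SESSION_ID:-}" ]; then
  printf '{"session_id":"%s","timestamp":%s}\n' \
    "$SESSION_ID" "$(date +%s)" > "$LOCK_FILE"
fi
exit 0

Position 4 is the same check in reverse: if we are still the holder, rm -f the lock JSON, then source focus-restore.sh to hand the foreground back.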

The focus-save / focus-restore pair, in 20 lines

Even with the mutex in place, a single-agent run still leaves the user's frontmost app behind. The user clicked into VS Code, said "run that thing" out loud, the agent grabbed Chrome, ran its chain, and then left the user staring at Chrome instead of their editor. The fix is not in the agent; it is in two short hooks that run before and after every tool call.

~/.claude/hooks/focus-save.sh (PreToolUse)
#!/bin/bash
# ~/.claude/hooks/focus-save.sh
# Save the user's frontmost app so we can restore it after the
# agent's browser steals focus.

FOCUS_FILE="$HOME/.claude/saved-focus-app.txt"
FRONT_APP=$(osascript -e \
  'tell application "System Events" to get name of first process whose frontmost is true' \
  2>/dev/null)

# Important: do NOT save the agent's own browser as "the user's app"
# or we will restore the agent on top of itself.
if [ -n "$FRONT_APP" ] && [ "$FRONT_APP" != "Google Chrome" ] && [ "$FRONT_APP" != "Chromium" ]; then
  echo "$FRONT_APP" > "$FOCUS_FILE"
fi
~/.claude/hooks/focus-restore.sh (PostToolUse)
#!/bin/bash
# ~/.claude/hooks/focus-restore.sh
# Restore frontmost to whatever the user had before the tool call.

FOCUS_FILE="$HOME/.claude/saved-focus-app.txt"
if [ -f "$FOCUS_FILE" ]; then
  FRONT_APP=$(cat "$FOCUS_FILE")
  if [ -n "$FRONT_APP" ]; then
    # 200ms delay lets the agent's last action settle before we
    # rotate the foreground out from under it.
    (sleep 0.2; osascript -e "tell application \"$FRONT_APP\" to activate" 2>/dev/null) &
  fi
fi

Two non-obvious things. The save script refuses to record the agent's own browser as "the user's app," because otherwise a chain of browser actions would each re-save Chrome and the restore at the end would just leave Chrome up. The restore uses a 200ms delay because the agent's last AX call may not have fully resolved yet; activating a different app inside that window can race the action and lose state.
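You can exercise the pair by hand from a terminal before wiring it into hooks. TextEdit stands in for the user's app; the paths assume the scripts above are installed as shown:

one tool-call lifecycle, simulated manually
osascript -e 'tell application "TextEdit" to activate'       # the "user" is in TextEdit
~/.claude/hooks/focus-save.sh                                # tombstone records TextEdit
osascript -e 'tell application "Google Chrome" to activate'  # the agent steals focus
sleep 1                                                      # pretend a tool call ran
~/.claude/hooks/focus-restore.sh                             # TextEdit is back ~200ms later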

The case the mutex cannot fix: cron and launchd agents

A serial mutex is the right answer when both agents are users of the same machine and a queue is acceptable. It is the wrong answer when one of the agents is a cron job firing while the user is on a video call. The mutex would let the cron agent acquire, post a click into the user's call, then politely release. Correct behaviour at the AX layer; catastrophic behaviour at the human layer.

The honest fix for this case is a different hook that refuses the tool family entirely when the session has no interactive markers.

~/.claude/hooks/block-macos-use.sh (PreToolUse)
#!/bin/bash
# ~/.claude/hooks/block-macos-use.sh
# PreToolUse hook. Blocks mcp__macos-use__* tools in non-interactive
# (cron / launchd) sessions only. Interactive sessions still work.

cat > /dev/null  # consume hook stdin

# Interactive markers: any of these means a user is at the keyboard.
if [ -n "${ITERM_SESSION_ID:-}" ] || [ -n "${TERM_SESSION_ID:-}" ] \
   || [ -n "${TMUX:-}" ] || [ -n "${CLAUDE_CODE_ENTRYPOINT:-}" ]; then
  exit 0
fi

# No interactive markers. Refuse the tool call entirely so a cron
# agent does not steal focus from whoever is using the Mac right now.
echo '{"decision":"block","reason":"macos-use is not allowed in automated pipelines. The agent loop should switch to a sandboxed surface or wait for an interactive session."}'
exit 0

The decision is binary by environment. Interactive terminals (iTerm, Apple Terminal, tmux, the Claude Code CLI itself) all set one of ITERM_SESSION_ID, TERM_SESSION_ID, TMUX, or CLAUDE_CODE_ENTRYPOINT. If none are present, the agent is running in a context with no human at the keyboard, and the only safe answer for an AX-driving tool is to refuse and surface the refusal as a structured {decision: block} to the agent so it can route around to a sandboxed surface or wait.
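A quick sanity check that the guard really splits on environment, assuming the script is installed at the path above and executable. env -i strips the interactive markers the way a cron or launchd context would:

verifying the block from both contexts
# From an interactive terminal: markers present, the hook stays silent.
echo '{}' | "$HOME/.claude/hooks/block-macos-use.sh"
# (no output, exit 0: the tool call proceeds)

# With the environment stripped the way cron/launchd spawns processes:
echo '{}' | env -i HOME="$HOME" PATH=/bin:/usr/bin \
  "$HOME/.claude/hooks/block-macos-use.sh"
# {"decision":"block","reason":"macos-use is not allowed in automated ..."}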

The same lesson, one layer up: single-instance app enforcement

The hook-level mutex coordinates between agents that already accept they have to share a machine. The same pattern shows up one layer up, inside a desktop AI agent process itself: macOS will happily let a user double-click two copies of the same .app bundle (the one in /Applications and a copy in ~/Downloads), and LSMultipleInstancesProhibited in Info.plist is unreliable across paths. Two prod copies of the same agent then both try to drive the AX API. Inside Fazm, the fix is the same shape, written in Swift instead of bash, and run as the first thing the app does.

InstanceLock.swift, called from applicationWillFinishLaunching
// /Users/matthewdi/fazm/Desktop/Sources/InstanceLock.swift
// Called from applicationWillFinishLaunching, before ANY heavy init.
//
// Why: two prod Fazm instances would stomp on the same SQLite db,
// the same ACP bridge, the same Stripe device id, the same global
// hotkey, AND they would both try to drive the AX API.

@discardableResult
static func acquireOrHandoff() -> Bool {
    guard Bundle.main.bundleIdentifier == "com.fazm.app" else {
        return true   // dev build, multiple instances are fine
    }

    if let data = try? Data(contentsOf: lockFileURL!),
       let raw  = String(data: data, encoding: .utf8),
       let peerPid = Int32(raw.trimmingCharacters(in: .whitespacesAndNewlines)),
       peerPid != getpid(),
       kill(peerPid, 0) == 0 || errno == EPERM {

        // Confirm the peer is actually Fazm (not a recycled PID).
        if let peer = NSRunningApplication(processIdentifier: peerPid),
           peer.bundleIdentifier == "com.fazm.app" {
            peer.activate(options: [.activateAllWindows])
            Thread.sleep(forTimeInterval: 0.2)  // let activation deliver
            exit(0)                              // never returns
        }
    }

    // No live peer. Write our PID; SIGTERM/SIGINT/SIGHUP unlink it.
    try? "\(getpid())\n".data(using: .utf8)?.write(to: lockFileURL!, options: .atomic)
    installSignalCleanup()
    return true
}

Three details that took two debugging passes to get right. We use kill(peerPid, 0) for liveness, not flock or any unlink-on-exit hook, because flock does not survive SIGKILL and an unlink hook does not run after a kernel panic; the next-launch staleness check is the only path that works for hard kills. We confirm the peer is actually Fazm via NSRunningApplication.bundleIdentifier before activating, because PIDs get recycled and a recycled PID could belong to anything. And the 200ms Thread.sleep before exit gives AppKit a tick to actually deliver the activation before we tear ourselves down; without it, the user double-clicks the icon, sees the second copy launch, sees nothing happen, and double-clicks again.

Why Swift actors do not save you here

A common reflex on hearing "two agents racing on AX" is to reach for a serial Swift actor or a single GCD queue. It works inside one process. It does nothing across processes. The OS-level focus state is shared by every process on the machine; an actor guarantees serialised access to the AX call inside agent A's codebase, and agent B (a completely different binary, possibly a completely different language runtime) walks in and posts a CGEvent against the same shared event queue without asking.

The right mental model: in-process serialisation handles the case where one agent might race itself (concurrent tool calls inside one session). Cross-process serialisation handles the case where two agents race each other. They are independent problems. You need both. Inside Fazm we serialise per process at the ACPBridge layer. For multi-agent setups outside the app we wire the hook-level mutex described above.

Running more than one agent on your Mac and watching them step on each other?

Happy to compare notes on what we shipped to make this go away. 20 minutes, no pitch deck.

Frequently asked questions

Why does the macOS accessibility API behave like a single-tenant resource?

Because the things AX automation actually drives are global. NSWorkspace.frontmostApplication returns one app, not a list. kAXFocusedWindowAttribute returns one window per app. The mouse cursor is one cursor. Synthesized CGEvent posts go into one shared event queue. The accessibility tree itself is read-only and process-safe, so two agents can WALK trees in parallel just fine. The contention shows up the moment either agent decides to act: posting a CGEventCreateMouseEvent click, calling AXUIElementPerformAction(kAXPressAction), bringing a window forward via NSRunningApplication.activate. Each of those quietly assumes the agent owns the foreground.

What actually breaks when two agents drive AX at the same time?

Five things, in roughly this order. (1) The frontmost app flips between the two targets mid-action, so each agent's CGEvent click lands in the wrong window. (2) AXUIElementCopyAttributeValue against kAXFocusedWindowAttribute returns the OTHER agent's just-activated window instead of the one the caller expected. (3) The user's own work app (their editor, their video call) loses focus every few seconds and the user types into the wrong place. (4) Modal sheets and accessibility prompts triggered by one agent get dismissed by the other agent's keypress. (5) kAXFocusedUIElement is a per-app attribute that rotates without warning when an agent calls activate, so cursor-position and selected-text reads return stale data.

Doesn't async actor isolation in Swift solve this?

It solves the in-process race, which is the easy half. A serial Swift actor or a single GCD queue serialises AX calls inside ONE agent process. It does nothing about the cross-process problem, where agent process A and agent process B (or two MCP servers, or an MCP server and the user's Cmd-Tab) all call into the same OS-level focus and event queue. The OS does not know about your actors. Cross-process contention has to be solved with a cross-process primitive: a file lock, a mach port, or a unix socket coordinator. Inside Fazm we serialise per process; for multi-agent setups outside the app we recommend a hook-level mutex.

How small can the cross-process mutex actually be?

30 lines of bash. The pattern that works for us in production is mkdir-based atomic acquisition (mkdir succeeds for exactly one caller per directory), with a stale-mutex check via the directory's mtime, plus a JSON file storing session_id and timestamp. Lock TTL of 30 seconds (so a crashed agent doesn't deadlock the others), poll interval of 2 seconds, max wait of 120 seconds before giving up and surfacing an error. PreToolUse hook acquires; PostToolUse hook refreshes; session-end hook releases. The whole shape lives in ~/.claude/hooks/playwright-lock.sh.

Why does Fazm refuse to run multiple production instances on one Mac?

Because the lessons above generalise to the app process itself. Two prod copies of Fazm would stomp on the same SQLite db, the same ACP bridge port, the same Stripe device id, and the same global hotkey, AND they would each try to drive AX. The fix is InstanceLock.swift, called from applicationWillFinishLaunching: PID file at ~/Library/Application Support/com.fazm.app/.instance.pid, kill -0 for liveness, NSRunningApplication.activate(options: [.activateAllWindows]) to hand off, then exit(0). Single-instance is enforced before any AX call ever fires.

What is the difference between blocking macos-use in cron vs letting it run?

An interactive shell session is the user opting in: they are at the keyboard, they will see the cursor move, they can take focus back if anything goes wrong. A cron or launchd-spawned agent has none of that. If a cron agent posts a CGEvent click while the user is on a video call, the click lands inside the call. The honest fix is to refuse the action class entirely in non-interactive sessions. block-macos-use.sh checks for ITERM_SESSION_ID, TERM_SESSION_ID, TMUX, and CLAUDE_CODE_ENTRYPOINT; if none are set, the hook returns a {decision: block} JSON to the agent and the tool call never runs.

Is screen-recording-based automation immune to this problem?

No, and it is arguably worse. A screenshot-based agent still has to actually click somewhere; the click goes through the same CGEvent pipeline and hits whatever window is frontmost at the moment of the post, not the moment of the screenshot. If two screenshot agents run on one Mac, the second agent's click lands in the first agent's window. Screenshot agents also pay a 200 to 400 ms window between capture and click during which the foreground can change; AX-based agents get the AX state and the click in the same blocking call. The contention class is the same. The window of vulnerability is wider.

Can the OS itself give us a per-target accessibility scope?

Not as of macOS 26 (Tahoe). The Accessibility TCC pane is still binary per app: an app either has accessibility, in which case it can read every UI element and post events to every other app, or it does not. There is no per-target scoping. Apple Events have per-target scoping (the OS prompts the first time an app sends events to Mail, Finder, etc.), but Apple Events are a different API and most computer-use agents do not use them. The only mitigation available to you today is to use a tool you can audit, plus run one agent at a time on the surface that drives AX.
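The per-target contrast is easy to see from a shell. Each first send to a new target app raises its own consent prompt (System Settings > Privacy & Security > Automation, one toggle per target), which is exactly the scoping the Accessibility pane lacks. A two-line demonstration:

Apple Events prompt per target; Accessibility is all-or-nothing
osascript -e 'tell application "Finder" to get name of home'  # prompts for Finder
osascript -e 'tell application "Mail" to get name'            # separate prompt for Mail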

How does Fazm handle this internally when running concurrent Claude Code sessions?

Fazm runs N Claude Code agent sessions inside one ACPBridge process. Each session has its own per-session interrupt flag, its own pending-message queue, and its own message-generation counter, all keyed by sessionKey (see Multi-agent Claude Code orchestration tradeoffs). For AX specifically, we serialise tool execution at the bridge layer: only one session can be actively executing a Playwright or macos-use tool call at a time, and the others queue. The user-visible effect is that two agents can THINK in parallel but only one can ACT at a time. Latency goes up; correctness stays intact.

What goes in the focus-save / focus-restore pair, exactly?

Two short bash scripts. focus-save.sh runs `osascript -e 'tell application "System Events" to get name of first process whose frontmost is true'`, writes the name to a tombstone file, and skips writing if the frontmost is already the agent's browser (so we do not overwrite the genuine target with our own window). focus-restore.sh reads the tombstone, schedules a `tell application <name> to activate` via osascript on a 200ms delay so the agent's last action settles first, and logs the actual frontmost app afterwards for debugging. Both are 10 lines. The discipline is wiring them as PreToolUse and PostToolUse hooks for every tool that touches the foreground.
