macos-use: The MCP Server That Locks Your Keyboard While the AI Drives
Every other writeup of macos-use covers the same ground: accessibility tree vs screenshots, six tools, install via npm. The part nobody mentions is what happens to your hands the moment a tool fires. A CGEventTap intercepts every keystroke and click. A fullscreen overlay puts a pulsing orange dot on screen. A 30-second watchdog guarantees you cannot get locked out. Pressing Esc throws InputGuardCancelled between every individual sub-action. This is a source-level walkthrough of that machinery, file paths and line numbers included.
What Other macos-use Writeups Cover, and What They Skip
Search for "macos use" and you will find the README, the mcp.so listing, two competing MCP servers (mcp-remote-macos-use, CursorTouch/MacOS-MCP), and a couple of third-party blog posts. They all describe the same three things: macos-use uses accessibility APIs instead of screenshots, it exposes six MCP tools, and you install it with npm.
None of them describe what your machine actually does to you while the agent is mid-action. That is the question that matters the second time you let an AI drive a Mac. What happens if the model picks the wrong button? What happens if it gets stuck in a loop? What happens to my pointer? Can I just hit Esc?
Every one of those answers lives in two files in this repo: Sources/MCPServer/InputGuard.swift (355 lines) and the engage/disengage call sites around Sources/MCPServer/main.swift:1667-1862. This guide reads them both with you.
The Source-State ID Trick That Makes the Lockout Possible
Blocking the user's input is easy. CGEvent.tapCreate with a suppressing return value will do it. The hard part is that the same server is also posting synthetic clicks and keystrokes to drive whatever app the AI is targeting. If the tap suppresses those too, the agent does nothing.
macos-use distinguishes its own input from yours by reading CGEventField.eventSourceStateID. Hardware HID events come in with state ID zero. Anything posted with CGEvent.post(tap: .cghidEventTap) from inside the server carries a non-zero state ID, because it was created on the .hidSystemState source. The callback short-circuits for non-zero IDs with a passthrough Unmanaged, and only checks for the Esc keycode against zero-state events. That is the entire mechanism in five lines.
Verbatim from the file:
The mask the tap is built with covers eleven event types: keyDown, keyUp, left and right mouse button down and up, mouseMoved, both mouse-dragged variants, scrollWheel, and flagsChanged. The mask is built incrementally because Swift's type-checker times out on the all-in-one expression, which the comment in the file calls out.
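As an illustration of the incremental mask described above, here is a minimal sketch. It is not the verbatim code from InputGuard.swift; the names `guardedEventTypes` and `buildGuardMask` are hypothetical, and only the eleven event types come from the description.

```swift
import CoreGraphics

// Illustrative only: these names are hypothetical, not taken from InputGuard.swift.
let guardedEventTypes: [CGEventType] = [
    .keyDown, .keyUp, .flagsChanged,
    .leftMouseDown, .leftMouseUp,
    .rightMouseDown, .rightMouseUp,
    .mouseMoved, .leftMouseDragged, .rightMouseDragged,
    .scrollWheel,
]

func buildGuardMask() -> CGEventMask {
    var mask: CGEventMask = 0
    for type in guardedEventTypes {
        // One small OR per event type instead of a single giant expression,
        // which keeps each statement cheap for the type-checker.
        mask |= CGEventMask(1) << CGEventMask(type.rawValue)
    }
    return mask
}
```

Each event type contributes exactly one bit, so the resulting mask has eleven bits set.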
Why the Tap Goes on .cghidEventTap and Not the Session Tap
macos-use installs the tap on .cghidEventTap, which is the lowest level the public CGEventTap API exposes. Events arrive there before the window server routes them to any app, so returning nil at that level is a hard suppression, not an "ask the next tap nicely" suggestion. The tap's run-loop source is added to the main run loop, so the callback runs there, which is why the file specifically routes engage() through DispatchQueue.main.sync when called from a background thread.
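A common way to hop onto the main thread synchronously without deadlocking is sketched below. The helper name `onMainThread` is hypothetical, and the article does not confirm that the real engage() is written this way.

```swift
import Foundation

// Hypothetical helper: run `work` on the main thread and wait for its result.
// Calling DispatchQueue.main.sync while already on the main thread deadlocks,
// so that case is handled by calling `work` directly.
func onMainThread<T>(_ work: () -> T) -> T {
    if Thread.isMainThread {
        return work()
    }
    return DispatchQueue.main.sync(execute: work)
}
```

The background-thread branch only returns if the main queue is being serviced, which is the case in a process that runs a main run loop.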
Two Streams of Input, One Tap, One Decision
The diagram below is what the inputGuardCallback function does on every event while the guard is engaged. Hardware events are blocked (or, if the keystroke is plain Esc, used to cancel and then blocked). SDK events go straight through.
[Diagram: inputGuardCallback dispatch]
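The three outcomes of that dispatch can be modelled as a pure function. This is a sketch of the decision as the article describes it, not the callback itself: `GuardDecision`, `ObservedEvent`, and `decide` are hypothetical names, and the fields stand in for values the real callback would read from the CGEvent.

```swift
// Hypothetical model of the callback's decision; not code from InputGuard.swift.
enum GuardDecision: Equatable {
    case passThrough     // synthetic event posted by the server itself
    case block           // hardware event, suppressed
    case cancelAndBlock  // plain Esc from the keyboard: cancel, then suppress
}

struct ObservedEvent {
    var sourceStateID: Int64  // stands in for CGEventField.eventSourceStateID
    var isKeyDown: Bool
    var keyCode: Int64        // stands in for CGEventField.keyboardEventKeycode
    var hasModifiers: Bool    // true if any modifier key is held
}

let escKeyCode: Int64 = 53

func decide(_ event: ObservedEvent) -> GuardDecision {
    // Rule stated in the article: a non-zero state ID means the server posted it.
    if event.sourceStateID != 0 { return .passThrough }
    if event.isKeyDown && event.keyCode == escKeyCode && !event.hasModifiers {
        return .cancelAndBlock
    }
    return .block
}
```

Checking the state ID first means a synthetic Esc posted by the server never reaches the keycode comparison.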
The Lifecycle of a Single Tool Call
When the AI calls macos-use_click_and_traverse, the server runs through the sequence below. Every step is in main.swift between lines 1667 and 1862. The cursor and the frontmost app you were using are restored at the end, even on cancellation.
isDisruptive check (main.swift:1667)
Every tool except macos-use_refresh_traversal is treated as disruptive. The disruptive branch is the one that engages the input guard. The refresh tool just reads the AX tree, so it does not need the lockout.
Save cursor + frontmost app (main.swift:1671-1676)
NSEvent.mouseLocation is captured in screen-y-flipped coordinates. NSWorkspace.shared.frontmostApplication is captured by reference. Both will be restored after the tool returns or after Esc cancellation.
engage(message:) (main.swift:1696)
InputGuard creates the event tap, adds the run-loop source on the main loop, builds the overlay NSWindow at .screenSaver level with the pulsing orange dot, and starts the 30s DispatchSource watchdog. The whole engage path is synchronous on the main thread so the tap is active before the next line runs.
Action runs, throwIfCancelled() between sub-steps
main.swift:1708, 1721, 1728, 1734 all call InputGuard.shared.throwIfCancelled() between the primary action and any composed type / press follow-ups. If you tapped Esc, the next throwIfCancelled raises InputGuardCancelled and the tool stops before its next CGEvent.post().
Disengage with 200ms grace (main.swift:1754-1762)
After the action returns, the server sleeps 200ms so a late Esc tap can land. Then it reads InputGuard.shared.wasCancelled, calls disengage(), and either continues or throws InputGuardCancelled at the catch site below.
Restore cursor + frontmost app (main.swift:1767-1779)
A synthetic .mouseMoved event posts your saved CGPoint back. NSRunningApplication.activate(options:) re-fronts the app you were using. Cancellation has its own copy of this same restore at main.swift:1852-1860, so the experience is identical whether the tool finished or you killed it.
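The sequence above can be sketched as a small, self-contained model. Everything here is hypothetical (`SketchGuard`, `SketchCancelled`, `runDisruptive`); it stands in for the real InputGuard and omits the tap, the overlay, and the watchdog. Only the ordering follows the steps described: engage, check for cancellation between sub-steps, wait out a grace period, then disengage and restore on every path.

```swift
import Foundation

struct SketchCancelled: Error {}

// Hypothetical stand-in for the real guard: it only tracks two flags.
final class SketchGuard {
    private let lock = NSLock()
    private var cancelled = false
    private var active = false

    var isEngaged: Bool { lock.lock(); defer { lock.unlock() }; return active }
    var wasCancelled: Bool { lock.lock(); defer { lock.unlock() }; return cancelled }

    func engage() { lock.lock(); cancelled = false; active = true; lock.unlock() }
    func disengage() { lock.lock(); active = false; lock.unlock() }
    func cancel() { lock.lock(); cancelled = true; lock.unlock() }
    func throwIfCancelled() throws { if wasCancelled { throw SketchCancelled() } }
}

func runDisruptive(with inputGuard: SketchGuard,
                   graceSeconds: TimeInterval = 0.2,
                   steps: [() -> Void],
                   restore: () -> Void) throws {
    inputGuard.engage()
    // Runs on success and on cancellation alike.
    defer { inputGuard.disengage(); restore() }
    for step in steps {
        try inputGuard.throwIfCancelled()  // stop before the next sub-action
        step()
    }
    Thread.sleep(forTimeInterval: graceSeconds)  // let a late Esc land
    try inputGuard.throwIfCancelled()
}
```

Putting the restore in a `defer` is one way to get identical behaviour on both paths; the article describes the real server as keeping two separate copies of the restore code instead.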
The Engage Site, Verbatim
This is the block in main.swift that fires before every disruptive tool. The overlay text gets a tool-specific description (the app you are opening, the key you are pressing) so the pulsing pill on screen tells you exactly what is about to happen. If the tool runs for less than half a second the pill barely flashes; on a multi-step composed click the pill stays up the whole time.
The Numbers in InputGuard.swift
All seven numbers are pulled from the source. The watchdog is configurable per-instance via InputGuard.shared.watchdogTimeout. The 200ms grace is at main.swift:1757. The dot animation is in the buildAndShowOverlay function.
Verify Every Number in This Page
Three commands. The first clones the server. The second prints the entire InputGuard.swift file (355 lines). The third grep finds the engage / disengage / throwIfCancelled call sites in main.swift so you can confirm the ordering described above.
What It Feels Like With and Without InputGuard
Plenty of MCP servers can fire CGEvents at macOS apps. The difference is what your machine does to you while they do it.
Driving macOS through MCP, with and without an input guard
Without an input guard: the agent fires a click. Your mouse hand moves at the same time. Two CGEvents land on the same coordinate, in the wrong order. Your typing is interleaved with the agent's typing in whatever field happens to be focused. There is no overlay telling you the AI is mid-action. Esc does nothing special. The cursor ends up wherever the last click landed.
- No hardware-event suppression
- No on-screen indicator while a tool runs
- Esc does not cancel mid-step
- Cursor and focus are not restored
macos-use vs Other macOS MCP Servers
The other widely-listed macOS MCP servers all do GUI control of one kind or another. None of them ship a documented input lockout with Esc-cancellation between sub-actions.
| Feature | Other macOS MCP servers | macos-use |
|---|---|---|
| Drives apps via accessibility APIs (not screenshots) | Mixed (some screenshot, some AppleScript) | Yes (AXUIElement + CGEvent) |
| Blocks hardware input while a tool runs | No | Yes (CGEventTap on .cghidEventTap) |
| On-screen overlay indicating AI action | No | Yes (centered pill + pulsing orange dot) |
| Esc cancels mid-composed-action | No | Yes (throwIfCancelled() between every sub-step) |
| Watchdog auto-release on stuck server | No | Yes (30s DispatchSource timer) |
| Cursor restored after action | No | Yes (saved on engage, posted back on disengage) |
| Frontmost app restored after action | No | Yes (NSRunningApplication.activate) |
| Native Swift binary (no Node, no Docker) | Mixed (Node, Python wrappers) | Yes |
The Six Tools macos-use Exposes Over MCP
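The six tools are macos-use_open_application_and_traverse, macos-use_click_and_traverse, macos-use_type_and_traverse, macos-use_press_key_and_traverse, macos-use_scroll_and_traverse, and macos-use_refresh_traversal. Every one of these except the last engages the input guard before firing. The refresh tool just reads the AX tree, so it does not touch CGEvent and does not need the overlay.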
“The thing that makes macos-use safe to leave running is not the accessibility-API design. It is that the moment a tool fires, your keyboard stops working and the only key the OS still accepts is the one that cancels the agent.”
InputGuard.swift:248-265
Frequently asked questions
What is macos-use?
macos-use (mcp-server-macos-use) is a Model Context Protocol server written in Swift that gives MCP-compatible AI clients (Claude Code, Claude Desktop, Cursor, VS Code) the ability to control any macOS application. It exposes six tools (open, click, type, press key, scroll, refresh traversal) and drives the OS through Apple's native AXUIElement and CGEvent APIs rather than screenshots. Source: github.com/mediar-ai/mcp-server-macos-use.
What happens to my keyboard and mouse when the AI is running a tool?
They stop working. The moment a disruptive tool fires (open, click, type, press key, scroll), main.swift:1696 calls InputGuard.shared.engage(), which creates a CGEventTap on .cghidEventTap and sets an event mask that includes keyDown, keyUp, leftMouseDown, leftMouseUp, mouseMoved, scrollWheel and flagsChanged. The tap returns nil for every hardware event, which suppresses it. The server's own CGEvent.post() calls go through because they carry a non-zero sourceStateID. The lockout lasts only as long as the action takes, plus a 200ms grace period afterward.
How do I cancel a macos-use tool that is running?
Press Esc with no modifiers held. The CGEventTap callback in InputGuard.swift watches for keyCode 53 with an empty modifier-mask intersection. When it sees that, it writes /tmp/macos-use/esc_pressed.txt, sets the internal _cancelled flag, suppresses the Esc event itself, and disengages the tap. main.swift:1708, 1721, 1728, and 1734 all call InputGuard.shared.throwIfCancelled() between sub-steps, so cancellation lands before the next click or keystroke goes out. The catch site at main.swift:1847 disengages cleanly and restores both your cursor position and your previously-frontmost app.
What if the server crashes or hangs while my input is blocked?
InputGuard ships with a DispatchSource watchdog timer set to 30 seconds (InputGuard.swift, watchdogTimeout). When it fires, the timer handler calls disengage() unconditionally, which removes the run-loop source, invalidates the CFMachPort, and orderOut()s the overlay window. There is no path where the lockout outlives the watchdog. The number is configurable on the InputGuard.shared instance if you fork the server.
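A one-shot watchdog of this kind can be sketched with a DispatchSourceTimer. The `Watchdog` class below is hypothetical and is not the code from InputGuard.swift; in the real server the handler would call disengage().

```swift
import Foundation

// Hypothetical one-shot watchdog. Not thread-safe: arm and disarm are
// assumed to be called from a single thread in this sketch.
final class Watchdog {
    private var timer: DispatchSourceTimer?
    private let queue = DispatchQueue(label: "sketch.watchdog")

    func arm(timeout: TimeInterval, onFire: @escaping () -> Void) {
        disarm()
        let source = DispatchSource.makeTimerSource(queue: queue)
        source.schedule(deadline: .now() + timeout)  // fires once, no repeat
        source.setEventHandler { onFire() }
        source.resume()
        timer = source
    }

    func disarm() {
        timer?.cancel()
        timer = nil
    }
}
```

Arming with `timeout: 30` and a handler that releases the lock would mirror the behaviour described; a normal disengage would call `disarm()` first so the timer never fires.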
What does the on-screen overlay look like?
A fullscreen transparent NSWindow at .screenSaver level with a 15% black tint. Centered on the main screen is a pill that is 720pt wide (capped at 50% of screen width), 80pt tall, with a corner radius of pillHeight/2 and a near-black NSColor(white: 0.08, alpha: 0.92) background. Inside the pill: a 16pt orange dot (NSColor.systemOrange) running an opacity CABasicAnimation between 1.0 and 0.3 over 0.8s, autoreversed and repeating forever. Next to the dot, white 20pt semibold system font on a single truncating line that says "AI: Clicking in app… — press Esc to cancel". The window has ignoresMouseEvents set to true, so it does not steal interaction from anything underneath.
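The pill geometry quoted above reduces to a few lines of arithmetic. This is a sketch using only those stated numbers; `PillLayout` and `pillLayout(in:)` are hypothetical names, and the window, tint, and animation are left out.

```swift
import Foundation

struct PillLayout: Equatable {
    var frame: CGRect
    var cornerRadius: CGFloat
}

// Hypothetical helper: centre a pill in `screen`, capping its width at
// half the screen width, with a fully rounded (height / 2) corner radius.
func pillLayout(in screen: CGRect,
                maxWidth: CGFloat = 720,
                height: CGFloat = 80) -> PillLayout {
    let width = min(maxWidth, screen.width * 0.5)
    let origin = CGPoint(x: screen.midX - width / 2,
                         y: screen.midY - height / 2)
    let frame = CGRect(origin: origin, size: CGSize(width: width, height: height))
    return PillLayout(frame: frame, cornerRadius: height / 2)
}
```

On a 1440 by 900 point screen the cap and the 720pt maximum coincide; on anything narrower the 50% cap takes over.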
Why does the cursor not jump around when macos-use clicks for me?
main.swift:1672-1676 saves the current NSEvent.mouseLocation (in screen-y-flipped coordinates) before the action runs. main.swift:1767-1771 posts a synthetic .mouseMoved CGEvent back to that exact CGPoint after the action completes, on .cghidEventTap. The cancellation path at 1852 does the same thing on Esc, so even when you abort mid-action your pointer ends up where you left it instead of wherever the agent's last click landed. main.swift:1671 also captures NSWorkspace.shared.frontmostApplication and reactivates it at 1779, so focus returns to the app you were using.
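The y-flip matters because NSEvent.mouseLocation uses a bottom-left origin while CGEvent positions use a top-left origin. A minimal sketch of that conversion follows; `toQuartzPoint` is a hypothetical name, and it assumes the saved point is measured against the primary screen's height.

```swift
import Foundation

// Hypothetical helper: convert a bottom-left-origin point (AppKit style)
// into a top-left-origin point (Quartz event style).
func toQuartzPoint(_ cocoaPoint: CGPoint, primaryScreenHeight: CGFloat) -> CGPoint {
    return CGPoint(x: cocoaPoint.x, y: primaryScreenHeight - cocoaPoint.y)
}
```

Applying the same flip twice returns the original point, so the helper works in either direction.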
How does macos-use know its own clicks from my clicks?
Source-state ID. Hardware HID events carry sourceStateID == 0. CGEvent.post() calls from the SDK go out on the .hidSystemState source, which produces a non-zero state ID. The free-function callback at the bottom of InputGuard.swift reads CGEventField.eventSourceStateID on every event and short-circuits with Unmanaged.passUnretained(event) if it is non-zero. That is the entire trick. It is also what makes the Esc-detection unambiguous: the keyDown for an SDK-injected key has a non-zero state ID and is allowed through without being checked for keyCode 53.
Does macos-use require Accessibility permission?
Yes, the host application that spawns the server (Claude Desktop, Cursor, your terminal, VS Code) needs Accessibility in System Settings > Privacy & Security > Accessibility. Without it, both AXUIElement traversal and CGEventTap creation fail. InputGuard.swift logs an explicit "check Accessibility permissions" hint to stderr when CGEvent.tapCreate returns nil, and the engage() call falls through harmlessly so the agent can still report the failure instead of silently locking your input.
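The nil check on tap creation can be sketched as follows. This is an assumption-laden illustration, not the code from InputGuard.swift: `sketchTapCallback` and `installSketchTap` are hypothetical, the log wording is invented, and the callback deliberately passes every event through so that running the sketch cannot lock your input.

```swift
import CoreGraphics
import Foundation

// Observe-only callback for this sketch: every event passes through unchanged.
// A real guard would return nil for hardware events to suppress them.
func sketchTapCallback(proxy: CGEventTapProxy, type: CGEventType,
                       event: CGEvent, userInfo: UnsafeMutableRawPointer?) -> Unmanaged<CGEvent>? {
    return Unmanaged.passUnretained(event)
}

// Hypothetical helper: returns nil (and logs a hint) when the tap cannot be created.
func installSketchTap(mask: CGEventMask) -> CFMachPort? {
    guard let tap = CGEvent.tapCreate(
        tap: .cghidEventTap,
        place: .headInsertEventTap,
        options: .defaultTap,
        eventsOfInterest: mask,
        callback: sketchTapCallback,
        userInfo: nil
    ) else {
        // Creation typically fails when the host process lacks the required permission.
        let hint = "event tap creation failed; check Accessibility permissions\n"
        FileHandle.standardError.write(Data(hint.utf8))
        return nil
    }
    let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0)
    CFRunLoopAddSource(CFRunLoopGetMain(), source, .commonModes)
    CGEvent.tapEnable(tap: tap, enable: true)
    return tap
}
```

Returning nil rather than crashing lets the caller carry on and report the failure, which matches the fall-through behaviour described above.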
How is this different from baryhuang/mcp-remote-macos-use or CursorTouch/MacOS-MCP?
The other two MCP servers also drive macOS, but neither blocks your input or shows an overlay while a tool is running. mcp-remote-macos-use targets remote control over screen-sharing and uses a credentials-driven flow. MacOS-MCP exposes screenshot-style tools. macos-use is the only one that ships an InputGuard, a watchdog timer, and an Esc-throws-mid-step contract, which matters when you give an AI write-access to GUI apps and want a hard stop. Source-level proof lives at Sources/MCPServer/InputGuard.swift in this repo.
What MCP tools does macos-use expose?
Six. macos-use_open_application_and_traverse (opens an app by name, bundle id, or path), macos-use_click_and_traverse (click + optional type + optional key, supports element-by-text search), macos-use_type_and_traverse (type into focused field), macos-use_press_key_and_traverse (key + modifiers), macos-use_scroll_and_traverse (deltaX, deltaY in lines), and macos-use_refresh_traversal (read the AX tree without acting). Every tool except refresh is treated as disruptive in main.swift:1667 and goes through the input-guard engage/disengage cycle. Every tool returns a compact summary plus a path to a flat-text accessibility tree dump in /tmp/macos-use/.
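The disruptive/non-disruptive split can be expressed as a small enum. The tool names are the ones listed above; the enum `MacOSUseTool` and its `isDisruptive` property are hypothetical, and the real check in main.swift may be written differently.

```swift
// Hypothetical model of the six tool names and the disruptive check.
enum MacOSUseTool: String, CaseIterable {
    case openApplication = "macos-use_open_application_and_traverse"
    case click = "macos-use_click_and_traverse"
    case type = "macos-use_type_and_traverse"
    case pressKey = "macos-use_press_key_and_traverse"
    case scroll = "macos-use_scroll_and_traverse"
    case refreshTraversal = "macos-use_refresh_traversal"

    // Every tool except the read-only refresh goes through the input guard.
    var isDisruptive: Bool { self != .refreshTraversal }
}
```

A dispatcher could look up `MacOSUseTool(rawValue: name)` and engage the guard only when `isDisruptive` is true.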
Read the file, not the README.
Every fact on this page comes from Sources/MCPServer/InputGuard.swift and the engage / cancel sites in main.swift between lines 1667 and 1862. Open the repo and grep for InputGuard. The design is plain in the code.