Personal AI agent: on-device vs VPS, the honest tradeoff
They look like the same product on a pricing page. They are not. A VPS-hosted personal agent is a headless task runner that calls APIs. An on-device personal agent is a process that operates the apps on your computer. The line between them is one OS surface that does not exist on Linux: macOS accessibility.
Direct answer (verified 2026-05-01)
On-device if your agent has to operate apps on your computer (clicking, typing, reading the AX tree, touching local files, using the microphone). VPS only for headless task runners that poll cloud APIs and run scheduled jobs. Same name, different products. Verified against Apple's AXUIElement reference: every function takes a local PID or a CFType from the same process tree; no remote variant exists.
The kernel-bound API that decides this
Open Desktop/Sources/AppState.swift in the Fazm repo and scroll to line 480. The function testAccessibilityPermission() is the live probe that runs every five seconds while the app is open. It is also the cleanest illustration of why on-device and VPS are not interchangeable for this kind of agent.
The function does three things. It pulls the frontmost app from NSWorkspace.shared.frontmostApplication, which only exists for a process running on the same macOS instance as the windowserver. It calls AXUIElementCreateApplication on that app's PID. It calls AXUIElementCopyAttributeValue to read the focused window. None of those calls takes a hostname or a remote identifier. They take a local PID and a CFString. The only way to call them is from inside macOS user space, with TCC having granted Accessibility to the calling process.
A VPS process cannot do any of this. The OS is Linux. There is no NSWorkspace. There is no AXUIElement framework. There is no TCC database. There is no Mach port to your laptop's windowserver. You can run a Linux Swift toolchain on the VPS, but Foundation on Linux ships without AppKit, and the AX framework does not exist there; the calls fail at compile time, not at runtime.
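The asymmetry compiles, literally. Below is a hedged sketch, not Fazm's code: the same three calls guarded by `#if canImport`. On macOS the first branch builds and runs; a Linux VPS toolchain can only ever see the second, because AppKit and the AX framework do not exist there.

```swift
// probe.swift — a sketch of the AX probe pattern, not Fazm's implementation.
#if canImport(AppKit)
import AppKit
import ApplicationServices

// macOS branch: frontmost app -> AX element for its pid -> focused window.
func axProbeMessage() -> String {
    guard let app = NSWorkspace.shared.frontmostApplication else {
        return "no frontmost app (headless session)"
    }
    // Takes a *local* pid_t; no variant accepts a hostname or a remote id.
    let axApp = AXUIElementCreateApplication(app.processIdentifier)
    var window: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(axApp,
                                            kAXFocusedWindowAttribute as CFString,
                                            &window)
    return err == .success ? "AX reachable" : "AX gated by TCC (error \(err.rawValue))"
}
#else
// Linux branch: this is all a VPS toolchain can compile. No AppKit, no AX.
func axProbeMessage() -> String {
    return "no AppKit on this OS; AXUIElement does not exist here"
}
#endif

print(axProbeMessage())
```

On a Mac without the Accessibility grant, the macOS branch compiles but the copy call fails at runtime; on Linux the macOS branch never exists to begin with. Two different failure modes, one of them unfixable.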
What each side actually does
Same query, two completely different products. Read across, not down.
| Feature | VPS personal agent | Fazm (on-device) |
|---|---|---|
| Reads your apps | No. The VPS is in a datacenter; it has no PID space for your Safari, Mail, Notes, Slack, Figma, etc. | Yes. AppState.swift line 488 calls AXUIElementCreateApplication on the frontmost app's pid; AppState.swift line 490 reads its window via AXUIElementCopyAttributeValue. |
| Clicks into your apps | No. macOS does not expose a remote AX endpoint. A VPS would have to drive a screen-share back, which means a local helper has to exist anyway. | Yes. The macos-use binary at acp-bridge/src/index.ts line 1326 ships as a local subprocess and calls AX directly. |
| Permission model | Whatever your VPS provider gives you. Root on the box, but no Apple TCC, no Accessibility, no Screen Recording, no Microphone permissions. | Apple TCC. AXIsProcessTrusted() and AXIsProcessTrustedWithOptions() are how the OS gates AX, and they only run in macOS user space. |
| Always-on availability | Yes. That is the one thing a VPS does well. Your laptop can be closed; the cron-style task runner keeps polling APIs. | Only when your Mac is awake. If the agent has to operate apps that exist on your machine, that constraint is structural, not a bug. |
| Cost | Cheap. $5 to $10 per month covers a low-end Linux VPS plus whatever you spend on cloud LLM tokens. | The Mac you already own. Inference can be on-device or routed to any Anthropic-compatible endpoint, including a hosted provider, a corporate proxy, or a local model. |
| Voice input | You build it. The VPS does not have your microphone. | WhisperKit transcribes on-device. The audio never leaves the Mac. |
| Sees your local files | No. Your laptop and the VPS are different filesystems. Anything the agent uses has to be uploaded first. | Yes. The onboarding indexer writes a row per file into the indexed_files SQLite table; the agent queries it directly. |
| Honest summary of what it is | A headless task runner with persistent state. Useful for scraping, monitoring, scheduled API calls, automated outreach. | A computer-use agent. Useful for clicking around your real apps, filling out forms, answering email, moving data between programs you have open. |
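The permission-model row above is two function calls on the macOS side. A sketch of the TCC gate, under the usual `#if canImport` guard (this mirrors the standard check pattern, not Fazm's exact code):

```swift
// tcc_gate.swift — sketch of the Apple TCC gate, not Fazm's implementation.
#if canImport(ApplicationServices)
import ApplicationServices

func tccSummary() -> String {
    // Cheap boolean: has the user granted Accessibility to this process?
    let trusted = AXIsProcessTrusted()
    // Same check, but asks macOS to raise the System Settings prompt when false.
    let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true]
    let afterPrompt = AXIsProcessTrustedWithOptions(options as CFDictionary)
    return "trusted=\(trusted) afterPrompt=\(afterPrompt)"
}
#else
// A Linux VPS has no TCC database to consult and no framework to ask.
func tccSummary() -> String { return "no TCC on this OS" }
#endif

print(tccSummary())
```

Note what the gate keys on: the calling process, identified to the local OS. There is no argument through which a remote machine could present itself for trust.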
The flow, on each side
On-device: agent reaches the AX tree directly
Fazm runs as a TCC-trusted Swift app. AppState.swift line 354 checks AXIsProcessTrusted on every poll; line 482 functionally tests the AX API against your frontmost app's PID. macos-use, spawned at acp-bridge/src/index.ts line 1326, makes the same calls for any app the agent decides to operate.
On-device: tools spawn as local subprocesses
The bridge at acp-bridge/src/index.ts line 1316 starts the playwright MCP. Line 1326 starts macos-use. Line 1334 starts the WhatsApp MCP. Every tool call routes to a process on your Mac that the agent can synchronously block on. No network in the hot path.
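The bridge does this in Node with child processes; the shape is the same in any language. A minimal Swift sketch of what "a process the agent can synchronously block on" means — `/bin/echo` stands in for a real MCP tool binary, and the JSON payload is illustrative:

```swift
// local_tool.swift — the hot path is a local pipe to a child process, not a network hop.
import Foundation

func callLocalTool() throws -> String {
    let tool = Process()
    tool.executableURL = URL(fileURLWithPath: "/bin/echo")   // stand-in for a tool binary
    tool.arguments = ["{\"result\":\"clicked\"}"]

    let stdout = Pipe()
    tool.standardOutput = stdout

    try tool.run()
    // Drain the pipe before waiting so a large reply cannot deadlock the parent.
    let data = stdout.fileHandleForReading.readDataToEndOfFile()
    tool.waitUntilExit()          // the agent loop blocks right here, synchronously

    return String(data: data, encoding: .utf8) ?? ""
}

print(try callLocalTool())        // the tool's answer via a local pipe, no sockets
```

The latency budget is a fork and a pipe read. That is the property a VPS cannot replicate for tools that have to live on your Mac.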
VPS: agent sees only what crosses the wire
The VPS process can call cloud APIs, scrape websites with a headless browser running on the VPS itself, run scheduled jobs, hold conversation state. None of that involves your Mac. If you want it to operate your Mac, you have to build the missing half: a local agent on the Mac plus a control protocol back to the VPS.
VPS: the missing half is the actual agent
Once you accept that the local-side process has to exist, the VPS becomes a queue plus persistent state. That is a useful thing, but it is not what people picture when they read 'personal AI agent on a VPS.' They picture something that operates their computer. That part lives on the computer.
Zero. That is the number of AXUIElement functions in the public Apple SDK that accept a remote hostname or a foreign-process identifier, verified by reading the AXUIElement.h header in the macOS 14 SDK.
developer.apple.com/documentation/applicationservices/axuielement_h
The token math people skip
When a VPS-hosted agent has to see your screen, the only path is screenshot streaming, and pixels are the most expensive thing you can feed a model. Under the modern Anthropic tokenizer, ten turns of 1920x1200 screenshots cost about 3.5 million tokens of pixels before the model thinks. Ten turns of macOS accessibility trees for the same Mail window cost about fifteen thousand text tokens. The accessibility surface only exists if the agent runs on the OS that owns the windowserver. That is the on-device side. There is no equivalent on a VPS, even at unlimited bandwidth.
When a VPS is the right answer
To be fair to the other side: a VPS is correct for a real shape of agent. If the agent's job is to monitor a feed, poll a webhook, run a nightly scrape, send scheduled emails, orchestrate other agents over HTTP, or hold long-running conversational state across devices, none of that needs your Mac. A $6 VPS plus an LLM API key plus a small Python harness will do it cleanly, and your laptop can sleep. Hostinger, Virtua, and QuantVPS rank for this query because they sell that shape of hosting. They are answering a different question than this page. If your agent never has to look at the apps on your computer, take their advice.
But notice what falls out of the "personal" in "personal AI agent" once the box moves off your machine. Your installed apps. Your file system. Your inbox UI. Your active CRM session. Your calendar that already has the right account selected. A server-side agent has to be re-given all of that through APIs and uploaded artifacts, and most of the apps you actually use do not have a clean API surface. That is why the on-device shape exists.
What Fazm actually ships, and why it has to be on the Mac
Fazm is a Swift macOS app plus a Node subprocess (acp-bridge) that hosts the agent loop. The bridge spawns five MCP servers as local subprocesses: fazm_tools (line 1245), playwright (line 1316), macos-use (line 1326), whatsapp (line 1334), and google-workspace. Every tool call routes to one of those local processes. The model endpoint is pluggable (Anthropic by default, swappable to any Anthropic-compatible gateway). The harness is local because it has to be: the AX calls live in macos-use, the screen-recording entitlement lives in the Swift app, the WhisperKit voice transcription runs on the Apple Neural Engine. None of those have a remote equivalent.
You can route inference wherever you want. The agent itself stays on your Mac.
Want to see the AX path running on your apps?
Twenty minutes, screen share, your real workflows. We will walk through what an on-device agent can reach that a VPS one cannot.
Frequently asked questions
What is the actual technical reason a personal AI agent on a VPS cannot operate my Mac?
macOS accessibility (AX) APIs resolve PIDs from NSWorkspace on the same kernel. The call AXUIElementCreateApplication takes a process identifier; that identifier only exists for processes running on the local OS. There is no remote variant of this API and Apple has never shipped one. A VPS process is on a different machine, in a different OS family (Linux), with no concept of NSWorkspace, TCC trust, Mach ports, or AXIsProcessTrusted. You can verify this by reading AppState.swift in the Fazm repo at line 482 (the testAccessibilityPermission function) and tracing what would happen if you tried to run those exact calls from anywhere except a macOS user-space process.
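The signatures make the same point on their own. As they surface in Swift from the ApplicationServices headers (paraphrased for readability; AXUIElement.h has the canonical C forms):

```swift
// The only process identifier AX accepts is a local pid_t.
func AXUIElementCreateApplication(_ pid: pid_t) -> AXUIElement

// Attributes are read with a CFString key against that local element.
func AXUIElementCopyAttributeValue(
    _ element: AXUIElement,
    _ attribute: CFString,
    _ value: UnsafeMutablePointer<CFTypeRef?>
) -> AXError

// Nowhere in the header: a hostname, a URL, a socket, or a remote token.
```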
Can a VPS still drive my Mac if I open a tunnel back?
Yes, but the agent is no longer on the VPS. If you reverse-tunnel a screen-share or accessibility bridge, you are running a local agent process on the Mac that takes instructions from the VPS. The thing that clicks lives on your Mac. The thing on the VPS is a controller. People sell that arrangement as a VPS-hosted agent, which is fine until you ask which side is doing the work that depends on AX, files, the screen, and the microphone. Always the local side.
Is a personal AI agent on a VPS just dishonest framing then?
Not at all. It is honest for one specific shape of agent: a headless task runner that polls APIs, scrapes web pages with a server-side browser, sends emails, runs scheduled jobs, and persists conversation state. AutoGPT-style loops, CrewAI, agentic scrapers, automated outreach, all run fine on a $6 VPS because none of them need the OS surface area of your laptop. The dishonest framing is only when these get marketed as 'personal AI agents' that replace something like Fazm or a Mac-side computer-use agent. They are different products with the same name.
If on-device is the right shape, why do most articles say to put it on a VPS?
Because the AI agent stack that exploded on Twitter in 2024-2025 was Python, an LLM API key, a few tool-calling libraries, and a server. That stack is much cheaper to ship as a VPS template than as a native Mac app. Hostinger, Virtua, QuantVPS, and most hosting comparisons rank for this query because they sell the hosting; they are not wrong, just answering a different question. None of them are about agents that operate your real apps.
Does Fazm need 24/7 availability the way a VPS gives?
Usually not. The work a Mac-side agent is good at (replying to email in your tone, filling forms, moving data between apps, answering chat in a CRM) only matters when you are at your Mac. You either trust the agent enough to leave the laptop awake, or you batch the work to when you are present. A VPS-style 24/7 task runner is a different product. If you need both, run them as two separate things.
Where is the inference, on-device or cloud?
Pluggable. Fazm uses Anthropic Claude by default via the Claude Agent SDK as a local subprocess; the model endpoint is a setting and can be pointed at any Anthropic-compatible gateway. That includes a corporate proxy, GitHub Copilot, a hosted provider, or a local model behind a bridge. The on-device decision is about where the agent runs and what it can touch (your apps, your files, your screen), not about where the tokens are computed.
What about Apple Silicon making local inference viable, does that change the calculus?
It changes one thing: you can keep tokens local too. M-series Macs run useful 4B to 8B parameter models at usable speed. That is great for short tasks, autocomplete, low-latency tool calls, and cases where you do not want a cloud provider in the loop. It does not change the agent boundary: the agent has to be on the Mac to operate the Mac, whether the model is local or remote.
How can I verify the AX claim myself?
Three steps. Open AppState.swift at line 480 in github.com/m13v/fazm. Read testAccessibilityPermission(); follow the calls it makes (AXUIElementCreateApplication, AXUIElementCopyAttributeValue, kAXFocusedWindowAttribute). Then read Apple's developer documentation at developer.apple.com/documentation/applicationservices/axuielement_h. Note that none of the function signatures take a hostname or remote identifier, only local PIDs and CFType objects from the same process tree. There is no remote AX surface to call from a VPS.
Is Fazm open source so I can audit any of this?
Yes, MIT licensed, source at github.com/m13v/fazm. The two files that matter most for this comparison are Desktop/Sources/AppState.swift (the live AX permission probe and the function that actually calls AX against your frontmost app) and acp-bridge/src/index.ts (where the playwright and macos-use MCP servers get spawned as local subprocesses on your Mac, not on a VPS).
On-device computer-use agents
Keep reading
Local LLM vs local AI agent on macOS
A local LLM gives you tokens. A local AI agent gives you actions. The harness is the product, not the model.
Personal AI agent on device, in Fazm
The four-table SQLite schema and the one line in ChatProvider.swift that wraps every chat turn with your local profile.
Accessibility tree vs screenshots for AI agents
Why named AX fields beat pixel coordinates for any agent that has to click around real apps.