Argument
Agentic labor compression on the desktop is bounded by reach
Compression ratios get quoted as if they were a property of the model. They are not. The fraction of a team’s desktop headcount one agent can absorb equals the fraction of that team’s actual apps the agent can drive. Most agents reach less surface area than the slide deck implies, and that is where the math falls apart.
Direct answer · Verified 2026-05-08
How much desktop headcount can an agentic worker compress? The ratio you actually get is bounded by the share of your team’s apps the agent can reach. A browser-only agent caps you at the slice of work that lives in a browser tab, commonly 30 to 50 percent of small-ops desktop work. An agent that drives every Mac app via the accessibility tree (Fazm) raises that ceiling to the share of tasks that are repeatable and well-defined; then expect a 30 to 40 percent haircut for supervision and edge cases. The model is not the bottleneck. Reach is.
Source for the surface-area numbers below: fazm-tools-stdio.ts and index.ts in the open Fazm bridge.
The thesis
There is a category of business writing on AI agents that goes: pick a knowledge job, list the tasks, multiply hours by people, declare that one agent replaces 0.7 of a person, ship the slide. The number is satisfying because it is concrete. It is also wrong in a specific way: it assumes the agent can reach all of the work. In practice, the fraction of headcount you can compress equals the fraction of your team’s real apps the agent can drive, minus the fraction of the work that is judgment-heavy enough that no model is going to pick it up this year.
That changes how you size the impact. You stop asking “how good is the model” (which is the same answer for everyone, because the leading models are within a tight band) and start asking “how many of my team’s apps can the agent actually click around in”. The answer to that question is what determines the compression ratio. The answer is also where the products are most different.
Where the work actually lives
A small business or ops team on macOS spends most of a workweek in roughly the same set of surfaces: a browser (Gmail, calendar, the CRM, billing portal, bank), a few native apps (Slack desktop, the IDE if there is one, Numbers or Excel, Preview for PDFs, Notes), an Electron grab-bag (Notion, Linear, Figma, design tooling), and a constant stream of small popovers and OS chrome (file pickers, share sheets, signing prompts, keychain dialogs). The browser is one bucket. Everything else is at least four other buckets.
Pull a representative week and you see the same shape: a single repetitive task usually crosses two or three of these buckets. “Onboard a new customer” means a CRM update, a calendar invite, a contract sent through a signing app, a Slack ping to the channel, and a row in a spreadsheet. “Reconcile yesterday’s sales” means the bank web app, the spreadsheet, an export from the POS desktop client, and a pasted summary into Slack. None of that lives in a single tab.
The agent has to reach all of these to compress the task end to end.
The compression formula, written down
Here is the calculation that survives a finance team:
compressible_hours_per_week =
  Σ (task_minutes × tasks_per_week) / 60
  for each recurring task that is
    (a) repeatable, and
    (b) well-defined, and
    (c) lives in apps the agent can drive

agent_recovered_hours =
  compressible_hours_per_week
  × reach_factor              # share of those apps the agent reaches
  × (1 − supervision_haircut) # share of recovered time that sticks

headcount_compressed = agent_recovered_hours / 40
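The formula is short enough to run directly. A minimal TypeScript sketch follows; the field names are illustrative, not taken from the Fazm codebase:

```typescript
// One row per recurring task from a week of time-audit data.
type Task = {
  minutes: number;            // minutes per occurrence
  perWeek: number;            // occurrences per week
  repeatable: boolean;        // condition (a)
  wellDefined: boolean;       // condition (b)
  agentCanDriveApps: boolean; // condition (c): every app the task touches is reachable
};

// Σ (task_minutes × tasks_per_week) / 60, over tasks meeting (a), (b), (c).
function compressibleHoursPerWeek(tasks: Task[]): number {
  return tasks
    .filter(t => t.repeatable && t.wellDefined && t.agentCanDriveApps)
    .reduce((hours, t) => hours + (t.minutes * t.perWeek) / 60, 0);
}

// reach_factor: share of those apps the agent reaches.
// supervisionHaircut: share of recovered time the human still spends supervising.
function headcountCompressed(
  tasks: Task[],
  reachFactor: number,
  supervisionHaircut: number,
): number {
  const recovered =
    compressibleHoursPerWeek(tasks) * reachFactor * (1 - supervisionHaircut);
  return recovered / 40; // against a 40-hour week
}
```

The two multipliers in headcountCompressed are exactly the two terms the next paragraph says slides tend to skip.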
Two terms in that formula are usually skipped on slides. reach_factor is what the agent can actually open and click. supervision_haircut is what the human still has to look at. Skipping them is how you get the “one agent equals four FTE” quote that does not survive a quarter of real usage.
Why surface area is a product decision, not a model decision
Three real product choices set reach_factor, and all three are properties of the product you buy, not of the model it runs.
One: where the agent looks. A pixel-based agent reads screenshots and infers what is clickable from the image. That works in demos and breaks on the long tail: dynamic ARIA-only controls, native menus that appear after the screenshot was captured, popovers that close before the next frame, dark-mode skins it has not seen. An accessibility-tree agent walks the same data structure VoiceOver uses, gets a stable role, label, and frame for every element, and can act with a single AX query at sub-50ms latency instead of a 200ms-to-1s screenshot-and-recognize cycle.
Two: which surfaces are wired. Even an AX-aware agent only reaches what the product wired up. Fazm’s bridge gives the model nineteen first-party tools (defined in acp-bridge/src/fazm-tools-stdio.ts), plus four MCP server integrations wired in acp-bridge/src/index.ts: playwright (browser), macos-use (native AX-tree control of any Mac app), whatsapp, and google-workspace. macos-use is the multiplier; it turns “every Mac app that exposes AX” into one tool slot.
Three: who holds the session. If the agent runs in someone else’s cloud, your reach is bounded by what your IT team is willing to OAuth into a third party. If the agent runs locally inside your existing macOS session, your reach is bounded by what you are already logged into on your laptop, which is, by definition, every app you actually use. The two ceilings are very different in practice.
Compression ceiling by agent shape
| Feature | Browser-only / pixel agent | AX-tree native agent |
|---|---|---|
| Apps the agent can drive | Browser tabs only, or whatever has stable enough pixels to recognize | Every Mac app exposing AX (essentially every Cocoa, AppKit, SwiftUI, Catalyst, modern Electron app) plus the browser |
| Per-click latency | Screenshot capture + vision pass + click, 200ms to 1s per step | Single AX query, <50ms typical |
| Where it breaks | Dynamic UI, popovers, dark mode skins, native menus, anything off-screen | Apps with no AX coverage (rare on Mac), judgment-heavy steps |
| Session ownership | Often a hosted cloud session, every integration needs explicit OAuth | Runs inside your logged-in macOS session, no third-party OAuth |
| Reach factor (rough) | Caps at the share of work that lives in a browser tab | Caps at the share of work that is repeatable and well-defined |
| Cost per compressed hour | Per-task SaaS pricing scales linearly with throughput | Inference only, scales with usage but not seat count |
A worked example
One person on a five-person ops team spends roughly the following on recurring desktop work in a normal week:
- CRM updates after calls: 3.5 hours, lives in browser CRM and the calendar app.
- Invoicing and billing reconciliation: 2.5 hours, browser bank + native spreadsheet + native PDF preview.
- Calendar reshuffles for client meetings: 1.5 hours, calendar app + email + Slack desktop.
- Inbound triage and follow-up: 4 hours, Gmail web + Notion or Linear + the CRM.
- Document signing chase-ups: 1 hour, DocuSign web + native PDF + Slack.
Total: 12.5 hours of compressible recurring work per person per week. About 6 of those 12.5 hours live entirely in a browser tab. The other 6.5 cross into native or Electron apps that an AX-tree agent reaches and a browser-only agent does not.
Apply the formula. With a browser-only agent and a 30 percent supervision haircut, you compress 6.0 × 0.70 = 4.2 hours per person per week, about 21 hours across the team, or half a person. With an AX-tree agent at the same haircut, you compress 12.5 × 0.70 = 8.75 hours per person per week, about 43.75 hours across the team, just over one full person. Same model, same prompts, same five people. The delta is reach.
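As a sanity check, the same arithmetic in TypeScript, with the numbers copied from the example above (the 30 percent haircut is the low end of the stated 30 to 40 percent range):

```typescript
const TEAM_SIZE = 5;
const HAIRCUT = 0.30; // supervision haircut

const browserReachableHours = 6.0; // hours/person/week that live in a browser tab
const axReachableHours = 12.5;     // hours/person/week an AX-tree agent can reach

// Hours actually recovered per person per week, after supervision.
const browserAgent = browserReachableHours * (1 - HAIRCUT); // ~4.2
const axAgent = axReachableHours * (1 - HAIRCUT);           // ~8.75

// Headcount compressed across the team, against a 40-hour week.
const browserFte = (browserAgent * TEAM_SIZE) / 40; // ~0.53, about half a person
const axFte = (axAgent * TEAM_SIZE) / 40;           // ~1.09, just over one person
```

Nothing about the model changes between the two rows; only the reachable-hours input does.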
Same task, two agent shapes
The browser-only agent reads Gmail, opens the CRM tab, types the update, drafts the follow-up. Then it stops: at the calendar invite (in the native macOS Calendar app), at the signed PDF (in Preview), at the Slack ping (Slack desktop is not a tab).
- Compressed: ~half the recurring work
- Hands the rest back to the human
- Person still owns the context-switching tax
- Reach factor is the cap, not the model
The AX-tree agent does the same browser steps, then keeps going: creates the invite in the native Calendar app, opens the signed PDF in Preview, posts the ping in Slack desktop. The task compresses end to end.
Where the framing fails
Compression is not the right lens for everything. Three places it overpromises:
Judgment work. Picking which deal to pursue, which clause to push back on, which candidate to advance. No reach factor saves you here, because the bottleneck is the decision, not the clicks around the decision. The honest framing: the agent compresses the surrounding work so the human spends more of their time on the decision and less on staging it.
One-off and creative work. Compression assumes recurrence. If you write a custom proposal once a quarter, an agent saves you maybe 20 percent on the boilerplate. The headcount math does not move.
Work that depends on tribal knowledge. If the rule for which discount to apply lives only in someone’s head, the agent cannot compress it without a knowledge step first. That is solvable (Fazm has a save_knowledge_graph tool for exactly this), but it is its own project, not a free lunch.
What to do with this
If you are sizing an agent project for a real team, do the boring version of the analysis: write the fifteen recurring tasks, mark which ones cross out of a browser tab, sum the hours, and apply a 30 to 40 percent supervision haircut. If your team’s tasks live mostly inside a browser, a browser-only agent is fine. If they cross into native apps (almost every small ops team), you need an agent that walks the accessibility tree, not pixels, and runs locally enough to be inside your existing logins.
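That boring version of the analysis fits in a few lines. A hedged TypeScript sketch, where the task names, hours, and the 35 percent midpoint haircut are illustrative assumptions:

```typescript
// One row per recurring task from the time audit. `browserOnly` marks tasks
// that never leave a browser tab.
type RecurringTask = { name: string; hoursPerWeek: number; browserOnly: boolean };

// Upper bounds on recovered hours/week for the two agent shapes,
// after applying the supervision haircut.
function compressionBounds(tasks: RecurringTask[], haircut = 0.35) {
  const total = tasks.reduce((h, t) => h + t.hoursPerWeek, 0);
  const browser = tasks
    .filter(t => t.browserOnly)
    .reduce((h, t) => h + t.hoursPerWeek, 0);
  return {
    browserOnlyAgent: browser * (1 - haircut), // bound for a browser-only agent
    axTreeAgent: total * (1 - haircut),        // bound for an AX-tree agent
  };
}
```

Run it against your own fifteen tasks; the gap between the two bounds is the case for (or against) an AX-tree agent on your team.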
That is the version of the math we built Fazm against. The tools list is in the open repo so you can audit the surface area before you trust the ratio.
Sizing this for your team
Bring your top fifteen recurring tasks. We will walk through the reach factor and the haircut on the call and tell you what an honest first-year compression number looks like.
Frequently asked questions
What does agentic labor compression mean in plain language?
It is the ratio of human work-hours one agent can absorb in a fixed amount of wall-clock time. If a salesperson spends 12 hours a week on CRM updates, calendar reshuffles, and follow-up emails, and an agent takes those 12 hours down to 90 minutes of supervision, the agent compressed about 10.5 hours of labor per week. Compression is not the same as automation. Automation removes the work; compression collapses how much human time the work needs. Most desktop work is compressible, not removable, because someone still has to look at the result.
Why does the keyword pair compression with headcount?
Headcount is how the leverage shows up on a P&L. If you compress 12 hours a week of repetitive desktop work for ten people, that is 120 hours a week, which is roughly three full-time employees. The math is real, but it only holds if the agent can actually reach the apps where those 120 hours live. Most articles quote the ratio without quoting the reach. That is where the gap between slide-deck math and operating reality opens up.
What is the surface-area constraint?
An agent can only compress work in apps it can drive. A browser-only agent can drive web apps in a browser tab. A screenshot-based agent can drive whatever it can recognize in pixels, which works for stable layouts and breaks for anything dynamic, popovers, or accessibility-only interactions. A native macOS agent that walks the accessibility tree can drive every app that exposes AX, which on Mac is essentially every Cocoa, AppKit, SwiftUI, Catalyst, and modern Electron app. The compression ratio you can achieve is bounded above by the surface area the agent reaches, not by the model behind it.
How many tools does Fazm actually expose?
Nineteen first-party tools defined in acp-bridge/src/fazm-tools-stdio.ts (execute_sql, capture_screenshot, check_permission_status, request_permission, extract_browser_profile, edit_browser_profile, query_browser_profile, scan_files, set_user_preferences, ask_followup, complete_onboarding, save_knowledge_graph, save_observer_card, speak_response, routines_create, routines_list, routines_update, routines_remove, routines_runs), plus four MCP server integrations wired in acp-bridge/src/index.ts: playwright (browser), macos-use (native macOS app control via accessibility tree), whatsapp, and google-workspace. The model picks among them per turn. The reach matters because tool selection is the upper bound on what compresses.
How do I estimate compression for my team without overclaiming?
Three honest steps. First, write down the top fifteen recurring desktop tasks each role does in a week and how long each takes (calendar invites, CRM updates, invoice creation, follow-up emails, document signing, data reconciliation). Second, mark each task with the apps it touches. Third, mark each task as either “lives entirely in a browser tab” or “crosses native apps”. The browser-only fraction is the upper bound for a browser-only agent. The full set is the upper bound for an accessibility-tree agent. Then haircut by 30 to 40 percent for supervision, edge cases, and mistakes. That is your honest first-year compression estimate.
Where does compression actually break down?
Three places. One, judgment-heavy tasks (deciding which lead to call, which contract clause to push back on, which candidate to hire). Two, tasks with bespoke per-customer policy that lives only in a human's head. Three, tasks where the underlying app changes its UI faster than the agent's grounding can keep up. The first two are not surface-area problems and no agent solves them. The third is a surface-area problem in disguise: it is why pixel-based grounding is a worse bet than the accessibility tree for anything you want to run unattended.
Does running locally change the compression math?
Yes, in two specific ways. First, marginal cost per task on a local agent is roughly the price of inference (and Fazm runs against your own API key or against local models), so the per-hour-compressed cost stops scaling with usage the way a per-task SaaS would. Second, the agent can see the screen and the apps you are already logged into without any data leaving your machine, which removes the SOC2 review that often blocks compression projects from starting at all. Local does not raise the ceiling on what is automatable; it lowers the floor on what is allowed.
Is this just RPA with a chat interface?
No. RPA records a fixed sequence of clicks against fixed selectors and breaks the moment a UI moves. An agent backed by a model plus the accessibility tree picks the action per state, replans on failure, and can recover when an OK button moves three pixels. The model is what makes this not RPA. The accessibility tree is what makes the model's actions actually land. RPA tools have neither side of that loop and have a much lower compression ceiling on real-world apps.
Related
AI as your digital employee on macOS
How desktop agents actually operate your apps, screenshot vs accessibility, what to automate first.
Accessibility API vs screenshots for desktop automation
Why pixel grounding caps your reliability and how AX-tree access changes the math.
Agentic AI vs RPA
What changes when an LLM picks the next action instead of replaying recorded clicks.