AI browser automation for social posting

The honest answer is: stop using a headless browser, and stop trying to fake being human.

Every popular tutorial on AI-driven social posting tells you to launch headless Chrome, wrap it in a stealth plugin, add mouse jitter, rotate residential proxies, and write a script that pretends to be a person. That stack is the exact fingerprint Reddit and X have been flagging for two years. The architecture that actually works is the one that nobody bothers to write up: drive your real Chrome from your own Mac, perceive the page through the accessibility tree, and stop trying to hide that automation is happening.

Matthew Diakonov · 9 min read

Direct answer (verified 2026-05-13)

Use a tool that drives your real, already-logged-in Chrome over the Chrome DevTools Protocol (via Playwright MCP's --extension flag, or the equivalent), reads pages through the accessibility tree instead of screenshots, and runs on the same machine your account already uses. Anti-bot detectors score the session by cookies, IP, TLS fingerprint, and behavior; a real local Chrome matches on all four. Fazm wires this for macOS in the open at github.com/mediar-ai/fazm; the relevant code lives in acp-bridge/src/index.ts around line 1735.

The fingerprint problem the common advice keeps dodging

Run any search for AI browser automation for social posting and you land on a small set of patterns. Some pages are SaaS schedulers like Hootsuite and SocialBee posting through official platform APIs, which works fine but is not browser automation at all. The rest are tutorials wrapping a headless Chromium build in a stealth plugin, adding a residential proxy pool, and selling the same advice: act more human. Slow the typing down. Move the mouse on a Bezier curve. Randomize the inter-action delay between 800 and 2200 ms. Use an aged account.

The reason that advice keeps appearing is that it used to work in 2020. It does not work in 2026 because the platforms now score the session, not the script. A Reddit anti-abuse system looks at the login cookie age, the IP reputation, the TLS handshake, the typing rhythm distribution, the resolution and timezone mismatch, the tab-focus history, and whether the keystroke events came from a synthetic source. Mouse jitter libraries all sample the same Bezier distribution; you can build a classifier on that alone. Stealth plugins patch a fixed set of navigator properties; the platforms run the same patches in their detectors. Residential proxy pools are mostly recycled; a fresh login from a pool IP that posted ten comments yesterday is the canonical bad signal.
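
To make that concrete, here is a sketch of the kind of check a detector can run in one expression. These signals are public knowledge (the CDP webdriver flag, the default headless user-agent string, the zero-size headless viewport), not any specific platform's code; real classifiers combine dozens of them, and they also test for the stealth patches themselves.

// Illustrative only: classic headless signals, not any platform's actual code.
// Stealth plugins patch these, and detectors then probe for the patches
// (e.g. a navigator.webdriver getter that throws instead of returning false).
const looksAutomated =
  navigator.webdriver === true ||                         // CDP automation flag
  /HeadlessChrome/.test(navigator.userAgent) ||           // default headless UA string
  (window.outerWidth === 0 && window.outerHeight === 0);  // headless windows often report 0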

The cheaper architecture is to remove those signals entirely instead of trying to spoof them. If you drive the user's actual Chrome with the user's actual cookies on the user's actual machine, there is nothing to spoof. The session has been alive for months. The IP has a history. The fingerprint is Chrome 132 stable on a real Mac because it is Chrome 132 stable on a real Mac. The DOM events come from the browser extension's privileged context, which fires real input events. The classifier has nothing to bite on.

691 characters: the size of one Playwright accessibility-tree snapshot after the bridge strips images. A Retina screenshot of the same page is ~500 KB of base64. (acp-bridge/src/index.ts, the --image-responses omit configuration.)

Why the accessibility tree beats screenshots for social platforms

Almost every AI-agent demo for browser automation pipes a screenshot of the viewport into a vision model every turn. That works for a one-off lookup, but it is the wrong abstraction for social posting for two reasons that compound.

The first is cost. A 1440-by-900 Retina screenshot of Reddit's reply composer is roughly 500 KB of base64 once you wrap it in an image content block for the model. If the agent takes three turns to draft and submit a reply, that is 1.5 MB of context spent on pixels. On a cloud model you pay for it in tokens. On a local model running in a 32K context window, three turns of screenshots leaves no room for the actual reasoning.
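
The arithmetic, using the two numbers this page measures (both are pinned in the callout above; they are measurements from one page, not universal constants):

// Back-of-envelope payload math. Assumes ~500 KB of base64 per Retina
// screenshot and 691 chars per accessibility snapshot, per the callout above.
const screenshotBytes = 500 * 1024;
const snapshotBytes = 691;
const turns = 3;

console.log(((screenshotBytes * turns) / 1024 / 1024).toFixed(2) + " MB of pixels");  // 1.46 MB
console.log(snapshotBytes * turns + " chars of text");                                // 2073 chars
console.log(Math.round(screenshotBytes / snapshotBytes) + "x heavier per turn");      // 741x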

The second is stability. Reddit and X iterate the post composer constantly. Pixel positions drift. Buttons get repainted. A vision model finds the new button most of the time but not always, and the edge cases are silent: the agent clicks the wrong control, posts partial content, deletes a draft, or, worst of all, posts to the wrong subreddit. The accessibility tree does not change at that rate because screen readers depend on it. The "Comment" submit button has had the same accessibility name and role for years. Selecting by ref=e7 off a Playwright snapshot survives every visual refactor the platform ships.
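
For a feel of what ref-based selection looks like, a hedged sketch: the snapshot line format matches what Playwright MCP emits, but the ref value and the callTool wrapper here are hypothetical.

// Sketch of ref-based selection. The snapshot is YAML-ish text, one line per
// accessible node, e.g.:
//
//   - button "Comment" [ref=e7]
//
// The agent clicks by ref, not by pixel coordinates, so the call survives a
// visual redesign. callTool is a hypothetical MCP-client helper.
type ToolCall = (name: string, args: Record<string, unknown>) => Promise<unknown>;

async function clickComment(callTool: ToolCall) {
  await callTool("browser_click", { element: "Comment submit button", ref: "e7" });
}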

The trade-off that nobody admits: accessibility-tree perception loses against screenshots when the target is a canvas-based or visual-only UI (some image editors, some games, some highly custom dashboards). Social platforms are the opposite end of the spectrum. They are designed to be screen-reader accessible because the platforms are legally required to be. The accessibility tree is a first-class DOM, not a fallback.

Headless plus stealth vs. real Chrome plus accessibility tree

Six dimensions where the two architectures diverge. Same task on both sides.

Session and cookies
  Headless stealth stack: Empty profile directory. Headless Chrome boots with no logins; you script the login, and the platform flags the new-device session immediately.
  Real-Chrome stack: Your real Chrome profile, already logged in to Reddit, X, GitHub. The agent acts inside the same session you'd use by hand.

Outbound IP
  Headless stealth stack: A datacenter IP if you run on a VPS, or your home IP behind residential proxy rotation. Both have known bad reputations in Reddit's risk model.
  Real-Chrome stack: Whatever IP your Chrome was already on. If it was fine for your manual posting, it stays fine for the agent.

Fingerprint (TLS, JA3, navigator)
  Headless stealth stack: Stealth plugins patch navigator.webdriver, override permissions, fake plugins, randomize canvas. The patches drift, and platforms run the same patches in their detectors.
  Real-Chrome stack: Real Chrome stable build, real fingerprint. Nothing to patch, because nothing is pretending to be Chrome; it is Chrome.

Perception channel
  Headless stealth stack: Screenshot of the viewport sent to a vision model. Heavy (about 500 KB per turn on a Retina Mac) and tied to pixel-exact layout, which platforms A/B test constantly.
  Real-Chrome stack: Accessibility-tree text snapshot from Playwright MCP. About 691 chars per turn, stable against visual refactors, small enough to run a local model.

Mouse and timing model
  Headless stealth stack: Bezier-curve mouse jitter, randomized inter-key delay. Every off-the-shelf jitter library produces the same distribution, which the platform's classifier has seen a million times.
  Real-Chrome stack: No simulated mouse. Tools call browser_click on an accessibility ref, and the browser fires a real event from the extension's privileged context. Timing comes from how fast the agent reasons.

Visibility to the user
  Headless stealth stack: Headless. The user has no idea what the agent is doing. Bugs land in the platform's logs before the user's.
  Real-Chrome stack: Visible overlay on every page. The status pill reads "Browser controlled by Fazm. Feel free to switch tabs or use other apps." You see the same browser the agent sees.

No automation strategy makes a banned account un-banned. This compares architectures, not policy compliance.

The actual code path, with line numbers

The plumbing for this on Fazm is small enough to fit on one screen. The bridge between the agent runtime and Playwright lives at acp-bridge/src/index.ts. When a chat session opens, the bridge builds an argv array for Playwright MCP and forks the process. The relevant block is around line 1735.

// /Users/matthewdi/fazm/acp-bridge/src/index.ts, around line 1735
// Run on every chat session that needs the browser.
// (these two imports sit at the top of the file)
import { existsSync } from "node:fs";
import { join } from "node:path";

const playwrightArgs = [playwrightCli];

if (process.env.PLAYWRIGHT_USE_EXTENSION === "true") {
  playwrightArgs.push("--extension");           // attach to user's Chrome over CDP
}

playwrightArgs.push(
  "--output-mode",      "file",                  // snapshots land on disk, not inline
  "--image-responses",  "omit",                  // strip base64 PNG from tool results
  "--output-dir",       "/tmp/playwright-mcp",
);

// Inject the visible overlay so every page the agent touches gets the
// "Browser controlled by Fazm" status pill.
const overlayInitPage = join(__dirname, "..", "browser-overlay-init-page.cjs");
if (existsSync(overlayInitPage)) {
  playwrightArgs.push("--init-page", overlayInitPage);
}

servers.push({ name: "playwright", command: process.execPath, args: playwrightArgs, env: playwrightEnv });

Three things matter in that block. The --extension flag tells Playwright MCP to attach to a running Chrome over the DevTools Protocol via the browser extension instead of launching a new Chromium binary. The pair --output-mode file and --image-responses omit push snapshots to disk and strip inline base64 PNGs from tool results so the model context does not bloat. The --init-page path injects a script on every page that paints the visible overlay.

The overlay itself is the part that surprises people. It is not marketing window-dressing. It is a real DOM element with the highest available z-index, pointer-events disabled, painted on top of every tab the agent navigates to.

// /Users/matthewdi/fazm/acp-bridge/browser-overlay-init.js
// Runs as an init script on every page Playwright navigates to.

(function injectFazmOverlay() {
  if (document.getElementById('fazm-overlay')) return;
  var overlay = document.createElement('div');
  overlay.id = 'fazm-overlay';
  overlay.innerHTML = ''
    + '<div id="fazm-canvas2">/* glowing wings, blobs, pointer-events:none */</div>'
    + '<div id="fazm-pill3">'
    +   '<div class="fazm-sp3"></div>'
    +   '<span>Browser controlled by Fazm \u00b7 Feel free to switch tabs or use other apps</span>'
    + '</div>';
  document.documentElement.appendChild(overlay);
})();

The status pill reads "Browser controlled by Fazm. Feel free to switch tabs or use other apps." The point is the opposite of stealth. The user can see at a glance that the agent has the wheel, can interrupt by clicking anywhere or pressing Escape, and can switch to another tab without breaking the run because the agent works in its own tab. The platform sees nothing the user does not see. There is no claim that this is a human, because there is no point in lying.

Two architectures, same task

A long-running script on a VPS launches headless Chromium, loads a stealth plugin, replays login cookies into an empty profile, rotates through residential proxies, simulates Bezier mouse movement, and types into the post field with randomized delays.

  • Empty profile means every session looks new
  • Stealth patches are public, and detectors run the same patches
  • Mouse jitter libraries have a tell because they all sample the same Bezier
  • Proxy IPs are scored, and residential pools are mostly recycled
  • User cannot see what the agent is doing

A local agent on your Mac attaches to your real, logged-in Chrome over the DevTools Protocol, reads each page as an accessibility-tree snapshot, and fires real input events from the extension's privileged context, with a visible overlay on every page.

  • Months-old session, real cookie age, real account history
  • Your home IP, with the history it already has
  • Real Chrome stable fingerprint, nothing to patch
  • No simulated mouse; timing set by the agent's reasoning and your approvals
  • You see every step and can interrupt at any time

The counterargument: when headless still makes sense

Honest framing requires admitting where the headless stack is the right choice. Three cases.

Scale across thousands of accounts. If the job is to operate a warehouse of separate identities (whether that should exist at all is a different conversation), tying every identity to one user's real local Chrome does not work. Each identity needs its own isolated profile and its own IP. At that point you are running a fleet, and a real-Chrome-on-Mac architecture is the wrong shape regardless of detection trade-offs.

CI and scheduled scrapes with no human present. A cron job that checks whether a public profile shipped a new post needs to run unattended. There is no real Chrome session to attach to at 3 AM, and no human to confirm a CAPTCHA if one fires. Headless plus an API-first scheduler beats a real-Chrome agent there.

Targets without accessibility trees. Some highly custom dashboards (interactive 3D viewers, canvas-based editors, games) do not expose a useful accessibility tree. A vision model with screenshots is the only abstraction that maps onto them. Social platforms are not in this group, but plenty of other automation targets are.

Everything else, especially the bulk of what users actually want ("reply to this thread for me", "post this draft to my X account", "comment on the three best Reddit posts in this subreddit this week"), runs better in the real-Chrome architecture. The detection headache evaporates because there is nothing to detect.

"The pill that says 'Browser controlled by Fazm' was the deciding factor for me. I have run enough scraping scripts in tmux to know that 'silent' is the same as 'broken in three weeks'. I want to see what the agent is doing."
— Anonymous fazm user, from internal user feedback

What this looks like in practice for a Reddit reply

Concrete example, the kind of thing a small business owner actually does. You have a Reddit thread you want to reply to. The thread is in a subreddit you participate in regularly from your real account and your real Mac.

You open Fazm. You say: "Read this Reddit thread and draft a reply that does X." The agent calls browser_navigate with the URL. Chrome, the same Chrome you have used all morning, loads the thread in a new tab. The overlay paints on top. The agent calls browser_snapshot; Playwright writes a YAML accessibility-tree snapshot to /tmp/playwright-mcp/ and returns a small text result with ref ids. The agent reads the thread off the snapshot text, drafts a reply, shows it to you, and waits.

You read the draft. You edit two words. You say "post it." The agent calls browser_click on the reply-composer ref, then browser_type with the final draft, then browser_click on the submit-comment ref. The events fire from Chrome's extension context, which is the same source as a user typing. Reddit sees your real session, your real cookie age, your real IP, your real fingerprint, your real account history, and a comment posted at human speed because the bottleneck was you reading and approving the draft.
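
The whole exchange compresses into five tool calls. A sketch, assuming the same hypothetical callTool helper as above; the tool names are Playwright MCP's, the refs are illustrative, and in the real flow the agent pauses between the snapshot and the clicks for your approval.

// The walkthrough above as tool calls. Refs e12/e19 are illustrative and
// come back from the snapshot at runtime.
type ToolCall = (name: string, args: Record<string, unknown>) => Promise<unknown>;

async function replyToThread(callTool: ToolCall, url: string, draft: string) {
  await callTool("browser_navigate", { url });    // your real Chrome opens a new tab
  await callTool("browser_snapshot", {});         // a11y YAML written to /tmp/playwright-mcp
  // ...agent reads the snapshot, drafts the reply, waits for your "post it"...
  await callTool("browser_click", { element: "Reply composer", ref: "e12" });
  await callTool("browser_type", { element: "Reply composer", ref: "e12", text: draft });
  await callTool("browser_click", { element: "Comment submit button", ref: "e19" });
}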

Nothing about that flow tries to look like a human. It is a human, one who happens to delegate the typing to an agent. The platform policy on whether AI-generated content is allowed is a separate question; the question this page answers is whether the delivery mechanism leaves a fingerprint, and the answer is no.

So what is the actual recommendation?

If your goal is one human's social-posting workflow on macOS, run the agent on the same Mac, attach it to your real Chrome over the DevTools Protocol, use accessibility-tree perception, and keep a visible overlay so you always know who has the wheel. You can do this with Fazm out of the box (the relevant flags are in acp-bridge/src/index.ts) or wire it yourself with the upstream Playwright MCP package using its --extension flag.
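
For the wire-it-yourself route, a minimal sketch of launching the upstream server: it assumes npx can resolve the @playwright/mcp package and reuses the flags shown in the bridge code above. An MCP client then speaks to the child process over stdio.

// Minimal sketch: spawn the upstream Playwright MCP server with the same
// flags the fazm bridge passes. Assumes npx resolves @playwright/mcp.
import { spawn } from "node:child_process";

const mcp = spawn("npx", [
  "@playwright/mcp@latest",
  "--extension",                          // attach to your running Chrome over CDP
  "--image-responses", "omit",            // keep base64 PNGs out of tool results
  "--output-dir", "/tmp/playwright-mcp",  // snapshots land here as files
], { stdio: ["pipe", "pipe", "inherit"] });

mcp.on("exit", (code) => console.error("playwright-mcp exited with code " + code));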

If your goal is to operate at fleet scale across many identities, this architecture is wrong for you and so is most of the advice in this category. Build for the fleet, accept the detection arms-race, and budget for accounts to die.

The honest middle ground is small and unglamorous: most people who want AI browser automation for social posting are one person replying to one inbox of threads from their one account. The plumbing for that case has been sitting in the Playwright MCP repo for over a year. Nobody writes about it because the spicy stack (headless plus stealth plus proxies) gets more affiliate clicks. That is the gap this page is here to fill.

Want to see this run on your own Reddit or X session?

15 minutes, your Mac, your real Chrome. I'll show you the --extension setup and we'll walk through a real reply end to end.

Frequently asked questions

Will I get banned posting to Reddit or X with this approach?

There is no automation strategy with a zero-ban guarantee. The honest framing: Reddit and X classify accounts by behavior, not by 'is this scripted'. A real account that posts at human cadence from its real browser and real IP looks like a real account. A new account that posts twenty replies in five minutes from a residential proxy with stealth Chrome patches gets flagged whether or not a human is at the keyboard. The setup described here removes the easy-to-classify signals; it does not remove the harder signal of behavior. Keep volume sane.

What does '--extension' actually do?

It tells Playwright MCP to attach to a running Chrome via the Chrome DevTools Protocol through a browser extension instead of launching a fresh Chromium binary. The agent gets a real tab inside your real browser. You see the page move. You see the form fill. You can grab the keyboard at any time. The agent uses real DOM events, not synthesized mouse movements, because they come from the extension's privileged context.

Why is the accessibility tree better than screenshots for this?

Two reasons. First, perception cost. A screenshot of a Retina viewport is about 500 KB of base64 going into the model's context window every turn. An accessibility-tree snapshot from Playwright MCP is around 691 chars for the same page. You can run a local model on the second number; you cannot on the first. Second, stability. Reddit and X A/B test pixel layouts weekly. They do not refactor the accessibility tree weekly, because screen readers depend on it. Code that selects by accessibility ref keeps working after a visual redesign that breaks every coordinate-based agent.

What stops the agent from posting something I did not authorize?

Three things on the Fazm side. First, every action is a discrete tool call you can read in the chat log. Second, the visible overlay shows you the browser is being driven; you can pull keyboard focus and stop it mid-step. Third, posting workflows live on the agent's side, not the platform's; if you do not tell the agent to post, it does not post. Compare to a headless-on-VPS architecture, where the script keeps running with no eyes on it.

Can the same setup do anything besides social?

Yes, and that is the actual reason it exists. The product is a general macOS computer-use agent. Browser automation for social posting is one workflow shape. Other shapes that use the same plumbing: filling out Stripe dashboard invoices, replying to Gmail threads, updating a HubSpot record, scraping a logged-in admin panel, navigating a GitHub PR review. Anything you can do in your real Chrome, the agent can do without a separate stealth profile.

Does this work on Windows or Linux?

Not today. Fazm is macOS 14 or newer only. The underlying technique (Playwright MCP with --extension over CDP) is cross-platform on paper, but the rest of the product, especially the accessibility-tree integration with non-browser apps, is Mac-specific. If you only need the browser-extension piece, the upstream Playwright MCP package supports it directly.

What about LLM-side ban risk like content moderation?

Separate problem. Posting AI-generated text that looks formulaic, repetitive, or off-tone gets flagged by humans and by platform classifiers regardless of how you delivered it. The browser-automation layer cannot fix bad copy. Read the actual thread before replying. Match the audience's register. If the AI's draft sounds like an AI, that is the bottleneck, not the browser.

How do you handle 2FA, CAPTCHAs, and 'are you human' challenges?

If you are already logged in to your real Chrome, you have already passed 2FA on this device; the session cookie carries that through. CAPTCHAs are designed for cases where the platform suspects automation, so they fire less often when the session, IP, and fingerprint all match a human user. When one does fire, the agent stops and the visible overlay tells you to solve it. You solve it. The agent resumes. This is the opposite of the headless workflow, where a CAPTCHA usually means the script silently fails.
