Perplexity Computer Browser Control: Setup, Permissions, and What You Actually Get
Perplexity's computer agent can take over your browser and execute tasks on your behalf. You type a natural language instruction, and the agent navigates, clicks, types, and scrolls through web pages without you touching the mouse. But what does "browser control" actually mean here? What permissions does Perplexity need, how does the control mechanism work, and what are the boundaries of what it can and cannot control?
This guide covers the specifics of how Perplexity controls your browser, how to set it up, what level of access you are granting, and where the control model breaks down.
How Perplexity Browser Control Works
Perplexity's computer agent uses a vision-based control model. Instead of injecting JavaScript into web pages (the way developer tools like Playwright or Puppeteer work), it takes screenshots of the browser viewport, feeds them to a multimodal AI model, and generates mouse/keyboard actions based on what it "sees."
The control loop runs like this:
- You give a natural language instruction ("Book the cheapest flight from SFO to JFK next Friday")
- The agent captures a screenshot of the current browser state
- A vision model analyzes the screenshot and decides the next action (click coordinates, text to type, scroll direction)
- The action is executed in the browser
- A new screenshot is captured, and the loop repeats until the task is complete or the agent gets stuck
This is fundamentally different from script-based browser automation. Perplexity does not read the DOM, does not have access to CSS selectors, and does not execute arbitrary JavaScript. It interacts with the browser the same way a human would: by looking at the screen and moving the cursor.
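The control loop described above can be sketched in a few lines. This is a minimal illustration, not Perplexity's implementation: `capture_screenshot`, `decide_next_action`, and `execute` are hypothetical stand-ins for the screen capture, the remote vision model, and the input injector, and the `Action` fields simply mirror the click coordinates, typed text, and scroll direction mentioned in the list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by a (hypothetical) vision model."""
    kind: str       # "click", "type", "scroll", or "done"
    x: int = 0      # screen coordinates for clicks
    y: int = 0
    text: str = ""  # text to type
    dy: int = 0     # scroll delta in pixels


def run_control_loop(
    instruction: str,
    capture_screenshot: Callable[[], bytes],
    decide_next_action: Callable[[str, bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 50,
) -> List[Action]:
    """Screenshot -> model -> action, repeated until 'done' or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()                     # 1. capture current state
        action = decide_next_action(instruction, screenshot)  # 2. model picks the next step
        history.append(action)
        if action.kind == "done":                             # task complete
            break
        execute(action)                                       # 3. act in the browser
    return history  # no "done" at the end means the agent got stuck


# Usage with fakes: a scripted "model" that clicks, types, then finishes.
scripted = iter([Action("click", x=450, y=320), Action("type", text="SFO"), Action("done")])
executed: List[Action] = []
history = run_control_loop(
    "Book the cheapest flight from SFO to JFK next Friday",
    capture_screenshot=lambda: b"fake-png-bytes",
    decide_next_action=lambda instr, shot: next(scripted),
    execute=executed.append,
)
print([a.kind for a in history])  # ['click', 'type', 'done']
```

Nothing in the loop touches the DOM: the only input is pixels, and the only output is a coordinate or keystroke. The `max_steps` cap is an assumed safeguard, shown here because a vision agent has no structural signal that a task is finished.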
Setting Up Perplexity Browser Control
Getting Perplexity's computer agent running requires a few steps:
1. Access the feature. Perplexity's computer agent (sometimes called "Comet") is available on Perplexity Pro plans. You will not find it on the free tier. Open Perplexity in your browser and look for the computer use toggle or the agent prompt interface.
2. Grant screen recording permission. When you first activate the agent, Perplexity asks for screen viewing access. On macOS, this means granting screen recording permission in System Settings > Privacy & Security > Screen Recording (labeled "Screen & System Audio Recording" on newer releases). On Windows, follow the equivalent permission prompt if one appears. Without this permission, the agent cannot capture screenshots and the control loop never starts.
3. Choose your browser. Perplexity currently works best with Chromium-based browsers (Chrome, Arc, Edge). Firefox support exists but some users report slower screenshot capture and occasional missed elements. Safari has limited support.
4. Start a task. Type your instruction in the Perplexity agent prompt. The agent will take over mouse and keyboard input within the browser window. You should avoid interacting with the browser while the agent is running, because your mouse movements will conflict with the agent's actions.
Warning: While the agent is running, do not move your mouse into the browser window or switch tabs. The agent navigates by screen coordinates, so an unexpected cursor position or tab switch will cause it to misclick or lose its place.
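Why a stray scroll or tab switch breaks a coordinate-driven agent is easy to show with a toy hit test. In this sketch the button labels, positions, and the 60-pixel scroll are invented for illustration; real browsers do their own hit testing. The agent decides where to click from one screenshot, and if the page moves before the click lands, the same coordinates resolve to a different element.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """An on-page element with a bounding box in page coordinates."""
    label: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.left + self.width and self.top <= y < self.top + self.height


def element_at(boxes: List[Box], x: int, y: int, scroll_y: int = 0) -> Optional[str]:
    """Return the label of the element under a viewport point (scroll_y maps viewport y to page y)."""
    page_y = y + scroll_y
    for box in boxes:
        if box.contains(x, page_y):
            return box.label
    return None


layout = [
    Box("Search flights", left=400, top=300, width=120, height=40),
    Box("Cancel", left=400, top=360, width=120, height=40),
]

# The agent planned its click from a screenshot taken at scroll_y = 0.
print(element_at(layout, 450, 320))               # Search flights
# A stray 60 px scroll happens before the click is executed.
print(element_at(layout, 450, 320, scroll_y=60))  # Cancel
```

A selector-based tool would still find "Search flights" after the scroll, because it addresses the element, not the pixel.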
What Level of Control Does Perplexity Actually Get?
This is the question most people skip past. When you enable Perplexity's computer agent, here is what it can and cannot access:
| Capability | Perplexity Browser Control | Script-Based Tools (Playwright) | Desktop Agents (Fazm) |
|---|---|---|---|
| Click web page elements | Yes (vision-based) | Yes (selector-based) | Yes (accessibility API) |
| Type into form fields | Yes | Yes | Yes, any app |
| Read page content | Screenshot only | Full DOM access | Screen + accessibility tree |
| Execute JavaScript | No | Yes | No (not needed) |
| Access cookies/storage | No | Yes | No |
| Control multiple tabs | Limited | Yes | Yes |
| Work outside the browser | No | No | Yes |
| Handle CAPTCHAs | Sometimes (vision) | No | Sometimes (vision) |
| Access browser DevTools | No | Yes | No |
| Control native apps | No | No | Yes |
The key takeaway: Perplexity's control is surface-level. It can see what is on screen and interact with visible elements, but it has no deep access to the browser's internals. It cannot read network requests, inspect the DOM structure, access localStorage, or manipulate cookies. For many everyday tasks (filling forms, navigating pages, extracting visible text) this is sufficient. For anything that requires programmatic access to page internals, it is not.
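The gap between "Screenshot only" and "Full DOM access" in the table is clearest with an element that is never drawn. The sketch below uses Python's standard `html.parser` on a made-up checkout form (the form, field names, and token value are invented). A DOM-level tool can read the hidden `csrf_token` input directly; a screenshot-driven agent would only ever see the email field and the "Pay now" button.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Invented example page: one hidden field, one visible field, one button.
PAGE = """
<form action="/checkout" method="post">
  <input type="hidden" name="csrf_token" value="a1b2c3">
  <label>Email <input type="email" name="email" value="me@example.com"></label>
  <button type="submit">Pay now</button>
</form>
"""


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("type") == "hidden":
            self.hidden[attr.get("name") or ""] = attr.get("value") or ""


collector = HiddenInputCollector()
collector.feed(PAGE)
print(collector.hidden)  # {'csrf_token': 'a1b2c3'}
```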
Permissions and Security Implications
Granting screen recording access to Perplexity means the agent can see everything visible in your browser window. This includes:
- Passwords as you type them (if the field is not masked)
- Banking information displayed on screen
- Private messages in open tabs
- Authentication tokens visible in URL bars or cookie panels
Perplexity states that screenshots are processed by their AI models for action decisions and are not stored long-term. However, you are still sending screenshots of your browser to Perplexity's servers during every control loop iteration. If you are logged into sensitive accounts while the agent runs, those screens are being transmitted.
Tip: Before running the agent on sensitive tasks, close any tabs with banking, medical, or other private information. Use a dedicated browser profile for agent tasks if possible.
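If you drive the agent from Chrome on macOS, one way to get a dedicated profile is to launch a separate Chrome instance pointed at its own data directory with the standard Chromium `--user-data-dir` switch. This is a sketch under that assumption: the directory name `agent-chrome-profile` is arbitrary, and Chrome's built-in profile manager achieves the same isolation through the UI. The helper only builds the command so you can inspect it before running it.

```python
import shlex
from pathlib import Path
from typing import List


def agent_profile_command(profile_dir: Path) -> List[str]:
    """Build a macOS command that opens a new Chrome instance with an isolated profile.

    A fresh --user-data-dir starts with no cookies, saved passwords, or open tabs
    from your everyday profile. The directory name is an arbitrary example.
    """
    return [
        "open", "-na", "Google Chrome",
        "--args", f"--user-data-dir={profile_dir}",
    ]


cmd = agent_profile_command(Path.home() / "agent-chrome-profile")
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Log into only the accounts the task needs inside this profile, so a screenshot of it cannot expose the rest of your browsing.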
Local vs. Cloud Control
One important distinction: Perplexity's browser control is cloud-dependent. Every screenshot goes to Perplexity's servers, the vision model runs remotely, and action decisions come back over the network. This means:
- Latency: each action in the control loop takes 2-5 seconds depending on network speed and server load
- Privacy: your screen content leaves your machine on every iteration
- Offline: the agent does not work without an internet connection
- Rate limits: heavy usage may hit API throttling during peak times
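The per-action latency compounds quickly, because every step is a full round trip. The arithmetic below uses the 2 to 5 seconds per action quoted above; the 20-step task and the 1.5 MB screenshot size are illustrative assumptions, not measured Perplexity figures.

```python
from typing import Tuple


def upload_seconds(screenshot_bytes: int, uplink_mbps: float) -> float:
    """Time to send one screenshot over the uplink (1 Mbps = 1,000,000 bits/s)."""
    return screenshot_bytes * 8 / (uplink_mbps * 1_000_000)


def task_duration_range(steps: int, per_step_low: float, per_step_high: float) -> Tuple[float, float]:
    """Best- and worst-case wall time when every step is one screenshot-action round trip."""
    return steps * per_step_low, steps * per_step_high


# A hypothetical 20-step task at 2 to 5 s per action:
print(task_duration_range(20, 2.0, 5.0))  # (40.0, 100.0)

# Upload time alone for a hypothetical 1.5 MB screenshot on a 10 Mbps uplink:
print(upload_seconds(1_500_000, 10.0))    # 1.2
```

Upload is only part of each cycle; model inference and the response still follow, which is why slow connections push individual actions well past the low end of the range.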
Desktop agents like Fazm take a different approach. The AI model runs locally on your Mac, screenshots never leave your machine, and the control loop runs at native speed without network round-trips. The tradeoff is that local models are smaller and may be less capable on complex reasoning tasks, but the privacy and latency benefits are significant.
Common Tasks and How Well Control Works
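Perplexity's browser control handles structured, predictable web tasks well. Based on the capabilities described above, here is where it succeeds and where it struggles:
Works reliably
- Filling in forms
- Navigating between pages
- Extracting visible text
Struggles or fails
- Tasks that span multiple tabs (support is limited)
- CAPTCHAs (solved only sometimes)
- Anything that needs page internals: the DOM, cookies, localStorage, or network requests
- Any step that leaves the browser, such as native apps or the terminal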
Common Pitfalls
- Leaving sensitive tabs open. The agent screenshots your entire browser viewport. If your banking tab is visible, that screenshot goes to Perplexity's servers. Always close sensitive tabs before starting an agent task.
- Interfering mid-task. Moving your mouse or clicking while the agent runs causes coordinate mismatches. For example, the agent planned to click at (450, 320), but your mouse movement shifted focus, so the click lands in the wrong place. The task fails and you have to restart.
- Expecting cross-app control. Perplexity's agent controls the browser only. If your task requires copying data from a browser into Excel, sending it via Slack, or running a terminal command, the agent stops at the browser boundary. You need a desktop agent for workflows that span multiple applications.
- Running on slow connections. Each screenshot-action cycle requires a round trip to Perplexity's servers. On connections slower than ~10 Mbps, the latency per action can exceed 5 seconds, making even simple tasks painfully slow.
- Trusting the agent with transactions. The agent can click "Buy" or "Submit" buttons. If you give it a task like "buy the cheapest flight," it may actually complete a purchase. Always add guardrails like "show me the options but do not click purchase" for financial transactions.
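Prompt wording is one guardrail; if you build or wrap your own agent loop, a second one is a confirmation gate in code. The sketch below is hypothetical and is not a Perplexity setting: `PlannedClick`, the label list, and `guarded_execute` are invented names. It blocks any planned click whose visible label looks irreversible unless a human approves it. The substring match is deliberately naive and would need tuning for real pages.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical labels that should never be clicked without a human "yes".
IRREVERSIBLE_LABELS = ("buy", "purchase", "pay", "submit", "place order", "delete")


@dataclass
class PlannedClick:
    target_label: str  # visible text of the element the agent intends to click
    x: int
    y: int


def is_irreversible(click: PlannedClick, labels: Iterable[str] = IRREVERSIBLE_LABELS) -> bool:
    text = click.target_label.strip().lower()
    return any(word in text for word in labels)


def guarded_execute(
    click: PlannedClick,
    execute: Callable[[PlannedClick], None],
    confirm: Callable[[PlannedClick], bool],
) -> bool:
    """Run the click unless it looks irreversible and the human declines. Returns True if executed."""
    if is_irreversible(click) and not confirm(click):
        return False
    execute(click)
    return True


def always_decline(click: PlannedClick) -> bool:
    return False


done: List[PlannedClick] = []
print(guarded_execute(PlannedClick("Next page", 300, 500), done.append, always_decline))  # True
print(guarded_execute(PlannedClick("Buy now", 450, 320), done.append, always_decline))    # False
print([c.target_label for c in done])  # ['Next page']
```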
Browser Control vs. Desktop Control
The browser is one application. Your computer has dozens. The real question is whether browser-only control is sufficient for what you actually need to automate.
If your workflow lives entirely in web apps (Gmail, Google Docs, Notion, Jira), Perplexity's browser control covers most of it. The moment your workflow touches a native application, a file system operation, or a terminal command, browser control hits a wall.
Desktop agents work at the operating system level. On macOS, tools like Fazm use the accessibility API to read and interact with any application, not just the browser. The agent can switch between Chrome, Terminal, Finder, Slack, and Xcode in the same task because it controls the whole desktop, not a single application window.
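The macOS accessibility API exposes each UI element as an `AXUIElement` whose attributes (such as `kAXRoleAttribute`, `kAXTitleAttribute`, and `kAXChildrenAttribute`) are read with `AXUIElementCopyAttributeValue`. The sketch below does not call that API and is not Fazm's code; it models the resulting tree with a plain dataclass to show the key difference from screenshots: an element is found by role and title, wherever it happens to be drawn. The Slack window structure is invented.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class AXNode:
    """Toy stand-in for an accessibility element (role, title, children)."""
    role: str
    title: str = ""
    children: List["AXNode"] = field(default_factory=list)


def walk(node: AXNode) -> Iterator[AXNode]:
    """Depth-first traversal of the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: AXNode, role: str, title: str) -> Optional[AXNode]:
    """Locate an element by semantic role and title rather than pixel position."""
    return next((n for n in walk(root) if n.role == role and n.title == title), None)


# Invented tree for illustration; real apps expose far deeper hierarchies.
slack = AXNode("AXApplication", "Slack", [
    AXNode("AXWindow", "general", [
        AXNode("AXTextArea", "Message #general"),
        AXNode("AXButton", "Send"),
    ]),
])

button = find(slack, "AXButton", "Send")
print(button)  # AXNode(role='AXButton', title='Send', children=[])
```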
Quick Setup Checklist
Here is the minimum you need to get Perplexity browser control working:
- Subscribe to Perplexity Pro (the computer agent is not available on free plans)
- Open Perplexity in a Chromium-based browser
- Grant screen recording permission when prompted (macOS: System Settings > Privacy & Security > Screen Recording)
- Close any tabs with sensitive information
- Type your task in natural language and let the agent run
- Do not touch the mouse or keyboard until the agent indicates it is done
- Review the result before the agent takes any irreversible action (purchases, deletions, form submissions)
Wrapping Up
Perplexity's computer browser control is a practical tool for automating repetitive web tasks. The vision-based approach means it works on any website without requiring custom selectors or scripts. The tradeoffs are real, though: every screenshot leaves your machine, latency adds up, and the agent stops at the browser boundary. For tasks that stay inside the browser, it works well. For workflows that span your whole desktop, you need something that controls the entire operating system, not just one application.
Fazm is an open source macOS AI agent that controls your entire desktop, not just the browser. The source is available on GitHub.