MCP Server Composability: Stacking AI Capabilities Without Custom Code
Someone recently built a Perplexity-style MCP server using nothing but Playwright for browser automation - no API key, no custom scraping logic, just one MCP server calling through another. That project captures something important about where AI tooling is heading. Instead of building monolithic integrations, you stack small, focused MCP servers that each do one thing well. The agent discovers them dynamically and combines them as needed. This article breaks down how MCP composability works, why it matters, and how to build practical stacking patterns for your own workflows.
1. The Composability Problem in AI Tooling
Before MCP, every AI integration was a custom job. Want your agent to browse the web? Write a Playwright wrapper with custom tool definitions. Need database access? Build another wrapper. Want to send a WhatsApp message? Write yet another integration, handle authentication, parse responses, and wire it all into your agent's tool schema.
The result was fragile, tightly coupled tooling. Every team built their own version of the same integrations. A browser tool built for one agent framework would not work with another. Database connectors were duplicated across projects. When an API changed, every custom wrapper needed updating independently. The integration layer consumed more engineering time than the actual agent logic.
This is a composability problem. The individual pieces - browser automation, file access, messaging, search - all existed. But there was no standard way to combine them. Each integration was an island, and connecting islands required custom bridges every time.
MCP - the Model Context Protocol - solves this by defining a universal interface between agents and tools. Each tool is an MCP server that exposes its capabilities through a standard schema. The agent discovers available servers, reads their capability descriptions, and calls them through a consistent protocol. No custom wiring. No framework-specific adapters. Just a shared contract that any agent and any tool can speak.
2. How MCP Servers Compose
The core insight behind MCP composability is that each server adds capabilities independently, and the agent discovers them dynamically at runtime. You do not pre-wire connections between servers. You do not build a dependency graph. You just add servers to your configuration, and the agent gains new abilities.
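As a rough illustration of "adding servers to your configuration", the sketch below builds a client config in Python and prints it as JSON. The `mcpServers` key, the `npx` launch commands, and the package names are assumptions based on common client conventions, and `add_server` is a hypothetical helper; check your own MCP client's documentation for the exact file and format.

```python
import json

# Assumed config shape: many MCP clients read a JSON file with an
# "mcpServers" map, but the key and file location vary by client.
config = {
    "mcpServers": {
        "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]},
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"],
        },
    }
}


def add_server(cfg: dict, name: str, command: str, args: list) -> dict:
    """Return a new config with one more server; existing entries are untouched."""
    servers = dict(cfg["mcpServers"])
    servers[name] = {"command": command, "args": list(args)}
    return {**cfg, "mcpServers": servers}


# Adding a capability is one more entry, with no wiring between servers.
config = add_server(config, "memory", "npx", ["-y", "@modelcontextprotocol/server-memory"])
print(json.dumps(config, indent=2))
```

Nothing in the new entry refers to the other servers, which is the point: each one is declared on its own.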
Here is how the composition works in practice. When an MCP client (the agent) starts up, it connects to every configured MCP server and asks each one to list its tools. Each server responds with a set of tool definitions - names, descriptions, parameter schemas. The agent aggregates all of these into its available tool set. When the agent decides it needs to use a tool, it calls the appropriate server through the standard protocol.
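The discovery step described above can be sketched with plain Python objects. This is a toy model, not the MCP SDK: `ToyServer`, `ToyAgent`, and the tool names are hypothetical, and a real client would talk to each server over the protocol rather than call methods directly.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyServer:
    """Stand-in for an MCP server: a name plus a set of callable tools."""
    name: str
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def list_tools(self) -> list[str]:
        return sorted(self.tools)

    def call_tool(self, tool: str, **arguments) -> str:
        return self.tools[tool](**arguments)


class ToyAgent:
    """Asks every server for its tools at startup and keeps one flat routing table."""

    def __init__(self, servers: list[ToyServer]):
        self.routes: dict[str, ToyServer] = {}
        for server in servers:
            for tool in server.list_tools():
                # On a name collision the later server wins in this sketch;
                # a real client would need a deliberate policy here.
                self.routes[tool] = server

    def call(self, tool: str, **arguments) -> str:
        return self.routes[tool].call_tool(tool, **arguments)


browser = ToyServer("browser", {"browser_navigate": lambda url: f"opened {url}"})
files = ToyServer("filesystem", {"read_file": lambda path: f"contents of {path}"})
agent = ToyAgent([browser, files])

print(sorted(agent.routes))                      # ['browser_navigate', 'read_file']
print(agent.call("read_file", path="notes.txt")) # contents of notes.txt
```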
The powerful part is what happens when servers stack. Consider the Perplexity MCP server example. Instead of calling the Perplexity API directly, it uses Playwright to automate a browser, navigate to Perplexity, type a query, and extract the results. Composition can happen at two levels here. The agent can load both servers and use them independently: Playwright for general browser automation and the Perplexity server for search. Or the Perplexity server can drive Playwright internally as an implementation detail, in which case the agent never needs to know that Playwright is involved.
This creates a layered capability model. Low-level servers provide primitive operations - click a button, read a file, execute a query. Higher-level servers combine these primitives into domain-specific workflows - search the web, send a message, deploy an application. The agent does not need to understand the internal composition. It just sees a flat list of available tools and picks the right one for the task at hand.
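A minimal sketch of that layering, assuming a stubbed browser: `ToyBrowser`, `ToySearchServer`, the URL, and the selectors are all hypothetical placeholders, and no real browser or search service is contacted.

```python
class ToyBrowser:
    """Low-level primitives; a stand-in for a browser automation server."""

    def __init__(self):
        self.log = []  # records every primitive call for inspection

    def navigate(self, url: str) -> None:
        self.log.append(("navigate", url))

    def type_text(self, selector: str, text: str) -> None:
        self.log.append(("type", selector, text))

    def read_text(self, selector: str) -> str:
        self.log.append(("read", selector))
        return "stub answer"  # a real browser would return page content


class ToySearchServer:
    """Higher-level capability built entirely from browser primitives."""

    def __init__(self, browser: ToyBrowser):
        self.browser = browser

    def search(self, query: str) -> str:
        # Placeholder URL and selectors, not a real site layout.
        self.browser.navigate("https://search.example")
        self.browser.type_text("#query", query)
        return self.browser.read_text("#answer")


browser = ToyBrowser()
search_server = ToySearchServer(browser)
result = search_server.search("what is MCP?")
print(result)
print(len(browser.log), "primitive calls behind one high-level tool")
```

The caller sees a single `search` operation; the three primitive calls underneath stay an internal detail of the higher-level server.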
3. Real MCP Server Examples
The MCP ecosystem has grown rapidly. Here are some of the most practical servers available today, organized by the capability they add to your agent:
| Capability | MCP Server | How It Works | API Key Required |
|---|---|---|---|
| Browser automation | Playwright MCP | Controls Chromium via Playwright, navigates pages, fills forms, extracts content | No |
| Desktop control | Fazm / macOS-use | Uses macOS accessibility APIs to click, type, and read native app interfaces | No |
| Messaging | WhatsApp MCP | Controls the native WhatsApp app to send and read messages via accessibility | No |
| Web search | Perplexity MCP (via Playwright) | Automates Perplexity in a browser, no API key needed | No |
| Database access | PostgreSQL / SQLite MCP | Connects to databases, runs queries, inspects schemas | Connection string |
| File system | Filesystem MCP | Sandboxed file read/write/search within specified directories | No |
| Memory / persistence | Memory MCP | Stores and retrieves knowledge across sessions using a local knowledge graph | No |
Notice the pattern. Every server in this table except the database connectors works without an API key, and even those need only a connection string. They work by automating interfaces that already exist - browsers, native apps, local databases, the file system. This is a fundamental shift from the API-first model of traditional integrations. Instead of negotiating API access, paying for tokens, and managing rate limits, you automate the same interfaces a human would use.
4. Building Your Own MCP Server
You should build a custom MCP server when your workflow requires a tool that does not exist yet, or when existing servers do not handle your specific use case well enough. The threshold is lower than you might expect. A minimal MCP server can fit in under 100 lines of code.
The typical structure is straightforward. You define a set of tools - each with a name, description, and input schema. You implement a handler for each tool that does the actual work. You run the server over stdio or HTTP, and your MCP client connects to it like any other server. The official MCP SDKs (TypeScript and Python among them) handle the protocol details.
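To make the "tools plus handlers" structure concrete, here is a stdlib-only sketch of the request handling at the core of such a server. It assumes MCP's JSON-RPC 2.0 framing with the `tools/list` and `tools/call` methods, and deliberately leaves out the initialization handshake and the transport; the `word_count` tool and the `handle` function are hypothetical. In practice you would let an SDK do this part.

```python
import json

# One tool: name -> description, input schema, and the handler that does the work.
TOOLS = {
    "word_count": {
        "description": "Count the words in a piece of text.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "handler": lambda args: str(len(args["text"].split())),
    }
}


def error(request: dict, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": request.get("id"),
            "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC request to the matching tool operation."""
    method, params = request["method"], request.get("params", {})
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return error(request, -32602, f"Unknown tool: {params.get('name')}")
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return error(request, -32601, f"Method not found: {method}")
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


reply = handle({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "word_count", "arguments": {"text": "stack small servers"}},
})
print(json.dumps(reply))
```

A stdio server would wrap `handle` in a loop that reads one JSON message per line from standard input and writes each reply to standard output.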
When to build your own:
- Internal tools - Your company has internal dashboards, admin panels, or proprietary systems that no public MCP server covers.
- Workflow-specific composition - You need a higher-level server that combines multiple lower-level tools into a single operation (like the Perplexity example).
- Security boundaries - You want to expose a subset of a system's capabilities with specific access controls that generic servers do not provide.
- Performance optimization - A generic server makes too many round trips. A custom server can batch operations or cache results for your specific access patterns.
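As one hedged example of the performance case, a custom server can sit in front of a slower one and cache repeated calls. `CachingProxy`, `slow_lookup`, and the `get_customer` tool are hypothetical names, and the upstream call is faked with a local function.

```python
import json


class CachingProxy:
    """Wraps an upstream tool-call function and memoizes identical requests."""

    def __init__(self, upstream_call):
        self.upstream_call = upstream_call
        self.cache = {}
        self.upstream_hits = 0  # how many calls actually reached the upstream

    def call_tool(self, name: str, arguments: dict):
        # Serialize arguments with sorted keys so equivalent dicts share a cache key.
        key = (name, json.dumps(arguments, sort_keys=True))
        if key not in self.cache:
            self.upstream_hits += 1
            self.cache[key] = self.upstream_call(name, arguments)
        return self.cache[key]


def slow_lookup(name: str, arguments: dict) -> str:
    """Stand-in for an expensive round trip to another server."""
    return f"{name}:{arguments['id']}"


proxy = CachingProxy(slow_lookup)
first = proxy.call_tool("get_customer", {"id": 7})
second = proxy.call_tool("get_customer", {"id": 7})
print(first, second, proxy.upstream_hits)
```

Caching like this is only appropriate for read-only tools whose results do not change between calls; anything that writes or sends should always reach the upstream server.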
When not to build your own: if an existing MCP server does 80% of what you need, use it. The marginal value of a custom server rarely justifies the maintenance burden unless you are hitting a hard limitation. Check the MCP server registries and community repos before writing custom code.
5. The Accessibility API Approach for Native Desktop MCP Servers
One of the most interesting patterns in MCP server development is using accessibility APIs to control native desktop applications. Instead of relying on application-specific APIs or reverse engineering protocols, you interact with apps through the same interface that screen readers and assistive technologies use.
On macOS, the Accessibility API exposes the UI tree of running applications - buttons, text fields, menus, labels, scroll views - once the user grants accessibility permission. An MCP server can traverse this tree, find specific elements by role or label, click them, type into them, and read their current values. This works with most native macOS applications without the app needing to ship any integration of its own, although the quality of the tree depends on how well the app supports accessibility.
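The traversal itself is an ordinary tree search. The sketch below runs against an in-memory tree rather than the real macOS API: `Element`, `find`, and the simplified role names are assumptions for illustration (actual accessibility roles use identifiers such as `AXButton`).

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Simplified accessibility node: a role, an optional label, and children."""
    role: str
    label: str = ""
    children: list["Element"] = field(default_factory=list)


def find(root: Element, role=None, label=None):
    """Yield matching elements in depth-first, top-to-bottom order."""
    stack = [root]
    while stack:
        node = stack.pop()
        if (role is None or node.role == role) and (label is None or node.label == label):
            yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))


window = Element("window", "Chat", [
    Element("group", children=[
        Element("text_field", "Message"),
        Element("button", "Send"),
    ]),
])

send = next(find(window, role="button", label="Send"))
print(send.role, send.label)
print([e.role for e in find(window)])
```

A real server would then perform an action on the matched element, such as a press or a value change, through the platform API.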
Fazm's macOS MCP server uses exactly this approach. Rather than building custom integrations for each application, it exposes general-purpose tools for reading the accessibility tree, clicking elements, typing text, and pressing keyboard shortcuts. The agent decides which elements to interact with based on their accessibility labels and roles. This means the same MCP server can control Slack, Finder, Terminal, Xcode, or any other native application.
The accessibility approach has clear advantages. It is application-agnostic - you do not need a separate MCP server for each app. It does not require API keys or authentication beyond macOS accessibility permissions. It works with apps that have no API at all. And it mirrors how a human would interact with the application, which makes it naturally aligned with how agents reason about UI tasks.
The trade-off is reliability. Accessibility trees can change between app versions. Some apps have poor accessibility support. And element-level interactions are slower than direct API calls. For high-throughput data operations, a database MCP server will always be faster than clicking through a GUI. But for workflows that involve native desktop apps - moving files, configuring settings, interacting with chat apps - the accessibility approach is often the only viable path.
6. API-Based Integrations vs. MCP Composability
The traditional approach to AI agent tooling is API-first: get an API key, write a wrapper, handle authentication, parse responses. MCP does not replace APIs - many MCP servers use APIs internally. But MCP changes the integration model fundamentally. Here is how the two approaches compare:
| Dimension | API-Based Integrations | MCP Composability |
|---|---|---|
| Setup time | Hours to days per integration (auth, wrappers, testing) | Minutes (add server to config, restart) |
| Maintenance | Each wrapper must be updated when APIs change | Server maintainer handles updates, agent auto-discovers changes |
| Flexibility | Limited to what the API exposes | Can automate any interface (APIs, UIs, CLIs, file systems) |
| Authentication | API keys, OAuth tokens, per-service credentials | Often none (accessibility, browser automation) or connection-level auth |
| Composability | Manual - you write the glue code between services | Automatic - agent combines tools at runtime based on the task |
| Portability | Framework-specific, often not reusable | Any MCP client can use any MCP server |
| Cost | Per-call API pricing adds up | Many servers are free (local automation, open source) |
The key advantage of MCP is not any single dimension - it is the compound effect. When setup is fast, maintenance is externalized, and composability is automatic, you can add capabilities to your agent much faster than with custom integrations. A team that would spend a week building five custom API wrappers can configure five MCP servers in an afternoon.
That said, APIs still win for high-performance, high-reliability use cases. If you need to process thousands of database queries per minute, a direct PostgreSQL connection will outperform any MCP layer. If you need guaranteed delivery for critical messages, a direct API integration with retries and acknowledgments is more robust. MCP excels at breadth of capability and speed of integration. APIs excel at depth and reliability for individual services.
7. Practical Stacking Patterns
The real power of MCP composability shows up when you stack servers into purpose-built capability sets. Here are the most useful patterns we have seen in production agent setups:
The developer stack: coding + terminal + browser + search. This is the most common pattern. A coding agent gets filesystem access, terminal execution, browser automation (for testing and documentation), and web search (for looking up APIs and error messages). Claude Code with MCP servers is the canonical example. The agent can write code, run it, check the result in a browser, and search for solutions when something fails - all without leaving the conversation.
The desktop automation stack: native apps + browser + messaging. This pattern targets workflows that span multiple applications. An agent with Fazm's macOS MCP server for native app control, Playwright for browser automation, and a WhatsApp or Slack MCP server for messaging can handle complex multi-app workflows. Move data from a spreadsheet to a web form, take a screenshot, and send it to a colleague on WhatsApp - all as a single agent task.
The research stack: search + browser + memory + file system. For deep research tasks, you stack a search MCP server (Perplexity-style or direct web search), browser automation for following links and reading full pages, a memory server for storing findings across sessions, and filesystem access for writing reports. The agent can search, read, synthesize, and persist knowledge without the user managing any of the intermediate steps.
The data pipeline stack: database + file system + APIs + notifications. For data-heavy workflows, combine a database MCP server with filesystem access, domain-specific API servers, and a messaging server for alerts. The agent can query production data, write analysis scripts, run them, and notify the team about results - a complete data pipeline orchestrated through natural language.
The important principle across all these patterns is minimal scoping. Do not give every agent every server. Give each agent only the servers it needs for its specific task. This reduces token usage (fewer tool descriptions in the context), reduces the risk of unintended actions, and makes the agent's behavior more predictable. A research agent does not need database access. A data pipeline agent does not need WhatsApp. Scope your stacks tightly.
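One way to enforce minimal scoping is to declare each agent's allowed servers up front and resolve them at startup. The server names, the `AGENT_SCOPES` table, and `servers_for` below are hypothetical and only loosely mirror the stacks described above.

```python
# Every server that is installed and could be loaded.
ALL_SERVERS = {"search", "browser", "memory", "filesystem", "database", "messaging"}

# Each agent is given only what its task needs.
AGENT_SCOPES = {
    "research": {"search", "browser", "memory", "filesystem"},
    "data_pipeline": {"database", "filesystem", "messaging"},
}


def servers_for(agent: str) -> list[str]:
    """Return the sorted server list for an agent, rejecting unknown names."""
    scope = AGENT_SCOPES[agent]
    unknown = scope - ALL_SERVERS
    if unknown:
        raise ValueError(f"{agent} references unknown servers: {sorted(unknown)}")
    return sorted(scope)


print(servers_for("research"))
print(servers_for("data_pipeline"))
```

Failing fast on an unknown server name catches configuration typos before an agent starts with a silently incomplete stack.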
The MCP server ecosystem has grown to the point where most common capabilities already have community-maintained servers. The bottleneck is no longer building tools - it is choosing the right combination for your workflow and configuring them well. The teams that move fastest will be the ones that treat MCP servers like building blocks: small, composable, and swapped out as better options emerge.