Agentic Coding Guide

Claude Code Hooks, MCP Servers, and Multi-Agent Orchestration: The Complete Setup Guide

Claude Code has evolved well beyond a simple AI coding assistant. With the hooks system, Model Context Protocol (MCP) servers, and native multi-agent orchestration, it is now a platform for building agentic coding workflows that run complex, multi-step operations with minimal human intervention. But the documentation for these features is scattered across release notes, GitHub discussions, and community posts. This guide brings it together into a single practical reference for setting up hooks, building custom MCP servers, and orchestrating parallel subagents in production workflows.

1. The Hooks System: Lifecycle Events and Guardrails

Hooks are the most underused feature in Claude Code. They let you run custom code at specific points in Claude Code's lifecycle: before a prompt is sent, after a response is received, before a tool is executed, and after a tool completes. This turns Claude Code from a black box into a programmable system where you can inject validation, logging, and guardrails at every step.

The hook system is configured in your project's settings.json file (typically at .claude/settings.json). Each hook specifies a lifecycle event, a match pattern (which tools or commands it applies to), and a command to execute. The command runs as a subprocess and receives context about the current operation via stdin as JSON.
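As an illustration, a settings.json fragment wiring a PreToolUse hook to a script might look like the following. The field names follow the hooks documentation at the time of writing, and the script path is a hypothetical example - check your Claude Code version for the current schema:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/check_bash.py"
          }
        ]
      }
    ]
  }
}
```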

The practical applications fall into three categories. Safety guardrails prevent destructive operations - a PreToolUse hook can block git push --force, prevent deletion of production configuration files, or require confirmation before commands that modify system state. Quality gates enforce standards - a PostToolUse hook can run a linter on generated code as soon as it is written to disk, and a Stop hook can verify that generated tests actually pass. Logging and audit hooks capture a complete record of what Claude Code did, which is essential for debugging complex workflows and for compliance in enterprise environments.

| Hook Event | When It Fires | Common Use Cases |
| --- | --- | --- |
| PreToolUse | Before any tool (Bash, Read, Write, etc.) executes | Block dangerous commands, validate file paths, enforce permissions |
| PostToolUse | After a tool completes successfully | Run linters, format code, log file changes |
| Notification | When Claude Code wants to notify the user | Send Slack messages, desktop notifications, webhook alerts |
| Stop | When the agent completes a turn | Run test suites, commit changes, trigger CI |

Key insight: The PreToolUse hook can return a JSON response that blocks the tool execution entirely. This means you can build a policy layer that prevents Claude Code from performing specific actions in specific contexts. For example, block all file writes outside the src/ directory, or block Bash commands that contain "rm -rf" with a path outside the project root.
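A minimal blocking hook in Python might look like the sketch below. The payload field names (`tool_name`, `tool_input`) and the `{"decision": "block"}` response shape follow the hooks documentation at the time of writing, but verify them against your Claude Code version; the command patterns checked are deliberately simplistic.

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: blocks force pushes and rm -rf outside the
project. Claude Code runs the hook as a subprocess and pipes the
tool-call context to stdin as JSON."""
import json
import sys
from typing import Optional


def decide(payload: dict) -> Optional[dict]:
    """Return a blocking decision, or None to let the tool call proceed.
    These are substring checks for illustration - a real policy layer
    should parse the command rather than substring-match it."""
    if payload.get("tool_name") != "Bash":
        return None
    command = payload.get("tool_input", {}).get("command", "")
    if "push --force" in command or "push -f" in command:
        return {"decision": "block", "reason": "Force pushes are not allowed."}
    if "rm -rf" in command and not command.startswith("rm -rf ./"):
        return {"decision": "block",
                "reason": "rm -rf is only allowed on paths inside the project root."}
    return None


def main() -> None:
    # Claude Code reads the JSON decision from stdout; no output means allow.
    decision = decide(json.load(sys.stdin))
    if decision is not None:
        print(json.dumps(decision))
```

Call `main()` under `if __name__ == "__main__":` in the real script, and test it standalone by piping sample JSON to stdin - the easiest way to debug a hook before wiring it into settings.json.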

Hooks execute synchronously, which means they add latency to every operation. Keep hook scripts fast - under 100 milliseconds is ideal. Anything that requires network calls or heavy computation should be spawned as a background process from a PostToolUse hook so it does not block the next operation.

2. MCP Server Architecture: How It Works

The Model Context Protocol is an open standard that defines how AI models communicate with external tools and data sources. Instead of hardcoding tool integrations into the model or the client, MCP provides a standardized interface that any tool can implement. Claude Code acts as an MCP client and connects to one or more MCP servers, each of which exposes a set of tools that Claude Code can invoke.

The architecture is deliberately simple. An MCP server is a process that speaks JSON-RPC over stdio or HTTP. When Claude Code starts, it launches the configured MCP servers as subprocesses. The servers declare their available tools (name, description, input schema) and Claude Code includes these tool definitions in its context. When the model decides to use an MCP tool, Claude Code sends the tool call to the appropriate server and returns the result to the model.
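To make the wire format concrete, here is roughly what the two key exchanges look like, shown as Python dicts. This is simplified - the `query_database` tool is hypothetical, and real MCP sessions begin with initialization messages and carry additional metadata fields:

```python
# A server's response to a tools/list request: the tool declarations
# (name, description, input schema) that Claude Code puts in context.
tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "query_database",
                "description": "Run a read-only SQL query and return rows as JSON.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"sql": {"type": "string"}},
                    "required": ["sql"],
                },
            }
        ]
    },
}

# When the model decides to use the tool, the client sends a tools/call
# request to the server that declared it.
tools_call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "query_database", "arguments": {"sql": "SELECT 1"}},
}
```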

This architecture has several advantages over traditional plugin systems. MCP servers run in their own process, so a buggy server cannot crash Claude Code. Servers can be written in any language - Python, TypeScript, Rust, Go - as long as they implement the protocol. Servers can maintain their own state between calls, which is critical for tools that need persistent connections (databases, browsers, long-running processes). And because the protocol is standardized, MCP servers are portable across different AI clients, not just Claude Code.

The MCP ecosystem already includes hundreds of community-built servers for databases, file systems, browsers, APIs, and more. But the real power comes from building custom servers tailored to your specific workflow needs.

3. Building Custom MCP Servers

Building a custom MCP server is straightforward with the official SDKs. The TypeScript SDK (modelcontextprotocol/typescript-sdk) and Python SDK (modelcontextprotocol/python-sdk) handle the protocol plumbing so you can focus on implementing your tools.

A minimal MCP server has three parts: tool definitions that describe what the server can do, handler functions that implement the actual logic, and a transport layer that connects to Claude Code. The tool definitions are JSON schemas that describe each tool's name, purpose, and input parameters. The handler functions receive validated input and return structured results. The transport layer is usually stdio for local servers.
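The sketch below shows those three parts as a toy stdio server in pure Python, without the SDK. The real SDKs take care of initialization, capability negotiation, and error handling that this omits, and the `read_config` tool with its demo data is invented for illustration:

```python
#!/usr/bin/env python3
"""Toy MCP-style stdio server: tool definitions, handlers, transport."""
import json
import sys

# 1. Tool definitions: JSON schemas the client exposes to the model.
TOOLS = [
    {
        "name": "read_config",
        "description": "Return the value of a named configuration key "
                       "from the in-memory demo store.",
        "inputSchema": {
            "type": "object",
            "properties": {"key": {"type": "string"}},
            "required": ["key"],
        },
    }
]

CONFIG = {"env": "staging", "region": "us-east-1"}  # demo data


# 2. Handler functions: implement the actual logic for each tool.
def read_config(args: dict) -> str:
    return CONFIG.get(args["key"], "")


HANDLERS = {"read_config": read_config}


def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC request to the matching handler."""
    if request["method"] == "tools/list":
        result = {"tools": TOOLS}
    elif request["method"] == "tools/call":
        params = request["params"]
        text = HANDLERS[params["name"]](params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        result = {}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


# 3. Transport: newline-delimited JSON-RPC over stdio.
def serve() -> None:
    for line in sys.stdin:
        print(json.dumps(handle(json.loads(line))), flush=True)
```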

The most useful custom MCP servers fall into a few categories. Data access servers connect to internal databases, APIs, or knowledge bases that the model needs to query during its work. Action servers perform operations in external systems - creating tickets, sending notifications, deploying code, or updating dashboards. Integration servers bridge the gap between the AI agent and systems that were not designed for programmatic access.

| Server Type | Example Tools | Complexity |
| --- | --- | --- |
| Data access | query_database, search_docs, read_config | Low |
| Action | create_ticket, send_slack, deploy_service | Medium |
| Integration | browser_navigate, desktop_click, app_control | High |
| Composite | multi-step workflows exposed as single tools | High |

One pattern that works exceptionally well is building MCP servers that wrap existing CLI tools or system capabilities. For example, Fazm operates as a macOS MCP server that exposes desktop automation capabilities through the accessibility API. This gives Claude Code the ability to see and interact with any application on the screen - clicking buttons, reading text, filling forms, navigating menus - all through standard MCP tool calls. The AI agent does not need to know anything about AppleScript or accessibility APIs; it just calls tools like "click element" or "read screen text" and the MCP server handles the OS-level integration.

When building your own MCP servers, invest time in writing clear tool descriptions. The model uses these descriptions to decide when and how to use each tool. A vague description like "manages database operations" will lead to incorrect tool usage. A specific description like "executes a read-only SQL query against the production PostgreSQL database and returns results as a JSON array" gives the model enough context to use the tool correctly.
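A related habit: enforce in the handler what the description promises, so the tool stays safe even if the model misreads it. A sketch, with `run_query` as a hypothetical tool and the database call stubbed out:

```python
# Hypothetical tool definition with a specific, actionable description.
RUN_QUERY_TOOL = {
    "name": "run_query",
    "description": (
        "Executes a read-only SQL query against the production PostgreSQL "
        "database and returns results as a JSON array. Only SELECT "
        "statements are accepted."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "sql": {"type": "string", "description": "A single SELECT statement."}
        },
        "required": ["sql"],
    },
}


def run_query(args: dict) -> list:
    """Handler that backs the description with a hard guard. The actual
    database call is replaced by a canned result for illustration."""
    sql = args["sql"].strip()
    if not sql.lower().startswith("select"):
        raise ValueError("run_query is read-only; only SELECT statements are accepted")
    return [{"ok": True}]  # stand-in for real query rows
```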

4. Multi-Agent Orchestration: Parallel Subagents

Claude Code supports spawning subagents that run in parallel, each working on an independent task. The orchestrating agent decomposes a complex problem into subtasks, dispatches them to subagents, and synthesizes the results. In practice, running 5 parallel subagents on a well-decomposed problem can reduce completion time by 3x to 4x compared to sequential processing.

The key to effective multi-agent orchestration is task decomposition. Subagents work best on tasks that are independent - they do not need to read each other's output, they do not modify the same files, and they do not depend on shared state. Good decomposition examples include implementing different API endpoints, writing tests for separate modules, refactoring independent components, or researching different aspects of a problem.

Bad decomposition creates more problems than it solves. If two subagents need to edit the same file, they will produce conflicting changes. If one subagent depends on another's output, it will either wait (losing the parallelism benefit) or proceed with stale assumptions (producing wrong results). The orchestrating agent needs to be explicit about boundaries: which files each subagent owns, what interfaces they must conform to, and what they should not touch.

  • Task isolation - each subagent gets a clear scope. "Implement the user authentication endpoint in src/api/auth.ts. Do not modify any other files." Explicit boundaries prevent conflicts.
  • Interface contracts - when subagents produce components that need to work together, define the interfaces upfront. The orchestrating agent specifies the function signatures, data types, and API contracts before dispatching subtasks.
  • Result synthesis - after subagents complete, the orchestrating agent reviews all results together. This is where integration issues surface. The orchestrator may need to resolve naming conflicts, reconcile different approaches, or run integration tests.
  • Error handling - if a subagent fails, the orchestrator decides whether to retry, reassign, or skip that subtask. Having a clear policy for failure modes prevents the entire workflow from stalling on a single stuck subagent.
  • Context management - each subagent has its own context window. The orchestrator should provide only the context each subagent needs, not dump the entire project context into every subtask. Leaner context means faster execution and more room for the subagent to work.
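The principles above can be sketched as a small orchestration loop. `run_subagent` is a stub standing in for whatever dispatch mechanism you use (Claude Code's subagent dispatch, or separate headless processes); the point is the control flow - ownership checks, parallel dispatch, retry, and skip:

```python
"""Sketch of an orchestration loop with task isolation and failure policy."""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class Subtask:
    name: str
    prompt: str
    owned_files: List[str]  # the only files this subagent may modify


def run_subagent(task: Subtask) -> str:
    # Stub: replace with your real dispatch mechanism.
    return f"done: {task.name}"


def orchestrate(tasks: List[Subtask], max_workers: int = 3, retries: int = 1) -> dict:
    # Reject overlapping file ownership before dispatching anything.
    seen: set = set()
    for task in tasks:
        overlap = seen & set(task.owned_files)
        if overlap:
            raise ValueError(f"files claimed by two subtasks: {overlap}")
        seen |= set(task.owned_files)

    results: dict = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_subagent, t): t for t in tasks}
        for future, task in futures.items():
            for attempt in range(retries + 1):
                try:
                    results[task.name] = future.result()
                    break
                except Exception:
                    if attempt < retries:
                        future = pool.submit(run_subagent, task)  # retry policy
                    else:
                        results[task.name] = None  # skip policy: mark and move on
    return results
```

The synthesis step (reviewing all results together, running integration tests) would follow the call to `orchestrate`.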

Practical limit: Most developers report that 3 to 5 parallel subagents is the sweet spot. Beyond 5, the orchestration overhead increases faster than the throughput gains. The orchestrating agent spends more time managing subagents than it saves from parallelism. Start with 3 subagents and increase only if your tasks are truly independent and well-scoped.

5. Real-World Workflow Examples

Theory is useful, but seeing how these pieces fit together in real workflows makes the concepts concrete. Here are three patterns that production teams use with hooks, MCP, and multi-agent orchestration.

Full-stack feature implementation. The orchestrating agent receives a feature specification. It spawns three subagents in parallel: one implements the backend API endpoint, one builds the frontend component, and one writes the tests. A PreToolUse hook ensures no subagent modifies files outside its designated directory. A PostToolUse hook runs the linter after every file write. When all three subagents complete, the orchestrator runs the full test suite via a Stop hook, and if tests pass, it creates a pull request using a GitHub MCP server.

Codebase migration. Migrating a large codebase from one framework to another is ideal for parallel subagents because each file can be migrated independently. The orchestrator scans the codebase, groups files by module, and dispatches one subagent per module. Each subagent has access to a custom MCP server that provides migration rules and examples specific to the framework being migrated from and to. A PostToolUse hook runs the type checker after each file is written, catching type errors immediately rather than at the end.

Research and implementation. For tasks that require both investigation and coding, a two-phase approach works well. Phase one dispatches subagents to research different aspects of the problem in parallel - one reads documentation, one analyzes existing code patterns, one searches for relevant examples using a web search MCP server. The orchestrator synthesizes the research into an implementation plan. Phase two dispatches implementation subagents based on the plan. A desktop automation MCP server can extend this further by allowing subagents to interact with design tools, test in browsers, or verify behavior in applications that do not have programmatic APIs.

In each of these examples, the hooks provide safety and quality enforcement, the MCP servers extend the agent's capabilities beyond the filesystem and shell, and the multi-agent orchestration turns a sequential workflow into a parallel one. The three systems are complementary, and the real power emerges when you use all three together.

6. Performance Tips and Common Pitfalls

After helping teams adopt these patterns, certain mistakes come up repeatedly. Avoiding them will save you hours of debugging.

  • Keep MCP servers stateless when possible. Stateful servers (those that maintain connections, caches, or session data) introduce failure modes that are hard to debug. If a server crashes and restarts, all state is lost. Design servers so that each tool call is self-contained. If state is necessary (like a browser session), implement health checks and automatic recovery.
  • Do not overload the tool namespace. Every MCP tool you add increases the model's context and decision surface. A server with 50 tools confuses the model more than it helps. Group related operations into fewer tools with well-designed parameters rather than creating a separate tool for every operation. Five well-designed tools outperform 30 narrow ones.
  • Test hooks in isolation. Hooks run as subprocesses, and debugging a subprocess that fails inside Claude Code's execution loop is painful. Write your hook scripts so they can be tested standalone with sample JSON input piped to stdin. Get them working perfectly outside Claude Code before integrating.
  • Monitor subagent token usage. Each parallel subagent consumes tokens independently. Five subagents running simultaneously means five times the token throughput. On plans with rate limits, this can trigger throttling faster than you expect. Monitor usage and adjust parallelism accordingly.
  • Use CLAUDE.md files for subagent context. Instead of passing lengthy instructions in each subagent prompt, put shared context in CLAUDE.md files at the project or directory level. Subagents read these automatically, which keeps prompts short and ensures consistent context across all agents working on the project.

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Too many MCP tools | Model picks wrong tools or hallucinates tool names | Consolidate into fewer, well-described tools |
| Slow PreToolUse hooks | Noticeable lag between every operation | Keep hooks under 100ms, move heavy logic to PostToolUse |
| Overlapping subagent scope | File conflicts, overwritten changes, test failures | Define explicit file ownership per subagent |
| Stateful MCP server crashes | Tool calls fail after server restarts | Design for stateless operation or add recovery logic |
| No subagent error handling | Entire workflow stalls on one failed subtask | Implement retry and skip policies in orchestrator |

The most impactful performance optimization is usually not technical. It is improving the quality of your prompts and tool descriptions. A subagent that understands its task clearly from the first prompt completes in one pass. A subagent that misunderstands and needs correction takes three to five passes. Spending 10 minutes writing a better task description saves 30 minutes of wasted computation.

7. Putting It All Together

The progression for adopting these features should be incremental. Start with hooks because they are the simplest to implement and provide immediate value. Add a PreToolUse hook that prevents destructive git operations and a PostToolUse hook that runs your linter after file writes. These two hooks alone will prevent a class of mistakes that every team encounters when using AI coding agents.

Next, add MCP servers for the external systems your workflow touches most frequently. If you spend time switching between Claude Code and a browser to check documentation, add a documentation search MCP server. If you frequently need to check deployment status, add a server that queries your CI/CD system. Each MCP server you add reduces the friction between the AI agent and the information or capabilities it needs.

Multi-agent orchestration should come last, after you have confidence in your hooks and MCP setup. Parallel subagents amplify both your productivity and any problems in your configuration. If your hooks are not catching destructive operations reliably, five parallel agents can make five times the mess. If your MCP servers are flaky, five agents hitting them simultaneously will surface every race condition.

The teams getting the most from these features share a common trait: they treat their Claude Code configuration as a codebase. The hooks, MCP server configs, CLAUDE.md files, and prompt templates are all version controlled, reviewed in PRs, and iterated on just like application code. This means improvements compound. When one developer figures out a better hook pattern or writes a useful MCP server, the entire team benefits immediately.

The agentic coding paradigm is still early, and the tooling is evolving rapidly. But the patterns described here - lifecycle hooks for safety and quality, MCP servers for extensibility, and parallel subagents for throughput - are stable architectural patterns that will remain relevant regardless of which specific tools dominate. Investing in understanding these patterns now puts you ahead of the curve as the ecosystem matures.

Give your AI agents desktop control

Fazm is an open-source macOS MCP server that gives AI agents the ability to see and interact with any desktop application via accessibility APIs. Use it with Claude Code to extend your agentic workflows beyond the terminal.

View Fazm on GitHub

fazm.ai - macOS MCP server for desktop AI agent control