Multi-Agent Coworking: Experiments with AI Agents Collaborating on Shared Codebases
A LinkedIn post about Claude Cowork sparked a question that everyone working with AI agents eventually hits: can multiple agents productively share the same workspace? We ran extensive experiments to find out. Here is what works, what fails, and what the coordination patterns look like in practice.
1. The Multi-Agent Coordination Problem
Running one AI agent on a codebase is straightforward. Running two introduces an entirely new class of problems. Running five starts to feel like managing a team of junior developers who cannot talk to each other and have no awareness of what their colleagues are doing.
The core tension is between parallelism and consistency. You want agents to work simultaneously for throughput, but simultaneous work on shared files creates conflicts, broken builds, and wasted compute. Every multi-agent system has to make a trade-off somewhere on this spectrum.
The fundamental question: Is the overhead of coordinating multiple agents worth the throughput gain? The answer depends entirely on how independent the tasks are. For fully independent tasks, multi-agent is a clear win. For coupled tasks, a single agent with full context usually wins.
2. Experiment Setup: How We Tested This
To understand multi-agent collaboration, we ran a series of controlled experiments with 2-5 Claude Code agents working on the same repository simultaneously. The experiments covered:
- Independent features - Different agents working on completely separate features in non-overlapping files
- Related features - Agents building features that share some common files (shared types, shared utilities)
- Same-file edits - Two agents needing to modify the same file for different reasons
- Sequential dependencies - Agent B needs Agent A's output before it can proceed
- Shared resource contention - Multiple agents trying to run tests, build, or deploy simultaneously
We measured success rate (did the task complete correctly), conflict rate (how often agents interfered with each other), total time (wall clock from start to all tasks done), and wasted compute (tokens spent on work that had to be redone due to conflicts).
The results were clear but nuanced. Independent tasks saw a near-linear speedup with more agents. Same-file edits failed about 60% of the time without coordination. Related features fell somewhere in between, depending on the degree of file overlap.
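These metrics fall out of simple per-task logs. A sketch of the bookkeeping, with a hypothetical record shape (this is illustrative, not the actual experiment harness):

```python
from dataclasses import dataclass

# Hypothetical per-task log record; the field names are illustrative.
@dataclass
class TaskResult:
    agent: str
    succeeded: bool    # task completed correctly
    conflicted: bool   # interfered with another agent's work

def summarize(results: list[TaskResult], baseline_s: float, wall_s: float) -> dict:
    """Success rate, conflict rate, and wall-clock speedup over a
    single-agent baseline run of the same task set."""
    n = len(results)
    return {
        "success_rate": sum(r.succeeded for r in results) / n,
        "conflict_rate": sum(r.conflicted for r in results) / n,
        "speedup": baseline_s / wall_s,
    }
```

Wasted compute needs token accounting as well, which is covered separately in the recommendations below.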
3. Types of Conflicts Between Agents
Not all conflicts are equal. Understanding the taxonomy helps you choose the right coordination strategy:
| Conflict Type | Description | Severity | Mitigation |
|---|---|---|---|
| File write collision | Two agents modify the same file simultaneously | High | File-level locking or worktrees |
| Semantic conflict | Agent A changes an interface that Agent B depends on | High | Dependency analysis before task assignment |
| Build contention | Multiple agents trying to build or test at the same time | Medium | Build queues or separate build directories |
| Stale read | Agent reads a file that another agent is about to change | Medium | Refresh context before critical writes |
| Style inconsistency | Agents adopt different patterns for similar problems | Low | Shared CLAUDE.md with explicit patterns |
File write collisions are the most visible but not the most damaging. Semantic conflicts are worse because they pass initial checks but create bugs that surface later. The best coordination systems address both.
4. Coordination Patterns That Work
After testing various approaches, three coordination patterns emerged as reliable:
- Git worktree isolation - Each agent gets its own worktree (effectively its own copy of the repo). They commit to separate branches and merge when done. This eliminates file conflicts entirely at the cost of harder merges. Best for independent features.
- File-level territory assignment - Before agents start, assign each agent a set of files they are allowed to modify. This is simpler than worktrees but requires upfront planning. The CLAUDE.md can encode these boundaries. Best for related features with clear file boundaries.
- Retry-on-conflict - Let agents work freely but detect conflicts (via build failures or file change detection) and have the conflicting agent retry. This is the simplest to implement but wastes compute on the retries. Best for low-conflict workloads.
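The worktree pattern is mostly a fixed sequence of git commands per agent. A minimal sketch that builds that sequence (the repo path, branch, and directory naming are illustrative conventions, not a fixed standard):

```python
def worktree_commands(repo: str, agent: str, base: str = "main") -> list[str]:
    """Git commands that give one agent an isolated worktree on its
    own branch, then integrate and clean up when the agent is done."""
    branch = f"agent/{agent}"
    worktree = f"{repo}-{agent}"
    return [
        f"git -C {repo} worktree add -b {branch} {worktree} {base}",
        # ... the agent edits and commits inside `worktree` ...
        f"git -C {repo} merge --no-ff {branch}",      # integrate when done
        f"git -C {repo} worktree remove {worktree}",  # clean up
    ]
```

The setup and cleanup steps are safe to automate; the merge step is where human review belongs.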
The pattern we see used most in practice is a hybrid: worktrees for code changes combined with a shared test/build environment. Agents work in isolation but validate against the integrated codebase.
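File-level territory assignment can also be enforced mechanically: before accepting an agent's commit, check that every changed file falls inside its assigned set. A sketch, with a hypothetical territory map:

```python
import fnmatch

# Hypothetical territory map: agent name -> glob patterns it may modify.
TERRITORIES = {
    "agent-api": ["src/api/*", "src/shared/types.py"],
    "agent-ui":  ["src/ui/*"],
}

def violations(agent: str, changed_files: list[str]) -> list[str]:
    """Return the changed files that fall outside the agent's territory."""
    allowed = TERRITORIES.get(agent, [])
    return [
        f for f in changed_files
        if not any(fnmatch.fnmatch(f, pat) for pat in allowed)
    ]
```

A non-empty result means the agent strayed out of bounds and its commit should be rejected before it reaches the shared branch.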
What does not work: giving agents the ability to send messages to each other. Every attempt at inter-agent messaging resulted in more token waste and slower completion than simply giving one agent all the context. Agent-to-agent communication is a research problem, not a production pattern.
5. Tools and Approaches in the Wild
Several tools and frameworks are tackling multi-agent collaboration from different angles:
- Claude Code with tmux - The simplest approach. Run multiple Claude Code instances in different tmux panes, each working in its own git worktree. Manual orchestration, but transparent and debuggable.
- Claude Cowork - Anthropic's approach to multi-agent coordination with built-in awareness of other running agents. Still early but shows the direction of native multi-agent support.
- Desktop AI agents - Tools like Fazm approach multi-agent work from the OS level. Instead of running multiple code agents, they use a single agent that controls the entire desktop - browser, terminal, apps - which sidesteps the coordination problem for tasks that span multiple applications.
- Custom orchestration - Teams building their own coordination layer, typically a script that assigns tasks, manages worktrees, and merges results. More setup work but maximum control.
- Agent frameworks (CrewAI, AutoGen, etc.) - General purpose multi-agent frameworks. Good for research and prototyping, but the agent-to-agent communication overhead is significant for production code tasks.
The trend is toward simpler isolation-based patterns rather than sophisticated communication protocols. The teams getting the best results are the ones that make agent independence the priority, not inter-agent messaging.
6. Practical Recommendations
Based on all experiments, here is the pragmatic advice for teams wanting to run multiple AI agents:
- Start with two agents, not five. Understand the conflict patterns at small scale before scaling up.
- Use git worktrees. The upfront setup cost is worth the elimination of file-level conflicts.
- Invest in your CLAUDE.md. It is the single most effective coordination mechanism because it gives all agents the same constraints without runtime overhead.
- Split tasks along module boundaries. If your repo has clear package/directory boundaries, assign agents to different packages.
- Do not try agent-to-agent messaging. Give each agent complete context upfront instead.
- Measure wasted compute. Track how often agents redo work due to conflicts. If it exceeds 20%, your task decomposition needs work.
- Have a human review merge points. Agents are good at independent execution but unreliable at resolving semantic conflicts between branches.
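The 20% wasted-compute threshold is simple to track if you log tokens per attempt along with whether each attempt's work was kept. A sketch of hypothetical logging:

```python
def wasted_ratio(attempts: list[tuple[int, bool]]) -> float:
    """attempts: (tokens_spent, kept) pairs, where kept=False means the
    work was thrown away due to a conflict and had to be redone."""
    total = sum(tokens for tokens, _ in attempts)
    wasted = sum(tokens for tokens, kept in attempts if not kept)
    return wasted / total if total else 0.0

def decomposition_needs_work(attempts: list[tuple[int, bool]],
                             threshold: float = 0.20) -> bool:
    """Flag runs whose wasted-compute ratio exceeds the threshold."""
    return wasted_ratio(attempts) > threshold
```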
Multi-agent coworking is not magic. It is parallelism with coordination costs, just like multi-threaded programming. The developers who succeed with it are the ones who minimize shared mutable state - the same principle that makes concurrent software reliable.
One agent that handles your whole desktop
Fazm is an open-source macOS AI agent that controls browser, terminal, and apps from a single agent - no multi-agent coordination needed for cross-application workflows.