Multi-Agent Coworking: Experiments with AI Agents Collaborating on Shared Codebases
A LinkedIn post about Claude Cowork sparked a question that everyone working with AI agents eventually hits: can multiple agents productively share the same workspace? We ran extensive experiments to find out. Here is what works, what fails, and what the coordination patterns look like in practice.
1. The Multi-Agent Coordination Problem
Running one AI agent on a codebase is straightforward. Running two introduces an entirely new class of problems. Running five starts to feel like managing a team of junior developers who cannot talk to each other and have no awareness of what their colleagues are doing.
The core tension is between parallelism and consistency. You want agents to work simultaneously for throughput, but simultaneous work on shared files creates conflicts, broken builds, and wasted compute. Every multi-agent system has to make a trade-off somewhere on this spectrum.
The fundamental question: Is the overhead of coordinating multiple agents worth the throughput gain? The answer depends entirely on how independent the tasks are. For fully independent tasks, multi-agent is a clear win. For coupled tasks, a single agent with full context usually wins.
2. Experiment Setup: How We Tested This
To understand multi-agent collaboration, we ran a series of controlled experiments with 2-5 Claude Code agents working on the same repository simultaneously. The experiments covered:
- Independent features - Different agents working on completely separate features in non-overlapping files
- Related features - Agents building features that share some common files (shared types, shared utilities)
- Same-file edits - Two agents needing to modify the same file for different reasons
- Sequential dependencies - Agent B needs Agent A's output before it can proceed
- Shared resource contention - Multiple agents trying to run tests, build, or deploy simultaneously
We measured success rate (did the task complete correctly), conflict rate (how often agents interfered with each other), total time (wall clock from start to all tasks done), and wasted compute (tokens spent on work that had to be redone due to conflicts).
The results were clear but nuanced. Independent tasks saw a near-linear speedup with more agents. Same-file edits failed about 60% of the time without coordination. Related features fell somewhere in between, depending on the degree of file overlap.
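These metrics fall out of simple per-task logs. A sketch of the bookkeeping, with a hypothetical record shape (this is illustrative, not the actual experiment harness):

```python
from dataclasses import dataclass

# Hypothetical per-task log record; the field names are illustrative.
@dataclass
class TaskResult:
    agent: str
    succeeded: bool    # task completed correctly
    conflicted: bool   # interfered with another agent's work

def summarize(results: list[TaskResult], baseline_s: float, wall_s: float) -> dict:
    """Success rate, conflict rate, and wall-clock speedup over a
    single-agent baseline run of the same task set."""
    n = len(results)
    return {
        "success_rate": sum(r.succeeded for r in results) / n,
        "conflict_rate": sum(r.conflicted for r in results) / n,
        "speedup": baseline_s / wall_s,
    }
```

Wasted compute needs token accounting as well, which is covered separately in the recommendations below.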
3. Types of Conflicts Between Agents
Not all conflicts are equal. Understanding the taxonomy helps you choose the right coordination strategy:
| Conflict Type | Description | Severity | Mitigation |
|---|---|---|---|
| File write collision | Two agents modify the same file simultaneously | High | File-level locking or worktrees |
| Semantic conflict | Agent A changes an interface that Agent B depends on | High | Dependency analysis before task assignment |
| Build contention | Multiple agents trying to build or test at the same time | Medium | Build queues or separate build directories |
| Stale read | Agent reads a file that another agent is about to change | Medium | Refresh context before critical writes |
| Style inconsistency | Agents adopt different patterns for similar problems | Low | Shared CLAUDE.md with explicit patterns |
File write collisions are the most visible but not the most damaging. Semantic conflicts are worse because they pass initial checks but create bugs that surface later. The best coordination systems address both.
4. Coordination Patterns That Work
After testing various approaches, three coordination patterns emerged as reliable:
- Git worktree isolation - Each agent gets its own worktree (effectively its own copy of the repo). They commit to separate branches and merge when done. This eliminates file conflicts entirely at the cost of harder merges. Best for independent features.
- File-level territory assignment - Before agents start, assign each agent a set of files they are allowed to modify. This is simpler than worktrees but requires upfront planning. The CLAUDE.md can encode these boundaries. Best for related features with clear file boundaries.
- Retry-on-conflict - Let agents work freely but detect conflicts (via build failures or file change detection) and have the conflicting agent retry. This is the simplest to implement but wastes compute on the retries. Best for low-conflict workloads.
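The worktree pattern is mostly a fixed sequence of git commands per agent. A minimal sketch that builds that sequence (the repo path, branch, and directory naming are illustrative conventions, not a fixed standard):

```python
def worktree_commands(repo: str, agent: str, base: str = "main") -> list[str]:
    """Git commands that give one agent an isolated worktree on its
    own branch, then integrate and clean up when the agent is done."""
    branch = f"agent/{agent}"
    worktree = f"{repo}-{agent}"
    return [
        f"git -C {repo} worktree add -b {branch} {worktree} {base}",
        # ... the agent edits and commits inside `worktree` ...
        f"git -C {repo} merge --no-ff {branch}",      # integrate when done
        f"git -C {repo} worktree remove {worktree}",  # clean up
    ]
```

The setup and cleanup steps are safe to automate; the merge step is where human review belongs.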
The pattern we see used most in practice is a hybrid: worktrees for code changes combined with a shared test/build environment. Agents work in isolation but validate against the integrated codebase.
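File-level territory assignment can also be enforced mechanically: before accepting an agent's commit, check that every changed file falls inside its assigned set. A sketch, with a hypothetical territory map:

```python
import fnmatch

# Hypothetical territory map: agent name -> glob patterns it may modify.
TERRITORIES = {
    "agent-api": ["src/api/*", "src/shared/types.py"],
    "agent-ui":  ["src/ui/*"],
}

def violations(agent: str, changed_files: list[str]) -> list[str]:
    """Return the changed files that fall outside the agent's territory."""
    allowed = TERRITORIES.get(agent, [])
    return [
        f for f in changed_files
        if not any(fnmatch.fnmatch(f, pat) for pat in allowed)
    ]
```

A non-empty result means the agent strayed out of bounds and its commit should be rejected before it reaches the shared branch.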
What does not work: giving agents the ability to send messages to each other. Every attempt at inter-agent messaging resulted in more token waste and slower completion than simply giving one agent all the context. Agent-to-agent communication is a research problem, not a production pattern.
5. Tools and Approaches in the Wild
Several tools and frameworks are tackling multi-agent collaboration from different angles:
- Claude Code with tmux - The simplest approach. Run multiple Claude Code instances in different tmux panes, each working in its own git worktree. Manual orchestration, but transparent and debuggable.
- Claude Cowork - Anthropic's approach to multi-agent coordination with built-in awareness of other running agents. Still early but shows the direction of native multi-agent support.
- Desktop AI agents - Tools like Fazm approach multi-agent work from the OS level. Instead of running multiple code agents, they use a single agent that controls the entire desktop - browser, terminal, apps - which sidesteps the coordination problem for tasks that span multiple applications.
- Custom orchestration - Teams building their own coordination layer, typically a script that assigns tasks, manages worktrees, and merges results. More setup work but maximum control.
- Agent frameworks (CrewAI, AutoGen, etc.) - General purpose multi-agent frameworks. Good for research and prototyping, but the agent-to-agent communication overhead is significant for production code tasks.
The trend is toward simpler isolation-based patterns rather than sophisticated communication protocols. The teams getting the best results are the ones that make agent independence the priority, not inter-agent messaging.
6. Practical Recommendations
Based on all experiments, here is the pragmatic advice for teams wanting to run multiple AI agents:
- Start with two agents, not five. Understand the conflict patterns at small scale before scaling up.
- Use git worktrees. The upfront setup cost is worth the elimination of file-level conflicts.
- Invest in your CLAUDE.md. It is the single most effective coordination mechanism because it gives all agents the same constraints without runtime overhead.
- Split tasks along module boundaries. If your repo has clear package/directory boundaries, assign agents to different packages.
- Do not try agent-to-agent messaging. Give each agent complete context upfront instead.
- Measure wasted compute. Track how often agents redo work due to conflicts. If it exceeds 20%, your task decomposition needs work.
- Have a human review merge points. Agents are good at independent execution but unreliable at resolving semantic conflicts between branches.
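The 20% wasted-compute threshold is simple to track if you log tokens per attempt along with whether each attempt's work was kept. A sketch of hypothetical logging:

```python
def wasted_ratio(attempts: list[tuple[int, bool]]) -> float:
    """attempts: (tokens_spent, kept) pairs, where kept=False means the
    work was thrown away due to a conflict and had to be redone."""
    total = sum(tokens for tokens, _ in attempts)
    wasted = sum(tokens for tokens, kept in attempts if not kept)
    return wasted / total if total else 0.0

def decomposition_needs_work(attempts: list[tuple[int, bool]],
                             threshold: float = 0.20) -> bool:
    """Flag runs whose wasted-compute ratio exceeds the threshold."""
    return wasted_ratio(attempts) > threshold
```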
Multi-agent coworking is not magic. It is parallelism with coordination costs, just like multi-threaded programming. The developers who succeed with it are the ones who minimize shared mutable state - the same principle that makes concurrent software reliable.
One agent that handles your whole desktop
Fazm is an open-source macOS AI agent that controls browser, terminal, and apps from a single agent - no multi-agent coordination needed for cross-application workflows.