The Boring Parts That Ship
By Benjamin Taini · Founder, Bouletteproof
The unglamorous machinery that decides whether multi-agent delivery actually works.
Most writing about multi-agent software engineering focuses on the cognitive layer: how agents reason, which LLM has the best planning capabilities, or how to structure prompts so an agent doesn't get stuck in an infinite loop.
These are interesting problems, but they are not the reasons multi-agent systems fail in production.
When you move from a single agent running on a developer's laptop to a fleet of agents working concurrently on a commercial codebase, the bottleneck shifts immediately from cognition to coordination. The hardest problems are suddenly the classic, unglamorous challenges of distributed systems: file conflicts, sequencing, state synchronization, and resource contention.
If you do not solve these boring parts, your agents will burn much of their token budget overwriting each other's changes, breaking the build, and corrupting the Git history.
The Concurrency Trap
In a typical single-agent setup, the loop is linear: the agent reads a file, plans a change, writes the file, runs tests, and repeats. This works because there is only one writer.
The moment you introduce parallel execution (say, one agent refactoring a database schema while another updates the API endpoints and a third writes integration tests), the linear model collapses. Three failure modes show up almost immediately:
- The Shared File Problem: Two agents attempt to modify the same configuration file or package manifest simultaneously. Without a locking mechanism, the agent that writes last wins, silently discarding the other's work.
- The Out-of-Date Context: Agent A reads auth.ts at 10:00 AM. Agent B modifies auth.ts and commits at 10:01 AM. Agent A, still working with the 10:00 AM context, writes its changes at 10:02 AM, completely unaware that the underlying file has changed.
- The Broken Dependency: Agent A changes a function signature in a shared utility library. Agent B is writing code that calls that utility. If Agent B does not receive the updated signature immediately, it will write invalid code that fails compilation.
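To make the first two failure modes concrete, here is a minimal sketch of one common way to catch a stale write: every read returns a content digest, and a write is accepted only if the file still matches the digest the writer saw. The `VersionedStore` class, `StaleWriteError`, and the file contents are hypothetical names invented for this illustration, and the in-memory dictionary stands in for a real workspace.

```python
import hashlib


class StaleWriteError(Exception):
    """Raised when a file changed after the writer last read it."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a workspace: path -> content."""

    def __init__(self):
        self._files = {}

    @staticmethod
    def _digest(content):
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def read(self, path):
        # Return the content together with the digest the caller must
        # present when it later writes the file back.
        content = self._files.get(path, "")
        return content, self._digest(content)

    def write(self, path, new_content, expected_digest):
        # Reject the write if the file no longer matches what the writer read.
        current = self._files.get(path, "")
        if self._digest(current) != expected_digest:
            raise StaleWriteError(f"{path} changed since it was read")
        self._files[path] = new_content


store = VersionedStore()
_, empty = store.read("auth.ts")
store.write("auth.ts", "export const ttl = 60;", empty)  # seed the file

_, seen_by_a = store.read("auth.ts")  # Agent A reads (the 10:00 AM context)
_, seen_by_b = store.read("auth.ts")  # Agent B reads
store.write("auth.ts", "export const ttl = 120;", seen_by_b)  # B commits first

try:
    store.write("auth.ts", "export const ttl = 30;", seen_by_a)
    a_write_rejected = False
except StaleWriteError:
    a_write_rejected = True  # A must re-read before trying again
```

Without the digest check, Agent A's write would succeed and Agent B's change would disappear silently. With it, the conflict becomes an explicit error that the orchestrator can handle.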
How We Solve This in Bouletteproof OS
To make multi-agent delivery reliable, we had to build a coordination layer that treats agents not as magical thinkers, but as untrusted concurrent processes. We designed three core mechanisms to handle this:
1. Lease-Based Concurrency Control with File-Level Locks
We do not allow agents to write directly to the workspace. Instead, all file modifications must go through a virtual file system layer. When an agent wants to edit a file, it must acquire a lease. If another agent holds an active lease on that file, the requesting agent is either queued or redirected to another task.
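As a rough illustration of the lease idea, the sketch below grants a time-limited lease per file and refuses a second agent until the first lease is released or expires. It is not the Bouletteproof OS implementation: `LeaseManager`, its method names, and the 10-second TTL are assumptions made for this example, and the queue-or-redirect decision is left to the caller.

```python
import threading
import time


class LeaseManager:
    """Hypothetical per-file lease table with expiry."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._leases = {}            # path -> (agent_id, expires_at)
        self._mutex = threading.Lock()

    def acquire(self, path, agent_id):
        """Return True if agent_id now holds the lease on path."""
        with self._mutex:
            now = self._clock()
            holder = self._leases.get(path)
            if holder is not None:
                owner, expires_at = holder
                if owner != agent_id and expires_at > now:
                    return False     # caller should queue or pick another task
            # Free, expired, or already ours: grant or renew the lease.
            self._leases[path] = (agent_id, now + self._ttl)
            return True

    def release(self, path, agent_id):
        """Release the lease if agent_id holds it; return whether it did."""
        with self._mutex:
            holder = self._leases.get(path)
            if holder is not None and holder[0] == agent_id:
                del self._leases[path]
                return True
            return False


fake_now = [0.0]  # fake clock so the example does not need to sleep
leases = LeaseManager(ttl_seconds=10.0, clock=lambda: fake_now[0])

got_a = leases.acquire("package.json", "agent-a")
got_b = leases.acquire("package.json", "agent-b")   # blocked by agent-a
fake_now[0] = 11.0                                  # agent-a's lease lapses
got_b_after_expiry = leases.acquire("package.json", "agent-b")
```

The expiry matters as much as the lock itself: an agent that crashes or stalls while holding a lease should not block the file forever.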
2. Transactional Workspace Isolation
Every agent task runs in its own isolated branch or ephemeral workspace. Changes are not merged into the main working branch until they have passed a strict validation suite (linting, compilation, and unit tests). This ensures that a broken agent run cannot poison the environment for other active agents.
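The validate-then-merge flow can be sketched in a few lines. This example assumes an in-memory `Workspace` rather than real git branches, and uses a single hypothetical validator that checks whether Python files parse; a real pipeline would run linting, compilation, and unit tests instead. All names here are illustrative.

```python
class Workspace:
    """Hypothetical in-memory workspace: path -> file content."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def fork(self):
        # Private copy for one agent task; strings are immutable,
        # so a shallow copy of the dict is enough.
        return Workspace(self.files)


def python_files_compile(ws):
    """Validator: every .py file must at least parse."""
    for path, source in ws.files.items():
        if path.endswith(".py"):
            try:
                compile(source, path, "exec")
            except SyntaxError:
                return False
    return True


def run_task(main, task, validators):
    """Run task on a private copy and merge only if every validator passes.

    Returns the names of failed validators (empty list on success).
    """
    scratch = main.fork()
    task(scratch)
    failures = [name for name, check in validators if not check(scratch)]
    if failures:
        return failures          # scratch is discarded; main is untouched
    main.files = scratch.files   # simplified merge: main has not moved meanwhile
    return []


def bad_task(ws):
    ws.files["util.py"] = "def add(a, b:\n"  # syntax error


def good_task(ws):
    ws.files["util.py"] += "\ndef sub(a, b):\n    return a - b\n"


main = Workspace({"util.py": "def add(a, b):\n    return a + b\n"})
validators = [("compiles", python_files_compile)]

bad_result = run_task(main, bad_task, validators)
main_after_bad = dict(main.files)
good_result = run_task(main, good_task, validators)
```

The sketch skips the hard part, which is merging when the main branch has moved while the task was running. That is where the merge-conflict handling mentioned later comes in.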
3. Real-Time Context Invalidation
When an agent successfully merges a change, a workspace event is broadcast to all other active agents. If an agent is currently working on a file that was affected by the merge, its execution is paused, its context window is refreshed with the new file state, and it is prompted to re-evaluate its plan before resuming.
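A minimal sketch of this broadcast, assuming a simple in-process publish/subscribe object: `WorkspaceEvents`, `AgentSession`, and the `needs_replan` flag are hypothetical names, and "pausing" is reduced to setting a flag that the agent loop would check before its next step.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSession:
    """Hypothetical record of what one running agent is working on."""
    agent_id: str
    open_files: set
    context: dict = field(default_factory=dict)  # path -> content last seen
    needs_replan: bool = False

    def on_merge(self, changed):
        # Only react if the merge touched a file this agent is working on.
        affected = self.open_files.intersection(changed)
        if not affected:
            return
        for path in affected:
            self.context[path] = changed[path]   # refresh the stale content
        self.needs_replan = True                 # agent loop re-plans before resuming


class WorkspaceEvents:
    """Hypothetical in-process event bus for merge notifications."""

    def __init__(self):
        self._sessions = []

    def subscribe(self, session):
        self._sessions.append(session)

    def publish_merge(self, author_id, changed):
        # Notify every active agent except the one that made the change.
        for session in self._sessions:
            if session.agent_id != author_id:
                session.on_merge(changed)


bus = WorkspaceEvents()
agent_a = AgentSession("agent-a", {"auth.ts"}, {"auth.ts": "v1"})
agent_c = AgentSession("agent-c", {"README.md"})
bus.subscribe(agent_a)
bus.subscribe(agent_c)

bus.publish_merge("agent-b", {"auth.ts": "v2"})
```

After the merge event, `agent_a` holds the new file state and is flagged to re-plan, while `agent_c`, which was not touching the file, carries on undisturbed.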
The Scorer, Not the Judge
Another critical piece of the machinery is how we evaluate whether an agent's work is actually complete. We avoid using LLMs as subjective "judges" of code quality. Instead, we rely on deterministic scorers:
- Does the code compile?
- Do the existing tests pass?
- Does the new code meet the coverage threshold?
- Are there any new static analysis warnings?
Only when these objective criteria are met does the system allow the change to proceed. This keeps the delivery pipeline predictable and prevents "hallucinated progress" where an agent claims a task is done but the build is broken.
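The four checks above translate naturally into a pure function over build results. The sketch below is illustrative only: `BuildReport`, its field names, and the 80% coverage threshold are assumptions for this example, not values taken from Bouletteproof OS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildReport:
    """Hypothetical summary of one validation run."""
    compiled: bool
    tests_failed: int
    coverage: float             # fraction between 0 and 1
    new_static_warnings: int


def score(report, min_coverage=0.80):
    """Return the failed criteria; an empty list means the change may proceed."""
    failures = []
    if not report.compiled:
        failures.append("compile")
    if report.tests_failed > 0:
        failures.append("tests")
    if report.coverage < min_coverage:
        failures.append("coverage")
    if report.new_static_warnings > 0:
        failures.append("static-analysis")
    return failures


clean = score(BuildReport(compiled=True, tests_failed=0,
                          coverage=0.85, new_static_warnings=0))
broken = score(BuildReport(compiled=True, tests_failed=2,
                           coverage=0.50, new_static_warnings=0))
```

Because the function is deterministic, the same report always produces the same verdict, and the list of failed criteria can be fed straight back to the agent as its next task.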
The Path Forward
Building these coordination systems is tedious. It involves writing file watchers, managing lock timeouts, handling git merge conflicts programmatically, and optimizing test runners for speed. It is not as exciting as prompt engineering or fine-tuning models.
But it is the only way to build a multi-agent system that actually ships software instead of just generating demos.
Related reading
- We Deleted 20 of Our Own Quality Checks — auditing the gates themselves.
- The Model Is the Smallest Part — why the system beats the model.
We are open-sourcing our coordination and sprint health monitoring tools soon under the sprint-health companion project.