Architecture Diagrams vs Working Systems - How AI Agents Expose the Gap

Matthew Diakonov

There is a common pattern in tech companies: someone draws a diagram with boxes and arrows, labels it "architecture," and calls the job done. The boxes say things like "API Gateway" and "Event Bus" and "ML Pipeline." The arrows connect them in satisfying ways. But nobody has verified that this design actually works.

AI agents are exposing this gap in a way that is hard to ignore.

What "Architecture" Usually Means in Practice

Software architecture erosion - the gradual divergence between intended and actual system design - is one of the oldest problems in software engineering. Mozilla's web browser is a textbook example: a codebase that grew from a clear original architecture into something progressively harder to understand and maintain because implementation decisions kept drifting from the design. Each individual change made sense locally; collectively they produced something nobody designed.

The usual pattern:

  1. An architect produces a diagram showing how components interact
  2. Teams implement components with assumptions that were never written down
  3. Six months later the diagram bears only a passing resemblance to what was built
  4. New engineers ask what the "real" architecture is and nobody can answer confidently

This happens even on teams that care about architecture. The diagram is a communication tool, not a specification. The gaps between boxes are filled by tribal knowledge, verbal clarifications, and individual judgment. None of that is in the document.

What AI Agents Do Differently

When you ask an AI agent to implement an architecture from a document, it takes the specification literally. It builds exactly what the document describes. This is both the feature and the diagnostic tool.

Consider a typical microservices architecture diagram. It shows:

  • Service A sending events to an Event Bus
  • Service B consuming from the Event Bus
  • Both services talking to a shared User Service

An AI agent attempting to implement this will immediately hit questions the diagram does not answer:

  • What is the message format for events on the Event Bus? Is it JSON? Protobuf? What fields are required?
  • What happens when Service B is down and events queue up? Is there a dead letter queue? A retry policy?
  • Does "shared User Service" mean both services call the same endpoint, or do they each have a read replica?
  • How does authentication work between services? mTLS? JWT with a shared secret? API keys?
  • What does "auto-scale" mean for the ML Pipeline - scale on what metric, to what limit?

A human engineer would fill in these gaps with reasonable assumptions, ask a coworker, or look at existing code for precedent. An agent builds what the document says and gets stuck or makes wrong assumptions at every gap.

The resulting prototype will have bugs and missing features - but the bugs and missing features will map precisely to the underspecified sections of the architecture. That is the diagnostic value.
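The first question in the list - message format - illustrates what closing a gap looks like. Here is a minimal sketch of an explicit event contract: JSON with required fields and a schema version, instead of "events flow through the bus." All field names here are hypothetical, not taken from any real system:

```python
# Hypothetical explicit event contract for the Event Bus: JSON with
# required fields and a schema version an agent can actually implement.
import json

REQUIRED_FIELDS = {"event_id", "event_type", "occurred_at", "schema_version", "payload"}

def validate_event(raw: str) -> dict:
    """Parse a raw bus message and reject anything missing required fields."""
    event = json.loads(raw)
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event missing required fields: {sorted(missing)}")
    if event["schema_version"] != 1:
        raise ValueError(f"unsupported schema version: {event['schema_version']}")
    return event

ok = validate_event(json.dumps({
    "event_id": "e-1",
    "event_type": "user.created",
    "occurred_at": "2024-01-01T00:00:00Z",
    "schema_version": 1,
    "payload": {"user_id": "u-42"},
}))
```

With a contract like this written down, "what fields are required" stops being a question the agent has to guess at.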

A Concrete Example

Here is the kind of gap that appears in almost every architecture review:

The diagram shows: Frontend -> API Gateway -> Auth Service -> Backend

What the diagram does not specify:

  • What happens when Auth Service is unavailable? Does API Gateway fail open, fail closed, or return a specific error code?
  • Is the JWT validated at the API Gateway or passed through for Backend to validate?
  • What is the token expiry and refresh strategy?
  • How does the Backend know which user made a request - does the Gateway add a header?

An agent implementing this will make a choice for each unspecified behavior. Whether that choice matches what the architecture team intended is essentially random. When the prototype misbehaves, the failure tells you exactly which assumptions the architect made without writing them down.
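Writing the decisions down as executable behavior is what removes the randomness. Here is a minimal sketch of one possible set of answers - JWT validated at the gateway, fail closed when the Auth Service is unreachable, user identity forwarded in an X-User-Id header. These are illustrative choices, not ones the diagram prescribes:

```python
# One possible resolution of the unspecified gateway behavior:
# validate at the gateway, fail closed on auth outage, forward identity.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GatewayDecision:
    status: int                 # HTTP status the gateway returns
    forward: bool               # whether the request reaches the Backend
    headers: dict = field(default_factory=dict)  # headers added for the Backend

def route_request(token_valid: Optional[bool], user_id: str = "") -> GatewayDecision:
    """token_valid is None when the Auth Service is unreachable."""
    if token_valid is None:
        # Fail closed: 503 rather than letting unauthenticated traffic through.
        return GatewayDecision(503, False)
    if not token_valid:
        return GatewayDecision(401, False)
    # Gateway validated the token; the Backend trusts the injected header.
    return GatewayDecision(200, True, {"X-User-Id": user_id})
```

An architecture document that states these three decisions in a sentence each would let an agent - or an engineer - build the same gateway every time.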

Using Agents as Architecture Validators

This produces a useful workflow:

  1. Write the architecture document
  2. Ask an agent to build a prototype from the document alone - no verbal explanation, no supplementary context
  3. Review where the agent got stuck, what it assumed, and what it built wrong
  4. Each gap is a section of the architecture that needs to be made explicit
  5. Revise the document and repeat until the prototype is correct

The cost of this loop is a few hours of agent time and review. The cost of finding the same gaps after a team has built on the architecture for six months is measured in weeks of rework.

C4 container diagrams and architecture recovery tools are one approach to keeping design and implementation synchronized. Using an AI agent as a prototype builder is a complementary technique that validates the specification before implementation begins rather than after divergence has occurred.

What Good Architecture Documentation Looks Like

If you are going to use an agent as a validator, you need documentation that can survive the process. That means:

Message formats defined explicitly - not "events flow through the bus" but "events are JSON objects with these required fields and this schema version strategy."

Error handling specified - not "services communicate" but "service-to-service calls time out after 2 seconds, retry 3 times with exponential backoff, circuit break after 5 consecutive failures."
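A policy stated at that level of precision translates almost mechanically into code. A minimal sketch, assuming the call and sleep functions are injected hooks rather than any specific HTTP library:

```python
# Sketch of the stated policy: 2 s timeout, 3 retries with exponential
# backoff, circuit opens after 5 consecutive failed calls. The timeout is
# passed through to call_fn, which is responsible for enforcing it.
import time

class CircuitOpen(Exception):
    pass

class ServiceClient:
    def __init__(self, call_fn, sleep_fn=time.sleep,
                 timeout=2.0, retries=3, base_backoff=0.5, break_after=5):
        self.call_fn = call_fn
        self.sleep_fn = sleep_fn
        self.timeout = timeout
        self.retries = retries
        self.base_backoff = base_backoff
        self.break_after = break_after
        self.consecutive_failures = 0

    def call(self, request):
        if self.consecutive_failures >= self.break_after:
            raise CircuitOpen("circuit breaker open")
        for attempt in range(self.retries + 1):
            try:
                result = self.call_fn(request, timeout=self.timeout)
                self.consecutive_failures = 0
                return result
            except Exception:
                if attempt < self.retries:
                    # Exponential backoff: 0.5 s, 1 s, 2 s between attempts.
                    self.sleep_fn(self.base_backoff * (2 ** attempt))
        self.consecutive_failures += 1
        raise RuntimeError("call failed after retries")
```

Every number in the prose appears as a parameter, so there is nothing left for an implementer to invent.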

Data ownership clarified - not "shared database" but "Service A owns the users table, Service B reads through the User Service API, no direct database access from Service B."

Scaling triggers written down - not "auto-scale" but "scale out when CPU > 70% for 3 consecutive minutes, maximum 10 instances."
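That scaling rule is precise enough to be executable as written. A minimal sketch, assuming CPU utilization is sampled once per minute (the function and parameter names are illustrative):

```python
def should_scale_out(cpu_samples, current_instances,
                     threshold=70.0, window=3, max_instances=10):
    """Scale out when CPU > 70% for 3 consecutive minutes, capped at 10 instances.

    cpu_samples: recent per-minute CPU percentages, oldest first.
    """
    if current_instances >= max_instances:
        return False
    if len(cpu_samples) < window:
        return False
    return all(sample > threshold for sample in cpu_samples[-window:])
```

Compare this to "auto-scale": the vague version leaves the metric, the window, and the ceiling to whoever - or whatever - builds it.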

This level of detail feels excessive until you watch an agent fail in exactly the places where you left it vague.

The Team Dynamic

There is a softer benefit to this approach. Architects who know their documents will be tested by an agent tend to write more precise documents. The vague "event bus" becomes a specific Kafka topic with a schema registry. The hand-wavy "auth" becomes an explicit OAuth 2.0 client credentials flow with named scopes.

The agent does not care about the status or seniority of whoever drew the diagram. It implements what is written. That neutrality is sometimes exactly what a team needs to have an honest conversation about whether the architecture is actually specified well enough to build.

Fazm is an open source macOS AI agent, available on GitHub.
