AI Development Patterns
Spec-First AI Coding: Why Your CLAUDE.md Matters More Than Your Code at Scale
There is a phase transition in AI-assisted development that most people discover the hard way. Below about 15 files, you can vibe code your way to a working application. The AI holds enough context to make consistent decisions. Above 15 files, the AI starts contradicting itself - different naming conventions, duplicate utilities, competing patterns for the same problem. The fix is counterintuitive: spend more time writing specs and less time writing code. The specs become the source of truth that keeps the AI coherent as your codebase grows.
1. The 15-File Phase Transition
The number is not magic, but it is remarkably consistent. Developers building with AI coding assistants - whether Cursor, Claude Code, GitHub Copilot, or Windsurf - report hitting a coherence wall somewhere between 10 and 20 files. The symptoms are always the same.
First, naming inconsistencies appear. One file uses camelCase for function names while another uses snake_case. One component is called UserProfile while a similar one is called profile_card. The AI is not being careless - it simply cannot see all the existing files at once and makes locally reasonable choices that are globally inconsistent.
Second, utility duplication starts. The AI creates a formatDate function in one file because it does not know that a dateFormatter already exists three directories away. By the time you notice, there are four different date formatting implementations, each slightly different.
Third, architectural drift takes hold. Early files use one pattern (say, fetching data in components). Later files use a different pattern (say, fetching in server-side functions). The AI does not remember which pattern was established first, so it picks whatever seems most natural for the current prompt.
The root cause is context window limitations. Even models with 200K token windows cannot hold an entire codebase in context once it reaches moderate size. And even when the window is large enough, the model's attention degrades over long contexts - conventions mentioned 50,000 tokens ago have less influence than recent content. The solution is not a bigger context window. It is a concise spec that the AI reads at the start of every interaction.
2. What Spec-First Actually Means
Spec-first development with AI is not traditional waterfall specification. You are not writing 50-page documents before coding. You are writing concise, machine-readable instructions that the AI consumes at the start of every session. The spec evolves with the code, and updating the spec is part of every development cycle.
In practice, spec-first means three things:
- Before adding a feature: write a brief description of what it does, where it fits in the architecture, and what conventions it should follow. This takes 5-10 minutes and saves 30-60 minutes of rework.
- Before each AI session: ensure the spec file (CLAUDE.md, .cursorrules, or equivalent) is up to date. If you added a new convention or pattern since the last session, add it to the spec.
- After each AI session: review what the AI generated and update the spec if the AI made a good decision that should become a convention, or if it made a bad decision that should be explicitly prohibited.
The spec becomes a living document that captures your project's accumulated decisions. It is the shortest path between "what I want" and "what the AI generates," because it encodes all the context that the AI would otherwise need to infer (often incorrectly) from the codebase.
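As a concrete sketch, a minimal spec for a small web project might look like the following. The project name and paths are hypothetical; the conventions echo the examples used later in this article:

```markdown
# Project: invoice-dashboard (hypothetical example)

## Conventions
- Component files: PascalCase in src/components/ (UserProfile.tsx)
- Utility files: camelCase in src/utils/ (formatDate.ts)
- Never use class components; functional components only.

## Architecture
- All data fetching happens in server-side loaders, never inside components.
- Shared date/number formatting lives in src/utils/format.ts; do not duplicate it.
```

Even ten lines like these eliminate the most common sources of drift: naming, file placement, and duplicated utilities.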
3. Spec Detail by Project Size
The amount of spec detail you need scales with project complexity. Here is a practical progression:
| Project Size | Spec Length | What to Include | Time Spent on Specs |
|---|---|---|---|
| 1-10 files | 0-10 lines | Nothing or a brief project description | 0% of dev time |
| 10-25 files | 20-50 lines | Naming conventions, import patterns, folder structure | 10% of dev time |
| 25-75 files | 50-150 lines | Above + architecture rules, data flow, testing conventions | 20% of dev time |
| 75-200 files | 150-300 lines | Above + module specs, API contracts, deployment constraints | 30% of dev time |
| 200+ files | 300+ lines (hierarchical) | Root spec + per-module specs + ADRs | 40%+ of dev time |
The 40% figure at the high end surprises people, but it reflects reality. In a large AI-assisted codebase, the spec is the primary artifact you maintain. The code is generated from the spec. When you need to change behavior, you update the spec first and then let the AI regenerate the affected code. This is a fundamental inversion of the traditional development model, where code is primary and documentation is secondary.
4. CLAUDE.md Patterns That Work
CLAUDE.md has become the de facto standard for project-level AI specs, used by Claude Code and increasingly adopted by other tools. The most effective CLAUDE.md files share several patterns:
Constraints over preferences. Instead of "prefer functional components," write "never use class components." Constraints are unambiguous. Preferences leave room for interpretation, and AI models will interpret them differently across sessions.
Examples over abstractions. Instead of "follow the repository's naming conventions," write "component files: PascalCase (UserProfile.tsx). Utility files: camelCase (formatDate.ts). API routes: kebab-case (get-user-profile.ts)." Concrete examples eliminate ambiguity.
Hierarchical organization. Large projects benefit from a root CLAUDE.md with global conventions and per-directory CLAUDE.md files with module-specific instructions. This keeps each file short (under 150 lines) while providing detailed guidance where needed. The AI reads the root file plus the relevant directory file, getting global and local context efficiently.
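One possible layout for this hierarchy (directory names are illustrative, not prescriptive):

```
repo/
├── CLAUDE.md           # global conventions, kept under ~150 lines
└── src/
    ├── api/
    │   └── CLAUDE.md   # API route naming, error response format
    └── ui/
        └── CLAUDE.md   # component patterns, styling rules
```

Each directory-level file only needs the rules that differ from or extend the root, which keeps every individual spec short.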
Explicit anti-patterns. If the AI repeatedly makes a specific mistake, add a "DO NOT" section that explicitly prohibits it. Common entries include "do not create new utility files - add utilities to the existing utils/ directory" and "do not add dependencies without checking if an existing dependency already solves the problem."
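A "DO NOT" section built from the entries above might read:

```markdown
## DO NOT
- Do not create new utility files; add helpers to the existing utils/ directory.
- Do not add a dependency without checking whether an existing one already solves the problem.
- Do not use class components.
```

The imperative, prohibitive phrasing matters: it leaves no room for the model to interpret the rule as a soft preference.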
Version tracking. Add a "last updated" date and a brief changelog to the spec. This helps you notice when the spec is getting stale and reminds you to review it periodically.
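A lightweight version header is enough; the dates and entries below are placeholders for illustration:

```markdown
<!-- Last updated: YYYY-MM-DD -->
## Changelog
- YYYY-MM-DD: added kebab-case rule for API routes
- YYYY-MM-DD: prohibited class components
```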
5. Common Pitfalls of Vibe Prompting at Scale
Vibe prompting - giving the AI loose, natural-language instructions without a structured spec - works well for small projects but creates specific, predictable problems at scale:
- The "last prompt wins" problem. Without a persistent spec, the AI's behavior is dominated by the most recent prompt. If you tell it to use a new pattern in today's session, it forgets tomorrow. The result is a codebase where different sections reflect different prompting sessions, each internally consistent but globally chaotic.

- Accumulated drift. Each session introduces small deviations: slightly different variable naming here, a slightly different error handling pattern there. Over 50 sessions, these small deviations compound into a codebase that feels like it was written by 50 different people - because in a sense, it was.
- The refactoring trap. Once inconsistencies accumulate, developers spend increasing time asking the AI to refactor existing code for consistency rather than building new features. This is a direct tax on productivity that a spec would have prevented.
- Context fragility. Without a spec, the AI's understanding of the project depends entirely on which files are in context. Open a different set of files and you get different conventions. This makes the AI's behavior unpredictable, which erodes trust and slows development.
- Onboarding difficulty. If you want another person (or another AI tool) to work on the codebase, there is no document that explains the conventions. The knowledge is distributed across dozens of prompting sessions and exists only in the code itself, where it is mixed with generated content that may not reflect current intentions.
6. The Practice of Writing Specs for AI
Writing specs for AI consumption is a different skill than writing documentation for humans. AI specs need to be more explicit, more concise, and more focused on constraints. Here are practical guidelines:
Be terse. Every line in a spec consumes context window tokens. A 300-line spec costs about 2,000 tokens - manageable, but not free. Eliminate filler, preamble, and explanation that does not change behavior. If a rule can be stated in 10 words, do not use 50.
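To budget spec length against your context window, a rough heuristic of about four characters per token is good enough. This is a sketch, not a real tokenizer; actual token counts vary by model:

```python
def estimate_spec_tokens(spec_text: str) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic.

    Real tokenizers vary by model; use this for budgeting, not billing.
    """
    return max(1, len(spec_text) // 4)


# A 300-line spec of short rules lands in the low thousands of tokens,
# consistent with the ~2,000-token ballpark cited above.
spec = "\n".join(["- use camelCase for utility files"] * 300)
print(estimate_spec_tokens(spec))
```

If the estimate creeps past a few thousand tokens, that is the signal to split the spec hierarchically rather than keep growing one file.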
Test your specs. After updating a spec, start a new AI session and ask it to generate a component or function. Check whether the output follows the spec. If not, the spec is ambiguous - rewrite the relevant section. This is the spec equivalent of a unit test.
Organize by frequency. Put the most commonly needed conventions at the top of the spec. AI models pay more attention to content earlier in the context. If your most important rule is buried on line 280, it will have less influence than one on line 5.
Use structured formats. Tables, bullet lists, and code blocks are easier for AI to parse than prose paragraphs. A table mapping file types to naming conventions is more reliable than a paragraph explaining the same thing.
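For instance, the naming conventions from section 4 are more reliably followed as a table than as the equivalent prose:

```markdown
| File type | Convention | Example             |
|-----------|------------|---------------------|
| Component | PascalCase | UserProfile.tsx     |
| Utility   | camelCase  | formatDate.ts       |
| API route | kebab-case | get-user-profile.ts |
```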
Review weekly. Set a recurring reminder to review your spec files. Add any new conventions that emerged during the week. Remove any that are no longer relevant. A stale spec is worse than no spec because it gives the AI outdated instructions with high confidence.
7. A Real-World Example
The Fazm project - an open-source macOS AI agent available at github.com/m13v/fazm - was built using the spec-first approach. As the codebase grew past the initial prototype stage, the development workflow shifted from writing code directly to maintaining a detailed CLAUDE.md that specified architecture decisions, naming conventions, module boundaries, and testing requirements.
The spec covers everything from color values and writing style (no em dashes, teal accent colors) to architectural patterns (accessibility APIs over screenshots, MCP for extensibility) to development workflows (test every change before considering it done, use specific commit formats). When multiple AI agents work on the codebase simultaneously, the CLAUDE.md serves as the coordination mechanism that prevents conflicting changes.
The result is a codebase that maintains consistency despite being primarily AI-generated. New features follow established patterns because the spec enforces them. Naming is consistent because the spec defines it explicitly. Testing happens automatically because the spec requires it. The spec is the product; the code is a derivative.
This is the spec-first inversion in practice. The most valuable file in the repository is not a source code file - it is the specification that shapes how all the source code gets written. If you are building an AI-assisted codebase that needs to scale, the spec is where you should invest your time first.
See spec-first development in action
Fazm is an open-source macOS agent built with the spec-first approach. Explore the CLAUDE.md, the architecture, and the code on GitHub. Free to start.