AI Development Patterns
Spec-First AI Coding: Why Your CLAUDE.md Matters More Than Your Code at Scale
There is a phase transition in AI-assisted development that most people discover the hard way. Below about 15 files, you can vibe code your way to a working application. The AI holds enough context to make consistent decisions. Above 15 files, the AI starts contradicting itself - different naming conventions, duplicate utilities, competing patterns for the same problem. The fix is counterintuitive: spend more time writing specs and less time writing code. The specs become the source of truth that keeps the AI coherent as your codebase grows.
1. The 15-File Phase Transition
The number is not magic, but it is remarkably consistent. Developers building with AI coding assistants - whether Cursor, Claude Code, GitHub Copilot, or Windsurf - report hitting a coherence wall somewhere between 10 and 20 files. The symptoms are always the same.
First, naming inconsistencies appear. One file uses camelCase for function names while another uses snake_case. One component is called UserProfile while a similar one is called profile_card. The AI is not being careless - it simply cannot see all the existing files at once and makes locally reasonable choices that are globally inconsistent.
Second, utility duplication starts. The AI creates a formatDate function in one file because it does not know that a dateFormatter already exists three directories away. By the time you notice, there are four different date formatting implementations, each slightly different.
Third, architectural drift takes hold. Early files use one pattern (say, fetching data in components). Later files use a different pattern (say, fetching in server-side functions). The AI does not remember which pattern was established first, so it picks whatever seems most natural for the current prompt.
The root cause is context window limitations. Even models with 200K token windows cannot hold an entire codebase in context once it reaches moderate size. And even when the window is large enough, the model's attention degrades over long contexts - conventions mentioned 50,000 tokens ago have less influence than recent content. The solution is not a bigger context window. It is a concise spec that the AI reads at the start of every interaction.
2. What Spec-First Actually Means
Spec-first development with AI is not traditional waterfall specification. You are not writing 50-page documents before coding. You are writing concise, machine-readable instructions that the AI consumes at the start of every session. The spec evolves with the code, and updating the spec is part of every development cycle.
In practice, spec-first means three things:
- Before adding a feature: write a brief description of what it does, where it fits in the architecture, and what conventions it should follow. This takes 5-10 minutes and saves 30-60 minutes of rework.
- Before each AI session: ensure the spec file (CLAUDE.md, .cursorrules, or equivalent) is up to date. If you added a new convention or pattern since the last session, add it to the spec.
- After each AI session: review what the AI generated and update the spec if the AI made a good decision that should become a convention, or if it made a bad decision that should be explicitly prohibited.
The spec becomes a living document that captures your project's accumulated decisions. It is the shortest path between "what I want" and "what the AI generates," because it encodes all the context that the AI would otherwise need to infer (often incorrectly) from the codebase.
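As a concrete sketch, a minimal spec for a small web project might look like the following. The project name and paths are hypothetical; the conventions echo the examples used later in this article:

```markdown
# Project: invoice-dashboard (hypothetical example)

## Conventions
- Component files: PascalCase in src/components/ (UserProfile.tsx)
- Utility files: camelCase in src/utils/ (formatDate.ts)
- Never use class components; functional components only.

## Architecture
- All data fetching happens in server-side loaders, never inside components.
- Shared date/number formatting lives in src/utils/format.ts; do not duplicate it.
```

Even ten lines like these eliminate the most common sources of drift: naming, file placement, and duplicated utilities.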
3. Spec Detail by Project Size
The amount of spec detail you need scales with project complexity. Here is a practical progression:
| Project Size | Spec Length | What to Include | Time Spent on Specs |
|---|---|---|---|
| 1-10 files | 0-10 lines | Nothing or a brief project description | 0% of dev time |
| 10-25 files | 20-50 lines | Naming conventions, import patterns, folder structure | 10% of dev time |
| 25-75 files | 50-150 lines | Above + architecture rules, data flow, testing conventions | 20% of dev time |
| 75-200 files | 150-300 lines | Above + module specs, API contracts, deployment constraints | 30% of dev time |
| 200+ files | 300+ lines (hierarchical) | Root spec + per-module specs + ADRs | 40%+ of dev time |
The 40% figure at the high end surprises people, but it reflects reality. In a large AI-assisted codebase, the spec is the primary artifact you maintain. The code is generated from the spec. When you need to change behavior, you update the spec first and then let the AI regenerate the affected code. This is a fundamental inversion of the traditional development model, where code is primary and documentation is secondary.
4. CLAUDE.md Patterns That Work
CLAUDE.md has become the de facto standard for project-level AI specs, used by Claude Code and increasingly adopted by other tools. The most effective CLAUDE.md files share several patterns:
Constraints over preferences. Instead of "prefer functional components," write "never use class components." Constraints are unambiguous. Preferences leave room for interpretation, and AI models will interpret them differently across sessions.
Examples over abstractions. Instead of "follow the repository's naming conventions," write "component files: PascalCase (UserProfile.tsx). Utility files: camelCase (formatDate.ts). API routes: kebab-case (get-user-profile.ts)." Concrete examples eliminate ambiguity.
Hierarchical organization. Large projects benefit from a root CLAUDE.md with global conventions and per-directory CLAUDE.md files with module-specific instructions. This keeps each file short (under 150 lines) while providing detailed guidance where needed. The AI reads the root file plus the relevant directory file, getting global and local context efficiently.
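One possible layout for this hierarchy (directory names are illustrative, not prescriptive):

```
repo/
├── CLAUDE.md           # global conventions, kept under ~150 lines
└── src/
    ├── api/
    │   └── CLAUDE.md   # API route naming, error response format
    └── ui/
        └── CLAUDE.md   # component patterns, styling rules
```

Each directory-level file only needs the rules that differ from or extend the root, which keeps every individual spec short.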
Explicit anti-patterns. If the AI repeatedly makes a specific mistake, add a "DO NOT" section that explicitly prohibits it. Common entries include "do not create new utility files - add utilities to the existing utils/ directory" and "do not add dependencies without checking if an existing dependency already solves the problem."
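A "DO NOT" section built from the entries above might read:

```markdown
## DO NOT
- Do not create new utility files; add helpers to the existing utils/ directory.
- Do not add a dependency without checking whether an existing one already solves the problem.
- Do not use class components.
```

The imperative, prohibitive phrasing matters: it leaves no room for the model to interpret the rule as a soft preference.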
Version tracking. Add a "last updated" date and a brief changelog to the spec. This helps you notice when the spec is getting stale and reminds you to review it periodically.
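A lightweight version header is enough; the dates and entries below are placeholders for illustration:

```markdown
<!-- Last updated: YYYY-MM-DD -->
## Changelog
- YYYY-MM-DD: added kebab-case rule for API routes
- YYYY-MM-DD: prohibited class components
```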
5. Common Pitfalls of Vibe Prompting at Scale
Vibe prompting - giving the AI loose, natural-language instructions without a structured spec - works well for small projects but creates specific, predictable problems at scale:
- The "last prompt wins" problem. Without a persistent spec, the AI's behavior is dominated by the most recent prompt. If you tell it to use a new pattern in today's session, it forgets tomorrow. The result is a codebase where different sections reflect different prompting sessions, each internally consistent but globally chaotic.

- Accumulated drift. Each session introduces small deviations: slightly different variable naming here, a slightly different error handling pattern there. Over 50 sessions, these small deviations compound into a codebase that feels like it was written by 50 different people - because in a sense, it was.
- The refactoring trap. Once inconsistencies accumulate, developers spend increasing time asking the AI to refactor existing code for consistency rather than building new features. This is a direct tax on productivity that a spec would have prevented.
- Context fragility. Without a spec, the AI's understanding of the project depends entirely on which files are in context. Open a different set of files and you get different conventions. This makes the AI's behavior unpredictable, which erodes trust and slows development.
- Onboarding difficulty. If you want another person (or another AI tool) to work on the codebase, there is no document that explains the conventions. The knowledge is distributed across dozens of prompting sessions and exists only in the code itself, where it is mixed with generated content that may not reflect current intentions.
6. The Practice of Writing Specs for AI
Writing specs for AI consumption is a different skill than writing documentation for humans. AI specs need to be more explicit, more concise, and more focused on constraints. Here are practical guidelines:
Be terse. Every line in a spec consumes context window tokens. A 300-line spec costs about 2,000 tokens - manageable, but not free. Eliminate filler, preamble, and explanation that does not change behavior. If a rule can be stated in 10 words, do not use 50.
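To budget spec length against your context window, a rough heuristic of about four characters per token is good enough. This is a sketch, not a real tokenizer; actual token counts vary by model:

```python
def estimate_spec_tokens(spec_text: str) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic.

    Real tokenizers vary by model; use this for budgeting, not billing.
    """
    return max(1, len(spec_text) // 4)


# A 300-line spec of short rules lands in the low thousands of tokens,
# consistent with the ~2,000-token ballpark cited above.
spec = "\n".join(["- use camelCase for utility files"] * 300)
print(estimate_spec_tokens(spec))
```

If the estimate creeps past a few thousand tokens, that is the signal to split the spec hierarchically rather than keep growing one file.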
Test your specs. After updating a spec, start a new AI session and ask it to generate a component or function. Check whether the output follows the spec. If not, the spec is ambiguous - rewrite the relevant section. This is the spec equivalent of a unit test.
Organize by frequency. Put the most commonly needed conventions at the top of the spec. AI models pay more attention to content earlier in the context. If your most important rule is buried on line 280, it will have less influence than one on line 5.
Use structured formats. Tables, bullet lists, and code blocks are easier for AI to parse than prose paragraphs. A table mapping file types to naming conventions is more reliable than a paragraph explaining the same thing.
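For instance, the naming conventions from section 4 are more reliably followed as a table than as the equivalent prose:

```markdown
| File type | Convention | Example             |
|-----------|------------|---------------------|
| Component | PascalCase | UserProfile.tsx     |
| Utility   | camelCase  | formatDate.ts       |
| API route | kebab-case | get-user-profile.ts |
```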
Review weekly. Set a recurring reminder to review your spec files. Add any new conventions that emerged during the week. Remove any that are no longer relevant. A stale spec is worse than no spec because it gives the AI outdated instructions with high confidence.
7. A Real-World Example
The Fazm project - an open-source macOS AI agent available at github.com/m13v/fazm - was built using the spec-first approach. As the codebase grew past the initial prototype stage, the development workflow shifted from writing code directly to maintaining a detailed CLAUDE.md that specified architecture decisions, naming conventions, module boundaries, and testing requirements.
The spec covers everything from color values and writing style (no em dashes, teal accent colors) to architectural patterns (accessibility APIs over screenshots, MCP for extensibility) to development workflows (test every change before considering it done, use specific commit formats). When multiple AI agents work on the codebase simultaneously, the CLAUDE.md serves as the coordination mechanism that prevents conflicting changes.
The result is a codebase that maintains consistency despite being primarily AI-generated. New features follow established patterns because the spec enforces them. Naming is consistent because the spec defines it explicitly. Testing happens automatically because the spec requires it. The spec is the product; the code is a derivative.
This is the spec-first inversion in practice. The most valuable file in the repository is not a source code file - it is the specification that shapes how all the source code gets written. If you are building an AI-assisted codebase that needs to scale, the spec is where you should invest your time first.
See spec-first development in action
Fazm is an open-source macOS agent built with the spec-first approach. Explore the CLAUDE.md, the architecture, and the code on GitHub. Free to start.