AI Coding Spec Docs: Quality Guardrails That Save You From the Vibe Coding Trap

There is a moment every developer hits when vibe coding with AI. You are flying through features, the code is flowing, everything compiles, and then you hit a wall. The AI stored your API keys in plain text. The authentication logic has a race condition. The database schema cannot support the feature you need next week. The fix is not to stop using AI for coding. The fix is to write spec docs before you start, treating AI the way you would treat a talented junior developer who takes shortcuts when left unsupervised.

1. The Wall Every Vibe Coder Hits

A developer recently shared their experience building an AI-powered resume scorer with Cursor and Claude. The first few hours were magical. Feature after feature appeared. The UI looked polished. The API calls worked. Then came the walls.

The AI had stored API keys in UserDefaults, visible to any process on the machine. The scoring logic had hardcoded weights that made no sense for different industries. The file handling code assumed every resume was a PDF, crashing on .docx files. Each of these issues took longer to debug than the original feature took to build.

This is the vibe coding trap. The initial velocity is real, but it creates a false sense of progress. You are building on sand unless you have specified the foundation. The features that take 10 minutes to generate can take 2 hours to fix when the AI made incorrect assumptions about security, error handling, or data flow.

The pattern is consistent across projects and across AI tools. It does not matter whether you use Claude, GPT, Gemini, or a local model. Without explicit constraints, AI optimizes for the shortest path to working code, not the shortest path to correct code.

2. The Talented Junior Dev Mental Model

The most useful way to think about AI coding assistants is as a talented junior developer. They can write code fast. They know the syntax. They have read a lot of documentation. But they take shortcuts.

A junior dev who is told "build a login system" will build something that works. They probably will not hash passwords with bcrypt unless told to. They probably will not add rate limiting. They probably will not handle the case where the database is down. They are not lazy or incompetent; they optimize for the visible requirement and miss the invisible ones.

AI coding tools behave exactly the same way. The difference is that a junior dev learns from code review and gradually internalizes your team's patterns. AI starts fresh every session (unless you give it persistent context through spec docs).

What you would do with a junior dev:

  • Give them a written spec before they start coding
  • List the security requirements explicitly
  • Point them to existing patterns in the codebase
  • Tell them what NOT to do, not just what to do
  • Review their work before it ships
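Concretely, the written spec for the login example can be only a few lines. The sketch below is illustrative; the specific numbers (bcrypt cost factor, rate limits) are assumptions, not recommendations from any particular team:

```markdown
## Login system requirements

- Hash passwords with bcrypt (cost factor 12). Never store plaintext, MD5, or SHA-1.
- Rate-limit login attempts (e.g. 5 per minute per IP).
- If the database is unreachable, return a generic 503 message and log the cause.
- Never log passwords, tokens, or full request bodies.
```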

3. What Goes in a Spec Doc

A spec doc for AI coding does not need to be a 50-page design document. A focused .md file with 30-100 lines is usually enough. The goal is to encode the decisions you have already made so the AI does not make different ones.

Essential sections:

  • Architecture constraints - Which patterns to follow, which to avoid. "Use the repository pattern for data access. Never call the database directly from controllers."
  • Security requirements - Where to store secrets, how to handle auth tokens, input validation rules. Be specific: "Store API keys in Keychain, not UserDefaults or environment variables."
  • Error handling strategy - How errors should propagate, what gets logged, what the user sees. "All network errors must show a retry button. Never show raw error messages to users."
  • Data flow - How data moves through the system. Which components own state, which receive it as props. Where caching happens.
  • Anti-patterns - Explicit list of things the AI should not do. This is often the most valuable section because AI has strong defaults that may conflict with your project.

Different tools read these specs differently. Claude Code reads CLAUDE.md automatically. Cursor reads .cursorrules. Some teams use a /docs/ai-specs/ directory that they reference in prompts. The format matters less than the habit of writing specs before coding.
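Put together, a minimal project spec covering those sections might look like the file below. It is an illustrative sketch, not a prescribed format; the rules echo the examples above:

```markdown
# Project spec (CLAUDE.md / .cursorrules)

## Architecture
- Use the repository pattern for data access. Never call the database directly
  from controllers.

## Security
- Store all secrets, tokens, and API keys in Keychain. Never use UserDefaults
  for sensitive data.
- Validate and sanitize every user input at the boundary.

## Error handling
- All network errors must show a retry button. Never show raw error messages
  to users.

## Anti-patterns
- No force unwrapping. No new dependencies without approval. No new utility files.
```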

4. Security Pitfalls AI Creates by Default

Security is where AI coding shortcuts cause the most damage. The resume scorer example is instructive: the AI stored API keys in UserDefaults because it is the simplest persistence mechanism on macOS. It works. It is also completely insecure: any process running as the same user can read UserDefaults.

Common security pitfalls in AI-generated code:

| What AI Does | What It Should Do | Risk Level |
|---|---|---|
| Stores API keys in UserDefaults | Store in Keychain with access control | Critical |
| Logs request/response bodies | Log metadata only, redact PII | High |
| Hardcodes secrets in source | Use environment variables or a secret manager | Critical |
| Trusts all user input | Validate and sanitize every input | High |
| Uses HTTP for API calls | Enforce HTTPS, pin certificates if needed | Medium |
| Disables SSL verification for testing | Use proper test certificates | Critical |

The Keychain vs UserDefaults distinction is worth highlighting because it comes up in almost every macOS or iOS project. UserDefaults is backed by a plain property list file in ~/Library/Preferences/; any app, any script, any terminal command running as your user can read it. Keychain encrypts data and requires explicit permission grants. AI defaults to UserDefaults because the API is simpler: fewer lines of code, no error handling for access control.

A single line in your spec doc prevents this: "All secrets, tokens, and API keys must be stored in Keychain. Never use UserDefaults for sensitive data." That one sentence saves hours of security remediation.

5. Time Investment: Specs vs Debugging

The most common objection to writing spec docs is time. "I am using AI to go faster, and you want me to slow down to write docs?" The arithmetic says otherwise.

Typical time breakdown for a medium-complexity feature without specs:

  • Initial AI generation: 10-20 minutes
  • First round of bug fixes: 30-60 minutes
  • Security remediation: 30-45 minutes
  • Refactoring to match project patterns: 20-40 minutes
  • Edge case handling: 30-60 minutes
  • Total: 2-4 hours

Same feature with a spec doc:

  • Writing the spec: 15-30 minutes
  • AI generation with spec context: 10-20 minutes
  • Review and minor adjustments: 15-30 minutes
  • Total: 40-80 minutes

The spec pays for itself on the first feature. On subsequent features that share the same architectural patterns, you amortize the spec writing cost further. A project-level CLAUDE.md or .cursorrules file that took 2 hours to write saves that time on every single AI coding session for the life of the project.
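As a back-of-envelope check on that claim, take the midpoints of the estimates above. The durations are rough figures, so treat the result as a sketch, not a measurement:

```python
# Break-even arithmetic for a per-feature spec. All durations in minutes,
# taken as midpoints of the estimates above; adjust for your own project.
without_spec = (120 + 240) / 2   # 2-4 hours per feature without a spec
with_spec = (40 + 80) / 2        # 40-80 minutes per feature with one
spec_cost = (15 + 30) / 2        # 15-30 minutes to write the per-feature spec

saving_per_feature = without_spec - with_spec  # 120 minutes saved per feature
print(saving_per_feature > spec_cost)  # True: the spec pays off on feature one
```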

The real cost of not writing specs is not just the debugging time. It is the architectural debt that accumulates when AI makes different decisions in different parts of the codebase. Inconsistent error handling, multiple data access patterns, and mixed state management approaches create a codebase that is harder for both humans and AI to work with over time.

6. Real-World Spec Patterns That Work

Across hundreds of projects that use AI coding tools effectively, certain spec patterns appear repeatedly:

The "Never/Always" list. A simple two-column list of things the AI should never do and things it should always do. "Never use force unwrapping in Swift. Always use guard let or if let." "Never create new utility files. Always add utilities to the existing utils/ directory." This format is easy to scan, easy to maintain, and directly actionable for AI.
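In file form, the pattern is often literally two columns. The entries below are illustrative, drawn from the examples in this article:

```markdown
| Never                             | Always                                    |
|-----------------------------------|-------------------------------------------|
| Force unwrap optionals in Swift   | Use guard let or if let                   |
| Store secrets in UserDefaults     | Store secrets in Keychain                 |
| Create new utility files          | Add utilities to the existing utils/ dir  |
```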

The dependency lockdown. Many teams include a section that says "Do not add new dependencies without explicit approval." AI loves to npm install or pip install new packages to solve problems. A spec that constrains dependencies forces the AI to use what is already in the project, which usually leads to better, more consistent code.

The example-driven spec. Instead of describing patterns in prose, include a code snippet showing the correct pattern. "When creating a new API endpoint, follow this structure:" followed by 15 lines of example code. AI tools are extremely good at following concrete examples.
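For instance, a spec for a Python service might embed the endpoint pattern directly. Everything below is hypothetical: the helper names (validate_input, save_record, ok) stand in for whatever utilities your project actually has. The point is that a concrete shape beats a prose description:

```python
# Sketch of the "correct pattern" snippet a spec might embed. The helpers
# are illustrative stand-ins for a real project's validation, repository,
# and response-envelope utilities.
def validate_input(request: dict) -> dict:
    if "name" not in request:
        raise ValueError("name is required")   # validate first, always
    return request

def save_record(payload: dict) -> dict:
    return {**payload, "id": 1}                # repository layer owns persistence

def ok(result: dict) -> dict:
    return {"status": "ok", "data": result}    # uniform response envelope

def create_resume(request: dict) -> dict:
    payload = validate_input(request)
    result = save_record(payload)
    return ok(result)

print(create_resume({"name": "Ada"}))
```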

The error budget. Some specs include a section on acceptable error rates and performance budgets. "API response time must be under 200ms at p95. Error rate must stay below 0.1%." This gives AI concrete targets to optimize toward rather than vague directives about performance.
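A budget like "under 200ms at p95" is also checkable in a few lines of code, which makes it enforceable in CI. A minimal sketch using the nearest-rank percentile (the sample latencies are made up):

```python
import math

def p95(samples_ms):
    """Nearest-rank 95th percentile: the ceil(0.95 * n)-th smallest value."""
    ranked = sorted(samples_ms)
    k = math.ceil(0.95 * len(ranked))
    return ranked[k - 1]

# Hypothetical response times from a load test, in milliseconds.
samples = [120, 130, 95, 180, 150, 110, 140, 170, 90, 160,
           125, 135, 145, 155, 165, 100, 115, 175, 185, 195]
assert p95(samples) <= 200, "p95 latency budget exceeded"
print(p95(samples))  # 185
```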

7. Tools That Enforce Spec Compliance

Writing specs is step one. Enforcing them is step two. Several approaches help ensure AI-generated code actually follows your specifications:

  • Claude Code hooks - Run automated checks after every file edit. Linting, type checking, and custom scripts that verify spec compliance. If the check fails, the AI agent sees the error and fixes it automatically.
  • Pre-commit hooks - Catch secrets in code, verify import patterns, check file sizes. These run before code is committed regardless of whether a human or AI wrote it.
  • Custom linting rules - ESLint, SwiftLint, and similar tools can encode project-specific rules. "No direct UserDefaults access for keys containing secret, token, or key" is a rule you can automate.
  • AI-assisted review - Use a second AI pass to review generated code against the spec. Some teams have a "reviewer" agent that checks code against CLAUDE.md before it gets committed.
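A project-specific rule like the UserDefaults one can start as a few lines of script wired into a pre-commit hook. This is a minimal sketch: the regex and the Swift sample are illustrative, and a production rule would live in SwiftLint or your linter of choice rather than a standalone script:

```python
import re

# Flag UserDefaults calls whose key or value name looks like a secret.
SECRET_IN_DEFAULTS = re.compile(
    r"UserDefaults.*?(secret|token|api_?key|password)", re.IGNORECASE
)

def find_violations(source: str) -> list[tuple[int, str]]:
    return [(n, line) for n, line in enumerate(source.splitlines(), 1)
            if SECRET_IN_DEFAULTS.search(line)]

# Hypothetical Swift source under review.
swift_source = """\
UserDefaults.standard.set(apiKey, forKey: "openai_api_key")
UserDefaults.standard.set(true, forKey: "hasSeenOnboarding")
"""
for n, line in find_violations(swift_source):
    print(f"line {n}: secret stored in UserDefaults -> {line.strip()}")
```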

Desktop AI agents add another layer of verification. Tools like Fazm, an open source AI computer agent for macOS, can actually run the built application and verify behavior through accessibility APIs. This catches issues that static analysis misses, like a button that renders but does not respond to clicks, or a form that submits but shows the wrong confirmation message.

The combination of spec docs, automated checks, and runtime verification creates a quality feedback loop. AI generates code guided by specs, automated checks catch static violations, and runtime verification catches behavioral issues. Each layer catches problems the others miss.

Verify AI-Generated Code on Your Actual Desktop

Fazm is an AI computer agent for macOS that uses accessibility APIs to test your app the way a real user would, catching what spec docs and linters cannot.

Try Fazm Free