Getting Consistent Results From Claude Code: A Practical Workflow
Claude Code is the best AI coding tool most teams have used. It is also an inconsistent one. Some sessions land a three-file refactor in fifteen minutes. Others spin for an hour and produce something that does not even compile. The difference is almost never the model. It is the workflow around the model. This guide is a practical walkthrough of the habits that move your hit rate from "lucky when it works" to "reliable by default": tight CLAUDE.md files, context hygiene, fresh sessions, test hooks, and pairing the tool with a desktop agent for the parts it does not do on its own.
1. Why Claude Code Feels Inconsistent
The user-visible inconsistency of Claude Code comes from a few sources that are worth separating. First, the model is probabilistic. The same prompt can produce different outputs on different runs. For most tasks this is fine. For multi-step agentic work it means small differences early compound into bigger ones later.
Second, the context the model is working from varies. A session that started with a clean repo, a precise task, and a good CLAUDE.md is operating from a very different place than a session that started mid-debugging after four hours of unrelated back-and-forth.
Third, vendor-side changes happen. Effort levels, routing to cheaper variants during peak hours, system prompt updates. These are mostly outside your control, but you can limit how much they matter by keeping your own workflow tight.
You cannot fix the first source. You can mostly fix the second. That is where the leverage is.
2. A Tight CLAUDE.md
CLAUDE.md is the file Claude Code reads at the start of every session in a project. It is the single most important surface you control. A good CLAUDE.md transforms Claude Code from a generic coding assistant into a colleague who knows your project. A bloated CLAUDE.md is worse than nothing because it burns context for no benefit.
Keep it short. The real ceiling is "can a human skim this in 90 seconds?" If not, it is too long. Most of the value is in the first page.
Include the shape of the project. What language, what framework, what are the main directories, what is the entry point. This grounds the model so it stops asking basic orientation questions.
Include the conventions that you actually enforce. "We use TypeScript strict mode. We prefer named exports. We use React Server Components by default. We have a lint rule against console.log in production code." These are the things the model will otherwise get wrong and waste iterations on.
Include the commands. How to build. How to test. How to run the dev server. These save the model from guessing at package.json contents every session.
Include the gotchas. The weird thing about your auth library. The reason the date parser exists. The library that looks deprecated but is actually the canonical choice. Every one of these saves you from the same bug being re-introduced.
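Put together, a minimal CLAUDE.md along these lines covers the shape, the conventions, the commands, and the gotchas. Every project detail below is invented for illustration; substitute your own:

```markdown
# Project

Next.js app, TypeScript strict mode. Source in `src/`, entry point `src/app/layout.tsx`.

## Conventions
- Named exports only.
- React Server Components by default; mark client components explicitly.
- No `console.log` in production code (a lint rule enforces this).

## Commands
- Build: `npm run build`
- Test: `npm test`
- Dev server: `npm run dev`

## Gotchas
- `src/lib/dates.ts` exists because the API returns non-ISO timestamps.
  Do not replace it with `Date.parse`.
```

A file of this size is skimmable in well under 90 seconds, which is the whole point.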
Do not include long tutorials. Do not include the entire style guide. Do not include "how to code in React" type information the model already knows. Trust the model on the general skills and only tell it what is specific to your project.
Review CLAUDE.md every few weeks. Projects evolve. The file should too. Delete entries that are no longer true and add entries for new patterns as they emerge.
3. Context Hygiene and Fresh Sessions
The single biggest hit-rate improvement most teams find is starting a fresh session for each logically distinct task. Claude Code sessions degrade over time. Not because the model forgets things (it mostly does not) but because the context fills up with irrelevant history. Every unused file read, every tangent, every rejected approach stays in the window, quietly eating attention.
Treat a session like a unit of work. One feature, one bug fix, one refactor, one session. When you are done, close the session. Start the next task clean. The initial overhead of re-reading a couple of files is small compared to the drag of carrying twenty messages of unrelated context.
When a session is going badly, restart. If Claude Code has made three attempts at a fix and none are working, it is usually not going to get better with a fourth attempt in the same session. Close, start fresh, re-describe the problem with what you learned. Nine times out of ten the new session solves it on the first try.
Be deliberate about what enters context. Do not ask the model to explore the repo when you already know which file matters. Just tell it. "Edit src/auth/login.ts to add retry logic" is a better prompt than "find the login code and add retries," even though the second is conversationally friendlier.
Summarize between sessions. At the end of a working block, ask Claude to write a one-paragraph summary of what was done, what is left, and any surprises. Save it. Next session, paste that summary into the first message. You have restored state in 100 tokens instead of re-exploring for 10,000.
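One workable shape for that end-of-session summary, with the prompt that produces it. The wording and the project details are suggestions, not a required format:

```markdown
Prompt: "Summarize this session in one paragraph: what was done, what is
left, and any surprises. I will paste this into a fresh session."

Saved summary:
> Added retry logic to src/auth/login.ts (3 attempts, exponential backoff).
> Tests pass. Left to do: the same treatment for token refresh. Surprise:
> the auth library swallows network errors, so retries key off a timeout
> rather than an exception.
```

That paragraph is the entire handoff; the next session starts with it and the relevant file names, nothing else.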
4. Automated Test Hooks
The reason many Claude Code sessions go wrong is that the model believes it finished the task, but the task was actually broken. No tests ran. No manual verification happened. The model moves on, the user trusts it, and the bug surfaces two days later in production.
The fix is to make verification an automatic part of every session. In CLAUDE.md, include a rule like "after any code change, run the test suite and report the result before declaring the task done." This simple line changes behavior a lot.
For projects with slow test suites, define a fast subset that Claude Code should run every time. Unit tests of the files that were changed, a single end-to-end smoke test, a type check. The full suite can run in CI.
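In CLAUDE.md form, the two rules above might look like the following. The command names are placeholders for whatever your project actually defines:

```markdown
## Verification
- After any code change, run `npm run test:fast` and paste the actual
  output before declaring the task done.
- `test:fast` = unit tests for the changed files plus a type check.
  The full suite runs in CI; do not run it locally unless asked.
```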
For projects where tests do not catch everything (most GUI work, anything user-facing), add a manual verification hook. Ask Claude Code to describe how to verify the change by hand and, if possible, to run that verification itself through an automation tool.
Log what tests pass or fail in the session. Do not trust "the tests pass" as a statement. Ask for the actual output. This catches the case where Claude thinks a test passes because it believes it should, not because it actually ran.
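A thin wrapper that captures the literal test output, rather than a claim about it, can be as small as this sketch. The command is whatever your fast suite is; `sys.executable -c` stands in here so the example runs anywhere:

```python
import subprocess
import sys

def run_and_report(cmd: list[str]) -> tuple[int, str]:
    """Run a test command and return (exit code, combined output).

    The point is to surface the actual output, not a summary of it,
    so the session log shows what really ran.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr

if __name__ == "__main__":
    # Stand-in for a real test command, e.g. ["npm", "test"] or ["pytest", "-q"].
    code, output = run_and_report([sys.executable, "-c", "print('1 passed')"])
    print(f"exit={code}")
    print(output)
```

Pasting the returned output into the session, verbatim, is what makes "the tests pass" a checkable statement instead of a belief.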
5. Pairing With a Desktop Agent
Claude Code is excellent at writing code. It is not designed to drive GUIs, fill out web forms, click through multi-app workflows, or test a change by actually using the product. Those are the steps where most real-world work dies after the code is written.
A desktop agent fills that gap. After Claude Code writes the feature, a desktop agent like Fazm can run the build, launch the app, navigate to the changed screen, perform the user action, and take a screenshot. Combined with a vision verification pass, this gives you something close to automated end-to-end testing for changes that were previously hard to verify.
The workflow looks like: Claude Code makes the code change. Claude Code asks the desktop agent (through an MCP server or a shell command) to run a named verification. The agent runs the app, performs the action, reports back. Claude Code either declares success or iterates.
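The loop above can be sketched as follows. Both `apply_change` and `run_verification` are hypothetical stand-ins: in practice the first is a Claude Code edit and the second is an MCP tool call or shell command that asks the desktop agent to exercise the UI. Here they are stubbed so the control flow is visible:

```python
MAX_ATTEMPTS = 3

def apply_change(attempt: int) -> None:
    # Stand-in for a Claude Code edit to the codebase.
    print(f"attempt {attempt}: applying code change")

def run_verification(name: str, attempt: int) -> bool:
    # Stand-in for the desktop agent: launch the app, perform the user
    # action, check the result. Stubbed to "succeed" on the second try.
    return attempt >= 2

def change_with_verification(name: str) -> bool:
    """Iterate change -> verify until the agent confirms, or give up."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        apply_change(attempt)
        if run_verification(name, attempt):
            print(f"verified: {name}")
            return True
        print(f"verification failed, iterating ({attempt}/{MAX_ATTEMPTS})")
    return False

if __name__ == "__main__":
    change_with_verification("login-retry-banner")
```

The cap on attempts matters: it is the programmatic version of "when a session goes sideways, restart" rather than letting the loop grind forever.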
For non-coding work the pairing is even more valuable. Claude Code is strong at writing a report, a summary, a draft email. It cannot paste that draft into the right vendor portal field and submit. A desktop agent can. Most real workflows in a small business are a mix of "think" work (where Claude shines) and "move data around" work (where the desktop agent shines). The combined stack covers more of a day than either alone.
Accessibility-API desktop agents are the right choice for this pairing. They are fast enough to be inline in a development loop, reliable enough to use without a human staring at the screen, and observable enough that you can see what they did afterwards. Fazm is one such tool. Several others exist in the same category.
6. The Short List of Daily Habits
Condensed to the things worth doing every day:
Keep CLAUDE.md under two pages. Update it when patterns change.
Start a fresh session for each new task. Close the previous one.
Describe the task precisely. Point at the files. Do not ask Claude to explore what you already know.
Require tests or some form of verification at the end of every code change. Ask for the actual output, not a claim.
When a session goes sideways, restart. Do not sink another thirty minutes into the same context.
Pair with a desktop agent for any task that touches a GUI, a form, or a step Claude Code cannot drive on its own.
Log what worked. When you hit a consistent pattern that produces good results, write it down in CLAUDE.md so future sessions inherit the win.
None of these are clever. They are the opposite of clever. They are the boring discipline that turns a probabilistic tool into a reliable one. The teams that get the most out of Claude Code are not using different prompts than everyone else. They are running sessions cleaner, verifying output more seriously, and covering the parts of the workflow Claude Code does not cover with other tools. That is the whole trick.