Reviewing AI Agent Code Changes - What Was Not Modified Matters More

Matthew Diakonov


Code shows what changed. The more important signal is in what the agent decided not to change.

When you review an AI agent's diff, the green and red lines tell you what happened. The silence around them tells you what the agent missed, or chose to ignore, or did not realize needed updating. That silence is where the bugs hide.

Three Interpretations of Unchanged Code

When an agent changes one part of a system but leaves a related part untouched, it means one of three things:

Intentional scope limitation. The agent correctly identified that the related code does not need to change. This is the good case - a precise, well-scoped change.

Missed dependency. The agent changed a function signature but did not update all the call sites. The unchanged code now has type errors or runtime failures that the agent did not see. This is the common bad case.

Incomplete understanding. The agent did not realize the related code existed. It solved the local problem without modeling the global effect.

The difficulty is that you cannot tell which interpretation is correct just by reading the diff. You have to check each possibility manually. CodeRabbit's 2025 analysis found that AI-generated code introduces 1.7x more issues than human-written code at the same volume - not because the code is wrong in isolation, but because AI agents optimize for local correctness and miss global dependencies.

A Systematic Review Checklist

Check call sites. If the agent changed a function signature, search for every place that function is called. Were they all updated? A function renamed from getUser to getUserById with a new required parameter needs every call site updated. Agents frequently update the definition and miss 30% of the call sites.

# Find all call sites for a changed function
git grep -n "getUser(" -- "*.ts" "*.tsx"
# Compare to the diff - are all these sites in the changed files?

Check related files by data flow. If the agent modified a data model, trace where that model flows. Is it serialized? Deserialized? Validated? Tested? The agent likely updated the places that access the model directly but missed the downstream transformations.

A changed User type that adds a required organizationId field needs updates in: the model definition, the database query that populates it, the API response serializer, the frontend type, the form that creates users, and the tests that mock users. Agents often update 3-4 of those 6 places and ship.
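
To make that concrete, here is a hedged TypeScript sketch of three of those six places - the names (User, serializeUser, mockUser) are illustrative, not from any particular codebase:

// Model definition - the agent updated this one.
interface User {
  id: string;
  name: string;
  organizationId: string; // newly required field
}

// API response serializer - must include the new field too,
// or every response silently drops organizationId.
function serializeUser(user: User): Record<string, unknown> {
  return { id: user.id, name: user.name, organizationId: user.organizationId };
}

// Test fixture - compiles only if the mock gains the field as well.
const mockUser: User = {
  id: "u_1",
  name: "Test User",
  organizationId: "org_1",
};

If the agent touches only the interface, the serializer still compiles (it would simply omit the field until someone adds it), while the mock fails to compile - two different failure modes from the same missed update.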

Check configuration and infrastructure. Adding a new environment variable in code? The .env.example, deployment manifests, CI configuration, and documentation need updates. Agents almost never do this unprompted.
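
A new environment variable usually looks fine in the code itself; the gap is everywhere else it has to be declared. A minimal sketch, with an illustrative variable name:

// The agent adds this and the feature works locally...
const webhookSecret = process.env.WEBHOOK_SECRET;
if (!webhookSecret) {
  // ...but it fails at startup in any environment where the variable was
  // never declared - .env.example, the deployment manifest, and CI config
  // all still need a WEBHOOK_SECRET entry.
  throw new Error("WEBHOOK_SECRET is not set");
}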

Check the test coverage delta. If the agent changed implementation files but not test files, that is a yellow flag. The agent may have correctly determined tests do not need updating - or it may have changed behavior that the tests do not cover. Distinguish these by running the tests and examining coverage.

Check error handling symmetry. If the agent added error handling to one code path, are similar paths also handled? Agents often patch the specific failure they were asked to fix without auditing the surrounding code for the same pattern.
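
The shape of the problem, sketched with hypothetical function names and declared stand-ins for real dependencies:

declare const logger: { warn: (msg: string, err: unknown) => void };
declare function loadInvoices(): Promise<unknown[]>;
declare function loadCustomers(): Promise<unknown[]>;
declare function pushToBilling(records: unknown[]): Promise<void>;

// The path the agent was asked to fix - now wrapped in error handling.
async function syncInvoices(): Promise<void> {
  try {
    await pushToBilling(await loadInvoices());
  } catch (err) {
    logger.warn("invoice sync failed, will retry", err);
  }
}

// The sibling path with the same failure mode - still unhandled, because
// nothing prompted the agent to look at it.
async function syncCustomers(): Promise<void> {
  await pushToBilling(await loadCustomers());
}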

Tools That Help

Modern diff tooling goes beyond line-by-line comparison. Difftastic (difftastic.wilfred.me.uk) is a language-aware diff tool that understands code structure - it shows structural changes rather than text changes. When a function is moved, Difftastic shows it as a move rather than a deletion plus an addition, which makes the intent clearer.

Greptile and similar tools build dependency graphs of the codebase and can flag when a changed symbol has callers that were not updated. This automates the "check call sites" step.

Tree-sitter-based analysis can track symbol identity across file changes - if a function was renamed, Tree-sitter knows it is the same function and can find all references that need updating even when the name changed.

For catching missed dependencies before they reach production, pre-commit hooks that run TypeScript's strict mode - or the equivalent in your language - surface type-level mismatches at commit time.

Building Validation Into the Workflow

The review checklist is manual work. Automate what you can:

Type checking. TypeScript strict mode catches most function signature mismatches and many missed call site updates. If the agent left a call site with the old parameter signature, tsc catches it.
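
For instance, if the agent renames getUser to getUserById, adds a required parameter, and leaves one caller behind, type checking rejects the stale call site (illustrative code, not from a real codebase):

interface User { id: string; organizationId: string; }

// New definition the agent wrote.
async function getUserById(id: string, organizationId: string): Promise<User> {
  return { id, organizationId };
}

// Call site the agent never opened - tsc reports
// "Expected 2 arguments, but got 1."
async function loadProfile(userId: string): Promise<User> {
  return getUserById(userId);
}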

Pre-commit hooks. Run type checking and a fast subset of tests before every agent commit. Agents make many small commits; catch problems at each one rather than accumulating them.

Integration tests over unit tests. Unit tests test the component the agent changed. Integration tests test the pipeline that includes the changed component. The missed dependency bugs live in the pipeline, not the component.
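
A sketch of the difference in Jest-style tests, with hypothetical pipeline functions declared as stand-ins:

import { test, expect } from "@jest/globals";

declare function serializeUser(user: { id: string; name: string; organizationId: string }): { organizationId: string };
declare function createUser(input: { name: string; organizationId: string }): Promise<{ id: string }>;
declare function fetchUserResponse(id: string): Promise<{ organizationId: string }>;

// Unit test: exercises only the function the agent touched.
test("serializeUser includes organizationId", () => {
  const body = serializeUser({ id: "u_1", name: "Ada", organizationId: "org_1" });
  expect(body.organizationId).toBe("org_1");
});

// Integration test: exercises the pipeline around it - create, fetch,
// serialize - which is where missed-dependency bugs actually surface.
test("a created user round-trips with organizationId", async () => {
  const created = await createUser({ name: "Ada", organizationId: "org_1" });
  const response = await fetchUserResponse(created.id);
  expect(response.organizationId).toBe("org_1");
});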

The Mental Model for AI Code Review

Human code review assumes the author understood what they were changing. The reviewer's job is to catch logic errors, style issues, and design problems.

AI code review requires a different starting assumption: the agent understood the local context but may have missed the global context. The reviewer's job is to audit scope - what should have changed but did not.

That shift changes what you look for. Not "is this code correct in isolation?" but "given this change, what else in the codebase is now inconsistent with it?"

The code the agent wrote may well be correct, and it does not lie about what it does. But it is silent about what the agent chose to ignore. Listening to that silence is the reviewer's job.

Fazm is an open source macOS AI agent, available on GitHub.
