How to Verify AI-Generated Code and Dependencies: Red Flags, Auditing Tools, and Trust Checklist
A developer recently discovered that a scientific Python package they had been using in production was entirely written by ChatGPT. The code looked professional, had docstrings, included type hints, and passed basic tests. But deeper inspection revealed subtle numerical errors, hallucinated algorithm implementations, and test cases that verified the wrong behavior. As AI-generated code becomes more common in open source packages, npm modules, and PyPI libraries, the ability to verify code provenance and quality is becoming a critical developer skill. This guide covers how to spot AI-generated code, what to check when you find it, and how to build verification into your dependency management workflow.
1. Why AI-Generated Dependencies Are a New Risk Category
Before AI coding tools, creating a convincing open source package required genuine expertise. Writing a scientific library, a database driver, or a cryptography module demanded deep domain knowledge. The barrier to entry was high enough that most published packages, while varying in quality, were at least written by someone who understood the problem domain.
AI has lowered that barrier to near zero. Someone can prompt ChatGPT or Claude to "write a Python package for statistical analysis of time series data with proper packaging, tests, and documentation" and have a publishable package in an hour. The package will look professional. It will have a README, type hints, docstrings, and a test suite. But the implementation may contain:
- Hallucinated algorithms - Code that appears to implement a known algorithm but actually implements something subtly different. The function names and docstrings describe the correct behavior, but the logic does not match.
- Plausible but incorrect math - Statistical functions that return reasonable-looking numbers for common inputs but produce wrong results for edge cases. Without domain expertise, these errors are nearly impossible to catch by inspection.
- Circular test logic - Tests that verify the code does what it does rather than what it should do. The AI generates the implementation first, then generates tests based on the implementation's actual output rather than the correct expected output.
- Security vulnerabilities - Unsafe deserialization, insecure hardcoded defaults, insufficient input validation, and other security issues that AI commonly introduces because it optimizes for apparent functionality over safety.
The problem is not that AI-generated code is always bad. The problem is that you cannot tell from the surface whether it is good or bad, and the traditional signals of quality (documentation, tests, type hints) are trivially cheap for AI to produce regardless of actual code quality.
2. Red Flags: How to Spot AI-Generated Code
No single indicator definitively identifies AI-generated code, but several patterns in combination are strong signals:
- Uniform code style across the entire project - Human projects evolve over time. Early code looks different from recent code. Different contributors have different styles. AI-generated projects have unnervingly consistent formatting, naming conventions, and comment style from the first commit to the last.
- Suspiciously complete documentation - Every function has a docstring. Every parameter is documented. The README covers installation, usage, API reference, and contributing guidelines. Real open source projects rarely have documentation this thorough, especially small ones.
- Large initial commits - The project appeared fully formed in a single commit or a small number of commits over a few days. No iterative development visible in the git history. No dead ends, reverts, or experimental branches.
- Generic variable names with perfect grammar - AI tends to use descriptive variable names like `data_processor`, `result_container`, and `validation_handler`. Human developers use shorter, context-dependent names. AI comments are grammatically perfect and read like documentation, not developer notes.
- Test coverage that matches implementation exactly - AI-generated tests tend to test exactly the code paths that exist, with no tests for edge cases the implementation does not handle. Human-written tests often include cases that reveal unimplemented functionality.
- No issues, discussions, or community activity - The repository has stars but no filed issues, no discussions, no pull requests from other contributors. Packages with real users accumulate bug reports; a starred but silent repository is suspicious.
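The "large initial commit" signal is one of the few red flags that is easy to measure mechanically. A minimal sketch, assuming a local clone and `git` on the PATH (the function name and threshold are illustrative, not from any tool mentioned in this article):

```python
import subprocess

def _git(repo_path, *args):
    """Run a git command in repo_path and return its stdout."""
    return subprocess.run(
        ["git", *args],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout

def initial_commit_ratio(repo_path):
    """Fraction of the repo's current files that already existed in commit #1.

    A ratio near 1.0 means the project appeared fully formed in a
    single commit -- one of the red flags described above.
    """
    # Root commit(s): commits with no parents.
    first = _git(repo_path, "rev-list", "--max-parents=0", "HEAD").split()[0]
    first_files = _git(repo_path, "ls-tree", "-r", "--name-only", first).splitlines()
    head_files = _git(repo_path, "ls-tree", "-r", "--name-only", "HEAD").splitlines()
    return len(first_files) / max(len(head_files), 1)
```

A ratio of 0.9+ on a repository with only a handful of commits is worth a closer look; a low ratio over hundreds of commits suggests the iterative history a human project normally has.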
3. Green Flags: Signs of Trustworthy Code
Whether code is AI-generated or human-written, these indicators suggest genuine quality:
| Red Flag | Green Flag |
|---|---|
| Single massive initial commit | Iterative development visible in git history over weeks/months |
| Perfect documentation from day one | Documentation that grows over time, with gaps in early commits |
| No open issues or bug reports | Active issue tracker with real user reports and developer responses |
| Tests mirror implementation exactly | Tests include edge cases, error conditions, and regression tests from past bugs |
| Uniform style throughout project history | Style evolution visible over time, occasional inconsistencies |
| Author has no other projects or contributions | Author has a history of contributions to related projects |
| README has generic installation/usage template | README addresses specific use cases and known limitations |
| No CHANGELOG or release notes | Detailed CHANGELOG with breaking changes and migration guides |
The presence of green flags does not guarantee quality, but their absence combined with multiple red flags should trigger deeper investigation before depending on the package.
4. The Verification Checklist
Before adding any dependency to a production project, run through this checklist:
- Check the git history - Look at the first 10 commits. Is there iterative development or did the project appear fully formed? Check commit messages for AI patterns (overly descriptive, perfectly formatted, no typos or casual language).
- Read the tests critically - Do tests verify correct behavior or just current behavior? Are there edge case tests? Do assertion messages describe expected behavior? Try changing a constant and see if tests catch it.
- Check the author - Does the author have other repositories? Do they contribute to other projects? Is there a history of domain expertise? A brand new GitHub account with one perfect package is a warning sign.
- Verify against known implementations - For scientific or mathematical code, compare output against a known-good implementation (SciPy, R, MATLAB). For utility functions, test against established libraries with the same functionality.
- Check dependency health metrics - Download trends, dependent packages, open issues, last commit date, bus factor (number of active maintainers). Tools like Snyk, Socket, and npm audit provide some of these signals.
- Run security scanning - Static analysis tools (Bandit for Python, ESLint security plugins for JavaScript) catch common vulnerability patterns that AI-generated code frequently introduces.
- Read the actual code - For critical dependencies, there is no substitute for reading the source. Focus on error handling, input validation, and the core algorithm implementation.
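The "verify against known implementations" step can be partly automated by fuzzing the suspect function against a trusted reference. A minimal sketch: `suspect_stdev` stands in for a function imported from the package under review (here it carries the population-vs-sample bug for demonstration), and the reference is the standard library.

```python
import random
import statistics

def suspect_stdev(xs):
    # Stand-in for the untrusted package's function; it computes the
    # population standard deviation while claiming to be sample stdev.
    mean = sum(xs) / len(xs)
    return (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5

def cross_check(fn, reference, trials=200):
    """Return the random inputs on which fn and reference disagree."""
    failures = []
    rng = random.Random(0)  # fixed seed for reproducible audits
    for _ in range(trials):
        xs = [rng.uniform(-100, 100) for _ in range(rng.randint(2, 50))]
        ref = reference(xs)
        if abs(fn(xs) - ref) > 1e-6 * max(1.0, abs(ref)):
            failures.append(xs)
    return failures

mismatches = cross_check(suspect_stdev, statistics.stdev)
print(f"{len(mismatches)} of 200 random inputs disagree with statistics.stdev")
```

For scientific packages, swap the reference for SciPy or a NumPy one-liner; even a few dozen random trials will surface the "plausible but incorrect math" failures described earlier, which inspection alone rarely catches.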
5. Tools for Auditing Dependencies
Several tools help automate parts of the verification process:
- Socket.dev - Analyzes npm and PyPI packages for supply chain risks. Detects suspicious patterns like install scripts, network access, filesystem access, and obfuscated code. Particularly useful for catching malicious AI-generated packages.
- Snyk - Scans dependencies for known vulnerabilities and provides fix recommendations. Integrates with CI/CD pipelines for continuous monitoring.
- npm audit / pip-audit - Built-in tools that check for known vulnerabilities in your dependency tree. Free and easy to run, but only catch known issues in existing databases.
- Semgrep - Custom static analysis rules that can detect patterns common in AI-generated code. You can write rules to flag specific anti-patterns relevant to your stack.
- Scorecard (OpenSSF) - Evaluates open source projects against security best practices. Checks for signed releases, branch protection, code review practices, and CI/CD security. A low scorecard score combined with AI-generated code patterns is a strong warning.
- Desktop AI agents for auditing - Tools like Fazm can automate the manual parts of dependency auditing. Fazm is an AI computer agent for macOS that controls your browser, writes code, handles documents, and operates across apps. It can navigate to a GitHub repository, check the commit history, read the code, run analysis scripts, and compile a verification report, all through voice commands or automated workflows. Because it is fully open source and runs locally, your code never leaves your machine during the audit.
6. Building Automated Verification into Your Pipeline
Manual verification does not scale. As your project grows, you need automated checks that catch problematic dependencies before they reach production:
- Lock file monitoring - Set up alerts for any change to package.json, package-lock.json, requirements.txt, or Pipfile.lock. New dependencies should trigger an automatic review workflow that checks the package against your verification criteria.
- Dependency approval workflows - Require explicit approval for new dependencies. The approval process includes automated security scanning plus manual review of the package's git history, author reputation, and code quality.
- Continuous vulnerability monitoring - Tools like Dependabot, Snyk, or Renovate continuously check your dependencies for newly discovered vulnerabilities and create PRs to update affected packages.
- License compliance checks - AI-generated code can have ambiguous licensing, since models are trained on code under many different licenses. Automated license-checking tools flag packages with unclear or incompatible licenses.
- Integration test gates - Before merging any dependency update, run your integration tests and end-to-end tests. This catches functional regressions that security scanning would miss.
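The lock-file-monitoring and approval-workflow steps above can be sketched as a small CI gate: diff the dependency list on a branch against main and fail the build when an unapproved package appears. `APPROVED` is an assumed team-maintained allowlist, not part of any tool named here.

```python
APPROVED = {"requests", "numpy", "django"}  # assumed team allowlist

def parse_requirements(text):
    """Extract package names from a requirements.txt-style string."""
    names = set()
    for line in text.splitlines():
        line = line.split("#")[0].strip()  # drop comments
        if not line:
            continue
        # Strip version specifiers like ==, >=, ~= to get the bare name.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
            line = line.split(sep)[0]
        names.add(line.strip().lower())
    return names

def check_new_dependencies(old_text, new_text):
    """Fail (SystemExit) if the diff introduces an unapproved package."""
    new_deps = parse_requirements(new_text) - parse_requirements(old_text)
    unapproved = new_deps - APPROVED
    if unapproved:
        raise SystemExit(f"Unapproved new dependencies: {sorted(unapproved)}")
    return new_deps
```

In CI, `old_text` would come from `git show main:requirements.txt` and `new_text` from the branch; a nonzero exit holds the PR for the manual review described above.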
The goal is a pipeline where new dependencies are automatically flagged, scanned, and held for review. No new package enters your dependency tree without passing through at least basic automated verification.
7. A Trust Framework for AI-Era Dependencies
The traditional open source trust model was based on reputation: known maintainers, established organizations, community consensus. In the AI era, this model needs updating. A practical trust framework assigns each dependency a trust tier:
- Tier 1 - Trusted foundations - Packages from major organizations (React, Django, NumPy, etc.) with long track records, many contributors, and extensive real-world usage. Use with standard security scanning.
- Tier 2 - Established community packages - Packages with 50+ contributors, 1000+ dependents, and multi-year active development. Use with enhanced monitoring and update review.
- Tier 3 - Small but verified - Packages with few contributors but verifiable development history, domain expertise evidence, and genuine community usage (real issues, real PRs). Use with thorough code review of relevant modules.
- Tier 4 - Unverified - New packages, single-author packages with no contribution history, or packages showing AI-generation red flags. Vendor the code (copy it into your project) rather than depending on the package. Review every line before using.
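The tiering logic above can be encoded as a simple scoring function. The thresholds mirror the tiers as described; the field names on `PackageSignals` are assumptions for illustration, and Tier 1 is deliberately left as an explicit allowlist rather than inferred from metrics.

```python
from dataclasses import dataclass

TIER_1_ALLOWLIST = {"react", "django", "numpy"}  # curated, not inferred

@dataclass
class PackageSignals:
    name: str
    contributors: int
    dependents: int
    years_active: float
    has_red_flags: bool      # e.g. single massive initial commit, no issues
    history_verified: bool   # iterative commits, real issues and PRs

def trust_tier(pkg: PackageSignals) -> int:
    """Map dependency signals to a trust tier (1 = most trusted)."""
    if pkg.name.lower() in TIER_1_ALLOWLIST:
        return 1
    if pkg.has_red_flags:
        return 4  # vendor and review every line before use
    if pkg.contributors >= 50 and pkg.dependents >= 1000 and pkg.years_active >= 2:
        return 2
    if pkg.history_verified:
        return 3
    return 4
```

Wired into the approval workflow from section 6, the tier then selects the verification effort: Tier 2 gets enhanced monitoring, Tier 3 gets module-level code review, Tier 4 gets vendored and read line by line.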
The key insight is that the tier determines the verification effort, not a pass/fail decision. Even Tier 4 code might be perfectly fine. It just requires more verification effort before you trust it. In a world where anyone can generate professional-looking code in minutes, the cost of verification is the new price of open source consumption.
Automate Code Auditing Across Your Entire Desktop
Fazm navigates repos, reads code, runs analysis tools, and compiles reports, all controlled by voice on your Mac.
Try Fazm Free