The 'LGTM' Trap
AI will say code "looks good" unless you demand specific evidence. A proper code review prompt produces line-number citations, not vague approval. If AI can't point to the line, it hasn't reviewed the code.
The Safety-Readability-Tests (SRT) Framework
Every code review should cover three dimensions:
Safety Review Prompt
```
Review this code for SAFETY issues:

1. Input validation: Are all external inputs sanitized?
2. Authentication: Are auth checks present where required?
3. Data exposure: Could sensitive data leak in logs/responses?
4. Error handling: Do errors fail safely without exposing internals?
5. Dependencies: Are there known vulnerabilities in imports?

For each issue found:
- Cite the exact line number
- Explain the risk (severity: critical/high/medium/low)
- Suggest a specific fix

Code:
[paste code]
```
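To make the checklist concrete, here is a minimal sketch of the kind of handler a safety review should produce citable findings on. The `get_user` function, its `db` argument, and the field names are all hypothetical, invented for illustration only:

```python
# Hypothetical request handler sketched to illustrate the safety checklist;
# the db argument and field names are assumptions, not a real API.

def get_user(request_args, db):
    # Item 1 (input validation): reject anything that is not a numeric id.
    user_id = request_args.get("user_id")
    if user_id is None or not str(user_id).isdigit():
        # Item 4 (error handling): fail safely with a generic message,
        # exposing nothing about internals.
        return {"error": "invalid request"}, 400

    user = db.get(int(user_id))
    if user is None:
        return {"error": "not found"}, 404

    # Item 3 (data exposure): return only non-sensitive fields, never the
    # raw record (which may hold password hashes or tokens).
    return {"id": user["id"], "name": user["name"]}, 200
```

A good AI review of the unvalidated version of this code would cite the exact line where `user_id` is first used and flag the missing check, rather than saying the handler "looks good."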
Readability Review Prompt
```
Review this code for READABILITY issues:

1. Naming: Are functions/variables self-documenting?
2. Structure: Is logic easy to follow? Any deep nesting?
3. Comments: Are complex sections explained? Any stale comments?
4. Consistency: Does style match project conventions?
5. Complexity: Any functions over 30 lines that should be split?

For each issue:
- Cite the exact line number
- Explain why it hurts maintainability
- Show the improved version

Code:
[paste code]
```
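The "show the improved version" requirement is what separates a useful readability review from a style complaint. As a sketch, here is a hypothetical before/after pair of the kind a review should produce for a deep-nesting finding (item 2); `apply_discount` and its fields are invented for illustration:

```python
def apply_discount_nested(order):
    # Finding (item 2): three levels of nesting for one decision.
    if order is not None:
        if order.get("total", 0) > 0:
            if order.get("member"):
                return order["total"] * 0.9
    return order.get("total", 0) if order else 0

def apply_discount(order):
    # Improved version: guard clauses flatten the nesting and make
    # every exit path explicit, without changing behavior.
    if not order:
        return 0
    total = order.get("total", 0)
    if total <= 0 or not order.get("member"):
        return total
    return total * 0.9
```

Asking for the rewritten version forces the AI to commit to a concrete fix it can be held to, instead of a vague "reduce nesting" remark.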
Test Coverage Prompt
```
Review this code for TEST COVERAGE gaps:

1. Happy path: Is the main flow tested?
2. Edge cases: Are boundary conditions covered?
3. Error cases: Do tests verify error handling?
4. Integration points: Are external calls mocked appropriately?
5. Assertions: Are tests actually asserting outcomes?

For each gap:
- Identify the untested scenario
- Explain the risk if this breaks
- Provide a test case skeleton
Code:
[paste code]
Tests:
[paste tests]
```
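The test skeletons the prompt asks for should map one-to-one onto the gap categories. A hedged sketch of what that looks like, using a hypothetical `parse_price` function invented for illustration (plain asserts shown; adapt to your test framework):

```python
# Hypothetical function under test, not from any real codebase.
def parse_price(text):
    value = float(text)
    if value < 0:
        raise ValueError("price cannot be negative")
    return round(value, 2)

def test_happy_path():
    # Item 1: the main flow, with a real assertion on the outcome (item 5).
    assert parse_price("19.99") == 19.99

def test_edge_case_zero_boundary():
    # Item 2: boundary condition -- zero is allowed, negative is not.
    assert parse_price("0") == 0.0

def test_error_case_negative():
    # Item 3: verify error handling, not just the happy path.
    try:
        parse_price("-5")
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative price")
```

Note that each test name states the scenario it covers, which makes the gap analysis auditable: a missing scenario is visible as a missing test name.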
Review Checklist
- ✓ Safety review: input validation, auth, data exposure, error handling
- ✓ Readability review: naming, structure, comments, consistency
- ✓ Test coverage: happy path, edge cases, error cases, integration
- ✓ All issues cite specific line numbers
- ✓ All issues include severity and fix suggestion
Evidence Enforcement Patterns
Pattern: Demand Citations
Bad: "Are there any bugs in this code?"
Better: "List bugs with: line number, bug description, reproduction steps, and fix. If no bugs found, explain what you checked."
Pattern: Require Proof of Review
Bad: "Review this PR"
Better: "For this PR, confirm you checked: (1) all new functions have input validation, (2) all external calls have error handling, (3) all public methods have tests. Cite line numbers for each check."
Pattern: Structured Output
Bad: "Any issues?"
Better:
```
Review output format:

SAFETY: [issue count] | [critical/high/medium/low breakdown]
READABILITY: [issue count] | [categories affected]
TESTS: [coverage %] | [missing scenarios]

Details:
[structured findings with line citations]
```
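A structured format also lets you enforce the evidence rule mechanically. A minimal sketch of that idea, assuming findings cite lines as "line <n>" (the regex is an assumption; adapt it to your own output format):

```python
import re

# Findings must cite a line number in the form "line <n>"; this
# convention is an assumption made for this sketch.
CITATION = re.compile(r"\bline\s+\d+\b", re.IGNORECASE)

def uncited_findings(findings):
    """Return the findings that lack a line-number citation."""
    return [f for f in findings if not CITATION.search(f)]

review = [
    "SAFETY high: unvalidated user_id at line 42",
    "READABILITY low: handler is hard to follow",  # no citation: flagged
]
```

Running `uncited_findings(review)` flags the second finding, turning "no citation = no credit" from a policy into an automated gate.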
"We reduced post-merge defects by 28% when we started requiring line-number citations in AI reviews. No citation = no credit for reviewing."
Related Resources
- SWE Quickstart — full workflow for engineering packs
- AI Code Review Best Practices — broader guidance
Apply this now
Practice prompt
Apply the SRT framework to a recent PR and compare the findings to the original review.
Try this now
Run the SWE Mini-Pack with the structured review prompts.
Common pitfall
Accepting 'code looks good' without line-number evidence—AI is guessing, not reviewing.
Key takeaways
- Use the SRT framework: Safety, Readability, Tests for every review
- Demand line-number citations: no citation means no real review
- Structure the output format so issues are actionable
See it in action
Drop this into a measured run—demo it, then tie it back to your methodology.
Next Steps
Ready to measure your AI impact? Start with a quick demo to see your Overestimation Δ and cognitive load metrics.