The Confidence Trap
AI outputs sound confident even when wrong. Red-teaming exposes the gaps. If you can break the output in 5 minutes, your stakeholders will break it in the meeting. Break it first.
The Two-Prompt Framework
For every AI output you plan to share, run these two adversarial prompts:
Prompt 1: The Contrarian
I'm about to share this [document/analysis/recommendation] with stakeholders.
Play devil's advocate:
1. What's the strongest argument AGAINST this conclusion?
2. What evidence would DISPROVE the main claim?
3. What's the most embarrassing question a skeptic could ask?
Be harsh. I need to know the weaknesses before my audience does.
[paste your output]
Prompt 2: The Edge Case Hunter
This [document/analysis/recommendation] needs to survive scrutiny.
1. Name two scenarios where this recommendation would FAIL
2. Identify one assumption that might be wrong
3. What data point, if different, would change the conclusion?
Be specific. Vague concerns don't help me improve this.
[paste your output]
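If you run these prompts often, it can help to keep them as reusable templates. The sketch below fills both templates with a draft before you send them to whatever model client you use; `build_prompts` and the template names are illustrative, not part of any particular library.

```python
# Sketch: the two adversarial prompts as reusable templates.
# Names here are illustrative; plug the results into your own LLM client.

CONTRARIAN = """I'm about to share this {kind} with stakeholders.
Play devil's advocate:
1. What's the strongest argument AGAINST this conclusion?
2. What evidence would DISPROVE the main claim?
3. What's the most embarrassing question a skeptic could ask?
Be harsh. I need to know the weaknesses before my audience does.

{output}"""

EDGE_CASE = """This {kind} needs to survive scrutiny.
1. Name two scenarios where this recommendation would FAIL
2. Identify one assumption that might be wrong
3. What data point, if different, would change the conclusion?
Be specific. Vague concerns don't help me improve this.

{output}"""

def build_prompts(kind: str, output: str) -> list[str]:
    """Fill both templates with the draft you plan to share."""
    return [t.format(kind=kind, output=output) for t in (CONTRARIAN, EDGE_CASE)]

prompts = build_prompts("analysis", "Expand into APAC; growth is 2x North America's.")
print(len(prompts))  # 2: one Contrarian, one Edge Case prompt
```

Sending each filled prompt as a separate conversation keeps the two critiques independent, so a harsh Contrarian answer doesn't anchor the Edge Case pass.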
Example: Red-Teaming a Market Analysis
Original AI output:
"Based on market trends, we should expand into the APAC region. Growth rates are 2x North America's."
Contrarian result:
- Strongest counter: APAC growth is driven by two countries; others are flat
- Disproving evidence: Competitor exits from APAC in last 2 years
- Embarrassing question: "What's our APAC team's track record?"
Edge Case result:
- Fails if: Currency volatility exceeds 15%; regulatory changes in key markets
- Wrong assumption: That our product translates without localization
- Different data: If we used 5-year growth instead of 2-year, trend reverses
Action: Added sections on currency risk, competitor exits, and localization requirements before sharing.
- ✓ Run the Contrarian prompt: counter-arguments, disproving evidence, embarrassing questions
- ✓ Run the Edge Case prompt: failure scenarios, wrong assumptions, sensitive data points
- ✓ Address or acknowledge weaknesses in the final output
- ✓ Know when to stop: 3 passes max
When to Stop Red-Teaming
Diminishing returns kick in fast. Stop when:
- Same issues repeat: Third pass finds what first two found
- Issues are theoretical: "Could fail if aliens invade" level
- Time exceeds value: Spending 30 min red-teaming a 5-min task
- Stakes don't justify: Internal draft vs. board presentation
Rule of thumb: High-stakes = 2 prompts. Medium = 1 prompt. Low = skip.
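The stopping rules and the rule of thumb above fit in a few lines. This is a minimal sketch using the thresholds from the text (3-pass cap, stakes-to-prompt mapping); the function names are illustrative.

```python
# Sketch of the stopping rules above; thresholds come from the playbook text.

PROMPTS_BY_STAKES = {"high": 2, "medium": 1, "low": 0}  # rule of thumb

def prompts_for(stakes: str) -> int:
    """How many adversarial prompts to run for a given stakes level."""
    return PROMPTS_BY_STAKES[stakes]

def should_stop(pass_number: int, new_issues: int) -> bool:
    """Stop after 3 passes, or earlier once a pass surfaces nothing new."""
    return pass_number >= 3 or new_issues == 0

print(prompts_for("high"))   # 2
print(should_stop(3, 5))     # True: hit the pass cap
print(should_stop(2, 0))     # True: issues are repeating
```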
Quick Reference: Adversarial Prompts by Output Type
| Output Type | Best Adversarial Prompt |
|-------------|------------------------|
| Strategy doc | "What would a competitor say is naive about this?" |
| Technical spec | "How could this be exploited or abused?" |
| Market analysis | "What market shift would invalidate this?" |
| Code/architecture | "How would this break at 10x scale?" |
| Process change | "Who would resist this and why?" |
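The quick-reference table can double as a lookup if you script your red-teaming. A minimal sketch; the keys mirror the table rows and the function name is illustrative.

```python
# The quick-reference table as a simple lookup; rows mirror the table above.
ADVERSARIAL_PROMPTS = {
    "strategy doc": "What would a competitor say is naive about this?",
    "technical spec": "How could this be exploited or abused?",
    "market analysis": "What market shift would invalidate this?",
    "code/architecture": "How would this break at 10x scale?",
    "process change": "Who would resist this and why?",
}

def adversarial_prompt(output_type: str) -> str:
    """Look up the best adversarial prompt for an output type, case-insensitively."""
    return ADVERSARIAL_PROMPTS[output_type.lower()]

print(adversarial_prompt("Market analysis"))  # "What market shift would invalidate this?"
```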
"I started red-teaming my AI drafts and stopped getting blindsided in reviews. Five minutes saved hours of rework."
Related Resources
- Bias Spotting Checklist — systematic bias detection
- Judgment Pack — practice adversarial evaluation
Apply this now
Practice prompt
Take your most recent AI output and run both adversarial prompts. Note what surfaces.
Try this now
Run the Judgment Pack and apply red-teaming to the sample outputs.
Common pitfall
Red-teaming after sharing—by then the damage is done.
Key takeaways
- Two prompts: Contrarian (argue against) and Edge Case (find failures)
- Five minutes is enough; don't skip because you lack time
- Stop after 3 passes or when issues repeat
See it in action
Drop this into a measured run—demo it, then tie it back to your methodology.
See also
Pair this play with related resources, methodology notes, or quickstarts.
Next Steps
Ready to measure your AI impact? Start with a quick demo to see your Overestimation Δ and cognitive load metrics.